I think that submitting the Spark job on behalf of user01 will solve the problem. You may also try setting the sticky bit on the /data/user01/rdd folder if you want to allow multiple users to write to /data/user01/rdd at the same time, but I would not recommend allowing multiple users to write to the same directory, precisely because of the _temporary folder: race conditions and tampering are not things you want to deal with unless you can guarantee that simultaneous writing by multiple users will never happen.
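For the "on behalf of user01" part, here is a minimal sketch using spark-submit impersonation. It assumes the cluster is configured to let the submitting OS user (spark, in your listing) act as a proxy for other users; the application class and jar names are only placeholders.

# core-site.xml on the NameNode must allow the submitting user to impersonate others,
# for example (values are illustrative, restrict them in a real cluster):
#   <property><name>hadoop.proxyuser.spark.hosts</name><value>*</value></property>
#   <property><name>hadoop.proxyuser.spark.groups</name><value>*</value></property>

# Submit the job so that HDFS operations run as user01 instead of spark:
/spark/bin/spark-submit \
  --master local[2] \
  --proxy-user user01 \
  --class com.example.WriteUser01Table \
  my-app.jar hdfs://localhost:9000/data/user01/rdd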
Anyway, try the following:

1. hdfs dfs -chmod 1777 /data/user01/rdd
2. Run spark-submit as user A to append data to /data/user01/rdd (make sure you have the correct HDFS umask set).
3. Run spark-submit as user B to append data to /data/user01/rdd (make sure you have the correct HDFS umask set).
4. Run spark-submit as user C to read data from /data/user01/rdd.

To let a user from domain A read the data without the job failing, I'd recommend partitioning /data/user01/rdd (e.g., /data/user01/rdd/my_column=domain_a with the corresponding UGO/ACL permissions) and filtering the data you read by my_column:

spark.read.format("parquet").load("/data/user01/rdd").filter(functions.expr("my_column = 'domain_a'"))

(a fuller Scala sketch of this partitioned layout is appended at the very end of this message, below the quoted thread)

Note #1: the question is what exact goal you are trying to achieve.
Note #2: I personally do not recommend allowing multiple users to write to the same directory for basic security reasons as well (see the Simple Security property and the *-property of the Bell-LaPadula security model).

Thu, 25 Mar 2021 at 19:24, Kwangsun Noh <nohkwang...@gmail.com>:
> Thank you for the first answer to my question. > > Unfortunately, I have to make totally different tables > > and It is not possible to make only one table via UGI. > > --- > > below is the sample codes I wrote. > > org.apache.hadoop.security.UserGroupInformation.createRemoteUser("user01").doAs(new > java.security.PrivilegedExceptionAction[Unit] { > def run = { spark.range(1, > 10).write.mode("overwrite").save("hdfs://localhost:9000/data/user01/rdd") } > }) > > --- > > I got an error when i save the rdd in doAs method. > > > [spark@6974321cbd3d ~]$ /hadoop/bin/hdfs dfs -rmr /data/tester > rmr: DEPRECATED: Please use '-rm -r' instead. > Deleted /data/tester > [spark@6974321cbd3d ~]$ /hadoop/bin/hdfs dfs -ls -R / > drwxrwxrwx - spark supergroup 0 2021-03-25 17:18 /data > drwxr-xr-x - user01 supergroup 0 2021-03-25 17:13 /data/user01 > [spark@6974321cbd3d ~]$ /spark/bin/spark-shell --master local[2] > WARNING: An illegal reflective access operation has occurred > WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform > (file:/spark-3.1.1-bin-hadoop2.7/jars/spark-unsafe_2.12-3.1.1.jar) to > constructor java.nio.DirectByteBuffer(long,int) > WARNING: Please consider reporting this to the maintainers of > org.apache.spark.unsafe.Platform > WARNING: Use --illegal-access=warn to enable warnings of further illegal > reflective access operations > WARNING: All illegal access operations will be denied in a future release > 21/03/25 17:18:52 WARN NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > Setting default log level to "WARN". > To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use > setLogLevel(newLevel). > Spark context Web UI available at http://6974321cbd3d:4040 > Spark context available as 'sc' (master = local[2], app id = > local-1616692738434). > Spark session available as 'spark'. > Welcome to > ____ __ > / __/__ ___ _____/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 3.1.1 > /_/ > > Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 11.0.10) > Type in expressions to have them evaluated. > Type :help for more information.
> > scala> > org.apache.hadoop.security.UserGroupInformation.createRemoteUser("user01").doAs(new > java.security.PrivilegedExceptionAction[Unit] { > | def run = { spark.range(1, > 10).write.mode("overwrite").save("hdfs://localhost:9000/data/user01/rdd") } > | }) > 21/03/25 17:19:26 ERROR FileFormatWriter: Aborting job > a24c727a-8d12-4c77-95aa-3e36c4a3063e. > org.apache.hadoop.security.AccessControlException: Permission denied: > user=user01, access=WRITE, > inode="/data/user01/rdd/_temporary/0/task_202103251719241075056615342665709_0000_m_000001":spark:supergroup:drwxr-xr-x > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:399) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:258) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:193) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1896) > at > org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameTo(FSDirRenameOp.java:482) > at > org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameToInt(FSDirRenameOp.java:62) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:3039) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rename(NameNodeRpcServer.java:1041) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.rename(ClientNamenodeProtocolServerSideTranslatorPB.java:661) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1029) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:957) > at java.base/java.security.AccessController.doPrivileged(Native Method) > at java.base/javax.security.auth.Subject.doAs(Subject.java:423) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2957) > > at > java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at > java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490) > at > org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) > at > org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) > at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1953) > at > org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:626) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:414) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:428) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:362) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:334) > at > 
org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:48) > at > org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitJob(HadoopMapReduceCommitProtocol.scala:182) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:220) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188) > at > org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108) > at > org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106) > at > org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131) > at > org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) > at > org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989) > at > org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438) > at > org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293) > at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anon$1.run(<console>:25) > at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anon$1.run(<console>:24) > at java.base/java.security.AccessController.doPrivileged(Native Method) > at java.base/javax.security.auth.Subject.doAs(Subject.java:423) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746) > at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:24) > at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:30) > at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:32) > at $line14.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:34) > at $line14.$read$$iw$$iw$$iw$$iw.<init>(<console>:36) > at $line14.$read$$iw$$iw$$iw.<init>(<console>:38) > at $line14.$read$$iw$$iw.<init>(<console>:40) > at $line14.$read$$iw.<init>(<console>:42) > at $line14.$read.<init>(<console>:44) > at $line14.$read$.<init>(<console>:48) > at $line14.$read$.<clinit>(<console>) > at $line14.$eval$.$print$lzycompute(<console>:7) > at $line14.$eval$.$print(<console>:6) > at $line14.$eval.$print(<console>) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745) > at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021) > at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574) > at > scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41) > at > scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37) > at > scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41) > at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573) > at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600) > at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:570) > at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:894) > at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:912) > at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:912) > at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:762) > at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:464) > at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:485) > at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:239) > at org.apache.spark.repl.Main$.doMain(Main.scala:78) > at org.apache.spark.repl.Main$.main(Main.scala:58) > at org.apache.spark.repl.Main.main(Main.scala) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at org.apache.spark.deploy.SparkSubmit.org > $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951) > at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): > Permission denied: user=user01, access=WRITE, > inode="/data/user01/rdd/_temporary/0/task_202103251719241075056615342665709_0000_m_000001":spark:supergroup:drwxr-xr-x > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:399) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:258) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:193) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1896) > at > org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameTo(FSDirRenameOp.java:482) > at > org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameToInt(FSDirRenameOp.java:62) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:3039) > at > 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rename(NameNodeRpcServer.java:1041) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.rename(ClientNamenodeProtocolServerSideTranslatorPB.java:661) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1029) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:957) > at java.base/java.security.AccessController.doPrivileged(Native Method) > at java.base/javax.security.auth.Subject.doAs(Subject.java:423) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2957) > > at org.apache.hadoop.ipc.Client.call(Client.java:1476) > at org.apache.hadoop.ipc.Client.call(Client.java:1413) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy23.rename(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.rename(ClientNamenodeProtocolTranslatorPB.java:487) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy24.rename(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1951) > ... 
83 more > 21/03/25 17:19:26 WARN SQLHadoopMapReduceCommitProtocol: Exception while > aborting null > org.apache.hadoop.security.AccessControlException: Permission denied: > user=user01, access=ALL, > inode="/data/user01/rdd/_temporary/0/task_202103251719248930935174538512908_0000_m_000000":spark:supergroup:drwxr-xr-x > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkSubAccess(FSPermissionChecker.java:348) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:265) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:193) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1896) > at > org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:110) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3104) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1127) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:708) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1029) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:957) > at java.base/java.security.AccessController.doPrivileged(Native Method) > at java.base/javax.security.auth.Subject.doAs(Subject.java:423) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2957) > > at > java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at > java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490) > at > org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) > at > org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) > at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2046) > at > org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:707) > at > org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:703) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:714) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.cleanupJob(FileOutputCommitter.java:463) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.abortJob(FileOutputCommitter.java:482) > at > org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.abortJob(HadoopMapReduceCommitProtocol.scala:233) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:230) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188) > at > 
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108) > at > org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106) > at > org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131) > at > org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) > at > org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989) > at > org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438) > at > org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293) > at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anon$1.run(<console>:25) > at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anon$1.run(<console>:24) > at java.base/java.security.AccessController.doPrivileged(Native Method) > at java.base/javax.security.auth.Subject.doAs(Subject.java:423) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746) > at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:24) > at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:30) > at $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:32) > at $line14.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:34) > at $line14.$read$$iw$$iw$$iw$$iw.<init>(<console>:36) > at $line14.$read$$iw$$iw$$iw.<init>(<console>:38) > at $line14.$read$$iw$$iw.<init>(<console>:40) > at $line14.$read$$iw.<init>(<console>:42) > at $line14.$read.<init>(<console>:44) > at $line14.$read$.<init>(<console>:48) > at $line14.$read$.<clinit>(<console>) > at $line14.$eval$.$print$lzycompute(<console>:7) > at $line14.$eval$.$print(<console>:6) > at $line14.$eval.$print(<console>) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745) > at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021) > at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574) > at > 
scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41) > at > scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37) > at > scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41) > at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573) > at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600) > at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:570) > at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:894) > at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:912) > at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:912) > at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:762) > at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:464) > at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:485) > at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:239) > at org.apache.spark.repl.Main$.doMain(Main.scala:78) > at org.apache.spark.repl.Main$.main(Main.scala:58) > at org.apache.spark.repl.Main.main(Main.scala) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > at org.apache.spark.deploy.SparkSubmit.org > $apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951) > at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): > Permission denied: user=user01, access=ALL, > inode="/data/user01/rdd/_temporary/0/task_202103251719248930935174538512908_0000_m_000000":spark:supergroup:drwxr-xr-x > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkSubAccess(FSPermissionChecker.java:348) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:265) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:193) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1896) > at > org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:110) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3104) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1127) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:708) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528) > at 
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1029) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:957) > at java.base/java.security.AccessController.doPrivileged(Native Method) > at java.base/javax.security.auth.Subject.doAs(Subject.java:423) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2957) > > at org.apache.hadoop.ipc.Client.call(Client.java:1476) > at org.apache.hadoop.ipc.Client.call(Client.java:1413) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy23.delete(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:545) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy24.delete(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2044) > ... 83 more > org.apache.spark.SparkException: Job aborted. > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:231) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188) > at > org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108) > at > org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106) > at > org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131) > at > org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) > at > org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989) > at > org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:438) > at > 
org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:415) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:293) > at $anon$1.run(<console>:25) > at $anon$1.run(<console>:24) > at java.base/java.security.AccessController.doPrivileged(Native Method) > at java.base/javax.security.auth.Subject.doAs(Subject.java:423) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746) > ... 49 elided > Caused by: org.apache.hadoop.security.AccessControlException: Permission > denied: user=user01, access=WRITE, > inode="/data/user01/rdd/_temporary/0/task_202103251719241075056615342665709_0000_m_000001":spark:supergroup:drwxr-xr-x > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:399) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:258) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:193) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1896) > at > org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameTo(FSDirRenameOp.java:482) > at > org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameToInt(FSDirRenameOp.java:62) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:3039) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rename(NameNodeRpcServer.java:1041) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.rename(ClientNamenodeProtocolServerSideTranslatorPB.java:661) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1029) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:957) > at java.base/java.security.AccessController.doPrivileged(Native Method) > at java.base/javax.security.auth.Subject.doAs(Subject.java:423) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2957) > > at > java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at > java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490) > at > org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) > at > org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) > at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1953) > at > org.apache.hadoop.hdfs.DistributedFileSystem.rename(DistributedFileSystem.java:626) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:414) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:428) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:362) > at > 
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:334) > at > org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:48) > at > org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitJob(HadoopMapReduceCommitProtocol.scala:182) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:220) > ... 75 more > Caused by: org.apache.hadoop.ipc.RemoteException: Permission denied: > user=user01, access=WRITE, > inode="/data/user01/rdd/_temporary/0/task_202103251719241075056615342665709_0000_m_000001":spark:supergroup:drwxr-xr-x > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:399) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:258) > at > org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:193) > at > org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1896) > at > org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameTo(FSDirRenameOp.java:482) > at > org.apache.hadoop.hdfs.server.namenode.FSDirRenameOp.renameToInt(FSDirRenameOp.java:62) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:3039) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rename(NameNodeRpcServer.java:1041) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.rename(ClientNamenodeProtocolServerSideTranslatorPB.java:661) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1029) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:957) > at java.base/java.security.AccessController.doPrivileged(Native Method) > at java.base/javax.security.auth.Subject.doAs(Subject.java:423) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2957) > > at org.apache.hadoop.ipc.Client.call(Client.java:1476) > at org.apache.hadoop.ipc.Client.call(Client.java:1413) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) > at com.sun.proxy.$Proxy23.rename(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.rename(ClientNamenodeProtocolTranslatorPB.java:487) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102) > at com.sun.proxy.$Proxy24.rename(Unknown Source) > at org.apache.hadoop.hdfs.DFSClient.rename(DFSClient.java:1951) > ... 
83 more > > scala> :q > [spark@6974321cbd3d ~]$ /hadoop/bin/hdfs dfs -ls -R / > drwxrwxrwx - spark supergroup 0 2021-03-25 17:18 /data > drwxr-xr-x - user01 supergroup 0 2021-03-25 17:19 /data/user01 > drwxr-xr-x - user01 supergroup 0 2021-03-25 17:19 > /data/user01/rdd > drwxr-xr-x - user01 supergroup 0 2021-03-25 17:19 > /data/user01/rdd/_temporary > drwxr-xr-x - user01 supergroup 0 2021-03-25 17:19 > /data/user01/rdd/_temporary/0 > drwxr-xr-x - spark supergroup 0 2021-03-25 17:19 > /data/user01/rdd/_temporary/0/_temporary > drwxr-xr-x - spark supergroup 0 2021-03-25 17:19 > /data/user01/rdd/_temporary/0/task_202103251719241075056615342665709_0000_m_000001 > -rw-r--r-- 3 spark supergroup 479 2021-03-25 17:19 > /data/user01/rdd/_temporary/0/task_202103251719241075056615342665709_0000_m_000001/part-00001-b106411e-6a3b-4f18-a7cc-95abbf471ac6-c000.snappy.parquet > drwxr-xr-x - spark supergroup 0 2021-03-25 17:19 > /data/user01/rdd/_temporary/0/task_202103251719248930935174538512908_0000_m_000000 > -rw-r--r-- 3 spark supergroup 475 2021-03-25 17:19 > /data/user01/rdd/_temporary/0/task_202103251719248930935174538512908_0000_m_000000/part-00000-b106411e-6a3b-4f18-a7cc-95abbf471ac6-c000.snappy.parquet > [spark@6974321cbd3d ~]$ > > > On Fri, 26 Mar 2021 at 12:23 AM, "Yuri Oleynikov (יורי אולייניקוב)" < > yur...@gmail.com> wrote: > >> Assuming that all tables have the same schema, you can make the entire global >> table partitioned by some column, then apply specific UGO >> permissions/ACLs per partition subdirectory. >> >> >> On 25 Mar 2021, at 15:13, Kwangsun Noh <nohkwang...@gmail.com> wrote: >> >> >> >> Hi, Spark users. >> >> >> Currently I have to make multiple tables in HDFS using the Spark API. >> >> The tables need to be made by different users. >> >> >> For example, table01 is owned by user01 and table02 is owned by user02, as >> below. >> >> >> path | owner:group | permission >> >> /data/table01/ | user01:spark |770 >> >> /data/table01/_SUCCESS | user01:spark |770 >> >> /data/table01/part_xxxxx | user01:spark |770 >> >> /data/table01/part_xxxxx | user01:spark |770 >> >> ... >> >> /data/table02/ | user02:spark |770 >> >> ... >> >> /data/table03/ | user03:spark |770 >> >> ... >> >> >> >> >> Actually, I used the UGI to make them, and the directories were made as I >> expected. >> >> But the files (part_xxxxx) were made by the user that launched the Spark >> application. >> >> >> Is it possible to do what I want? >> >>
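P.S. A rough spark-shell (Scala) sketch of the partitioned layout suggested above; the column name, domain value, group name and paths are only examples, not a definitive implementation:

// Each domain's job appends only its own rows, tagged with a partition column,
// so every domain ends up under its own subdirectory such as
// /data/user01/rdd/my_column=domain_a, which can then get its own UGO/ACL permissions.
import org.apache.spark.sql.functions

spark.range(1, 10)
  .withColumn("my_column", functions.lit("domain_a"))
  .write
  .mode("append")
  .partitionBy("my_column")
  .parquet("hdfs://localhost:9000/data/user01/rdd")

// A reader from domain A filters on the partition column, so only the
// subdirectories that user is allowed to access need to be read:
spark.read.format("parquet")
  .load("hdfs://localhost:9000/data/user01/rdd")
  .filter(functions.expr("my_column = 'domain_a'"))
  .show()

// Per-partition permissions can then be applied from the shell, for example:
//   hdfs dfs -setfacl -R -m group:domain_a:r-x /data/user01/rdd/my_column=domain_a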