You are trying to save temp data to the root folder: the stack trace shows FileOutputCommitter.setupJob trying to mkdir under inode "/", which your user cannot write to. You need to set the hadoop.tmp.dir property (I think that is the one) so the temporary output lands somewhere writable.
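For example, something like this (an untested sketch; the property name is my best guess, and the path is a placeholder for a directory your user can write to):

    import com.twitter.scalding._

    // Untested sketch: point Hadoop's temp space at a writable directory.
    // "/user/username/tmp" is a placeholder, not a path from this thread.
    class TmpDirJob(args: Args) extends Job(args) {
      override def config: Map[AnyRef, AnyRef] =
        super.config + ("hadoop.tmp.dir" -> "/user/username/tmp")

      // The rest of the job stays as before.
      TypedPipe.from(TypedTsv[String](args("input")))
        .write(TypedTsv[String](args("output")))
    }

Since Scalding jobs launch through Hadoop's ToolRunner, you should also be able to pass it on the command line as a generic option, e.g. -Dhadoop.tmp.dir=/user/username/tmp.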
Kostya

On Thu, Nov 3, 2016 at 9:50 AM, 'Piyush Narang' via Scalding Development <[email protected]> wrote:

> The javadoc on the distinct function
> <http://twitter.github.io/scalding/index.html#com.twitter.scalding.typed.TypedPipe@distinct(implicitord:Ordering[_%3E:T]):com.twitter.scalding.typed.TypedPipe[T]>
> seems to cover your use case:
>
>   "Returns the set of distinct elements in the TypedPipe. This is the
>   same as: .map((_, ())).group.sum.keys
>   If you want a distinct while joining, consider: instead of
>   a.join(b.distinct.asKeys) manually do the distinct:
>   a.join(b.asKeys.sum)
>   The latter creates 1 map/reduce phase rather than 2"
>
> Could you try that? (val joinedSets = data1.join(data2.asKeys.sum))
> I'm not sure whether the HDFS permissions error you're running into
> occurs while temporary job output is being written out between the 2 MR
> jobs, or somewhere else.
>
> On Wed, Nov 2, 2016 at 11:54 PM, Nikhil J Joshi <[email protected]> wrote:
>
>> Hi,
>>
>> I am trying to perform a join with distinct keys in Scalding as
>>
>>   val joinedSets = data1.join(data2.distinct.asKeys)
>>
>> The above operation raises an HDFS permissions error (stacktrace
>> below), while the same join without the distinct clause works fine. I
>> am guessing that the distinct clause is forcing some disk IO at the
>> wrong path. Can anyone suggest a remedy?
>>
>> Thanks,
>> Nikhil
>>
>> stacktrace:
>>
>> Job setup failed : org.apache.hadoop.security.AccessControlException: Permission denied: user=username, access=WRITE, inode="/":hdfs:hdfs:drwxr-xr-x
>>     at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
>>     at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
>>     at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:238)
>>     at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:179)
>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6630)
>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6612)
>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6564)
>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4368)
>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4338)
>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4311)
>>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:853)
>>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600)
>>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2063)
>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2059)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:422)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2057)
>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>     at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
>>     at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>>     at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
>>     at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2744)
>>     at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2713)
>>     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:870)
>>     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:866)
>>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>     at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:866)
>>     at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:859)
>>     at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1817)
>>     at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.setupJob(FileOutputCommitter.java:305)
>>     at org.apache.hadoop.mapred.FileOutputCommitter.setupJob(FileOutputCommitter.java:131)
>>     at org.apache.hadoop.mapred.OutputCommitter.setupJob(OutputCommitter.java:233)
>>     at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:254)
>>     at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:234)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>     at java.lang.Thread.run(Thread.java:745)
>> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=username, access=WRITE, inode="/":hdfs:hdfs:drwxr-xr-x
>>     at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
>>     at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
>>     at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:238)
>>     at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:179)
>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6630)
>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6612)
>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6564)
>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4368)
>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4338)
>>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4311)
>>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:853)
>>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600)
>>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2063)
>>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2059)
>>     at java.security.AccessController.doPrivileged(Native Method)
>>     at javax.security.auth.Subject.doAs(Subject.java:422)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
>>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2057)
>>     at org.apache.hadoop.ipc.Client.call(Client.java:1469)
>>     at org.apache.hadoop.ipc.Client.call(Client.java:1400)
>>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>>     at com.sun.proxy.$Proxy9.mkdirs(Unknown Source)
>>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:539)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>     at java.lang.reflect.Method.invoke(Method.java:483)
>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>>     at com.sun.proxy.$Proxy10.mkdirs(Unknown Source)
>>     at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2742)
>>     ... 15 more
>
> --
> - Piyush

--
Konstantin <[email protected]>
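For reference, a minimal self-contained sketch of the two variants discussed above. The class name, sources, and key/value types are hypothetical (not from the thread); only the distinct.asKeys versus asKeys.sum contrast comes from the TypedPipe javadoc:

    import com.twitter.scalding._

    class DistinctJoinJob(args: Args) extends Job(args) {
      // Hypothetical inputs: data1 carries (key, value) pairs, data2 carries
      // possibly-duplicated keys.
      val data1: TypedPipe[(String, Int)] =
        TypedPipe.from(TypedTsv[(String, Int)](args("data1")))
      val data2: TypedPipe[String] =
        TypedPipe.from(TypedTsv[String](args("data2")))

      // Two map/reduce phases (one for the distinct, one for the join), with
      // temporary output written to HDFS between them:
      //   val joinedSets = data1.group.join(data2.distinct.asKeys)

      // One map/reduce phase: summing the Unit values deduplicates the keys
      // inside the same grouping that performs the join (the javadoc's
      // a.join(b.asKeys.sum)).
      val joinedSets = data1.group.join(data2.asKeys.sum)

      joinedSets
        .toTypedPipe
        .map { case (key, (value, _)) => (key, value) }
        .write(TypedTsv[(String, Int)](args("output")))
    }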
