You are trying to write temp data to the root folder (/), which your user
cannot write to. You need to set the hadoop.tmp.dir property (I think that
is the one) to a directory your user owns.
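
For example, a minimal sketch (the path below is a placeholder; the property
can also go in core-site.xml, or be passed as -Dhadoop.tmp.dir=... on the
command line):

  import org.apache.hadoop.conf.Configuration

  // Point Hadoop's temp dir at a directory your user can write to
  // ("/user/username/tmp" is just an illustrative path).
  val conf = new Configuration()
  conf.set("hadoop.tmp.dir", "/user/username/tmp")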

Kostya

On Thu, Nov 3, 2016 at 9:50 AM, 'Piyush Narang' via Scalding Development
<[email protected]> wrote:

> The javadoc on the distinct function
> <http://twitter.github.io/scalding/index.html#com.twitter.scalding.typed.TypedPipe@distinct(implicitord:Ordering[_%3E:T]):com.twitter.scalding.typed.TypedPipe[T]>
> seems to comment on your use-case:
> "Returns the set of distinct elements in the TypedPipe This is the same
> as: .map((_, ())).group.sum.keys If you want a distinct while joining,
> consider: instead of: a.join(b.distinct.asKeys) manually do the distinct:
> a.join(b.asKeys.sum) The latter creates 1 map/reduce phase rather than 2"
>
> Could you try that? (val joinedSets = data1.join(data2.asKeys.sum)).
> I'm not sure whether the HDFS permissions error you're running into occurs
> while temporary job output is being written out between the 2 MR jobs, or
> somewhere else.
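>
> For reference, a minimal sketch of the two variants (the pipe names and
> element types below are assumptions for illustration, not from your job):
>
>   import com.twitter.scalding.typed.TypedPipe
>
>   val data1: TypedPipe[(String, Int)] = ???  // hypothetical keyed pipe
>   val data2: TypedPipe[String] = ???         // hypothetical pipe of keys
>
>   // Two map/reduce phases: distinct runs as its own phase before the join.
>   val viaDistinct = data1.group.join(data2.distinct.asKeys).toTypedPipe
>
>   // One map/reduce phase: summing the Unit values per key dedupes the
>   // keys within the same reduce step as the join.
>   val viaSum = data1.group.join(data2.asKeys.sum).toTypedPipe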
>
> On Wed, Nov 2, 2016 at 11:54 PM, Nikhil J Joshi <[email protected]>
> wrote:
>
>> Hi,
>> I am trying to perform a join with distinct keys in Scalding:
>>
>> val joinedSets = data1.join(data2.distinct.asKeys)
>>
>> The above operation raises an HDFS permissions error (stack trace below),
>> while the same join without the distinct call works fine. I am imagining
>> that the distinct is forcing some disk I/O at the wrong path. Can anyone
>> suggest a remedy?
>>
>> Thanks,
>> Nikhil
>>
>> stacktrace:
>>
>> Job setup failed : org.apache.hadoop.security.AccessControlException: Permission denied: user=username, access=WRITE, inode="/":hdfs:hdfs:drwxr-xr-x
>> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
>> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
>> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:238)
>> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:179)
>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6630)
>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6612)
>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6564)
>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4368)
>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4338)
>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4311)
>> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:853)
>> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600)
>> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2063)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2059)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2057)
>> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>> at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
>> at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
>> at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
>> at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2744)
>> at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2713)
>> at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:870)
>> at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:866)
>> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>> at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:866)
>> at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:859)
>> at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1817)
>> at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.setupJob(FileOutputCommitter.java:305)
>> at org.apache.hadoop.mapred.FileOutputCommitter.setupJob(FileOutputCommitter.java:131)
>> at org.apache.hadoop.mapred.OutputCommitter.setupJob(OutputCommitter.java:233)
>> at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:254)
>> at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:234)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=username, access=WRITE, inode="/":hdfs:hdfs:drwxr-xr-x
>> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
>> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
>> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:238)
>> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:179)
>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6630)
>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6612)
>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6564)
>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4368)
>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4338)
>> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4311)
>> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:853)
>> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:600)
>> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
>> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2063)
>> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2059)
>> at java.security.AccessController.doPrivileged(Native Method)
>> at javax.security.auth.Subject.doAs(Subject.java:422)
>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
>> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2057)
>> at org.apache.hadoop.ipc.Client.call(Client.java:1469)
>> at org.apache.hadoop.ipc.Client.call(Client.java:1400)
>> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>> at com.sun.proxy.$Proxy9.mkdirs(Unknown Source)
>> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:539)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:483)
>> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>> at com.sun.proxy.$Proxy10.mkdirs(Unknown Source)
>> at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2742)
>> ... 15 more
>>
>
> --
> - Piyush
>

-- 
Konstantin <[email protected]>
