What is the datastore you are reading from? HBase? HDFS? Also, are there any setup differences between the two pipelines?
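If it is HBase, one setup difference worth checking is how each job obtains its HBase credentials at submission time. On a Kerberized cluster the map/reduce tasks cannot use the submitter's TGT directly, so the job has to carry an HBase delegation token in its credentials; a missing token produces exactly this "Failed to find any Kerberos tgt" error. A minimal sketch of that submission-side step (illustrative only, using the stock TableMapReduceUtil helper from hbase-server; the class and job names here are made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.mapreduce.Job;

    public class SubmitWithHBaseToken {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "pipeline-with-hbase-side-input");
        // Asks HBase for a delegation token on behalf of the current
        // (kinit'ed) user and stores it in the job's credentials, so that
        // tasks authenticate with the token instead of a Kerberos TGT.
        TableMapReduceUtil.initCredentials(job);
        // ... configure sources/targets and submit the job as usual ...
      }
    }

Crunch's HBase source normally arranges this itself when an HBase source is part of the job plan, so if only one of the two pipelines fails, it may be that the planner wires the HBase source into the failing job differently.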
On Wed, Sep 30, 2015 at 3:13 PM, Tahir Hameed <[email protected]> wrote:

> Hi,
>
> I am facing a queer problem. I have 2 MR pipelines. One of them is
> working fine. The other is not. The difference lies in only one of the
> DoFn functions.
>
> The DoFn function which fails is given below:
>
>     public PTable<ImmutableBytesWritable, CE> myFunction(
>         PTable<ImmutableBytesWritable, Pair<A, B>> joinedData,
>         PTable<String, C> others) {
>
>       ReadableData<Pair<String, C>> readable = others.asReadable(false);
>       ParallelDoOptions options = ParallelDoOptions.builder()
>           .sourceTargets(readable.getSourceTargets())
>           .build();
>
>       return joinedData
>           .by(someMapFunction, Avros.writables(ImmutableBytesWritable.class))
>           .groupByKey()
>           .parallelDo("", new CEDoFN(readable, others.getPTableType()),
>               Avros.tableOf(Avros.writables(ImmutableBytesWritable.class),
>                   Avros.reflects(CE.class)),
>               options);
>     }
>
> The stack trace is as follows:
>
>     javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
>       at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
>       at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:177)
>       at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupSaslConnection(RpcClient.java:815)
>       at org.apache.hadoop.hbase.ipc.RpcClient$Connection.access$800(RpcClient.java:349)
>       at org.apache.hadoop.hbase.ipc.RpcClient$Connection$2.run(RpcClient.java:943)
>       at org.apache.hadoop.hbase.ipc.RpcClient$Connection$2.run(RpcClient.java:940)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>       at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:940)
>       at org.apache.hadoop.hbase.ipc.RpcClient$Connection.writeRequest(RpcClient.java:1094)
>       at org.apache.hadoop.hbase.ipc.RpcClient$Connection.tracedWriteRequest(RpcClient.java:1061)
>       at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1516)
>       at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1724)
>       at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1777)
>       at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:30373)
>       at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRowOrBefore(ProtobufUtil.java:1604)
>       at org.apache.hadoop.hbase.client.HTable$2.call(HTable.java:768)
>       at org.apache.hadoop.hbase.client.HTable$2.call(HTable.java:766)
>       at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
>       at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:772)
>       at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:160)
>       at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.prefetchRegionCache(ConnectionManager.java:1254)
>       at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1318)
>       at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1167)
>       at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:294)
>       at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:130)
>       at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:55)
>       at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:201)
>       at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:288)
>       at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:268)
>       at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:140)
>       at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:135)
>       at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:802)
>       at org.apache.crunch.io.hbase.HTableIterator.<init>(HTableIterator.java:47)
>       at org.apache.crunch.io.hbase.HTableIterable.iterator(HTableIterable.java:43)
>       at org.apache.crunch.util.DelegatingReadableData$1.iterator(DelegatingReadableData.java:63)
>       at com.bol.step.enrichmentdashboard.fn.CEDoFN.initialize(CEDoFN.java:45)
>       at org.apache.crunch.impl.mr.run.RTNode.initialize(RTNode.java:71)
>       at org.apache.crunch.impl.mr.run.RTNode.initialize(RTNode.java:73)
>       at org.apache.crunch.impl.mr.run.CrunchReducer.setup(CrunchReducer.java:44)
>       at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:168)
>       at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
>       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:415)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>     Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
>
> In the CEDoFN function, the readable is used in the initialize phase to
> build a HashMap. This is also the place the stack trace points to.
>
> In the function that succeeds, the parallelDo is performed directly on
> joinedData, which is also a PTable, and there are no errors. The
> initialize phases of both functions are exactly the same.
>
> I fail to understand the cause of the error, because the underlying
> implementations of both PTable and PGroupedTable are the same: both seem
> to extend PCollectionImpl.
>
> Tahir
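For reference, the initialize pattern described above presumably looks roughly like the sketch below (assumed; this is not the actual CEDoFN source, and the type parameters are placeholders). The point it illustrates: the ReadableData is only iterated inside DoFn.initialize(), i.e. in the reducer JVM, and for an HBase-backed PTable that iteration is what opens the HTable scanner shown in the stack trace.

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.crunch.CrunchRuntimeException;
    import org.apache.crunch.DoFn;
    import org.apache.crunch.Emitter;
    import org.apache.crunch.Pair;
    import org.apache.crunch.ReadableData;

    // Sketch of a DoFn that materializes a ReadableData side input into a
    // HashMap during initialize(). S/T are the input/output types, K/V the
    // side-input key/value types.
    public class SideInputDoFn<S, T, K, V> extends DoFn<S, T> {

      private final ReadableData<Pair<K, V>> readable;
      private transient Map<K, V> lookup;

      public SideInputDoFn(ReadableData<Pair<K, V>> readable) {
        this.readable = readable;
      }

      @Override
      public void initialize() {
        super.initialize();
        lookup = new HashMap<K, V>();
        try {
          // read(getContext()) opens the underlying source. For an
          // HBase-backed PTable this constructs the HTable scanner -- the
          // HTableIterator frame where the SASL/GSS failure surfaces.
          for (Pair<K, V> entry : readable.read(getContext())) {
            lookup.put(entry.first(), entry.second());
          }
        } catch (IOException e) {
          throw new CrunchRuntimeException(e);
        }
      }

      @Override
      public void process(S input, Emitter<T> emitter) {
        // ... look up side-input values for the input and emit results ...
      }
    }

Because the read happens in the task JVM, only the credentials shipped with the job (delegation tokens) are available there, which is consistent with the GSS failure appearing in initialize rather than at submission time.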
