It is HDFS. The setup for both pipelines is the same too.
On Wed, Sep 30, 2015 at 10:17 PM, Micah Whitacre <[email protected]> wrote:

> What is the datastore you are reading from? HBase? HDFS? Also, are there
> any setup differences between the two pipelines?
>
> On Wed, Sep 30, 2015 at 3:13 PM, Tahir Hameed <[email protected]> wrote:
>
>> Hi,
>>
>> I am facing a queer problem. I have 2 MR pipelines. One of them is
>> working fine. The other is not. The difference lies in only one of the
>> DoFn functions.
>>
>> The DoFn function which fails is given below:
>>
>> public PTable<ImmutableBytesWritable, CE> myFunction(
>>     PTable<ImmutableBytesWritable, Pair<A, B>> joinedData,
>>     PTable<String, C> others) {
>>
>>   ReadableData<Pair<String, C>> readable = others.asReadable(false);
>>   ParallelDoOptions options = ParallelDoOptions.builder()
>>       .sourceTargets(readable.getSourceTargets())
>>       .build();
>>
>>   return joinedData
>>       .by(someMapFunction, Avros.writables(ImmutableBytesWritable.class))
>>       .groupByKey()
>>       .parallelDo("", new CEDoFN(readable, others.getPTableType()),
>>           Avros.tableOf(Avros.writables(ImmutableBytesWritable.class),
>>               Avros.reflects(CE.class)),
>>           options);
>> }
>>
>> The stack trace is as follows:
>>
>> javax.security.sasl.SaslException: GSS initiate failed [Caused by
>> GSSException: No valid credentials provided (Mechanism level: Failed to
>> find any Kerberos tgt)]
>>   at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
>>   at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:177)
>>   at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupSaslConnection(RpcClient.java:815)
>>   at org.apache.hadoop.hbase.ipc.RpcClient$Connection.access$800(RpcClient.java:349)
>>   at org.apache.hadoop.hbase.ipc.RpcClient$Connection$2.run(RpcClient.java:943)
>>   at org.apache.hadoop.hbase.ipc.RpcClient$Connection$2.run(RpcClient.java:940)
>>   at java.security.AccessController.doPrivileged(Native Method)
>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>   at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:940)
>>   at org.apache.hadoop.hbase.ipc.RpcClient$Connection.writeRequest(RpcClient.java:1094)
>>   at org.apache.hadoop.hbase.ipc.RpcClient$Connection.tracedWriteRequest(RpcClient.java:1061)
>>   at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1516)
>>   at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1724)
>>   at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1777)
>>   at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:30373)
>>   at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRowOrBefore(ProtobufUtil.java:1604)
>>   at org.apache.hadoop.hbase.client.HTable$2.call(HTable.java:768)
>>   at org.apache.hadoop.hbase.client.HTable$2.call(HTable.java:766)
>>   at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
>>   at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:772)
>>   at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:160)
>>   at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.prefetchRegionCache(ConnectionManager.java:1254)
>>   at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1318)
>>   at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1167)
>>   at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:294)
>>   at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:130)
>>   at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:55)
>>   at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:201)
>>   at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:288)
>>   at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:268)
>>   at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:140)
>>   at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:135)
>>   at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:802)
>>   at org.apache.crunch.io.hbase.HTableIterator.<init>(HTableIterator.java:47)
>>   at org.apache.crunch.io.hbase.HTableIterable.iterator(HTableIterable.java:43)
>>   at org.apache.crunch.util.DelegatingReadableData$1.iterator(DelegatingReadableData.java:63)
>>   at com.bol.step.enrichmentdashboard.fn.CEDoFN.initialize(CEDoFN.java:45)
>>   at org.apache.crunch.impl.mr.run.RTNode.initialize(RTNode.java:71)
>>   at org.apache.crunch.impl.mr.run.RTNode.initialize(RTNode.java:73)
>>   at org.apache.crunch.impl.mr.run.CrunchReducer.setup(CrunchReducer.java:44)
>>   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:168)
>>   at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
>>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>>   at java.security.AccessController.doPrivileged(Native Method)
>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>> Caused by: GSSException: No valid credentials provided (Mechanism level:
>> Failed to find any Kerberos tgt)
>>
>> In CEDoFN, the readable is used in the initialize phase to build a
>> HashMap, and that is exactly the spot the stack trace points to
>> (CEDoFN.initialize, CEDoFN.java:45).
>>
>> In the function which succeeds, the parallelDo is performed directly on
>> joinedData, which is also a PTable, and there are no errors. The
>> initialize phases of both functions are exactly the same.
>>
>> I fail to understand the cause of the error, because the underlying
>> implementations of PTable and PGroupedTable appear to be the same: both
>> seem to extend PCollectionImpl.
>>
>> Tahir
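
[Editor's note: for readers following along, the pattern the trace points at
(CEDoFN.initialize, reached through DelegatingReadableData$1.iterator and
HTableIterable.iterator) looks roughly like the sketch below. The real CEDoFN
is not shown in the thread, so this class is hypothetical; it assumes Crunch's
ReadableData.read(TaskInputOutputContext) API and DoFn.getContext(), and uses
S, T, and C as stand-ins for the types named in the mail.]

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.crunch.CrunchRuntimeException;
import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.Pair;
import org.apache.crunch.ReadableData;
import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch of the side-input pattern described above.
public class SideInputDoFn<S, T, C> extends DoFn<S, T> {

  private final ReadableData<Pair<String, C>> readable;
  private transient Map<String, C> lookup;

  public SideInputDoFn(ReadableData<Pair<String, C>> readable) {
    this.readable = readable;
  }

  @Override
  public void configure(Configuration conf) {
    // Give the ReadableData a chance to register itself on the job config.
    readable.configure(conf);
  }

  @Override
  public void initialize() {
    // Iterating the readable is what actually reads the side input inside
    // the task. When the backing PTable comes from HBase, this iteration
    // opens a scanner here (HTableIterable/HTableIterator in the trace),
    // and that scanner's RPC is where the SASL/Kerberos handshake runs.
    lookup = new HashMap<>();
    try {
      for (Pair<String, C> pair : readable.read(getContext())) {
        lookup.put(pair.first(), pair.second());
      }
    } catch (IOException e) {
      throw new CrunchRuntimeException(e);
    }
  }

  @Override
  public void process(S input, Emitter<T> emitter) {
    // ... look values up via lookup.get(...) and emit results ...
  }
}

[Note that in this shape the side-input read happens at task setup on the
cluster, not on the client that submitted the job, which matches the trace
above: the connection is made from CrunchReducer.setup.]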
