It is HDFS. The setup for both pipelines is the same too.


On Wed, Sep 30, 2015 at 10:17 PM, Micah Whitacre <[email protected]>
wrote:

> What is the datastore you are reading from?  HBase? HDFS?  Also, are there
> any setup differences between the two pipelines?
>
> On Wed, Sep 30, 2015 at 3:13 PM, Tahir Hameed <[email protected]> wrote:
>
>> Hi,
>>
>> I am facing a strange problem. I have 2 MR pipelines. One of them is
>> working fine; the other is not.
>>
>> The difference lies in only one of the DoFn functions.
>>
>> The DoFn that fails is given below:
>>
>>     public PTable<ImmutableBytesWritable, CE> myFunction(
>>             PTable<ImmutableBytesWritable, Pair<A, B>> joinedData,
>>             PTable<String, C> others) {
>>
>>         ReadableData<Pair<String, C>> readable = others.asReadable(false);
>>         ParallelDoOptions options = ParallelDoOptions.builder()
>>                 .sourceTargets(readable.getSourceTargets())
>>                 .build();
>>
>>         return joinedData
>>                 .by(someMapFunction, Avros.writables(ImmutableBytesWritable.class))
>>                 .groupByKey()
>>                 .parallelDo("", new CEDoFN(readable, others.getPTableType()),
>>                         Avros.tableOf(Avros.writables(ImmutableBytesWritable.class),
>>                                 Avros.reflects(CE.class)),
>>                         options);
>>     }
>>
>> The stack trace is as follows:
>>
>> javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
>>      at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
>>      at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:177)
>>      at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupSaslConnection(RpcClient.java:815)
>>      at org.apache.hadoop.hbase.ipc.RpcClient$Connection.access$800(RpcClient.java:349)
>>      at org.apache.hadoop.hbase.ipc.RpcClient$Connection$2.run(RpcClient.java:943)
>>      at org.apache.hadoop.hbase.ipc.RpcClient$Connection$2.run(RpcClient.java:940)
>>      at java.security.AccessController.doPrivileged(Native Method)
>>      at javax.security.auth.Subject.doAs(Subject.java:415)
>>      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>      at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:940)
>>      at org.apache.hadoop.hbase.ipc.RpcClient$Connection.writeRequest(RpcClient.java:1094)
>>      at org.apache.hadoop.hbase.ipc.RpcClient$Connection.tracedWriteRequest(RpcClient.java:1061)
>>      at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1516)
>>      at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1724)
>>      at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1777)
>>      at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:30373)
>>      at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRowOrBefore(ProtobufUtil.java:1604)
>>      at org.apache.hadoop.hbase.client.HTable$2.call(HTable.java:768)
>>      at org.apache.hadoop.hbase.client.HTable$2.call(HTable.java:766)
>>      at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:126)
>>      at org.apache.hadoop.hbase.client.HTable.getRowOrBefore(HTable.java:772)
>>      at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:160)
>>      at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.prefetchRegionCache(ConnectionManager.java:1254)
>>      at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1318)
>>      at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1167)
>>      at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:294)
>>      at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:130)
>>      at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:55)
>>      at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:201)
>>      at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:288)
>>      at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:268)
>>      at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:140)
>>      at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:135)
>>      at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:802)
>>      at org.apache.crunch.io.hbase.HTableIterator.<init>(HTableIterator.java:47)
>>      at org.apache.crunch.io.hbase.HTableIterable.iterator(HTableIterable.java:43)
>>      at org.apache.crunch.util.DelegatingReadableData$1.iterator(DelegatingReadableData.java:63)
>>      at com.bol.step.enrichmentdashboard.fn.CEDoFN.initialize(CEDoFN.java:45)
>>      at org.apache.crunch.impl.mr.run.RTNode.initialize(RTNode.java:71)
>>      at org.apache.crunch.impl.mr.run.RTNode.initialize(RTNode.java:73)
>>      at org.apache.crunch.impl.mr.run.CrunchReducer.setup(CrunchReducer.java:44)
>>      at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:168)
>>      at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
>>      at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
>>      at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>>      at java.security.AccessController.doPrivileged(Native Method)
>>      at javax.security.auth.Subject.doAs(Subject.java:415)
>>      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>>      at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>> Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
>>
>>
>> In CEDoFN, the readable is used in the initialize phase to build a
>> HashMap, which is also where the stack trace points.
>>
>> In the function that succeeds, the parallelDo is performed directly on
>> joinedData (which is also a PTable) and there are no errors. The
>> initialize phases of both DoFns are exactly the same.
>>
>> I fail to understand the cause of the error, since the underlying
>> implementations of both PTable and PGroupedTable are the same: both
>> seem to extend PCollectionImpl.
>>
>> Tahir
>>
>
