Re: Spark Job is getting killed after certain hours

2015-11-17 Thread Steve Loughran

On 17 Nov 2015, at 15:39, Nikhil Gs <gsnikhil1432...@gmail.com> wrote:

Hello Everyone,

Firstly, thank you so much for the response. In our cluster, we are using Spark 
1.3.0 and our cluster version is CDH 5.4.1. Yes, we are also using Kerberos in 
our cluster, and the Kerberos version is 1.10.3.

The error "GSS initiate failed [Caused by GSSException: No valid credentials 
provided]" occurs when we try to load data from a Kafka topic into HBase using 
Spark classes and a spark-submit job.

My question is: we also have another project, named XXX, in our cluster which 
is successfully deployed and running, and its scenario is Flume + spark-submit 
+ HBase table. That scenario works fine in our Kerberos cluster, so why not 
Kafka topic + spark-submit + HBase table?

Are we doing anything wrong? We are not able to figure it out. Please advise us.


You are probably into Kerberos debug mode. That's not something anyone enjoys. 
(*)

There are some options you can turn up for logging in the JVM codebase

https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/sections/secrets.html

then turn the org.apache.hadoop.security logger up to DEBUG in the HBase 
server as well as in your client code.
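
As a concrete sketch of those debug options (the flag is the JVM's standard Kerberos debug switch, passed via Spark's extraJavaOptions settings; the log4j file name is illustrative):

```shell
# Turn on JVM-level Kerberos/GSS debugging in driver and executors,
# and raise the Hadoop security layer to DEBUG via a log4j override.
spark-submit \
  --conf "spark.driver.extraJavaOptions=-Dsun.security.krb5.debug=true" \
  --conf "spark.executor.extraJavaOptions=-Dsun.security.krb5.debug=true" \
  --files my-log4j.properties \
  ...   # the rest of the usual job arguments

# my-log4j.properties would contain, among the usual settings:
#   log4j.logger.org.apache.hadoop.security=DEBUG
```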


(*) This is why I recommend deploying Kerberos apps at 09:00 on a Tuesday. All 
the big timeout events will then happen on a Tuesday morning, afternoon or 
evening: the 24h timeout on Wednesday morning, the 72h one on Friday, and the 
7-day one the following week. You don't want to be fielding support calls on a 
Saturday evening because the application, or indeed the entire HDFS filesystem, 
deployed on a Friday is failing one node at a time.


Re: Spark Job is getting killed after certain hours

2015-11-17 Thread Nikhil Gs
Hello Everyone,

Firstly, thank you so much for the response. In our cluster, we are using
Spark 1.3.0 and our cluster version is CDH 5.4.1. Yes, we are also using
Kerberos in our cluster, and the Kerberos version is 1.10.3.

The error "GSS initiate failed [Caused by GSSException: No valid
credentials provided]" occurs when we try to load data from a Kafka topic
into HBase using Spark classes and a spark-submit job.

My question is: we also have another project, named XXX, in our cluster
which is successfully deployed and running, and its scenario is Flume +
spark-submit + HBase table. That scenario works fine in our Kerberos
cluster, so why not Kafka topic + spark-submit + HBase table?

Are we doing anything wrong? We are not able to figure it out. Please advise us.

Thanks in advance!

Regards,
Nik.

On Tue, Nov 17, 2015 at 4:03 AM, Steve Loughran wrote:

>
> On 17 Nov 2015, at 02:00, Nikhil Gs  wrote:
>
> Hello Team,
>
> Below is the error we are facing in our cluster after 14 hours of
> starting the spark-submit job. We are not able to understand the issue or
> why it hits this error after a certain time.
>
> If any of you have faced the same scenario, or if you have any idea, then
> please guide us. If you need any other information to identify the issue,
> please let me know. Thanks a lot in advance.
>
> Log Error:
>
> 15/11/16 04:54:48 ERROR ipc.AbstractRpcClient: SASL authentication failed.
> The most likely cause is missing or invalid credentials. Consider 'kinit'.
>
> javax.security.sasl.SaslException: GSS initiate failed [Caused by
> GSSException: No valid credentials provided (Mechanism level: Failed to
> find any Kerberos tgt)]
>
>
> I keep my list of causes of error messages online:
> https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/sections/errors.html
>
> Spark only supports long-lived work on a Kerberos cluster from 1.5 onwards,
> with a keytab being supplied to the job. Without this, the YARN client grabs
> some tickets at launch time and holds on to them until they expire, which
> for you is 14 hours.
>
> (For anyone using ticket-at-launch auth, know that Spark 1.5.0-1.5.2
> doesn't talk to Hive on a kerberized cluster; some reflection-related issues
> weren't picked up during testing. 1.5.3 will fix this.)
>


Re: Spark Job is getting killed after certain hours

2015-11-17 Thread Steve Loughran

On 17 Nov 2015, at 02:00, Nikhil Gs <gsnikhil1432...@gmail.com> wrote:

Hello Team,

Below is the error we are facing in our cluster after 14 hours of starting the 
spark-submit job. We are not able to understand the issue or why it hits this 
error after a certain time.

If any of you have faced the same scenario, or if you have any idea, then 
please guide us. If you need any other information to identify the issue, 
please let me know. Thanks a lot in advance.

Log Error:

15/11/16 04:54:48 ERROR ipc.AbstractRpcClient: SASL authentication failed. The 
most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]


I keep my list of causes of error messages online: 
https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/sections/errors.html

Spark only supports long-lived work on a Kerberos cluster from 1.5 onwards, 
with a keytab being supplied to the job. Without this, the YARN client grabs 
some tickets at launch time and holds on to them until they expire, which for 
you is 14 hours.

(For anyone using ticket-at-launch auth, know that Spark 1.5.0-1.5.2 doesn't 
talk to Hive on a kerberized cluster; some reflection-related issues weren't 
picked up during testing. 1.5.3 will fix this.)
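
A hedged sketch of the keytab-based launch described above, for Spark 1.5+ on YARN (the principal, keytab path, class and jar names are placeholders):

```shell
# With --principal/--keytab, YARN can keep obtaining tickets for a
# long-lived job instead of relying on whatever tickets the launching
# user happened to hold at submit time.
spark-submit \
  --master yarn-cluster \
  --principal svc_streaming@EXAMPLE.COM \
  --keytab /etc/security/keytabs/svc_streaming.keytab \
  --class com.example.StreamingJob \
  streaming-job.jar
```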


Re: Spark Job is getting killed after certain hours

2015-11-16 Thread Ilya Ganelin
Your Kerberos ticket is likely expiring. Check your ticket lifetime settings.

-Ilya Ganelin
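
One quick way to check, as an illustrative sketch (MIT Kerberos; the output format varies by distribution):

```shell
# Show the current ticket cache with issue/expiry times; compare the
# krbtgt expiry against the ~14 hours after which the job dies.
klist

# Also show ticket flags, e.g. whether the TGT is renewable
klist -f
```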

On Mon, Nov 16, 2015 at 9:20 PM, Vipul Rai  wrote:

> Hi Nikhil,
> It seems you have a Kerberos-enabled cluster and it is unable to
> authenticate using the ticket.
> Please check the Kerberos settings; it could also be because of a Kerberos
> version mismatch across nodes.
>
> Thanks,
> Vipul
>
> On Tue 17 Nov, 2015 07:31 Nikhil Gs  wrote:
>
>> Hello Team,
>>
>> Below is the error which we are facing in our cluster after 14 hours of
>> starting the spark submit job. Not able to understand the issue and why its
>> facing the below error after certain time.
>>
>> If any of you have faced the same scenario or if you have any idea then
>> please guide us. To identify the issue, if you need any other info then
>> please revert me back with the requirement.Thanks a lot in advance.
>>
>> *Log Error:  *
>>
>> 15/11/16 04:54:48 ERROR ipc.AbstractRpcClient: SASL authentication
>> failed. The most likely cause is missing or invalid credentials. Consider
>> 'kinit'.
>>
>> javax.security.sasl.SaslException: *GSS initiate failed [Caused by
>> GSSException: No valid credentials provided (Mechanism level: Failed to
>> find any Kerberos tgt)]*
>>
>>   at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
>>   at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
>>   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:605)
>>   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:154)
>>   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:731)
>>   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:728)
>>   at java.security.AccessController.doPrivileged(Native Method)
>>   at javax.security.auth.Subject.doAs(Subject.java:415)
>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>>   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:728)
>>   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:881)
>>   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:850)
>>   at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1174)
>>   at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:216)
>>   at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:300)
>>   at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:31865)
>>   at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRowOrBefore(ProtobufUtil.java:1580)
>>   at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1294)
>>   at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1126)
>>   at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:369)
>>   at org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:320)
>>   at org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:206)
>>   at org.apache.hadoop.hbase.client.BufferedMutatorImpl.flush(BufferedMutatorImpl.java:183)
>>   at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1482)
>>   at org.apache.hadoop.hbase.client.HTable.put(HTable.java:1095)
>>   at com.suxk.bigdata.pulse.consumer.ModempollHbaseLoadHelper$1.run(ModempollHbaseLoadHelper.java:89)
>>   at java.security.AccessController.doPrivileged(Native Method)
>>   at javax.security.auth.Subject.doAs(Subject.java:356)
>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1651)
>>   at com.suxk.bigdata.pulse.consumer.ModempollHbaseLoadHelper.loadToHbase(ModempollHbaseLoadHelper.java:48)
>>   at com.suxk.bigdata.pulse.consumer.ModempollSparkStreamingEngine$1.call(ModempollSparkStreamingEngine.java:52)
>>   at com.suxk.bigdata.pulse.consumer.ModempollSparkStreamingEngine$1.call(ModempollSparkStreamingEngine.java:48)
>>   at org.apache.spark.api.java.Java

Re: Spark Job is getting killed after certain hours

2015-11-16 Thread Vipul Rai
Hi Nikhil,
It seems you have a Kerberos-enabled cluster and it is unable to authenticate
using the ticket.
Please check the Kerberos settings; it could also be because of a Kerberos
version mismatch across nodes.

Thanks,
Vipul

On Tue 17 Nov, 2015 07:31 Nikhil Gs  wrote:

> Hello Team,
>
> Below is the error we are facing in our cluster after 14 hours of
> starting the spark-submit job. We are not able to understand the issue or
> why it hits this error after a certain time.
>
> If any of you have faced the same scenario, or if you have any idea, then
> please guide us. If you need any other information to identify the issue,
> please let me know. Thanks a lot in advance.
>
> Log Error:
>
> 15/11/16 04:54:48 ERROR ipc.AbstractRpcClient: SASL authentication failed.
> The most likely cause is missing or invalid credentials. Consider 'kinit'.
>
> javax.security.sasl.SaslException: GSS initiate failed [Caused by
> GSSException: No valid credentials provided (Mechanism level: Failed to
> find any Kerberos tgt)]
>