Re: Spark session dies in about 2 days: HDFS_DELEGATION_TOKEN token can't be found

2016-03-14 Thread Nikhil Gs
Mine is the same scenario: I hit the HDFS_DELEGATION_TOKEN issue exactly
7 days after the Spark job starts, and the job then gets killed.

I'm also still looking for a solution.
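
One pattern that is sometimes suggested for long-running drivers is to
re-login from the keytab on a timer. A minimal sketch follows (the principal
and keytab path are placeholders, and note this refreshes only the Kerberos
TGT, not delegation tokens already shipped to executors, so it may not be
enough on its own):

  import java.io.IOException;
  import java.util.concurrent.Executors;
  import java.util.concurrent.ScheduledExecutorService;
  import java.util.concurrent.TimeUnit;
  import org.apache.hadoop.security.UserGroupInformation;

  public class KeytabRelogin {
    public static void main(String[] args) throws IOException {
      // Placeholders: use your own principal and keytab path.
      UserGroupInformation.loginUserFromKeytab(
          "etl_user@EXAMPLE.COM", "/etc/security/keytabs/etl_user.keytab");

      ScheduledExecutorService scheduler =
          Executors.newSingleThreadScheduledExecutor();
      scheduler.scheduleAtFixedRate(new Runnable() {
        public void run() {
          try {
            // Re-acquires the TGT from the keytab when it nears expiry.
            UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
          } catch (IOException e) {
            e.printStackTrace();
          }
        }
      }, 1, 1, TimeUnit.HOURS);
    }
  }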

Regards,
Nik.

On Fri, Mar 11, 2016 at 8:10 PM, Ruslan Dautkhanov wrote:

>
> Spark session dies out after ~40 hours when running against a secure Hadoop
> cluster.
>
> spark-submit has --principal and --keytab, so Kerberos ticket renewal works
> fine according to the logs.
>
> Something happens with the HDFS connection?
>
> These messages come up every second:
>   See complete stack: http://pastebin.com/QxcQvpqm
>
> 16/03/11 16:04:59 WARN hdfs.LeaseRenewer: Failed to renew lease for
>> [DFSClient_NONMAPREDUCE_1534318438_13] for 2802 seconds.  Will retry
>> shortly ...
>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>> token (HDFS_DELEGATION_TOKEN token 1349 for rdautkha) can't be found in
>> cache
>
>
> Then after an hour it stops trying:
>
> 16/03/11 16:18:17 WARN hdfs.DFSClient: Failed to renew lease for
>> DFSClient_NONMAPREDUCE_1534318438_13 for 3600 seconds (>= hard-limit =3600
>> seconds.) Closing all files being written ...
>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>> token (HDFS_DELEGATION_TOKEN token 1349 for rdautkha) can't be found in
>> cache
>
>
> It doesn't look like a Kerberos principal ticket renewal problem, because
> that would expire much sooner (by default we have 12 hours), and from the
> logs the Spark Kerberos ticket renewer works fine.
>
> Is it some other HDFS delegation token renewal process that breaks?
>
> RHEL 6.7
>> Spark 1.5
>> Hadoop 2.6
>
>
> Found HDFS-5322 and YARN-2648, which seem relevant, but I am not sure it's
> the same problem.
> It seems to be a Spark problem, as I have only seen it with Spark.
> It is reproducible: just wait ~40 hours and a Spark session
> is no good.
>
>
> Thanks,
> Ruslan
>
>
>


Re: Spark Token Expired Exception

2016-01-06 Thread Nikhil Gs
These are my versions:

CDH version = 5.4.1
Spark version = 1.3.0
Kafka version = KAFKA-0.8.2.0-1.kafka1.3.1.p0.9
HBase version = 1.0.0

Regards,
Nik.

On Wed, Jan 6, 2016 at 3:50 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> Which Spark / Hadoop release are you using?
>
> Thanks
>
> On Wed, Jan 6, 2016 at 12:16 PM, Nikhil Gs <gsnikhil1432...@gmail.com>
> wrote:
>
>> Hello Team,
>>
>>
>> Thank you for your time in advance.
>>
>>
>> Below are the log lines from my Spark job, which consumes messages from a
>> Kafka instance and loads them into HBase. I noticed the WARN lines below,
>> which later resulted in errors: exactly 7 days in, the token expires, and
>> the job tries to renew it but fails even after retrying. Ours is a
>> Kerberos cluster. Can you please look into it and tell me what the issue is?
>>
>>
>> Your time and suggestions are very valuable.
>>
>>
>>
>> [log snipped; it duplicates the log in the original message, archived below]

Spark Token Expired Exception

2016-01-06 Thread Nikhil Gs
Hello Team,


Thank you for your time in advance.


Below are the log lines from my Spark job, which consumes messages from a
Kafka instance and loads them into HBase. I noticed the WARN lines below,
which later resulted in errors: exactly 7 days in, the token expires, and
the job tries to renew it but fails even after retrying. Ours is a
Kerberos cluster. Can you please look into it and tell me what the issue is?
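
For reference, the 7-day figure matches the HDFS defaults for delegation
tokens; the relevant NameNode settings (shown with their default values,
assuming they have not been changed on this cluster) are:

  dfs.namenode.delegation.token.renew-interval = 86400000    (24 hours, in ms)
  dfs.namenode.delegation.token.max-lifetime   = 604800000   (7 days, in ms)

A delegation token can be renewed repeatedly up to the max lifetime, but
after 7 days it cannot be renewed at all; a new token has to be obtained,
which is the step that appears to be failing here.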


Your time and suggestions are very valuable.



15/12/29 11:33:50 INFO scheduler.JobScheduler: Finished job streaming job
145141043 ms.0 from job set of time 145141043 ms

15/12/29 11:33:50 INFO scheduler.JobScheduler: Starting job streaming job
145141043 ms.1 from job set of time 145141043 ms

15/12/29 11:33:50 INFO scheduler.JobScheduler: Finished job streaming job
145141043 ms.1 from job set of time 145141043 ms

15/12/29 11:33:50 INFO rdd.BlockRDD: Removing RDD 120956 from persistence
list

15/12/29 11:33:50 INFO scheduler.JobScheduler: Total delay: 0.003 s for
time 145141043 ms (execution: 0.000 s)

15/12/29 11:33:50 INFO storage.BlockManager: Removing RDD 120956

15/12/29 11:33:50 INFO kafka.KafkaInputDStream: Removing blocks of RDD
BlockRDD[120956] at createStream at SparkStreamingEngine.java:40 of time
145141043 ms

15/12/29 11:33:50 INFO rdd.MapPartitionsRDD: Removing RDD 120957 from
persistence list

15/12/29 11:33:50 INFO storage.BlockManager: Removing RDD 120957

15/12/29 11:33:50 INFO scheduler.ReceivedBlockTracker: Deleting batches
ArrayBuffer(145141041 ms)

15/12/29 11:34:00 INFO scheduler.JobScheduler: Added jobs for time
145141044 ms

15/12/29 11:34:00 INFO scheduler.JobScheduler: Starting job streaming job
145141044 ms.0 from job set of time 145141044 ms

15/12/29 11:34:00 INFO scheduler.JobScheduler: Finished job streaming job
145141044 ms.0 from job set of time 145141044 ms

15/12/29 11:34:00 INFO scheduler.JobScheduler: Starting job streaming job
145141044 ms.1 from job set of time 145141044 ms

15/12/29 11:34:00 INFO scheduler.JobScheduler: Finished job streaming job
145141044 ms.1 from job set of time 145141044 ms

15/12/29 11:34:00 INFO rdd.BlockRDD: Removing RDD 120958 from persistence
list

15/12/29 11:34:00 INFO scheduler.JobScheduler: Total delay: 0.003 s for
time 145141044 ms (execution: 0.001 s)

15/12/29 11:34:00 INFO storage.BlockManager: Removing RDD 120958

15/12/29 11:34:00 INFO kafka.KafkaInputDStream: Removing blocks of RDD
BlockRDD[120958] at createStream at SparkStreamingEngine.java:40 of time
145141044 ms

15/12/29 11:34:00 INFO rdd.MapPartitionsRDD: Removing RDD 120959 from
persistence list

15/12/29 11:34:00 INFO storage.BlockManager: Removing RDD 120959

15/12/29 11:34:00 INFO scheduler.ReceivedBlockTracker: Deleting batches
ArrayBuffer(145141042 ms)

15/12/29 11:34:10 INFO scheduler.JobScheduler: Added jobs for time
145141045 ms

15/12/29 11:34:10 INFO scheduler.JobScheduler: Starting job streaming job
145141045 ms.0 from job set of time 145141045 ms

15/12/29 11:34:10 INFO scheduler.JobScheduler: Finished job streaming job
145141045 ms.0 from job set of time 145141045 ms

15/12/29 11:34:10 INFO scheduler.JobScheduler: Starting job streaming job
145141045 ms.1 from job set of time 145141045 ms

15/12/29 11:34:10 INFO scheduler.JobScheduler: Finished job streaming job
145141045 ms.1 from job set of time 145141045 ms

15/12/29 11:34:10 INFO rdd.BlockRDD: Removing RDD 120960 from persistence
list

15/12/29 11:34:10 INFO scheduler.JobScheduler: Total delay: 0.004 s for
time 145141045 ms (execution: 0.001 s)

15/12/29 11:34:10 INFO storage.BlockManager: Removing RDD 120960

15/12/29 11:34:10 INFO kafka.KafkaInputDStream: Removing blocks of RDD
BlockRDD[120960] at createStream at SparkStreamingEngine.java:40 of time
145141045 ms

15/12/29 11:34:10 INFO rdd.MapPartitionsRDD: Removing RDD 120961 from
persistence list

15/12/29 11:34:10 INFO storage.BlockManager: Removing RDD 120961

15/12/29 11:34:10 INFO scheduler.ReceivedBlockTracker: Deleting batches
ArrayBuffer(145141043 ms)

15/12/29 11:34:13 WARN security.UserGroupInformation:
PriviledgedActionException as:s (auth:SIMPLE)
cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
token (HDFS_DELEGATION_TOKEN token 3104414 for s) is expired

15/12/29 11:34:13 WARN ipc.Client: Exception encountered while connecting
to the server:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
token (HDFS_DELEGATION_TOKEN token 3104414 for s) is expired

15/12/29 11:34:13 WARN security.UserGroupInformation:
PriviledgedActionException as:s (auth:SIMPLE)
cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
token

Re: Spark Job is getting killed after certain hours

2015-11-17 Thread Nikhil Gs
Hello Everyone,

Firstly, thank you so much for the response. In our cluster we are using
Spark 1.3.0, and our cluster version is CDH 5.4.1. Yes, we are also using
Kerberos in our cluster, and the Kerberos version is 1.10.3.

The error "*GSS initiate failed [Caused by GSSException: No valid
credentials provided" *was occurring when we are trying to load data from
kafka  topic to hbase by using Spark classes and spark submit job.

My question is: we also have another project, named XXX, in our cluster,
which is successfully deployed and running. Its scenario is Flume + Spark
submit + HBase table, and it works fine in our Kerberos cluster. Why does
Kafka topic + Spark submit + HBase table not work?

Are we doing anything wrong? We are not able to figure it out. Please advise.

Thanks in advance!

Regards,
Nik.

On Tue, Nov 17, 2015 at 4:03 AM, Steve Loughran <ste...@hortonworks.com>
wrote:

>
> On 17 Nov 2015, at 02:00, Nikhil Gs <gsnikhil1432...@gmail.com> wrote:
>
> Hello Team,
>
> Below is the error we are facing in our cluster 14 hours after starting the
> spark-submit job. We are not able to understand the issue or why it hits
> the error below after a certain time.
>
> If any of you have faced the same scenario or have any idea, please guide
> us. If you need any other info to identify the issue, please let me know.
> Thanks a lot in advance.
>
> Log error:
>
> 15/11/16 04:54:48 ERROR ipc.AbstractRpcClient: SASL authentication failed.
> The most likely cause is missing or invalid credentials. Consider 'kinit'.
>
> javax.security.sasl.SaslException: GSS initiate failed [Caused by
> GSSException: No valid credentials provided (Mechanism level: Failed to
> find any Kerberos tgt)]
>
>
> I keep my list of causes of error messages online:
> https://steveloughran.gitbooks.io/kerberos_and_hadoop/content/sections/errors.html
>
> Spark only supports long-lived work on a Kerberos cluster from 1.5+, with a
> keytab being supplied to the job. Without this, the YARN client grabs some
> tickets at launch time and hangs on until they expire, which for you is 14
> hours.
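>
> In practice that means submitting with the --principal and --keytab flags,
> roughly like this (a sketch only; the principal, keytab path, class, and
> jar name are examples, not real values):
>
>   spark-submit \
>     --master yarn-cluster \
>     --principal etl_user@EXAMPLE.COM \
>     --keytab /etc/security/keytabs/etl_user.keytab \
>     --class com.example.MyStreamingJob \
>     my-streaming-job.jar
>
> The idea is that with the keytab available, the application master can
> re-login and obtain fresh delegation tokens before the old ones expire.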
>
> (For anyone using ticket-at-launch auth, know that Spark 1.5.0-1.5.2
> doesn't talk to Hive on a kerberized cluster; some reflection-related issues
> weren't picked up during testing. 1.5.3 will fix this.)
>


Spark Job is getting killed after certain hours

2015-11-16 Thread Nikhil Gs
Hello Team,

Below is the error we are facing in our cluster 14 hours after starting the
spark-submit job. We are not able to understand the issue or why it hits the
error below after a certain time.

If any of you have faced the same scenario or have any idea, please guide
us. If you need any other info to identify the issue, please let me know.
Thanks a lot in advance.

Log error:

15/11/16 04:54:48 ERROR ipc.AbstractRpcClient: SASL authentication failed.
The most likely cause is missing or invalid credentials. Consider 'kinit'.

javax.security.sasl.SaslException: GSS initiate failed [Caused by
GSSException: No valid credentials provided (Mechanism level: Failed to
find any Kerberos tgt)]

at
com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)

at
org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)

at
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:605)

at
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$600(RpcClientImpl.java:154)

at
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:731)

at
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:728)

at java.security.AccessController.doPrivileged(Native
Method)

at javax.security.auth.Subject.doAs(Subject.java:415)

at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)

at
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:728)

at
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:881)

at
org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:850)

at
org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1174)

at
org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:216)

at
org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:300)

at
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:31865)

at
org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRowOrBefore(ProtobufUtil.java:1580)

at
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1294)

at
org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1126)

at
org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:369)

at
org.apache.hadoop.hbase.client.AsyncProcess.submit(AsyncProcess.java:320)

at
org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:206)

at
org.apache.hadoop.hbase.client.BufferedMutatorImpl.flush(BufferedMutatorImpl.java:183)

at
org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1482)

at
org.apache.hadoop.hbase.client.HTable.put(HTable.java:1095)

at
com.suxk.bigdata.pulse.consumer.ModempollHbaseLoadHelper$1.run(ModempollHbaseLoadHelper.java:89)

at java.security.AccessController.doPrivileged(Native
Method)

at javax.security.auth.Subject.doAs(Subject.java:356)

at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1651)

at
com.suxk.bigdata.pulse.consumer.ModempollHbaseLoadHelper.loadToHbase(ModempollHbaseLoadHelper.java:48)

at
com.suxk.bigdata.pulse.consumer.ModempollSparkStreamingEngine$1.call(ModempollSparkStreamingEngine.java:52)

at
com.suxk.bigdata.pulse.consumer.ModempollSparkStreamingEngine$1.call(ModempollSparkStreamingEngine.java:48)

at
org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:999)

at
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

at
scala.collection.Iterator$$anon$10.next(Iterator.scala:312)

at
scala.collection.Iterator$class.foreach(Iterator.scala:727)

at
scala.collection.AbstractIterator.foreach(Iterator.scala:1157)

at
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)

at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)

at
scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)

at scala.collection.TraversableOnce$class.to

Kafka and Spark combination

2015-10-09 Thread Nikhil Gs
Has anyone worked with Kafka in a scenario where streaming data from the
Kafka consumer is picked up by Spark (Java) and placed directly into HBase?
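
For context, what we are trying to build looks roughly like the sketch below
(assumptions: the direct Kafka API available from Spark 1.3, the HBase 1.0
client API, and made-up broker, topic, table, and column names; on a Kerberos
cluster the executors also need valid HBase credentials, which is exactly
what we are unsure about):

  import java.util.Arrays;
  import java.util.HashMap;
  import java.util.HashSet;
  import java.util.Iterator;
  import java.util.Map;
  import java.util.Set;

  import kafka.serializer.StringDecoder;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.spark.SparkConf;
  import org.apache.spark.api.java.JavaPairRDD;
  import org.apache.spark.api.java.function.Function;
  import org.apache.spark.api.java.function.VoidFunction;
  import org.apache.spark.streaming.Durations;
  import org.apache.spark.streaming.api.java.JavaPairInputDStream;
  import org.apache.spark.streaming.api.java.JavaStreamingContext;
  import org.apache.spark.streaming.kafka.KafkaUtils;
  import scala.Tuple2;

  public class KafkaToHBaseSketch {
    public static void main(String[] args) throws Exception {
      SparkConf conf = new SparkConf().setAppName("kafka-to-hbase-sketch");
      JavaStreamingContext jssc =
          new JavaStreamingContext(conf, Durations.seconds(10));

      // Placeholder broker list and topic name.
      Map<String, String> kafkaParams = new HashMap<String, String>();
      kafkaParams.put("metadata.broker.list", "broker1:9092");
      Set<String> topics = new HashSet<String>(Arrays.asList("events"));

      JavaPairInputDStream<String, String> stream =
          KafkaUtils.createDirectStream(jssc, String.class, String.class,
              StringDecoder.class, StringDecoder.class, kafkaParams, topics);

      stream.foreachRDD(new Function<JavaPairRDD<String, String>, Void>() {
        public Void call(JavaPairRDD<String, String> rdd) throws Exception {
          rdd.foreachPartition(
              new VoidFunction<Iterator<Tuple2<String, String>>>() {
            public void call(Iterator<Tuple2<String, String>> records)
                throws Exception {
              // Open one HBase connection per partition, not per record.
              Configuration hconf = HBaseConfiguration.create();
              Connection hbase = ConnectionFactory.createConnection(hconf);
              Table table =
                  hbase.getTable(TableName.valueOf("events_table")); // placeholder
              try {
                while (records.hasNext()) {
                  Tuple2<String, String> kv = records.next();
                  Put put = new Put(Bytes.toBytes(kv._1()));   // Kafka key as row key
                  put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"),
                      Bytes.toBytes(kv._2()));                 // Kafka value as cell
                  table.put(put);
                }
              } finally {
                table.close();
                hbase.close();
              }
            }
          });
          return null;
        }
      });

      jssc.start();
      jssc.awaitTermination();
    }
  }

Opening the connection per partition rather than per record keeps the HBase
connection overhead manageable.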

Regards,
Gs.


Re: Kafka streaming "at least once" semantics

2015-10-09 Thread Nikhil Gs
Hello Everyone,

Has anyone worked with Kafka in a scenario where streaming data from the
Kafka consumer is picked up by Spark (Java) and placed directly into HBase?

Please let me know; we are completely new to this scenario. Any pointers
would be very helpful.

Regards,
Nik.

On Fri, Oct 9, 2015 at 7:30 AM, pushkar priyadarshi <
priyadarshi.push...@gmail.com> wrote:

>
> I think the Spark 1.5 Kafka direct approach does not store messages;
> rather, it fetches messages as and when they are consumed in the pipeline.
> That should prevent you from having data loss.
>
>
>
> On Fri, Oct 9, 2015 at 7:34 AM, bitborn  wrote:
>
>> Hi all,
>>
>> My company is using Spark Streaming and the Kafka APIs to process an
>> event stream. We've got most of our application written, but are stuck
>> on "at least once" processing.
>>
>> I created a demo to show roughly what we're doing here:
>> https://github.com/bitborn/resilient-kafka-streaming-in-spark
>> 
>>
>> The problem we're having is that when the application experiences an
>> exception (network issue, out of memory, etc.) it drops the batch it's
>> processing.
>> The ideal behavior is that it processes each event "at least once", even
>> if that means processing it more than once. Whether this happens via
>> checkpointing, the WAL, or Kafka offsets is irrelevant, as long as we
>> don't drop data. :)
>>
>> A couple of things we've tried (a sketch of the checkpoint/WAL wiring
>> follows this list):
>> - Using the Kafka direct stream API (via Cody Koeninger's example:
>> https://github.com/koeninger/kafka-exactly-once/blob/master/src/main/scala/example/IdempotentExample.scala)
>> - Using checkpointing with both the low-level and high-level API's
>> - Enabling the write ahead log
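>>
>> The checkpoint-recovery pattern we're attempting looks roughly like this
>> (a sketch under assumptions: the checkpoint path, app name, and batch
>> interval are placeholders, and the Kafka wiring is elided):
>>
>>   import org.apache.spark.SparkConf;
>>   import org.apache.spark.streaming.Durations;
>>   import org.apache.spark.streaming.api.java.JavaStreamingContext;
>>   import org.apache.spark.streaming.api.java.JavaStreamingContextFactory;
>>
>>   public class ResilientContextSketch {
>>     public static void main(String[] args) {
>>       // Placeholder checkpoint directory on HDFS.
>>       final String checkpointDir = "hdfs:///user/spark/checkpoints/demo";
>>
>>       JavaStreamingContextFactory factory =
>>           new JavaStreamingContextFactory() {
>>         public JavaStreamingContext create() {
>>           SparkConf conf = new SparkConf()
>>               .setAppName("resilient-kafka")
>>               // WAL, so receiver-based input can be replayed after a crash.
>>               .set("spark.streaming.receiver.writeAheadLog.enable", "true");
>>           JavaStreamingContext jssc =
>>               new JavaStreamingContext(conf, Durations.seconds(5));
>>           jssc.checkpoint(checkpointDir);
>>           // ... create the Kafka stream and output operations here ...
>>           return jssc;
>>         }
>>       };
>>
>>       // Recovers the context from the checkpoint after a restart,
>>       // or builds a fresh one on first run.
>>       JavaStreamingContext jssc =
>>           JavaStreamingContext.getOrCreate(checkpointDir, factory);
>>       jssc.start();
>>       jssc.awaitTermination();
>>     }
>>   }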
>>
>> I've included a log here: spark.log
>> <https://github.com/bitborn/resilient-kafka-streaming-in-spark/blob/master/spark.log>,
>> but I'm afraid it doesn't reveal much.
>>
>> The fact that others seem to be able to get this working properly suggests
>> we're missing some magic configuration or are executing it in a way that
>> won't support the desired behavior.
>>
>> I'd really appreciate some pointers!
>>
>> Thanks much,
>> Andrew Clarkson
>>
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Kafka-streaming-at-least-once-semantics-tp24995.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>>
>>
>
>