I was trying it on one machine with just sbt run.

It works in my Mac environment with the same configuration.

I used the sample code from
https://github.com/apache/spark/blob/master/extras/kinesis-asl/src/main/scala/org/apache/spark/examples/streaming/KinesisWordCountASL.scala


val kinesisClient = new AmazonKinesisClient(new DefaultAWSCredentialsProviderChain())
kinesisClient.setEndpoint(endpointUrl)
val numShards = kinesisClient.describeStream(streamName)
  .getStreamDescription().getShards().size()

/* In this example, we're going to create 1 Kinesis Worker/Receiver/DStream for each shard. */
val numStreams = numShards

/* Set up the SparkConf and StreamingContext */
/* Spark Streaming batch interval */
val batchInterval = Milliseconds(2000)
val sparkConfig = new SparkConf().setAppName("KinesisWordCount")
val ssc = new StreamingContext(sparkConfig, batchInterval)

/* Kinesis checkpoint interval.  Same as batchInterval for this example. */
val kinesisCheckpointInterval = batchInterval

/* Create the same number of Kinesis DStreams/Receivers as the Kinesis stream's shards */
val kinesisStreams = (0 until numStreams).map { i =>
  KinesisUtils.createStream(ssc, streamName, endpointUrl, kinesisCheckpointInterval,
    InitialPositionInStream.LATEST, StorageLevel.MEMORY_AND_DISK_2)
}

/* Union all the streams */
val unionStreams = ssc.union(kinesisStreams)

/* Convert each line of Array[Byte] to String, split into words, and count them */
val words = unionStreams.flatMap(byteArray => new String(byteArray).split(" "))

/* Map each word to a (word, 1) tuple so we can reduce/aggregate by key. */
val wordCounts = words.map(word => (word, 1)).reduceByKey(_ + _)

/* Print the first 10 wordCounts */
wordCounts.print()

/* Start the streaming context and await termination */
ssc.start()
ssc.awaitTermination()
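One thing worth checking when this is run locally: each Kinesis receiver pins a core, so with one receiver per shard the application needs at least numShards + 1 cores to leave room for processing (this is the requirement Aniket raises below). A minimal sketch of sizing the local master, assuming local mode and a hypothetical shard count of 2 (the helper name and shard count are illustrative, not from the original code):

```scala
// Sketch: size the local[N] master so every Kinesis receiver gets a core,
// plus at least one core left over for the actual stream processing.
object MasterSizing {
  def masterFor(numShards: Int): String = s"local[${numShards + 1}]"

  def main(args: Array[String]): Unit = {
    val numShards = 2 // hypothetical shard count for illustration
    // e.g. new SparkConf().setAppName("KinesisWordCount").setMaster(masterFor(numShards))
    println(masterFor(numShards)) // prints local[3]
  }
}
```

With fewer cores than receivers + 1, the receivers consume every slot and no batch ever gets scheduled, which would look exactly like the infinite loop described below.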



A.K.M. Ashrafuzzaman
Lead Software Engineer
NewsCred <http://www.newscred.com>

(M) 880-175-5592433
Twitter <https://twitter.com/ashrafuzzaman> | Blog
<http://jitu-blog.blogspot.com/> | Facebook
<https://www.facebook.com/ashrafuzzaman.jitu>

Check out The Academy <http://newscred.com/theacademy>, your #1 source
for free content marketing resources

On Wed, Nov 26, 2014 at 11:26 PM, Aniket Bhatnagar <
aniket.bhatna...@gmail.com> wrote:

> What's your cluster size? For streamig to work, it needs shards + 1
> executors.
>
> On Wed, Nov 26, 2014, 5:53 PM A.K.M. Ashrafuzzaman <
> ashrafuzzaman...@gmail.com> wrote:
>
>> Hi guys,
>> When we are using Kinesis with 1 shard it works fine. But when we use
>> more than 1, it falls into an infinite loop and no data is processed by
>> Spark Streaming. In the Kinesis DynamoDB table, I can see that it keeps
>> increasing the leaseCounter, but it does not start processing.
>>
>> I am using,
>> scala: 2.10.4
>> java version: 1.8.0_25
>> Spark: 1.1.0
>> spark-streaming-kinesis-asl: 1.1.0
>>
