Hi Tathagata, I think that's exactly what's happening.
The error message is: "com.amazonaws.services.kinesis.model.InvalidArgumentException: StartingSequenceNumber 49550673839151225431779125105915140284622031848663416866 used in GetShardIterator on shard shardId-000000000002 in stream erich-test under account xxxxxxx is invalid because it did not come from this stream". I looked at the DynamoDB table and each job has single table and that table does not contain any stream identification information, only shard checkpointing data. I think the error is that when it tries to read from stream B, it's using checkpointing data for stream A and errors out. So it appears, at first glance, that currently you can't read from multiple Kinesis streams in a single job. I haven't tried this, but it might be possible for this to work if I force each stream to have different shard IDs so there is no ambiguity in the DynamoDB table; however, that's clearly not a feasible production solution. Thanks, -Erich On Thu, May 14, 2015 at 8:34 PM, Tathagata Das <[email protected]> wrote: > A possible problem may be that the kinesis stream in 1.3 uses the > SparkContext app name, as the Kinesis Application Name, that is used by the > Kinesis Client Library to save checkpoints in DynamoDB. Since both kinesis > DStreams are using the Kinesis application name (as they are in the same > StreamingContext / SparkContext / Spark app name), KCL may be doing weird > overwriting checkpoint information of both Kinesis streams into the same > DynamoDB table. Either ways, this is going to be fixed in Spark 1.4. > > On Thu, May 14, 2015 at 4:10 PM, Chris Fregly <[email protected]> wrote: > >> have you tried to union the 2 streams per the KinesisWordCountASL example >> <https://github.com/apache/spark/blob/branch-1.3/extras/kinesis-asl/src/main/scala/org/apache/spark/examples/streaming/KinesisWordCountASL.scala#L120> >> where >> 2 streams (against the same Kinesis stream in this case) are created and >> union'd? >> >> it should work the same way - including union() of streams from totally >> different source types (kafka, kinesis, flume). >> >> >> >> On Thu, May 14, 2015 at 2:07 PM, Tathagata Das <[email protected]> >> wrote: >> >>> What is the error you are seeing? >>> >>> TD >>> >>> On Thu, May 14, 2015 at 9:00 AM, Erich Ess <[email protected]> >>> wrote: >>> >>>> Hi, >>>> >>>> Is it possible to setup streams from multiple Kinesis streams and >>>> process >>>> them in a single job? From what I have read, this should be possible, >>>> however, the Kinesis layer errors out whenever I try to receive from >>>> more >>>> than a single Kinesis Stream. >>>> >>>> Here is the code. Currently, I am focused on just getting receivers >>>> setup >>>> and working for the two Kinesis Streams, as such, this code just >>>> attempts to >>>> print out the contents of both streams: >>>> >>>> implicit val formats = Serialization.formats(NoTypeHints) >>>> >>>> val conf = new SparkConf().setMaster("local[*]").setAppName("test") >>>> val ssc = new StreamingContext(conf, Seconds(1)) >>>> >>>> val rawStream = KinesisUtils.createStream(ssc, "erich-test", >>>> "kinesis.us-east-1.amazonaws.com", Duration(1000), >>>> InitialPositionInStream.TRIM_HORIZON, StorageLevel.MEMORY_ONLY) >>>> rawStream.map(msg => new String(msg)).print >>>> >>>> val loaderStream = KinesisUtils.createStream( >>>> ssc, >>>> "dev-loader", >>>> "kinesis.us-east-1.amazonaws.com", >>>> Duration(1000), >>>> InitialPositionInStream.TRIM_HORIZON, >>>> StorageLevel.MEMORY_ONLY) >>>> >>>> val loader = loaderStream.map(msg => new String(msg)).print >>>> >>>> ssc.start() >>>> >>>> Thanks, >>>> -Erich >>>> >>>> >>>> >>>> -- >>>> View this message in context: >>>> http://apache-spark-user-list.1001560.n3.nabble.com/Multiple-Kinesis-Streams-in-a-single-Streaming-job-tp22889.html >>>> Sent from the Apache Spark User List mailing list archive at Nabble.com. >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: [email protected] >>>> For additional commands, e-mail: [email protected] >>>> >>>> >>> >> > -- Erich Ess | CTO c. 310-703-6058 @SimpleRelevance | 130 E Randolph, Ste 1650 | Chicago, IL 60601 *Machine Learning For Marketers* Named a top startup to watch in Crain's — View the Article. <http://www.chicagobusiness.com/article/20130928/ISSUE02/130929801/big-data-draws-big-interest-and-simple-relevance-is-leading-the> SimpleRelevance.com <http://simplerelevance.com/> | Facebook <https://www.facebook.com/simplerelevance> | Twitter <http://www.twitter.com/simplerelevance> | Blog <http://blog.simplerelevance.com/>
