Re: Spark + Kinesis + Stream Name + Cache?

2015-05-08 Thread Mike Trienis
Hey Chris!

I was happy to see the documentation outlining that issue :-) However, I
must have gotten into a pretty terrible state because I had to delete and
recreate the Kinesis streams as well as the DynamoDB tables.

Thanks for the reply, everything is sorted.

Mike


Re: Spark + Kinesis + Stream Name + Cache?

2015-05-08 Thread Chris Fregly
hey mike-

as you pointed out here from my docs, changing the stream name is sometimes
problematic due to the way the Kinesis Client Library manages leases,
checkpoints, etc. in DynamoDB.

I noticed this directly while developing the Kinesis connector, which is why
I highlighted the issue here.

is wiping out the DynamoDB table a suitable workaround for now?  usually in 
production, you wouldn't be changing stream names often, so hopefully that's ok 
for dev.
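
if it helps, the table can also be dropped programmatically. a rough sketch
with the AWS SDK for Java (which the connector already pulls in) - this
assumes the default credential provider chain, and the app name below is a
placeholder; per the docs, the lease table is named after your spark app name
and lives in us-east-1:

    // rough sketch: drop the KCL lease table so the consumer starts fresh
    import com.amazonaws.regions.{Region, Regions}
    import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient

    val dynamo = new AmazonDynamoDBClient()                // default credential provider chain
    dynamo.setRegion(Region.getRegion(Regions.US_EAST_1))  // the KCL lease table lives in us-east-1
    dynamo.deleteTable("MySparkKinesisApp")                // placeholder: your streaming app's name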

otherwise, can you share the relevant spark streaming logs that are generated 
when you do this?

I saw a lot of "lease not owned by this Kinesis Client" type of errors, from 
what I remember.

lemme know!

-Chris 



Re: Spark + Kinesis + Stream Name + Cache?

2015-05-08 Thread Mike Trienis
- [Kinesis stream name]: The Kinesis stream that this streaming application receives from
  - The application name used in the streaming context becomes the Kinesis application name
  - The application name must be unique for a given account and region.
  - The Kinesis backend automatically associates the application name to the Kinesis stream using a DynamoDB table (always in the us-east-1 region) created during Kinesis Client Library initialization.
  - *Changing the application name or stream name can lead to Kinesis errors in some cases. If you see errors, you may need to manually delete the DynamoDB table.*
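
To make the coupling concrete, here is roughly how both names enter the
picture with the Spark 1.3 API. This is only a sketch: the app name, stream
name, and endpoint below are placeholders, and the rest follows the documented
KinesisUtils.createStream signature:

    // rough sketch against the Spark 1.3 Kinesis API; all names are placeholders
    import com.amazonaws.services.kinesis.clientlibrary.lib.worker.InitialPositionInStream
    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kinesis.KinesisUtils

    // the app name set here becomes the Kinesis application name,
    // i.e. the name of the KCL's DynamoDB lease table
    val conf = new SparkConf().setAppName("MySparkKinesisApp")
    val ssc  = new StreamingContext(conf, Seconds(2))

    val records = KinesisUtils.createStream(
      ssc,
      "my-stream",                                // Kinesis stream name (placeholder)
      "https://kinesis.us-west-2.amazonaws.com",  // endpoint URL (placeholder)
      Seconds(2),                                 // how often the KCL checkpoints to DynamoDB
      InitialPositionInStream.LATEST,
      StorageLevel.MEMORY_AND_DISK_2)

Changing either "MySparkKinesisApp" or "my-stream" once the lease table exists
is exactly the situation the warning above describes.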


On Fri, May 8, 2015 at 2:06 PM, Mike Trienis 
wrote:

> Hi All,
>
> I am submitting the assembled fat jar file by the command:
>
> bin/spark-submit --jars /spark-streaming-kinesis-asl_2.10-1.3.0.jar
> --class com.xxx.Consumer -0.1-SNAPSHOT.jar
>
> It reads the data from Kinesis using the stream name defined in a
> configuration file. It turns out that it reads the data perfectly fine for
> one stream name (i.e., the default); however, if I switch the stream name
> and re-submit the jar, it no longer reads the data. From CloudWatch, I can
> see that data is being put into the stream and Spark is actually sending
> get requests as well. However, it doesn't seem to be outputting the data.
>
> Has anyone else encountered a similar issue? Does Spark cache the stream
> name somewhere? I also have checkpointing enabled.
>
> Thanks, Mike.
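
One note for readers of this thread: the checkpointing mentioned above is
Spark Streaming's own, configured on the StreamingContext, and is separate
from the KCL checkpoints the connector keeps in DynamoDB - deleting the
DynamoDB lease table leaves it untouched. Continuing from the sketch earlier
in the thread, with a placeholder directory:

    // Spark Streaming's own checkpointing, independent of the KCL's DynamoDB table
    ssc.checkpoint("/tmp/spark-checkpoints")  // placeholder; use durable storage in production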