Re: is stateful bolts production ready?

2017-08-14 Thread Arun Mahadevan
> 2. Subclass StateProvider to return State objects for namespaces of
> interest. For example, in our case, we wish to use a custom state class in
> one of the bolts and use defaults for spouts. In this case, is it safe to
> return custom states for the bolt in concern and use the default state
> provider (InMemoryKeyValueStateProvider) for other namespaces? Is the custom
> provider supposed to load last saved state for the namespace in concern
> from the persistent store? Again if state persistence is handled by
> framework, how do we know where to get state from?

The state provider config ("topology.state.provider") is per topology, so
you need to choose one of the implementations. You can always have a mix of
stateless (regular) and stateful bolts in your topology. If you are
providing your own state implementation, you need to maintain a separate
key-value space for each "namespace". In the case of Redis, the namespace
maps to a Redis hash.

See
https://github.com/apache/storm/blob/master/external/storm-redis/src/main/java/org/apache/storm/redis/state/RedisKeyValueStateProvider.java
and
https://github.com/apache/storm/blob/master/external/storm-redis/src/main/java/org/apache/storm/redis/state/RedisKeyValueState.java
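To make the "separate key-value space per namespace" idea concrete, here is a minimal, self-contained sketch. It is not Storm code: `NamespacedStore`, `spaceFor`, and the namespace names are hypothetical, and the outer map merely plays the role that one Redis hash per namespace plays in the Redis implementation linked above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a custom state backend; not a Storm class.
class NamespacedStore {
    // One independent key-value space per namespace
    // (the Redis implementation uses one Redis hash per namespace).
    private final Map<String, Map<String, Long>> spaces = new HashMap<>();

    // Returns the key-value space for a namespace, creating it on first use.
    Map<String, Long> spaceFor(String namespace) {
        return spaces.computeIfAbsent(namespace, ns -> new HashMap<>());
    }

    public static void main(String[] args) {
        NamespacedStore store = new NamespacedStore();
        store.spaceFor("bolt-a").put("count", 3L);
        store.spaceFor("bolt-b").put("count", 7L);
        // The same key in two namespaces does not collide.
        System.out.println(store.spaceFor("bolt-a").get("count")); // 3
        System.out.println(store.spaceFor("bolt-b").get("count")); // 7
    }
}
```

A real provider would back each space with a persistent store rather than an in-memory map; the point here is only the isolation between namespaces.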

> 3. Are checkpoint related methods called by the same bolt or spout thread?

Yes, they are invoked by the same thread that invokes execute.

Thanks,
Arun


RE: is stateful bolts production ready?

2017-08-14 Thread Wijekoon, Manusha
Hi Arun,

Could you please help me with my questions 2 and 3 if possible?


Thanks
Manusha


Re: Re: is stateful bolts production ready?

2017-08-11 Thread 王 纯超
Thanks Arun, that explains a lot.


wangchunc...@outlook.com


Re: is stateful bolts production ready?

2017-08-11 Thread Arun Mahadevan
If you want to use the provided state implementations, you don’t need to do any 
of what you mentioned. Your bolt would be initialized with its last known state 
in “initState” and the bolt can keep updating the state in “execute”. The 
framework will automatically save the state to the state backend periodically. 
See StatefulTopology[1] for an example.
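As a rough illustration of that flow, the sketch below restores a value in an `initState`-style method and only updates the state object in an `execute`-style method. Everything in it is a simplified stand-in written for this example (`InitStateSketch`, `SketchState`, `SummingBolt` are hypothetical names, not Storm's classes); the StatefulTopology example [1] shows the real API.

```java
import java.util.HashMap;
import java.util.Map;

// Self-contained model of the initState/execute pattern; not Storm's API.
class InitStateSketch {

    // Hypothetical, simplified key-value state handed to the bolt.
    static class SketchState {
        private final Map<String, Long> data = new HashMap<>();
        Long get(String key, Long defaultValue) { return data.getOrDefault(key, defaultValue); }
        void put(String key, Long value) { data.put(key, value); }
    }

    // Hypothetical bolt-like class.
    static class SummingBolt {
        private SketchState state;
        private long sum;

        // Called once with the last saved state, before any tuples arrive.
        void initState(SketchState restored) {
            this.state = restored;
            this.sum = restored.get("sum", 0L);
        }

        // Called per tuple: it only updates the state object and never persists it.
        void execute(long value) {
            sum += value;
            state.put("sum", sum);
        }
    }

    public static void main(String[] args) {
        SketchState saved = new SketchState();
        saved.put("sum", 10L);               // pretend this was checkpointed earlier

        SummingBolt bolt = new SummingBolt();
        bolt.initState(saved);               // resumes from 10 instead of 0
        bolt.execute(5L);
        System.out.println(saved.get("sum", 0L)); // 15
    }
}
```

Note that the bolt-like class contains no save logic at all; in the real framework, persisting the state object is the framework's job.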

Right now Storm supports Redis and HBase as state backends. If you want your 
own state backend, you need to implement get/put/delete and the logic for 
prepare/commit/rollback etc. See the HBase[2] and Redis[3] state 
implementations to get a better idea. Anyway, I don’t think Kafka would be 
ideal as a KV state backend, since it’s not easy to do KV lookups without 
loading all the data into memory or putting some KV store on top of it.
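For a feel of what the prepare/commit/rollback logic involves, here is a small in-memory sketch. It is an assumption-laden model, not Storm's implementation: `TxStateSketch` and its method names are hypothetical and only modeled on the operations named above, and the transaction id is accepted but unused. The HBase[2] and Redis[3] sources remain the authoritative reference.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of a transactional key-value state.
class TxStateSketch {
    private Map<String, Long> committed = new HashMap<>(); // last committed snapshot
    private Map<String, Long> prepared = null;              // snapshot awaiting commit
    private Map<String, Long> working = new HashMap<>();    // live view used by get/put/delete

    void put(String key, Long value) { working.put(key, value); }
    Long get(String key) { return working.get(key); }
    void delete(String key) { working.remove(key); }

    // Phase 1: freeze a copy of the current view (txid is ignored in this sketch).
    void prepareCommit(long txid) { prepared = new HashMap<>(working); }

    // Phase 2: promote the prepared snapshot; a real backend would write it durably here.
    void commit(long txid) {
        if (prepared != null) {
            committed = prepared;
            prepared = null;
        }
    }

    // Abort: drop anything prepared and fall back to the last committed snapshot.
    void rollback() {
        prepared = null;
        working = new HashMap<>(committed);
    }

    public static void main(String[] args) {
        TxStateSketch state = new TxStateSketch();
        state.put("sum", 1L);
        state.prepareCommit(1L);
        state.commit(1L);
        state.put("sum", 99L);   // update that never gets committed
        state.rollback();
        System.out.println(state.get("sum")); // 1
    }
}
```

A store that supports cheap keyed reads and writes makes each of these steps straightforward, which is part of why a log such as Kafka is an awkward fit without a KV layer on top.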

>In addition to the query, what is the intent of stateful bolt since we can 
>just hold state in bolt instance?

It mostly automates what you would have to implement otherwise and also ensures 
that the state is saved consistently across the whole topology (i.e., if you 
have multiple bolts with state, all of their states are saved in an atomic 
manner).

Thanks,
Arun

[1] 
https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/org/apache/storm/starter/StatefulTopology.java
[2] 
https://github.com/apache/storm/blob/master/external/storm-hbase/src/main/java/org/apache/storm/hbase/state/HBaseKeyValueState.java
[3] 
https://github.com/apache/storm/blob/master/external/storm-redis/src/main/java/org/apache/storm/redis/state/RedisKeyValueState.java


Re: RE: is stateful bolts production ready?

2017-08-10 Thread 王 纯超
In addition to the query, what is the intent of stateful bolt since we can just 
hold state in bolt instance?


wangchunc...@outlook.com



RE: is stateful bolts production ready?

2017-08-10 Thread Wijekoon, Manusha
In our case we prefer to use our own state implementation. After going through 
the code and reading the documentation, the following is how I understand it. 
Could you please see if my understanding is correct?

1. Derive from State and provide an implementation. In the commit(txID) method, 
are we supposed to persist the state ourselves or does the framework take 
care of that? If it is taken care of by the framework, how do we add our own 
persisting mechanism - for example, one that uses Kafka to persist state?
2. Subclass StateProvider to return State objects for namespaces of interest. 
For example, in our case, we wish to use a custom state class in one of the 
bolts and use defaults for spouts. In this case, is it safe to return custom 
states for the bolt in concern and use the default state provider 
(InMemoryKeyValueStateProvider) for other namespaces? Is the custom provider 
supposed to load the last saved state for the namespace in concern from the 
persistent store? Again, if state persistence is handled by the framework, how 
do we know where to get state from?
3. Are checkpoint-related methods called by the same bolt or spout thread?

Thanks
Manusha



From: Arun Iyer [ai...@hortonworks.com] on behalf of Arun Mahadevan 
[ar...@apache.org]
Sent: Monday, July 24, 2017 2:29 PM
To: user@storm.apache.org
Subject: Re: is stateful bolts production ready?

The bolt just needs to “put” the values into the Key-Value state that the bolt 
gets initialized with during “initState”. The framework automatically takes 
care of saving the state behind the scenes.

There's an example in storm-starter that you might find useful - 
https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/org/apache/storm/starter/StatefulTopology.java

You can also find more elaborate documentation here - 
https://github.com/apache/storm/blob/master/docs/State-checkpointing.md

Thanks,
Arun



is stateful bolts production ready?

2017-07-24 Thread Wijekoon, Manusha
Hello

I am thinking of using stateful bolts to manage the state of a bolt. However, 
from the documentation it is not clear how to save the bolt state. I understand 
it has to be done when we process the checkpoint tuple, but how? Do I just need 
to update the state object, and Storm picks it up during the three-phase 
commit? How does Storm know which state object to pick for checkpointing?

I wasn't able to find more complete examples either, specifically for when we 
can't keep the state in a key/value map.

Also, has this functionality been tested in production-like environments before?


Thanks
M