Re: How to think of batches vs partitions

Nathan Marz Thu, 17 Apr 2014 13:06:06 -0700

You just implement IBackingMap, and then wrap it in TransactionalMap or
OpaqueTransactionalMap



On Thu, Apr 17, 2014 at 11:23 AM, Raphael Hsieh <[email protected]>wrote:

> oh ok,
> So if I want the guarantee of single message processing when using an
> external datastore, I need to implement the methods described 
> here<https://github.com/nathanmarz/storm/wiki/Trident-state#opaque-transactional-spouts>
>  (
> https://github.com/nathanmarz/storm/wiki/Trident-state#opaque-transactional-spouts)
> myself?
>
>
>
> On Thu, Apr 17, 2014 at 11:00 AM, Nathan Marz <[email protected]>wrote:
>
>> aggregate / partitionAggregate are only aggregations within the current
>> batch, the persistent equivalents are aggregations across all batches.
>>
>> The logic for querying states, updating them, and keeping track of batch
>> ids happens in the states themselves. For example, look at the multiUpdate
>> method in TransactionalMap:
>> https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/storm/trident/state/map/TransactionalMap.java
>>
>> Things are structured so that TransactionalMap delegates to an
>> "IBackingMap" which handles the actual persistence. IBackingMap just has
>> multiGet and multiPut methods. An implementation for a database (like
>> Cassandra, Riak, HBase, etc.) just has to implement IBackingMap.
>>
>>
>> On Thu, Apr 17, 2014 at 10:15 AM, Raphael Hsieh <[email protected]>wrote:
>>
>>> I guess I'm just confused as to when "multiGet" and "multiPut" are
>>> called when using an implementation of the IBackingMap
>>>
>>>
>>> On Thu, Apr 17, 2014 at 8:33 AM, Raphael Hsieh <[email protected]>wrote:
>>>
>>>> So from my understanding, this is how the different spout types
>>>> guarantee single message processing. For example, an opaque transactional
>>>> spout will look at transaction id's in order to guarantee in order batch
>>>> processing, making sure that the txid's are processed in order, and using
>>>> the previous and current values to fix any mixups.
>>>>
>>>> When doing an aggregation does it aggregation across all batches ? If
>>>> so, how does this happen ? Will it query the datastore for the current
>>>> value, then add the current aggregate value to the stored value in order to
>>>> create the global aggregate ? Where does this logic happen ? I can't seem
>>>> to find where this happens in the persistentAggregate or even
>>>> partitionPersist...
>>>>
>>>>
>>>> On Wed, Apr 16, 2014 at 8:30 PM, Nathan Marz <[email protected]>wrote:
>>>>
>>>>> Batches are processed sequentially, but each batch is partitioned (and
>>>>> therefore processed in parallel). As a batch is processed, it can be
>>>>> repartitioned an arbitrary number of times throughout the Trident 
>>>>> topology.
>>>>>
>>>>>
>>>>> On Wed, Apr 16, 2014 at 4:28 PM, Raphael Hsieh 
>>>>> <[email protected]>wrote:
>>>>>
>>>>>> Hi I found myself being confused on how to think of Storm/Trident
>>>>>> processing batches.
>>>>>> Are batches processed sequentially, but split into multiple
>>>>>> partitions that are spread throughout the worker nodes ?
>>>>>>
>>>>>> Or are batches processed in parrallel and spread among worker nodes
>>>>>> to be split into partitions within each host running on multiple threads 
>>>>>> ?
>>>>>>
>>>>>> Thanks!
>>>>>> --
>>>>>> Raphael Hsieh
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Twitter: @nathanmarz
>>>>> http://nathanmarz.com
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Raphael Hsieh
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Raphael Hsieh
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> Twitter: @nathanmarz
>> http://nathanmarz.com
>>
>
>
>
> --
> Raphael Hsieh
>
>
>
>



-- 
Twitter: @nathanmarz
http://nathanmarz.com

Re: How to think of batches vs partitions

Reply via email to