aggregate / partitionAggregate are aggregations only within the current batch; the persistent equivalents (e.g. persistentAggregate) are aggregations across all batches.
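To make that contrast concrete, here is a minimal, hypothetical sketch (not Storm code; the class, method names, and the plain map standing in for a Trident state are all illustrative): a per-batch count resets with every batch, while a persistent count folds each batch into a stored running total.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a plain map stands in for the Trident state, and
// batchCount/persistentCount stand in for the two kinds of aggregation.
class BatchVsPersistentDemo {
    // like aggregate/partitionAggregate: sees only the current batch
    static long batchCount(List<String> batch) {
        return batch.size();
    }

    // like persistentAggregate: reads the stored total, folds in the
    // current batch, and writes the new total back
    static long persistentCount(Map<String, Long> store, String key, List<String> batch) {
        long next = store.getOrDefault(key, 0L) + batch.size();
        store.put(key, next);
        return next;
    }

    public static void main(String[] args) {
        Map<String, Long> store = new HashMap<>();
        List<String> batch1 = Arrays.asList("a", "b", "c");
        List<String> batch2 = Arrays.asList("d", "e");
        System.out.println(batchCount(batch1));                  // 3 (resets every batch)
        System.out.println(batchCount(batch2));                  // 2
        System.out.println(persistentCount(store, "k", batch1)); // 3
        System.out.println(persistentCount(store, "k", batch2)); // 5 (carries across batches)
    }
}
```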
The logic for querying states, updating them, and keeping track of batch ids
happens in the states themselves. For example, look at the multiUpdate method
in TransactionalMap:
https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/storm/trident/state/map/TransactionalMap.java

Things are structured so that TransactionalMap delegates to an "IBackingMap"
which handles the actual persistence. IBackingMap just has multiGet and
multiPut methods. An implementation for a database (like Cassandra, Riak,
HBase, etc.) just has to implement IBackingMap.

On Thu, Apr 17, 2014 at 10:15 AM, Raphael Hsieh <[email protected]> wrote:

> I guess I'm just confused as to when "multiGet" and "multiPut" are called
> when using an implementation of the IBackingMap.
>
> On Thu, Apr 17, 2014 at 8:33 AM, Raphael Hsieh <[email protected]> wrote:
>
>> So from my understanding, this is how the different spout types guarantee
>> single-message processing. For example, an opaque transactional spout will
>> look at transaction ids in order to guarantee in-order batch processing,
>> making sure that the txids are processed in order, and using the previous
>> and current values to fix any mix-ups.
>>
>> When doing an aggregation, does it aggregate across all batches? If so,
>> how does this happen? Will it query the datastore for the current value,
>> then add the current aggregate value to the stored value in order to create
>> the global aggregate? Where does this logic happen? I can't seem to find
>> where this happens in persistentAggregate or even partitionPersist...
>>
>> On Wed, Apr 16, 2014 at 8:30 PM, Nathan Marz <[email protected]> wrote:
>>
>>> Batches are processed sequentially, but each batch is partitioned (and
>>> therefore processed in parallel). As a batch is processed, it can be
>>> repartitioned an arbitrary number of times throughout the Trident topology.
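The delegation described above can be sketched in a few dozen lines. This is not Storm's actual code: the interface below only mirrors the shape of storm.trident.state.map.IBackingMap (which really does have just multiGet and multiPut), and the in-memory map, TxValue class, and simplified multiUpdate are hypothetical stand-ins. It shows where multiGet and multiPut get called: once per batch, around the update, with the txid check deciding whether a replayed batch is skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Sketch of TransactionalMap-style multiUpdate: multiGet the current values,
// skip keys whose stored txid matches the current batch (a replay), apply the
// updater to the rest, and multiPut everything back.
class TransactionalMapSketch {
    private final IBackingMap<TxValue> backing;

    TransactionalMapSketch(IBackingMap<TxValue> backing) { this.backing = backing; }

    List<Long> multiUpdate(long txid, List<List<Object>> keys, List<UnaryOperator<Long>> updaters) {
        List<TxValue> current = backing.multiGet(keys);   // one bulk read per batch
        List<TxValue> updated = new ArrayList<>();
        List<Long> result = new ArrayList<>();
        for (int i = 0; i < keys.size(); i++) {
            TxValue c = current.get(i);
            TxValue n;
            if (c != null && c.txid == txid) {
                n = c;  // this batch was already applied; keep the stored value
            } else {
                long base = (c == null) ? 0L : c.val;
                n = new TxValue(txid, updaters.get(i).apply(base));
            }
            updated.add(n);
            result.add(n.val);
        }
        backing.multiPut(keys, updated);                  // one bulk write per batch
        return result;
    }

    public static void main(String[] args) {
        TransactionalMapSketch map = new TransactionalMapSketch(new InMemoryBackingMap<>());
        List<List<Object>> keys = Arrays.asList(Arrays.asList((Object) "word"));
        List<UnaryOperator<Long>> inc = Arrays.asList(v -> v + 1);
        System.out.println(map.multiUpdate(1, keys, inc)); // [1]
        System.out.println(map.multiUpdate(1, keys, inc)); // [1] -- replay of txid 1 skipped
        System.out.println(map.multiUpdate(2, keys, inc)); // [2]
    }
}

// The two-method persistence interface described above; mirrors the shape of
// storm.trident.state.map.IBackingMap.
interface IBackingMap<T> {
    List<T> multiGet(List<List<Object>> keys);
    void multiPut(List<List<Object>> keys, List<T> vals);
}

// Stand-in for a database-backed implementation (Cassandra, Riak, HBase, ...).
class InMemoryBackingMap<T> implements IBackingMap<T> {
    private final Map<List<Object>, T> db = new HashMap<>();

    public List<T> multiGet(List<List<Object>> keys) {
        List<T> out = new ArrayList<>();
        for (List<Object> k : keys) out.add(db.get(k));
        return out;
    }

    public void multiPut(List<List<Object>> keys, List<T> vals) {
        for (int i = 0; i < keys.size(); i++) db.put(keys.get(i), vals.get(i));
    }
}

// A value tagged with the txid of the batch that last wrote it.
class TxValue {
    final long txid;
    final long val;
    TxValue(long txid, long val) { this.txid = txid; this.val = val; }
}
```

Note the skip-on-equal-txid logic is only safe with a transactional spout, where the same txid always means the exact same batch contents.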
>>> On Wed, Apr 16, 2014 at 4:28 PM, Raphael Hsieh <[email protected]> wrote:
>>>
>>>> Hi, I found myself confused about how to think of Storm/Trident
>>>> processing batches.
>>>> Are batches processed sequentially, but split into multiple partitions
>>>> that are spread throughout the worker nodes?
>>>>
>>>> Or are batches processed in parallel and spread among worker nodes to
>>>> be split into partitions within each host, running on multiple threads?
>>>>
>>>> Thanks!
>>>> --
>>>> Raphael Hsieh
>>>
>>> --
>>> Twitter: @nathanmarz
>>> http://nathanmarz.com
>>
>> --
>> Raphael Hsieh
>
> --
> Raphael Hsieh
> Amazon.com
> Software Development Engineer I
> (978) 764-9014

--
Twitter: @nathanmarz
http://nathanmarz.com
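As an addendum to the opaque-spout point raised earlier in the thread ("using the previous and current values to fix any mix-ups"): a minimal, hypothetical sketch of that repair. Trident's real per-key state for opaque spouts is a triple like storm.trident.state.OpaqueValue (currTxid, curr, prev); the class below keeps one such triple for a single key to show just the txid logic, and everything else is illustrative.

```java
import java.util.function.LongUnaryOperator;

// Sketch of the opaque previous/current-value repair. One triple for one key:
// txid of the last writing batch, the value before it, and the value after it.
class OpaqueStateSketch {
    long txid = -1; // txid of the batch that last wrote curr
    long prev = 0;  // value before that batch was applied
    long curr = 0;  // value after that batch was applied

    long update(long batchTxid, LongUnaryOperator updater) {
        if (batchTxid == txid) {
            // Same batch replayed, possibly with different tuples: rebuild
            // from prev so the earlier (possibly partial) write is discarded.
            curr = updater.applyAsLong(prev);
        } else {
            // New batch: the old curr becomes the rollback point.
            prev = curr;
            curr = updater.applyAsLong(curr);
            txid = batchTxid;
        }
        return curr;
    }

    public static void main(String[] args) {
        OpaqueStateSketch s = new OpaqueStateSketch();
        System.out.println(s.update(1, v -> v + 3)); // 3
        System.out.println(s.update(2, v -> v + 2)); // 5
        // txid 2 replays with a different increment: applied on top of prev (3)
        System.out.println(s.update(2, v -> v + 4)); // 7
    }
}
```

Because a replay recomputes from prev instead of curr, an opaque spout may deliver different tuples for the same txid and the stored aggregate still comes out right.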
