Re: How to think of batches vs partitions

Raphael Hsieh Thu, 17 Apr 2014 08:34:38 -0700

So from my understanding, this is how the different spout types guarantee
single message processing. For example, an opaque transactional spout will
look at transaction id's in order to guarantee in order batch processing,
making sure that the txid's are processed in order, and using the previous
and current values to fix any mixups.

When doing an aggregation does it aggregation across all batches ? If so,
how does this happen ? Will it query the datastore for the current value,
then add the current aggregate value to the stored value in order to create
the global aggregate ? Where does this logic happen ? I can't seem to find
where this happens in the persistentAggregate or even partitionPersist...

On Wed, Apr 16, 2014 at 8:30 PM, Nathan Marz <[email protected]> wrote:

> Batches are processed sequentially, but each batch is partitioned (and
> therefore processed in parallel). As a batch is processed, it can be
> repartitioned an arbitrary number of times throughout the Trident topology.
>
>
> On Wed, Apr 16, 2014 at 4:28 PM, Raphael Hsieh <[email protected]>wrote:
>
>> Hi I found myself being confused on how to think of Storm/Trident
>> processing batches.
>> Are batches processed sequentially, but split into multiple partitions
>> that are spread throughout the worker nodes ?
>>
>> Or are batches processed in parrallel and spread among worker nodes to be
>> split into partitions within each host running on multiple threads ?
>>
>> Thanks!
>> --
>> Raphael Hsieh
>>
>>
>>
>>
>
>
>
> --
> Twitter: @nathanmarz
> http://nathanmarz.com
>

-- 
Raphael Hsieh

Re: How to think of batches vs partitions

Reply via email to