oh ok, So if I want the guarantee of single message processing when using an external datastore, I need to implement the methods described here<https://github.com/nathanmarz/storm/wiki/Trident-state#opaque-transactional-spouts> ( https://github.com/nathanmarz/storm/wiki/Trident-state#opaque-transactional-spouts) myself?
On Thu, Apr 17, 2014 at 11:00 AM, Nathan Marz <[email protected]> wrote: > aggregate / partitionAggregate are only aggregations within the current > batch, the persistent equivalents are aggregations across all batches. > > The logic for querying states, updating them, and keeping track of batch > ids happens in the states themselves. For example, look at the multiUpdate > method in TransactionalMap: > https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/storm/trident/state/map/TransactionalMap.java > > Things are structured so that TransactionalMap delegates to an > "IBackingMap" which handles the actual persistence. IBackingMap just has > multiGet and multiPut methods. An implementation for a database (like > Cassandra, Riak, HBase, etc.) just has to implement IBackingMap. > > > On Thu, Apr 17, 2014 at 10:15 AM, Raphael Hsieh <[email protected]>wrote: > >> I guess I'm just confused as to when "multiGet" and "multiPut" are called >> when using an implementation of the IBackingMap >> >> >> On Thu, Apr 17, 2014 at 8:33 AM, Raphael Hsieh <[email protected]>wrote: >> >>> So from my understanding, this is how the different spout types >>> guarantee single message processing. For example, an opaque transactional >>> spout will look at transaction id's in order to guarantee in order batch >>> processing, making sure that the txid's are processed in order, and using >>> the previous and current values to fix any mixups. >>> >>> When doing an aggregation does it aggregation across all batches ? If >>> so, how does this happen ? Will it query the datastore for the current >>> value, then add the current aggregate value to the stored value in order to >>> create the global aggregate ? Where does this logic happen ? I can't seem >>> to find where this happens in the persistentAggregate or even >>> partitionPersist... >>> >>> >>> On Wed, Apr 16, 2014 at 8:30 PM, Nathan Marz <[email protected]>wrote: >>> >>>> Batches are processed sequentially, but each batch is partitioned (and >>>> therefore processed in parallel). As a batch is processed, it can be >>>> repartitioned an arbitrary number of times throughout the Trident topology. >>>> >>>> >>>> On Wed, Apr 16, 2014 at 4:28 PM, Raphael Hsieh <[email protected]>wrote: >>>> >>>>> Hi I found myself being confused on how to think of Storm/Trident >>>>> processing batches. >>>>> Are batches processed sequentially, but split into multiple partitions >>>>> that are spread throughout the worker nodes ? >>>>> >>>>> Or are batches processed in parrallel and spread among worker nodes to >>>>> be split into partitions within each host running on multiple threads ? >>>>> >>>>> Thanks! >>>>> -- >>>>> Raphael Hsieh >>>>> >>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> -- >>>> Twitter: @nathanmarz >>>> http://nathanmarz.com >>>> >>> >>> >>> >>> -- >>> Raphael Hsieh >>> >>> >>> >> >> >> >> -- >> Raphael Hsieh >> >> >> >> > > > > -- > Twitter: @nathanmarz > http://nathanmarz.com > -- Raphael Hsieh
