You just implement IBackingMap, and then wrap it in TransactionalMap or OpaqueTransactionalMap
On Thu, Apr 17, 2014 at 11:23 AM, Raphael Hsieh <[email protected]>wrote: > oh ok, > So if I want the guarantee of single message processing when using an > external datastore, I need to implement the methods described > here<https://github.com/nathanmarz/storm/wiki/Trident-state#opaque-transactional-spouts> > ( > https://github.com/nathanmarz/storm/wiki/Trident-state#opaque-transactional-spouts) > myself? > > > > On Thu, Apr 17, 2014 at 11:00 AM, Nathan Marz <[email protected]>wrote: > >> aggregate / partitionAggregate are only aggregations within the current >> batch, the persistent equivalents are aggregations across all batches. >> >> The logic for querying states, updating them, and keeping track of batch >> ids happens in the states themselves. For example, look at the multiUpdate >> method in TransactionalMap: >> https://github.com/apache/incubator-storm/blob/master/storm-core/src/jvm/storm/trident/state/map/TransactionalMap.java >> >> Things are structured so that TransactionalMap delegates to an >> "IBackingMap" which handles the actual persistence. IBackingMap just has >> multiGet and multiPut methods. An implementation for a database (like >> Cassandra, Riak, HBase, etc.) just has to implement IBackingMap. >> >> >> On Thu, Apr 17, 2014 at 10:15 AM, Raphael Hsieh <[email protected]>wrote: >> >>> I guess I'm just confused as to when "multiGet" and "multiPut" are >>> called when using an implementation of the IBackingMap >>> >>> >>> On Thu, Apr 17, 2014 at 8:33 AM, Raphael Hsieh <[email protected]>wrote: >>> >>>> So from my understanding, this is how the different spout types >>>> guarantee single message processing. For example, an opaque transactional >>>> spout will look at transaction id's in order to guarantee in order batch >>>> processing, making sure that the txid's are processed in order, and using >>>> the previous and current values to fix any mixups. >>>> >>>> When doing an aggregation does it aggregation across all batches ? If >>>> so, how does this happen ? Will it query the datastore for the current >>>> value, then add the current aggregate value to the stored value in order to >>>> create the global aggregate ? Where does this logic happen ? I can't seem >>>> to find where this happens in the persistentAggregate or even >>>> partitionPersist... >>>> >>>> >>>> On Wed, Apr 16, 2014 at 8:30 PM, Nathan Marz <[email protected]>wrote: >>>> >>>>> Batches are processed sequentially, but each batch is partitioned (and >>>>> therefore processed in parallel). As a batch is processed, it can be >>>>> repartitioned an arbitrary number of times throughout the Trident >>>>> topology. >>>>> >>>>> >>>>> On Wed, Apr 16, 2014 at 4:28 PM, Raphael Hsieh >>>>> <[email protected]>wrote: >>>>> >>>>>> Hi I found myself being confused on how to think of Storm/Trident >>>>>> processing batches. >>>>>> Are batches processed sequentially, but split into multiple >>>>>> partitions that are spread throughout the worker nodes ? >>>>>> >>>>>> Or are batches processed in parrallel and spread among worker nodes >>>>>> to be split into partitions within each host running on multiple threads >>>>>> ? >>>>>> >>>>>> Thanks! >>>>>> -- >>>>>> Raphael Hsieh >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Twitter: @nathanmarz >>>>> http://nathanmarz.com >>>>> >>>> >>>> >>>> >>>> -- >>>> Raphael Hsieh >>>> >>>> >>>> >>> >>> >>> >>> -- >>> Raphael Hsieh >>> >>> >>> >>> >> >> >> >> -- >> Twitter: @nathanmarz >> http://nathanmarz.com >> > > > > -- > Raphael Hsieh > > > > -- Twitter: @nathanmarz http://nathanmarz.com
