Yes, batchId + partitionIndex consistently represents the same data, as long as:
1. Any repartitioning you do is deterministic (partitionBy is, but shuffle is not).
2. You're using a spout that replays the exact same batch each time (which is true of transactional spouts but not of opaque transactional spouts).

On Sun, Feb 16, 2014 at 5:23 AM, Brian O'Neill <[email protected]> wrote:
> I don't see an answer to the final question in this thread:
> https://groups.google.com/forum/#!topic/storm-user/m86grqSXjtQ
>
> We have a similar use case and require consistent partitioning such that a
> batch partition always contains the same data.
>
> Like David, I want to double-check that the partitioning is consistent
> across replays, even in the event of host failures, etc.
>
> Does a batchId + partitionIndex consistently represent the same data?
> Does TransactionalTridentKafkaSpout make such a guarantee?
>
> -brian
>
> --
> Brian ONeill
> CTO, Health Market Science (http://healthmarketscience.com)
> mobile:215.588.6024
> blog: http://brianoneill.blogspot.com/
> twitter: @boneill42

--
Twitter: @nathanmarz
http://nathanmarz.com

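The two conditions in the answer can be illustrated with a small sketch. The Java below is hypothetical and is not Storm's or Trident's actual partitioning code. It assumes a fixed batch that is replayed unchanged (as a transactional, not opaque, spout would provide) and assigns each tuple to a partition by hashing its field values, roughly in the spirit of partitionBy, versus picking a partition at random, roughly in the spirit of shuffle. The class and method names (PartitionReplayDemo, hashPartition, randomPartition, partitionContents) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only; this is NOT Storm/Trident's partitioning code.
public class PartitionReplayDemo {

    // Deterministic assignment: depends only on the tuple's field values,
    // roughly in the spirit of partitionBy. List.hashCode is specified by the
    // JDK, and String/Integer hash codes are stable across JVM restarts.
    public static int hashPartition(List<Object> tuple, int numPartitions) {
        // floorMod keeps the index non-negative for negative hash codes.
        return Math.floorMod(tuple.hashCode(), numPartitions);
    }

    // Non-deterministic assignment, roughly in the spirit of shuffle.
    public static int randomPartition(Random rng, int numPartitions) {
        return rng.nextInt(numPartitions);
    }

    // Tuples of a (replayed) batch that land in the given partition index.
    public static List<List<Object>> partitionContents(
            List<List<Object>> batch, int partitionIndex, int numPartitions) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> tuple : batch) {
            if (hashPartition(tuple, numPartitions) == partitionIndex) {
                out.add(tuple);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        // Assumption: the spout replays exactly this batch for a given batchId.
        List<List<Object>> batch = Arrays.asList(
                Arrays.<Object>asList("user-1", 10),
                Arrays.<Object>asList("user-2", 20),
                Arrays.<Object>asList("user-3", 30));

        // With hash-based assignment, each partition's contents match on replay.
        for (int p = 0; p < numPartitions; p++) {
            List<List<Object>> firstRun = partitionContents(batch, p, numPartitions);
            List<List<Object>> replay = partitionContents(batch, p, numPartitions);
            System.out.println("partition " + p + ": " + firstRun
                    + " same on replay: " + firstRun.equals(replay));
        }

        // A random assignment of the same tuple may differ between runs.
        Random rng = new Random();
        System.out.println("random partition, first run: " + randomPartition(rng, numPartitions)
                + ", replay: " + randomPartition(rng, numPartitions));
    }
}
```

The hash-based version only gives the same (batchId, partitionIndex) contents across replays if the field values have hash codes that are stable across JVMs; String and Integer do, while objects that fall back to identity hash codes do not.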