Yes, batchId + partitionIndex consistently represents the same data, as long
as:

1. Any repartitioning you do is deterministic (partitionBy is
deterministic, but shuffle is not)
2. You're using a spout that replays the exact same batch each time (which
is true of transactional spouts but not of opaque transactional spouts)
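
To illustrate condition 1, here is a minimal sketch in plain Java. It is not
Storm's implementation: the class and method names are hypothetical, and it
assumes a key-based repartitioning picks the target partition from a hash of
the key alone. Under that assumption, replaying the exact same batch yields
the same contents at each partition index, whereas a random assignment
depends on state outside the batch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical simulation; none of these names come from the Storm API.
public class PartitionDeterminism {

    // Key-based assignment: depends only on the key and the partition count.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Splits a batch into partitions using the key-based assignment.
    static List<List<String>> partitionByKey(List<String> batch, int numPartitions) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : batch) {
            partitions.get(partitionFor(key, numPartitions)).add(key);
        }
        return partitions;
    }

    // Random assignment: depends on the Random's state, not on the data,
    // so two runs over the same batch are not guaranteed to agree.
    static List<List<String>> shuffleAssign(List<String> batch, int numPartitions, Random rng) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String key : batch) {
            partitions.get(rng.nextInt(numPartitions)).add(key);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList("user-1", "user-2", "user-3", "user-4");

        // Simulated replay of the same batch with key-based partitioning.
        List<List<String>> first = partitionByKey(batch, 3);
        List<List<String>> replay = partitionByKey(batch, 3);
        System.out.println("key-based replay identical: " + first.equals(replay));

        // Simulated replay with random assignment; the result may or may not match.
        List<List<String>> s1 = shuffleAssign(batch, 3, new Random());
        List<List<String>> s2 = shuffleAssign(batch, 3, new Random());
        System.out.println("random replay identical: " + s1.equals(s2));
    }
}
```

Condition 2 still matters here: the key-based result is only stable because
the replayed batch contains exactly the same tuples.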


On Sun, Feb 16, 2014 at 5:23 AM, Brian O'Neill <[email protected]> wrote:

> I don't see an answer to the final question in this thread:
> https://groups.google.com/forum/#!topic/storm-user/m86grqSXjtQ
>
> We have a similar use case and require consistent partitioning such that a
> batch partition always contains the same data.
>
> Like David, I want to double check that the partitioning is consistent
> across replays, even in the event of host failures, etc.
>
> Does a batchId + partitionIndex consistently represent the same data?
> Does TransactionalTridentKafkaSpout make such a guarantee?
>
> -brian
>
> --
> Brian ONeill
> CTO, Health Market Science (http://healthmarketscience.com)
> mobile:215.588.6024
> blog: http://brianoneill.blogspot.com/
> twitter: @boneill42
>



-- 
Twitter: @nathanmarz
http://nathanmarz.com
