Thanks Ryan.  I was hoping there was a change data capture framework.  We
have late-arriving events, some of which can be very late.  We would either
have to periodically batch-collect data over a large time window to go back
and pick those up, or accept that we are going to lose a small percentage of
events.  Neither option is ideal.
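One compromise between those two options is sketched below. It is not something proposed in this thread, and it rests on several assumptions. The Cassandra table is assumed to be partitioned by an event-time bucket (e.g. one partition per hour). Each periodic export run re-reads every bucket in a fixed lookback window and overwrites that bucket's output in HDFS. Because Cassandra collapses duplicate primary keys, re-exporting a bucket is idempotent, and a late event is picked up by any later run whose window still covers its bucket. Events arriving later than the lookback are still missed, so the lookback size trades export cost against loss. The names `BUCKET`, `bucket_start` and `buckets_to_export` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical bucket width: one Cassandra partition per event-time hour.
BUCKET = timedelta(hours=1)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts: datetime, width: timedelta = BUCKET) -> datetime:
    """Floor a timezone-aware UTC timestamp to the start of its bucket."""
    n = (ts - EPOCH) // width  # timedelta // timedelta -> int
    return EPOCH + n * width


def buckets_to_export(run_time: datetime, lookback: timedelta,
                      width: timedelta = BUCKET) -> list:
    """Return the closed buckets overlapping [run_time - lookback, run_time).

    The bucket containing run_time is still open, so it is excluded here
    and will be exported by a later run.
    """
    first = bucket_start(run_time - lookback, width)
    last = bucket_start(run_time, width)
    out = []
    b = first
    while b < last:
        out.append(b)
        b += width
    return out


if __name__ == "__main__":
    run = datetime(2016, 8, 9, 10, 30, tzinfo=timezone.utc)
    # Each bucket would become one full-partition read from C* and one
    # overwritten output directory in HDFS.
    for b in buckets_to_export(run, lookback=timedelta(hours=3)):
        print(b.isoformat())
```

A run at 10:30 UTC with a 3-hour lookback re-exports the 07:00, 08:00 and 09:00 buckets. Since every bucket maps to whole partitions, the batch job never has to scan the full data set, which is the property Ryan points to below.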

On Tue, Aug 9, 2016 at 10:30 AM, Ryan Svihla <r...@foundev.pro> wrote:

> The typical pattern I've seen in the field is Kafka + consumers for each
> destination (a variant of dual write, I know). This of course would not
> work for your goal of relying on C* for dedup. Triggers would also suffer
> from the same problem, unfortunately, so you're really left with a batch job
> (most likely Spark) that moves data from C* into HDFS on a given interval.
> If this is really a cold storage use case, that can work quite well,
> especially if you've modeled your data as a time series or with some sort
> of time-based bucketing, so you can quickly pull full partitions of data
> out of C* in a deterministic fashion without scanning your entire data set.
>
> For similar needs I've also seen Spark Streaming + querying Cassandra for
> duplicate checks to dedup, then output to another destination (a form of
> dual write, but with dedup). This was really silly and slow. I only bring
> it up to save you the trouble in case you end up going down the same path
> looking for something more 'real time'.
>
> Regards,
> Ryan Svihla
>
> On Aug 9, 2016, 11:09 AM -0500, Ben Vogan <b...@shopkick.com>, wrote:
>
> Hi all,
>
> We are investigating using Cassandra in our data platform.  We would like
> data to go into Cassandra first and to eventually be replicated into our
> data lake in HDFS for long term cold storage.  Does anyone know of a good
> way of doing this?  We would rather not have parallel writes to HDFS and
> Cassandra because we were hoping that we could use Cassandra primary keys
> to de-duplicate events.
>
> Thanks,
> --
> <http://shopkick.com/>
> *BENJAMIN VOGAN* | Data Platform Team Lead
> shopkick <http://www.shopkick.com/>
> <http://facebook.com/shopkick> <http://instagram.com/shopkick>
> <http://pinterest.com/shopkick> <http://twitter.com/shopkick>
> <https://www.linkedin.com/company/831240?trk=tyah&trkInfo=clickedVertical%3Acompany%2CentityType%3AentityHistoryName%2CclickedEntityId%3Acompany_831240%2Cidx%3A0>
>
> The indispensable app that rewards you for shopping.
>
>

