Thanks :) Duly noted - this is all uncharted territory for us, hence the value of seasoned advice.
Best -- Hugo José Pinto

On 03/01/2015, at 23:43, Peter Lin <wool...@gmail.com> wrote:
>
> Listen to Colin's advice; avoid the temptation of anti-patterns.
>
>> On Sat, Jan 3, 2015 at 6:10 PM, Colin <colpcl...@gmail.com> wrote:
>> Use a message bus with a transactional get: get the message, send it to
>> Cassandra, and upon write success submit it to the ESP and commit the get
>> on the bus. Messaging systems like RabbitMQ support this semantic.
>>
>> Using Cassandra as a queuing mechanism is an anti-pattern.
>>
>> --
>> Colin Clark
>> +1-320-221-9531
>>
>>
>>> On Jan 3, 2015, at 6:07 PM, Hugo José Pinto <hugo.pi...@inovaworks.com> wrote:
>>>
>>> Thank you all for your answers.
>>>
>>> It seems I'll have to go with some event-driven processing before/during
>>> the Cassandra write path.
>>>
>>> My concern is that I'd love to first guarantee the disk write of the
>>> Cassandra persistence and only then do the event processing (which is
>>> mostly CRUD intercepts at this point), even if slightly delayed, and doing
>>> so via triggers would probably bog down the whole processing pipeline.
>>>
>>> What I'd probably do is write, from a trigger, to a separate key table
>>> with all the CRUDed elements and have the ESP process that table.
>>>
>>> Thank you for your contribution. Should anyone else have any experience
>>> in these scenarios, I'm obviously all ears as well.
>>>
>>> Best,
>>>
>>> Hugo
>>>
>>> Sent from my iPhone
>>>
>>> On 03/01/2015, at 11:09, DuyHai Doan <doanduy...@gmail.com> wrote:
>>>
>>>> Hello Hugo
>>>>
>>>> I was facing the same kind of requirement from some users. Long story
>>>> short, below are the possible strategies, with the advantages and
>>>> drawbacks of each.
>>>>
>>>> 1) Put Spark in front of the back end: every incoming
>>>> modification/update/insert goes into Spark first, then Spark forwards it
>>>> to Cassandra for persistence. With Spark, you can perform pre- or
>>>> post-processing and notify external clients of mutations.
>>>>
>>>> The drawback of this solution is that all incoming mutations must go
>>>> through Spark. You may set up a Kafka queue as temporary storage to
>>>> distribute the load and consume mutations with Spark, but that adds to
>>>> the architectural complexity with additional components & technologies.
>>>>
>>>> 2) For high availability and resilience, you probably want to have all
>>>> mutations saved into Cassandra first and then process notifications with
>>>> Spark. In this case the only way to get notifications out of Cassandra,
>>>> as of version 2.1, is to rely on manually coded triggers (still an
>>>> experimental feature).
>>>>
>>>> With triggers you can notify whatever clients you want, not only Spark.
>>>>
>>>> The big drawback of this solution is that playing with triggers is
>>>> dangerous if you are not familiar with Cassandra internals. Indeed, the
>>>> trigger is on the write path and may hurt performance if you are doing
>>>> complex or blocking tasks.
>>>>
>>>> Those are the 2 solutions I can see; maybe the ML members will propose
>>>> other innovative choices.
>>>>
>>>> Regards
>>>>
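A bare-bones skeleton of the trigger route Hugo plans and DuyHai's second option describes, assuming the experimental ITrigger interface as shipped with Cassandra 2.1; the ChangeNotifier hand-off is hypothetical, and everything here runs on the write path, so it has to stay cheap and non-blocking:

```java
import java.nio.ByteBuffer;
import java.util.Collection;
import java.util.Collections;

import org.apache.cassandra.db.ColumnFamily;
import org.apache.cassandra.db.Mutation;
import org.apache.cassandra.triggers.ITrigger;
import org.apache.cassandra.utils.ByteBufferUtil;

public class ChangeLogTrigger implements ITrigger {

    @Override
    public Collection<Mutation> augment(ByteBuffer partitionKey, ColumnFamily update) {
        // Runs on the write path: keep it cheap and non-blocking. Here we only
        // hand the mutated partition off to an asynchronous notifier; returning
        // extra Mutations (e.g. against a separate change-log table, as Hugo
        // suggests) is also possible but requires building them from that
        // table's metadata.
        String keyspace = update.metadata().ksName;
        String table = update.metadata().cfName;
        ChangeNotifier.enqueue(keyspace, table, ByteBufferUtil.bytesToHex(partitionKey));
        return Collections.emptyList();
    }

    // Hypothetical hand-off point: a real implementation might push into a
    // bounded in-memory queue drained by a separate thread that feeds the ESP.
    static final class ChangeNotifier {
        static void enqueue(String keyspace, String table, String partitionKeyHex) {
            // no-op placeholder
        }
    }
}
```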
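Colin's message-bus flow could look roughly like the following sketch, assuming RabbitMQ's Java client and the DataStax Java driver; the queue, exchange, keyspace and table names are illustrative only:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Date;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DefaultConsumer;
import com.rabbitmq.client.Envelope;

public class MutationConsumer {

    public static void main(String[] args) throws Exception {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("tracking");

        Connection conn = new ConnectionFactory().newConnection();
        Channel channel = conn.createChannel();
        channel.queueDeclare("mutations", true, false, false, null);
        channel.exchangeDeclare("esp-events", "fanout", true);

        // autoAck=false gives the "transactional get": the message only leaves
        // the queue once the Cassandra write and the ESP hand-off have succeeded.
        channel.basicConsume("mutations", false, new DefaultConsumer(channel) {
            @Override
            public void handleDelivery(String tag, Envelope env,
                                       AMQP.BasicProperties props, byte[] body) throws IOException {
                String payload = new String(body, StandardCharsets.UTF_8);

                // 1) Persist to Cassandra first.
                session.execute(
                        "INSERT INTO ship_positions (ship_id, ts, payload) VALUES (?, ?, ?)",
                        extractShipId(payload), new Date(), payload);

                // 2) On write success, forward to the ESP.
                channel.basicPublish("esp-events", "", null, body);

                // 3) Commit the get by acking the original message.
                channel.basicAck(env.getDeliveryTag(), false);
            }
        });
    }

    // Placeholder parsing; a real payload would carry a proper schema.
    private static String extractShipId(String payload) {
        return payload.split(",")[0];
    }
}
```

A crash between the Cassandra write and the ack simply redelivers the message, so this is at-least-once delivery and the consumer should be idempotent.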
>>>>> On Sat, Jan 3, 2015 at 11:46 AM, Hugo José Pinto <hugo.pi...@inovaworks.com> wrote:
>>>>> Hello.
>>>>>
>>>>> We're currently using Hazelcast (http://hazelcast.org/) as a distributed
>>>>> in-memory data grid. That's been working sort-of-well for us, but going
>>>>> solely in-memory has exhausted its path in our use case, and we're
>>>>> considering porting our application to a NoSQL persistent store. After
>>>>> the usual comparisons and evaluations, we're borderline close to picking
>>>>> Cassandra, plus eventually Spark for analytics.
>>>>>
>>>>> Nonetheless, there is a gap in our architectural needs that we're still
>>>>> not grasping how to solve in Cassandra (with or without Spark): Hazelcast
>>>>> allows us to create a Continuous Query such that, whenever a row is
>>>>> added/removed/modified from the clause's result set, Hazelcast calls us
>>>>> back with the corresponding notification. We use this to continuously
>>>>> update the clients via AJAX streaming with the new/changed rows.
>>>>>
>>>>> This is probably a conceptual mismatch we're making, so - how do we best
>>>>> address this use case in Cassandra (with or without Spark's help)? Is
>>>>> there something in the API that allows for Continuous Queries on
>>>>> key/clause changes (we haven't found it)? Is there some other way to get
>>>>> a stream of key/clause updates? Events of some sort?
>>>>>
>>>>> I'm aware that we could, eventually, periodically poll Cassandra, but in
>>>>> our use case the client is potentially interested in a large number of
>>>>> table clause notifications (think "all changes to ship positions on
>>>>> California's coastline"), and iterating out of the store would kill the
>>>>> streamer's scalability.
>>>>>
>>>>> Hence, the magic question: what are we missing? Is Cassandra the wrong
>>>>> tool for the job? Are we not aware of a particular part of the API or an
>>>>> external library in/outside the Apache realm that would allow for this?
>>>>>
>>>>> Many thanks for any assistance!
>>>>>
>>>>> Hugo
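For completeness, the Hazelcast continuous-query behaviour described above boils down to registering a predicate-filtered entry listener, roughly as in this sketch (map name, ShipPosition type and predicate are illustrative); this callback-on-matching-change semantic is what the thread is trying to reproduce on top of Cassandra:

```java
import com.hazelcast.core.EntryAdapter;
import com.hazelcast.core.EntryEvent;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import com.hazelcast.query.SqlPredicate;

public class ShipPositionStream {

    // Illustrative value type; the real model lives in the application.
    public static class ShipPosition implements java.io.Serializable {
        public String shipId;
        public String region;
        public double lat;
        public double lon;
    }

    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IMap<String, ShipPosition> ships = hz.getMap("ship-positions");

        // Continuous-query style registration: only entries matching the
        // predicate trigger callbacks, which then feed the AJAX streamer.
        ships.addEntryListener(new EntryAdapter<String, ShipPosition>() {
            @Override
            public void entryAdded(EntryEvent<String, ShipPosition> e)   { push(e.getValue()); }
            @Override
            public void entryUpdated(EntryEvent<String, ShipPosition> e) { push(e.getValue()); }
            @Override
            public void entryRemoved(EntryEvent<String, ShipPosition> e) { push(e.getOldValue()); }
        }, new SqlPredicate("region = 'california-coast'"), true);
    }

    private static void push(ShipPosition p) {
        // Placeholder for the AJAX/streaming push to connected clients.
        System.out.println("changed: " + p.shipId);
    }
}
```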