I can't remember if I asked this question before, but... We're using Cassandra as our transactional system and have built up quite a library of map/reduce jobs that perform data quality analysis, statistics, etc. (more than 100 jobs now).
But... we are still struggling to provide an "ad-hoc" query mechanism for our users. To fill that gap, I believe we still need to materialize our data in an RDBMS.

Anyone have any ideas? Better ways to support ad-hoc queries?

Effectively, our users want to be able to run select count(distinct Y) from X group by Z, where Y and Z are arbitrary columns of rows in X. We believe we can create column families with different key structures (using Y and Z as row keys), but there are some column names we don't know or can't predict ahead of time.

Are people doing bulk exports? Is anyone trying to keep an RDBMS in sync in real time?

-brian

--
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/
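
P.S. To make the query semantics concrete, here is a minimal in-memory sketch of what select count(distinct Y) from X group by Z would compute over sparse rows. It is only an illustration, not how our jobs are written: the function name, the column names ("state", "specialty") and the sample rows are all hypothetical, and it simplifies by skipping any row that lacks either column (SQL would instead keep a NULL group).

```python
from collections import defaultdict


def count_distinct_group_by(rows, y, z):
    """Rough equivalent of: select z, count(distinct y) from rows group by z.

    rows is an iterable of dicts; columns may be missing from any row,
    as they can be in a column family. (Hypothetical helper.)
    """
    distinct = defaultdict(set)
    for row in rows:
        # Simplification: ignore rows that lack either column.
        if y in row and z in row:
            distinct[row[z]].add(row[y])
    return {group: len(values) for group, values in distinct.items()}


# Hypothetical sample data; Y and Z are chosen at query time.
rows = [
    {"state": "PA", "specialty": "cardiology"},
    {"state": "PA", "specialty": "oncology"},
    {"state": "PA", "specialty": "cardiology"},
    {"state": "NJ", "specialty": "oncology"},
    {"state": "NJ"},
]

print(count_distinct_group_by(rows, y="specialty", z="state"))
# prints {'PA': 2, 'NJ': 1}
```

The same shape maps onto map/reduce (emit (Z, Y) pairs, then count the distinct Y values per Z in the reducer), but that means a full scan per ad-hoc question, which is the cost we are trying to avoid.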