We operate a solution that stores large amounts of data in HBase that needs to be available for online access.
For efficient scanning, three pieces of data are encoded in the row keys (in particular a time dimension), and for other reasons some columns hold JSON-encoded data.

Currently, analytics data is created in two ways:

a) a non-trivial M/R job that computes pre-aggregated data sets and offloads them into an analytical database for interactive reporting
b) other M/R jobs that create specialized reports (heuristics) that cannot be computed from pre-aggregated data

In particular for b), but possibly also for variations of a), I would like to find more "user friendly" ways than M/R jobs implemented in Java, at least for some cases.

So this is not about interactive querying of data directly from HBase tables. It is rather about pre-processing large data sets stored in HBase into either input for interactive query engines (some other DB, Phoenix, ...) or some other specialized format.

I spent some time with Hive but found that its HBase integration simply doesn't cut it (parsing a row key, mapping JSON column content). I know there is more out there, but before spending an eternity trying out various methods, I am shamelessly trying to benefit from your expertise by asking for some good pointers.

Thanks,
Henning
