The following is our current architecture.
We have a large volume of data residing in HDFS, and we do not want to change that. Using Impala select queries, we take that data and load it into HBase through Phoenix. Data scientists then analyze it using R and Spark. Each data set gets its own schema and tables in HBase, so it is fast for the data scientists to run their analysis. We want to move to Kudu for the obvious advantages it offers in this space. Can you tell me where it can fit in this pipeline?
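For context, this is roughly how the Spark side reads the Phoenix tables today. It is only a minimal sketch: the table name, columns, and ZooKeeper quorum below are placeholders, not our real identifiers, and it assumes the phoenix-spark connector jar is on the classpath.

import org.apache.spark.sql.SparkSession

object ReadPhoenixTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("phoenix-analysis-sketch")
      .getOrCreate()

    // Load a Phoenix-managed HBase table through the phoenix-spark data source.
    // "DS1.EVENTS" and the zkUrl are placeholder values for illustration only.
    val df = spark.read
      .format("org.apache.phoenix.spark")
      .option("table", "DS1.EVENTS")
      .option("zkUrl", "zk1,zk2,zk3:2181")
      .load()

    // Typical downstream step before handing results to R or MLlib:
    // filter and aggregate the raw events.
    df.filter(df("EVENT_TYPE") === "click")
      .groupBy("USER_ID")
      .count()
      .show()

    spark.stop()
  }
}

Thanks,
Darshan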
