In our software we need to combine fast interactive access to the data with fairly complex data processing. I know that Phoenix is intended for fast access, but I had hoped I could also use Phoenix as a source for complex processing with Spark. Unfortunately, Phoenix + Spark shows very poor performance. For example, a DISTINCT query over a large table (about a billion records) takes about 2 hours, while the same task against a Hive source takes a few minutes. Is this expected? Does it mean that Phoenix is simply not suitable for batch processing with Spark, and that I should duplicate the data into Hive and process it there?
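
For concreteness, below is a minimal sketch of the kind of job I mean, using the phoenix-spark DataSource for Spark 2.x; the table name, column name, and ZooKeeper URL are placeholders, not our real setup:

```scala
import org.apache.spark.sql.SparkSession

object PhoenixDistinct {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("phoenix-distinct")
      .getOrCreate()

    // Load the Phoenix table through the phoenix-spark connector.
    // "MY_TABLE" and "zk-host:2181" are placeholders for the actual
    // table and ZooKeeper quorum.
    val df = spark.read
      .format("org.apache.phoenix.spark")
      .option("table", "MY_TABLE")
      .option("zkUrl", "zk-host:2181")
      .load()

    // The query in question: a DISTINCT over one column of a
    // ~1-billion-row table (~2 hours via Phoenix, minutes via Hive).
    val n = df.select("MY_COLUMN").distinct().count()
    println(s"distinct values: $n")
  }
}
```

The Hive comparison is the same distinct count over the same data, read as a Hive table via spark.table instead of the Phoenix connector.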