Hi, all

Currently we are using Phoenix to store and query large KPI datasets for our projects. We need to do full table scans of the Phoenix KPI tables to collect statistics and summaries, e.g. rolling up the five-minute data table into an hourly summary table, then into daily and weekly tables, and so on.

The approaches we currently use are as follows:

1. Using the Phoenix UPSERT INTO ... SELECT ... syntax. However, the query performance does not meet our expectations.
2. Using Apache Spark with the phoenix_mr integration to read data from the Phoenix tables and create RDDs, then transforming these RDDs into summary RDDs and bulk loading them into a new Phoenix data table. This approach satisfies most of our application requirements, but in some cases we cannot complete the full scan job.

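To make the rollup concrete, here is a minimal sketch of the five-minute to hourly aggregation in plain Python. It is only an illustration of the logic, not our Phoenix or Spark job: the row layout (kpi_id, timestamp, value), the function name rollup_to_hour, and the choice of sum and count as the aggregates are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def rollup_to_hour(rows):
    """Aggregate five-minute samples into hourly buckets.

    rows: iterable of (kpi_id, timestamp, value) tuples (hypothetical layout).
    Returns {(kpi_id, hour_start): (sum_of_values, sample_count)}.
    """
    buckets = defaultdict(lambda: [0.0, 0])
    for kpi_id, ts, value in rows:
        # Truncate the timestamp to the start of its hour to get the bucket key.
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        acc = buckets[(kpi_id, hour_start)]
        acc[0] += value
        acc[1] += 1
    return {key: (total, count) for key, (total, count) in buckets.items()}


# Example input: three five-minute samples for one hypothetical KPI.
sample_rows = [
    ("cpu", datetime(2015, 1, 1, 10, 5), 1.0),
    ("cpu", datetime(2015, 1, 1, 10, 55), 3.0),
    ("cpu", datetime(2015, 1, 1, 11, 0), 5.0),
]
hourly = rollup_to_hour(sample_rows)
```

The same truncate-then-group step applies to the daily and weekly tables, with the hourly output as the input of the next level, so each level scans a much smaller table than the one before it.
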
Here are my questions:

1. Are there any more efficient approaches for improving the performance of Phoenix full table scans over large datasets? Any suggestions would be greatly appreciated.
2. Given that full table scans are not well suited to HBase tables, are there any alternative options for doing this kind of work in our current HDFS and HBase environment? Please kindly share any good pointers.

Best regards,
Sun.
CertusNet
