We have an HBase table. Each time we aggregate the table on some columns, we do a full scan of the entire table. What are some ways to extract just the delta (the increments) since the last load?
Right now I am following this approach, but I would like some better ideas.

- Map the HBase table into a Hive table.
- The rowkey of the HBase table is mapped to the key column in the Hive table.
- Extract the timestamp from the rowkey and use it to pull yesterday's data.
- There is also a timestamp column (non-key). I extract the previous day's data using it and aggregate that.
- Then merge the incremental aggregated data into the target aggregate table using a full outer join.

Questions:

1) Are there any better suggestions for incremental loading?
2) Does using the key column from Hive give any performance benefit? I don't see much change in terms of timing.
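To make the current approach concrete, here is a minimal sketch of the logic in plain Python. It is not HBase or Hive code. The function names, the row fields (`ts_ms`, `group`, `amount`), and the choice of UTC day boundaries and a sum aggregate are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def yesterday_window_ms(now):
    """Return [start, end) of the previous UTC day in epoch milliseconds."""
    today = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    start = today - timedelta(days=1)
    return int(start.timestamp() * 1000), int(today.timestamp() * 1000)


def aggregate_delta(rows, start_ms, end_ms):
    """Sum 'amount' per 'group' for rows whose timestamp is in the window.

    The row layout is hypothetical; in the real table the timestamp
    would come from the rowkey or from the non-key timestamp column.
    """
    totals = {}
    for row in rows:
        if start_ms <= row["ts_ms"] < end_ms:
            totals[row["group"]] = totals.get(row["group"], 0) + row["amount"]
    return totals


def merge_aggregates(target, delta):
    """Full-outer-join style merge: keep keys from both sides, add the sums."""
    merged = dict(target)
    for key, value in delta.items():
        merged[key] = merged.get(key, 0) + value
    return merged


if __name__ == "__main__":
    start_ms, end_ms = yesterday_window_ms(datetime.now(timezone.utc))
    rows = [
        {"ts_ms": start_ms, "group": "a", "amount": 2},
        {"ts_ms": end_ms, "group": "a", "amount": 5},  # today, excluded
    ]
    delta = aggregate_delta(rows, start_ms, end_ms)
    print(merge_aggregates({"a": 10, "c": 1}, delta))
```

This only shows the window and merge logic. It still iterates over every row, so it says nothing about whether a filter on the Hive key column actually limits the scan on the HBase side, which is what question 2 is asking.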