This proposal looks good; it should improve performance and reduce GC issues
during data load. Please create an issue in Jira. We can create the unsafe
functions in the common module (just like Spark does) so that they can be used
across modules/components. We can also check whether any can be reused from
Spark's unsafe module.
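
As a rough sketch of what such a shared helper in the common module might look
like (the class and method names below are hypothetical, not existing
CarbonData or Spark APIs; it assumes the JDK-internal sun.misc.Unsafe is
accessible, as it is on common JDKs):

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical helper for a common module; not an existing CarbonData or Spark class.
final class UnsafeMemory {
    private static final Unsafe UNSAFE;

    static {
        try {
            // sun.misc.Unsafe has no public constructor, so read its singleton field reflectively.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private UnsafeMemory() {}

    /** Allocates the given number of bytes off-heap, outside the reach of the GC. */
    static long allocate(long size) {
        return UNSAFE.allocateMemory(size);
    }

    /** Writes a 4-byte int at the given raw address. */
    static void putInt(long address, int value) {
        UNSAFE.putInt(address, value);
    }

    /** Reads a 4-byte int from the given raw address. */
    static int getInt(long address) {
        return UNSAFE.getInt(address);
    }

    /** Off-heap memory is never garbage collected, so callers must free it explicitly. */
    static void free(long address) {
        UNSAFE.freeMemory(address);
    }
}
```

Keeping the rows to be sorted in memory allocated this way means they do not
become Java objects on the heap, which is where the GC benefit would come from.
The cost is that every allocation has to be freed by hand.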
On Sun, Nov 2
Hi All,
In the current CarbonData system, loading performance is not very encouraging
because we need to sort the data at the executor level during data loading.
CarbonData collects a batch of data and sorts it before dumping it to the temporary
files and finally it does merge sort from those temporary files to finis