Hi Yaojinguo,

The issue is that we currently load all of the index info into driver memory, which causes a large memory footprint irrespective of the query type (filter or full scan). This can be avoided by loading only the required segments' index information for filter queries. We could achieve it by creating a datamap containing segment-level min/max information. Instead of loading all the datamaps down to the blocklet level, we can load only the segment-level min/max at startup and load the next-level datamaps on demand, based on the query. This approach, combined with LRU eviction, should be able to limit the memory consumption on the driver side.
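To make the idea concrete, here is a minimal sketch of the proposed flow: eagerly hold only per-segment min/max, prune segments against the filter range, and pull blocklet-level indexes through a small LRU cache so only recently used segments stay in memory. All names (`SegmentMinMax`, `LruIndexCache`, `prune`) are hypothetical illustrations, not existing CarbonData APIs:

```python
from collections import OrderedDict

class SegmentMinMax:
    """Per-segment min/max for one filter column (small, loaded at startup)."""
    def __init__(self, seg_id, col_min, col_max):
        self.seg_id, self.col_min, self.col_max = seg_id, col_min, col_max

class LruIndexCache:
    """Tiny LRU cache for blocklet-level indexes, loaded on demand."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()

    def get(self, seg_id, loader):
        if seg_id in self.cache:
            self.cache.move_to_end(seg_id)     # mark as most recently used
            return self.cache[seg_id]
        index = loader(seg_id)                 # expensive: read carbonindex files
        self.cache[seg_id] = index
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict least recently used
        return index

def prune(segments, lo, hi, cache, loader):
    """Load blocklet indexes only for segments whose min/max overlaps [lo, hi]."""
    hits = []
    for s in segments:
        if s.col_max < lo or s.col_min > hi:
            continue                           # pruned at segment level: nothing loaded
        hits.append(cache.get(s.seg_id, loader))
    return hits
```

With a filter query, only the overlapping segments trigger an index load, and the LRU bound keeps the driver-side footprint proportional to the cache capacity rather than to the total number of segments.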
A datamap containing segment-level min/max still needs to be implemented; it is not currently supported in CarbonData.

Regards,
Raghu

On Wed, Apr 11, 2018 at 1:25 PM, yaojinguo <xianhou...@163.com> wrote:
> Hi community,
> I am using CarbonData 1.3 + Spark 2.1, and I have found a potential
> bottleneck when using CarbonData. As I understand it, CarbonData loads
> all of the carbonindex files and turns them into DataMap or SegmentIndex
> (for earlier versions) objects, which contain the start key, end key, and
> min/max value of each column. If I have one table with 200 columns that
> contains 1000 segments, and each segment has 2000 carbondata files, then,
> assuming each column occupies just 10 bytes, you need at least 20 GB of
> memory to store the min/max values alone. Any suggestion to resolve this
> problem?
>
>
> --
> Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/
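For reference, the sizing in the quoted mail can be reproduced roughly as follows. The raw min/max bytes come to about 7.5 GiB; the gap to the quoted 20 GB would come from start/end keys and per-object JVM overhead, and that multiplier is an assumption here, not a measured figure:

```python
columns = 200
segments = 1000
files_per_segment = 2000
bytes_per_value = 10

total_files = segments * files_per_segment             # 2,000,000 index entries
raw = total_files * columns * bytes_per_value * 2      # min and max per column
print(raw)             # 8,000,000,000 bytes, i.e. about 7.45 GiB raw
print(raw / 2**30)     # start/end keys and JVM object overhead (assumed) push
                       # the real footprint toward the quoted 20 GB
```

Either way, the footprint scales with the total file count, which is exactly what the segment-level datamap plus lazy loading is meant to break.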