[ https://issues.apache.org/jira/browse/CARBONDATA-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jacky Li resolved CARBONDATA-458.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 0.3.0-incubating

> Improving carbon first time query performance
> ----------------------------------------------
>
>                 Key: CARBONDATA-458
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-458
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: core, data-load, data-query
>            Reporter: kumar vishal
>            Assignee: kumar vishal
>             Fix For: 0.3.0-incubating
>
>          Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Improving carbon first time query performance
> Reason:
> 1. When the file system cache has been cleared, files must be read from
> disk, which makes reading and caching slower.
> 2. On a first-time query, Carbon has to read the footer from each data
> file to build the B-tree.
> 3. Carbon reads more footer data than is required (the data chunk).
> 4. Carbon performs many random seeks because the parts of a column's data
> (data page, RLE, inverted index) are not stored together.
> Solution:
> 1. Improve block loading time. This can be done by removing the data chunk
> from blockletInfo and storing only the offset and length of the data chunk.
> 2. Compress the presence meta bitset stored for null values of measure
> columns using Snappy.
> 3. Store the metadata and data of a column together and read them together.
> This reduces random seeks and improves I/O.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
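
Solution items 1 and 3 in the description above can be illustrated with a small sketch. It is not CarbonData's actual file format or code: the layout (a 4-byte length prefix, then metadata, then the data page), the function names, and the footer_index dictionary are all assumptions made up for illustration. The idea shown is that the footer keeps only an (offset, length) pair per column chunk, and that metadata and data sit next to each other so one seek and one read fetch both.

```python
import io
import struct


def write_column_chunk(out, metadata, data_page):
    """Append one column chunk (metadata followed by data).

    Hypothetical layout: 4-byte big-endian metadata length, metadata, data page.
    Returns the (offset, length) pair that a footer would store.
    """
    offset = out.tell()
    out.write(struct.pack(">I", len(metadata)))
    out.write(metadata)
    out.write(data_page)
    return offset, out.tell() - offset


def read_column_chunk(src, offset, length):
    """Fetch metadata and data of one column with a single seek and read."""
    src.seek(offset)
    buf = src.read(length)
    (meta_len,) = struct.unpack_from(">I", buf, 0)
    metadata = buf[4:4 + meta_len]
    data_page = buf[4 + meta_len:]
    return metadata, data_page


# An in-memory stand-in for a data file.
data_file = io.BytesIO()

# The "footer" holds only offsets and lengths, not the chunks themselves.
footer_index = {
    "col_a": write_column_chunk(data_file, b"meta-a", b"\x01\x02\x03"),
    "col_b": write_column_chunk(data_file, b"meta-b", b"\x0a\x0b"),
}

meta, data = read_column_chunk(data_file, *footer_index["col_b"])
```

Because the footer entry is a fixed-size pair, loading it costs the same no matter how large the chunk is, and the chunk itself is only read when a query actually touches that column.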