Akash R Nilugal created YARN-9753:
-------------------------------------

             Summary: Cache Pre-Priming
                 Key: YARN-9753
                 URL: https://issues.apache.org/jira/browse/YARN-9753
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Akash R Nilugal


Currently, we have an index server that provides distributed caching of the
datamaps in a separate Spark application.

Caching of the datamaps in the index server starts only when a query is fired
on the table for the first time. If the query is a count(*), all the datamaps
are loaded; for a filter query, only the required datamaps are loaded.
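
To make the current query-triggered behaviour concrete, here is a minimal
sketch in Python. It is a toy model only, not the index server's code: the
class name LazyIndexCache, its methods, and the "datamap-for-..." strings are
all hypothetical stand-ins chosen for illustration.

```python
class LazyIndexCache:
    """Toy model of query-triggered caching; every name here is hypothetical."""

    def __init__(self, segments):
        self.segments = list(segments)  # segment ids present in the table
        self.cached = {}                # segment id -> loaded datamap

    def _load(self, segment):
        # Stand-in for reading one segment's datamap from storage.
        if segment not in self.cached:
            self.cached[segment] = f"datamap-for-{segment}"

    def count_star(self):
        # count(*) touches every segment, so every datamap gets loaded.
        for segment in self.segments:
            self._load(segment)
        return len(self.cached)

    def filter_query(self, required_segments):
        # A filter query loads only the datamaps it actually needs.
        for segment in required_segments:
            self._load(segment)
        return sorted(self.cached)
```

In this model nothing is cached when the object is created; the cache fills
only as queries arrive, which is the behaviour described above.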



The bottleneck is that no caching is done for a table's datamaps until a query
is fired on that table.

Consider a scenario where we only load data into the table for a whole day and
query it the next day. All the segments then start loading into the cache at
query time, so the first query is slow.



What if we load the datamaps into the cache, that is, pre-prime the cache,
without waiting for any query on the table?

We could load the cache after every data load completes, or load the cache for
all the segments at once, so that the first query does not have to do this
work and therefore runs faster.
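
The idea can be sketched as follows in Python. This is an illustrative model
under stated assumptions, not the proposed design: the class
PrePrimedIndexCache, the on_load_complete hook, prime_all, and the
storage_reads counter are hypothetical names, and the attached design document
remains the authority on how pre-priming would actually be wired in.

```python
class PrePrimedIndexCache:
    """Toy model of cache pre-priming; every name here is hypothetical."""

    def __init__(self):
        self.cached = {}        # segment id -> loaded datamap
        self.storage_reads = 0  # how many datamaps were read from storage

    def _load(self, segment):
        # Stand-in for reading one segment's datamap from storage.
        if segment not in self.cached:
            self.storage_reads += 1
            self.cached[segment] = f"datamap-for-{segment}"

    def on_load_complete(self, segment):
        # Assumed hook: called after each data load to prime that segment.
        self._load(segment)

    def prime_all(self, segments):
        # Alternative: prime every segment of the table in one pass.
        for segment in segments:
            self._load(segment)

    def query(self, segments):
        # Returns the number of cache misses this query had to pay for.
        before = self.storage_reads
        for segment in segments:
            self._load(segment)
        return self.storage_reads - before
```

If on_load_complete is called for every segment as it is loaded, a later
query over those segments reports zero misses, whereas the same query against
an unprimed cache pays one storage read per segment.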



I have attached the design document for pre-priming the cache in the index
server. Please have a look at it.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

