Drill does not cache data in memory, because doing so introduces the risk of 
serving stale data when working with data at large scale.


If you want to avoid hitting the actual storage repeatedly, one option is to 
use the CREATE TEMPORARY TABLE (CTTAS) feature 
(https://drill.apache.org/docs/create-temporary-table-as-cttas/). This lets 
you land the data on a local (or distributed) filesystem and query that copy 
instead. These tables live only for the lifetime of the session (the 
connection your client/SQLLine makes to the Drill cluster).
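
As a rough sketch (the storage plugin, schema, table, and column names below 
are placeholders, not anything from your setup):

CREATE TEMPORARY TABLE recent_orders AS
SELECT order_id, customer_id, order_total
FROM mysql.sales.orders
WHERE order_date >= '2017-01-01';

-- Later queries in the same session read the temp table, not the RDBMS:
SELECT customer_id, SUM(order_total)
FROM recent_orders
GROUP BY customer_id;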


There is a second benefit to this approach: you can translate the original 
data source into a format that is better suited to what you are doing with 
the data. For example, you could pull in data from an RDBMS or a JSON store 
and write the temp table as Parquet for running analytics on it.
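
Temp tables are written in the format given by the store.format session 
option (Parquet by default), so CTTAS effectively does the conversion for 
you. A minimal sketch, assuming a hypothetical JSON file under the dfs 
plugin:

ALTER SESSION SET `store.format` = 'parquet';

CREATE TEMPORARY TABLE events_pq AS
SELECT * FROM dfs.`/data/raw/events.json`;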


~ Kunal

________________________________
From: Michael Shtelma <mshte...@gmail.com>
Sent: Wednesday, May 10, 2017 9:16:30 AM
To: user@drill.apache.org
Subject: In-memory cache in Drill

Hi all,

Is there any way to cache the data that was loaded from the actual
storage plugin in Drill?
As far as I understand, when a query is executed, the data is first
loaded from the storage plugin and handled by the format plugin. After
that, the data is stored using an internal vectorized representation and
the query is executed. Is that correct? I am wondering if there is a
way to store these data vectors somewhere, so that they do not have to
be loaded from the actual storage for each query. Spark does something
like that by storing data frames in off-heap storage.

Regards,
Michael
