The documentation says a temporary table does not outlive its session.
What happens when Drill connections are wrapped in a connection pool?
Should we drop temporary tables explicitly after each query in that case?
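
For reference, the per-query cleanup I have in mind would look roughly
like this (the table name, path, and queries are hypothetical, just for
illustration):

    -- land the source data in the session-scoped temporary workspace
    CREATE TEMPORARY TABLE tmp_results AS
      SELECT * FROM dfs.`/data/source`;
    -- run whatever queries need the cached copy
    SELECT count(*) FROM tmp_results;
    -- explicit cleanup before the connection goes back to the pool
    DROP TABLE tmp_results;

Since the pool keeps the underlying session alive, my worry is that
without the explicit DROP the temp table would linger for the life of
the pooled connection.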

Regards,
Rahul

On May 10, 2017 10:15 PM, "Michael Shtelma" <[email protected]> wrote:

> Yes, this is certainly a viable approach, but it would be far
> better to also be able to keep the data in memory.
> Does it make sense to have something like an in-memory storage plugin?
> It could then also be used as storage for temporary tables.
> Sincerely,
> Michael Shtelma
>
>
> On Wed, May 10, 2017 at 6:30 PM, Kunal Khatua <[email protected]> wrote:
> > Drill does not cache data in memory, because doing so introduces the
> risk of working with stale data at large scale.
> >
> >
> > If you want to avoid hitting the actual storage repeatedly, one option
> is the CREATE TEMPORARY TABLE AS (CTTAS) feature
> (https://drill.apache.org/docs/create-temporary-table-as-cttas/). It
> lets you land the data on a local (or distributed) filesystem and query
> that copy instead. These tables live only for the lifetime of the
> session (the connection your client, e.g. SQLLine, makes to the Drill
> cluster).
> >
> >
> > There is a second benefit to this approach: you can translate the
> original data source into a format well suited to your workload. For
> example, you could pull data from an RDBMS or a JSON store and write the
> temp table in Parquet for analytics.
> >
> >
> > ~ Kunal
> >
> > ________________________________
> > From: Michael Shtelma <[email protected]>
> > Sent: Wednesday, May 10, 2017 9:16:30 AM
> > To: [email protected]
> > Subject: In-memory cache in Drill
> >
> > Hi all,
> >
> > Is there any way to cache the data that was loaded from the actual
> > storage plugin in Drill?
> > As far as I understand, when a query is executed, the data is first
> > loaded from the storage plugin and handled by the format plugin. After
> > that, the data is stored in the internal vectorized representation and
> > the query is executed. Is that correct? I am wondering whether there
> > is a way to store these data vectors somewhere, so that they do not
> > have to be loaded from the actual storage for each query. Spark does
> > something like that by storing data frames in off-heap storage.
> >
> > Regards,
> > Michael
>

