Re: IEP-22: Direct Data Load proposal

2018-08-16 Thread Vladimir Ozerov
Dima,

By "out of question" I meant that 3rd party persistence should work out of
the box when IEP-22 is ready. No changes should be required there.

As for persistence vs. memory, most probably yes, there might be some
differences. Specifically, when data load starts and persistence is
enabled, we will bypass free lists and write data to new blocks. As a
result, the loaded data will need more pages than when loaded in normal
mode. This is the kind of trade-off you face when loading speed is
important (at the very least Oracle works this way, and most probably other
vendors do the same). But this approach may not be applicable to in-memory
mode, where the total number of pages is limited and we do not want to hit
page eviction.

To summarize - some optimizations which are applicable for persistent mode
will not be applicable for in-memory.
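
To make the page-count trade-off concrete, here is a toy sketch. It is not
Ignite code: the page size, row sizes, first-fit policy and function names
are all assumptions made for illustration. It compares filling free space
in existing pages (as a free list would) with appending to fresh pages only.

```python
# Toy model only, not Ignite internals. PAGE_SIZE, the row sizes and the
# first-fit policy are assumptions chosen for illustration.
PAGE_SIZE = 4096


def load_with_free_list(free_bytes_per_page, row_sizes):
    """Normal mode: reuse free space in existing pages (first fit),
    allocating a new page only when no existing page has room."""
    pages = list(free_bytes_per_page)
    for size in row_sizes:
        for i, free in enumerate(pages):
            if free >= size:
                pages[i] = free - size
                break
        else:
            # No page had enough room: allocate a fresh one.
            pages.append(PAGE_SIZE - size)
    return len(pages)


def load_direct(free_bytes_per_page, row_sizes):
    """Direct mode: ignore free space in existing pages and append rows
    to freshly allocated pages only."""
    total_pages = len(free_bytes_per_page)
    free_in_current = 0
    for size in row_sizes:
        if free_in_current < size:
            total_pages += 1
            free_in_current = PAGE_SIZE
        free_in_current -= size
    return total_pages


existing = [2048] * 100  # 100 half-empty pages already allocated
rows = [512] * 800       # 800 rows of 512 bytes to load

normal_pages = load_with_free_list(existing, rows)
direct_pages = load_direct(existing, rows)
print(normal_pages, direct_pages)
```

With these made-up numbers the free-list load ends at 150 pages and the
direct load at 200. Extra pages of this kind are cheap on disk, but in a
memory-only region with a fixed page budget they could push the node into
page eviction.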

Vladimir.

On Thu, Aug 16, 2018 at 11:41 AM Dmitriy Setrakyan wrote:

> On Thu, Aug 16, 2018 at 1:24 AM, Vladimir Ozerov wrote:
>
> > Hi Denis,
> >
> > This IEP is mostly about how we work with our own indexes and pages. So
> > 3rd party DB is out of question.
> >
>
> Why? I think 3rd party DB will be supported automatically with CacheStore.
> However, do we need to do something different for memory-only vs.
> memory+disk?
>
> D.
>


Re: IEP-22: Direct Data Load proposal

2018-08-16 Thread Dmitriy Setrakyan
On Thu, Aug 16, 2018 at 1:24 AM, Vladimir Ozerov wrote:

> Hi Denis,
>
> This IEP is mostly about how we work with our own indexes and pages. So 3rd
> party DB is out of question.
>

Why? I think 3rd party DB will be supported automatically with CacheStore.
However, do we need to do something different for memory-only vs.
memory+disk?

D.


Re: IEP-22: Direct Data Load proposal

2018-08-16 Thread Vladimir Ozerov
Hi Denis,

This IEP is mostly about how we work with our own indexes and pages. So 3rd
party DB is out of question.

On Thu, Jun 21, 2018 at 10:38 PM Denis Magda wrote:

> Vladimir,
>
> As I see from the IEP, this data loading technique is supposed to be used
> for deployments with Ignite persistence enabled. Is it possible to
> generalize this solution and use for pure in-memory and in-memory + 3rd
> party DB scenarios?
>
> --
> Denis
>
> On Wed, Jun 20, 2018 at 8:08 AM Vladimir Ozerov wrote:
>
> > Igniters,
> >
> > Initial data load is one of the most important use cases for our product.
> > This is one of the first things users try to do with Ignite. And if it
> > takes too much time, it is very likely that they will look for other
> > solutions.
> >
> > We made good progress in this area recently. Specifically - a set of
> > internal improvements on our indexes, streaming mode for the JDBC driver,
> > and the COPY command. But our internals are still not very efficient -
> > every single update goes through the whole set of Ignite components, such
> > as page cache, free-lists, BTrees, etc.
> >
> > I created IEP-22 [1]. Its goal is to implement a special direct data load
> > mode which will bypass our page cache and use an alternative algorithm for
> > index updates. Together with the COPY command and streaming, this
> > improvement will allow Ignite to load data at very high speed.
> >
> > Please review the IEP and share your comments.
> >
> > Vladimir.
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/IGNITE/IEP-22%3A+Direct+Data+Load
> >
>


Re: IEP-22: Direct Data Load proposal

2018-06-21 Thread Denis Magda
Vladimir,

As I see from the IEP, this data loading technique is supposed to be used
for deployments with Ignite persistence enabled. Is it possible to
generalize this solution and use for pure in-memory and in-memory + 3rd
party DB scenarios?

--
Denis

On Wed, Jun 20, 2018 at 8:08 AM Vladimir Ozerov wrote:

> Igniters,
>
> Initial data load is one of the most important use cases for our product.
> This is one of the first things users try to do with Ignite. And if it
> takes too much time, it is very likely that they will look for other
> solutions.
>
> We made good progress in this area recently. Specifically - a set of
> internal improvements on our indexes, streaming mode for the JDBC driver,
> and the COPY command. But our internals are still not very efficient -
> every single update goes through the whole set of Ignite components, such
> as page cache, free-lists, BTrees, etc.
>
> I created IEP-22 [1]. Its goal is to implement a special direct data load
> mode which will bypass our page cache and use an alternative algorithm for
> index updates. Together with the COPY command and streaming, this
> improvement will allow Ignite to load data at very high speed.
>
> Please review the IEP and share your comments.
>
> Vladimir.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-22%3A+Direct+Data+Load
>


Re: IEP-22: Direct Data Load proposal

2018-06-21 Thread Vladimir Ozerov
Hi Nikolay,

I do not see any problems with TDE for now.

On Wed, Jun 20, 2018 at 6:16 PM, Nikolay Izhikov wrote:

> Hello, Vladimir.
>
> Does this IEP fit with IEP-18: TDE?
>
> Do we allow users to load data into an encrypted cache?
>
> On Wed, 20/06/2018 at 18:08 +0300, Vladimir Ozerov wrote:
> > Igniters,
> >
> > Initial data load is one of the most important use cases for our product.
> > This is one of the first things users try to do with Ignite. And if it
> > takes too much time, it is very likely that they will look for other
> > solutions.
> >
> > We made good progress in this area recently. Specifically - a set of
> > internal improvements on our indexes, streaming mode for the JDBC driver,
> > and the COPY command. But our internals are still not very efficient -
> > every single update goes through the whole set of Ignite components, such
> > as page cache, free-lists, BTrees, etc.
> >
> > I created IEP-22 [1]. Its goal is to implement a special direct data load
> > mode which will bypass our page cache and use an alternative algorithm for
> > index updates. Together with the COPY command and streaming, this
> > improvement will allow Ignite to load data at very high speed.
> >
> > Please review the IEP and share your comments.
> >
> > Vladimir.
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/IGNITE/IEP-22%3A+Direct+Data+Load
>


Re: IEP-22: Direct Data Load proposal

2018-06-20 Thread Andrey Kuznetsov
Vladimir,

Great IEP, but I couldn't comprehend the beginning of the "Direct Data
Load" paragraph. Maybe there are some typos?

On Wed, Jun 20, 2018 at 18:08, Vladimir Ozerov wrote:

> Igniters,
>
> Initial data load is one of the most important use cases for our product.
> This is one of the first things users try to do with Ignite. And if it
> takes too much time, it is very likely that they will look for other
> solutions.
>

Best regards,
  Andrey Kuznetsov.


Re: IEP-22: Direct Data Load proposal

2018-06-20 Thread Nikolay Izhikov
Hello, Vladimir.

Does this IEP fit with IEP-18: TDE?

Do we allow users to load data into an encrypted cache?

On Wed, 20/06/2018 at 18:08 +0300, Vladimir Ozerov wrote:
> Igniters,
> 
> Initial data load is one of the most important use cases for our product.
> This is one of the first things users try to do with Ignite. And if it
> takes too much time, it is very likely that they will look for other
> solutions.
>
> We made good progress in this area recently. Specifically - a set of
> internal improvements on our indexes, streaming mode for the JDBC driver,
> and the COPY command. But our internals are still not very efficient -
> every single update goes through the whole set of Ignite components, such
> as page cache, free-lists, BTrees, etc.
>
> I created IEP-22 [1]. Its goal is to implement a special direct data load
> mode which will bypass our page cache and use an alternative algorithm for
> index updates. Together with the COPY command and streaming, this
> improvement will allow Ignite to load data at very high speed.
> 
> Please review the IEP and share your comments.
> 
> Vladimir.
> 
> [1]
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-22%3A+Direct+Data+Load


IEP-22: Direct Data Load proposal

2018-06-20 Thread Vladimir Ozerov
Igniters,

Initial data load is one of the most important use cases for our product.
This is one of the first things users try to do with Ignite. And if it
takes too much time, it is very likely that they will look for other
solutions.

We made good progress in this area recently. Specifically - a set of
internal improvements on our indexes, streaming mode for the JDBC driver,
and the COPY command. But our internals are still not very efficient -
every single update goes through the whole set of Ignite components, such
as page cache, free-lists, BTrees, etc.

I created IEP-22 [1]. Its goal is to implement a special direct data load
mode which will bypass our page cache and use an alternative algorithm for
index updates. Together with the COPY command and streaming, this
improvement will allow Ignite to load data at very high speed.

Please review the IEP and share your comments.

Vladimir.

[1]
https://cwiki.apache.org/confluence/display/IGNITE/IEP-22%3A+Direct+Data+Load