Re: [IMPORTANT] Future of Binary Objects

2018-11-23 Thread Vladimir Ozerov
OK, let's agree that we would like to make the schema change rules
less restrictive. How much less restrictive is a separate topic. The use case
that annoys me the most is the DROP/ADD COLUMN commands.
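
To make the pain point concrete, below is a minimal sketch of the kind of
sequence that runs into the current restriction (the Person table and cache
name are hypothetical, and the exact failure point depends on the Ignite
version - the binary metadata keeps remembering the old column type):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class AlterColumnExample {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Object, Object> cache = ignite.getOrCreateCache("dummy");

            // Hypothetical table used only for illustration.
            cache.query(new SqlFieldsQuery(
                "CREATE TABLE Person (id INT PRIMARY KEY, age INT)")).getAll();

            // Drop the column, then try to bring it back with a different type.
            cache.query(new SqlFieldsQuery(
                "ALTER TABLE Person DROP COLUMN age")).getAll();

            // This is where users get stuck today: the binary metadata still has
            // 'age' registered as INT, so re-adding it as VARCHAR (or inserting
            // values of the new type) can be rejected.
            cache.query(new SqlFieldsQuery(
                "ALTER TABLE Person ADD COLUMN age VARCHAR")).getAll();
        }
    }
}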

On Thu, Nov 22, 2018 at 12:25 PM Sergi Vladykin 
wrote:

> If we are developing a product for users, we already guessing what is right
> and what is wrong for them. So let's avoid these sophistic statements.
>
> In the end it is always our responsibility to provide a balanced set of
> trade-offs between
> usability, performance and safety.
>
> Let me repeat, I'm not against any possible type conversions, but I'm
> strongly against binary incompatible ones.
> If we always store List.of(1) as 1 and make them binary interchangeable,
> I'm OK with that.
>
> And still for good practices I'd suggest to look at what Protobuf allows
> and what not:
> https://developers.google.com/protocol-buffers/docs/proto3#updating
>
> Sergi
>
> Thu, 22 Nov 2018 at 11:04, Vladimir Ozerov :
>
> > Sergi,
> >
> > I think we should not guess for users what is right or wrong for them. It
> > is up to user to decide what is valid. For example, consider a user who
> > operates on a list of Integers, and to optimize memory consumption he
> > decide to save in the same field either List, or plain Integer
> in
> > case only single element exists. Another example - a kind of data lake or
> > data cleansing application, which may receive the same field in different
> > forms. E.g. age in the form of Integer or String. Does it work for user
> or
> > not? We do not know. Will he need to migrate the whole data set? We do
> not
> > know either.
> >
> > The only place in the product where we case is SQL. But in this case
> > instead of adding checks on binary level, we should validate data on
> cache
> > level. In fact, Ignite already works this way. E.g. nullability checks
> are
> > performed on cache level rather than binary. All we need is to move all
> > checks to cache level from binary level.
> >
> >
> > On Thu, Nov 22, 2018 at 9:41 AM Sergi Vladykin  >
> > wrote:
> >
> > > It may be OK to extend compatible field types (like from Int to Long).
> > >
> > > In Protobuf for example this is allowed just because there is no
> > difference
> > > between Int and Long in binary format: they all are equally varlen
> > encoded
> > > and Longs just will occupy up to 9 bytes, while Ints up to 5.
> > >
> > > But for every other case, where binary representation is type
> dependent,
> > I
> > > would be against. This will either require to migrate the whole dataset
> > to
> > > a new model (which is always risky, since you may need to rollback to
> > > previous version of your code) or it will require type
> checks/conversions
> > > for each field access, which is a hard to reason complication and
> > possible
> > > performance penalty.
> > >
> > > Sergi
> > >
> > >
> > >
> > > Thu, 22 Nov 2018 at 09:23, Vladimir Ozerov :
> > >
> > > > Denis,
> > > >
> > > > Several examples:
> > > > 1) DEFAULT values - in SQL you may avoid storing default value in the
> > > table
> > > > and store it in metadata instead. Not applicable for BinaryObject
> > because
> > > > the same binary object may be saved to two SQL tables with different
> > > > defaults
> > > > 2) DATE and other temporal types - in SQL you want to store it in
> > special
> > > > format to be able to extract date parts quickly (typically - 11
> bytes).
> > > But
> > > > in Java and some other languages the best format is plain long. this
> is
> > > why
> > > > we use it BinaryObject
> > > > 3) String charset - in SQL you may choose different charsets for
> > > different
> > > > tables. E.g. UTF-8 for one, ASCII for another. In BinaryObject we
> store
> > > > everything in UTF-8, and this is fine for most cases, well ... except
> > of
> > > > SQL :-)
> > > >
> > > > The key thing here is that you cannot define a format which will be
> > good
> > > > for both SQL, and native API. They are very different. This is why I
> > > > propose to define additional interface on cache level defining how to
> > > store
> > > > values, which will be very different from binary objects.
> > > >
> > > > Vladimir.
> > > >
> > > > On Thu, Nov 22, 2018 at 3:32 AM Denis Magda 
> wrote:
> > > >
> > > > > Vladimir,
> > > > >
> > > > > Could you educate me a little bit, why the current format is bad
> for
> > > SQL
> > > > > and why another one is more suitable?
> > > > >
> > > > > Also, if we introduce the new format then why would we keep the
> > binary
> > > > one?
> > > > > Is the new format just a next version of the binary one.
> > > > >
> > > > > 2.3) Remove restrictions on changing field type
> > > > > > I do not know why we did that in the first place. This
> restriction
> > > > > prevents
> > > > > > type evolution and confuses users.
> > > > >
> > > > >
> > > > > That is a hot requirement shared by those who use Ignite SQL in
> > > > production.
> > > > > +1.
> > > > >
> > > > > --
> > > > > Denis
> > > > >
> > > > > On 

Re: [IMPORTANT] Future of Binary Objects

2018-11-22 Thread Sergi Vladykin
If we are developing a product for users, we are already guessing what is
right and what is wrong for them. So let's avoid these sophistic statements.

In the end, it is always our responsibility to provide a balanced set of
trade-offs between usability, performance, and safety.

Let me repeat: I'm not against type conversions in general, but I'm
strongly against binary-incompatible ones.
If we always store List.of(1) as 1 and make them binary interchangeable,
I'm OK with that.

And still, as a matter of good practice, I'd suggest looking at what Protobuf
allows and what it does not:
https://developers.google.com/protocol-buffers/docs/proto3#updating

Sergi

Thu, 22 Nov 2018 at 11:04, Vladimir Ozerov :

> Sergi,
>
> I think we should not guess for users what is right or wrong for them. It
> is up to user to decide what is valid. For example, consider a user who
> operates on a list of Integers, and to optimize memory consumption he
> decide to save in the same field either List, or plain Integer in
> case only single element exists. Another example - a kind of data lake or
> data cleansing application, which may receive the same field in different
> forms. E.g. age in the form of Integer or String. Does it work for user or
> not? We do not know. Will he need to migrate the whole data set? We do not
> know either.
>
> The only place in the product where we case is SQL. But in this case
> instead of adding checks on binary level, we should validate data on cache
> level. In fact, Ignite already works this way. E.g. nullability checks are
> performed on cache level rather than binary. All we need is to move all
> checks to cache level from binary level.
>
>
> On Thu, Nov 22, 2018 at 9:41 AM Sergi Vladykin 
> wrote:
>
> > It may be OK to extend compatible field types (like from Int to Long).
> >
> > In Protobuf for example this is allowed just because there is no
> difference
> > between Int and Long in binary format: they all are equally varlen
> encoded
> > and Longs just will occupy up to 9 bytes, while Ints up to 5.
> >
> > But for every other case, where binary representation is type dependent,
> I
> > would be against. This will either require to migrate the whole dataset
> to
> > a new model (which is always risky, since you may need to rollback to
> > previous version of your code) or it will require type checks/conversions
> > for each field access, which is a hard to reason complication and
> possible
> > performance penalty.
> >
> > Sergi
> >
> >
> >
> > Thu, 22 Nov 2018 at 09:23, Vladimir Ozerov :
> >
> > > Denis,
> > >
> > > Several examples:
> > > 1) DEFAULT values - in SQL you may avoid storing default value in the
> > table
> > > and store it in metadata instead. Not applicable for BinaryObject
> because
> > > the same binary object may be saved to two SQL tables with different
> > > defaults
> > > 2) DATE and other temporal types - in SQL you want to store it in
> special
> > > format to be able to extract date parts quickly (typically - 11 bytes).
> > But
> > > in Java and some other languages the best format is plain long. this is
> > why
> > > we use it BinaryObject
> > > 3) String charset - in SQL you may choose different charsets for
> > different
> > > tables. E.g. UTF-8 for one, ASCII for another. In BinaryObject we store
> > > everything in UTF-8, and this is fine for most cases, well ... except
> of
> > > SQL :-)
> > >
> > > The key thing here is that you cannot define a format which will be
> good
> > > for both SQL, and native API. They are very different. This is why I
> > > propose to define additional interface on cache level defining how to
> > store
> > > values, which will be very different from binary objects.
> > >
> > > Vladimir.
> > >
> > > On Thu, Nov 22, 2018 at 3:32 AM Denis Magda  wrote:
> > >
> > > > Vladimir,
> > > >
> > > > Could you educate me a little bit, why the current format is bad for
> > SQL
> > > > and why another one is more suitable?
> > > >
> > > > Also, if we introduce the new format then why would we keep the
> binary
> > > one?
> > > > Is the new format just a next version of the binary one.
> > > >
> > > > 2.3) Remove restrictions on changing field type
> > > > > I do not know why we did that in the first place. This restriction
> > > > prevents
> > > > > type evolution and confuses users.
> > > >
> > > >
> > > > That is a hot requirement shared by those who use Ignite SQL in
> > > production.
> > > > +1.
> > > >
> > > > --
> > > > Denis
> > > >
> > > > On Mon, Nov 19, 2018 at 11:05 PM Vladimir Ozerov <
> voze...@gridgain.com
> > >
> > > > wrote:
> > > >
> > > > > Igniters,
> > > > >
> > > > > It is very likely that Apache Ignite 3.0 will be released next
> year.
> > So
> > > > we
> > > > > need to start thinking about major product improvements. I'd like
> to
> > > > start
> > > > > with binary objects.
> > > > >
> > > > > Currently they are one of the main limiting factors for the
> product.
> > > They
> > > > > are fat - 30+ bytes overhead on average, high TCO of 

Re: [IMPORTANT] Future of Binary Objects

2018-11-22 Thread Vladimir Ozerov
Sergi,

I think we should not guess for users what is right or wrong for them. It
is up to the user to decide what is valid. For example, consider a user who
operates on a list of Integers and, to optimize memory consumption, decides
to save in the same field either a List<Integer> or a plain Integer when
only a single element exists. Another example is a data lake or data
cleansing application, which may receive the same field in different forms,
e.g. age as either an Integer or a String. Does it work for the user or not?
We do not know. Will he need to migrate the whole data set? We do not know
either.

The only place in the product where we care is SQL. But in that case,
instead of adding checks at the binary level, we should validate data at the
cache level. In fact, Ignite already works this way: e.g. nullability checks
are performed at the cache level rather than the binary level. All we need
is to move all checks from the binary level to the cache level.
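
For illustration, here is a minimal sketch of the flexibility described above
(the type name "DataRecord", the field "payload" and the cache "records" are
hypothetical). With the current restrictions, the second write is roughly
where a BinaryObjectException about a changed field type appears:

import java.util.Arrays;
import org.apache.ignite.Ignite;
import org.apache.ignite.binary.BinaryObject;

public class FlexibleFieldExample {
    static void store(Ignite ignite) {
        // First record: the field holds a list of Integers.
        BinaryObject multi = ignite.binary().builder("DataRecord")
            .setField("payload", Arrays.asList(1, 2, 3))
            .build();

        // Second record: the same field holds a plain Integer to save memory.
        BinaryObject single = ignite.binary().builder("DataRecord")
            .setField("payload", 1)
            .build();

        ignite.cache("records").withKeepBinary().put(1, multi);
        ignite.cache("records").withKeepBinary().put(2, single);
    }
}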


On Thu, Nov 22, 2018 at 9:41 AM Sergi Vladykin 
wrote:

> It may be OK to extend compatible field types (like from Int to Long).
>
> In Protobuf for example this is allowed just because there is no difference
> between Int and Long in binary format: they all are equally varlen encoded
> and Longs just will occupy up to 9 bytes, while Ints up to 5.
>
> But for every other case, where binary representation is type dependent, I
> would be against. This will either require to migrate the whole dataset to
> a new model (which is always risky, since you may need to rollback to
> previous version of your code) or it will require type checks/conversions
> for each field access, which is a hard to reason complication and possible
> performance penalty.
>
> Sergi
>
>
>
> Thu, 22 Nov 2018 at 09:23, Vladimir Ozerov :
>
> > Denis,
> >
> > Several examples:
> > 1) DEFAULT values - in SQL you may avoid storing default value in the
> table
> > and store it in metadata instead. Not applicable for BinaryObject because
> > the same binary object may be saved to two SQL tables with different
> > defaults
> > 2) DATE and other temporal types - in SQL you want to store it in special
> > format to be able to extract date parts quickly (typically - 11 bytes).
> But
> > in Java and some other languages the best format is plain long. this is
> why
> > we use it BinaryObject
> > 3) String charset - in SQL you may choose different charsets for
> different
> > tables. E.g. UTF-8 for one, ASCII for another. In BinaryObject we store
> > everything in UTF-8, and this is fine for most cases, well ... except of
> > SQL :-)
> >
> > The key thing here is that you cannot define a format which will be good
> > for both SQL, and native API. They are very different. This is why I
> > propose to define additional interface on cache level defining how to
> store
> > values, which will be very different from binary objects.
> >
> > Vladimir.
> >
> > On Thu, Nov 22, 2018 at 3:32 AM Denis Magda  wrote:
> >
> > > Vladimir,
> > >
> > > Could you educate me a little bit, why the current format is bad for
> SQL
> > > and why another one is more suitable?
> > >
> > > Also, if we introduce the new format then why would we keep the binary
> > one?
> > > Is the new format just a next version of the binary one.
> > >
> > > 2.3) Remove restrictions on changing field type
> > > > I do not know why we did that in the first place. This restriction
> > > prevents
> > > > type evolution and confuses users.
> > >
> > >
> > > That is a hot requirement shared by those who use Ignite SQL in
> > production.
> > > +1.
> > >
> > > --
> > > Denis
> > >
> > > On Mon, Nov 19, 2018 at 11:05 PM Vladimir Ozerov  >
> > > wrote:
> > >
> > > > Igniters,
> > > >
> > > > It is very likely that Apache Ignite 3.0 will be released next year.
> So
> > > we
> > > > need to start thinking about major product improvements. I'd like to
> > > start
> > > > with binary objects.
> > > >
> > > > Currently they are one of the main limiting factors for the product.
> > They
> > > > are fat - 30+ bytes overhead on average, high TCO of Apache Ignite
> > > > comparing to other vendors. They are slow - not suitable for SQL at
> > all.
> > > >
> > > > I would like to ask all of you who worked with binary objects to
> share
> > > your
> > > > feedback and ideas, so that we understand how they should look like
> in
> > AI
> > > > 3.0. This is a brain storm - let's accumulate ideas first and
> minimize
> > > > critics. Then we will work on ideas in separate topics.
> > > >
> > > > 1) Historical background
> > > >
> > > > BO were implemented around 2014 (Apache Ignite 1.5) when we started
> > > working
> > > > on .NET and CPP clients. During design we had several ideas in mind:
> > > > - ability to read object fields in O(1) without deserialization
> > > > - interoperabillty between Java, .NET and CPP.
> > > >
> > > > Since then a number of other concepts were mixed to the cocktail:
> > > > - Affinity key fields
> > > > - Strict typing for existing fields (aka metadata)
> > > > - 

Re: [IMPORTANT] Future of Binary Objects

2018-11-21 Thread Sergi Vladykin
It may be OK to widen compatible field types (like from Int to Long).

In Protobuf, for example, this is allowed simply because there is no
difference between Int and Long in the binary format: they are both varlen
encoded, and Longs just occupy up to 9 bytes, while Ints occupy up to 5.

But in every other case, where the binary representation is type dependent,
I would be against it. It will either require migrating the whole dataset to
a new model (which is always risky, since you may need to roll back to a
previous version of your code), or it will require type checks/conversions
on each field access, which is a hard-to-reason-about complication and a
possible performance penalty.
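
To make the varlen point concrete, here is a small standalone sketch of
base-128 varint encoding (a simplification of what Protobuf does on the
wire): a small value produces exactly the same bytes whether the field is
declared as an int or a long, which is why the widening is binary compatible.

import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class VarintSketch {
    // Base-128 varint: 7 payload bits per byte, high bit set while more bytes follow.
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        do {
            byte b = (byte) (value & 0x7F);
            value >>>= 7;
            if (value != 0)
                b |= (byte) 0x80; // more bytes follow
            out.write(b);
        } while (value != 0);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Both calls print the same two bytes, so re-declaring the field as Long
        // does not change data that was already written as Int.
        System.out.println(Arrays.toString(encode(300)));  // field declared as Int
        System.out.println(Arrays.toString(encode(300L))); // same field widened to Long
    }
}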

Sergi



Thu, 22 Nov 2018 at 09:23, Vladimir Ozerov :

> Denis,
>
> Several examples:
> 1) DEFAULT values - in SQL you may avoid storing default value in the table
> and store it in metadata instead. Not applicable for BinaryObject because
> the same binary object may be saved to two SQL tables with different
> defaults
> 2) DATE and other temporal types - in SQL you want to store it in special
> format to be able to extract date parts quickly (typically - 11 bytes). But
> in Java and some other languages the best format is plain long. this is why
> we use it BinaryObject
> 3) String charset - in SQL you may choose different charsets for different
> tables. E.g. UTF-8 for one, ASCII for another. In BinaryObject we store
> everything in UTF-8, and this is fine for most cases, well ... except of
> SQL :-)
>
> The key thing here is that you cannot define a format which will be good
> for both SQL, and native API. They are very different. This is why I
> propose to define additional interface on cache level defining how to store
> values, which will be very different from binary objects.
>
> Vladimir.
>
> On Thu, Nov 22, 2018 at 3:32 AM Denis Magda  wrote:
>
> > Vladimir,
> >
> > Could you educate me a little bit, why the current format is bad for SQL
> > and why another one is more suitable?
> >
> > Also, if we introduce the new format then why would we keep the binary
> one?
> > Is the new format just a next version of the binary one.
> >
> > 2.3) Remove restrictions on changing field type
> > > I do not know why we did that in the first place. This restriction
> > prevents
> > > type evolution and confuses users.
> >
> >
> > That is a hot requirement shared by those who use Ignite SQL in
> production.
> > +1.
> >
> > --
> > Denis
> >
> > On Mon, Nov 19, 2018 at 11:05 PM Vladimir Ozerov 
> > wrote:
> >
> > > Igniters,
> > >
> > > It is very likely that Apache Ignite 3.0 will be released next year. So
> > we
> > > need to start thinking about major product improvements. I'd like to
> > start
> > > with binary objects.
> > >
> > > Currently they are one of the main limiting factors for the product.
> They
> > > are fat - 30+ bytes overhead on average, high TCO of Apache Ignite
> > > comparing to other vendors. They are slow - not suitable for SQL at
> all.
> > >
> > > I would like to ask all of you who worked with binary objects to share
> > your
> > > feedback and ideas, so that we understand how they should look like in
> AI
> > > 3.0. This is a brain storm - let's accumulate ideas first and minimize
> > > critics. Then we will work on ideas in separate topics.
> > >
> > > 1) Historical background
> > >
> > > BO were implemented around 2014 (Apache Ignite 1.5) when we started
> > working
> > > on .NET and CPP clients. During design we had several ideas in mind:
> > > - ability to read object fields in O(1) without deserialization
> > > - interoperabillty between Java, .NET and CPP.
> > >
> > > Since then a number of other concepts were mixed to the cocktail:
> > > - Affinity key fields
> > > - Strict typing for existing fields (aka metadata)
> > > - Binary Object as storage format
> > >
> > > 2) My proposals
> > >
> > > 2.1) Introduce "Data Row Format" interface
> > > Binary Objects are terrible candidates for storage. Too fat, too slow.
> > > Efficient storage typically has <10 bytes overhead per row (no
> metadata,
> > no
> > > length, no hash code, etc), allow supper-fast field access, support
> > > different string formats (ASCII, UTF-8, etc), support different
> temporal
> > > types (date, time, timestamp, timestamp with timezone, etc), and store
> > > these types as efficiently as possible.
> > >
> > > What we need is to introduce an interface which will convert a pair of
> > > key-value objects into a row. This row will be used to store data and
> to
> > > get fields from it. Care about memory consumption, need SQL and strict
> > > schema - use one format. Need flexibility and prefer key-value access -
> > use
> > > another format which will store binary objects unchanged (current
> > > behavior).
> > >
> > > interface DataRowFormat {
> > > DataRow create(Object key, Object value); // primitives or binary
> > > objects
> > > DataRowMetadata metadata();
> > > }
> > >
> > > 2.2) Remove affinity field from metadata
> > > 

Re: [IMPORTANT] Future of Binary Objects

2018-11-21 Thread Vladimir Ozerov
Val,

If we treat a binary object as a plain container of fields with certain names
and types, we do not care how to convert Int to String; it is up to the user
to decide how to migrate.
Ignite could help users in some cases. E.g. for SQL caches we may provide an
ALTER TABLE command that performs the necessary conversions at the storage layer.
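
Until such a command exists, a migration like Int to String is something
users can script themselves. A rough sketch (the cache name "person" and the
field names are hypothetical; note that, precisely because the binary
metadata pins the old type, the converted value has to go into a new field
name today):

import javax.cache.Cache;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.binary.BinaryObject;
import org.apache.ignite.cache.query.ScanQuery;

public class ManualMigrationSketch {
    static void migrateAgeToString(Ignite ignite) {
        IgniteCache<Object, BinaryObject> cache =
            ignite.<Object, Object>cache("person").withKeepBinary();

        // Rewrite every entry, moving the Integer 'age' into a String 'ageText'.
        // A real migration would use a data streamer and think about concurrent updates.
        for (Cache.Entry<Object, BinaryObject> e : cache.query(new ScanQuery<Object, BinaryObject>())) {
            BinaryObject val = e.getValue();
            Integer age = val.field("age");

            BinaryObject migrated = val.toBuilder()
                .removeField("age")
                .setField("ageText", age == null ? null : String.valueOf(age), String.class)
                .build();

            cache.put(e.getKey(), migrated);
        }
    }
}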

On Thu, Nov 22, 2018 at 4:27 AM Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:

> We should definitely allow to change type of field/column to another
> compatible type. The fact that we do not allow to change Int to Long is
> pretty insane. However, there are cases when it's much more complicated.
> How are we going to replace Int with a String, for example? I believe this
> should require certain migration procedure anyway. How do other databases
> handle that?
>
> -Val
>
> On Wed, Nov 21, 2018 at 4:32 PM Denis Magda  wrote:
>
> > Vladimir,
> >
> > Could you educate me a little bit, why the current format is bad for SQL
> > and why another one is more suitable?
> >
> > Also, if we introduce the new format then why would we keep the binary
> one?
> > Is the new format just a next version of the binary one.
> >
> > 2.3) Remove restrictions on changing field type
> > > I do not know why we did that in the first place. This restriction
> > prevents
> > > type evolution and confuses users.
> >
> >
> > That is a hot requirement shared by those who use Ignite SQL in
> production.
> > +1.
> >
> > --
> > Denis
> >
> > On Mon, Nov 19, 2018 at 11:05 PM Vladimir Ozerov 
> > wrote:
> >
> > > Igniters,
> > >
> > > It is very likely that Apache Ignite 3.0 will be released next year. So
> > we
> > > need to start thinking about major product improvements. I'd like to
> > start
> > > with binary objects.
> > >
> > > Currently they are one of the main limiting factors for the product.
> They
> > > are fat - 30+ bytes overhead on average, high TCO of Apache Ignite
> > > comparing to other vendors. They are slow - not suitable for SQL at
> all.
> > >
> > > I would like to ask all of you who worked with binary objects to share
> > your
> > > feedback and ideas, so that we understand how they should look like in
> AI
> > > 3.0. This is a brain storm - let's accumulate ideas first and minimize
> > > critics. Then we will work on ideas in separate topics.
> > >
> > > 1) Historical background
> > >
> > > BO were implemented around 2014 (Apache Ignite 1.5) when we started
> > working
> > > on .NET and CPP clients. During design we had several ideas in mind:
> > > - ability to read object fields in O(1) without deserialization
> > > - interoperabillty between Java, .NET and CPP.
> > >
> > > Since then a number of other concepts were mixed to the cocktail:
> > > - Affinity key fields
> > > - Strict typing for existing fields (aka metadata)
> > > - Binary Object as storage format
> > >
> > > 2) My proposals
> > >
> > > 2.1) Introduce "Data Row Format" interface
> > > Binary Objects are terrible candidates for storage. Too fat, too slow.
> > > Efficient storage typically has <10 bytes overhead per row (no
> metadata,
> > no
> > > length, no hash code, etc), allow supper-fast field access, support
> > > different string formats (ASCII, UTF-8, etc), support different
> temporal
> > > types (date, time, timestamp, timestamp with timezone, etc), and store
> > > these types as efficiently as possible.
> > >
> > > What we need is to introduce an interface which will convert a pair of
> > > key-value objects into a row. This row will be used to store data and
> to
> > > get fields from it. Care about memory consumption, need SQL and strict
> > > schema - use one format. Need flexibility and prefer key-value access -
> > use
> > > another format which will store binary objects unchanged (current
> > > behavior).
> > >
> > > interface DataRowFormat {
> > > DataRow create(Object key, Object value); // primitives or binary
> > > objects
> > > DataRowMetadata metadata();
> > > }
> > >
> > > 2.2) Remove affinity field from metadata
> > > Affinity rules are governed by cache, not type. We should remove
> > > "affintiyFieldName" from metadata.
> > >
> > > 2.3) Remove restrictions on changing field type
> > > I do not know why we did that in the first place. This restriction
> > prevents
> > > type evolution and confuses users.
> > >
> > > 2.4) Use bitmaps for "null" and default values and for fixed-length
> > fields,
> > > put fixed-length fields before variable-length.
> > > Motivation: to save space.
> > >
> > > What else? Please share your ideas.
> > >
> > > Vladimir.
> > >
> >
>


Re: [IMPORTANT] Future of Binary Objects

2018-11-21 Thread Vladimir Ozerov
Denis,

Several examples:
1) DEFAULT values - in SQL you may avoid storing a default value in the table
and store it in the metadata instead. This is not applicable to BinaryObject,
because the same binary object may be saved to two SQL tables with different
defaults.
2) DATE and other temporal types - in SQL you want to store them in a special
format (typically 11 bytes) so that date parts can be extracted quickly. But
in Java and some other languages the best format is a plain long, which is
why we use it in BinaryObject.
3) String charset - in SQL you may choose different charsets for different
tables, e.g. UTF-8 for one, ASCII for another. In BinaryObject we store
everything in UTF-8, which is fine for most cases, well ... except for SQL :-)

The key thing here is that you cannot define a format which is good for both
SQL and the native API; they are very different. This is why I propose to
define an additional interface at the cache level that defines how values are
stored, which will be very different from binary objects.
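
To make point (2) a bit more tangible, here is a tiny sketch contrasting the
two representations (the decomposed layout is purely illustrative, not an
actual Ignite or SQL engine format):

import java.nio.ByteBuffer;
import java.time.Instant;
import java.time.ZoneOffset;

public class TemporalLayoutSketch {
    // Java-friendly representation: a single epoch-millis long, compact and fast
    // to (de)serialize, but every date-part access needs calendar arithmetic.
    static int yearFromEpochMillis(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).getYear();
    }

    // SQL-friendly representation (illustrative): date parts stored separately,
    // so extracting the year is a single fixed-offset read with no arithmetic.
    static int yearFromDecomposed(ByteBuffer row, int dateOffset) {
        return row.getShort(dateOffset); // e.g. 2 bytes for the year, then month, day, ...
    }
}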

Vladimir.

On Thu, Nov 22, 2018 at 3:32 AM Denis Magda  wrote:

> Vladimir,
>
> Could you educate me a little bit, why the current format is bad for SQL
> and why another one is more suitable?
>
> Also, if we introduce the new format then why would we keep the binary one?
> Is the new format just a next version of the binary one.
>
> 2.3) Remove restrictions on changing field type
> > I do not know why we did that in the first place. This restriction
> prevents
> > type evolution and confuses users.
>
>
> That is a hot requirement shared by those who use Ignite SQL in production.
> +1.
>
> --
> Denis
>
> On Mon, Nov 19, 2018 at 11:05 PM Vladimir Ozerov 
> wrote:
>
> > Igniters,
> >
> > It is very likely that Apache Ignite 3.0 will be released next year. So
> we
> > need to start thinking about major product improvements. I'd like to
> start
> > with binary objects.
> >
> > Currently they are one of the main limiting factors for the product. They
> > are fat - 30+ bytes overhead on average, high TCO of Apache Ignite
> > comparing to other vendors. They are slow - not suitable for SQL at all.
> >
> > I would like to ask all of you who worked with binary objects to share
> your
> > feedback and ideas, so that we understand how they should look like in AI
> > 3.0. This is a brain storm - let's accumulate ideas first and minimize
> > critics. Then we will work on ideas in separate topics.
> >
> > 1) Historical background
> >
> > BO were implemented around 2014 (Apache Ignite 1.5) when we started
> working
> > on .NET and CPP clients. During design we had several ideas in mind:
> > - ability to read object fields in O(1) without deserialization
> > - interoperabillty between Java, .NET and CPP.
> >
> > Since then a number of other concepts were mixed to the cocktail:
> > - Affinity key fields
> > - Strict typing for existing fields (aka metadata)
> > - Binary Object as storage format
> >
> > 2) My proposals
> >
> > 2.1) Introduce "Data Row Format" interface
> > Binary Objects are terrible candidates for storage. Too fat, too slow.
> > Efficient storage typically has <10 bytes overhead per row (no metadata,
> no
> > length, no hash code, etc), allow supper-fast field access, support
> > different string formats (ASCII, UTF-8, etc), support different temporal
> > types (date, time, timestamp, timestamp with timezone, etc), and store
> > these types as efficiently as possible.
> >
> > What we need is to introduce an interface which will convert a pair of
> > key-value objects into a row. This row will be used to store data and to
> > get fields from it. Care about memory consumption, need SQL and strict
> > schema - use one format. Need flexibility and prefer key-value access -
> use
> > another format which will store binary objects unchanged (current
> > behavior).
> >
> > interface DataRowFormat {
> > DataRow create(Object key, Object value); // primitives or binary
> > objects
> > DataRowMetadata metadata();
> > }
> >
> > 2.2) Remove affinity field from metadata
> > Affinity rules are governed by cache, not type. We should remove
> > "affintiyFieldName" from metadata.
> >
> > 2.3) Remove restrictions on changing field type
> > I do not know why we did that in the first place. This restriction
> prevents
> > type evolution and confuses users.
> >
> > 2.4) Use bitmaps for "null" and default values and for fixed-length
> fields,
> > put fixed-length fields before variable-length.
> > Motivation: to save space.
> >
> > What else? Please share your ideas.
> >
> > Vladimir.
> >
>


Re: [IMPORTANT] Future of Binary Objects

2018-11-21 Thread Vladimir Ozerov
Ilya,

Currently binary objects already work almost as you propose. We have a
4-byte type ID (the type name hash) and a 4-byte schema ID (a hash of all
field names). We do not write field IDs in the object itself. What we do not
have is the separation of fixed-length and variable-length fields. Agreed, we
should implement it and remove the offsets of fixed-length fields from the
binary object.
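
A rough sketch of what this buys (the header size and field sizes here are
made up for illustration and are not the actual Ignite binary layout): once
fixed-length fields are packed first in schema order, a field's offset is
fully determined by the schema, so no per-object offset table is needed and
access stays O(1).

import java.nio.ByteBuffer;

public class FixedFieldAccessSketch {
    // Illustrative header: 4-byte type ID + 4-byte schema ID.
    static final int HEADER_SIZE = 8;

    // Offsets of fixed-length fields come from the schema, not from the object itself.
    static long readFixedLong(ByteBuffer obj, int fieldIndexInSchema, int[] fixedFieldSizes) {
        int off = HEADER_SIZE;
        for (int i = 0; i < fieldIndexInSchema; i++)
            off += fixedFieldSizes[i];
        return obj.getLong(off);
    }
}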

On Wed, Nov 21, 2018 at 7:18 PM Ilya Kasnacheev 
wrote:

> Hello!
>
> I would like to propose the following changes:
>
> - Let's allow multiple BinaryType's per Class. Make typeId = cksum(list of
> class types + fields) as opposed of cksum(class name) as we have it
> currently. Note that we only have to compute that once per class loaded in
> JVM.
> - BinaryType has a list of fixed length fields (numbers, datetimes, flags)
> and list of variable length fields. We can put all fixed length fields at
> start of BinaryObject so that we can access them by offset as per typeId.
> - Likewise we don't need to encode field id in BinaryObject anymore, save 4
> bytes per field. We already know their order from BinaryType.
> - This means when you ALTER TABLE we add a BinaryType to existing Class (or
> pseudo-Class type name) and we can use it for new data, and eventually
> update existing data to have this field.
> - On top of BinaryType's we can have checks that run them against SQL table
> columns list to see if there are any mismatches.
>
> To Illustrate, previously we had it like:
> [ Type id | String field id | String field value | Long field id | Long
> field value | Datetime field id | Datetime field value ]
> But now it will be
> [ Type id | Long field value | Datetime field value | String field value ]
> ^--^ can be accessed by offset
>
> Regards,
> Ilya.
>
> --
> Ilya Kasnacheev
>
>
> Tue, 20 Nov 2018 at 10:05, Vladimir Ozerov :
>
> > Igniters,
> >
> > It is very likely that Apache Ignite 3.0 will be released next year. So
> we
> > need to start thinking about major product improvements. I'd like to
> start
> > with binary objects.
> >
> > Currently they are one of the main limiting factors for the product. They
> > are fat - 30+ bytes overhead on average, high TCO of Apache Ignite
> > comparing to other vendors. They are slow - not suitable for SQL at all.
> >
> > I would like to ask all of you who worked with binary objects to share
> your
> > feedback and ideas, so that we understand how they should look like in AI
> > 3.0. This is a brain storm - let's accumulate ideas first and minimize
> > critics. Then we will work on ideas in separate topics.
> >
> > 1) Historical background
> >
> > BO were implemented around 2014 (Apache Ignite 1.5) when we started
> working
> > on .NET and CPP clients. During design we had several ideas in mind:
> > - ability to read object fields in O(1) without deserialization
> > - interoperabillty between Java, .NET and CPP.
> >
> > Since then a number of other concepts were mixed to the cocktail:
> > - Affinity key fields
> > - Strict typing for existing fields (aka metadata)
> > - Binary Object as storage format
> >
> > 2) My proposals
> >
> > 2.1) Introduce "Data Row Format" interface
> > Binary Objects are terrible candidates for storage. Too fat, too slow.
> > Efficient storage typically has <10 bytes overhead per row (no metadata,
> no
> > length, no hash code, etc), allow supper-fast field access, support
> > different string formats (ASCII, UTF-8, etc), support different temporal
> > types (date, time, timestamp, timestamp with timezone, etc), and store
> > these types as efficiently as possible.
> >
> > What we need is to introduce an interface which will convert a pair of
> > key-value objects into a row. This row will be used to store data and to
> > get fields from it. Care about memory consumption, need SQL and strict
> > schema - use one format. Need flexibility and prefer key-value access -
> use
> > another format which will store binary objects unchanged (current
> > behavior).
> >
> > interface DataRowFormat {
> > DataRow create(Object key, Object value); // primitives or binary
> > objects
> > DataRowMetadata metadata();
> > }
> >
> > 2.2) Remove affinity field from metadata
> > Affinity rules are governed by cache, not type. We should remove
> > "affintiyFieldName" from metadata.
> >
> > 2.3) Remove restrictions on changing field type
> > I do not know why we did that in the first place. This restriction
> prevents
> > type evolution and confuses users.
> >
> > 2.4) Use bitmaps for "null" and default values and for fixed-length
> fields,
> > put fixed-length fields before variable-length.
> > Motivation: to save space.
> >
> > What else? Please share your ideas.
> >
> > Vladimir.
> >
>


Re: [IMPORTANT] Future of Binary Objects

2018-11-21 Thread Valentin Kulichenko
We should definitely allow changing the type of a field/column to another
compatible type. The fact that we do not allow changing Int to Long is
pretty insane. However, there are cases where it's much more complicated.
How are we going to replace Int with a String, for example? I believe this
should require a certain migration procedure anyway. How do other databases
handle that?

-Val

On Wed, Nov 21, 2018 at 4:32 PM Denis Magda  wrote:

> Vladimir,
>
> Could you educate me a little bit, why the current format is bad for SQL
> and why another one is more suitable?
>
> Also, if we introduce the new format then why would we keep the binary one?
> Is the new format just a next version of the binary one.
>
> 2.3) Remove restrictions on changing field type
> > I do not know why we did that in the first place. This restriction
> prevents
> > type evolution and confuses users.
>
>
> That is a hot requirement shared by those who use Ignite SQL in production.
> +1.
>
> --
> Denis
>
> On Mon, Nov 19, 2018 at 11:05 PM Vladimir Ozerov 
> wrote:
>
> > Igniters,
> >
> > It is very likely that Apache Ignite 3.0 will be released next year. So
> we
> > need to start thinking about major product improvements. I'd like to
> start
> > with binary objects.
> >
> > Currently they are one of the main limiting factors for the product. They
> > are fat - 30+ bytes overhead on average, high TCO of Apache Ignite
> > comparing to other vendors. They are slow - not suitable for SQL at all.
> >
> > I would like to ask all of you who worked with binary objects to share
> your
> > feedback and ideas, so that we understand how they should look like in AI
> > 3.0. This is a brain storm - let's accumulate ideas first and minimize
> > critics. Then we will work on ideas in separate topics.
> >
> > 1) Historical background
> >
> > BO were implemented around 2014 (Apache Ignite 1.5) when we started
> working
> > on .NET and CPP clients. During design we had several ideas in mind:
> > - ability to read object fields in O(1) without deserialization
> > - interoperabillty between Java, .NET and CPP.
> >
> > Since then a number of other concepts were mixed to the cocktail:
> > - Affinity key fields
> > - Strict typing for existing fields (aka metadata)
> > - Binary Object as storage format
> >
> > 2) My proposals
> >
> > 2.1) Introduce "Data Row Format" interface
> > Binary Objects are terrible candidates for storage. Too fat, too slow.
> > Efficient storage typically has <10 bytes overhead per row (no metadata,
> no
> > length, no hash code, etc), allow supper-fast field access, support
> > different string formats (ASCII, UTF-8, etc), support different temporal
> > types (date, time, timestamp, timestamp with timezone, etc), and store
> > these types as efficiently as possible.
> >
> > What we need is to introduce an interface which will convert a pair of
> > key-value objects into a row. This row will be used to store data and to
> > get fields from it. Care about memory consumption, need SQL and strict
> > schema - use one format. Need flexibility and prefer key-value access -
> use
> > another format which will store binary objects unchanged (current
> > behavior).
> >
> > interface DataRowFormat {
> > DataRow create(Object key, Object value); // primitives or binary
> > objects
> > DataRowMetadata metadata();
> > }
> >
> > 2.2) Remove affinity field from metadata
> > Affinity rules are governed by cache, not type. We should remove
> > "affintiyFieldName" from metadata.
> >
> > 2.3) Remove restrictions on changing field type
> > I do not know why we did that in the first place. This restriction
> prevents
> > type evolution and confuses users.
> >
> > 2.4) Use bitmaps for "null" and default values and for fixed-length
> fields,
> > put fixed-length fields before variable-length.
> > Motivation: to save space.
> >
> > What else? Please share your ideas.
> >
> > Vladimir.
> >
>


Re: [IMPORTANT] Future of Binary Objects

2018-11-21 Thread Denis Magda
Vladimir,

Could you educate me a little bit: why is the current format bad for SQL,
and why is another one more suitable?

Also, if we introduce the new format, then why would we keep the binary one?
Is the new format just the next version of the binary one?

2.3) Remove restrictions on changing field type
> I do not know why we did that in the first place. This restriction prevents
> type evolution and confuses users.


That is a hot requirement shared by those who use Ignite SQL in production.
+1.

--
Denis

On Mon, Nov 19, 2018 at 11:05 PM Vladimir Ozerov 
wrote:

> Igniters,
>
> It is very likely that Apache Ignite 3.0 will be released next year. So we
> need to start thinking about major product improvements. I'd like to start
> with binary objects.
>
> Currently they are one of the main limiting factors for the product. They
> are fat - 30+ bytes overhead on average, high TCO of Apache Ignite
> comparing to other vendors. They are slow - not suitable for SQL at all.
>
> I would like to ask all of you who worked with binary objects to share your
> feedback and ideas, so that we understand how they should look like in AI
> 3.0. This is a brain storm - let's accumulate ideas first and minimize
> critics. Then we will work on ideas in separate topics.
>
> 1) Historical background
>
> BO were implemented around 2014 (Apache Ignite 1.5) when we started working
> on .NET and CPP clients. During design we had several ideas in mind:
> - ability to read object fields in O(1) without deserialization
> - interoperabillty between Java, .NET and CPP.
>
> Since then a number of other concepts were mixed to the cocktail:
> - Affinity key fields
> - Strict typing for existing fields (aka metadata)
> - Binary Object as storage format
>
> 2) My proposals
>
> 2.1) Introduce "Data Row Format" interface
> Binary Objects are terrible candidates for storage. Too fat, too slow.
> Efficient storage typically has <10 bytes overhead per row (no metadata, no
> length, no hash code, etc), allow supper-fast field access, support
> different string formats (ASCII, UTF-8, etc), support different temporal
> types (date, time, timestamp, timestamp with timezone, etc), and store
> these types as efficiently as possible.
>
> What we need is to introduce an interface which will convert a pair of
> key-value objects into a row. This row will be used to store data and to
> get fields from it. Care about memory consumption, need SQL and strict
> schema - use one format. Need flexibility and prefer key-value access - use
> another format which will store binary objects unchanged (current
> behavior).
>
> interface DataRowFormat {
> DataRow create(Object key, Object value); // primitives or binary
> objects
> DataRowMetadata metadata();
> }
>
> 2.2) Remove affinity field from metadata
> Affinity rules are governed by cache, not type. We should remove
> "affintiyFieldName" from metadata.
>
> 2.3) Remove restrictions on changing field type
> I do not know why we did that in the first place. This restriction prevents
> type evolution and confuses users.
>
> 2.4) Use bitmaps for "null" and default values and for fixed-length fields,
> put fixed-length fields before variable-length.
> Motivation: to save space.
>
> What else? Please share your ideas.
>
> Vladimir.
>


Re: [IMPORTANT] Future of Binary Objects

2018-11-21 Thread Andrey Mashenkov
Hi,

Vladimir, Ilya,

What about variable-length fields? Do you suggest storing their offsets in a
footer or in a header?

For large objects, a header allows retrieving a field faster and detecting a
null immediately, but we have to reserve space for all var-len field offsets
and update the header after serialization.
However, a footer looks more compact (we can omit nulls) and allows us to
keep a streaming approach during serialization.
Have I missed something?
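
For what it's worth, a minimal sketch of the footer variant described above:
var-len values are written as a stream, their offsets are recorded along the
way (nulls simply skipped), and the offset table is appended at the end. This
is purely illustrative, not a proposed wire format.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FooterLayoutSketch {
    static byte[] writeVarlenFields(String[] values) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        List<int[]> footer = new ArrayList<>(); // (field index, offset) pairs

        // Stream the values out, recording offsets as we go. Nulls are skipped,
        // which is why the footer can be more compact than a pre-reserved header.
        for (int i = 0; i < values.length; i++) {
            if (values[i] == null)
                continue;
            footer.add(new int[] {i, body.size()});
            body.write(values[i].getBytes(StandardCharsets.UTF_8));
        }

        // Append the footer: the (index, offset) pairs plus the entry count at the very end.
        ByteBuffer tail = ByteBuffer.allocate(footer.size() * 8 + 4);
        for (int[] entry : footer)
            tail.putInt(entry[0]).putInt(entry[1]);
        tail.putInt(footer.size());
        body.write(tail.array());

        return body.toByteArray();
    }
}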


On Wed, Nov 21, 2018 at 7:18 PM Ilya Kasnacheev 
wrote:

> Hello!
>
> I would like to propose the following changes:
>
> - Let's allow multiple BinaryType's per Class. Make typeId = cksum(list of
> class types + fields) as opposed of cksum(class name) as we have it
> currently. Note that we only have to compute that once per class loaded in
> JVM.
> - BinaryType has a list of fixed length fields (numbers, datetimes, flags)
> and list of variable length fields. We can put all fixed length fields at
> start of BinaryObject so that we can access them by offset as per typeId.
> - Likewise we don't need to encode field id in BinaryObject anymore, save 4
> bytes per field. We already know their order from BinaryType.
> - This means when you ALTER TABLE we add a BinaryType to existing Class (or
> pseudo-Class type name) and we can use it for new data, and eventually
> update existing data to have this field.
> - On top of BinaryType's we can have checks that run them against SQL table
> columns list to see if there are any mismatches.
>
> To Illustrate, previously we had it like:
> [ Type id | String field id | String field value | Long field id | Long
> field value | Datetime field id | Datetime field value ]
> But now it will be
> [ Type id | Long field value | Datetime field value | String field value ]
> ^--^ can be accessed by offset
>
> Regards,
> Ilya.
>
> --
> Ilya Kasnacheev
>
>
> Tue, 20 Nov 2018 at 10:05, Vladimir Ozerov :
>
> > Igniters,
> >
> > It is very likely that Apache Ignite 3.0 will be released next year. So
> we
> > need to start thinking about major product improvements. I'd like to
> start
> > with binary objects.
> >
> > Currently they are one of the main limiting factors for the product. They
> > are fat - 30+ bytes overhead on average, high TCO of Apache Ignite
> > comparing to other vendors. They are slow - not suitable for SQL at all.
> >
> > I would like to ask all of you who worked with binary objects to share
> your
> > feedback and ideas, so that we understand how they should look like in AI
> > 3.0. This is a brain storm - let's accumulate ideas first and minimize
> > critics. Then we will work on ideas in separate topics.
> >
> > 1) Historical background
> >
> > BO were implemented around 2014 (Apache Ignite 1.5) when we started
> working
> > on .NET and CPP clients. During design we had several ideas in mind:
> > - ability to read object fields in O(1) without deserialization
> > - interoperabillty between Java, .NET and CPP.
> >
> > Since then a number of other concepts were mixed to the cocktail:
> > - Affinity key fields
> > - Strict typing for existing fields (aka metadata)
> > - Binary Object as storage format
> >
> > 2) My proposals
> >
> > 2.1) Introduce "Data Row Format" interface
> > Binary Objects are terrible candidates for storage. Too fat, too slow.
> > Efficient storage typically has <10 bytes overhead per row (no metadata,
> no
> > length, no hash code, etc), allow supper-fast field access, support
> > different string formats (ASCII, UTF-8, etc), support different temporal
> > types (date, time, timestamp, timestamp with timezone, etc), and store
> > these types as efficiently as possible.
> >
> > What we need is to introduce an interface which will convert a pair of
> > key-value objects into a row. This row will be used to store data and to
> > get fields from it. Care about memory consumption, need SQL and strict
> > schema - use one format. Need flexibility and prefer key-value access -
> use
> > another format which will store binary objects unchanged (current
> > behavior).
> >
> > interface DataRowFormat {
> > DataRow create(Object key, Object value); // primitives or binary
> > objects
> > DataRowMetadata metadata();
> > }
> >
> > 2.2) Remove affinity field from metadata
> > Affinity rules are governed by cache, not type. We should remove
> > "affintiyFieldName" from metadata.
> >
> > 2.3) Remove restrictions on changing field type
> > I do not know why we did that in the first place. This restriction
> prevents
> > type evolution and confuses users.
> >
> > 2.4) Use bitmaps for "null" and default values and for fixed-length
> fields,
> > put fixed-length fields before variable-length.
> > Motivation: to save space.
> >
> > What else? Please share your ideas.
> >
> > Vladimir.
> >
>


-- 
Best regards,
Andrey V. Mashenkov


Re: [IMPORTANT] Future of Binary Objects

2018-11-21 Thread Ilya Kasnacheev
Hello!

I would like to propose the following changes:

- Let's allow multiple BinaryTypes per Class. Make typeId = cksum(list of
class types + fields), as opposed to cksum(class name) as we have it
currently. Note that we only have to compute it once per class loaded in the
JVM (see the sketch after the illustration below).
- BinaryType has a list of fixed-length fields (numbers, datetimes, flags)
and a list of variable-length fields. We can put all fixed-length fields at
the start of the BinaryObject so that we can access them by offset as per
the typeId.
- Likewise, we don't need to encode field IDs in the BinaryObject anymore,
saving 4 bytes per field. We already know their order from the BinaryType.
- This means that when you ALTER TABLE, we add a BinaryType to the existing
Class (or pseudo-Class type name), use it for new data, and eventually
update existing data to include the new field.
- On top of BinaryTypes we can have checks that run them against the SQL
table column list to see if there are any mismatches.

To illustrate, previously we had it like:
[ Type id | String field id | String field value | Long field id | Long field value | Datetime field id | Datetime field value ]
But now it will be:
[ Type id | Long field value | Datetime field value | String field value ]
^--^ can be accessed by offset
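
For the first bullet, here is a sketch of how such a typeId could be derived
(CRC32 over the class name plus the ordered field names and type codes is
just a stand-in; the concrete checksum is an open design choice):

import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class TypeIdSketch {
    // typeId derived from the class name plus the ordered (field name, field type) pairs,
    // so every distinct field set of the same class gets its own BinaryType.
    static int typeId(String className, String[] fieldNames, String[] fieldTypes) {
        CRC32 crc = new CRC32();
        crc.update(className.getBytes(StandardCharsets.UTF_8));
        for (int i = 0; i < fieldNames.length; i++) {
            crc.update(fieldNames[i].getBytes(StandardCharsets.UTF_8));
            crc.update(fieldTypes[i].getBytes(StandardCharsets.UTF_8));
        }
        return (int) crc.getValue();
    }
}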

Regards,
Ilya.

-- 
Ilya Kasnacheev


Tue, 20 Nov 2018 at 10:05, Vladimir Ozerov :

> Igniters,
>
> It is very likely that Apache Ignite 3.0 will be released next year. So we
> need to start thinking about major product improvements. I'd like to start
> with binary objects.
>
> Currently they are one of the main limiting factors for the product. They
> are fat - 30+ bytes overhead on average, high TCO of Apache Ignite
> comparing to other vendors. They are slow - not suitable for SQL at all.
>
> I would like to ask all of you who worked with binary objects to share your
> feedback and ideas, so that we understand how they should look like in AI
> 3.0. This is a brain storm - let's accumulate ideas first and minimize
> critics. Then we will work on ideas in separate topics.
>
> 1) Historical background
>
> BO were implemented around 2014 (Apache Ignite 1.5) when we started working
> on .NET and CPP clients. During design we had several ideas in mind:
> - ability to read object fields in O(1) without deserialization
> - interoperabillty between Java, .NET and CPP.
>
> Since then a number of other concepts were mixed to the cocktail:
> - Affinity key fields
> - Strict typing for existing fields (aka metadata)
> - Binary Object as storage format
>
> 2) My proposals
>
> 2.1) Introduce "Data Row Format" interface
> Binary Objects are terrible candidates for storage. Too fat, too slow.
> Efficient storage typically has <10 bytes overhead per row (no metadata, no
> length, no hash code, etc), allow supper-fast field access, support
> different string formats (ASCII, UTF-8, etc), support different temporal
> types (date, time, timestamp, timestamp with timezone, etc), and store
> these types as efficiently as possible.
>
> What we need is to introduce an interface which will convert a pair of
> key-value objects into a row. This row will be used to store data and to
> get fields from it. Care about memory consumption, need SQL and strict
> schema - use one format. Need flexibility and prefer key-value access - use
> another format which will store binary objects unchanged (current
> behavior).
>
> interface DataRowFormat {
> DataRow create(Object key, Object value); // primitives or binary
> objects
> DataRowMetadata metadata();
> }
>
> 2.2) Remove affinity field from metadata
> Affinity rules are governed by cache, not type. We should remove
> "affintiyFieldName" from metadata.
>
> 2.3) Remove restrictions on changing field type
> I do not know why we did that in the first place. This restriction prevents
> type evolution and confuses users.
>
> 2.4) Use bitmaps for "null" and default values and for fixed-length fields,
> put fixed-length fields before variable-length.
> Motivation: to save space.
>
> What else? Please share your ideas.
>
> Vladimir.
>


Re: [IMPORTANT] Future of Binary Objects

2018-11-21 Thread Stephen Darlington
Possibly heading into wishlist rather than practical territory here, but you 
did ask...

> What we need is to introduce an interface which will convert a pair of
> key-value objects into a row. This row will be used to store data and to
> get fields from it. 

Rather than mapping objects to a row, how about mapping to a more general
“internal storage” interface? Assuming that all the data for a row is stored
together makes it difficult to implement any optimisations that span multiple
rows. Think of a string state field where there are only five known values… we
currently repeat the text over and over. Or consider a full column-store backend.
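
As a sketch of the "five known values" case, a dictionary-encoded column in
such a backend could look roughly like this (hypothetical helper, not an
existing Ignite API):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DictionaryColumnSketch {
    private final Map<String, Byte> codes = new HashMap<>();   // value -> small code
    private final List<String> dictionary = new ArrayList<>(); // code -> value
    private final List<Byte> column = new ArrayList<>();       // one byte per row instead of the full string

    void append(String value) {
        Byte code = codes.get(value);
        if (code == null) {
            code = (byte) dictionary.size(); // fine while there are only a handful of distinct values
            codes.put(value, code);
            dictionary.add(value);
        }
        column.add(code);
    }

    String get(int row) {
        return dictionary.get(column.get(row));
    }
}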

Regards,
Stephen




Re: [IMPORTANT] Future of Binary Objects

2018-11-21 Thread Pavel Tupitsyn
Makes sense.

I'm trying to grasp this from a usability POV.
Having two ways of storing data with different behavior can be confusing.
In .NET we already have this issue with DateTime: if you want SQL, you
get subtly different behavior.

So IMO we should enable strict type checks for all caches, even non-SQL
ones.
Users will be able to evolve types by adding/removing fields, but at least
the type ID will be fixed.
And for SQL caches you'll get a clear exception like "Field does not exist
in SQL schema: foobar".

On Wed, Nov 21, 2018 at 4:19 PM Vladimir Ozerov 
wrote:

> Pavel,
>
> This could be solved with aforementioned "RowFormat". We will be able to
> configure cache as follows: "this is a cache with strict type checks, first
> one is A, with fields A1, A2, A3, second is B with fields B1, B2". So it
> will be possible to serialize anything into binary object, but when it
> comes to real store, exception will be thrown.
>
> Makes sense?
>
> On Wed, Nov 21, 2018 at 3:21 PM Pavel Tupitsyn 
> wrote:
>
> > Vladimir,
> >
> > IMO the issue is that we allow any type of data in the cache (put Person,
> > then put int to the same cache).
> > Are we going to address this in 3.0 and enforce key/value types according
> > to cache configuration?
> > This will provide more space for optimizations.
> >
> > On Wed, Nov 21, 2018 at 3:14 PM Vladimir Ozerov 
> > wrote:
> >
> > > Denis,
> > >
> > > In theory data conversion could be avoided in certain cases. E.g.
> > consider
> > > a case of loading data through streamer. We know the cache, we know
> it's
> > > metadata and row format. So instead of doing "user object" -> "binary
> > > object" -> "row", we can do "user object" -> "row".
> > >
> > > On Wed, Nov 21, 2018 at 1:31 PM Denis Mekhanikov <
> dmekhani...@gmail.com>
> > > wrote:
> > >
> > > > Vladimir,
> > > >
> > > > Thank you for the clarification. I didn't see this distinction first.
> > > >
> > > > I meant using customizable formats for all serialization, not only
> for
> > > > storage.
> > > > The idea behind my proposal is to avoid data conversion, when loading
> > > data
> > > > into Ignite.
> > > > It will complicate usage of thin clients though, so I'm not sure,
> that
> > it
> > > > will make users happier.
> > > >
> > > > But anyway, the same approach may be used for storage only.
> > > >
> > > > Denis
> > > >
> > > > > Wed, 21 Nov 2018 at 12:57, Vladimir Ozerov  >:
> > > >
> > > > > Denis,
> > > > >
> > > > > Could you please clarify - are you talking about storage, e.g. how
> > > > objects
> > > > > are stored in Ignite, or about serialization as a whole? I'd like
> to
> > > > better
> > > > > understand whether the use case you described is relevant to my
> idea
> > of
> > > > > splitting binary objects from underlying storage format.
> > > > > My vision was that we can use current BinaryObject protocol (with
> > > > whatever
> > > > > optimizations needed), as a common format for communication between
> > > nodes
> > > > > and a common serialization protocol. This is very handy because all
> > > > > participants (Java, С++, .NET, all sorts of thin clients) are able
> to
> > > > work
> > > > > with it. So if I have a "Person" class in Java I can read it in any
> > > other
> > > > > platform without any additional configuration. But when it comes to
> > > > > *storage*, then we may introduce pluggable row format interface
> which
> > > > will
> > > > > apply any necessary transformations. So if someone wants to store
> > > objects
> > > > > in Avro/Protobuf, and ready to configure and implement it (generate
> > > > > classes, implementa field extraction logic, etc.) - then just
> > implement
> > > > > that interface. They key is that this implementation will only be
> > > needed
> > > > in
> > > > > Java, not in a dozen of platform we support.
> > > > >
> > > > > But when it comes to how to store object in a cache
> > > > >
> > > > > On Wed, Nov 21, 2018 at 11:37 AM Denis Mekhanikov <
> > > dmekhani...@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > People often ask about possibility to store their data in that
> > > format,
> > > > > that
> > > > > > they use in their applications.
> > > > > > If you use Avro everywhere in your application, then why not
> store
> > > data
> > > > > in
> > > > > > the same format in Ignite?
> > > > > > So, how about making an interface, that would enlist all
> operations
> > > we
> > > > > > need,
> > > > > > and use this interface everywhere without relying on any specific
> > > > > > implementation.
> > > > > > *BinaryObject* looks like a suitable interface, but the only
> > > > > > implementation, that you can get from Ignite
> > > > > > is *BinaryObjectImpl*.
> > > > > > I think, we should make Ignite extendible and provide capability
> to
> > > > > specify
> > > > > > your own data format
> > > > > > by implementing the corresponding interfaces.
> > > > > > So, if you like JSONB or Protobuf or whatever else, you could
> > enable
> > > a
> > > > > > module 

Re: [IMPORTANT] Future of Binary Objects

2018-11-21 Thread Vladimir Ozerov
Pavel,

This could be solved with aforementioned "RowFormat". We will be able to
configure cache as follows: "this is a cache with strict type checks, first
one is A, with fields A1, A2, A3, second is B with fields B1, B2". So it
will be possible to serialize anything into binary object, but when it
comes to real store, exception will be thrown.

Makes sense?
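For illustration, a minimal sketch of what such a strict check could look like
at store time; RowFormat, DataRow and StrictRowFormat are hypothetical names
used only for this sketch, not existing Ignite APIs:

import java.util.Set;
import org.apache.ignite.IgniteCheckedException;
import org.apache.ignite.binary.BinaryObject;

// Hypothetical: serialization always succeeds, the type check happens
// only when the pair reaches the cache store layer.
interface RowFormat {
    DataRow create(BinaryObject key, BinaryObject val) throws IgniteCheckedException;
}

final class DataRow { static DataRow of(BinaryObject k, BinaryObject v) { return new DataRow(); } } // stub for the sketch

class StrictRowFormat implements RowFormat {
    private final Set<String> allowedTypes; // e.g. {"A", "B"} from cache configuration

    StrictRowFormat(Set<String> allowedTypes) {
        this.allowedTypes = allowedTypes;
    }

    @Override public DataRow create(BinaryObject key, BinaryObject val) throws IgniteCheckedException {
        if (!allowedTypes.contains(val.type().typeName()))
            throw new IgniteCheckedException("Type " + val.type().typeName() +
                " is not allowed in this cache");

        return DataRow.of(key, val); // hypothetical conversion into a storage row
    }
}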

On Wed, Nov 21, 2018 at 3:21 PM Pavel Tupitsyn  wrote:

> Vladimir,
>
> IMO the issue is that we allow any type of data in the cache (put Person,
> then put int to the same cache).
> Are we going to address this in 3.0 and enforce key/value types according
> to cache configuration?
> This will provide more space for optimizations.
>
> On Wed, Nov 21, 2018 at 3:14 PM Vladimir Ozerov 
> wrote:
>
> > Denis,
> >
> > In theory data conversion could be avoided in certain cases. E.g.
> consider
> > a case of loading data through streamer. We know the cache, we know it's
> > metadata and row format. So instead of doing "user object" -> "binary
> > object" -> "row", we can do "user object" -> "row".
> >
> > On Wed, Nov 21, 2018 at 1:31 PM Denis Mekhanikov 
> > wrote:
> >
> > > Vladimir,
> > >
> > > Thank you for the clarification. I didn't see this distinction first.
> > >
> > > I meant using customizable formats for all serialization, not only for
> > > storage.
> > > The idea behind my proposal is to avoid data conversion, when loading
> > data
> > > into Ignite.
> > > It will complicate usage of thin clients though, so I'm not sure, that
> it
> > > will make users happier.
> > >
> > > But anyway, the same approach may be used for storage only.
> > >
> > > Denis
> > >
> > > ср, 21 нояб. 2018 г. в 12:57, Vladimir Ozerov :
> > >
> > > > Denis,
> > > >
> > > > Could you please clarify - are you talking about storage, e.g. how
> > > objects
> > > > are stored in Ignite, or about serialization as a whole? I'd like to
> > > better
> > > > understand whether the use case you described is relevant to my idea
> of
> > > > splitting binary objects from underlying storage format.
> > > > My vision was that we can use current BinaryObject protocol (with
> > > whatever
> > > > optimizations needed), as a common format for communication between
> > nodes
> > > > and a common serialization protocol. This is very handy because all
> > > > participants (Java, С++, .NET, all sorts of thin clients) are able to
> > > work
> > > > with it. So if I have a "Person" class in Java I can read it in any
> > other
> > > > platform without any additional configuration. But when it comes to
> > > > *storage*, then we may introduce pluggable row format interface which
> > > will
> > > > apply any necessary transformations. So if someone wants to store
> > objects
> > > > in Avro/Protobuf, and ready to configure and implement it (generate
> > > > classes, implementa field extraction logic, etc.) - then just
> implement
> > > > that interface. They key is that this implementation will only be
> > needed
> > > in
> > > > Java, not in a dozen of platform we support.
> > > >
> > > > But when it comes to how to store object in a cache
> > > >
> > > > On Wed, Nov 21, 2018 at 11:37 AM Denis Mekhanikov <
> > dmekhani...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > People often ask about possibility to store their data in that
> > format,
> > > > that
> > > > > they use in their applications.
> > > > > If you use Avro everywhere in your application, then why not store
> > data
> > > > in
> > > > > the same format in Ignite?
> > > > > So, how about making an interface, that would enlist all operations
> > we
> > > > > need,
> > > > > and use this interface everywhere without relying on any specific
> > > > > implementation.
> > > > > *BinaryObject* looks like a suitable interface, but the only
> > > > > implementation, that you can get from Ignite
> > > > > is *BinaryObjectImpl*.
> > > > > I think, we should make Ignite extendible and provide capability to
> > > > specify
> > > > > your own data format
> > > > > by implementing the corresponding interfaces.
> > > > > So, if you like JSONB or Protobuf or whatever else, you could
> enable
> > a
> > > > > module for the corresponding
> > > > > format, and use it for storing the data.
> > > > >
> > > > > Denis
> > > > >
> > > > > ср, 21 нояб. 2018 г. в 10:10, Alexey Zinoviev <
> > zaleslaw@gmail.com
> > > >:
> > > > >
> > > > > > I'd like @Vyacheslav Daradur approach.
> > > > > >
> > > > > > Maybe somebody could have a look at UnsafeRow in Spark
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
> > > > > > UnsafeRow is a concrete InternalRow that represents a mutable
> > > internal
> > > > > > raw-memory (and hence unsafe) binary row format.
> > > > > >
> > > > > > P.S. If somebody is interested in this apporach, I could share
> more
> > > > > > information
> > > > > >
> > > > > > вт, 20 нояб. 2018 г. в 11:33, 

Re: [IMPORTANT] Future of Binary Objects

2018-11-21 Thread Pavel Tupitsyn
Vladimir,

IMO the issue is that we allow any type of data in the cache (put a Person,
then put an int into the same cache).
Are we going to address this in 3.0 and enforce key/value types according
to the cache configuration?
This will provide more room for optimizations.

On Wed, Nov 21, 2018 at 3:14 PM Vladimir Ozerov 
wrote:

> Denis,
>
> In theory data conversion could be avoided in certain cases. E.g. consider
> a case of loading data through streamer. We know the cache, we know it's
> metadata and row format. So instead of doing "user object" -> "binary
> object" -> "row", we can do "user object" -> "row".
>
> On Wed, Nov 21, 2018 at 1:31 PM Denis Mekhanikov 
> wrote:
>
> > Vladimir,
> >
> > Thank you for the clarification. I didn't see this distinction first.
> >
> > I meant using customizable formats for all serialization, not only for
> > storage.
> > The idea behind my proposal is to avoid data conversion, when loading
> data
> > into Ignite.
> > It will complicate usage of thin clients though, so I'm not sure, that it
> > will make users happier.
> >
> > But anyway, the same approach may be used for storage only.
> >
> > Denis
> >
> > ср, 21 нояб. 2018 г. в 12:57, Vladimir Ozerov :
> >
> > > Denis,
> > >
> > > Could you please clarify - are you talking about storage, e.g. how
> > objects
> > > are stored in Ignite, or about serialization as a whole? I'd like to
> > better
> > > understand whether the use case you described is relevant to my idea of
> > > splitting binary objects from underlying storage format.
> > > My vision was that we can use current BinaryObject protocol (with
> > whatever
> > > optimizations needed), as a common format for communication between
> nodes
> > > and a common serialization protocol. This is very handy because all
> > > participants (Java, С++, .NET, all sorts of thin clients) are able to
> > work
> > > with it. So if I have a "Person" class in Java I can read it in any
> other
> > > platform without any additional configuration. But when it comes to
> > > *storage*, then we may introduce pluggable row format interface which
> > will
> > > apply any necessary transformations. So if someone wants to store
> objects
> > > in Avro/Protobuf, and ready to configure and implement it (generate
> > > classes, implementa field extraction logic, etc.) - then just implement
> > > that interface. They key is that this implementation will only be
> needed
> > in
> > > Java, not in a dozen of platform we support.
> > >
> > > But when it comes to how to store object in a cache
> > >
> > > On Wed, Nov 21, 2018 at 11:37 AM Denis Mekhanikov <
> dmekhani...@gmail.com
> > >
> > > wrote:
> > >
> > > > People often ask about possibility to store their data in that
> format,
> > > that
> > > > they use in their applications.
> > > > If you use Avro everywhere in your application, then why not store
> data
> > > in
> > > > the same format in Ignite?
> > > > So, how about making an interface, that would enlist all operations
> we
> > > > need,
> > > > and use this interface everywhere without relying on any specific
> > > > implementation.
> > > > *BinaryObject* looks like a suitable interface, but the only
> > > > implementation, that you can get from Ignite
> > > > is *BinaryObjectImpl*.
> > > > I think, we should make Ignite extendible and provide capability to
> > > specify
> > > > your own data format
> > > > by implementing the corresponding interfaces.
> > > > So, if you like JSONB or Protobuf or whatever else, you could enable
> a
> > > > module for the corresponding
> > > > format, and use it for storing the data.
> > > >
> > > > Denis
> > > >
> > > > ср, 21 нояб. 2018 г. в 10:10, Alexey Zinoviev <
> zaleslaw@gmail.com
> > >:
> > > >
> > > > > I'd like @Vyacheslav Daradur approach.
> > > > >
> > > > > Maybe somebody could have a look at UnsafeRow in Spark
> > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
> > > > > UnsafeRow is a concrete InternalRow that represents a mutable
> > internal
> > > > > raw-memory (and hence unsafe) binary row format.
> > > > >
> > > > > P.S. If somebody is interested in this apporach, I could share more
> > > > > information
> > > > >
> > > > > вт, 20 нояб. 2018 г. в 11:33, Sergi Vladykin <
> > sergi.vlady...@gmail.com
> > > >:
> > > > >
> > > > > > I really like Protobuf format. It is probably not what we need
> for
> > > O(1)
> > > > > > fields access,
> > > > > > but for compact data representation we can derive lots from
> there.
> > > > > >
> > > > > > Also IMO, restricting field type change is absolutely sane idea.
> > > > > > The correct way to evolve schema in common case is to add new
> > fields
> > > > and
> > > > > > gradually
> > > > > > deprecate the old ones, if you can skip default/null fields in
> > binary
> > > > > > format this approach
> > > > > > will not introduce any noticeable performance/size overhead.
> > > > > >

Re: [IMPORTANT] Future of Binary Objects

2018-11-21 Thread Vladimir Ozerov
Denis,

In theory data conversion could be avoided in certain cases. E.g. consider
the case of loading data through the streamer. We know the cache, we know its
metadata and row format. So instead of doing "user object" -> "binary
object" -> "row", we can do "user object" -> "row".

On Wed, Nov 21, 2018 at 1:31 PM Denis Mekhanikov 
wrote:

> Vladimir,
>
> Thank you for the clarification. I didn't see this distinction first.
>
> I meant using customizable formats for all serialization, not only for
> storage.
> The idea behind my proposal is to avoid data conversion, when loading data
> into Ignite.
> It will complicate usage of thin clients though, so I'm not sure, that it
> will make users happier.
>
> But anyway, the same approach may be used for storage only.
>
> Denis
>
> ср, 21 нояб. 2018 г. в 12:57, Vladimir Ozerov :
>
> > Denis,
> >
> > Could you please clarify - are you talking about storage, e.g. how
> objects
> > are stored in Ignite, or about serialization as a whole? I'd like to
> better
> > understand whether the use case you described is relevant to my idea of
> > splitting binary objects from underlying storage format.
> > My vision was that we can use current BinaryObject protocol (with
> whatever
> > optimizations needed), as a common format for communication between nodes
> > and a common serialization protocol. This is very handy because all
> > participants (Java, С++, .NET, all sorts of thin clients) are able to
> work
> > with it. So if I have a "Person" class in Java I can read it in any other
> > platform without any additional configuration. But when it comes to
> > *storage*, then we may introduce pluggable row format interface which
> will
> > apply any necessary transformations. So if someone wants to store objects
> > in Avro/Protobuf, and ready to configure and implement it (generate
> > classes, implementa field extraction logic, etc.) - then just implement
> > that interface. They key is that this implementation will only be needed
> in
> > Java, not in a dozen of platform we support.
> >
> > But when it comes to how to store object in a cache
> >
> > On Wed, Nov 21, 2018 at 11:37 AM Denis Mekhanikov  >
> > wrote:
> >
> > > People often ask about possibility to store their data in that format,
> > that
> > > they use in their applications.
> > > If you use Avro everywhere in your application, then why not store data
> > in
> > > the same format in Ignite?
> > > So, how about making an interface, that would enlist all operations we
> > > need,
> > > and use this interface everywhere without relying on any specific
> > > implementation.
> > > *BinaryObject* looks like a suitable interface, but the only
> > > implementation, that you can get from Ignite
> > > is *BinaryObjectImpl*.
> > > I think, we should make Ignite extendible and provide capability to
> > specify
> > > your own data format
> > > by implementing the corresponding interfaces.
> > > So, if you like JSONB or Protobuf or whatever else, you could enable a
> > > module for the corresponding
> > > format, and use it for storing the data.
> > >
> > > Denis
> > >
> > > ср, 21 нояб. 2018 г. в 10:10, Alexey Zinoviev  >:
> > >
> > > > I'd like @Vyacheslav Daradur approach.
> > > >
> > > > Maybe somebody could have a look at UnsafeRow in Spark
> > > >
> > > >
> > >
> >
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
> > > > UnsafeRow is a concrete InternalRow that represents a mutable
> internal
> > > > raw-memory (and hence unsafe) binary row format.
> > > >
> > > > P.S. If somebody is interested in this apporach, I could share more
> > > > information
> > > >
> > > > вт, 20 нояб. 2018 г. в 11:33, Sergi Vladykin <
> sergi.vlady...@gmail.com
> > >:
> > > >
> > > > > I really like Protobuf format. It is probably not what we need for
> > O(1)
> > > > > fields access,
> > > > > but for compact data representation we can derive lots from there.
> > > > >
> > > > > Also IMO, restricting field type change is absolutely sane idea.
> > > > > The correct way to evolve schema in common case is to add new
> fields
> > > and
> > > > > gradually
> > > > > deprecate the old ones, if you can skip default/null fields in
> binary
> > > > > format this approach
> > > > > will not introduce any noticeable performance/size overhead.
> > > > >
> > > > > Sergi
> > > > >
> > > > > вт, 20 нояб. 2018 г. в 11:12, Vyacheslav Daradur <
> > daradu...@gmail.com
> > > >:
> > > > >
> > > > > > I think, one of a possible way to reduce overhead and TCO - SQL
> > > Scheme
> > > > > > approach.
> > > > > >
> > > > > > That assumes that metadata will be stored separately from
> > serialized
> > > > > > data to reduce size.
> > > > > > In this case, the most advantages of Binary Objects like access
> in
> > > > > > O(1) and access without deserialization may be achieved.
> > > > > > On Tue, Nov 20, 2018 at 10:56 AM Vladimir Ozerov <
> > > voze...@gridgain.com
> > > > >
> > > > > > 

Re: [IMPORTANT] Future of Binary Objects

2018-11-21 Thread Denis Mekhanikov
Vladimir,

Thank you for the clarification. I didn't see this distinction at first.

I meant using customizable formats for all serialization, not only for
storage.
The idea behind my proposal is to avoid data conversion when loading data
into Ignite.
It would complicate the usage of thin clients though, so I'm not sure that it
will make users happier.

But anyway, the same approach may be used for storage only.

Denis

ср, 21 нояб. 2018 г. в 12:57, Vladimir Ozerov :

> Denis,
>
> Could you please clarify - are you talking about storage, e.g. how objects
> are stored in Ignite, or about serialization as a whole? I'd like to better
> understand whether the use case you described is relevant to my idea of
> splitting binary objects from underlying storage format.
> My vision was that we can use current BinaryObject protocol (with whatever
> optimizations needed), as a common format for communication between nodes
> and a common serialization protocol. This is very handy because all
> participants (Java, С++, .NET, all sorts of thin clients) are able to work
> with it. So if I have a "Person" class in Java I can read it in any other
> platform without any additional configuration. But when it comes to
> *storage*, then we may introduce pluggable row format interface which will
> apply any necessary transformations. So if someone wants to store objects
> in Avro/Protobuf, and ready to configure and implement it (generate
> classes, implementa field extraction logic, etc.) - then just implement
> that interface. They key is that this implementation will only be needed in
> Java, not in a dozen of platform we support.
>
> But when it comes to how to store object in a cache
>
> On Wed, Nov 21, 2018 at 11:37 AM Denis Mekhanikov 
> wrote:
>
> > People often ask about possibility to store their data in that format,
> that
> > they use in their applications.
> > If you use Avro everywhere in your application, then why not store data
> in
> > the same format in Ignite?
> > So, how about making an interface, that would enlist all operations we
> > need,
> > and use this interface everywhere without relying on any specific
> > implementation.
> > *BinaryObject* looks like a suitable interface, but the only
> > implementation, that you can get from Ignite
> > is *BinaryObjectImpl*.
> > I think, we should make Ignite extendible and provide capability to
> specify
> > your own data format
> > by implementing the corresponding interfaces.
> > So, if you like JSONB or Protobuf or whatever else, you could enable a
> > module for the corresponding
> > format, and use it for storing the data.
> >
> > Denis
> >
> > ср, 21 нояб. 2018 г. в 10:10, Alexey Zinoviev :
> >
> > > I'd like @Vyacheslav Daradur approach.
> > >
> > > Maybe somebody could have a look at UnsafeRow in Spark
> > >
> > >
> >
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
> > > UnsafeRow is a concrete InternalRow that represents a mutable internal
> > > raw-memory (and hence unsafe) binary row format.
> > >
> > > P.S. If somebody is interested in this apporach, I could share more
> > > information
> > >
> > > вт, 20 нояб. 2018 г. в 11:33, Sergi Vladykin  >:
> > >
> > > > I really like Protobuf format. It is probably not what we need for
> O(1)
> > > > fields access,
> > > > but for compact data representation we can derive lots from there.
> > > >
> > > > Also IMO, restricting field type change is absolutely sane idea.
> > > > The correct way to evolve schema in common case is to add new fields
> > and
> > > > gradually
> > > > deprecate the old ones, if you can skip default/null fields in binary
> > > > format this approach
> > > > will not introduce any noticeable performance/size overhead.
> > > >
> > > > Sergi
> > > >
> > > > вт, 20 нояб. 2018 г. в 11:12, Vyacheslav Daradur <
> daradu...@gmail.com
> > >:
> > > >
> > > > > I think, one of a possible way to reduce overhead and TCO - SQL
> > Scheme
> > > > > approach.
> > > > >
> > > > > That assumes that metadata will be stored separately from
> serialized
> > > > > data to reduce size.
> > > > > In this case, the most advantages of Binary Objects like access in
> > > > > O(1) and access without deserialization may be achieved.
> > > > > On Tue, Nov 20, 2018 at 10:56 AM Vladimir Ozerov <
> > voze...@gridgain.com
> > > >
> > > > > wrote:
> > > > > >
> > > > > > Hi Alexey,
> > > > > >
> > > > > > Binary Objects only.
> > > > > >
> > > > > > On Tue, Nov 20, 2018 at 10:50 AM Alexey Zinoviev <
> > > > zaleslaw@gmail.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Do we discuss here Core features only or the roadmap for all
> > > > > components?
> > > > > > >
> > > > > > > вт, 20 нояб. 2018 г. в 10:05, Vladimir Ozerov <
> > > voze...@gridgain.com
> > > > >:
> > > > > > >
> > > > > > > > Igniters,
> > > > > > > >
> > > > > > > > It is very likely that Apache Ignite 3.0 will be released
> next
> > > > year.
> > > > > So
> > > 

Re: [IMPORTANT] Future of Binary Objects

2018-11-21 Thread Vladimir Ozerov
Denis,

Could you please clarify - are you talking about storage, i.e. how objects
are stored in Ignite, or about serialization as a whole? I'd like to better
understand whether the use case you described is relevant to my idea of
splitting binary objects from the underlying storage format.
My vision is that we can use the current BinaryObject protocol (with whatever
optimizations are needed) as a common format for communication between nodes
and as a common serialization protocol. This is very handy because all
participants (Java, C++, .NET, all sorts of thin clients) are able to work
with it. So if I have a "Person" class in Java, I can read it on any other
platform without any additional configuration. But when it comes to
*storage*, we may introduce a pluggable row format interface which will
apply any necessary transformations. So if someone wants to store objects
in Avro/Protobuf, and is ready to configure and implement it (generate
classes, implement field extraction logic, etc.) - then they just implement
that interface. The key is that this implementation will only be needed in
Java, not in the dozen of platforms we support.

But when it comes to how to store object in a cache
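A sketch of what such a pluggable storage-side interface could look like;
RowFormat, Row and AvroRowFormat are names assumed for illustration, not
existing Ignite interfaces:

import org.apache.ignite.binary.BinaryObject;

// Hypothetical storage-side abstraction: BinaryObject stays the wire format,
// the row format only decides how a key-value pair is laid out in storage.
interface RowFormat {
    Row toRow(BinaryObject key, BinaryObject val); // write path
    BinaryObject keyFromRow(Row row);              // read path
    BinaryObject valueFromRow(Row row);
}

final class Row { final byte[] bytes; Row(byte[] bytes) { this.bytes = bytes; } } // stub for the sketch

// Skeleton of an Avro-backed implementation; only the Java node would need
// it, thin clients keep speaking plain BinaryObject.
class AvroRowFormat implements RowFormat {
    @Override public Row toRow(BinaryObject key, BinaryObject val) {
        // 1. Map the binary type's fields onto a pre-generated Avro schema.
        // 2. Serialize the values with Avro's binary encoder.
        // 3. Wrap the resulting bytes into a Row.
        throw new UnsupportedOperationException("sketch only");
    }

    @Override public BinaryObject keyFromRow(Row row) {
        throw new UnsupportedOperationException("sketch only");
    }

    @Override public BinaryObject valueFromRow(Row row) {
        throw new UnsupportedOperationException("sketch only");
    }
}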

On Wed, Nov 21, 2018 at 11:37 AM Denis Mekhanikov 
wrote:

> People often ask about possibility to store their data in that format, that
> they use in their applications.
> If you use Avro everywhere in your application, then why not store data in
> the same format in Ignite?
> So, how about making an interface, that would enlist all operations we
> need,
> and use this interface everywhere without relying on any specific
> implementation.
> *BinaryObject* looks like a suitable interface, but the only
> implementation, that you can get from Ignite
> is *BinaryObjectImpl*.
> I think, we should make Ignite extendible and provide capability to specify
> your own data format
> by implementing the corresponding interfaces.
> So, if you like JSONB or Protobuf or whatever else, you could enable a
> module for the corresponding
> format, and use it for storing the data.
>
> Denis
>
> ср, 21 нояб. 2018 г. в 10:10, Alexey Zinoviev :
>
> > I'd like @Vyacheslav Daradur approach.
> >
> > Maybe somebody could have a look at UnsafeRow in Spark
> >
> >
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
> > UnsafeRow is a concrete InternalRow that represents a mutable internal
> > raw-memory (and hence unsafe) binary row format.
> >
> > P.S. If somebody is interested in this apporach, I could share more
> > information
> >
> > вт, 20 нояб. 2018 г. в 11:33, Sergi Vladykin :
> >
> > > I really like Protobuf format. It is probably not what we need for O(1)
> > > fields access,
> > > but for compact data representation we can derive lots from there.
> > >
> > > Also IMO, restricting field type change is absolutely sane idea.
> > > The correct way to evolve schema in common case is to add new fields
> and
> > > gradually
> > > deprecate the old ones, if you can skip default/null fields in binary
> > > format this approach
> > > will not introduce any noticeable performance/size overhead.
> > >
> > > Sergi
> > >
> > > вт, 20 нояб. 2018 г. в 11:12, Vyacheslav Daradur  >:
> > >
> > > > I think, one of a possible way to reduce overhead and TCO - SQL
> Scheme
> > > > approach.
> > > >
> > > > That assumes that metadata will be stored separately from serialized
> > > > data to reduce size.
> > > > In this case, the most advantages of Binary Objects like access in
> > > > O(1) and access without deserialization may be achieved.
> > > > On Tue, Nov 20, 2018 at 10:56 AM Vladimir Ozerov <
> voze...@gridgain.com
> > >
> > > > wrote:
> > > > >
> > > > > Hi Alexey,
> > > > >
> > > > > Binary Objects only.
> > > > >
> > > > > On Tue, Nov 20, 2018 at 10:50 AM Alexey Zinoviev <
> > > zaleslaw@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Do we discuss here Core features only or the roadmap for all
> > > > components?
> > > > > >
> > > > > > вт, 20 нояб. 2018 г. в 10:05, Vladimir Ozerov <
> > voze...@gridgain.com
> > > >:
> > > > > >
> > > > > > > Igniters,
> > > > > > >
> > > > > > > It is very likely that Apache Ignite 3.0 will be released next
> > > year.
> > > > So
> > > > > > we
> > > > > > > need to start thinking about major product improvements. I'd
> like
> > > to
> > > > > > start
> > > > > > > with binary objects.
> > > > > > >
> > > > > > > Currently they are one of the main limiting factors for the
> > > product.
> > > > They
> > > > > > > are fat - 30+ bytes overhead on average, high TCO of Apache
> > Ignite
> > > > > > > comparing to other vendors. They are slow - not suitable for
> SQL
> > at
> > > > all.
> > > > > > >
> > > > > > > I would like to ask all of you who worked with binary objects
> to
> > > > share
> > > > > > your
> > > > > > > feedback and ideas, so that we understand how they should look
> > like
> > > > in AI
> > > > > > > 3.0. This is a brain storm - 

Re: [IMPORTANT] Future of Binary Objects

2018-11-21 Thread Igor Sapego
I want to offer several optimizations (a small sketch follows the list below):

1. If we store field metadata anyway, and are going to store bitmasks for
null fields, should we also exclude the "header" byte from object fields? We
can get the field type info from the metadata.

2. If we have consecutive fields of fixed length, we can avoid storing
offsets to these fields, as we can easily calculate them. We could even store
the precomputed offsets in metadata to improve performance.

3. If these two optimizations are adopted, it makes sense to mention in the
docs that it is highly recommended to write fixed-size types at the beginning
of the object.
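As a rough illustration of point 2 (the layout and field sizes below are
assumptions made for the sketch, not the current Ignite binary format):

// Hypothetical layout: all fixed-length fields come first, so their offsets
// can be derived from type metadata instead of being stored per object.
final class FixedLayoutMetadata {
    private final int[] offsets; // computed once per type, not per object

    /** @param fixedFieldSizes Field sizes in declaration order, e.g. {4, 8, 2} for int, long, short. */
    FixedLayoutMetadata(int[] fixedFieldSizes) {
        offsets = new int[fixedFieldSizes.length];

        int off = 0;
        for (int i = 0; i < fixedFieldSizes.length; i++) {
            offsets[i] = off; // offset relative to the start of the fixed-length section
            off += fixedFieldSizes[i];
        }
    }

    /** O(1) field lookup without a per-object offset table. */
    int offsetOf(int fieldIdx) {
        return offsets[fieldIdx];
    }
}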

Best Regards,
Igor


On Wed, Nov 21, 2018 at 11:37 AM Denis Mekhanikov 
wrote:

> People often ask about possibility to store their data in that format, that
> they use in their applications.
> If you use Avro everywhere in your application, then why not store data in
> the same format in Ignite?
> So, how about making an interface, that would enlist all operations we
> need,
> and use this interface everywhere without relying on any specific
> implementation.
> *BinaryObject* looks like a suitable interface, but the only
> implementation, that you can get from Ignite
> is *BinaryObjectImpl*.
> I think, we should make Ignite extendible and provide capability to specify
> your own data format
> by implementing the corresponding interfaces.
> So, if you like JSONB or Protobuf or whatever else, you could enable a
> module for the corresponding
> format, and use it for storing the data.
>
> Denis
>
> ср, 21 нояб. 2018 г. в 10:10, Alexey Zinoviev :
>
> > I'd like @Vyacheslav Daradur approach.
> >
> > Maybe somebody could have a look at UnsafeRow in Spark
> >
> >
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
> > UnsafeRow is a concrete InternalRow that represents a mutable internal
> > raw-memory (and hence unsafe) binary row format.
> >
> > P.S. If somebody is interested in this apporach, I could share more
> > information
> >
> > вт, 20 нояб. 2018 г. в 11:33, Sergi Vladykin :
> >
> > > I really like Protobuf format. It is probably not what we need for O(1)
> > > fields access,
> > > but for compact data representation we can derive lots from there.
> > >
> > > Also IMO, restricting field type change is absolutely sane idea.
> > > The correct way to evolve schema in common case is to add new fields
> and
> > > gradually
> > > deprecate the old ones, if you can skip default/null fields in binary
> > > format this approach
> > > will not introduce any noticeable performance/size overhead.
> > >
> > > Sergi
> > >
> > > вт, 20 нояб. 2018 г. в 11:12, Vyacheslav Daradur  >:
> > >
> > > > I think, one of a possible way to reduce overhead and TCO - SQL
> Scheme
> > > > approach.
> > > >
> > > > That assumes that metadata will be stored separately from serialized
> > > > data to reduce size.
> > > > In this case, the most advantages of Binary Objects like access in
> > > > O(1) and access without deserialization may be achieved.
> > > > On Tue, Nov 20, 2018 at 10:56 AM Vladimir Ozerov <
> voze...@gridgain.com
> > >
> > > > wrote:
> > > > >
> > > > > Hi Alexey,
> > > > >
> > > > > Binary Objects only.
> > > > >
> > > > > On Tue, Nov 20, 2018 at 10:50 AM Alexey Zinoviev <
> > > zaleslaw@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Do we discuss here Core features only or the roadmap for all
> > > > components?
> > > > > >
> > > > > > вт, 20 нояб. 2018 г. в 10:05, Vladimir Ozerov <
> > voze...@gridgain.com
> > > >:
> > > > > >
> > > > > > > Igniters,
> > > > > > >
> > > > > > > It is very likely that Apache Ignite 3.0 will be released next
> > > year.
> > > > So
> > > > > > we
> > > > > > > need to start thinking about major product improvements. I'd
> like
> > > to
> > > > > > start
> > > > > > > with binary objects.
> > > > > > >
> > > > > > > Currently they are one of the main limiting factors for the
> > > product.
> > > > They
> > > > > > > are fat - 30+ bytes overhead on average, high TCO of Apache
> > Ignite
> > > > > > > comparing to other vendors. They are slow - not suitable for
> SQL
> > at
> > > > all.
> > > > > > >
> > > > > > > I would like to ask all of you who worked with binary objects
> to
> > > > share
> > > > > > your
> > > > > > > feedback and ideas, so that we understand how they should look
> > like
> > > > in AI
> > > > > > > 3.0. This is a brain storm - let's accumulate ideas first and
> > > > minimize
> > > > > > > critics. Then we will work on ideas in separate topics.
> > > > > > >
> > > > > > > 1) Historical background
> > > > > > >
> > > > > > > BO were implemented around 2014 (Apache Ignite 1.5) when we
> > started
> > > > > > working
> > > > > > > on .NET and CPP clients. During design we had several ideas in
> > > mind:
> > > > > > > - ability to read object fields in O(1) without deserialization
> > > > > > > - interoperabillty between Java, .NET and CPP.
> > > > > > >
> > > > > > > Since then a number of 

Re: [IMPORTANT] Future of Binary Objects

2018-11-21 Thread Denis Mekhanikov
People often ask about the possibility to store their data in the same format
that they use in their applications.
If you use Avro everywhere in your application, then why not store data in
the same format in Ignite?
So, how about making an interface that would enlist all the operations we
need, and using this interface everywhere without relying on any specific
implementation?
*BinaryObject* looks like a suitable interface, but the only implementation
that you can get from Ignite is *BinaryObjectImpl*.
I think we should make Ignite extensible and provide the capability to
specify your own data format by implementing the corresponding interfaces.
So, if you like JSONB or Protobuf or whatever else, you could enable a
module for the corresponding format and use it for storing the data.

Denis

ср, 21 нояб. 2018 г. в 10:10, Alexey Zinoviev :

> I'd like @Vyacheslav Daradur approach.
>
> Maybe somebody could have a look at UnsafeRow in Spark
>
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
> UnsafeRow is a concrete InternalRow that represents a mutable internal
> raw-memory (and hence unsafe) binary row format.
>
> P.S. If somebody is interested in this apporach, I could share more
> information
>
> вт, 20 нояб. 2018 г. в 11:33, Sergi Vladykin :
>
> > I really like Protobuf format. It is probably not what we need for O(1)
> > fields access,
> > but for compact data representation we can derive lots from there.
> >
> > Also IMO, restricting field type change is absolutely sane idea.
> > The correct way to evolve schema in common case is to add new fields and
> > gradually
> > deprecate the old ones, if you can skip default/null fields in binary
> > format this approach
> > will not introduce any noticeable performance/size overhead.
> >
> > Sergi
> >
> > вт, 20 нояб. 2018 г. в 11:12, Vyacheslav Daradur :
> >
> > > I think, one of a possible way to reduce overhead and TCO - SQL Scheme
> > > approach.
> > >
> > > That assumes that metadata will be stored separately from serialized
> > > data to reduce size.
> > > In this case, the most advantages of Binary Objects like access in
> > > O(1) and access without deserialization may be achieved.
> > > On Tue, Nov 20, 2018 at 10:56 AM Vladimir Ozerov  >
> > > wrote:
> > > >
> > > > Hi Alexey,
> > > >
> > > > Binary Objects only.
> > > >
> > > > On Tue, Nov 20, 2018 at 10:50 AM Alexey Zinoviev <
> > zaleslaw@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Do we discuss here Core features only or the roadmap for all
> > > components?
> > > > >
> > > > > вт, 20 нояб. 2018 г. в 10:05, Vladimir Ozerov <
> voze...@gridgain.com
> > >:
> > > > >
> > > > > > Igniters,
> > > > > >
> > > > > > It is very likely that Apache Ignite 3.0 will be released next
> > year.
> > > So
> > > > > we
> > > > > > need to start thinking about major product improvements. I'd like
> > to
> > > > > start
> > > > > > with binary objects.
> > > > > >
> > > > > > Currently they are one of the main limiting factors for the
> > product.
> > > They
> > > > > > are fat - 30+ bytes overhead on average, high TCO of Apache
> Ignite
> > > > > > comparing to other vendors. They are slow - not suitable for SQL
> at
> > > all.
> > > > > >
> > > > > > I would like to ask all of you who worked with binary objects to
> > > share
> > > > > your
> > > > > > feedback and ideas, so that we understand how they should look
> like
> > > in AI
> > > > > > 3.0. This is a brain storm - let's accumulate ideas first and
> > > minimize
> > > > > > critics. Then we will work on ideas in separate topics.
> > > > > >
> > > > > > 1) Historical background
> > > > > >
> > > > > > BO were implemented around 2014 (Apache Ignite 1.5) when we
> started
> > > > > working
> > > > > > on .NET and CPP clients. During design we had several ideas in
> > mind:
> > > > > > - ability to read object fields in O(1) without deserialization
> > > > > > - interoperabillty between Java, .NET and CPP.
> > > > > >
> > > > > > Since then a number of other concepts were mixed to the cocktail:
> > > > > > - Affinity key fields
> > > > > > - Strict typing for existing fields (aka metadata)
> > > > > > - Binary Object as storage format
> > > > > >
> > > > > > 2) My proposals
> > > > > >
> > > > > > 2.1) Introduce "Data Row Format" interface
> > > > > > Binary Objects are terrible candidates for storage. Too fat, too
> > > slow.
> > > > > > Efficient storage typically has <10 bytes overhead per row (no
> > > metadata,
> > > > > no
> > > > > > length, no hash code, etc), allow supper-fast field access,
> support
> > > > > > different string formats (ASCII, UTF-8, etc), support different
> > > temporal
> > > > > > types (date, time, timestamp, timestamp with timezone, etc), and
> > > store
> > > > > > these types as efficiently as possible.
> > > > > >
> > > > > > What we need is to introduce an interface which will convert a
> pair
> > > of
> > > > > > key-value objects into a row. 

Re: [IMPORTANT] Future of Binary Objects

2018-11-21 Thread Vladimir Ozerov
Sergi,

Changing a field name just to change its type is not a user-friendly
approach, because it prevents operations like the following, which are
perfectly normal from the user's perspective:
ALTER TABLE my_table MODIFY COLUMN x BIGINT; // Was INT previously.

The command above is much simpler than:
1. ALTER TABLE my_table DROP COLUMN x;
2. ALTER TABLE my_table ADD COLUMN x1 BIGINT;
3. Change application code in multiple places to deal with the new field.

A binary object is essentially a collection of key-value pairs, no more than
that, so there is no need to restrict field types. All confusion will go
away if we introduce a "RowFormat" interface on the cache level, which I
explained briefly in previous emails. In this case we may have a "flexible"
row format allowing any type for the same field as long as the user
application tolerates this, and we can have a "strict" row format with
concrete fields, concrete types and concrete restrictions on them (NOT
NULL, CHECK, etc). The user can still create a binary object with any field
type, but it might be rejected at the storage level by the "RowFormat"
implementation.
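A tiny sketch of how a strict row format could treat a widened column at the
storage level; ColumnType and the normalization hook are hypothetical names
used only for illustration:

// Hypothetical: the column is declared BIGINT, so both Integer values written
// before ALTER TABLE ... MODIFY COLUMN and new Long values are accepted and
// normalized, while anything else is rejected by the strict format.
enum ColumnType { INT, BIGINT, VARCHAR }

final class StrictColumn {
    private final String name;
    private final ColumnType type;

    StrictColumn(String name, ColumnType type) {
        this.name = name;
        this.type = type;
    }

    Object normalize(Object fieldVal) {
        switch (type) {
            case BIGINT:
                if (fieldVal instanceof Long)
                    return fieldVal;
                if (fieldVal instanceof Integer) // old rows written as INT
                    return ((Integer)fieldVal).longValue();
                break;

            case INT:
                if (fieldVal instanceof Integer)
                    return fieldVal;
                break;

            case VARCHAR:
                if (fieldVal instanceof String)
                    return fieldVal;
                break;
        }

        throw new IllegalArgumentException("Column " + name + " rejects value: " + fieldVal);
    }
}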


On Tue, Nov 20, 2018 at 11:33 AM Sergi Vladykin 
wrote:

> I really like Protobuf format. It is probably not what we need for O(1)
> fields access,
> but for compact data representation we can derive lots from there.
>
> Also IMO, restricting field type change is absolutely sane idea.
> The correct way to evolve schema in common case is to add new fields and
> gradually
> deprecate the old ones, if you can skip default/null fields in binary
> format this approach
> will not introduce any noticeable performance/size overhead.
>
> Sergi
>
> вт, 20 нояб. 2018 г. в 11:12, Vyacheslav Daradur :
>
> > I think, one of a possible way to reduce overhead and TCO - SQL Scheme
> > approach.
> >
> > That assumes that metadata will be stored separately from serialized
> > data to reduce size.
> > In this case, the most advantages of Binary Objects like access in
> > O(1) and access without deserialization may be achieved.
> > On Tue, Nov 20, 2018 at 10:56 AM Vladimir Ozerov 
> > wrote:
> > >
> > > Hi Alexey,
> > >
> > > Binary Objects only.
> > >
> > > On Tue, Nov 20, 2018 at 10:50 AM Alexey Zinoviev <
> zaleslaw@gmail.com
> > >
> > > wrote:
> > >
> > > > Do we discuss here Core features only or the roadmap for all
> > components?
> > > >
> > > > вт, 20 нояб. 2018 г. в 10:05, Vladimir Ozerov  >:
> > > >
> > > > > Igniters,
> > > > >
> > > > > It is very likely that Apache Ignite 3.0 will be released next
> year.
> > So
> > > > we
> > > > > need to start thinking about major product improvements. I'd like
> to
> > > > start
> > > > > with binary objects.
> > > > >
> > > > > Currently they are one of the main limiting factors for the
> product.
> > They
> > > > > are fat - 30+ bytes overhead on average, high TCO of Apache Ignite
> > > > > comparing to other vendors. They are slow - not suitable for SQL at
> > all.
> > > > >
> > > > > I would like to ask all of you who worked with binary objects to
> > share
> > > > your
> > > > > feedback and ideas, so that we understand how they should look like
> > in AI
> > > > > 3.0. This is a brain storm - let's accumulate ideas first and
> > minimize
> > > > > critics. Then we will work on ideas in separate topics.
> > > > >
> > > > > 1) Historical background
> > > > >
> > > > > BO were implemented around 2014 (Apache Ignite 1.5) when we started
> > > > working
> > > > > on .NET and CPP clients. During design we had several ideas in
> mind:
> > > > > - ability to read object fields in O(1) without deserialization
> > > > > - interoperabillty between Java, .NET and CPP.
> > > > >
> > > > > Since then a number of other concepts were mixed to the cocktail:
> > > > > - Affinity key fields
> > > > > - Strict typing for existing fields (aka metadata)
> > > > > - Binary Object as storage format
> > > > >
> > > > > 2) My proposals
> > > > >
> > > > > 2.1) Introduce "Data Row Format" interface
> > > > > Binary Objects are terrible candidates for storage. Too fat, too
> > slow.
> > > > > Efficient storage typically has <10 bytes overhead per row (no
> > metadata,
> > > > no
> > > > > length, no hash code, etc), allow supper-fast field access, support
> > > > > different string formats (ASCII, UTF-8, etc), support different
> > temporal
> > > > > types (date, time, timestamp, timestamp with timezone, etc), and
> > store
> > > > > these types as efficiently as possible.
> > > > >
> > > > > What we need is to introduce an interface which will convert a pair
> > of
> > > > > key-value objects into a row. This row will be used to store data
> > and to
> > > > > get fields from it. Care about memory consumption, need SQL and
> > strict
> > > > > schema - use one format. Need flexibility and prefer key-value
> > access -
> > > > use
> > > > > another format which will store binary objects unchanged (current
> > > > > behavior).
> > > > >
> > > > > interface DataRowFormat {
> > > > > DataRow create(Object key, Object 

Re: [IMPORTANT] Future of Binary Objects

2018-11-21 Thread Vladimir Ozerov
Hi Alexey,

Yes, this looks really similar to the Postgres format as well - a bitset,
fixed-length fields, varlen fields. Most probably we need something similar.
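Purely for illustration, a sketch of such a row layout (the field set, sizes
and the decision to keep fixed slots for nulls are assumptions of the sketch,
not a proposed Ignite format):

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class RowLayoutSketch {
    // Row: [null bitmap: 1 byte][int a: 4][long b: 8][varlen c: 2-byte length + bytes]
    static byte[] write(Integer a, Long b, String c) {
        byte[] cBytes = c == null ? new byte[0] : c.getBytes(StandardCharsets.UTF_8);

        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + 8 + 2 + cBytes.length);

        byte nulls = 0;
        if (a == null) nulls |= 1;
        if (b == null) nulls |= 2;
        if (c == null) nulls |= 4;
        buf.put(nulls);

        buf.putInt(a == null ? 0 : a);   // fixed slots are written even for nulls
        buf.putLong(b == null ? 0L : b); // to keep offsets computable from metadata
        buf.putShort((short)cBytes.length);
        buf.put(cBytes);

        return buf.array();
    }
}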

On Wed, Nov 21, 2018 at 10:10 AM Alexey Zinoviev 
wrote:

> I'd like @Vyacheslav Daradur approach.
>
> Maybe somebody could have a look at UnsafeRow in Spark
>
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
> UnsafeRow is a concrete InternalRow that represents a mutable internal
> raw-memory (and hence unsafe) binary row format.
>
> P.S. If somebody is interested in this apporach, I could share more
> information
>
> вт, 20 нояб. 2018 г. в 11:33, Sergi Vladykin :
>
> > I really like Protobuf format. It is probably not what we need for O(1)
> > fields access,
> > but for compact data representation we can derive lots from there.
> >
> > Also IMO, restricting field type change is absolutely sane idea.
> > The correct way to evolve schema in common case is to add new fields and
> > gradually
> > deprecate the old ones, if you can skip default/null fields in binary
> > format this approach
> > will not introduce any noticeable performance/size overhead.
> >
> > Sergi
> >
> > вт, 20 нояб. 2018 г. в 11:12, Vyacheslav Daradur :
> >
> > > I think, one of a possible way to reduce overhead and TCO - SQL Scheme
> > > approach.
> > >
> > > That assumes that metadata will be stored separately from serialized
> > > data to reduce size.
> > > In this case, the most advantages of Binary Objects like access in
> > > O(1) and access without deserialization may be achieved.
> > > On Tue, Nov 20, 2018 at 10:56 AM Vladimir Ozerov  >
> > > wrote:
> > > >
> > > > Hi Alexey,
> > > >
> > > > Binary Objects only.
> > > >
> > > > On Tue, Nov 20, 2018 at 10:50 AM Alexey Zinoviev <
> > zaleslaw@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Do we discuss here Core features only or the roadmap for all
> > > components?
> > > > >
> > > > > вт, 20 нояб. 2018 г. в 10:05, Vladimir Ozerov <
> voze...@gridgain.com
> > >:
> > > > >
> > > > > > Igniters,
> > > > > >
> > > > > > It is very likely that Apache Ignite 3.0 will be released next
> > year.
> > > So
> > > > > we
> > > > > > need to start thinking about major product improvements. I'd like
> > to
> > > > > start
> > > > > > with binary objects.
> > > > > >
> > > > > > Currently they are one of the main limiting factors for the
> > product.
> > > They
> > > > > > are fat - 30+ bytes overhead on average, high TCO of Apache
> Ignite
> > > > > > comparing to other vendors. They are slow - not suitable for SQL
> at
> > > all.
> > > > > >
> > > > > > I would like to ask all of you who worked with binary objects to
> > > share
> > > > > your
> > > > > > feedback and ideas, so that we understand how they should look
> like
> > > in AI
> > > > > > 3.0. This is a brain storm - let's accumulate ideas first and
> > > minimize
> > > > > > critics. Then we will work on ideas in separate topics.
> > > > > >
> > > > > > 1) Historical background
> > > > > >
> > > > > > BO were implemented around 2014 (Apache Ignite 1.5) when we
> started
> > > > > working
> > > > > > on .NET and CPP clients. During design we had several ideas in
> > mind:
> > > > > > - ability to read object fields in O(1) without deserialization
> > > > > > - interoperabillty between Java, .NET and CPP.
> > > > > >
> > > > > > Since then a number of other concepts were mixed to the cocktail:
> > > > > > - Affinity key fields
> > > > > > - Strict typing for existing fields (aka metadata)
> > > > > > - Binary Object as storage format
> > > > > >
> > > > > > 2) My proposals
> > > > > >
> > > > > > 2.1) Introduce "Data Row Format" interface
> > > > > > Binary Objects are terrible candidates for storage. Too fat, too
> > > slow.
> > > > > > Efficient storage typically has <10 bytes overhead per row (no
> > > metadata,
> > > > > no
> > > > > > length, no hash code, etc), allow supper-fast field access,
> support
> > > > > > different string formats (ASCII, UTF-8, etc), support different
> > > temporal
> > > > > > types (date, time, timestamp, timestamp with timezone, etc), and
> > > store
> > > > > > these types as efficiently as possible.
> > > > > >
> > > > > > What we need is to introduce an interface which will convert a
> pair
> > > of
> > > > > > key-value objects into a row. This row will be used to store data
> > > and to
> > > > > > get fields from it. Care about memory consumption, need SQL and
> > > strict
> > > > > > schema - use one format. Need flexibility and prefer key-value
> > > access -
> > > > > use
> > > > > > another format which will store binary objects unchanged (current
> > > > > > behavior).
> > > > > >
> > > > > > interface DataRowFormat {
> > > > > > DataRow create(Object key, Object value); // primitives or
> > binary
> > > > > > objects
> > > > > > DataRowMetadata metadata();
> > > > > > }
> > > > > >
> > > > > > 2.2) Remove affinity field from metadata
> > > > > 

Re: [IMPORTANT] Future of Binary Objects

2018-11-21 Thread Vladimir Ozerov
Vyacheslav,

Metadata is already stored separately. An object only contains a 4-byte
reference to that metadata (aka "schema ID") and offsets to be able to find
fields quickly. But if we separate the row format from the binary format, we
may be able to reduce it even further to some extent.
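A minimal sketch of the general idea - a shared registry keyed by schema ID,
with objects carrying only the ID; the class names are illustrative, not
Ignite internals:

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Schemas are registered once per distinct field set and shared by all
// objects, so each object only needs the 4-byte schemaId.
final class SchemaRegistry {
    private final Map<Integer, List<String>> schemas = new ConcurrentHashMap<>();

    void register(int schemaId, List<String> fieldNames) {
        schemas.putIfAbsent(schemaId, fieldNames);
    }

    /** Resolves field names for an object that carries only its schemaId. */
    List<String> fields(int schemaId) {
        return schemas.get(schemaId);
    }
}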

On Tue, Nov 20, 2018 at 11:12 AM Vyacheslav Daradur 
wrote:

> I think, one of a possible way to reduce overhead and TCO - SQL Scheme
> approach.
>
> That assumes that metadata will be stored separately from serialized
> data to reduce size.
> In this case, the most advantages of Binary Objects like access in
> O(1) and access without deserialization may be achieved.
> On Tue, Nov 20, 2018 at 10:56 AM Vladimir Ozerov 
> wrote:
> >
> > Hi Alexey,
> >
> > Binary Objects only.
> >
> > On Tue, Nov 20, 2018 at 10:50 AM Alexey Zinoviev  >
> > wrote:
> >
> > > Do we discuss here Core features only or the roadmap for all
> components?
> > >
> > > вт, 20 нояб. 2018 г. в 10:05, Vladimir Ozerov :
> > >
> > > > Igniters,
> > > >
> > > > It is very likely that Apache Ignite 3.0 will be released next year.
> So
> > > we
> > > > need to start thinking about major product improvements. I'd like to
> > > start
> > > > with binary objects.
> > > >
> > > > Currently they are one of the main limiting factors for the product.
> They
> > > > are fat - 30+ bytes overhead on average, high TCO of Apache Ignite
> > > > comparing to other vendors. They are slow - not suitable for SQL at
> all.
> > > >
> > > > I would like to ask all of you who worked with binary objects to
> share
> > > your
> > > > feedback and ideas, so that we understand how they should look like
> in AI
> > > > 3.0. This is a brain storm - let's accumulate ideas first and
> minimize
> > > > critics. Then we will work on ideas in separate topics.
> > > >
> > > > 1) Historical background
> > > >
> > > > BO were implemented around 2014 (Apache Ignite 1.5) when we started
> > > working
> > > > on .NET and CPP clients. During design we had several ideas in mind:
> > > > - ability to read object fields in O(1) without deserialization
> > > > - interoperabillty between Java, .NET and CPP.
> > > >
> > > > Since then a number of other concepts were mixed to the cocktail:
> > > > - Affinity key fields
> > > > - Strict typing for existing fields (aka metadata)
> > > > - Binary Object as storage format
> > > >
> > > > 2) My proposals
> > > >
> > > > 2.1) Introduce "Data Row Format" interface
> > > > Binary Objects are terrible candidates for storage. Too fat, too
> slow.
> > > > Efficient storage typically has <10 bytes overhead per row (no
> metadata,
> > > no
> > > > length, no hash code, etc), allow supper-fast field access, support
> > > > different string formats (ASCII, UTF-8, etc), support different
> temporal
> > > > types (date, time, timestamp, timestamp with timezone, etc), and
> store
> > > > these types as efficiently as possible.
> > > >
> > > > What we need is to introduce an interface which will convert a pair
> of
> > > > key-value objects into a row. This row will be used to store data
> and to
> > > > get fields from it. Care about memory consumption, need SQL and
> strict
> > > > schema - use one format. Need flexibility and prefer key-value
> access -
> > > use
> > > > another format which will store binary objects unchanged (current
> > > > behavior).
> > > >
> > > > interface DataRowFormat {
> > > > DataRow create(Object key, Object value); // primitives or binary
> > > > objects
> > > > DataRowMetadata metadata();
> > > > }
> > > >
> > > > 2.2) Remove affinity field from metadata
> > > > Affinity rules are governed by cache, not type. We should remove
> > > > "affintiyFieldName" from metadata.
> > > >
> > > > 2.3) Remove restrictions on changing field type
> > > > I do not know why we did that in the first place. This restriction
> > > prevents
> > > > type evolution and confuses users.
> > > >
> > > > 2.4) Use bitmaps for "null" and default values and for fixed-length
> > > fields,
> > > > put fixed-length fields before variable-length.
> > > > Motivation: to save space.
> > > >
> > > > What else? Please share your ideas.
> > > >
> > > > Vladimir.
> > > >
> > >
>
>
>
> --
> Best Regards, Vyacheslav D.
>


Re: [IMPORTANT] Future of Binary Objects

2018-11-20 Thread Alexey Zinoviev
I like @Vyacheslav Daradur's approach.

Maybe somebody could have a look at UnsafeRow in Spark
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
UnsafeRow is a concrete InternalRow that represents a mutable internal
raw-memory (and hence unsafe) binary row format.

P.S. If somebody is interested in this approach, I could share more
information

вт, 20 нояб. 2018 г. в 11:33, Sergi Vladykin :

> I really like Protobuf format. It is probably not what we need for O(1)
> fields access,
> but for compact data representation we can derive lots from there.
>
> Also IMO, restricting field type change is absolutely sane idea.
> The correct way to evolve schema in common case is to add new fields and
> gradually
> deprecate the old ones, if you can skip default/null fields in binary
> format this approach
> will not introduce any noticeable performance/size overhead.
>
> Sergi
>
> вт, 20 нояб. 2018 г. в 11:12, Vyacheslav Daradur :
>
> > I think, one of a possible way to reduce overhead and TCO - SQL Scheme
> > approach.
> >
> > That assumes that metadata will be stored separately from serialized
> > data to reduce size.
> > In this case, the most advantages of Binary Objects like access in
> > O(1) and access without deserialization may be achieved.
> > On Tue, Nov 20, 2018 at 10:56 AM Vladimir Ozerov 
> > wrote:
> > >
> > > Hi Alexey,
> > >
> > > Binary Objects only.
> > >
> > > On Tue, Nov 20, 2018 at 10:50 AM Alexey Zinoviev <
> zaleslaw@gmail.com
> > >
> > > wrote:
> > >
> > > > Do we discuss here Core features only or the roadmap for all
> > components?
> > > >
> > > > вт, 20 нояб. 2018 г. в 10:05, Vladimir Ozerov  >:
> > > >
> > > > > Igniters,
> > > > >
> > > > > It is very likely that Apache Ignite 3.0 will be released next
> year.
> > So
> > > > we
> > > > > need to start thinking about major product improvements. I'd like
> to
> > > > start
> > > > > with binary objects.
> > > > >
> > > > > Currently they are one of the main limiting factors for the
> product.
> > They
> > > > > are fat - 30+ bytes overhead on average, high TCO of Apache Ignite
> > > > > comparing to other vendors. They are slow - not suitable for SQL at
> > all.
> > > > >
> > > > > I would like to ask all of you who worked with binary objects to
> > share
> > > > your
> > > > > feedback and ideas, so that we understand how they should look like
> > in AI
> > > > > 3.0. This is a brain storm - let's accumulate ideas first and
> > minimize
> > > > > critics. Then we will work on ideas in separate topics.
> > > > >
> > > > > 1) Historical background
> > > > >
> > > > > BO were implemented around 2014 (Apache Ignite 1.5) when we started
> > > > working
> > > > > on .NET and CPP clients. During design we had several ideas in
> mind:
> > > > > - ability to read object fields in O(1) without deserialization
> > > > > - interoperabillty between Java, .NET and CPP.
> > > > >
> > > > > Since then a number of other concepts were mixed to the cocktail:
> > > > > - Affinity key fields
> > > > > - Strict typing for existing fields (aka metadata)
> > > > > - Binary Object as storage format
> > > > >
> > > > > 2) My proposals
> > > > >
> > > > > 2.1) Introduce "Data Row Format" interface
> > > > > Binary Objects are terrible candidates for storage. Too fat, too
> > slow.
> > > > > Efficient storage typically has <10 bytes overhead per row (no
> > metadata,
> > > > no
> > > > > length, no hash code, etc), allow supper-fast field access, support
> > > > > different string formats (ASCII, UTF-8, etc), support different
> > temporal
> > > > > types (date, time, timestamp, timestamp with timezone, etc), and
> > store
> > > > > these types as efficiently as possible.
> > > > >
> > > > > What we need is to introduce an interface which will convert a pair
> > of
> > > > > key-value objects into a row. This row will be used to store data
> > and to
> > > > > get fields from it. Care about memory consumption, need SQL and
> > strict
> > > > > schema - use one format. Need flexibility and prefer key-value
> > access -
> > > > use
> > > > > another format which will store binary objects unchanged (current
> > > > > behavior).
> > > > >
> > > > > interface DataRowFormat {
> > > > > DataRow create(Object key, Object value); // primitives or
> binary
> > > > > objects
> > > > > DataRowMetadata metadata();
> > > > > }
> > > > >
> > > > > 2.2) Remove affinity field from metadata
> > > > > Affinity rules are governed by cache, not type. We should remove
> > > > > "affintiyFieldName" from metadata.
> > > > >
> > > > > 2.3) Remove restrictions on changing field type
> > > > > I do not know why we did that in the first place. This restriction
> > > > prevents
> > > > > type evolution and confuses users.
> > > > >
> > > > > 2.4) Use bitmaps for "null" and default values and for fixed-length
> > > > fields,
> > > > > put fixed-length fields before variable-length.
> > > > > Motivation: to 

Re: [IMPORTANT] Future of Binary Objects

2018-11-20 Thread Sergi Vladykin
I really like the Protobuf format. It is probably not what we need for O(1)
field access, but for compact data representation we can derive a lot from
there.

Also IMO, restricting field type changes is an absolutely sane idea.
The correct way to evolve a schema in the common case is to add new fields
and gradually deprecate the old ones. If you can skip default/null fields in
the binary format, this approach will not introduce any noticeable
performance/size overhead.
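For reference, a small sketch of the Protobuf-style varint encoding mentioned
above (plain unsigned varints only; field tags and zigzag encoding are left
out):

import java.io.ByteArrayOutputStream;

final class Varint {
    // Each byte carries 7 payload bits; the high bit means "more bytes follow".
    static byte[] encode(long val) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        while ((val & ~0x7FL) != 0) {
            out.write((int)((val & 0x7F) | 0x80));
            val >>>= 7;
        }
        out.write((int)val);

        return out.toByteArray();
    }
}
// encode(1) takes 1 byte, encode(300) takes 2 bytes - small values stay small.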

Sergi



Re: [IMPORTANT] Future of Binary Objects

2018-11-20 Thread Vyacheslav Daradur
I think one possible way to reduce overhead and TCO is the SQL schema approach.

It assumes that metadata is stored separately from the serialized data to
reduce its size. In this case the main advantages of Binary Objects, such as
O(1) field access and access without deserialization, can still be achieved.
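
For illustration, a rough Java sketch of that separation (all names are
hypothetical, not an Ignite API): schema metadata is registered once per type,
each row carries only a schema id plus field data, and field offsets are
computed from the shared schema, so reads stay O(1):

import java.nio.ByteBuffer;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: per-type metadata lives in a registry, not in every row.
// Each row stores only a 4-byte schema id followed by fixed-width field slots.
public class SchemaRegistry {
    public static class Schema {
        final int id;
        final List<String> fields;      // ordered names of 8-byte fields, for simplicity

        public Schema(int id, List<String> fields) {
            this.id = id;
            this.fields = fields;
        }

        int offsetOf(String field) {
            int idx = fields.indexOf(field);
            if (idx < 0)
                throw new IllegalArgumentException("Unknown field: " + field);
            return Integer.BYTES + idx * Long.BYTES;   // row header + preceding fixed slots
        }
    }

    private final Map<Integer, Schema> schemas = new ConcurrentHashMap<>();

    public void register(Schema schema) {
        schemas.put(schema.id, schema);
    }

    // O(1) read: the offset comes from the shared schema, nothing is deserialized.
    public long readLong(byte[] row, String field) {
        ByteBuffer buf = ByteBuffer.wrap(row);
        Schema schema = schemas.get(buf.getInt(0));    // schema id is the row header
        return buf.getLong(schema.offsetOf(field));
    }
}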



-- 
Best Regards, Vyacheslav D.


Re: [IMPORTANT] Future of Binary Objects

2018-11-19 Thread Vladimir Ozerov
Hi Alexey,

Binary Objects only.



Re: [IMPORTANT] Future of Binary Objects

2018-11-19 Thread Alexey Zinoviev
Are we discussing only Core features here, or the roadmap for all components?



[IMPORTANT] Future of Binary Objects

2018-11-19 Thread Vladimir Ozerov
Igniters,

It is very likely that Apache Ignite 3.0 will be released next year. So we
need to start thinking about major product improvements. I'd like to start
with binary objects.

Currently they are one of the main limiting factors for the product. They
are fat - 30+ bytes of overhead on average, which drives up the TCO of Apache
Ignite compared to other vendors. They are slow - not suitable for SQL at all.

I would like to ask all of you who have worked with binary objects to share
your feedback and ideas, so that we understand how they should look in AI
3.0. This is a brainstorm - let's accumulate ideas first and minimize
criticism. Then we will work on the ideas in separate topics.

1) Historical background

BO were implemented around 2014 (Apache Ignite 1.5) when we started working
on the .NET and CPP clients. During design we had several ideas in mind:
- ability to read object fields in O(1) without deserialization
- interoperability between Java, .NET and CPP.

Since then a number of other concepts have been mixed into the cocktail:
- Affinity key fields
- Strict typing for existing fields (aka metadata)
- Binary Object as storage format

2) My proposals

2.1) Introduce "Data Row Format" interface
Binary Objects are terrible candidates for storage. Too fat, too slow.
Efficient storage typically has <10 bytes of overhead per row (no metadata, no
length, no hash code, etc.), allows super-fast field access, supports
different string formats (ASCII, UTF-8, etc.), supports different temporal
types (date, time, timestamp, timestamp with timezone, etc.), and stores
these types as efficiently as possible.
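
For example (purely illustrative, not an Ignite format), a date can be packed
into a 4-byte int so that the row stays small and date parts remain cheap to
extract:

import java.time.LocalDate;

// Illustrative sketch: year in the high bits, month in 4 bits, day in 5 bits.
public final class PackedDate {
    public static int pack(LocalDate d) {
        return (d.getYear() << 9) | (d.getMonthValue() << 5) | d.getDayOfMonth();
    }

    public static int year(int packed)  { return packed >>> 9; }
    public static int month(int packed) { return (packed >>> 5) & 0x0F; }
    public static int day(int packed)   { return packed & 0x1F; }

    public static void main(String[] args) {
        int packed = pack(LocalDate.of(2018, 11, 20));
        System.out.println(year(packed) + "-" + month(packed) + "-" + day(packed)); // 2018-11-20
    }
}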

What we need is to introduce an interface which converts a pair of key and
value objects into a row. This row will be used to store data and to read
fields from it. If you care about memory consumption and need SQL and a strict
schema - use one format. If you need flexibility and prefer key-value access -
use another format which stores binary objects unchanged (current behavior).

interface DataRowFormat {
    DataRow create(Object key, Object value); // primitives or binary objects

    DataRowMetadata metadata();
}
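
To make the split more concrete, here is a hedged sketch of two possible
implementations (all names below are hypothetical, not an agreed API; the
interface is repeated so the sketch compiles on its own):

// Hypothetical sketch of the pluggable row format idea.
interface DataRow {
    byte[] bytes();                      // what actually goes to page memory / disk
}

interface DataRowMetadata { }

interface DataRowFormat {
    DataRow create(Object key, Object value);
    DataRowMetadata metadata();
}

// Current behavior: key and value are kept as unchanged binary objects.
class PassThroughRowFormat implements DataRowFormat {
    @Override public DataRow create(Object key, Object value) {
        // would simply wrap the already-serialized binary objects
        throw new UnsupportedOperationException("sketch only");
    }
    @Override public DataRowMetadata metadata() {
        return new DataRowMetadata() { };
    }
}

// SQL-oriented behavior: fields laid out by a strict, shared schema.
class CompactSchemaRowFormat implements DataRowFormat {
    @Override public DataRow create(Object key, Object value) {
        // would encode fields per the cache schema: null/default bitmap,
        // fixed-length fields first, variable-length fields at the end
        throw new UnsupportedOperationException("sketch only");
    }
    @Override public DataRowMetadata metadata() {
        return new DataRowMetadata() { };
    }
}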

2.2) Remove affinity field from metadata
Affinity rules are governed by cache, not type. We should remove
"affintiyFieldName" from metadata.

2.3) Remove restrictions on changing field type
I do not know why we did that in the first place. This restriction prevents
type evolution and confuses users.

2.4) Use bitmaps for "null" and default values and for fixed-length fields,
put fixed-length fields before variable-length.
Motivation: to save space.
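
A minimal sketch of such a layout, assuming a simple example type with one
mandatory long, one nullable int and one nullable string (illustrative only,
not a committed design): a null costs one bit, fixed-length fields come first
at known offsets, and the variable-length field goes last:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative sketch: a 1-byte null bitmap; fixed-length fields come first
// and null fields are not written at all, so a null costs one bit. The reader
// derives each offset from the bitmap and the fixed field widths.
public final class CompactRow {
    // bit 0 -> "age" is null, bit 1 -> "name" is null
    private static final int AGE_NULL = 0b01, NAME_NULL = 0b10;

    // Layout: [null bitmap: 1 byte][long id][int age, if present][name bytes, if present].
    public static byte[] write(long id, Integer age, String name) {
        byte[] nameBytes = name == null ? new byte[0] : name.getBytes(StandardCharsets.UTF_8);
        int size = 1 + 8 + (age == null ? 0 : 4) + nameBytes.length;
        ByteBuffer buf = ByteBuffer.allocate(size);

        byte bitmap = 0;
        if (age == null)  bitmap |= AGE_NULL;
        if (name == null) bitmap |= NAME_NULL;

        buf.put(bitmap);
        buf.putLong(id);                 // fixed-length fields first...
        if (age != null) buf.putInt(age);
        buf.put(nameBytes);              // ...variable-length fields last
        return buf.array();
    }

    public static Integer readAge(byte[] row) {
        ByteBuffer buf = ByteBuffer.wrap(row);
        if ((buf.get(0) & AGE_NULL) != 0)
            return null;
        return buf.getInt(1 + 8);        // offset known: preceding fields are fixed-length
    }
}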

What else? Please share your ideas.

Vladimir.