Re: Data compression in Ignite 2.0

2017-08-28 Thread Vladimir Ozerov
Hi Vyacheslav,

Yes, I would suggest you do so.

On Fri, Aug 25, 2017 at 2:51 PM, Vyacheslav Daradur 
wrote:

> Hi, should I close the initial ticket [1] as "Won't Fix" and add a link to
> the new discussion about storage compression [2] in the comments?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-3592
> [2] http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-td20679.html


Re: Data compression in Ignite 2.0

2017-08-25 Thread Vyacheslav Daradur
Hi, should I close the initial ticket [1] as "Won't Fix" and add a link to
the new discussion about storage compression [2] in the comments?

[1] https://issues.apache.org/jira/browse/IGNITE-3592
[2]
http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-td20679.html




-- 
Best Regards, Vyacheslav D.


Re: Data compression in Ignite 2.0

2017-08-09 Thread Vyacheslav Daradur
Vladimir, thank you for the detailed explanation.

I think I've understood the main idea of the described storage compression.

I'll join the new discussion after researching the given material and
completing the varint optimization [1].

[1] https://issues.apache.org/jira/browse/IGNITE-5097
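For readers unfamiliar with the varint optimization referenced above: the idea is to encode small integers in fewer bytes than a fixed-width int. A minimal LEB128-style sketch (an illustration of the general technique, not Ignite's actual implementation from the ticket above) might look like this:

```java
import java.io.ByteArrayOutputStream;

// Sketch of variable-length int encoding: each byte carries 7 bits of the
// value, low group first; the high bit marks "more bytes follow".
public class VarIntDemo {
    public static byte[] writeVarInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // 7 data bits + continuation bit
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    public static int readVarInt(byte[] buf) {
        int value = 0, shift = 0;
        for (byte b : buf) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) // no continuation bit: last byte
                break;
            shift += 7;
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(writeVarInt(127).length); // 1 byte instead of 4
        System.out.println(writeVarInt(300).length); // 2 bytes instead of 4
        System.out.println(readVarInt(writeVarInt(300))); // 300
    }
}
```

Since most field lengths, offsets, and array sizes are small, encoding them this way saves space in both network messages and stored pages.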




-- 
Best Regards, Vyacheslav D.


Re: Data compression in Ignite 2.0

2017-08-02 Thread Alexey Kuznetsov
Vova,

Finally we are back to my initial idea - to look at how the big databases
compress data :)

Just a reminder of how IBM DB2 does this [1].

[1] http://www.ibm.com/developerworks/data/library/techarticle/dm-1205db210compression/


-- 
Alexey Kuznetsov


Re: Data compression in Ignite 2.0

2017-08-01 Thread dsetrakyan
I would prefer that we reuse an existing compression protocol, but at the
table level.

If that is not possible, then we should go with a shared-mapping approach.
Any idea how hard that would be?

D.


Re: Data compression in Ignite 2.0

2017-08-01 Thread Vladimir Ozerov
Vyacheslav,

This is not about my needs, but about the product :-) BinaryObject is a
central entity used for both data transfer and data storage. This is both
good and bad at the same time.

The good thing is that as we optimize the binary protocol, we improve both
network and storage performance at the same time. We have at least 3 things
which will be included in the product soon: varint encoding [1], optimized
string encoding [2] and the null-field optimization [3]. The bad thing is
that the binary object format is not well suited for data storage
optimizations, including compression. For example, one good compression
technique is to organize data in a column-store format, or to introduce a
shared "dictionary" with unique values at the cache level. In both cases N
equal values are not stored N times. Instead, we store one value and N
references to it. This way 2x-10x compression is possible depending on the
workload type. The binary object protocol with some compression on top of
it cannot give such an improvement, because it would compress data in
individual objects instead of compressing the whole cache data in a single
context.

That said, I propose to give up adding compression to BinaryObject. This is
a dead end. Instead, we should:
1) Optimize the protocol itself to be more compact, as described in the
aforementioned Ignite tickets
2) Start a new discussion about storage compression

You can read papers from other vendors to get a better understanding of
possible compression options. E.g. Oracle has a lot of compression
techniques, including heat maps, background compression, per-block
compression, data dictionaries, etc. [4].
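As an illustration of the shared-dictionary idea, here is a minimal sketch of storing each distinct value once and keeping small integer references in the entries. The class and method names (ValueDictionary, intern, resolve) are hypothetical, not Ignite API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a cache-level shared dictionary: N equal values are stored
// once, and entries hold small integer codes referring to them.
public class ValueDictionary {
    private final Map<String, Integer> codes = new HashMap<>();
    private final List<String> values = new ArrayList<>();

    // Return the code for a value, registering it on first sight.
    public int intern(String value) {
        return codes.computeIfAbsent(value, v -> {
            values.add(v);
            return values.size() - 1;
        });
    }

    public String resolve(int code) {
        return values.get(code);
    }

    public int distinctValues() {
        return values.size();
    }

    public static void main(String[] args) {
        ValueDictionary dict = new ValueDictionary();
        // 6 entries, but only 2 distinct payloads are actually stored.
        String[] rows = {"ACTIVE", "ACTIVE", "INACTIVE", "ACTIVE", "INACTIVE", "ACTIVE"};
        int[] refs = new int[rows.length];
        for (int i = 0; i < rows.length; i++)
            refs[i] = dict.intern(rows[i]);
        System.out.println(dict.distinctValues()); // 2
        System.out.println(dict.resolve(refs[2])); // INACTIVE
    }
}
```

The point of the argument above is that this kind of deduplication needs visibility across the whole cache; compressing each binary object in isolation can never find the repetition between objects.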

[1] https://issues.apache.org/jira/browse/IGNITE-5097
[2] https://issues.apache.org/jira/browse/IGNITE-5655
[3] https://issues.apache.org/jira/browse/IGNITE-3939
[4] http://www.oracle.com/technetwork/database/options/compression/advanced-compression-wp-12c-1896128.pdf

Vladimir.


On Tue, Jul 11, 2017 at 6:56 PM, Vyacheslav Daradur 
wrote:

> Hi Igniters!
>
> I'd like to continue developing and discussing compression in Ignite.
>
> Vladimir, could you propose a design for the compression feature in
> Ignite that suits you?
>
> 2017-06-15 16:13 GMT+03:00 Vyacheslav Daradur :
>
>> Hi Igniters.
>>
>> Vladimir, I want to propose another design of an implementation of the
>> per-field compression.
>>
>> 1) We will add new step in the method prepareForCache (for example) of
>> CacheObject, or in GridCacheMapEntry.
>>
>> At the step, after marshalling of an object, we will compress fields of
>> the object which described in advance.
>> User will describe class fields which he wants to compess in an another
>> entity like Metadata.
>>
>> For compression, we will introduce another entity, for example
>> CompressionProcessor, which will work with bytes array (marshalled object).
>> The entity will read bytes array of described fields, compress it and
>> rewrite binary representation of the whole object.
>> After processing the object will be put in the cache.
>>
>> In this case design not to relate to binary infrastructure.
>> But there is big overhead to heap-memory for the buffer.
>>
>> 2) Another solution is to compress bytes array of whole object on copying
>> to off-heap.
>> But, in this case I don't understand yet, how to provide support of
>> querying and indexing.
>>
>>
>> 2017-06-09 11:21 GMT+03:00 Sergey Kozlov :
>>
>>> Hi
>>>
>>> * "Per-field compression" is applicable for huge BLOB fields and will
>>> impose the restrictions like unable ot index such fields, slower getting
>>> data, potential OOM issues if compression ration is too high.
>>> But for some cases it makes sense
>>>
>>> On Fri, Jun 9, 2017 at 11:11 AM, Антон Чураев 
>>> wrote:
>>>
>>> > Seems that Dmitry is referring to transparent data encryption. It is
>>> used
>>> > throughout the whale database industry.
>>> >
>>> > 2017-06-09 10:50 GMT+03:00 Vladimir Ozerov :
>>> >
>>> > > Dima,
>>> > >
>>> > > Encryption of certain fields is as bad as compression. First, it is a
>>> > huge
>>> > > change, which makes already complex binary protocol even more
>>> complex.
>>> > > Second, it have to be ported to CPP, .NET platforms, as well as to
>>> JDBC
>>> > and
>>> > > ODBC.
>>> > > Last, but the most important - this is not our headache to encrypt
>>> > > sensitive data. This is user responsibility. Nobody in a sane mind
>>> will
>>> > > store passwords in plain form. Instead, user should encrypt it on his
>>> > own,
>>> > > choosing proper encryption parameters - algorithms, key lengths,
>>> salts,
>>> > > etc.. How are you going to expose this in API or configuration?
>>> > >
>>> > > We should not implement data encryption on binary level, this is out
>>> of
>>> > > question. Encryption should be implemented on application level (user
>>> > > efforts), transport layer (SSL - we already have it), and possibly on
>>> > > disk-level (there are tools 

Re: Data compression in Ignite 2.0

2017-06-15 Thread Vyacheslav Daradur
Hi Igniters.

Vladimir, I want to propose another design for an implementation of
per-field compression.

1) We will add a new step in the prepareForCache method (for example) of
CacheObject, or in GridCacheMapEntry.

At that step, after marshalling an object, we will compress the fields of
the object which were described in advance.
The user will describe the class fields he wants to compress in a separate
entity, like Metadata.

For compression, we will introduce another entity, for example a
CompressionProcessor, which will work with the byte array (the marshalled
object).
It will read the byte arrays of the described fields, compress them, and
rewrite the binary representation of the whole object.
After processing, the object will be put in the cache.

In this case the design is not tied to the binary infrastructure.
But there is a big heap-memory overhead for the buffer.

2) Another solution is to compress the byte array of the whole object when
copying it to off-heap.
But in this case I don't yet understand how to support querying and
indexing.
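To make the proposal concrete, here is a rough sketch of what the per-field step of the hypothetical CompressionProcessor could look like, using the JDK's Deflater/Inflater. The class name and its placement in the write path are assumptions for illustration, not actual Ignite internals:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch: compress the marshalled bytes of a single field so the rewritten
// binary representation of the object stores the compressed form.
public class FieldCompressor {
    public static byte[] compressField(byte[] fieldBytes) {
        Deflater deflater = new Deflater();
        deflater.setInput(fieldBytes);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished())
            out.write(buf, 0, deflater.deflate(buf));
        deflater.end();
        return out.toByteArray();
    }

    public static byte[] decompressField(byte[] compressed) throws Exception {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!inflater.finished())
            out.write(buf, 0, inflater.inflate(buf));
        inflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // A highly repetitive field value compresses well.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++)
            sb.append("some-repeated-field-value;");
        byte[] field = sb.toString().getBytes("UTF-8");

        byte[] packed = compressField(field);
        System.out.println(packed.length < field.length); // true
        System.out.println(new String(decompressField(packed), "UTF-8")
            .equals(sb.toString())); // true: lossless round trip
    }
}
```

Note how this sketch also makes the stated drawback visible: both directions allocate heap buffers for the intermediate bytes, which is exactly the overhead mentioned above.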



Re: Data compression in Ignite 2.0

2017-06-09 Thread Sergey Kozlov
Hi

* "Per-field compression" is applicable for huge BLOB fields and will
impose restrictions: being unable to index such fields, slower data
retrieval, and potential OOM issues if the compression ratio is too high.
But for some cases it makes sense.

On Fri, Jun 9, 2017 at 11:11 AM, Антон Чураев  wrote:

> Seems that Dmitry is referring to transparent data encryption. It is used
> throughout the whole database industry.
>
> 2017-06-09 10:50 GMT+03:00 Vladimir Ozerov :
>
> > Dima,
> >
> > Encryption of certain fields is as bad as compression. First, it is a
> > huge change, which makes the already complex binary protocol even more
> > complex. Second, it has to be ported to the CPP and .NET platforms, as
> > well as to JDBC and ODBC.
> > Last, but most important - it is not our headache to encrypt sensitive
> > data. That is the user's responsibility. Nobody in their sane mind will
> > store passwords in plain form. Instead, the user should encrypt them on
> > his own, choosing proper encryption parameters - algorithms, key
> > lengths, salts, etc. How are you going to expose this in an API or
> > configuration?
> >
> > We should not implement data encryption at the binary level; this is
> > out of the question. Encryption should be implemented at the
> > application level (user efforts), the transport layer (SSL - we already
> > have it), and possibly at the disk level (there are tools for this
> > already).
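As a sketch of the application-level approach described in the quote above, a user could encrypt a sensitive field with the standard JCA API before the object is ever put into the cache. This is illustrative code under that assumption, not an Ignite feature; in real use the key would come from a key store and the IV must be unique per encryption:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.security.SecureRandom;

// Sketch: the application encrypts a sensitive field (AES-GCM) itself,
// so only ciphertext reaches the data grid.
public class FieldCrypto {
    public static byte[] encrypt(SecretKey key, byte[] iv, byte[] plain) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        return c.doFinal(plain);
    }

    public static byte[] decrypt(SecretKey key, byte[] iv, byte[] cipherText) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        return c.doFinal(cipherText);
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();
        byte[] iv = new byte[12]; // 96-bit IV, standard for GCM
        new SecureRandom().nextBytes(iv);

        byte[] secret = "s3cr3t-password".getBytes("UTF-8");
        byte[] enc = encrypt(key, iv, secret); // store this in the cache
        byte[] dec = decrypt(key, iv, enc);
        System.out.println(new String(dec, "UTF-8")); // s3cr3t-password
    }
}
```

This keeps the choice of algorithm, key length, and key management entirely in the user's hands, which is exactly the division of responsibility argued for above.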
> >
> >
> > On Fri, Jun 9, 2017 at 9:06 AM, Vyacheslav Daradur 
> > wrote:
> >
> > > >> which is much less useful.
> > > I note, in some cases there is a gain of more than a factor of two in
> > > the size of an object.
> > >
> > > >> Would it be possible to change your implementation to handle the
> > > encryption instead?
> > > Yes, of course; there's not much difference between compression and
> > > encryption, including in my implementation of per-field compression.
> > >
> > > 2017-06-09 8:55 GMT+03:00 Dmitriy Setrakyan :
> > >
> > > > Vyacheslav,
> > > >
> > > > When this feature started out as data compression in Ignite, it
> > > > sounded very useful. Now it is unfolding as per-field compression,
> > > > which is much less useful. In fact, it is questionable whether it
> > > > is useful at all. The fact that this feature is implemented does
> > > > not make it mandatory for the community to accept it.
> > > >
> > > > However, as I mentioned before, per-field encryption is very
> > > > useful, as it would allow users to automatically encrypt certain
> > > > sensitive fields, like passwords, credit card numbers, etc. There
> > > > is not much conceptual difference between compressing a field and
> > > > encrypting a field. Would it be possible to change your
> > > > implementation to handle the encryption instead?
> > > >
> > > > D.
> > > >
> > > > On Thu, Jun 8, 2017 at 10:42 PM, Vyacheslav Daradur <
> > daradu...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Guys, I want to be clear:
> > > > > * The "per-field compression" design is the result of research
> > > > > into the binary infrastructure of Ignite and some of its other
> > > > > parts (querying, indexing, etc.)
> > > > > * Full compression of the object would be more effective, but in
> > > > > that case querying and indexing are not possible (or there is a
> > > > > large overhead from decompressing the full object (or cache
> > > > > pages) on demand)
> > > > > * "Per-field compression" is one of the ways to implement the
> > > > > compression feature
> > > > >
> > > > > I'm new to Ignite, so I can be mistaken in some things.
> > > > > For the last 3-4 months I've tried to start a discussion about a
> > > > > design, but nobody answered (except Dmitry and Valentin, who were
> > > > > interested in how it works).
> > > > > But I understand that this is a community and nobody is obliged
> > > > > to anybody.
> > > > >
> > > > > There are strong Ignite experts.
> > > > > If they can help me and the community with a design for the
> > > > > compression feature, it will be great.
> > > > > At the moment I have the desire and time to work on the
> > > > > compression feature in Ignite.
> > > > > Let's use this opportunity :)
> > > > >
> > > > > 2017-06-09 5:36 GMT+03:00 Dmitriy Setrakyan  >:
> > > > >
> > > > > > Igniters,
> > > > > >
> > > > > > I have never seen a single Ignite user asking about compressing a
> > > > single
> > > > > > field. However, we have had requests to secure certain fields,
> e.g.
> > > > > > passwords.
> > > > > >
> > > > > > I personally do not think per-field compression is needed, unless
> > we
> > > > can
> > > > > > point out some concrete real life use cases.
> > > > > >
> > > > > > D.
> > > > > >
> > > > > > On Thu, Jun 8, 2017 at 3:42 AM, Vyacheslav Daradur <
> > > > daradu...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Anton,
> > > > > > >
> > > > > > > >> I thought that if 

Re: Data compression in Ignite 2.0

2017-06-09 Thread Sergey Kozlov
Hi

* "Per-field compression" is applicable to huge BLOB fields, but it imposes
restrictions: such fields cannot be indexed, reading the data is slower, and
there are potential OOM issues if the compression ratio is too high.
But for some cases it makes sense
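As a rough illustration of that trade-off, here is a minimal JDK-only sketch (just `java.util.zip`; the class and method names are made up for the example, this is not Ignite code). Compressing a BLOB field saves memory, but every read pays the inflate cost, and the compressed bytes are opaque to indexing:

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class BlobFieldCompression {
    // Compress a BLOB field's bytes before they are stored.
    static byte[] compress(byte[] raw) throws Exception {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished())
            out.write(buf, 0, deflater.deflate(buf));
        deflater.end();
        return out.toByteArray();
    }

    // Every read of such a field must pay this decompression cost.
    static byte[] decompress(byte[] packed) throws Exception {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inflater.finished())
            out.write(buf, 0, inflater.inflate(buf));
        inflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Highly repetitive data compresses well; the ratio is data-dependent.
        byte[] blob = new String(new char[10_000]).replace('\0', 'x').getBytes();
        byte[] packed = compress(blob);
        System.out.println(packed.length < blob.length);
        System.out.println(java.util.Arrays.equals(blob, decompress(packed)));
    }
}
```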

On Fri, Jun 9, 2017 at 11:11 AM, Антон Чураев  wrote:

> Seems that Dmitry is referring to transparent data encryption. It is used
> throughout the whole database industry.
>
> 2017-06-09 10:50 GMT+03:00 Vladimir Ozerov :
>
> > Dima,
> >
> > Encryption of certain fields is as bad as compression. First, it is a
> huge
> > change, which makes already complex binary protocol even more complex.
> > Second, it have to be ported to CPP, .NET platforms, as well as to JDBC
> and
> > ODBC.
> > Last, but the most important - this is not our headache to encrypt
> > sensitive data. This is user responsibility. Nobody in a sane mind will
> > store passwords in plain form. Instead, user should encrypt it on his
> own,
> > choosing proper encryption parameters - algorithms, key lengths, salts,
> > etc.. How are you going to expose this in API or configuration?
> >
> > We should not implement data encryption on binary level, this is out of
> > question. Encryption should be implemented on application level (user
> > efforts), transport layer (SSL - we already have it), and possibly on
> > disk-level (there are tools for this already).
> >
> >
> > On Fri, Jun 9, 2017 at 9:06 AM, Vyacheslav Daradur 
> > wrote:
> >
> > > >> which is much less useful.
> > > I note, in some cases there is profit more than twice per size of an
> > > object.
> > >
> > > >> Would it be possible to change your implementation to handle the
> > > encryption instead?
> > > Yes, of cource, there's not much difference between compression and
> > > encryption, including in my implementation of per-field-compression.
> > >
> > > 2017-06-09 8:55 GMT+03:00 Dmitriy Setrakyan :
> > >
> > > > Vyacheslav,
> > > >
> > > > When this feature started out as data compression in Ignite, it
> sounded
> > > > very useful. Now it is unfolding as a per-field compression, which is
> > > much
> > > > less useful. In fact, it is questionable whether it is useful at all.
> > The
> > > > fact that this feature is implemented does not make it mandatory for
> > the
> > > > community to accept it.
> > > >
> > > > However, as I mentioned before, per-field encryption is very useful,
> as
> > > it
> > > > would allow users automatically encrypt certain sensitive fields,
> like
> > > > passwords, credit card numbers, etc. There is not much conceptual
> > > > difference between compressing a field vs encrypting a field. Would
> it
> > be
> > > > possible to change your implementation to handle the encryption
> > instead?
> > > >
> > > > D.
> > > >
> > > > On Thu, Jun 8, 2017 at 10:42 PM, Vyacheslav Daradur <
> > daradu...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Guys, I want to be clear:
> > > > > * "Per-field compression" design is the result of a research of the
> > > > binary
> > > > > infrastructure of Ignite and some other its places (querying,
> > indexing,
> > > > > etc.)
> > > > > * Full-compression of object will be more effective, but in this
> case
> > > > there
> > > > > is no capability with querying and indexing (or there is large
> > overhead
> > > > by
> > > > > way of decompressing of full object (or caches pages) on demand)
> > > > > * "Per-field compression" is a one of ways to implement the
> > compression
> > > > > feature
> > > > >
> > > > > I'm new to Ignite also I can be mistaken in some things.
> > > > > Last 3-4 month I've tryed to start dicussion about a design, but
> > nobody
> > > > > answers nothing (except Dmitry and Valentin who was interested how
> it
> > > > > works).
> > > > > But I understand that this is community and nobody is obliged to
> > > anybody.
> > > > >
> > > > > There are strong Ignite experts.
> > > > > If they can help me and community with a design of the compression
> > > > feature
> > > > > it will be great.
> > > > > At the moment I have a desire and time to be engaged in development
> > of
> > > > > compression feature in Ignite.
> > > > > Let's use this opportunity :)
> > > > >
> > > > > 2017-06-09 5:36 GMT+03:00 Dmitriy Setrakyan  >:
> > > > >
> > > > > > Igniters,
> > > > > >
> > > > > > I have never seen a single Ignite user asking about compressing a
> > > > single
> > > > > > field. However, we have had requests to secure certain fields,
> e.g.
> > > > > > passwords.
> > > > > >
> > > > > > I personally do not think per-field compression is needed, unless
> > we
> > > > can
> > > > > > point out some concrete real life use cases.
> > > > > >
> > > > > > D.
> > > > > >
> > > > > > On Thu, Jun 8, 2017 at 3:42 AM, Vyacheslav Daradur <
> > > > daradu...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Anton,
> > > > > > >
> > > > > > > >> I thought that if 

Re: Data compression in Ignite 2.0

2017-06-09 Thread Sergi Vladykin
+1 to Vladimir. Field encryption is a user responsibility. I see no reason
to introduce additional complexity to Ignite.

Sergi

2017-06-09 11:11 GMT+03:00 Антон Чураев :

> Seems that Dmitry is referring to transparent data encryption. It is used
> throughout the whole database industry.
>
> 2017-06-09 10:50 GMT+03:00 Vladimir Ozerov :
>
> > Dima,
> >
> > Encryption of certain fields is as bad as compression. First, it is a
> huge
> > change, which makes already complex binary protocol even more complex.
> > Second, it have to be ported to CPP, .NET platforms, as well as to JDBC
> and
> > ODBC.
> > Last, but the most important - this is not our headache to encrypt
> > sensitive data. This is user responsibility. Nobody in a sane mind will
> > store passwords in plain form. Instead, user should encrypt it on his
> own,
> > choosing proper encryption parameters - algorithms, key lengths, salts,
> > etc.. How are you going to expose this in API or configuration?
> >
> > We should not implement data encryption on binary level, this is out of
> > question. Encryption should be implemented on application level (user
> > efforts), transport layer (SSL - we already have it), and possibly on
> > disk-level (there are tools for this already).
> >
> >
> > On Fri, Jun 9, 2017 at 9:06 AM, Vyacheslav Daradur 
> > wrote:
> >
> > > >> which is much less useful.
> > > I note, in some cases there is profit more than twice per size of an
> > > object.
> > >
> > > >> Would it be possible to change your implementation to handle the
> > > encryption instead?
> > > Yes, of cource, there's not much difference between compression and
> > > encryption, including in my implementation of per-field-compression.
> > >
> > > 2017-06-09 8:55 GMT+03:00 Dmitriy Setrakyan :
> > >
> > > > Vyacheslav,
> > > >
> > > > When this feature started out as data compression in Ignite, it
> sounded
> > > > very useful. Now it is unfolding as a per-field compression, which is
> > > much
> > > > less useful. In fact, it is questionable whether it is useful at all.
> > The
> > > > fact that this feature is implemented does not make it mandatory for
> > the
> > > > community to accept it.
> > > >
> > > > However, as I mentioned before, per-field encryption is very useful,
> as
> > > it
> > > > would allow users automatically encrypt certain sensitive fields,
> like
> > > > passwords, credit card numbers, etc. There is not much conceptual
> > > > difference between compressing a field vs encrypting a field. Would
> it
> > be
> > > > possible to change your implementation to handle the encryption
> > instead?
> > > >
> > > > D.
> > > >
> > > > On Thu, Jun 8, 2017 at 10:42 PM, Vyacheslav Daradur <
> > daradu...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Guys, I want to be clear:
> > > > > * "Per-field compression" design is the result of a research of the
> > > > binary
> > > > > infrastructure of Ignite and some other its places (querying,
> > indexing,
> > > > > etc.)
> > > > > * Full-compression of object will be more effective, but in this
> case
> > > > there
> > > > > is no capability with querying and indexing (or there is large
> > overhead
> > > > by
> > > > > way of decompressing of full object (or caches pages) on demand)
> > > > > * "Per-field compression" is a one of ways to implement the
> > compression
> > > > > feature
> > > > >
> > > > > I'm new to Ignite also I can be mistaken in some things.
> > > > > Last 3-4 month I've tryed to start dicussion about a design, but
> > nobody
> > > > > answers nothing (except Dmitry and Valentin who was interested how
> it
> > > > > works).
> > > > > But I understand that this is community and nobody is obliged to
> > > anybody.
> > > > >
> > > > > There are strong Ignite experts.
> > > > > If they can help me and community with a design of the compression
> > > > feature
> > > > > it will be great.
> > > > > At the moment I have a desire and time to be engaged in development
> > of
> > > > > compression feature in Ignite.
> > > > > Let's use this opportunity :)
> > > > >
> > > > > 2017-06-09 5:36 GMT+03:00 Dmitriy Setrakyan  >:
> > > > >
> > > > > > Igniters,
> > > > > >
> > > > > > I have never seen a single Ignite user asking about compressing a
> > > > single
> > > > > > field. However, we have had requests to secure certain fields,
> e.g.
> > > > > > passwords.
> > > > > >
> > > > > > I personally do not think per-field compression is needed, unless
> > we
> > > > can
> > > > > > point out some concrete real life use cases.
> > > > > >
> > > > > > D.
> > > > > >
> > > > > > On Thu, Jun 8, 2017 at 3:42 AM, Vyacheslav Daradur <
> > > > daradu...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Anton,
> > > > > > >
> > > > > > > >> I thought that if there will storing compressed data in the
> > > > memory,
> > > > > > data
> > > > > > > >> will transmit over wire in 

Re: Data compression in Ignite 2.0

2017-06-09 Thread Антон Чураев
Seems that Dmitry is referring to transparent data encryption. It is used
throughout the whole database industry.

2017-06-09 10:50 GMT+03:00 Vladimir Ozerov :

> Dima,
>
> Encryption of certain fields is as bad as compression. First, it is a huge
> change, which makes already complex binary protocol even more complex.
> Second, it have to be ported to CPP, .NET platforms, as well as to JDBC and
> ODBC.
> Last, but the most important - this is not our headache to encrypt
> sensitive data. This is user responsibility. Nobody in a sane mind will
> store passwords in plain form. Instead, user should encrypt it on his own,
> choosing proper encryption parameters - algorithms, key lengths, salts,
> etc.. How are you going to expose this in API or configuration?
>
> We should not implement data encryption on binary level, this is out of
> question. Encryption should be implemented on application level (user
> efforts), transport layer (SSL - we already have it), and possibly on
> disk-level (there are tools for this already).
>
>
> On Fri, Jun 9, 2017 at 9:06 AM, Vyacheslav Daradur 
> wrote:
>
> > >> which is much less useful.
> > I note, in some cases there is profit more than twice per size of an
> > object.
> >
> > >> Would it be possible to change your implementation to handle the
> > encryption instead?
> > Yes, of cource, there's not much difference between compression and
> > encryption, including in my implementation of per-field-compression.
> >
> > 2017-06-09 8:55 GMT+03:00 Dmitriy Setrakyan :
> >
> > > Vyacheslav,
> > >
> > > When this feature started out as data compression in Ignite, it sounded
> > > very useful. Now it is unfolding as a per-field compression, which is
> > much
> > > less useful. In fact, it is questionable whether it is useful at all.
> The
> > > fact that this feature is implemented does not make it mandatory for
> the
> > > community to accept it.
> > >
> > > However, as I mentioned before, per-field encryption is very useful, as
> > it
> > > would allow users automatically encrypt certain sensitive fields, like
> > > passwords, credit card numbers, etc. There is not much conceptual
> > > difference between compressing a field vs encrypting a field. Would it
> be
> > > possible to change your implementation to handle the encryption
> instead?
> > >
> > > D.
> > >
> > > On Thu, Jun 8, 2017 at 10:42 PM, Vyacheslav Daradur <
> daradu...@gmail.com
> > >
> > > wrote:
> > >
> > > > Guys, I want to be clear:
> > > > * "Per-field compression" design is the result of a research of the
> > > binary
> > > > infrastructure of Ignite and some other its places (querying,
> indexing,
> > > > etc.)
> > > > * Full-compression of object will be more effective, but in this case
> > > there
> > > > is no capability with querying and indexing (or there is large
> overhead
> > > by
> > > > way of decompressing of full object (or caches pages) on demand)
> > > > * "Per-field compression" is a one of ways to implement the
> compression
> > > > feature
> > > >
> > > > I'm new to Ignite also I can be mistaken in some things.
> > > > Last 3-4 month I've tryed to start dicussion about a design, but
> nobody
> > > > answers nothing (except Dmitry and Valentin who was interested how it
> > > > works).
> > > > But I understand that this is community and nobody is obliged to
> > anybody.
> > > >
> > > > There are strong Ignite experts.
> > > > If they can help me and community with a design of the compression
> > > feature
> > > > it will be great.
> > > > At the moment I have a desire and time to be engaged in development
> of
> > > > compression feature in Ignite.
> > > > Let's use this opportunity :)
> > > >
> > > > 2017-06-09 5:36 GMT+03:00 Dmitriy Setrakyan :
> > > >
> > > > > Igniters,
> > > > >
> > > > > I have never seen a single Ignite user asking about compressing a
> > > single
> > > > > field. However, we have had requests to secure certain fields, e.g.
> > > > > passwords.
> > > > >
> > > > > I personally do not think per-field compression is needed, unless
> we
> > > can
> > > > > point out some concrete real life use cases.
> > > > >
> > > > > D.
> > > > >
> > > > > On Thu, Jun 8, 2017 at 3:42 AM, Vyacheslav Daradur <
> > > daradu...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Anton,
> > > > > >
> > > > > > >> I thought that if there will storing compressed data in the
> > > memory,
> > > > > data
> > > > > > >> will transmit over wire in compression too. Is it right?
> > > > > >
> > > > > > In per-field compression case - yes.
> > > > > >
> > > > > > 2017-06-08 13:36 GMT+03:00 Антон Чураев :
> > > > > >
> > > > > > > Guys, could you please help me.
> > > > > > > I thought that if there will storing compressed data in the
> > memory,
> > > > > data
> > > > > > > will transmit over wire in compression too. Is it right?
> > > > > > >
> > > > > > > 2017-06-08 13:30 GMT+03:00 

Re: Data compression in Ignite 2.0

2017-06-09 Thread Vyacheslav Daradur
Vladimir,

>>  Nobody in a sane mind will
>> store passwords in plain form. Instead, user should encrypt it on his
own,
>> choosing proper encryption parameters - algorithms, key lengths, salts,
etc..
Sounds reasonable to me.
But if someone wants to have this feature OOTB, we can continue the
discussion; maybe we could implement it in some other way.

>> How are you going to expose this in API or configuration?
Just for example: we could provide a pluggable interface in the
IgniteConfiguration (or another place), which the user would be able to
implement.

About compression, you wrote:
>> 2) Storing data in memory - here the much simpler step would be to full
>> compression on per-cache basis rather than dealing with per-fields case.
Could you explain your idea? How can we implement it, and how will it be
compatible with querying and indexing?
Thanks in advance.
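For illustration, such a pluggable hook might look like the following. This is only a hypothetical sketch: `FieldCodec` is an invented name, not an actual Ignite API, and the no-op implementation stands in for a real compressor or cipher.

```java
// Hypothetical shape of a pluggable per-field hook -- NOT an actual Ignite API.
// Ignite could call a user-supplied codec when (de)serializing a field.
interface FieldCodec {
    byte[] encode(byte[] fieldBytes);   // e.g. compress or encrypt
    byte[] decode(byte[] storedBytes);  // inverse transform
}

// Example user implementation: an identity "codec" that just copies bytes.
// A real one would wrap Deflater/Inflater or a javax.crypto Cipher.
class NoOpCodec implements FieldCodec {
    public byte[] encode(byte[] fieldBytes) { return fieldBytes.clone(); }
    public byte[] decode(byte[] storedBytes) { return storedBytes.clone(); }
}

public class CodecSketch {
    public static void main(String[] args) {
        FieldCodec codec = new NoOpCodec();
        byte[] field = "secret".getBytes();
        byte[] stored = codec.encode(field);
        // The only contract the engine would rely on: decode(encode(x)) == x.
        System.out.println(java.util.Arrays.equals(field, codec.decode(stored)));
    }
}
```

The configuration side would simply accept an instance of this interface, analogous to how other pluggable components are set on IgniteConfiguration.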


2017-06-09 10:50 GMT+03:00 Vladimir Ozerov :

> Dima,
>
> Encryption of certain fields is as bad as compression. First, it is a huge
> change, which makes already complex binary protocol even more complex.
> Second, it have to be ported to CPP, .NET platforms, as well as to JDBC and
> ODBC.
> Last, but the most important - this is not our headache to encrypt
> sensitive data. This is user responsibility. Nobody in a sane mind will
> store passwords in plain form. Instead, user should encrypt it on his own,
> choosing proper encryption parameters - algorithms, key lengths, salts,
> etc.. How are you going to expose this in API or configuration?
>
> We should not implement data encryption on binary level, this is out of
> question. Encryption should be implemented on application level (user
> efforts), transport layer (SSL - we already have it), and possibly on
> disk-level (there are tools for this already).
>
>
> On Fri, Jun 9, 2017 at 9:06 AM, Vyacheslav Daradur 
> wrote:
>
> > >> which is much less useful.
> > I note, in some cases there is profit more than twice per size of an
> > object.
> >
> > >> Would it be possible to change your implementation to handle the
> > encryption instead?
> > Yes, of cource, there's not much difference between compression and
> > encryption, including in my implementation of per-field-compression.
> >
> > 2017-06-09 8:55 GMT+03:00 Dmitriy Setrakyan :
> >
> > > Vyacheslav,
> > >
> > > When this feature started out as data compression in Ignite, it sounded
> > > very useful. Now it is unfolding as a per-field compression, which is
> > much
> > > less useful. In fact, it is questionable whether it is useful at all.
> The
> > > fact that this feature is implemented does not make it mandatory for
> the
> > > community to accept it.
> > >
> > > However, as I mentioned before, per-field encryption is very useful, as
> > it
> > > would allow users automatically encrypt certain sensitive fields, like
> > > passwords, credit card numbers, etc. There is not much conceptual
> > > difference between compressing a field vs encrypting a field. Would it
> be
> > > possible to change your implementation to handle the encryption
> instead?
> > >
> > > D.
> > >
> > > On Thu, Jun 8, 2017 at 10:42 PM, Vyacheslav Daradur <
> daradu...@gmail.com
> > >
> > > wrote:
> > >
> > > > Guys, I want to be clear:
> > > > * "Per-field compression" design is the result of a research of the
> > > binary
> > > > infrastructure of Ignite and some other its places (querying,
> indexing,
> > > > etc.)
> > > > * Full-compression of object will be more effective, but in this case
> > > there
> > > > is no capability with querying and indexing (or there is large
> overhead
> > > by
> > > > way of decompressing of full object (or caches pages) on demand)
> > > > * "Per-field compression" is a one of ways to implement the
> compression
> > > > feature
> > > >
> > > > I'm new to Ignite also I can be mistaken in some things.
> > > > Last 3-4 month I've tryed to start dicussion about a design, but
> nobody
> > > > answers nothing (except Dmitry and Valentin who was interested how it
> > > > works).
> > > > But I understand that this is community and nobody is obliged to
> > anybody.
> > > >
> > > > There are strong Ignite experts.
> > > > If they can help me and community with a design of the compression
> > > feature
> > > > it will be great.
> > > > At the moment I have a desire and time to be engaged in development
> of
> > > > compression feature in Ignite.
> > > > Let's use this opportunity :)
> > > >
> > > > 2017-06-09 5:36 GMT+03:00 Dmitriy Setrakyan :
> > > >
> > > > > Igniters,
> > > > >
> > > > > I have never seen a single Ignite user asking about compressing a
> > > single
> > > > > field. However, we have had requests to secure certain fields, e.g.
> > > > > passwords.
> > > > >
> > > > > I personally do not think per-field compression is needed, unless
> we
> > > can
> > > > > point out some concrete real life use cases.
> > > > >
> > > > > D.
> > > > >
> > > > > On Thu, Jun 8, 

Re: Data compression in Ignite 2.0

2017-06-09 Thread Vladimir Ozerov
Dima,

Encryption of certain fields is as bad as compression. First, it is a huge
change, which makes the already complex binary protocol even more complex.
Second, it would have to be ported to the CPP and .NET platforms, as well as
to JDBC and ODBC.
Last, but most important - it is not our headache to encrypt sensitive data.
This is the user's responsibility. Nobody in their sane mind will store
passwords in plain form. Instead, the user should encrypt them on their own,
choosing proper encryption parameters - algorithms, key lengths, salts,
etc. How are you going to expose all this in an API or configuration?

We should not implement data encryption on the binary level; this is out of
the question. Encryption should be implemented on the application level (user
efforts), the transport layer (SSL - we already have it), and possibly on the
disk level (there are tools for this already).
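To be concrete, application-level encryption of a single sensitive field needs nothing beyond the JDK. A hedged sketch (the class and method names are illustrative, not an Ignite API), using AES-GCM with a random 12-byte IV prepended to the ciphertext:

```java
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class FieldEncryption {
    // Encrypt a sensitive field before handing the value to the cache.
    // AES-GCM; the 12-byte IV is prepended to the ciphertext.
    static byte[] encryptField(SecretKey key, byte[] plain) throws Exception {
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ct = c.doFinal(plain);
        byte[] out = new byte[iv.length + ct.length];
        System.arraycopy(iv, 0, out, 0, iv.length);
        System.arraycopy(ct, 0, out, iv.length, ct.length);
        return out;
    }

    // Decrypt after reading the value back; splits off the prepended IV.
    static byte[] decryptField(SecretKey key, byte[] stored) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key,
               new GCMParameterSpec(128, java.util.Arrays.copyOfRange(stored, 0, 12)));
        return c.doFinal(stored, 12, stored.length - 12);
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128);
        SecretKey key = kg.generateKey();
        byte[] stored = encryptField(key, "p@ssw0rd".getBytes());
        System.out.println(new String(decryptField(key, stored))); // p@ssw0rd
    }
}
```

Key management (where the key lives, rotation, salting of derived keys) is exactly the part that is the user's choice, which is why it is hard to expose generically in configuration.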


On Fri, Jun 9, 2017 at 9:06 AM, Vyacheslav Daradur 
wrote:

> >> which is much less useful.
> I note, in some cases there is profit more than twice per size of an
> object.
>
> >> Would it be possible to change your implementation to handle the
> encryption instead?
> Yes, of cource, there's not much difference between compression and
> encryption, including in my implementation of per-field-compression.
>
> 2017-06-09 8:55 GMT+03:00 Dmitriy Setrakyan :
>
> > Vyacheslav,
> >
> > When this feature started out as data compression in Ignite, it sounded
> > very useful. Now it is unfolding as a per-field compression, which is
> much
> > less useful. In fact, it is questionable whether it is useful at all. The
> > fact that this feature is implemented does not make it mandatory for the
> > community to accept it.
> >
> > However, as I mentioned before, per-field encryption is very useful, as
> it
> > would allow users automatically encrypt certain sensitive fields, like
> > passwords, credit card numbers, etc. There is not much conceptual
> > difference between compressing a field vs encrypting a field. Would it be
> > possible to change your implementation to handle the encryption instead?
> >
> > D.
> >
> > On Thu, Jun 8, 2017 at 10:42 PM, Vyacheslav Daradur  >
> > wrote:
> >
> > > Guys, I want to be clear:
> > > * "Per-field compression" design is the result of a research of the
> > binary
> > > infrastructure of Ignite and some other its places (querying, indexing,
> > > etc.)
> > > * Full-compression of object will be more effective, but in this case
> > there
> > > is no capability with querying and indexing (or there is large overhead
> > by
> > > way of decompressing of full object (or caches pages) on demand)
> > > * "Per-field compression" is a one of ways to implement the compression
> > > feature
> > >
> > > I'm new to Ignite also I can be mistaken in some things.
> > > Last 3-4 month I've tryed to start dicussion about a design, but nobody
> > > answers nothing (except Dmitry and Valentin who was interested how it
> > > works).
> > > But I understand that this is community and nobody is obliged to
> anybody.
> > >
> > > There are strong Ignite experts.
> > > If they can help me and community with a design of the compression
> > feature
> > > it will be great.
> > > At the moment I have a desire and time to be engaged in development of
> > > compression feature in Ignite.
> > > Let's use this opportunity :)
> > >
> > > 2017-06-09 5:36 GMT+03:00 Dmitriy Setrakyan :
> > >
> > > > Igniters,
> > > >
> > > > I have never seen a single Ignite user asking about compressing a
> > single
> > > > field. However, we have had requests to secure certain fields, e.g.
> > > > passwords.
> > > >
> > > > I personally do not think per-field compression is needed, unless we
> > can
> > > > point out some concrete real life use cases.
> > > >
> > > > D.
> > > >
> > > > On Thu, Jun 8, 2017 at 3:42 AM, Vyacheslav Daradur <
> > daradu...@gmail.com>
> > > > wrote:
> > > >
> > > > > Anton,
> > > > >
> > > > > >> I thought that if there will storing compressed data in the
> > memory,
> > > > data
> > > > > >> will transmit over wire in compression too. Is it right?
> > > > >
> > > > > In per-field compression case - yes.
> > > > >
> > > > > 2017-06-08 13:36 GMT+03:00 Антон Чураев :
> > > > >
> > > > > > Guys, could you please help me.
> > > > > > I thought that if there will storing compressed data in the
> memory,
> > > > data
> > > > > > will transmit over wire in compression too. Is it right?
> > > > > >
> > > > > > 2017-06-08 13:30 GMT+03:00 Vyacheslav Daradur <
> daradu...@gmail.com
> > >:
> > > > > >
> > > > > > > Vladimir,
> > > > > > >
> > > > > > > The main problem which I'am trying to solve is storing data in
> > > memory
> > > > > in
> > > > > > a
> > > > > > > compression form via Ignite.
> > > > > > > The main goal is using memory more effectivelly.
> > > > > > >
> > > > > > > >> here the much simpler step would be to full
> > > > > > > compression on per-cache basis 

Re: Data compression in Ignite 2.0

2017-06-09 Thread Vyacheslav Daradur
>> which is much less useful.
I note that in some cases the gain is more than a twofold reduction in object size.

>> Would it be possible to change your implementation to handle the
encryption instead?
Yes, of course; there's not much difference between compression and
encryption, including in my implementation of per-field compression.

2017-06-09 8:55 GMT+03:00 Dmitriy Setrakyan :

> Vyacheslav,
>
> When this feature started out as data compression in Ignite, it sounded
> very useful. Now it is unfolding as a per-field compression, which is much
> less useful. In fact, it is questionable whether it is useful at all. The
> fact that this feature is implemented does not make it mandatory for the
> community to accept it.
>
> However, as I mentioned before, per-field encryption is very useful, as it
> would allow users automatically encrypt certain sensitive fields, like
> passwords, credit card numbers, etc. There is not much conceptual
> difference between compressing a field vs encrypting a field. Would it be
> possible to change your implementation to handle the encryption instead?
>
> D.
>
> On Thu, Jun 8, 2017 at 10:42 PM, Vyacheslav Daradur 
> wrote:
>
> > Guys, I want to be clear:
> > * "Per-field compression" design is the result of a research of the
> binary
> > infrastructure of Ignite and some other its places (querying, indexing,
> > etc.)
> > * Full-compression of object will be more effective, but in this case
> there
> > is no capability with querying and indexing (or there is large overhead
> by
> > way of decompressing of full object (or caches pages) on demand)
> > * "Per-field compression" is a one of ways to implement the compression
> > feature
> >
> > I'm new to Ignite also I can be mistaken in some things.
> > Last 3-4 month I've tryed to start dicussion about a design, but nobody
> > answers nothing (except Dmitry and Valentin who was interested how it
> > works).
> > But I understand that this is community and nobody is obliged to anybody.
> >
> > There are strong Ignite experts.
> > If they can help me and community with a design of the compression
> feature
> > it will be great.
> > At the moment I have a desire and time to be engaged in development of
> > compression feature in Ignite.
> > Let's use this opportunity :)
> >
> > 2017-06-09 5:36 GMT+03:00 Dmitriy Setrakyan :
> >
> > > Igniters,
> > >
> > > I have never seen a single Ignite user asking about compressing a
> single
> > > field. However, we have had requests to secure certain fields, e.g.
> > > passwords.
> > >
> > > I personally do not think per-field compression is needed, unless we
> can
> > > point out some concrete real life use cases.
> > >
> > > D.
> > >
> > > On Thu, Jun 8, 2017 at 3:42 AM, Vyacheslav Daradur <
> daradu...@gmail.com>
> > > wrote:
> > >
> > > > Anton,
> > > >
> > > > >> I thought that if there will storing compressed data in the
> memory,
> > > data
> > > > >> will transmit over wire in compression too. Is it right?
> > > >
> > > > In per-field compression case - yes.
> > > >
> > > > 2017-06-08 13:36 GMT+03:00 Антон Чураев :
> > > >
> > > > > Guys, could you please help me.
> > > > > I thought that if there will storing compressed data in the memory,
> > > data
> > > > > will transmit over wire in compression too. Is it right?
> > > > >
> > > > > 2017-06-08 13:30 GMT+03:00 Vyacheslav Daradur  >:
> > > > >
> > > > > > Vladimir,
> > > > > >
> > > > > > The main problem which I'am trying to solve is storing data in
> > memory
> > > > in
> > > > > a
> > > > > > compression form via Ignite.
> > > > > > The main goal is using memory more effectivelly.
> > > > > >
> > > > > > >> here the much simpler step would be to full
> > > > > > compression on per-cache basis rather than dealing with
> per-fields
> > > > case.
> > > > > >
> > > > > > Please explain your idea. Compess data by memory-page?
> > > > > > Is it compatible with quering and indexing?
> > > > > >
> > > > > > >> In the end, if user would like to compress particular field,
> he
> > > can
> > > > > > always to it on his own
> > > > > > I think we mustn't think in this way, if user need something he
> > > trying
> > > > to
> > > > > > choose a tool which has this feature OOTB.
> > > > > >
> > > > > >
> > > > > >
> > > > > > 2017-06-08 12:53 GMT+03:00 Vladimir Ozerov  >:
> > > > > >
> > > > > > > Igniters,
> > > > > > >
> > > > > > > Honestly I still do not see how to apply it gracefully this
> > feature
> > > > ti
> > > > > > > Ignite. And overall approach to compress only particular fields
> > > looks
> > > > > > > overcomplicated to me. Remember, that our main use case is an
> > > > > application
> > > > > > > without classes on the server. It means that any kind of
> > > annotations
> > > > > are
> > > > > > > inapplicable. To be more precise: proper API should be
> > implemented
> > > to
> > > > > > > handle 

Re: Data compression in Ignite 2.0

2017-06-08 Thread Dmitriy Setrakyan
Vyacheslav,

When this feature started out as data compression in Ignite, it sounded
very useful. Now it is unfolding as a per-field compression, which is much
less useful. In fact, it is questionable whether it is useful at all. The
fact that this feature is implemented does not make it mandatory for the
community to accept it.

However, as I mentioned before, per-field encryption is very useful, as it
would allow users to automatically encrypt certain sensitive fields, like
passwords, credit card numbers, etc. There is not much conceptual
difference between compressing a field vs encrypting a field. Would it be
possible to change your implementation to handle the encryption instead?

D.

On Thu, Jun 8, 2017 at 10:42 PM, Vyacheslav Daradur 
wrote:

> Guys, I want to be clear:
> * "Per-field compression" design is the result of a research of the binary
> infrastructure of Ignite and some other its places (querying, indexing,
> etc.)
> * Full-compression of object will be more effective, but in this case there
> is no capability with querying and indexing (or there is large overhead by
> way of decompressing of full object (or caches pages) on demand)
> * "Per-field compression" is a one of ways to implement the compression
> feature
>
> I'm new to Ignite also I can be mistaken in some things.
> Last 3-4 month I've tryed to start dicussion about a design, but nobody
> answers nothing (except Dmitry and Valentin who was interested how it
> works).
> But I understand that this is community and nobody is obliged to anybody.
>
> There are strong Ignite experts.
> If they can help me and community with a design of the compression feature
> it will be great.
> At the moment I have a desire and time to be engaged in development of
> compression feature in Ignite.
> Let's use this opportunity :)
>
> 2017-06-09 5:36 GMT+03:00 Dmitriy Setrakyan :
>
> > Igniters,
> >
> > I have never seen a single Ignite user asking about compressing a single
> > field. However, we have had requests to secure certain fields, e.g.
> > passwords.
> >
> > I personally do not think per-field compression is needed, unless we can
> > point out some concrete real life use cases.
> >
> > D.
> >
> > On Thu, Jun 8, 2017 at 3:42 AM, Vyacheslav Daradur 
> > wrote:
> >
> > > Anton,
> > >
> > > >> I thought that if there will storing compressed data in the memory,
> > data
> > > >> will transmit over wire in compression too. Is it right?
> > >
> > > In per-field compression case - yes.
> > >
> > > 2017-06-08 13:36 GMT+03:00 Антон Чураев :
> > >
> > > > Guys, could you please help me.
> > > > I thought that if there will storing compressed data in the memory,
> > data
> > > > will transmit over wire in compression too. Is it right?
> > > >
> > > > 2017-06-08 13:30 GMT+03:00 Vyacheslav Daradur :
> > > >
> > > > > Vladimir,
> > > > >
> > > > > The main problem which I'am trying to solve is storing data in
> memory
> > > in
> > > > a
> > > > > compression form via Ignite.
> > > > > The main goal is using memory more effectivelly.
> > > > >
> > > > > >> here the much simpler step would be to full
> > > > > compression on per-cache basis rather than dealing with per-fields
> > > case.
> > > > >
> > > > > Please explain your idea. Compess data by memory-page?
> > > > > Is it compatible with quering and indexing?
> > > > >
> > > > > >> In the end, if user would like to compress particular field, he
> > can
> > > > > always to it on his own
> > > > > I think we mustn't think in this way, if user need something he
> > trying
> > > to
> > > > > choose a tool which has this feature OOTB.
> > > > >
> > > > >
> > > > >
> > > > > 2017-06-08 12:53 GMT+03:00 Vladimir Ozerov :
> > > > >
> > > > > > Igniters,
> > > > > >
> > > > > > Honestly I still do not see how to apply it gracefully this
> feature
> > > ti
> > > > > > Ignite. And overall approach to compress only particular fields
> > looks
> > > > > > overcomplicated to me. Remember, that our main use case is an
> > > > application
> > > > > > without classes on the server. It means that any kind of
> > annotations
> > > > are
> > > > > > inapplicable. To be more precise: proper API should be
> implemented
> > to
> > > > > > handle no-class case (e.g. how would build such an object through
> > > > > > BinaryBuilder without a class?), and only then add annotations as
> > > > > > convenient addition to more basic API.
> > > > > >
> > > > > > It seems to me that full implementation, which takes in count
> > proper
> > > > > > "classless" API, changes to binary metadata to reflect compressed
> > > > fields,
> > > > > > changes to SQL, changes to binary protocol, and porting to .NET
> and
> > > > CPP,
> > > > > > will yield very complex solution with little value to the
> product.
> > > > > >
> > > > > > Instead, as I proposed earlier, it seems that we'd better start
> > with
> > > > the
> > 

Re: Data compression in Ignite 2.0

2017-06-08 Thread Vyacheslav Daradur
Guys, I want to be clear:
* The "per-field compression" design is the result of research into Ignite's
binary infrastructure and some related areas (querying, indexing, etc.)
* Full compression of an object would be more effective, but then there is no
compatibility with querying and indexing (or there is a large overhead from
decompressing the full object, or cache pages, on demand)
* "Per-field compression" is one way to implement the compression feature

I'm new to Ignite, so I may be mistaken about some things.
For the last 3-4 months I've tried to start a discussion about a design, but
nobody answered (except Dmitry and Valentin, who were interested in how it
works).
But I understand that this is a community and nobody is obliged to anybody.

There are strong Ignite experts here.
If they can help me and the community with a design of the compression feature,
it will be great.
At the moment I have the desire and time to work on the development of the
compression feature in Ignite.
Let's use this opportunity :)

2017-06-09 5:36 GMT+03:00 Dmitriy Setrakyan :

> Igniters,
>
> I have never seen a single Ignite user asking about compressing a single
> field. However, we have had requests to secure certain fields, e.g.
> passwords.
>
> I personally do not think per-field compression is needed, unless we can
> point out some concrete real life use cases.
>
> D.
>
> On Thu, Jun 8, 2017 at 3:42 AM, Vyacheslav Daradur 
> wrote:
>
> > Anton,
> >
> > >> I thought that if there will storing compressed data in the memory,
> data
> > >> will transmit over wire in compression too. Is it right?
> >
> > In per-field compression case - yes.
> >
> > 2017-06-08 13:36 GMT+03:00 Антон Чураев :
> >
> > > Guys, could you please help me.
> > > I thought that if there will storing compressed data in the memory,
> data
> > > will transmit over wire in compression too. Is it right?
> > >
> > > 2017-06-08 13:30 GMT+03:00 Vyacheslav Daradur :
> > >
> > > > Vladimir,
> > > >
> > > > The main problem which I'am trying to solve is storing data in memory
> > in
> > > a
> > > > compression form via Ignite.
> > > > The main goal is using memory more effectivelly.
> > > >
> > > > >> here the much simpler step would be to full
> > > > compression on per-cache basis rather than dealing with per-fields
> > case.
> > > >
> > > > Please explain your idea. Compess data by memory-page?
> > > > Is it compatible with quering and indexing?
> > > >
> > > > >> In the end, if user would like to compress particular field, he
> can
> > > > always to it on his own
> > > > I think we mustn't think in this way, if user need something he
> trying
> > to
> > > > choose a tool which has this feature OOTB.
> > > >
> > > >
> > > >
> > > > 2017-06-08 12:53 GMT+03:00 Vladimir Ozerov :
> > > >
> > > > > Igniters,
> > > > >
> > > > > Honestly I still do not see how to apply it gracefully this feature
> > ti
> > > > > Ignite. And overall approach to compress only particular fields
> looks
> > > > > overcomplicated to me. Remember, that our main use case is an
> > > application
> > > > > without classes on the server. It means that any kind of
> annotations
> > > are
> > > > > inapplicable. To be more precise: proper API should be implemented
> to
> > > > > handle no-class case (e.g. how would build such an object through
> > > > > BinaryBuilder without a class?), and only then add annotations as
> > > > > convenient addition to more basic API.
> > > > >
> > > > > It seems to me that full implementation, which takes in count
> proper
> > > > > "classless" API, changes to binary metadata to reflect compressed
> > > fields,
> > > > > changes to SQL, changes to binary protocol, and porting to .NET and
> > > CPP,
> > > > > will yield very complex solution with little value to the product.
> > > > >
> > > > > Instead, as I proposed earlier, it seems that we'd better start
> with
> > > the
> > > > > problem we are trying to solve. Basically, compression could help
> in
> > > two
> > > > > cases:
> > > > > 1) Transmitting data over wire - it should be implemented on
> > > > communication
> > > > > layer and should not affect binary serialization component a lot.
> > > > > 2) Storing data in memory - here the much simpler step would be to
> > full
> > > > > compression on per-cache basis rather than dealing with per-fields
> > > case.
> > > > >
> > > > > In the end, if user would like to compress particular field, he can
> > > > always
> > > > > to it on his own, and set already compressed field to our
> > BinaryObject.
> > > > >
> > > > > Vladimir.
> > > > >
> > > > >
> > > > > On Thu, Jun 8, 2017 at 12:37 PM, Vyacheslav Daradur <
> > > daradu...@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Valentin,
> > > > > >
> > > > > > Yes, I have the prototype[1][2]
> > > > > >
> > > > > > You can see an example of Java class[3] that I used in my
> > benchmark.
> > > > > > For 

Re: Data compression in Ignite 2.0

2017-06-08 Thread Dmitriy Setrakyan
Igniters,

I have never seen a single Ignite user asking about compressing a single
field. However, we have had requests to secure certain fields, e.g.
passwords.

I personally do not think per-field compression is needed, unless we can
point out some concrete real life use cases.

D.

On Thu, Jun 8, 2017 at 3:42 AM, Vyacheslav Daradur 
wrote:

> Anton,
>
> >> I thought that if there will storing compressed data in the memory, data
> >> will transmit over wire in compression too. Is it right?
>
> In per-field compression case - yes.
>
> 2017-06-08 13:36 GMT+03:00 Антон Чураев :
>
> > Guys, could you please help me.
> > I thought that if there will storing compressed data in the memory, data
> > will transmit over wire in compression too. Is it right?
> >
> > 2017-06-08 13:30 GMT+03:00 Vyacheslav Daradur :
> >
> > > Vladimir,
> > >
> > > The main problem which I'am trying to solve is storing data in memory
> in
> > a
> > > compression form via Ignite.
> > > The main goal is using memory more effectivelly.
> > >
> > > >> here the much simpler step would be to full
> > > compression on per-cache basis rather than dealing with per-fields
> case.
> > >
> > > Please explain your idea. Compess data by memory-page?
> > > Is it compatible with quering and indexing?
> > >
> > > >> In the end, if user would like to compress particular field, he can
> > > always to it on his own
> > > I think we mustn't think in this way, if user need something he trying
> to
> > > choose a tool which has this feature OOTB.
> > >
> > >
> > >
> > > 2017-06-08 12:53 GMT+03:00 Vladimir Ozerov :
> > >
> > > > Igniters,
> > > >
> > > > Honestly I still do not see how to apply it gracefully this feature
> ti
> > > > Ignite. And overall approach to compress only particular fields looks
> > > > overcomplicated to me. Remember, that our main use case is an
> > application
> > > > without classes on the server. It means that any kind of annotations
> > are
> > > > inapplicable. To be more precise: proper API should be implemented to
> > > > handle no-class case (e.g. how would build such an object through
> > > > BinaryBuilder without a class?), and only then add annotations as
> > > > convenient addition to more basic API.
> > > >
> > > > It seems to me that full implementation, which takes in count proper
> > > > "classless" API, changes to binary metadata to reflect compressed
> > fields,
> > > > changes to SQL, changes to binary protocol, and porting to .NET and
> > CPP,
> > > > will yield very complex solution with little value to the product.
> > > >
> > > > Instead, as I proposed earlier, it seems that we'd better start with
> > the
> > > > problem we are trying to solve. Basically, compression could help in
> > two
> > > > cases:
> > > > 1) Transmitting data over wire - it should be implemented on
> > > communication
> > > > layer and should not affect binary serialization component a lot.
> > > > 2) Storing data in memory - here the much simpler step would be to
> full
> > > > compression on per-cache basis rather than dealing with per-fields
> > case.
> > > >
> > > > In the end, if user would like to compress particular field, he can
> > > always
> > > > to it on his own, and set already compressed field to our
> BinaryObject.
> > > >
> > > > Vladimir.
> > > >
> > > >
> > > > On Thu, Jun 8, 2017 at 12:37 PM, Vyacheslav Daradur <
> > daradu...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Valentin,
> > > > >
> > > > > Yes, I have the prototype[1][2]
> > > > >
> > > > > You can see an example of Java class[3] that I used in my
> benchmark.
> > > > > For example:
> > > > > class Foo {
> > > > > @BinaryCompression
> > > > > String data;
> > > > > }
> > > > > If user make decision to store the object in compressed form, he
> can
> > > use
> > > > > the annotation @BinaryCompression as shown above.
> > > > > It means annotated field 'data' will be compressed at marshalling.
> > > > >
> > > > > [1] https://github.com/apache/ignite/pull/1951
> > > > > [2] https://issues.apache.org/jira/browse/IGNITE-5226
> > > > > [3]
> > > > > https://github.com/daradurvs/ignite-compression/blob/
> > > > > master/src/main/java/ru/daradurvs/ignite/compression/
> > > model/Audit1F.java
> > > > >
> > > > >
> > > > >
> > > > > 2017-06-08 2:04 GMT+03:00 Valentin Kulichenko <
> > > > > valentin.kuliche...@gmail.com
> > > > > >:
> > > > >
> > > > > > Vyacheslav, Anton,
> > > > > >
> > > > > > Are there any ideas and/or prototypes for the API? Your design
> > > > > suggestions
> > > > > > seem to make sense, but I would like to see how it all this will
> > like
> > > > > from
> > > > > > user's standpoint.
> > > > > >
> > > > > > -Val
> > > > > >
> > > > > > On Wed, Jun 7, 2017 at 1:06 AM, Антон Чураев <
> churaev...@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Vyacheslav, correct me if something wrong
> > > > > > >
> > > > > > > We could provide opportunity of 

Re: Data compression in Ignite 2.0

2017-06-08 Thread Vyacheslav Daradur
Anton,

>> I thought that if there will storing compressed data in the memory, data
>> will transmit over wire in compression too. Is it right?

In the per-field compression case - yes.

2017-06-08 13:36 GMT+03:00 Антон Чураев :

> Guys, could you please help me.
> I thought that if there will storing compressed data in the memory, data
> will transmit over wire in compression too. Is it right?
>
> 2017-06-08 13:30 GMT+03:00 Vyacheslav Daradur :
>
> > Vladimir,
> >
> > The main problem which I'am trying to solve is storing data in memory in
> a
> > compression form via Ignite.
> > The main goal is using memory more effectivelly.
> >
> > >> here the much simpler step would be to full
> > compression on per-cache basis rather than dealing with per-fields case.
> >
> > Please explain your idea. Compess data by memory-page?
> > Is it compatible with quering and indexing?
> >
> > >> In the end, if user would like to compress particular field, he can
> > always to it on his own
> > I think we mustn't think in this way, if user need something he trying to
> > choose a tool which has this feature OOTB.
> >
> >
> >
> > 2017-06-08 12:53 GMT+03:00 Vladimir Ozerov :
> >
> > > Igniters,
> > >
> > > Honestly I still do not see how to apply it gracefully this feature ti
> > > Ignite. And overall approach to compress only particular fields looks
> > > overcomplicated to me. Remember, that our main use case is an
> application
> > > without classes on the server. It means that any kind of annotations
> are
> > > inapplicable. To be more precise: proper API should be implemented to
> > > handle no-class case (e.g. how would build such an object through
> > > BinaryBuilder without a class?), and only then add annotations as
> > > convenient addition to more basic API.
> > >
> > > It seems to me that full implementation, which takes in count proper
> > > "classless" API, changes to binary metadata to reflect compressed
> fields,
> > > changes to SQL, changes to binary protocol, and porting to .NET and
> CPP,
> > > will yield very complex solution with little value to the product.
> > >
> > > Instead, as I proposed earlier, it seems that we'd better start with
> the
> > > problem we are trying to solve. Basically, compression could help in
> two
> > > cases:
> > > 1) Transmitting data over wire - it should be implemented on
> > communication
> > > layer and should not affect binary serialization component a lot.
> > > 2) Storing data in memory - here the much simpler step would be to full
> > > compression on per-cache basis rather than dealing with per-fields
> case.
> > >
> > > In the end, if user would like to compress particular field, he can
> > always
> > > to it on his own, and set already compressed field to our BinaryObject.
> > >
> > > Vladimir.
> > >
> > >
> > > On Thu, Jun 8, 2017 at 12:37 PM, Vyacheslav Daradur <
> daradu...@gmail.com
> > >
> > > wrote:
> > >
> > > > Valentin,
> > > >
> > > > Yes, I have the prototype[1][2]
> > > >
> > > > You can see an example of Java class[3] that I used in my benchmark.
> > > > For example:
> > > > class Foo {
> > > > @BinaryCompression
> > > > String data;
> > > > }
> > > > If user make decision to store the object in compressed form, he can
> > use
> > > > the annotation @BinaryCompression as shown above.
> > > > It means annotated field 'data' will be compressed at marshalling.
> > > >
> > > > [1] https://github.com/apache/ignite/pull/1951
> > > > [2] https://issues.apache.org/jira/browse/IGNITE-5226
> > > > [3]
> > > > https://github.com/daradurvs/ignite-compression/blob/
> > > > master/src/main/java/ru/daradurvs/ignite/compression/
> > model/Audit1F.java
> > > >
> > > >
> > > >
> > > > 2017-06-08 2:04 GMT+03:00 Valentin Kulichenko <
> > > > valentin.kuliche...@gmail.com
> > > > >:
> > > >
> > > > > Vyacheslav, Anton,
> > > > >
> > > > > Are there any ideas and/or prototypes for the API? Your design
> > > > suggestions
> > > > > seem to make sense, but I would like to see how it all this will
> like
> > > > from
> > > > > user's standpoint.
> > > > >
> > > > > -Val
> > > > >
> > > > > On Wed, Jun 7, 2017 at 1:06 AM, Антон Чураев  >
> > > > wrote:
> > > > >
> > > > > > Vyacheslav, correct me if something wrong
> > > > > >
> > > > > > We could provide opportunity of choose between CPU usage and
> > MEM/NET
> > > > > usage
> > > > > > for users by compression some attributes of stored objects.
> > > > > > You have learned design, and it is possible to localize changes
> in
> > > > > > marshalling without performance affect and current functionality.
> > > > > >
> > > > > > I think, that it's usefull for our project and users.
> > > > > > Community, what do you think about this proposal?
> > > > > >
> > > > > >
> > > > > > 2017-06-06 17:29 GMT+03:00 Vyacheslav Daradur <
> daradu...@gmail.com
> > >:
> > > > > >
> > > > > > > In short,
> > > > > > >
> > > > > > > During marshalling a fields is 

Re: Data compression in Ignite 2.0

2017-06-08 Thread Антон Чураев
Guys, could you please help me.
I assume that if compressed data is stored in memory, it will also be
transmitted over the wire in compressed form. Is that right?

2017-06-08 13:30 GMT+03:00 Vyacheslav Daradur :

> Vladimir,
>
> The main problem which I'am trying to solve is storing data in memory in a
> compression form via Ignite.
> The main goal is using memory more effectivelly.
>
> >> here the much simpler step would be to full
> compression on per-cache basis rather than dealing with per-fields case.
>
> Please explain your idea. Compess data by memory-page?
> Is it compatible with quering and indexing?
>
> >> In the end, if user would like to compress particular field, he can
> always to it on his own
> I think we mustn't think in this way, if user need something he trying to
> choose a tool which has this feature OOTB.
>
>
>
> 2017-06-08 12:53 GMT+03:00 Vladimir Ozerov :
>
> > Igniters,
> >
> > Honestly I still do not see how to apply it gracefully this feature ti
> > Ignite. And overall approach to compress only particular fields looks
> > overcomplicated to me. Remember, that our main use case is an application
> > without classes on the server. It means that any kind of annotations are
> > inapplicable. To be more precise: proper API should be implemented to
> > handle no-class case (e.g. how would build such an object through
> > BinaryBuilder without a class?), and only then add annotations as
> > convenient addition to more basic API.
> >
> > It seems to me that full implementation, which takes in count proper
> > "classless" API, changes to binary metadata to reflect compressed fields,
> > changes to SQL, changes to binary protocol, and porting to .NET and CPP,
> > will yield very complex solution with little value to the product.
> >
> > Instead, as I proposed earlier, it seems that we'd better start with the
> > problem we are trying to solve. Basically, compression could help in two
> > cases:
> > 1) Transmitting data over wire - it should be implemented on
> communication
> > layer and should not affect binary serialization component a lot.
> > 2) Storing data in memory - here the much simpler step would be to full
> > compression on per-cache basis rather than dealing with per-fields case.
> >
> > In the end, if user would like to compress particular field, he can
> always
> > to it on his own, and set already compressed field to our BinaryObject.
> >
> > Vladimir.
> >
> >
> > On Thu, Jun 8, 2017 at 12:37 PM, Vyacheslav Daradur  >
> > wrote:
> >
> > > Valentin,
> > >
> > > Yes, I have the prototype[1][2]
> > >
> > > You can see an example of Java class[3] that I used in my benchmark.
> > > For example:
> > > class Foo {
> > > @BinaryCompression
> > > String data;
> > > }
> > > If user make decision to store the object in compressed form, he can
> use
> > > the annotation @BinaryCompression as shown above.
> > > It means annotated field 'data' will be compressed at marshalling.
> > >
> > > [1] https://github.com/apache/ignite/pull/1951
> > > [2] https://issues.apache.org/jira/browse/IGNITE-5226
> > > [3]
> > > https://github.com/daradurvs/ignite-compression/blob/
> > > master/src/main/java/ru/daradurvs/ignite/compression/
> model/Audit1F.java
> > >
> > >
> > >
> > > 2017-06-08 2:04 GMT+03:00 Valentin Kulichenko <
> > > valentin.kuliche...@gmail.com
> > > >:
> > >
> > > > Vyacheslav, Anton,
> > > >
> > > > Are there any ideas and/or prototypes for the API? Your design
> > > suggestions
> > > > seem to make sense, but I would like to see how it all this will like
> > > from
> > > > user's standpoint.
> > > >
> > > > -Val
> > > >
> > > > On Wed, Jun 7, 2017 at 1:06 AM, Антон Чураев 
> > > wrote:
> > > >
> > > > > Vyacheslav, correct me if something wrong
> > > > >
> > > > > We could provide opportunity of choose between CPU usage and
> MEM/NET
> > > > usage
> > > > > for users by compression some attributes of stored objects.
> > > > > You have learned design, and it is possible to localize changes in
> > > > > marshalling without performance affect and current functionality.
> > > > >
> > > > > I think, that it's usefull for our project and users.
> > > > > Community, what do you think about this proposal?
> > > > >
> > > > >
> > > > > 2017-06-06 17:29 GMT+03:00 Vyacheslav Daradur  >:
> > > > >
> > > > > > In short,
> > > > > >
> > > > > > During marshalling a fields is represented as BinaryFieldAccessor
> > > which
> > > > > > manages its marshalling. It checks if the field is marked by
> > > annotation
> > > > > > @BinaryCompression, in that case - binary  representation of
> field
> > > > (bytes
> > > > > > array) will be compressed. It will be marked as compressed by
> types
> > > > > > constant (GridBinaryMarshaller.COMPRESSED), after this the
> > > compressed
> > > > > > bytes
> > > > > > array wiil be include in binary representation of whole object.
> > Note,
> > > > > > 

Re: Data compression in Ignite 2.0

2017-06-08 Thread Vyacheslav Daradur
Vladimir,

The main problem I'm trying to solve is storing data in memory in compressed
form via Ignite.
The main goal is to use memory more effectively.

>> here the much simpler step would be to full
compression on per-cache basis rather than dealing with per-fields case.

Please explain your idea. Compress data per memory page?
Is that compatible with querying and indexing?

>> In the end, if user would like to compress particular field, he can
always to it on his own
I don't think we should frame it that way: if users need something, they
choose a tool that has the feature out of the box (OOTB).



2017-06-08 12:53 GMT+03:00 Vladimir Ozerov :

> Igniters,
>
> Honestly I still do not see how to apply it gracefully this feature ti
> Ignite. And overall approach to compress only particular fields looks
> overcomplicated to me. Remember, that our main use case is an application
> without classes on the server. It means that any kind of annotations are
> inapplicable. To be more precise: proper API should be implemented to
> handle no-class case (e.g. how would build such an object through
> BinaryBuilder without a class?), and only then add annotations as
> convenient addition to more basic API.
>
> It seems to me that full implementation, which takes in count proper
> "classless" API, changes to binary metadata to reflect compressed fields,
> changes to SQL, changes to binary protocol, and porting to .NET and CPP,
> will yield very complex solution with little value to the product.
>
> Instead, as I proposed earlier, it seems that we'd better start with the
> problem we are trying to solve. Basically, compression could help in two
> cases:
> 1) Transmitting data over wire - it should be implemented on communication
> layer and should not affect binary serialization component a lot.
> 2) Storing data in memory - here the much simpler step would be to full
> compression on per-cache basis rather than dealing with per-fields case.
>
> In the end, if user would like to compress particular field, he can always
> to it on his own, and set already compressed field to our BinaryObject.
>
> Vladimir.
>
>
> On Thu, Jun 8, 2017 at 12:37 PM, Vyacheslav Daradur 
> wrote:
>
> > Valentin,
> >
> > Yes, I have the prototype[1][2]
> >
> > You can see an example of Java class[3] that I used in my benchmark.
> > For example:
> > class Foo {
> > @BinaryCompression
> > String data;
> > }
> > If user make decision to store the object in compressed form, he can use
> > the annotation @BinaryCompression as shown above.
> > It means annotated field 'data' will be compressed at marshalling.
> >
> > [1] https://github.com/apache/ignite/pull/1951
> > [2] https://issues.apache.org/jira/browse/IGNITE-5226
> > [3]
> > https://github.com/daradurvs/ignite-compression/blob/
> > master/src/main/java/ru/daradurvs/ignite/compression/model/Audit1F.java
> >
> >
> >
> > 2017-06-08 2:04 GMT+03:00 Valentin Kulichenko <
> > valentin.kuliche...@gmail.com
> > >:
> >
> > > Vyacheslav, Anton,
> > >
> > > Are there any ideas and/or prototypes for the API? Your design
> > suggestions
> > > seem to make sense, but I would like to see how it all this will like
> > from
> > > user's standpoint.
> > >
> > > -Val
> > >
> > > On Wed, Jun 7, 2017 at 1:06 AM, Антон Чураев 
> > wrote:
> > >
> > > > Vyacheslav, correct me if something wrong
> > > >
> > > > We could provide opportunity of choose between CPU usage and MEM/NET
> > > usage
> > > > for users by compression some attributes of stored objects.
> > > > You have learned design, and it is possible to localize changes in
> > > > marshalling without performance affect and current functionality.
> > > >
> > > > I think, that it's usefull for our project and users.
> > > > Community, what do you think about this proposal?
> > > >
> > > >
> > > > 2017-06-06 17:29 GMT+03:00 Vyacheslav Daradur :
> > > >
> > > > > In short,
> > > > >
> > > > > During marshalling a fields is represented as BinaryFieldAccessor
> > which
> > > > > manages its marshalling. It checks if the field is marked by
> > annotation
> > > > > @BinaryCompression, in that case - binary  representation of field
> > > (bytes
> > > > > array) will be compressed. It will be marked as compressed by types
> > > > > constant (GridBinaryMarshaller.COMPRESSED), after this the
> > compressed
> > > > > bytes
> > > > > array wiil be include in binary representation of whole object.
> Note,
> > > > > header of marshalled object will not be compressed. Compression
> > > affected
> > > > > only object's field representation.
> > > > >
> > > > > Objects in IgniteCache is represented as BinaryObject which is
> > wrapper
> > > > over
> > > > > bytes array of marshalled object.
> > > > > BinaryObject provides some usefull methods, which are used by
> Ignite
> > > > > systems.
> > > > > For example, the Queries use BinaryObject#field method, which
> > > > deserializes
> > > > > only field of 

Re: Data compression in Ignite 2.0

2017-06-08 Thread Vladimir Ozerov
Igniters,

Honestly, I still do not see how to apply this feature gracefully to Ignite.
And the overall approach of compressing only particular fields looks
overcomplicated to me. Remember that our main use case is an application
without classes on the server, which means annotations of any kind are
inapplicable. To be more precise: a proper API should be implemented to
handle the no-class case (e.g. how would one build such an object through
BinaryBuilder without a class?), and only then should annotations be added
as a convenient layer on top of the more basic API.

It seems to me that a full implementation, which takes into account a proper
"classless" API, changes to binary metadata to reflect compressed fields,
changes to SQL, changes to the binary protocol, and porting to .NET and CPP,
will yield a very complex solution with little value to the product.

Instead, as I proposed earlier, it seems that we'd better start with the
problem we are trying to solve. Basically, compression could help in two
cases:
1) Transmitting data over the wire - this should be implemented at the
communication layer and should not affect the binary serialization component
much.
2) Storing data in memory - here the much simpler step would be full
compression on a per-cache basis rather than dealing with the per-field case.

In the end, if a user would like to compress a particular field, he can
always do it on his own, and set the already-compressed field on our
BinaryObject.

Vladimir.
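Case (2) above, whole-value compression with no per-field logic, can be sketched with the JDK's Deflater/Inflater: the entire serialized value is compressed once before storage and inflated on access. The class and method names are illustrative, not Ignite API; an actual per-cache implementation would hook into the storage layer.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch of per-cache full-value compression: the whole serialized value is
// deflated before storage, independent of any field structure.
public class ValueCompressor {
    public static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished())
            out.write(buf, 0, deflater.deflate(buf));
        deflater.end();
        return out.toByteArray();
    }

    public static byte[] decompress(byte[] packed) {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished())
                out.write(buf, 0, inflater.inflate(buf));
        } catch (DataFormatException e) {
            throw new IllegalStateException("Corrupted value", e);
        }
        inflater.end();
        return out.toByteArray();
    }
}
```

The trade-off discussed in the thread shows up directly: every read must inflate the whole value, so field-level access (queries, indexes) pays the full decompression cost.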


On Thu, Jun 8, 2017 at 12:37 PM, Vyacheslav Daradur 
wrote:

> Valentin,
>
> Yes, I have the prototype[1][2]
>
> You can see an example of Java class[3] that I used in my benchmark.
> For example:
> class Foo {
> @BinaryCompression
> String data;
> }
> If user make decision to store the object in compressed form, he can use
> the annotation @BinaryCompression as shown above.
> It means annotated field 'data' will be compressed at marshalling.
>
> [1] https://github.com/apache/ignite/pull/1951
> [2] https://issues.apache.org/jira/browse/IGNITE-5226
> [3]
> https://github.com/daradurvs/ignite-compression/blob/
> master/src/main/java/ru/daradurvs/ignite/compression/model/Audit1F.java
>
>
>
> 2017-06-08 2:04 GMT+03:00 Valentin Kulichenko <
> valentin.kuliche...@gmail.com
> >:
>
> > Vyacheslav, Anton,
> >
> > Are there any ideas and/or prototypes for the API? Your design
> suggestions
> > seem to make sense, but I would like to see how it all this will like
> from
> > user's standpoint.
> >
> > -Val
> >
> > On Wed, Jun 7, 2017 at 1:06 AM, Антон Чураев 
> wrote:
> >
> > > Vyacheslav, correct me if something wrong
> > >
> > > We could provide opportunity of choose between CPU usage and MEM/NET
> > usage
> > > for users by compression some attributes of stored objects.
> > > You have learned design, and it is possible to localize changes in
> > > marshalling without performance affect and current functionality.
> > >
> > > I think, that it's usefull for our project and users.
> > > Community, what do you think about this proposal?
> > >
> > >
> > > 2017-06-06 17:29 GMT+03:00 Vyacheslav Daradur :
> > >
> > > > In short,
> > > >
> > > > During marshalling, each field is represented by a BinaryFieldAccessor,
> > > > which manages its marshalling. It checks whether the field is marked
> > > > with the @BinaryCompression annotation; in that case the binary
> > > > representation of the field (a byte array) is compressed. It is marked
> > > > as compressed with a type constant (GridBinaryMarshaller.COMPRESSED),
> > > > and the compressed byte array is then included in the binary
> > > > representation of the whole object. Note that the header of the
> > > > marshalled object is not compressed; compression affects only the
> > > > field's representation.
> > > >
> > > > Objects in IgniteCache are represented as BinaryObject, which is a
> > > > wrapper over the byte array of the marshalled object.
> > > > BinaryObject provides some useful methods, which are used by Ignite
> > > > subsystems.
> > > > For example, queries use the BinaryObject#field method, which
> > > > deserializes only one field of an object, without deserializing the
> > > > whole object.
> > > > If BinaryObject#field meets the compressed-type constant during
> > > > deserialization, it decompresses the byte array and then continues
> > > > unmarshalling as usual.
> > > >
> > > > Now I have introduced a Compressor interface in IgniteConfiguration,
> > > > which allows users to plug in their own compressor implementation -
> > > > this is a requirement of the task [1].
> > > >
> > > > As far as I know, Vladimir Ozerov doesn't like the idea of granting
> > > > this opportunity to the user.
> > > > In that case we can choose a compression algorithm to provide by
> > > > default and move the interface into the internals of the binary
> > > > infrastructure.
> > > > For this case I've prepared benchmarks, which I sent earlier.
> > > >
> > > > I vote for the ZSTD algorithm [2]: it provides

Re: Data compression in Ignite 2.0

2017-06-08 Thread Vyacheslav Daradur
Valentin,

Yes, I have a prototype [1][2].

You can see an example of a Java class [3] that I used in my benchmark.
For example:
class Foo {
@BinaryCompression
String data;
}
If the user decides to store the object in compressed form, he can use the
@BinaryCompression annotation as shown above.
It means the annotated field 'data' will be compressed during marshalling.

[1] https://github.com/apache/ignite/pull/1951
[2] https://issues.apache.org/jira/browse/IGNITE-5226
[3]
https://github.com/daradurvs/ignite-compression/blob/master/src/main/java/ru/daradurvs/ignite/compression/model/Audit1F.java
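For illustration, a minimal sketch of what such a field-level annotation could look like. Only the name @BinaryCompression comes from this thread; the retention, target, and the reflection check are assumptions about how a marshaller-side accessor might detect it:

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical declaration: runtime retention so marshalling code can
// inspect it via reflection; field-only target per the proposal.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface BinaryCompression {
}

class Foo {
    @BinaryCompression
    String data; // compressed at marshalling; other fields are untouched
}

public class AnnotationSketch {
    public static void main(String[] args) throws Exception {
        // A field accessor would check for the annotation roughly like this:
        boolean marked = Foo.class.getDeclaredField("data")
            .isAnnotationPresent(BinaryCompression.class);
        System.out.println(marked); // prints "true"
    }
}
```

With this opt-in shape, unannotated classes never touch the compression path, which matches the "no performance changes unless the user asks" requirement discussed below.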



2017-06-08 2:04 GMT+03:00 Valentin Kulichenko :

> Vyacheslav, Anton,
>
> Are there any ideas and/or prototypes for the API? Your design suggestions
> seem to make sense, but I would like to see how all this will look from
> the user's standpoint.
>
> -Val
>
> On Wed, Jun 7, 2017 at 1:06 AM, Антон Чураев  wrote:
>
> > Vyacheslav, correct me if something is wrong.
> >
> > We could give users the opportunity to choose between CPU usage and
> > memory/network usage by compressing some attributes of stored objects.
> > You have studied the design, and it is possible to localize the changes
> > in marshalling without affecting performance or current functionality.
> >
> > I think it's useful for our project and users.
> > Community, what do you think about this proposal?
> >
> >
> > 2017-06-06 17:29 GMT+03:00 Vyacheslav Daradur :
> >
> > > In short,
> > >
> > > During marshalling, each field is represented by a BinaryFieldAccessor,
> > > which manages its marshalling. It checks whether the field is marked
> > > with the @BinaryCompression annotation; in that case the binary
> > > representation of the field (a byte array) is compressed. It is marked
> > > as compressed with a type constant (GridBinaryMarshaller.COMPRESSED),
> > > and the compressed byte array is then included in the binary
> > > representation of the whole object. Note that the header of the
> > > marshalled object is not compressed; compression affects only the
> > > field's representation.
> > >
> > > Objects in IgniteCache are represented as BinaryObject, which is a
> > > wrapper over the byte array of the marshalled object.
> > > BinaryObject provides some useful methods, which are used by Ignite
> > > subsystems.
> > > For example, queries use the BinaryObject#field method, which
> > > deserializes only one field of an object, without deserializing the
> > > whole object.
> > > If BinaryObject#field meets the compressed-type constant during
> > > deserialization, it decompresses the byte array and then continues
> > > unmarshalling as usual.
> > >
> > > Now I have introduced a Compressor interface in IgniteConfiguration,
> > > which allows users to plug in their own compressor implementation -
> > > this is a requirement of the task [1].
> > >
> > > As far as I know, Vladimir Ozerov doesn't like the idea of granting
> > > this opportunity to the user.
> > > In that case we can choose a compression algorithm to provide by
> > > default and move the interface into the internals of the binary
> > > infrastructure.
> > > For this case I've prepared benchmarks, which I sent earlier.
> > >
> > > I vote for the ZSTD algorithm [2]: it provides a good compression
> > > ratio and good throughput. It has implementations in Java, .NET and
> > > C++, and an ASF-friendly license, so we can use it on all Ignite
> > > platforms.
> > > You can look at an assessment of this algorithm in my benchmarks.
> > >
> > > [1] https://issues.apache.org/jira/browse/IGNITE-3592
> > > [2] https://github.com/facebook/zstd
> > >
> > >
> > > 2017-06-06 16:02 GMT+03:00 Антон Чураев :
> > >
> > > > Looks good to me.
> > > >
> > > > Could you propose the design of the implementation in a couple of
> > > > sentences, so that we can estimate the completeness and complexity
> > > > of the proposal?
> > > >
> > > > 2017-06-06 15:26 GMT+03:00 Vyacheslav Daradur :
> > > >
> > > > > Anton,
> > > > >
> > > > > Of course, the solution does not affect the existing
> > > > > implementation. I mean, there are no changes if the user does not
> > > > > use the @BinaryCompression annotation (no performance changes).
> > > > > Only if the user decides to use compression on a specific field or
> > > > > fields of a class will compression be applied to the annotated
> > > > > fields during marshalling.
> > > > >
> > > > > 2017-06-06 15:10 GMT+03:00 Антон Чураев :
> > > > >
> > > > > > Vyacheslav,
> > > > > >
> > > > > > Is it possible to propose an implementation that can be switched
> > > > > > on on-demand? In this case it should not affect the performance
> > > > > > of the current solution.
> > > > > >
> > > > > > I mean that users should decide what is more important for them:
> > > > > > throughput or memory/network usage.
> > > > > > Maybe they will choose not all objects,

Re: Data compression in Ignite 2.0

2017-06-07 Thread Valentin Kulichenko
Vyacheslav, Anton,

Are there any ideas and/or prototypes for the API? Your design suggestions
seem to make sense, but I would like to see how all this will look from the
user's standpoint.

-Val

On Wed, Jun 7, 2017 at 1:06 AM, Антон Чураев  wrote:

> Vyacheslav, correct me if something is wrong.
>
> We could give users the opportunity to choose between CPU usage and
> memory/network usage by compressing some attributes of stored objects.
> You have studied the design, and it is possible to localize the changes in
> marshalling without affecting performance or current functionality.
>
> I think it's useful for our project and users.
> Community, what do you think about this proposal?
>
>
> 2017-06-06 17:29 GMT+03:00 Vyacheslav Daradur :
>
> > In short,
> >
> > During marshalling, each field is represented by a BinaryFieldAccessor,
> > which manages its marshalling. It checks whether the field is marked
> > with the @BinaryCompression annotation; in that case the binary
> > representation of the field (a byte array) is compressed. It is marked
> > as compressed with a type constant (GridBinaryMarshaller.COMPRESSED),
> > and the compressed byte array is then included in the binary
> > representation of the whole object. Note that the header of the
> > marshalled object is not compressed; compression affects only the
> > field's representation.
> >
> > Objects in IgniteCache are represented as BinaryObject, which is a
> > wrapper over the byte array of the marshalled object.
> > BinaryObject provides some useful methods, which are used by Ignite
> > subsystems.
> > For example, queries use the BinaryObject#field method, which
> > deserializes only one field of an object, without deserializing the
> > whole object.
> > If BinaryObject#field meets the compressed-type constant during
> > deserialization, it decompresses the byte array and then continues
> > unmarshalling as usual.
> >
> > Now I have introduced a Compressor interface in IgniteConfiguration,
> > which allows users to plug in their own compressor implementation - this
> > is a requirement of the task [1].
> >
> > As far as I know, Vladimir Ozerov doesn't like the idea of granting this
> > opportunity to the user.
> > In that case we can choose a compression algorithm to provide by default
> > and move the interface into the internals of the binary infrastructure.
> > For this case I've prepared benchmarks, which I sent earlier.
> >
> > I vote for the ZSTD algorithm [2]: it provides a good compression ratio
> > and good throughput. It has implementations in Java, .NET and C++, and
> > an ASF-friendly license, so we can use it on all Ignite platforms.
> > You can look at an assessment of this algorithm in my benchmarks.
> >
> > [1] https://issues.apache.org/jira/browse/IGNITE-3592
> > [2] https://github.com/facebook/zstd
> >
> >
> > 2017-06-06 16:02 GMT+03:00 Антон Чураев :
> >
> > > Looks good to me.
> > >
> > > Could you propose the design of the implementation in a couple of
> > > sentences, so that we can estimate the completeness and complexity of
> > > the proposal?
> > >
> > > 2017-06-06 15:26 GMT+03:00 Vyacheslav Daradur :
> > >
> > > > Anton,
> > > >
> > > > Of course, the solution does not affect the existing implementation.
> > > > I mean, there are no changes if the user does not use the
> > > > @BinaryCompression annotation (no performance changes).
> > > > Only if the user decides to use compression on a specific field or
> > > > fields of a class will compression be applied to the annotated
> > > > fields during marshalling.
> > > >
> > > > 2017-06-06 15:10 GMT+03:00 Антон Чураев :
> > > >
> > > > > Vyacheslav,
> > > > >
> > > > > Is it possible to propose an implementation that can be switched
> > > > > on on-demand? In this case it should not affect the performance of
> > > > > the current solution.
> > > > >
> > > > > I mean that users should decide what is more important for them:
> > > > > throughput or memory/network usage.
> > > > > Maybe they will choose not all objects, but only some attributes
> > > > > of objects to compress.
> > > > >
> > > > > 2017-06-06 14:48 GMT+03:00 Vyacheslav Daradur  >:
> > > > >
> > > > > > Conclusion:
> > > > > > The provided solution allows reducing the size of an object in
> > > > > > IgniteCache at the cost of a throughput reduction (small in some
> > > > > > cases); it depends on which part of the object is compressed and
> > > > > > on the compression algorithm.
> > > > > > I mean, we can use memory more effectively, and in some cases it
> > > > > > can reduce load on the interconnect (replication, rebalancing).
> > > > > >
> > > > > > It will be particularly useful for object fields that are large
> > > > > > text (>~ 250 bytes) and can be compressed effectively.
> > > > > >
> > > > > > 2017-06-06 12:00 GMT+03:00 Антон Чураев 

Re: Data compression in Ignite 2.0

2017-06-07 Thread Антон Чураев
Vyacheslav, correct me if something is wrong.

We could give users the opportunity to choose between CPU usage and
memory/network usage by compressing some attributes of stored objects.
You have studied the design, and it is possible to localize the changes in
marshalling without affecting performance or current functionality.

I think it's useful for our project and users.
Community, what do you think about this proposal?


2017-06-06 17:29 GMT+03:00 Vyacheslav Daradur :

> In short,
>
> During marshalling, each field is represented by a BinaryFieldAccessor,
> which manages its marshalling. It checks whether the field is marked with
> the @BinaryCompression annotation; in that case the binary representation
> of the field (a byte array) is compressed. It is marked as compressed with
> a type constant (GridBinaryMarshaller.COMPRESSED), and the compressed byte
> array is then included in the binary representation of the whole object.
> Note that the header of the marshalled object is not compressed;
> compression affects only the field's representation.
>
> Objects in IgniteCache are represented as BinaryObject, which is a wrapper
> over the byte array of the marshalled object.
> BinaryObject provides some useful methods, which are used by Ignite
> subsystems.
> For example, queries use the BinaryObject#field method, which deserializes
> only one field of an object, without deserializing the whole object.
> If BinaryObject#field meets the compressed-type constant during
> deserialization, it decompresses the byte array and then continues
> unmarshalling as usual.
>
> Now I have introduced a Compressor interface in IgniteConfiguration, which
> allows users to plug in their own compressor implementation - this is a
> requirement of the task [1].
>
> As far as I know, Vladimir Ozerov doesn't like the idea of granting this
> opportunity to the user.
> In that case we can choose a compression algorithm to provide by default
> and move the interface into the internals of the binary infrastructure.
> For this case I've prepared benchmarks, which I sent earlier.
>
> I vote for the ZSTD algorithm [2]: it provides a good compression ratio
> and good throughput. It has implementations in Java, .NET and C++, and an
> ASF-friendly license, so we can use it on all Ignite platforms.
> You can look at an assessment of this algorithm in my benchmarks.
>
> [1] https://issues.apache.org/jira/browse/IGNITE-3592
> [2] https://github.com/facebook/zstd
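To make the quoted wire layout concrete, here is a sketch of how a compressed field could be framed inside the object bytes while the header stays uncompressed. All names and the marker value are hypothetical stand-ins; only the constant name GridBinaryMarshaller.COMPRESSED comes from the thread:

```java
import java.nio.ByteBuffer;

/** Hypothetical framing for one compressed field inside a binary object:
 *  [marker byte][payload length][compressed payload]. The object header and
 *  other fields remain uncompressed, so per-field access still works. */
public class CompressedFieldFraming {
    // Stand-in for GridBinaryMarshaller.COMPRESSED; the real value differs.
    static final byte COMPRESSED = (byte) 0x25;

    static byte[] frame(byte[] compressedPayload) {
        return ByteBuffer.allocate(5 + compressedPayload.length)
            .put(COMPRESSED)                  // type marker
            .putInt(compressedPayload.length) // payload length
            .put(compressedPayload)           // compressed bytes
            .array();
    }

    /** Mirrors the BinaryObject#field path: if the type marker says
     *  "compressed", extract the payload for the configured decompressor;
     *  nothing else in the object needs to be read. */
    static byte[] unframe(byte[] fieldBytes) {
        ByteBuffer buf = ByteBuffer.wrap(fieldBytes);
        if (buf.get() != COMPRESSED)
            throw new IllegalStateException("Field is not compressed");
        byte[] payload = new byte[buf.getInt()];
        buf.get(payload);
        return payload;
    }

    public static void main(String[] args) {
        byte[] payload = {1, 2, 3, 4};
        byte[] framed = frame(payload);
        System.out.println(framed.length); // prints "9"
        System.out.println(java.util.Arrays.equals(unframe(framed), payload)); // prints "true"
    }
}
```

Because only the field body is framed this way, lazy per-field reads (as in queries) pay the decompression cost only for the fields they actually touch.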
>
>
> 2017-06-06 16:02 GMT+03:00 Антон Чураев :
>
> > Looks good to me.
> >
> > Could you propose the design of the implementation in a couple of
> > sentences, so that we can estimate the completeness and complexity of
> > the proposal?
> >
> > 2017-06-06 15:26 GMT+03:00 Vyacheslav Daradur :
> >
> > > Anton,
> > >
> > > Of course, the solution does not affect the existing implementation. I
> > > mean, there are no changes if the user does not use the
> > > @BinaryCompression annotation (no performance changes).
> > > Only if the user decides to use compression on a specific field or
> > > fields of a class will compression be applied to the annotated fields
> > > during marshalling.
> > >
> > > 2017-06-06 15:10 GMT+03:00 Антон Чураев :
> > >
> > > > Vyacheslav,
> > > >
> > > > Is it possible to propose an implementation that can be switched on
> > > > on-demand? In this case it should not affect the performance of the
> > > > current solution.
> > > >
> > > > I mean that users should decide what is more important for them:
> > > > throughput or memory/network usage.
> > > > Maybe they will choose not all objects, but only some attributes of
> > > > objects to compress.
> > > >
> > > > 2017-06-06 14:48 GMT+03:00 Vyacheslav Daradur :
> > > >
> > > > > Conclusion:
> > > > > The provided solution allows reducing the size of an object in
> > > > > IgniteCache at the cost of a throughput reduction (small in some
> > > > > cases); it depends on which part of the object is compressed and
> > > > > on the compression algorithm.
> > > > > I mean, we can use memory more effectively, and in some cases it
> > > > > can reduce load on the interconnect (replication, rebalancing).
> > > > >
> > > > > It will be particularly useful for object fields that are large
> > > > > text (>~ 250 bytes) and can be compressed effectively.
> > > > >
> > > > > 2017-06-06 12:00 GMT+03:00 Антон Чураев :
> > > > >
> > > > > > Vyacheslav, thank you! But could you please provide conclusions
> > > > > > or proposals based on these benchmarks?
> > > > > >
> > > > > > 2017-06-06 11:28 GMT+03:00 Vyacheslav Daradur <
> daradu...@gmail.com
> > >:
> > > > > >
> > > > > > > Dmitry,
> > > > > > >
> > > > > > > Excel pages:
> > > > > > >
> > > > > > > 1) "Compression ratio (2)" - shows object size with and without
> > > > > > > compression (conditions: literal text).
> > > > > > > The 1st

Re: Data compression in Ignite 2.0

2017-06-06 Thread Vyacheslav Daradur
In short,

During marshalling, each field is represented by a BinaryFieldAccessor,
which manages its marshalling. It checks whether the field is marked with
the @BinaryCompression annotation; in that case the binary representation of
the field (a byte array) is compressed. It is marked as compressed with a
type constant (GridBinaryMarshaller.COMPRESSED), and the compressed byte
array is then included in the binary representation of the whole object.
Note that the header of the marshalled object is not compressed; compression
affects only the field's representation.

Objects in IgniteCache are represented as BinaryObject, which is a wrapper
over the byte array of the marshalled object.
BinaryObject provides some useful methods, which are used by Ignite
subsystems.
For example, queries use the BinaryObject#field method, which deserializes
only one field of an object, without deserializing the whole object.
If BinaryObject#field meets the compressed-type constant during
deserialization, it decompresses the byte array and then continues
unmarshalling as usual.

Now I have introduced a Compressor interface in IgniteConfiguration, which
allows users to plug in their own compressor implementation - this is a
requirement of the task [1].

As far as I know, Vladimir Ozerov doesn't like the idea of granting this
opportunity to the user.
In that case we can choose a compression algorithm to provide by default and
move the interface into the internals of the binary infrastructure.
For this case I've prepared benchmarks, which I sent earlier.

I vote for the ZSTD algorithm [2]: it provides a good compression ratio and
good throughput. It has implementations in Java, .NET and C++, and an
ASF-friendly license, so we can use it on all Ignite platforms.
You can look at an assessment of this algorithm in my benchmarks.

[1] https://issues.apache.org/jira/browse/IGNITE-3592
[2] https://github.com/facebook/zstd
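A minimal sketch of the pluggable Compressor contract described above, using the JDK's Deflater as a stand-in implementation. The interface name comes from the thread; the method signatures and the Deflate-based default are assumptions (the thread proposes ZSTD as the actual default algorithm):

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical user-pluggable compressor contract (per IGNITE-3592). */
interface Compressor {
    byte[] compress(byte[] bytes);
    byte[] decompress(byte[] bytes);
}

/** JDK Deflate stand-in; a real default could wrap ZSTD instead. */
class DeflateCompressor implements Compressor {
    @Override public byte[] compress(byte[] bytes) {
        Deflater deflater = new Deflater();
        deflater.setInput(bytes);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished())
            out.write(buf, 0, deflater.deflate(buf));
        deflater.end();
        return out.toByteArray();
    }

    @Override public byte[] decompress(byte[] bytes) {
        Inflater inflater = new Inflater();
        inflater.setInput(bytes);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished())
                out.write(buf, 0, inflater.inflate(buf));
        } catch (DataFormatException e) {
            throw new IllegalStateException("Corrupted field payload", e);
        }
        inflater.end();
        return out.toByteArray();
    }
}

public class CompressorSketch {
    public static void main(String[] args) {
        Compressor c = new DeflateCompressor();
        // Large repetitive text - the case the thread says benefits most.
        byte[] field = "some large literal text ".repeat(32)
            .getBytes(StandardCharsets.UTF_8);
        byte[] packed = c.compress(field);
        byte[] restored = c.decompress(packed);
        System.out.println(packed.length < field.length); // prints "true"
        System.out.println(new String(restored, StandardCharsets.UTF_8)
            .startsWith("some large literal text")); // prints "true"
    }
}
```

Moving such an interface behind the binary infrastructure (rather than exposing it in the public configuration) is exactly the trade-off being debated: a fixed default algorithm versus user-supplied implementations.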


2017-06-06 16:02 GMT+03:00 Антон Чураев :

> Looks good to me.
>
> Could you propose the design of the implementation in a couple of
> sentences, so that we can estimate the completeness and complexity of the
> proposal?
>
> 2017-06-06 15:26 GMT+03:00 Vyacheslav Daradur :
>
> > Anton,
> >
> > Of course, the solution does not affect the existing implementation. I
> > mean, there are no changes if the user does not use the
> > @BinaryCompression annotation (no performance changes).
> > Only if the user decides to use compression on a specific field or
> > fields of a class will compression be applied to the annotated fields
> > during marshalling.
> >
> > 2017-06-06 15:10 GMT+03:00 Антон Чураев :
> >
> > > Vyacheslav,
> > >
> > > Is it possible to propose an implementation that can be switched on
> > > on-demand? In this case it should not affect the performance of the
> > > current solution.
> > >
> > > I mean that users should decide what is more important for them:
> > > throughput or memory/network usage.
> > > Maybe they will choose not all objects, but only some attributes of
> > > objects to compress.
> > >
> > > 2017-06-06 14:48 GMT+03:00 Vyacheslav Daradur :
> > >
> > > > Conclusion:
> > > > The provided solution allows reducing the size of an object in
> > > > IgniteCache at the cost of a throughput reduction (small in some
> > > > cases); it depends on which part of the object is compressed and on
> > > > the compression algorithm.
> > > > I mean, we can use memory more effectively, and in some cases it can
> > > > reduce load on the interconnect (replication, rebalancing).
> > > >
> > > > It will be particularly useful for object fields that are large text
> > > > (>~ 250 bytes) and can be compressed effectively.
> > > >
> > > > 2017-06-06 12:00 GMT+03:00 Антон Чураев :
> > > >
> > > > > Vyacheslav, thank you! But could you please provide conclusions
> > > > > or proposals based on these benchmarks?
> > > > >
> > > > > 2017-06-06 11:28 GMT+03:00 Vyacheslav Daradur  >:
> > > > >
> > > > > > Dmitry,
> > > > > >
> > > > > > Excel pages:
> > > > > >
> > > > > > 1) "Compression ratio (2)" - shows object size with and without
> > > > > > compression (conditions: literal text).
> > > > > > The 1st graph shows compression ratios of different compression
> > > > > > algorithms depending on the size of the compressed field.
> > > > > > The 2nd graph shows object sizes depending on field sizes and
> > > > > > compression algorithms.
> > > > > >
> > > > > > 2) "Compression ratio (1)" - shows object size with and without
> > > > > > compression (conditions: a badly compressible character
> > > > > > sequence).
> > > > > > The 1st graph shows compression ratios of different compression
> > > > > > algorithms depending on the size of the compressed field.
> > > > > > The 2nd graph shows object sizes depending on field sizes and
> > > > > > compression algorithms.
> > > > > >

Re: Data compression in Ignite 2.0

2017-06-06 Thread Антон Чураев
Looks good to me.

Could you propose the design of the implementation in a couple of sentences,
so that we can estimate the completeness and complexity of the proposal?

2017-06-06 15:26 GMT+03:00 Vyacheslav Daradur :

> Anton,
>
> Of course, the solution does not affect the existing implementation. I
> mean, there are no changes if the user does not use the @BinaryCompression
> annotation (no performance changes).
> Only if the user decides to use compression on a specific field or fields
> of a class will compression be applied to the annotated fields during
> marshalling.
>
> 2017-06-06 15:10 GMT+03:00 Антон Чураев :
>
> > Vyacheslav,
> >
> > Is it possible to propose an implementation that can be switched on
> > on-demand? In this case it should not affect the performance of the
> > current solution.
> >
> > I mean that users should decide what is more important for them:
> > throughput or memory/network usage.
> > Maybe they will choose not all objects, but only some attributes of
> > objects to compress.
> >
> > 2017-06-06 14:48 GMT+03:00 Vyacheslav Daradur :
> >
> > > Conclusion:
> > > The provided solution allows reducing the size of an object in
> > > IgniteCache at the cost of a throughput reduction (small in some
> > > cases); it depends on which part of the object is compressed and on
> > > the compression algorithm.
> > > I mean, we can use memory more effectively, and in some cases it can
> > > reduce load on the interconnect (replication, rebalancing).
> > >
> > > It will be particularly useful for object fields that are large text
> > > (>~ 250 bytes) and can be compressed effectively.
> > >
> > > 2017-06-06 12:00 GMT+03:00 Антон Чураев :
> > >
> > > > Vyacheslav, thank you! But could you please provide conclusions or
> > > > proposals based on these benchmarks?
> > > >
> > > > 2017-06-06 11:28 GMT+03:00 Vyacheslav Daradur :
> > > >
> > > > > Dmitry,
> > > > >
> > > > > Excel pages:
> > > > >
> > > > > 1) "Compression ratio (2)" - shows object size with and without
> > > > > compression (conditions: literal text).
> > > > > The 1st graph shows compression ratios of different compression
> > > > > algorithms depending on the size of the compressed field.
> > > > > The 2nd graph shows object sizes depending on field sizes and
> > > > > compression algorithms.
> > > > >
> > > > > 2) "Compression ratio (1)" - shows object size with and without
> > > > > compression (conditions: a badly compressible character sequence).
> > > > > The 1st graph shows compression ratios of different compression
> > > > > algorithms depending on the size of the compressed field.
> > > > > The 2nd graph shows object sizes depending on field sizes and
> > > > > compression algorithms.
> > > > >
> > > > > 3) "put-avg" - shows the average time of the "put" operation
> > > > > depending on size and compression algorithm.
> > > > >
> > > > > 4) "put-thrpt" - shows the throughput of the "put" operation
> > > > > depending on size and compression algorithm.
> > > > >
> > > > > 5) "get-avg" - shows the average time of the "get" operation
> > > > > depending on size and compression algorithm.
> > > > >
> > > > > 6) "get-thrpt" - shows the throughput of the "get" operation
> > > > > depending on size and compression algorithm.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > 2017-06-06 10:59 GMT+03:00 Dmitriy Setrakyan <
> dsetrak...@apache.org
> > >:
> > > > >
> > > > > > Vladimir, I am not sure how to interpret the graphs? What are we
> > > > looking
> > > > > > at?
> > > > > >
> > > > > > On Tue, Jun 6, 2017 at 12:33 AM, Vyacheslav Daradur <
> > > > daradu...@gmail.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi, Igniters.
> > > > > > >
> > > > > > > I've prepared some benchmarking. Results [1].
> > > > > > >
> > > > > > > And I've prepared the evaluation in the form of diagrams [2].
> > > > > > >
> > > > > > > I hope that helps to interest the community and accelerates a
> > > > > > > reaction to this improvement :)
> > > > > > >
> > > > > > > [1]
> > > > > > > https://github.com/daradurvs/ignite-compression/tree/
> > > > > > > master/src/main/resources/result
> > > > > > > [2] https://drive.google.com/file/d/
> > 0B2CeUAOgrHkoMklyZ25YTEdKcEk/
> > > > view
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > 2017-05-24 9:49 GMT+03:00 Vyacheslav Daradur <
> > daradu...@gmail.com
> > > >:
> > > > > > >
> > > > > > > > Guys, any thoughts?
> > > > > > > >
> > > > > > > > 2017-05-16 13:40 GMT+03:00 Vyacheslav Daradur <
> > > daradu...@gmail.com
> > > > >:
> > > > > > > >
> > > > > > > >> Hi guys,
> > > > > > > >>
> > > > > > > >> I've prepared the PR to show my idea.
> > > > > > > >> https://github.com/apache/ignite/pull/1951/files
> > > > > > > >>
> > > > > > > >> About querying - I've just copied existing 

Re: Data compression in Ignite 2.0

2017-06-06 Thread Антон Чураев
Vyacheslav,

Is it possible to propose an implementation that can be switched on
on-demand? In this case it should not affect the performance of the current
solution.

I mean that users should decide what is more important for them: throughput
or memory/network usage.
Maybe they will choose not all objects, but only some attributes of objects
to compress.

2017-06-06 14:48 GMT+03:00 Vyacheslav Daradur :

> Conclusion:
> The provided solution allows reducing the size of an object in IgniteCache
> at the cost of a throughput reduction (small in some cases); it depends on
> which part of the object is compressed and on the compression algorithm.
> I mean, we can use memory more effectively, and in some cases it can
> reduce load on the interconnect (replication, rebalancing).
>
> It will be particularly useful for object fields that are large text
> (>~ 250 bytes) and can be compressed effectively.
>
> 2017-06-06 12:00 GMT+03:00 Антон Чураев :
>
> > Vyacheslav, thank you! But could you please provide conclusions or
> > proposals based on these benchmarks?
> >
> > 2017-06-06 11:28 GMT+03:00 Vyacheslav Daradur :
> >
> > > Dmitry,
> > >
> > > Excel pages:
> > >
> > > 1) "Compression ratio (2)" - shows object size with and without
> > > compression (conditions: literal text).
> > > The 1st graph shows compression ratios of different compression
> > > algorithms depending on the size of the compressed field.
> > > The 2nd graph shows object sizes depending on field sizes and
> > > compression algorithms.
> > >
> > > 2) "Compression ratio (1)" - shows object size with and without
> > > compression (conditions: a badly compressible character sequence).
> > > The 1st graph shows compression ratios of different compression
> > > algorithms depending on the size of the compressed field.
> > > The 2nd graph shows object sizes depending on field sizes and
> > > compression algorithms.
> > >
> > > 3) "put-avg" - shows the average time of the "put" operation depending
> > > on size and compression algorithm.
> > >
> > > 4) "put-thrpt" - shows the throughput of the "put" operation depending
> > > on size and compression algorithm.
> > >
> > > 5) "get-avg" - shows the average time of the "get" operation depending
> > > on size and compression algorithm.
> > >
> > > 6) "get-thrpt" - shows the throughput of the "get" operation depending
> > > on size and compression algorithm.
> > >
> > >
> > >
> > >
> > > 2017-06-06 10:59 GMT+03:00 Dmitriy Setrakyan :
> > >
> > > > Vladimir, I am not sure how to interpret the graphs? What are we
> > looking
> > > > at?
> > > >
> > > > On Tue, Jun 6, 2017 at 12:33 AM, Vyacheslav Daradur <
> > daradu...@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Hi, Igniters.
> > > > >
> > > > > I've prepared some benchmarking. Results [1].
> > > > >
> > > > > And I've prepared the evaluation in the form of diagrams [2].
> > > > >
> > > > > I hope that helps to interest the community and accelerates a
> > > > > reaction to this improvement :)
> > > > >
> > > > > [1]
> > > > > https://github.com/daradurvs/ignite-compression/tree/
> > > > > master/src/main/resources/result
> > > > > [2] https://drive.google.com/file/d/0B2CeUAOgrHkoMklyZ25YTEdKcEk/
> > view
> > > > >
> > > > >
> > > > >
> > > > > 2017-05-24 9:49 GMT+03:00 Vyacheslav Daradur  >:
> > > > >
> > > > > > Guys, any thoughts?
> > > > > >
> > > > > > 2017-05-16 13:40 GMT+03:00 Vyacheslav Daradur <
> daradu...@gmail.com
> > >:
> > > > > >
> > > > > >> Hi guys,
> > > > > >>
> > > > > >> I've prepared the PR to show my idea.
> > > > > >> https://github.com/apache/ignite/pull/1951/files
> > > > > >>
> > > > > >> About querying - I've just copied the existing tests and
> > > > > >> annotated the testing data.
> > > > > >> https://github.com/apache/ignite/pull/1951/files#diff-c19a9d
> > > > > >> f4058141d059bb577e75244764
> > > > > >>
> > > > > >> It means fields marked with @BinaryCompression will be
> > > > > >> compressed at marshalling via BinaryMarshaller.
> > > > > >>
> > > > > >> This solution has no effect on existing data or project
> > > > > >> architecture.
> > > > > >>
> > > > > >> I'll be glad to see your thoughts.
> > > > > >>
> > > > > >>
> > > > > >> 2017-05-15 19:18 GMT+03:00 Vyacheslav Daradur <
> > daradu...@gmail.com
> > > >:
> > > > > >>
> > > > > >>> Dmitriy,
> > > > > >>>
> > > > > >>> I have a ready prototype, and I want to show it.
> > > > > >>> It is always easier to discuss on an example.
> > > > > >>>
> > > > > >>> 2017-05-15 19:02 GMT+03:00 Dmitriy Setrakyan <
> > > dsetrak...@apache.org
> > > > >:
> > > > > >>>
> > > > >  Vyacheslav,
> > > > > 
> > > > >  I think it is a bit premature to provide a PR without getting
> a
> > > > >  community
> > > > >  consensus on the dev list. Please allow some time for the
> > > community
> > > > to
> > 

Re: Data compression in Ignite 2.0

2017-06-06 Thread Vyacheslav Daradur
I wish to note that the benchmark results show metrics from stress testing.
I mean that in real scenarios, for example business operations which take
milliseconds or seconds, the increase in put/get operation time will be
insignificant.

2017-06-06 14:48 GMT+03:00 Vyacheslav Daradur :

> Conclusion:
> The provided solution allows reducing the size of an object in IgniteCache
> at the cost of a throughput reduction (small in some cases); it depends on
> which part of the object is compressed and on the compression algorithm.
> I mean, we can use memory more effectively, and in some cases it can
> reduce load on the interconnect (replication, rebalancing).
>
> It will be particularly useful for object fields that are large text
> (>~ 250 bytes) and can be compressed effectively.
>
> 2017-06-06 12:00 GMT+03:00 Антон Чураев :
>
>> Vyacheslav, thank you! But could you please provide conclusions or
>> proposals based on these benchmarks?
>>
>> 2017-06-06 11:28 GMT+03:00 Vyacheslav Daradur :
>>
>> > Dmitry,
>> >
>> > Excel pages:
>> >
>> > 1) "Compression ratio (2)" - shows object size with and without
>> > compression (conditions: literal text).
>> > The 1st graph shows compression ratios of different compression
>> > algorithms depending on the size of the compressed field.
>> > The 2nd graph shows object sizes depending on field sizes and
>> > compression algorithms.
>> >
>> > 2) "Compression ratio (1)" - shows object size with and without
>> > compression (conditions: a badly compressible character sequence).
>> > The 1st graph shows compression ratios of different compression
>> > algorithms depending on the size of the compressed field.
>> > The 2nd graph shows object sizes depending on field sizes and
>> > compression algorithms.
>> >
>> > 3) "put-avg" - shows the average time of the "put" operation depending
>> > on size and compression algorithm.
>> >
>> > 4) "put-thrpt" - shows the throughput of the "put" operation depending
>> > on size and compression algorithm.
>> >
>> > 5) "get-avg" - shows the average time of the "get" operation depending
>> > on size and compression algorithm.
>> >
>> > 6) "get-thrpt" - shows the throughput of the "get" operation depending
>> > on size and compression algorithm.
>> >
>> >
>> >
>> >
>> > 2017-06-06 10:59 GMT+03:00 Dmitriy Setrakyan :
>> >
>> > > Vladimir, I am not sure how to interpret the graphs? What are we
>> looking
>> > > at?
>> > >
>> > > On Tue, Jun 6, 2017 at 12:33 AM, Vyacheslav Daradur <
>> daradu...@gmail.com
>> > >
>> > > wrote:
>> > >
>> > > > Hi, Igniters.
>> > > >
>> > > > I've prepared some benchmarking. Results [1].
>> > > >
>> > > > And I've prepared the evaluation in the form of diagrams [2].
>> > > >
>> > > > I hope that helps to interest the community and accelerates a
>> reaction
>> > to
>> > > > this improvment :)
>> > > >
>> > > > [1]
>> > > > https://github.com/daradurvs/ignite-compression/tree/
>> > > > master/src/main/resources/result
>> > > > [2] https://drive.google.com/file/d/0B2CeUAOgrHkoMklyZ25YTEdKcEk
>> /view
>> > > >
>> > > >
>> > > >
>> > > > 2017-05-24 9:49 GMT+03:00 Vyacheslav Daradur :
>> > > >
>> > > > > Guys, any thoughts?
>> > > > >
>> > > > > 2017-05-16 13:40 GMT+03:00 Vyacheslav Daradur <
>> daradu...@gmail.com>:
>> > > > >
>> > > > >> Hi guys,
>> > > > >>
>> > > > >> I've prepared the PR to show my idea.
>> > > > >> https://github.com/apache/ignite/pull/1951/files
>> > > > >>
>> > > > >> About querying - I've just copied existing tests and have
>> annotated
>> > > the
>> > > > >> testing data.
>> > > > >> https://github.com/apache/ignite/pull/1951/files#diff-c19a9d
>> > > > >> f4058141d059bb577e75244764
>> > > > >>
>> > > > >> It means fields which will be marked by @BinaryCompression will
>> be
>> > > > >> compressed at marshalling via BinaryMarshaller.
>> > > > >>
>> > > > >> This solution has no effect on existing data or project
>> > architecture.
>> > > > >>
>> > > > >> I'll be glad to see your thougths.
>> > > > >>
>> > > > >>
>> > > > >> 2017-05-15 19:18 GMT+03:00 Vyacheslav Daradur <
>> daradu...@gmail.com
>> > >:
>> > > > >>
>> > > > >>> Dmitriy,
>> > > > >>>
>> > > > >>> I have ready prototype. I want to show it.
>> > > > >>> It is always easier to discuss on example.
>> > > > >>>
>> > > > >>> 2017-05-15 19:02 GMT+03:00 Dmitriy Setrakyan <
>> > dsetrak...@apache.org
>> > > >:
>> > > > >>>
>> > > >  Vyacheslav,
>> > > > 
>> > > >  I think it is a bit premature to provide a PR without getting a
>> > > >  community
>> > > >  consensus on the dev list. Please allow some time for the
>> > community
>> > > to
>> > > >  respond.
>> > > > 
>> > > >  D.
>> > > > 
>> > > >  On Mon, May 15, 2017 at 6:36 AM, Vyacheslav Daradur <
>> > > >  daradu...@gmail.com>
>> > > >  wrote:
>> > > > 
>> > > >  > I created the ticket: 

Re: Data compression in Ignite 2.0

2017-06-06 Thread Vyacheslav Daradur
Conclusion:
The provided solution reduces the size of an object in IgniteCache at the
cost of a throughput reduction (small in some cases); the impact depends on
which part of the object is compressed and on the compression algorithm.
In other words, we can use memory more efficiently, and in some cases reduce
load on the interconnect (replication, rebalancing).

It will be particularly useful for object fields that are large text
(>~ 250 bytes) and compress well.
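The per-field approach proposed earlier in the thread (marking individual fields for compression so the binary object header stays readable) can be illustrated with a small stand-alone sketch. The @BinaryCompression annotation below is a local stand-in mirroring the one in the PR; it is not a released Ignite API:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class CompressionAnnotationSketch {
    // Stand-in for the @BinaryCompression annotation proposed in the PR.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface BinaryCompression {}

    // Example entity: only the large text field is marked for compression;
    // small primitives are left alone, since compressing them is not worthwhile.
    static class Order {
        long id;                    // small primitive: left uncompressed
        @BinaryCompression
        String description;         // large text (>~250 bytes): compressed at marshalling
    }

    // A marshaller could consult the annotation like this when writing each field.
    static boolean isCompressed(Class<?> cls, String fieldName) {
        try {
            return cls.getDeclaredField(fieldName)
                      .isAnnotationPresent(BinaryCompression.class);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isCompressed(Order.class, "description")); // prints "true"
        System.out.println(isCompressed(Order.class, "id"));          // prints "false"
    }
}
```

Because the annotation only affects how field values are serialized, the object header and field metadata remain untouched, which is what keeps querying and indexing compatible.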

2017-06-06 12:00 GMT+03:00 Антон Чураев :

> Vyacheslav, thank you! But could you please provide a conclusions or
> proposals based on this benchmarks?
>
> 2017-06-06 11:28 GMT+03:00 Vyacheslav Daradur :
>
> > Dmitry,
> >
> > Excel-pages:
> >
> > 1). "Compression ratio (2)" - shows object size, with compression and
> > without compression. (Conditions: literal text)
> > 1st graph shows compression ratios of using different compression
> algrithms
> > depending on size of compressed field.
> > 2nd graph shows evaluation of size of objects depending on sizes and
> > compression algorithms.
> >
> > 2). "Compression ratio (1)" - shows object size, with compression and
> > without compression. (Conditions:  badly compressed character sequence)
> > 1st graph shows compression ratios of using different compression
> > algrithms depending on size of compressed field.
> > 2nd graph shows evaluation of size of objects depending on sizes and
> > compression algorithms.
> >
> > 3) 'put-avg" - shows average time of the "put" operation depending on
> size
> > and compression algorithms.
> >
> > 4) 'put-thrpt" - shows throughput of the "put" operation depending on
> size
> > and compression algorithms.
> >
> > 5) 'get-avg" - shows average time of the "get" operation depending on
> size
> > and compression algorithms.
> >
> > 6) 'get-thrpt" - shows throughput of the "get" operation depending on
> size
> > and compression algorithms.
> >
> >
> >
> >
> > 2017-06-06 10:59 GMT+03:00 Dmitriy Setrakyan :
> >
> > > Vladimir, I am not sure how to interpret the graphs? What are we
> looking
> > > at?
> > >
> > > On Tue, Jun 6, 2017 at 12:33 AM, Vyacheslav Daradur <
> daradu...@gmail.com
> > >
> > > wrote:
> > >
> > > > Hi, Igniters.
> > > >
> > > > I've prepared some benchmarking. Results [1].
> > > >
> > > > And I've prepared the evaluation in the form of diagrams [2].
> > > >
> > > > I hope that helps to interest the community and accelerates a
> reaction
> > to
> > > > this improvment :)
> > > >
> > > > [1]
> > > > https://github.com/daradurvs/ignite-compression/tree/
> > > > master/src/main/resources/result
> > > > [2] https://drive.google.com/file/d/0B2CeUAOgrHkoMklyZ25YTEdKcEk/
> view
> > > >
> > > >
> > > >
> > > > 2017-05-24 9:49 GMT+03:00 Vyacheslav Daradur :
> > > >
> > > > > Guys, any thoughts?
> > > > >
> > > > > 2017-05-16 13:40 GMT+03:00 Vyacheslav Daradur  >:
> > > > >
> > > > >> Hi guys,
> > > > >>
> > > > >> I've prepared the PR to show my idea.
> > > > >> https://github.com/apache/ignite/pull/1951/files
> > > > >>
> > > > >> About querying - I've just copied existing tests and have
> annotated
> > > the
> > > > >> testing data.
> > > > >> https://github.com/apache/ignite/pull/1951/files#diff-c19a9d
> > > > >> f4058141d059bb577e75244764
> > > > >>
> > > > >> It means fields which will be marked by @BinaryCompression will be
> > > > >> compressed at marshalling via BinaryMarshaller.
> > > > >>
> > > > >> This solution has no effect on existing data or project
> > architecture.
> > > > >>
> > > > >> I'll be glad to see your thougths.
> > > > >>
> > > > >>
> > > > >> 2017-05-15 19:18 GMT+03:00 Vyacheslav Daradur <
> daradu...@gmail.com
> > >:
> > > > >>
> > > > >>> Dmitriy,
> > > > >>>
> > > > >>> I have ready prototype. I want to show it.
> > > > >>> It is always easier to discuss on example.
> > > > >>>
> > > > >>> 2017-05-15 19:02 GMT+03:00 Dmitriy Setrakyan <
> > dsetrak...@apache.org
> > > >:
> > > > >>>
> > > >  Vyacheslav,
> > > > 
> > > >  I think it is a bit premature to provide a PR without getting a
> > > >  community
> > > >  consensus on the dev list. Please allow some time for the
> > community
> > > to
> > > >  respond.
> > > > 
> > > >  D.
> > > > 
> > > >  On Mon, May 15, 2017 at 6:36 AM, Vyacheslav Daradur <
> > > >  daradu...@gmail.com>
> > > >  wrote:
> > > > 
> > > >  > I created the ticket: https://issues.apache.org/jira
> > > >  /browse/IGNITE-5226
> > > >  >
> > > >  > I'll prepare a PR with described solution in couple of days.
> > > >  >
> > > >  > 2017-05-15 15:05 GMT+03:00 Vyacheslav Daradur <
> > > daradu...@gmail.com
> > > > >:
> > > >  >
> > > >  > > Hi, Igniters!
> > > >  > >
> > > >  > > Apache 2.0 is released.
> > > >  > >
> > > >  > > Let's continue the discussion about a compression design.
> 

Re: Data compression in Ignite 2.0

2017-06-06 Thread Антон Чураев
Vyacheslav, thank you! But could you please provide conclusions or
proposals based on these benchmarks?

2017-06-06 11:28 GMT+03:00 Vyacheslav Daradur :

> Dmitry,
>
> Excel-pages:
>
> 1). "Compression ratio (2)" - shows object size, with compression and
> without compression. (Conditions: literal text)
> 1st graph shows compression ratios of using different compression algrithms
> depending on size of compressed field.
> 2nd graph shows evaluation of size of objects depending on sizes and
> compression algorithms.
>
> 2). "Compression ratio (1)" - shows object size, with compression and
> without compression. (Conditions:  badly compressed character sequence)
> 1st graph shows compression ratios of using different compression
> algrithms depending on size of compressed field.
> 2nd graph shows evaluation of size of objects depending on sizes and
> compression algorithms.
>
> 3) 'put-avg" - shows average time of the "put" operation depending on size
> and compression algorithms.
>
> 4) 'put-thrpt" - shows throughput of the "put" operation depending on size
> and compression algorithms.
>
> 5) 'get-avg" - shows average time of the "get" operation depending on size
> and compression algorithms.
>
> 6) 'get-thrpt" - shows throughput of the "get" operation depending on size
> and compression algorithms.
>
>
>
>
> 2017-06-06 10:59 GMT+03:00 Dmitriy Setrakyan :
>
> > Vladimir, I am not sure how to interpret the graphs? What are we looking
> > at?
> >
> > On Tue, Jun 6, 2017 at 12:33 AM, Vyacheslav Daradur  >
> > wrote:
> >
> > > Hi, Igniters.
> > >
> > > I've prepared some benchmarking. Results [1].
> > >
> > > And I've prepared the evaluation in the form of diagrams [2].
> > >
> > > I hope that helps to interest the community and accelerates a reaction
> to
> > > this improvment :)
> > >
> > > [1]
> > > https://github.com/daradurvs/ignite-compression/tree/
> > > master/src/main/resources/result
> > > [2] https://drive.google.com/file/d/0B2CeUAOgrHkoMklyZ25YTEdKcEk/view
> > >
> > >
> > >
> > > 2017-05-24 9:49 GMT+03:00 Vyacheslav Daradur :
> > >
> > > > Guys, any thoughts?
> > > >
> > > > 2017-05-16 13:40 GMT+03:00 Vyacheslav Daradur :
> > > >
> > > >> Hi guys,
> > > >>
> > > >> I've prepared the PR to show my idea.
> > > >> https://github.com/apache/ignite/pull/1951/files
> > > >>
> > > >> About querying - I've just copied existing tests and have annotated
> > the
> > > >> testing data.
> > > >> https://github.com/apache/ignite/pull/1951/files#diff-c19a9d
> > > >> f4058141d059bb577e75244764
> > > >>
> > > >> It means fields which will be marked by @BinaryCompression will be
> > > >> compressed at marshalling via BinaryMarshaller.
> > > >>
> > > >> This solution has no effect on existing data or project
> architecture.
> > > >>
> > > >> I'll be glad to see your thougths.
> > > >>
> > > >>
> > > >> 2017-05-15 19:18 GMT+03:00 Vyacheslav Daradur  >:
> > > >>
> > > >>> Dmitriy,
> > > >>>
> > > >>> I have ready prototype. I want to show it.
> > > >>> It is always easier to discuss on example.
> > > >>>
> > > >>> 2017-05-15 19:02 GMT+03:00 Dmitriy Setrakyan <
> dsetrak...@apache.org
> > >:
> > > >>>
> > >  Vyacheslav,
> > > 
> > >  I think it is a bit premature to provide a PR without getting a
> > >  community
> > >  consensus on the dev list. Please allow some time for the
> community
> > to
> > >  respond.
> > > 
> > >  D.
> > > 
> > >  On Mon, May 15, 2017 at 6:36 AM, Vyacheslav Daradur <
> > >  daradu...@gmail.com>
> > >  wrote:
> > > 
> > >  > I created the ticket: https://issues.apache.org/jira
> > >  /browse/IGNITE-5226
> > >  >
> > >  > I'll prepare a PR with described solution in couple of days.
> > >  >
> > >  > 2017-05-15 15:05 GMT+03:00 Vyacheslav Daradur <
> > daradu...@gmail.com
> > > >:
> > >  >
> > >  > > Hi, Igniters!
> > >  > >
> > >  > > Apache 2.0 is released.
> > >  > >
> > >  > > Let's continue the discussion about a compression design.
> > >  > >
> > >  > > At the moment, I found only one solution which is compatible
> > with
> > >  > querying
> > >  > > and indexing, this is per-objects-field compression.
> > >  > > Per-fields compression means that metadata (a header) of an
> > object
> > >  won't
> > >  > > be compressed, only serialized values of an object fields (in
> > > bytes
> > >  array
> > >  > > form) will be compressed.
> > >  > >
> > >  > > This solution have some contentious issues:
> > >  > > - small values, like primitives and short arrays - there isn't
> > >  sense to
> > >  > > compress them;
> > >  > > - there is no possible to use compression with java-predefined
> > >  types;
> > >  > >
> > >  > > We can provide an annotation, @IgniteCompression - for
> example,
> > >  

Re: Data compression in Ignite 2.0

2017-06-06 Thread Vyacheslav Daradur
All metrics were taken from an application based on a custom build of
Apache Ignite containing the provided PR.
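The "avg" and "thrpt" sheets correspond to the two usual benchmark modes: average time per operation and operations per second. A naive, non-JMH illustration of how the two metrics relate (the workload here is a stand-in for a cache put, not the actual benchmark code):

```java
import java.util.zip.CRC32;

public class ThroughputSketch {
    // Toy stand-in for a cache "put"; serialization plus optional
    // compression would happen here in the real benchmark.
    static void putLike(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload);
    }

    // Returns { average ns per op ("put-avg" style), ops per second ("put-thrpt" style) }.
    static double[] measure(int iterations, byte[] payload) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++)
            putLike(payload);
        long elapsed = System.nanoTime() - start;
        double avgNs = (double) elapsed / iterations;
        double opsPerSec = iterations / (elapsed / 1e9);
        return new double[] { avgNs, opsPerSec };
    }

    public static void main(String[] args) {
        double[] m = measure(100_000, new byte[1024]);
        System.out.printf("avg: %.1f ns/op, throughput: %.0f ops/s%n", m[0], m[1]);
    }
}
```

The two numbers are reciprocals of each other in a single-threaded run; JMH reports them separately because they diverge under concurrency.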

2017-06-06 11:28 GMT+03:00 Vyacheslav Daradur :

> Dmitry,
>
> Excel-pages:
>
> 1). "Compression ratio (2)" - shows object size, with compression and
> without compression. (Conditions: literal text)
> 1st graph shows compression ratios of using different compression
> algrithms depending on size of compressed field.
> 2nd graph shows evaluation of size of objects depending on sizes and
> compression algorithms.
>
> 2). "Compression ratio (1)" - shows object size, with compression and
> without compression. (Conditions:  badly compressed character sequence)
> 1st graph shows compression ratios of using different compression
> algrithms depending on size of compressed field.
> 2nd graph shows evaluation of size of objects depending on sizes and
> compression algorithms.
>
> 3) 'put-avg" - shows average time of the "put" operation depending on size
> and compression algorithms.
>
> 4) 'put-thrpt" - shows throughput of the "put" operation depending on size
> and compression algorithms.
>
> 5) 'get-avg" - shows average time of the "get" operation depending on size
> and compression algorithms.
>
> 6) 'get-thrpt" - shows throughput of the "get" operation depending on size
> and compression algorithms.
>
>
>
>
> 2017-06-06 10:59 GMT+03:00 Dmitriy Setrakyan :
>
>> Vladimir, I am not sure how to interpret the graphs? What are we looking
>> at?
>>
>> On Tue, Jun 6, 2017 at 12:33 AM, Vyacheslav Daradur 
>> wrote:
>>
>> > Hi, Igniters.
>> >
>> > I've prepared some benchmarking. Results [1].
>> >
>> > And I've prepared the evaluation in the form of diagrams [2].
>> >
>> > I hope that helps to interest the community and accelerates a reaction
>> to
>> > this improvment :)
>> >
>> > [1]
>> > https://github.com/daradurvs/ignite-compression/tree/
>> > master/src/main/resources/result
>> > [2] https://drive.google.com/file/d/0B2CeUAOgrHkoMklyZ25YTEdKcEk/view
>> >
>> >
>> >
>> > 2017-05-24 9:49 GMT+03:00 Vyacheslav Daradur :
>> >
>> > > Guys, any thoughts?
>> > >
>> > > 2017-05-16 13:40 GMT+03:00 Vyacheslav Daradur :
>> > >
>> > >> Hi guys,
>> > >>
>> > >> I've prepared the PR to show my idea.
>> > >> https://github.com/apache/ignite/pull/1951/files
>> > >>
>> > >> About querying - I've just copied existing tests and have annotated
>> the
>> > >> testing data.
>> > >> https://github.com/apache/ignite/pull/1951/files#diff-c19a9d
>> > >> f4058141d059bb577e75244764
>> > >>
>> > >> It means fields which will be marked by @BinaryCompression will be
>> > >> compressed at marshalling via BinaryMarshaller.
>> > >>
>> > >> This solution has no effect on existing data or project architecture.
>> > >>
>> > >> I'll be glad to see your thougths.
>> > >>
>> > >>
>> > >> 2017-05-15 19:18 GMT+03:00 Vyacheslav Daradur :
>> > >>
>> > >>> Dmitriy,
>> > >>>
>> > >>> I have ready prototype. I want to show it.
>> > >>> It is always easier to discuss on example.
>> > >>>
>> > >>> 2017-05-15 19:02 GMT+03:00 Dmitriy Setrakyan > >:
>> > >>>
>> >  Vyacheslav,
>> > 
>> >  I think it is a bit premature to provide a PR without getting a
>> >  community
>> >  consensus on the dev list. Please allow some time for the
>> community to
>> >  respond.
>> > 
>> >  D.
>> > 
>> >  On Mon, May 15, 2017 at 6:36 AM, Vyacheslav Daradur <
>> >  daradu...@gmail.com>
>> >  wrote:
>> > 
>> >  > I created the ticket: https://issues.apache.org/jira
>> >  /browse/IGNITE-5226
>> >  >
>> >  > I'll prepare a PR with described solution in couple of days.
>> >  >
>> >  > 2017-05-15 15:05 GMT+03:00 Vyacheslav Daradur <
>> daradu...@gmail.com
>> > >:
>> >  >
>> >  > > Hi, Igniters!
>> >  > >
>> >  > > Apache 2.0 is released.
>> >  > >
>> >  > > Let's continue the discussion about a compression design.
>> >  > >
>> >  > > At the moment, I found only one solution which is compatible
>> with
>> >  > querying
>> >  > > and indexing, this is per-objects-field compression.
>> >  > > Per-fields compression means that metadata (a header) of an
>> object
>> >  won't
>> >  > > be compressed, only serialized values of an object fields (in
>> > bytes
>> >  array
>> >  > > form) will be compressed.
>> >  > >
>> >  > > This solution have some contentious issues:
>> >  > > - small values, like primitives and short arrays - there isn't
>> >  sense to
>> >  > > compress them;
>> >  > > - there is no possible to use compression with java-predefined
>> >  types;
>> >  > >
>> >  > > We can provide an annotation, @IgniteCompression - for example,
>> >  which can
>> >  > > be used by users for marking fields to compress.
>> >  > >
>> >  > > Any thoughts?
>> >  > >
>> > 

Re: Data compression in Ignite 2.0

2017-06-06 Thread Vyacheslav Daradur
Dmitry,

Excel sheets:

1) "Compression ratio (2)" - shows object size with and without
compression (conditions: literal text).
The 1st graph shows the compression ratio of each compression algorithm
depending on the size of the compressed field.
The 2nd graph shows the resulting object sizes depending on field size and
compression algorithm.

2) "Compression ratio (1)" - shows object size with and without
compression (conditions: poorly compressible character sequence).
The 1st graph shows the compression ratio of each compression algorithm
depending on the size of the compressed field.
The 2nd graph shows the resulting object sizes depending on field size and
compression algorithm.

3) "put-avg" - shows the average time of the "put" operation depending on
size and compression algorithm.

4) "put-thrpt" - shows the throughput of the "put" operation depending on
size and compression algorithm.

5) "get-avg" - shows the average time of the "get" operation depending on
size and compression algorithm.

6) "get-thrpt" - shows the throughput of the "get" operation depending on
size and compression algorithm.
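The two "Compression ratio" sheets can be reproduced in miniature with the JDK's DEFLATE implementation: repetitive literal text shrinks dramatically, while a poorly compressible (random) byte sequence may even grow slightly. The sizes below are illustrative, not the benchmark's data:

```java
import java.util.Arrays;
import java.util.Random;
import java.util.zip.Deflater;

public class RatioSketch {
    // Compress with DEFLATE and return the compressed length in bytes.
    static int compressedSize(byte[] data) {
        Deflater def = new Deflater();
        def.setInput(data);
        def.finish();
        byte[] buf = new byte[512];
        int total = 0;
        while (!def.finished())
            total += def.deflate(buf);
        def.end();
        return total;
    }

    public static void main(String[] args) {
        // "Literal text" case: highly repetitive, compresses well.
        byte[] text = new byte[1024];
        Arrays.fill(text, (byte) 'a');

        // "Badly compressed character sequence" case: random bytes.
        byte[] random = new byte[1024];
        new Random(42).nextBytes(random);

        System.out.println("text:   " + text.length + " -> " + compressedSize(text));
        System.out.println("random: " + random.length + " -> " + compressedSize(random));
    }
}
```

This is why the thread suggests compression only pays off for large, text-like fields: for incompressible or tiny values the codec overhead dominates.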




2017-06-06 10:59 GMT+03:00 Dmitriy Setrakyan :

> Vladimir, I am not sure how to interpret the graphs? What are we looking
> at?
>
> On Tue, Jun 6, 2017 at 12:33 AM, Vyacheslav Daradur 
> wrote:
>
> > Hi, Igniters.
> >
> > I've prepared some benchmarking. Results [1].
> >
> > And I've prepared the evaluation in the form of diagrams [2].
> >
> > I hope that helps to interest the community and accelerates a reaction to
> > this improvment :)
> >
> > [1]
> > https://github.com/daradurvs/ignite-compression/tree/
> > master/src/main/resources/result
> > [2] https://drive.google.com/file/d/0B2CeUAOgrHkoMklyZ25YTEdKcEk/view
> >
> >
> >
> > 2017-05-24 9:49 GMT+03:00 Vyacheslav Daradur :
> >
> > > Guys, any thoughts?
> > >
> > > 2017-05-16 13:40 GMT+03:00 Vyacheslav Daradur :
> > >
> > >> Hi guys,
> > >>
> > >> I've prepared the PR to show my idea.
> > >> https://github.com/apache/ignite/pull/1951/files
> > >>
> > >> About querying - I've just copied existing tests and have annotated
> the
> > >> testing data.
> > >> https://github.com/apache/ignite/pull/1951/files#diff-c19a9d
> > >> f4058141d059bb577e75244764
> > >>
> > >> It means fields which will be marked by @BinaryCompression will be
> > >> compressed at marshalling via BinaryMarshaller.
> > >>
> > >> This solution has no effect on existing data or project architecture.
> > >>
> > >> I'll be glad to see your thougths.
> > >>
> > >>
> > >> 2017-05-15 19:18 GMT+03:00 Vyacheslav Daradur :
> > >>
> > >>> Dmitriy,
> > >>>
> > >>> I have ready prototype. I want to show it.
> > >>> It is always easier to discuss on example.
> > >>>
> > >>> 2017-05-15 19:02 GMT+03:00 Dmitriy Setrakyan  >:
> > >>>
> >  Vyacheslav,
> > 
> >  I think it is a bit premature to provide a PR without getting a
> >  community
> >  consensus on the dev list. Please allow some time for the community
> to
> >  respond.
> > 
> >  D.
> > 
> >  On Mon, May 15, 2017 at 6:36 AM, Vyacheslav Daradur <
> >  daradu...@gmail.com>
> >  wrote:
> > 
> >  > I created the ticket: https://issues.apache.org/jira
> >  /browse/IGNITE-5226
> >  >
> >  > I'll prepare a PR with described solution in couple of days.
> >  >
> >  > 2017-05-15 15:05 GMT+03:00 Vyacheslav Daradur <
> daradu...@gmail.com
> > >:
> >  >
> >  > > Hi, Igniters!
> >  > >
> >  > > Apache 2.0 is released.
> >  > >
> >  > > Let's continue the discussion about a compression design.
> >  > >
> >  > > At the moment, I found only one solution which is compatible
> with
> >  > querying
> >  > > and indexing, this is per-objects-field compression.
> >  > > Per-fields compression means that metadata (a header) of an
> object
> >  won't
> >  > > be compressed, only serialized values of an object fields (in
> > bytes
> >  array
> >  > > form) will be compressed.
> >  > >
> >  > > This solution have some contentious issues:
> >  > > - small values, like primitives and short arrays - there isn't
> >  sense to
> >  > > compress them;
> >  > > - there is no possible to use compression with java-predefined
> >  types;
> >  > >
> >  > > We can provide an annotation, @IgniteCompression - for example,
> >  which can
> >  > > be used by users for marking fields to compress.
> >  > >
> >  > > Any thoughts?
> >  > >
> >  > > Maybe someone already have ready design?
> >  > >
> >  > > 2017-04-10 11:06 GMT+03:00 Vyacheslav Daradur <
> > daradu...@gmail.com
> >  >:
> >  > >
> >  > >> Alexey,
> >  > >>
> >  > >> Yes, I've read it.
> >  > >>
> >  > >> Ok, let's discuss about public API design.
> >  > >>
> >  > 

Re: Data compression in Ignite 2.0

2017-06-06 Thread Dmitriy Setrakyan
Vladimir, I am not sure how to interpret the graphs. What are we looking at?

On Tue, Jun 6, 2017 at 12:33 AM, Vyacheslav Daradur 
wrote:

> Hi, Igniters.
>
> I've prepared some benchmarking. Results [1].
>
> And I've prepared the evaluation in the form of diagrams [2].
>
> I hope that helps to interest the community and accelerates a reaction to
> this improvment :)
>
> [1]
> https://github.com/daradurvs/ignite-compression/tree/
> master/src/main/resources/result
> [2] https://drive.google.com/file/d/0B2CeUAOgrHkoMklyZ25YTEdKcEk/view
>
>
>
> 2017-05-24 9:49 GMT+03:00 Vyacheslav Daradur :
>
> > Guys, any thoughts?
> >
> > 2017-05-16 13:40 GMT+03:00 Vyacheslav Daradur :
> >
> >> Hi guys,
> >>
> >> I've prepared the PR to show my idea.
> >> https://github.com/apache/ignite/pull/1951/files
> >>
> >> About querying - I've just copied existing tests and have annotated the
> >> testing data.
> >> https://github.com/apache/ignite/pull/1951/files#diff-c19a9d
> >> f4058141d059bb577e75244764
> >>
> >> It means fields which will be marked by @BinaryCompression will be
> >> compressed at marshalling via BinaryMarshaller.
> >>
> >> This solution has no effect on existing data or project architecture.
> >>
> >> I'll be glad to see your thougths.
> >>
> >>
> >> 2017-05-15 19:18 GMT+03:00 Vyacheslav Daradur :
> >>
> >>> Dmitriy,
> >>>
> >>> I have ready prototype. I want to show it.
> >>> It is always easier to discuss on example.
> >>>
> >>> 2017-05-15 19:02 GMT+03:00 Dmitriy Setrakyan :
> >>>
>  Vyacheslav,
> 
>  I think it is a bit premature to provide a PR without getting a
>  community
>  consensus on the dev list. Please allow some time for the community to
>  respond.
> 
>  D.
> 
>  On Mon, May 15, 2017 at 6:36 AM, Vyacheslav Daradur <
>  daradu...@gmail.com>
>  wrote:
> 
>  > I created the ticket: https://issues.apache.org/jira
>  /browse/IGNITE-5226
>  >
>  > I'll prepare a PR with described solution in couple of days.
>  >
>  > 2017-05-15 15:05 GMT+03:00 Vyacheslav Daradur  >:
>  >
>  > > Hi, Igniters!
>  > >
>  > > Apache 2.0 is released.
>  > >
>  > > Let's continue the discussion about a compression design.
>  > >
>  > > At the moment, I found only one solution which is compatible with
>  > querying
>  > > and indexing, this is per-objects-field compression.
>  > > Per-fields compression means that metadata (a header) of an object
>  won't
>  > > be compressed, only serialized values of an object fields (in
> bytes
>  array
>  > > form) will be compressed.
>  > >
>  > > This solution have some contentious issues:
>  > > - small values, like primitives and short arrays - there isn't
>  sense to
>  > > compress them;
>  > > - there is no possible to use compression with java-predefined
>  types;
>  > >
>  > > We can provide an annotation, @IgniteCompression - for example,
>  which can
>  > > be used by users for marking fields to compress.
>  > >
>  > > Any thoughts?
>  > >
>  > > Maybe someone already have ready design?
>  > >
>  > > 2017-04-10 11:06 GMT+03:00 Vyacheslav Daradur <
> daradu...@gmail.com
>  >:
>  > >
>  > >> Alexey,
>  > >>
>  > >> Yes, I've read it.
>  > >>
>  > >> Ok, let's discuss about public API design.
>  > >>
>  > >> I think we need to add some a configure entity to
>  CacheConfiguration,
>  > >> which will contain the Compressor interface implementation and
> some
>  > usefull
>  > >> parameters.
>  > >> Or maybe to provide a BinaryMarshaller decorator, which will be
>  compress
>  > >> data after marshalling.
>  > >>
>  > >>
>  > >> 2017-04-10 10:40 GMT+03:00 Alexey Kuznetsov <
> akuznet...@apache.org
>  >:
>  > >>
>  > >>> Vyacheslav,
>  > >>>
>  > >>> Did you read initial discussion [1] about compression?
>  > >>> As far as I remember we agreed to add only some "top-level" API
> in
>  > order
>  > >>> to
>  > >>> provide a way for
>  > >>> Ignite users to inject some sort of custom compression.
>  > >>>
>  > >>>
>  > >>> [1]
>  > >>> http://apache-ignite-developers.2346864.n4.nabble.com/Data-c
>  > >>> ompression-in-Ignite-2-0-td10099.html
>  > >>>
>  > >>> On Mon, Apr 10, 2017 at 2:19 PM, daradurvs  >
>  > wrote:
>  > >>>
>  > >>> > Hi Igniters!
>  > >>> >
>  > >>> > I am interested in this task.
>  > >>> > Provide some kind of pluggable compression SPI support
>  > >>> > 
>  > >>> >
>  > >>> > I developed a solution on BinaryMarshaller-level, but reviewer
>  has
>  > >>> rejected
>  > >>> > it.

Re: Data compression in Ignite 2.0

2017-06-06 Thread Vyacheslav Daradur
Hi, Igniters.

I've prepared some benchmarks. Results [1].

I've also prepared an evaluation in the form of diagrams [2].

I hope this helps to interest the community and accelerates a reaction to
this improvement :)

[1]
https://github.com/daradurvs/ignite-compression/tree/master/src/main/resources/result
[2] https://drive.google.com/file/d/0B2CeUAOgrHkoMklyZ25YTEdKcEk/view



2017-05-24 9:49 GMT+03:00 Vyacheslav Daradur :

> Guys, any thoughts?
>
> 2017-05-16 13:40 GMT+03:00 Vyacheslav Daradur :
>
>> Hi guys,
>>
>> I've prepared the PR to show my idea.
>> https://github.com/apache/ignite/pull/1951/files
>>
>> About querying - I've just copied existing tests and have annotated the
>> testing data.
>> https://github.com/apache/ignite/pull/1951/files#diff-c19a9d
>> f4058141d059bb577e75244764
>>
>> It means fields which will be marked by @BinaryCompression will be
>> compressed at marshalling via BinaryMarshaller.
>>
>> This solution has no effect on existing data or project architecture.
>>
>> I'll be glad to see your thougths.
>>
>>
>> 2017-05-15 19:18 GMT+03:00 Vyacheslav Daradur :
>>
>>> Dmitriy,
>>>
>>> I have ready prototype. I want to show it.
>>> It is always easier to discuss on example.
>>>
>>> 2017-05-15 19:02 GMT+03:00 Dmitriy Setrakyan :
>>>
 Vyacheslav,

 I think it is a bit premature to provide a PR without getting a
 community
 consensus on the dev list. Please allow some time for the community to
 respond.

 D.

 On Mon, May 15, 2017 at 6:36 AM, Vyacheslav Daradur <
 daradu...@gmail.com>
 wrote:

 > I created the ticket: https://issues.apache.org/jira
 /browse/IGNITE-5226
 >
 > I'll prepare a PR with described solution in couple of days.
 >
 > 2017-05-15 15:05 GMT+03:00 Vyacheslav Daradur :
 >
 > > Hi, Igniters!
 > >
 > > Apache 2.0 is released.
 > >
 > > Let's continue the discussion about a compression design.
 > >
 > > At the moment, I found only one solution which is compatible with
 > querying
 > > and indexing, this is per-objects-field compression.
 > > Per-fields compression means that metadata (a header) of an object
 won't
 > > be compressed, only serialized values of an object fields (in bytes
 array
 > > form) will be compressed.
 > >
 > > This solution have some contentious issues:
 > > - small values, like primitives and short arrays - there isn't
 sense to
 > > compress them;
 > > - there is no possible to use compression with java-predefined
 types;
 > >
 > > We can provide an annotation, @IgniteCompression - for example,
 which can
 > > be used by users for marking fields to compress.
 > >
 > > Any thoughts?
 > >
 > > Maybe someone already have ready design?
 > >
 > > 2017-04-10 11:06 GMT+03:00 Vyacheslav Daradur :
 > >
 > >> Alexey,
 > >>
 > >> Yes, I've read it.
 > >>
 > >> Ok, let's discuss about public API design.
 > >>
 > >> I think we need to add some a configure entity to
 CacheConfiguration,
 > >> which will contain the Compressor interface implementation and some
 > usefull
 > >> parameters.
 > >> Or maybe to provide a BinaryMarshaller decorator, which will be
 compress
 > >> data after marshalling.
 > >>
 > >>
 > >> 2017-04-10 10:40 GMT+03:00 Alexey Kuznetsov :
 > >>
 > >>> Vyacheslav,
 > >>>
 > >>> Did you read initial discussion [1] about compression?
 > >>> As far as I remember we agreed to add only some "top-level" API in
 > order
 > >>> to
 > >>> provide a way for
 > >>> Ignite users to inject some sort of custom compression.
 > >>>
 > >>>
 > >>> [1]
 > >>> http://apache-ignite-developers.2346864.n4.nabble.com/Data-c
 > >>> ompression-in-Ignite-2-0-td10099.html
 > >>>
 > >>> On Mon, Apr 10, 2017 at 2:19 PM, daradurvs 
 > wrote:
 > >>>
 > >>> > Hi Igniters!
 > >>> >
 > >>> > I am interested in this task.
 > >>> > Provide some kind of pluggable compression SPI support
 > >>> > 
 > >>> >
 > >>> > I developed a solution on BinaryMarshaller-level, but reviewer
 has
 > >>> rejected
 > >>> > it.
 > >>> >
 > >>> > Let's continue discussion of task goals and solution design.
 > >>> > As I understood that, the main goal of this task is to store
 data in
 > >>> > compressed form.
 > >>> > This is what I need from Ignite as its user. Compression
 provides
 > >>> economy
 > >>> > on
 > >>> > servers.
 > >>> > We can store more data on same servers at the cost of
 increasing CPU
 > >>> > utilization.
 > 
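The pluggable-compression idea quoted above (a Compressor interface supplied via configuration) could be sketched as a small SPI plus a JDK-based implementation. All names here are illustrative assumptions, not the actual Ignite API:

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class CompressorSketch {
    // Hypothetical pluggable SPI, as discussed in the thread.
    interface Compressor {
        byte[] compress(byte[] input);
        byte[] decompress(byte[] input, int originalLength);
    }

    // A DEFLATE-based implementation using only the JDK.
    static class DeflateCompressor implements Compressor {
        @Override public byte[] compress(byte[] input) {
            Deflater def = new Deflater();
            def.setInput(input);
            def.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[512];
            while (!def.finished())
                out.write(buf, 0, def.deflate(buf));
            def.end();
            return out.toByteArray();
        }

        @Override public byte[] decompress(byte[] input, int originalLength) {
            try {
                Inflater inf = new Inflater();
                inf.setInput(input);
                byte[] out = new byte[originalLength];
                int n = inf.inflate(out);
                inf.end();
                if (n != originalLength)
                    throw new IllegalStateException("Unexpected length: " + n);
                return out;
            } catch (DataFormatException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        Compressor c = new DeflateCompressor();
        byte[] original = "Per-field compression keeps the binary header intact".getBytes();
        byte[] unpacked = c.decompress(c.compress(original), original.length);
        System.out.println(Arrays.equals(original, unpacked)); // prints "true"
    }
}
```

A marshaller decorator could call such a Compressor after serialization, which is one of the two integration points proposed in the quoted message.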

Re: Data compression in Ignite 2.0

2017-05-24 Thread Vyacheslav Daradur
Guys, any thoughts?

2017-05-16 13:40 GMT+03:00 Vyacheslav Daradur :

> Hi guys,
>
> I've prepared the PR to show my idea.
> https://github.com/apache/ignite/pull/1951/files
>
> About querying - I've just copied existing tests and have annotated the
> testing data.
> https://github.com/apache/ignite/pull/1951/files#diff-
> c19a9df4058141d059bb577e75244764
>
> It means fields which will be marked by @BinaryCompression will be
> compressed at marshalling via BinaryMarshaller.
>
> This solution has no effect on existing data or project architecture.
>
> I'll be glad to see your thougths.
>
>
> 2017-05-15 19:18 GMT+03:00 Vyacheslav Daradur :
>
>> Dmitriy,
>>
>> I have ready prototype. I want to show it.
>> It is always easier to discuss on example.
>>
>> 2017-05-15 19:02 GMT+03:00 Dmitriy Setrakyan :
>>
>>> Vyacheslav,
>>>
>>> I think it is a bit premature to provide a PR without getting a community
>>> consensus on the dev list. Please allow some time for the community to
>>> respond.
>>>
>>> D.
>>>
>>> On Mon, May 15, 2017 at 6:36 AM, Vyacheslav Daradur >> >
>>> wrote:
>>>
>>> > I created the ticket: https://issues.apache.org/jira
>>> /browse/IGNITE-5226
>>> >
>>> > I'll prepare a PR with described solution in couple of days.
>>> >
>>> > 2017-05-15 15:05 GMT+03:00 Vyacheslav Daradur :
>>> >
>>> > > Hi, Igniters!
>>> > >
>>> > > Apache 2.0 is released.
>>> > >
>>> > > Let's continue the discussion about a compression design.
>>> > >
>>> > > At the moment, I found only one solution which is compatible with
>>> > querying
>>> > > and indexing, this is per-objects-field compression.
>>> > > Per-fields compression means that metadata (a header) of an object
>>> won't
>>> > > be compressed, only serialized values of an object fields (in bytes
>>> array
>>> > > form) will be compressed.
>>> > >
>>> > > This solution have some contentious issues:
>>> > > - small values, like primitives and short arrays - there isn't sense
>>> to
>>> > > compress them;
>>> > > - there is no possible to use compression with java-predefined types;
>>> > >
>>> > > We can provide an annotation, @IgniteCompression - for example,
>>> which can
>>> > > be used by users for marking fields to compress.
>>> > >
>>> > > Any thoughts?
>>> > >
>>> > > Maybe someone already have ready design?
>>> > >
>>> > > 2017-04-10 11:06 GMT+03:00 Vyacheslav Daradur :
>>> > >
>>> > >> Alexey,
>>> > >>
>>> > >> Yes, I've read it.
>>> > >>
>>> > >> Ok, let's discuss about public API design.
>>> > >>
>>> > >> I think we need to add some a configure entity to
>>> CacheConfiguration,
>>> > >> which will contain the Compressor interface implementation and some
>>> > usefull
>>> > >> parameters.
>>> > >> Or maybe to provide a BinaryMarshaller decorator, which will be
>>> compress
>>> > >> data after marshalling.
>>> > >>
>>> > >>
>>> > >> 2017-04-10 10:40 GMT+03:00 Alexey Kuznetsov >> >:
>>> > >>
>>> > >>> Vyacheslav,
>>> > >>>
>>> > >>> Did you read initial discussion [1] about compression?
>>> > >>> As far as I remember we agreed to add only some "top-level" API in
>>> > order
>>> > >>> to
>>> > >>> provide a way for
>>> > >>> Ignite users to inject some sort of custom compression.
>>> > >>>
>>> > >>>
>>> > >>> [1]
>>> > >>> http://apache-ignite-developers.2346864.n4.nabble.com/Data-c
>>> > >>> ompression-in-Ignite-2-0-td10099.html
>>> > >>>
>>> > >>> On Mon, Apr 10, 2017 at 2:19 PM, daradurvs 
>>> > wrote:
>>> > >>>
>>> > >>> > Hi Igniters!
>>> > >>> >
>>> > >>> > I am interested in this task.
>>> > >>> > Provide some kind of pluggable compression SPI support
>>> > >>> > 
>>> > >>> >
>>> > >>> > I developed a solution on BinaryMarshaller-level, but reviewer
>>> has
>>> > >>> rejected
>>> > >>> > it.
>>> > >>> >
>>> > >>> > Let's continue discussion of task goals and solution design.
>>> > >>> > As I understood that, the main goal of this task is to store
>>> data in
>>> > >>> > compressed form.
>>> > >>> > This is what I need from Ignite as its user. Compression provides
>>> > >>> economy
>>> > >>> > on
>>> > >>> > servers.
>>> > >>> > We can store more data on same servers at the cost of increasing
>>> CPU
>>> > >>> > utilization.
>>> > >>> >
>>> > >>> > I'm researching a possibility of implementation of compression
>>> at the
>>> > >>> > cache-level.
>>> > >>> >
>>> > >>> > Any thoughts?
>>> > >>> >
>>> > >>> > --
>>> > >>> > Best regards,
>>> > >>> > Vyacheslav
>>> > >>> >
>>> > >>> >
>>> > >>> >
>>> > >>> >
>>> > >>> > --
>>> > >>> > View this message in context: http://apache-ignite-
>>> > >>> > developers.2346864.n4.nabble.com/Data-compression-in-
>>> > >>> > Ignite-2-0-tp10099p16317.html
>>> > >>> > Sent from the Apache Ignite Developers mailing list archive at
>>> > >>> Nabble.com.
>>> > >>> >
>>> > >>>
>>> > >>>
>>> > >>>
>>> > >>> 

Re: Data compression in Ignite 2.0

2017-05-16 Thread Vyacheslav Daradur
Hi guys,

I've prepared the PR to show my idea.
https://github.com/apache/ignite/pull/1951/files

Regarding querying, I've just copied the existing tests and annotated the
test data.
https://github.com/apache/ignite/pull/1951/files#diff-c19a9df4058141d059bb577e75244764

Fields marked with @BinaryCompression will be compressed during
marshalling via BinaryMarshaller.

This solution has no effect on existing data or project architecture.

I'll be glad to hear your thoughts.
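To make the idea easier to picture, here is a minimal, self-contained sketch of annotation-driven per-field compression. Only the annotation name echoes @BinaryCompression from the PR; the value class, the helper methods, and the use of plain java.util.zip are illustrative assumptions, not Ignite internals.

```java
import java.io.ByteArrayOutputStream;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class Main {
    // Field-level marker mirroring the @BinaryCompression annotation from the
    // PR; the name and retention here are illustrative, not an Ignite API.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface BinaryCompression { }

    // Hypothetical value class: only the large text field is marked, so the
    // object header and small fields stay untouched and remain indexable.
    static class Person {
        int id;
        @BinaryCompression
        String biography;
    }

    // Deflate the serialized bytes of a single marked field.
    static byte[] compress(byte[] src) {
        Deflater def = new Deflater();
        def.setInput(src);
        def.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!def.finished())
            out.write(buf, 0, def.deflate(buf));
        def.end();
        return out.toByteArray();
    }

    // Inverse operation, applied when the field is read back.
    static byte[] decompress(byte[] src) {
        try {
            Inflater inf = new Inflater();
            inf.setInput(src);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            while (!inf.finished())
                out.write(buf, 0, inf.inflate(buf));
            inf.end();
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        Person p = new Person();
        p.biography = new String(new char[500]).replace('\0', 'x');
        // A marshaller would check the annotation via reflection:
        boolean marked = Person.class.getDeclaredField("biography")
            .isAnnotationPresent(BinaryCompression.class);
        byte[] raw = p.biography.getBytes();
        byte[] stored = marked ? compress(raw) : raw;
        System.out.println(raw.length + " -> " + stored.length);
        if (!Arrays.equals(decompress(stored), raw))
            throw new AssertionError("round-trip failed");
    }
}
```

Because only field payloads are rewritten, the binary object header keeps its layout, which is what keeps the approach compatible with querying and indexing.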


2017-05-15 19:18 GMT+03:00 Vyacheslav Daradur :

> Dmitriy,
>
> I have ready prototype. I want to show it.
> It is always easier to discuss on example.
>
> 2017-05-15 19:02 GMT+03:00 Dmitriy Setrakyan :
>
>> Vyacheslav,
>>
>> I think it is a bit premature to provide a PR without getting a community
>> consensus on the dev list. Please allow some time for the community to
>> respond.
>>
>> D.
>>
>> On Mon, May 15, 2017 at 6:36 AM, Vyacheslav Daradur 
>> wrote:
>>
>> > I created the ticket: https://issues.apache.org/jira/browse/IGNITE-5226
>> >
>> > I'll prepare a PR with described solution in couple of days.
>> >
>> > 2017-05-15 15:05 GMT+03:00 Vyacheslav Daradur :
>> >
>> > > Hi, Igniters!
>> > >
>> > > Apache 2.0 is released.
>> > >
>> > > Let's continue the discussion about a compression design.
>> > >
>> > > At the moment, I found only one solution which is compatible with
>> > querying
>> > > and indexing, this is per-objects-field compression.
>> > > Per-fields compression means that metadata (a header) of an object
>> won't
>> > > be compressed, only serialized values of an object fields (in bytes
>> array
>> > > form) will be compressed.
>> > >
>> > > This solution have some contentious issues:
>> > > - small values, like primitives and short arrays - there isn't sense
>> to
>> > > compress them;
>> > > - there is no possible to use compression with java-predefined types;
>> > >
>> > > We can provide an annotation, @IgniteCompression - for example, which
>> can
>> > > be used by users for marking fields to compress.
>> > >
>> > > Any thoughts?
>> > >
>> > > Maybe someone already have ready design?
>> > >
>> > > 2017-04-10 11:06 GMT+03:00 Vyacheslav Daradur :
>> > >
>> > >> Alexey,
>> > >>
>> > >> Yes, I've read it.
>> > >>
>> > >> Ok, let's discuss about public API design.
>> > >>
>> > >> I think we need to add some a configure entity to CacheConfiguration,
>> > >> which will contain the Compressor interface implementation and some
>> > usefull
>> > >> parameters.
>> > >> Or maybe to provide a BinaryMarshaller decorator, which will be
>> compress
>> > >> data after marshalling.
>> > >>
>> > >>
>> > >> 2017-04-10 10:40 GMT+03:00 Alexey Kuznetsov :
>> > >>
>> > >>> Vyacheslav,
>> > >>>
>> > >>> Did you read initial discussion [1] about compression?
>> > >>> As far as I remember we agreed to add only some "top-level" API in
>> > order
>> > >>> to
>> > >>> provide a way for
>> > >>> Ignite users to inject some sort of custom compression.
>> > >>>
>> > >>>
>> > >>> [1]
>> > >>> http://apache-ignite-developers.2346864.n4.nabble.com/Data-c
>> > >>> ompression-in-Ignite-2-0-td10099.html
>> > >>>
>> > >>> On Mon, Apr 10, 2017 at 2:19 PM, daradurvs 
>> > wrote:
>> > >>>
>> > >>> > Hi Igniters!
>> > >>> >
>> > >>> > I am interested in this task.
>> > >>> > Provide some kind of pluggable compression SPI support
>> > >>> > 
>> > >>> >
>> > >>> > I developed a solution on BinaryMarshaller-level, but reviewer has
>> > >>> rejected
>> > >>> > it.
>> > >>> >
>> > >>> > Let's continue discussion of task goals and solution design.
>> > >>> > As I understood that, the main goal of this task is to store data
>> in
>> > >>> > compressed form.
>> > >>> > This is what I need from Ignite as its user. Compression provides
>> > >>> economy
>> > >>> > on
>> > >>> > servers.
>> > >>> > We can store more data on same servers at the cost of increasing
>> CPU
>> > >>> > utilization.
>> > >>> >
>> > >>> > I'm researching a possibility of implementation of compression at
>> the
>> > >>> > cache-level.
>> > >>> >
>> > >>> > Any thoughts?
>> > >>> >
>> > >>> > --
>> > >>> > Best regards,
>> > >>> > Vyacheslav
>> > >>> >
>> > >>> >
>> > >>> >
>> > >>> >
>> > >>> > --
>> > >>> > View this message in context: http://apache-ignite-
>> > >>> > developers.2346864.n4.nabble.com/Data-compression-in-
>> > >>> > Ignite-2-0-tp10099p16317.html
>> > >>> > Sent from the Apache Ignite Developers mailing list archive at
>> > >>> Nabble.com.
>> > >>> >
>> > >>>
>> > >>>
>> > >>>
>> > >>> --
>> > >>> Alexey Kuznetsov
>> > >>>
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> Best Regards, Vyacheslav
>> > >>
>> > >
>> > >
>> > >
>> > > --
>> > > Best Regards, Vyacheslav
>> > >
>> >
>> >
>> >
>> > --
>> > Best Regards, Vyacheslav
>> >
>>
>
>
>
> --
> Best Regards, Vyacheslav

Re: Data compression in Ignite 2.0

2017-05-15 Thread Vyacheslav Daradur
Dmitriy,

I have a ready prototype and I want to show it.
It is always easier to discuss a concrete example.

2017-05-15 19:02 GMT+03:00 Dmitriy Setrakyan :

> Vyacheslav,
>
> I think it is a bit premature to provide a PR without getting a community
> consensus on the dev list. Please allow some time for the community to
> respond.
>
> D.
>
> On Mon, May 15, 2017 at 6:36 AM, Vyacheslav Daradur 
> wrote:
>
> > I created the ticket: https://issues.apache.org/jira/browse/IGNITE-5226
> >
> > I'll prepare a PR with described solution in couple of days.
> >
> > 2017-05-15 15:05 GMT+03:00 Vyacheslav Daradur :
> >
> > > Hi, Igniters!
> > >
> > > Apache 2.0 is released.
> > >
> > > Let's continue the discussion about a compression design.
> > >
> > > At the moment, I found only one solution which is compatible with
> > querying
> > > and indexing, this is per-objects-field compression.
> > > Per-fields compression means that metadata (a header) of an object
> won't
> > > be compressed, only serialized values of an object fields (in bytes
> array
> > > form) will be compressed.
> > >
> > > This solution have some contentious issues:
> > > - small values, like primitives and short arrays - there isn't sense to
> > > compress them;
> > > - there is no possible to use compression with java-predefined types;
> > >
> > > We can provide an annotation, @IgniteCompression - for example, which
> can
> > > be used by users for marking fields to compress.
> > >
> > > Any thoughts?
> > >
> > > Maybe someone already have ready design?
> > >
> > > 2017-04-10 11:06 GMT+03:00 Vyacheslav Daradur :
> > >
> > >> Alexey,
> > >>
> > >> Yes, I've read it.
> > >>
> > >> Ok, let's discuss about public API design.
> > >>
> > >> I think we need to add some a configure entity to CacheConfiguration,
> > >> which will contain the Compressor interface implementation and some
> > usefull
> > >> parameters.
> > >> Or maybe to provide a BinaryMarshaller decorator, which will be
> compress
> > >> data after marshalling.
> > >>
> > >>
> > >> 2017-04-10 10:40 GMT+03:00 Alexey Kuznetsov :
> > >>
> > >>> Vyacheslav,
> > >>>
> > >>> Did you read initial discussion [1] about compression?
> > >>> As far as I remember we agreed to add only some "top-level" API in
> > order
> > >>> to
> > >>> provide a way for
> > >>> Ignite users to inject some sort of custom compression.
> > >>>
> > >>>
> > >>> [1]
> > >>> http://apache-ignite-developers.2346864.n4.nabble.com/Data-c
> > >>> ompression-in-Ignite-2-0-td10099.html
> > >>>
> > >>> On Mon, Apr 10, 2017 at 2:19 PM, daradurvs 
> > wrote:
> > >>>
> > >>> > Hi Igniters!
> > >>> >
> > >>> > I am interested in this task.
> > >>> > Provide some kind of pluggable compression SPI support
> > >>> > 
> > >>> >
> > >>> > I developed a solution on BinaryMarshaller-level, but reviewer has
> > >>> rejected
> > >>> > it.
> > >>> >
> > >>> > Let's continue discussion of task goals and solution design.
> > >>> > As I understood that, the main goal of this task is to store data
> in
> > >>> > compressed form.
> > >>> > This is what I need from Ignite as its user. Compression provides
> > >>> economy
> > >>> > on
> > >>> > servers.
> > >>> > We can store more data on same servers at the cost of increasing
> CPU
> > >>> > utilization.
> > >>> >
> > >>> > I'm researching a possibility of implementation of compression at
> the
> > >>> > cache-level.
> > >>> >
> > >>> > Any thoughts?
> > >>> >
> > >>> > --
> > >>> > Best regards,
> > >>> > Vyacheslav
> > >>> >
> > >>> >
> > >>> >
> > >>> >
> > >>> > --
> > >>> > View this message in context: http://apache-ignite-
> > >>> > developers.2346864.n4.nabble.com/Data-compression-in-
> > >>> > Ignite-2-0-tp10099p16317.html
> > >>> > Sent from the Apache Ignite Developers mailing list archive at
> > >>> Nabble.com.
> > >>> >
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Alexey Kuznetsov
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> Best Regards, Vyacheslav
> > >>
> > >
> > >
> > >
> > > --
> > > Best Regards, Vyacheslav
> > >
> >
> >
> >
> > --
> > Best Regards, Vyacheslav
> >
>



-- 
Best Regards, Vyacheslav


Re: Data compression in Ignite 2.0

2017-05-15 Thread Dmitriy Setrakyan
Vyacheslav,

I think it is a bit premature to provide a PR without getting a community
consensus on the dev list. Please allow some time for the community to
respond.

D.

On Mon, May 15, 2017 at 6:36 AM, Vyacheslav Daradur 
wrote:

> I created the ticket: https://issues.apache.org/jira/browse/IGNITE-5226
>
> I'll prepare a PR with described solution in couple of days.
>
> 2017-05-15 15:05 GMT+03:00 Vyacheslav Daradur :
>
> > Hi, Igniters!
> >
> > Apache 2.0 is released.
> >
> > Let's continue the discussion about a compression design.
> >
> > At the moment, I found only one solution which is compatible with
> querying
> > and indexing, this is per-objects-field compression.
> > Per-fields compression means that metadata (a header) of an object won't
> > be compressed, only serialized values of an object fields (in bytes array
> > form) will be compressed.
> >
> > This solution have some contentious issues:
> > - small values, like primitives and short arrays - there isn't sense to
> > compress them;
> > - there is no possible to use compression with java-predefined types;
> >
> > We can provide an annotation, @IgniteCompression - for example, which can
> > be used by users for marking fields to compress.
> >
> > Any thoughts?
> >
> > Maybe someone already have ready design?
> >
> > 2017-04-10 11:06 GMT+03:00 Vyacheslav Daradur :
> >
> >> Alexey,
> >>
> >> Yes, I've read it.
> >>
> >> Ok, let's discuss about public API design.
> >>
> >> I think we need to add some a configure entity to CacheConfiguration,
> >> which will contain the Compressor interface implementation and some
> usefull
> >> parameters.
> >> Or maybe to provide a BinaryMarshaller decorator, which will be compress
> >> data after marshalling.
> >>
> >>
> >> 2017-04-10 10:40 GMT+03:00 Alexey Kuznetsov :
> >>
> >>> Vyacheslav,
> >>>
> >>> Did you read initial discussion [1] about compression?
> >>> As far as I remember we agreed to add only some "top-level" API in
> order
> >>> to
> >>> provide a way for
> >>> Ignite users to inject some sort of custom compression.
> >>>
> >>>
> >>> [1]
> >>> http://apache-ignite-developers.2346864.n4.nabble.com/Data-c
> >>> ompression-in-Ignite-2-0-td10099.html
> >>>
> >>> On Mon, Apr 10, 2017 at 2:19 PM, daradurvs 
> wrote:
> >>>
> >>> > Hi Igniters!
> >>> >
> >>> > I am interested in this task.
> >>> > Provide some kind of pluggable compression SPI support
> >>> > 
> >>> >
> >>> > I developed a solution on BinaryMarshaller-level, but reviewer has
> >>> rejected
> >>> > it.
> >>> >
> >>> > Let's continue discussion of task goals and solution design.
> >>> > As I understood that, the main goal of this task is to store data in
> >>> > compressed form.
> >>> > This is what I need from Ignite as its user. Compression provides
> >>> economy
> >>> > on
> >>> > servers.
> >>> > We can store more data on same servers at the cost of increasing CPU
> >>> > utilization.
> >>> >
> >>> > I'm researching a possibility of implementation of compression at the
> >>> > cache-level.
> >>> >
> >>> > Any thoughts?
> >>> >
> >>> > --
> >>> > Best regards,
> >>> > Vyacheslav
> >>> >
> >>> >
> >>> >
> >>> >
> >>> > --
> >>> > View this message in context: http://apache-ignite-
> >>> > developers.2346864.n4.nabble.com/Data-compression-in-
> >>> > Ignite-2-0-tp10099p16317.html
> >>> > Sent from the Apache Ignite Developers mailing list archive at
> >>> Nabble.com.
> >>> >
> >>>
> >>>
> >>>
> >>> --
> >>> Alexey Kuznetsov
> >>>
> >>
> >>
> >>
> >> --
> >> Best Regards, Vyacheslav
> >>
> >
> >
> >
> > --
> > Best Regards, Vyacheslav
> >
>
>
>
> --
> Best Regards, Vyacheslav
>


Re: Data compression in Ignite 2.0

2017-05-15 Thread Vyacheslav Daradur
I created the ticket: https://issues.apache.org/jira/browse/IGNITE-5226

I'll prepare a PR with the described solution in a couple of days.

2017-05-15 15:05 GMT+03:00 Vyacheslav Daradur :

> Hi, Igniters!
>
> Apache 2.0 is released.
>
> Let's continue the discussion about a compression design.
>
> At the moment, I found only one solution which is compatible with querying
> and indexing, this is per-objects-field compression.
> Per-fields compression means that metadata (a header) of an object won't
> be compressed, only serialized values of an object fields (in bytes array
> form) will be compressed.
>
> This solution have some contentious issues:
> - small values, like primitives and short arrays - there isn't sense to
> compress them;
> - there is no possible to use compression with java-predefined types;
>
> We can provide an annotation, @IgniteCompression - for example, which can
> be used by users for marking fields to compress.
>
> Any thoughts?
>
> Maybe someone already have ready design?
>
> 2017-04-10 11:06 GMT+03:00 Vyacheslav Daradur :
>
>> Alexey,
>>
>> Yes, I've read it.
>>
>> Ok, let's discuss about public API design.
>>
>> I think we need to add some a configure entity to CacheConfiguration,
>> which will contain the Compressor interface implementation and some usefull
>> parameters.
>> Or maybe to provide a BinaryMarshaller decorator, which will be compress
>> data after marshalling.
>>
>>
>> 2017-04-10 10:40 GMT+03:00 Alexey Kuznetsov :
>>
>>> Vyacheslav,
>>>
>>> Did you read initial discussion [1] about compression?
>>> As far as I remember we agreed to add only some "top-level" API in order
>>> to
>>> provide a way for
>>> Ignite users to inject some sort of custom compression.
>>>
>>>
>>> [1]
>>> http://apache-ignite-developers.2346864.n4.nabble.com/Data-c
>>> ompression-in-Ignite-2-0-td10099.html
>>>
>>> On Mon, Apr 10, 2017 at 2:19 PM, daradurvs  wrote:
>>>
>>> > Hi Igniters!
>>> >
>>> > I am interested in this task.
>>> > Provide some kind of pluggable compression SPI support
>>> > 
>>> >
>>> > I developed a solution on BinaryMarshaller-level, but reviewer has
>>> rejected
>>> > it.
>>> >
>>> > Let's continue discussion of task goals and solution design.
>>> > As I understood that, the main goal of this task is to store data in
>>> > compressed form.
>>> > This is what I need from Ignite as its user. Compression provides
>>> economy
>>> > on
>>> > servers.
>>> > We can store more data on same servers at the cost of increasing CPU
>>> > utilization.
>>> >
>>> > I'm researching a possibility of implementation of compression at the
>>> > cache-level.
>>> >
>>> > Any thoughts?
>>> >
>>> > --
>>> > Best regards,
>>> > Vyacheslav
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > View this message in context: http://apache-ignite-
>>> > developers.2346864.n4.nabble.com/Data-compression-in-
>>> > Ignite-2-0-tp10099p16317.html
>>> > Sent from the Apache Ignite Developers mailing list archive at
>>> Nabble.com.
>>> >
>>>
>>>
>>>
>>> --
>>> Alexey Kuznetsov
>>>
>>
>>
>>
>> --
>> Best Regards, Vyacheslav
>>
>
>
>
> --
> Best Regards, Vyacheslav
>



-- 
Best Regards, Vyacheslav


Re: Data compression in Ignite 2.0

2017-05-15 Thread Vyacheslav Daradur
Hi, Igniters!

Apache 2.0 is released.

Let's continue the discussion about a compression design.

At the moment I have found only one solution that is compatible with querying
and indexing: per-field compression.
Per-field compression means that the metadata (header) of an object won't be
compressed; only the serialized values of the object's fields (as byte
arrays) will be compressed.

This solution has some contentious issues:
- small values, like primitives and short arrays, are not worth compressing;
- it is not possible to use compression with Java-predefined types.
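The first point is easy to demonstrate with plain java.util.zip (no Ignite code involved): deflating a value of a few bytes actually grows it, because the stream header, block header, and checksum outweigh any savings, while a large redundant payload shrinks dramatically.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

public class Main {
    // Deflate a byte array and return the compressed bytes.
    static byte[] deflate(byte[] src) {
        Deflater def = new Deflater();
        def.setInput(src);
        def.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        while (!def.finished())
            out.write(buf, 0, def.deflate(buf));
        def.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] small = {1, 2, 3, 4, 5, 6, 7, 8};  // e.g. one serialized long
        byte[] large = new byte[4096];            // highly redundant payload
        System.out.println("small: " + small.length + " -> " + deflate(small).length);
        System.out.println("large: " + large.length + " -> " + deflate(large).length);
        // The small value expands; the large redundant one shrinks.
        if (deflate(small).length <= small.length) throw new AssertionError();
        if (deflate(large).length >= large.length) throw new AssertionError();
    }
}
```

This is why any per-field scheme would likely want a minimum-size threshold before compressing.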

We can provide an annotation, for example @IgniteCompression, which users
can apply to mark fields for compression.

Any thoughts?

Maybe someone already has a ready design?

2017-04-10 11:06 GMT+03:00 Vyacheslav Daradur :

> Alexey,
>
> Yes, I've read it.
>
> Ok, let's discuss about public API design.
>
> I think we need to add some a configure entity to CacheConfiguration,
> which will contain the Compressor interface implementation and some usefull
> parameters.
> Or maybe to provide a BinaryMarshaller decorator, which will be compress
> data after marshalling.
>
>
> 2017-04-10 10:40 GMT+03:00 Alexey Kuznetsov :
>
>> Vyacheslav,
>>
>> Did you read initial discussion [1] about compression?
>> As far as I remember we agreed to add only some "top-level" API in order
>> to
>> provide a way for
>> Ignite users to inject some sort of custom compression.
>>
>>
>> [1]
>> http://apache-ignite-developers.2346864.n4.nabble.com/Data-
>> compression-in-Ignite-2-0-td10099.html
>>
>> On Mon, Apr 10, 2017 at 2:19 PM, daradurvs  wrote:
>>
>> > Hi Igniters!
>> >
>> > I am interested in this task.
>> > Provide some kind of pluggable compression SPI support
>> > 
>> >
>> > I developed a solution on BinaryMarshaller-level, but reviewer has
>> rejected
>> > it.
>> >
>> > Let's continue discussion of task goals and solution design.
>> > As I understood that, the main goal of this task is to store data in
>> > compressed form.
>> > This is what I need from Ignite as its user. Compression provides
>> economy
>> > on
>> > servers.
>> > We can store more data on same servers at the cost of increasing CPU
>> > utilization.
>> >
>> > I'm researching a possibility of implementation of compression at the
>> > cache-level.
>> >
>> > Any thoughts?
>> >
>> > --
>> > Best regards,
>> > Vyacheslav
>> >
>> >
>> >
>> >
>> > --
>> > View this message in context: http://apache-ignite-
>> > developers.2346864.n4.nabble.com/Data-compression-in-
>> > Ignite-2-0-tp10099p16317.html
>> > Sent from the Apache Ignite Developers mailing list archive at
>> Nabble.com.
>> >
>>
>>
>>
>> --
>> Alexey Kuznetsov
>>
>
>
>
> --
> Best Regards, Vyacheslav
>



-- 
Best Regards, Vyacheslav


Re: Data compression in Ignite 2.0

2017-04-10 Thread Vyacheslav Daradur
Alexey,

Yes, I've read it.

Ok, let's discuss the public API design.

I think we need to add a configuration entity to CacheConfiguration, which
will contain the Compressor interface implementation and some useful
parameters.
Or maybe provide a BinaryMarshaller decorator, which will compress data
after marshalling.


2017-04-10 10:40 GMT+03:00 Alexey Kuznetsov :

> Vyacheslav,
>
> Did you read initial discussion [1] about compression?
> As far as I remember we agreed to add only some "top-level" API in order to
> provide a way for
> Ignite users to inject some sort of custom compression.
>
>
> [1]
> http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-
> Ignite-2-0-td10099.html
>
> On Mon, Apr 10, 2017 at 2:19 PM, daradurvs  wrote:
>
> > Hi Igniters!
> >
> > I am interested in this task.
> > Provide some kind of pluggable compression SPI support
> > 
> >
> > I developed a solution on BinaryMarshaller-level, but reviewer has
> rejected
> > it.
> >
> > Let's continue discussion of task goals and solution design.
> > As I understood that, the main goal of this task is to store data in
> > compressed form.
> > This is what I need from Ignite as its user. Compression provides economy
> > on
> > servers.
> > We can store more data on same servers at the cost of increasing CPU
> > utilization.
> >
> > I'm researching a possibility of implementation of compression at the
> > cache-level.
> >
> > Any thoughts?
> >
> > --
> > Best regards,
> > Vyacheslav
> >
> >
> >
> >
> > --
> > View this message in context: http://apache-ignite-
> > developers.2346864.n4.nabble.com/Data-compression-in-
> > Ignite-2-0-tp10099p16317.html
> > Sent from the Apache Ignite Developers mailing list archive at
> Nabble.com.
> >
>
>
>
> --
> Alexey Kuznetsov
>



-- 
Best Regards, Vyacheslav


Re: Data compression in Ignite 2.0

2017-04-10 Thread Alexey Kuznetsov
Vyacheslav,

Did you read initial discussion [1] about compression?
As far as I remember we agreed to add only some "top-level" API in order to
provide a way for
Ignite users to inject some sort of custom compression.


[1]
http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-2-0-td10099.html

On Mon, Apr 10, 2017 at 2:19 PM, daradurvs  wrote:

> Hi Igniters!
>
> I am interested in this task.
> Provide some kind of pluggable compression SPI support
> 
>
> I developed a solution on BinaryMarshaller-level, but reviewer has rejected
> it.
>
> Let's continue discussion of task goals and solution design.
> As I understood that, the main goal of this task is to store data in
> compressed form.
> This is what I need from Ignite as its user. Compression provides economy
> on
> servers.
> We can store more data on same servers at the cost of increasing CPU
> utilization.
>
> I'm researching a possibility of implementation of compression at the
> cache-level.
>
> Any thoughts?
>
> --
> Best regards,
> Vyacheslav
>
>
>
>
> --
> View this message in context: http://apache-ignite-
> developers.2346864.n4.nabble.com/Data-compression-in-
> Ignite-2-0-tp10099p16317.html
> Sent from the Apache Ignite Developers mailing list archive at Nabble.com.
>



-- 
Alexey Kuznetsov


Re: Data compression in Ignite 2.0

2017-04-10 Thread daradurvs
Hi Igniters!

I am interested in this task.
Provide some kind of pluggable compression SPI support
  

I developed a solution at the BinaryMarshaller level, but the reviewer
rejected it.

Let's continue the discussion of the task's goals and solution design.
As I understand it, the main goal of this task is to store data in
compressed form.
This is what I need from Ignite as a user. Compression saves resources on
servers:
we can store more data on the same servers at the cost of increased CPU
utilization.

I'm researching the possibility of implementing compression at the
cache level.

Any thoughts?

--
Best regards, 
Vyacheslav




--
View this message in context: 
http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-2-0-tp10099p16317.html
Sent from the Apache Ignite Developers mailing list archive at Nabble.com.


Re: Data compression in Ignite 2.0

2016-07-27 Thread Alexey Kuznetsov
FYI, I created issue for Ignite 2.0:
https://issues.apache.org/jira/browse/IGNITE-3592

Thanks!

On Wed, Jul 27, 2016 at 2:36 PM, Sergi Vladykin <sergi.vlady...@gmail.com>
wrote:

> Nikita,
>
> I agree with Andrey, HANA is a bad comparison to Ignite in this respect. I
> did not find any evidence on the internet that their row store is very
> efficient with compression. It was always about column store.
>
> Alexey,
>
> As for DB2, can you check what exactly, when and how it compresses and does
> it give any decent results before suggesting it as an example to follow? I
> don't think it is good idea to repeat every bad idea after other products.
>
> And even if there are good results in DB2, will this all be applicable to
> Ignite? PostgreSql for example provides TOAST compression and this can be
> useful when used in a smart way, but this is a very different architecture
> from what we have.
>
> All in all I agree that may be we should provide some kind of pluggable
> compression SPI support, but do not expect much from it, usually it will be
> just useless.
>
> Sergi
>
>
>
> 2016-07-27 10:16 GMT+03:00 Sebastien DIAZ <sebastien.d...@gmail.com>:
>
> > Hi
> >
> > I add Redis as a sample of memory compression strategy
> >
> > http://labs.octivi.com/how-we-cut-down-memory-usage-by-82/
> >
> > http://redis.io/topics/memory-optimization
> >
> > Regards
> >
> > S DIAZ
> >
> >
> >
> > 2016-07-27 8:24 GMT+02:00 Alexey Kuznetsov <akuznet...@gridgain.com>:
> >
> > > Nikita,
> > >
> > > That was my intention: "we may need to provide a better facility to
> > inject
> > > user's logic here..."
> > >
> > > Andrey,
> > > About compression, once again - DB2 is a row-based DB and they can
> > compress
> > > :)
> > >
> > > On Wed, Jul 27, 2016 at 12:56 PM, Nikita Ivanov <nivano...@gmail.com>
> > > wrote:
> > >
> > > > Very good points indeed. I get the compression in Ignite question
> quite
> > > > often and Hana reference is a typical lead in.
> > > >
> > > > My personal opinion is still that in Ignite *specifically* the
> > > compression
> > > > is best left to the end-user. But we may need to provide a better
> > > facility
> > > > to inject user's logic here...
> > > >
> > > > --
> > > > Nikita Ivanov
> > > >
> > > >
> > > > On Tue, Jul 26, 2016 at 9:53 PM, Andrey Kornev <
> > andrewkor...@hotmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Dictionary compression requires some knowledge about data being
> > > > > compressed. For example, for numeric types a range of values must
> be
> > > > known
> > > > > so that the dictionary can be generated. For strings, the number of
> > > > unique
> > > > > values of the column is the key piece of input into the dictionary
> > > > > generation.
> > > > > SAP HANA is a column-based database system: it stores the fields of
> > the
> > > > > data tuple individually using the best compression for the given
> data
> > > > type
> > > > > and the particular set of values. HANA has been specifically built
> > as a
> > > > > general purpose database, rather than as an afterthought layer on
> top
> > > of
> > > > an
> > > > > already existing distributed cache.
> > > > > On the other hand, Ignite is a distributed cache implementation (a
> > > pretty
> > > > > good one!) that in general requires no schema and stores its data
> in
> > > the
> > > > > row-based fashion. Its current design doesn't readily lend itself
> to
> > > the
> > > > > kind of optimizations HANA provides out of the box.
> > > > > For the curious types among us, the implementation details of HANA
> are
> > > > well
> > > > > documented in "In-memory Data Management", by Hasso Plattner &
> > > Alexander
> > > > > Zeier.
> > > > > Cheers
> > > > > Andrey
> > > > > _
> > > > > From: Alexey Kuznetsov <akuznet...@gridgain.com>
> > > > > Sent: Tuesday, July 26, 2016 5:36 AM
> > > > > Subject: Re: Data compression in Ignite 2.0

Re: Data compression in Ignite 2.0

2016-07-27 Thread Sergi Vladykin
Nikita,

I agree with Andrey, HANA is a bad comparison to Ignite in this respect. I
did not find any evidence on the internet that their row store is very
efficient with compression. It was always about column store.

Alexey,

> As for DB2, can you check what exactly, when and how it compresses, and
> whether it gives any decent results, before suggesting it as an example to
> follow? I don't think it is a good idea to repeat every bad idea from other
> products.

And even if there are good results in DB2, will this all be applicable to
Ignite? PostgreSql for example provides TOAST compression and this can be
useful when used in a smart way, but this is a very different architecture
from what we have.

> All in all, I agree that maybe we should provide some kind of pluggable
> compression SPI support, but do not expect much from it; usually it will be
> just useless.

Sergi



2016-07-27 10:16 GMT+03:00 Sebastien DIAZ <sebastien.d...@gmail.com>:

> Hi
>
> I add Redis as a sample of memory compression strategy
>
> http://labs.octivi.com/how-we-cut-down-memory-usage-by-82/
>
> http://redis.io/topics/memory-optimization
>
> Regards
>
> S DIAZ
>
>
>
> 2016-07-27 8:24 GMT+02:00 Alexey Kuznetsov <akuznet...@gridgain.com>:
>
> > Nikita,
> >
> > That was my intention: "we may need to provide a better facility to
> inject
> > user's logic here..."
> >
> > Andrey,
> > About compression, once again - DB2 is a row-based DB and they can
> compress
> > :)
> >
> > On Wed, Jul 27, 2016 at 12:56 PM, Nikita Ivanov <nivano...@gmail.com>
> > wrote:
> >
> > > Very good points indeed. I get the compression in Ignite question quite
> > > often and Hana reference is a typical lead in.
> > >
> > > My personal opinion is still that in Ignite *specifically* the
> > compression
> > > is best left to the end-user. But we may need to provide a better
> > facility
> > > to inject user's logic here...
> > >
> > > --
> > > Nikita Ivanov
> > >
> > >
> > > On Tue, Jul 26, 2016 at 9:53 PM, Andrey Kornev <
> andrewkor...@hotmail.com
> > >
> > > wrote:
> > >
> > > > Dictionary compression requires some knowledge about data being
> > > > compressed. For example, for numeric types a range of values must be
> > > known
> > > > so that the dictionary can be generated. For strings, the number of
> > > unique
> > > > values of the column is the key piece of input into the dictionary
> > > > generation.
> > > > SAP HANA is a column-based database system: it stores the fields of
> the
> > > > data tuple individually using the best compression for the given data
> > > type
> > > > and the particular set of values. HANA has been specifically built
> as a
> > > > general purpose database, rather than as an afterthought layer on top
> > of
> > > an
> > > > already existing distributed cache.
> > > > On the other hand, Ignite is a distributed cache implementation (a
> > pretty
> > > > good one!) that in general requires no schema and stores its data in
> > the
> > > > row-based fashion. Its current design doesn't lend itself readily to
> > the
> > > > kind of optimizations HANA provides out of the box.
> > > > For the curious types among us, the implementation details of HANA are
> > > well
> > > > documented in "In-memory Data Management", by Hasso Plattner &
> > Alexander
> > > > Zeier.
> > > > Cheers
> > > > Andrey
> > > > _
> > > > From: Alexey Kuznetsov <akuznet...@gridgain.com<mailto:akuznet...@gridgain.com>>
> > > > Sent: Tuesday, July 26, 2016 5:36 AM
> > > > Subject: Re: Data compression in Ignite 2.0
> > > > To: <dev@ignite.apache.org<mailto:dev@ignite.apache.org>>
> > > >
> > > >
> > > > Sergey Kozlov wrote:
> > > > >> For approach 1: Put a large object into a partition cache will
> > > > force to update
> > > > the dictionary placed on replication cache. It may be time-expense
> > > > operation.
> > > > The dictionary will be built only once. And we could control what
> > should
> > > be
> > > > put into dictionary, for example, we could check min and max size and
> > > > decide - put value to dictionary or not.
> > > >
> > > > >> Approach 2-3 are make sense

Re: Data compression in Ignite 2.0

2016-07-27 Thread Sebastien DIAZ
Hi

I'd add Redis as another example of a memory compression strategy

http://labs.octivi.com/how-we-cut-down-memory-usage-by-82/

http://redis.io/topics/memory-optimization

Regards

S DIAZ



2016-07-27 8:24 GMT+02:00 Alexey Kuznetsov <akuznet...@gridgain.com>:

> Nikita,
>
> That was my intention: "we may need to provide a better facility to inject
> user's logic here..."
>
> Andrey,
> About compression, once again - DB2 is a row-based DB and they can compress
> :)
>
> On Wed, Jul 27, 2016 at 12:56 PM, Nikita Ivanov <nivano...@gmail.com>
> wrote:
>
> > Very good points indeed. I get the compression in Ignite question quite
> > often and Hana reference is a typical lead in.
> >
> > My personal opinion is still that in Ignite *specifically* the
> compression
> > is best left to the end-user. But we may need to provide a better
> facility
> > to inject user's logic here...
> >
> > --
> > Nikita Ivanov
> >
> >
> > On Tue, Jul 26, 2016 at 9:53 PM, Andrey Kornev <andrewkor...@hotmail.com
> >
> > wrote:
> >
> > > Dictionary compression requires some knowledge about data being
> > > compressed. For example, for numeric types a range of values must be
> > known
> > > so that the dictionary can be generated. For strings, the number of
> > unique
> > > values of the column is the key piece of input into the dictionary
> > > generation.
> > > SAP HANA is a column-based database system: it stores the fields of the
> > > data tuple individually using the best compression for the given data
> > type
> > > and the particular set of values. HANA has been specifically built as a
> > > general purpose database, rather than as an afterthought layer on top
> of
> > an
> > > already existing distributed cache.
> > > On the other hand, Ignite is a distributed cache implementation (a
> pretty
> > > good one!) that in general requires no schema and stores its data in
> the
> > > row-based fashion. Its current design doesn't lend itself readily to
> the
> > > kind of optimizations HANA provides out of the box.
> > > For the curious types among us, the implementation details of HANA are
> > well
> > > documented in "In-memory Data Management", by Hasso Plattner &
> Alexander
> > > Zeier.
> > > Cheers
> > > Andrey
> > > _
> > > From: Alexey Kuznetsov <akuznet...@gridgain.com<mailto:akuznet...@gridgain.com>>
> > > Sent: Tuesday, July 26, 2016 5:36 AM
> > > Subject: Re: Data compression in Ignite 2.0
> > > To: <dev@ignite.apache.org<mailto:dev@ignite.apache.org>>
> > >
> > >
> > > Sergey Kozlov wrote:
> > > >> For approach 1: Put a large object into a partition cache will
> > > force to update
> > > the dictionary placed on replication cache. It may be time-expense
> > > operation.
> > > The dictionary will be built only once. And we could control what
> should
> > be
> > > put into dictionary, for example, we could check min and max size and
> > > decide - put value to dictionary or not.
> > >
> > > >> Approach 2-3 are make sense for rare cases as Sergi commented.
> > > But it is better at least have a possibility to plug user code for
> > > compression than not to have it at all.
> > >
> > > >> Also I see a danger of OOM if we've got high compression level and
> try
> > > to restore original value in memory.
> > > We could easily get OOM with many other operations right now without
> > > compression, I think it is not an issue, we could add a NOTE to
> > > documentation about such possibility.
> > >
> > > Andrey Kornev wrote:
> > > >> ... in general I think compression is a great data. The cleanest way
> > to
> > > achieve that would be to just make it possible to chain the
> > marshallers...
> > > I think it is also good idea. And looks like it could be used for
> > > compression with some sort of ZIP algorithm, but how to deal with
> > > compression by dictionary substitution?
> > > We need to build dictionary first. Any ideas?
> > >
> > > Nikita Ivanov wrote:
> > > >> SAP Hana does the compression by 1) compressing SQL parameters
> before
> > > execution...
> > > Looks interesting, but my initial point was about compression of cache
> > > data, not SQL queries.
> > > My idea was to make compression transparent for SQL engine when it will
> > > lookup for data.
> > >
> > > But idea of compressing SQL queries result looks very interesting,
> > because
> > > it is known fact, that SQL engine could consume quite a lot of heap for
> > > storing result sets.
> > > I think this should be discussed in separate thread.
> > >
> > > Just for you information, in first message I mentioned that DB2 has
> > > compression by dictionary and according to them it is possible to
> > > compress usual data to 50-80%.
> > > I have some experience with DB2 and can confirm this.
> > >
> > > --
> > > Alexey Kuznetsov
> >
>
>
> --
> Alexey Kuznetsov
>


Re: Data compression in Ignite 2.0

2016-07-27 Thread Alexey Kuznetsov
Nikita,

That was my intention: "we may need to provide a better facility to inject
user's logic here..."

Andrey,
About compression, once again - DB2 is a row-based DB and they can compress
:)

On Wed, Jul 27, 2016 at 12:56 PM, Nikita Ivanov <nivano...@gmail.com> wrote:

> Very good points indeed. I get the compression in Ignite question quite
> often and Hana reference is a typical lead in.
>
> My personal opinion is still that in Ignite *specifically* the compression
> is best left to the end-user. But we may need to provide a better facility
> to inject user's logic here...
>
> --
> Nikita Ivanov
>
>
> On Tue, Jul 26, 2016 at 9:53 PM, Andrey Kornev <andrewkor...@hotmail.com>
> wrote:
>
> > Dictionary compression requires some knowledge about data being
> > compressed. For example, for numeric types a range of values must be
> known
> > so that the dictionary can be generated. For strings, the number of
> unique
> > values of the column is the key piece of input into the dictionary
> > generation.
> > SAP HANA is a column-based database system: it stores the fields of the
> > data tuple individually using the best compression for the given data
> type
> > and the particular set of values. HANA has been specifically built as a
> > general purpose database, rather than as an afterthought layer on top of
> an
> > already existing distributed cache.
> > On the other hand, Ignite is a distributed cache implementation (a pretty
> > good one!) that in general requires no schema and stores its data in the
> > row-based fashion. Its current design doesn't lend itself readily to the
> > kind of optimizations HANA provides out of the box.
> > For the curious types among us, the implementation details of HANA are
> well
> > documented in "In-memory Data Management", by Hasso Plattner & Alexander
> > Zeier.
> > Cheers
> > Andrey
> > _
> > From: Alexey Kuznetsov <akuznet...@gridgain.com<mailto:akuznet...@gridgain.com>>
> > Sent: Tuesday, July 26, 2016 5:36 AM
> > Subject: Re: Data compression in Ignite 2.0
> > To: <dev@ignite.apache.org<mailto:dev@ignite.apache.org>>
> >
> >
> > Sergey Kozlov wrote:
> > >> For approach 1: Put a large object into a partition cache will
> > force to update
> > the dictionary placed on replication cache. It may be time-expense
> > operation.
> > The dictionary will be built only once. And we could control what should
> be
> > put into dictionary, for example, we could check min and max size and
> > decide - put value to dictionary or not.
> >
> > >> Approach 2-3 are make sense for rare cases as Sergi commented.
> > But it is better at least have a possibility to plug user code for
> > compression than not to have it at all.
> >
> > >> Also I see a danger of OOM if we've got high compression level and try
> > to restore original value in memory.
> > We could easily get OOM with many other operations right now without
> > compression, I think it is not an issue, we could add a NOTE to
> > documentation about such possibility.
> >
> > Andrey Kornev wrote:
> > >> ... in general I think compression is a great data. The cleanest way
> to
> > achieve that would be to just make it possible to chain the
> marshallers...
> > I think it is also good idea. And looks like it could be used for
> > compression with some sort of ZIP algorithm, but how to deal with
> > compression by dictionary substitution?
> > We need to build dictionary first. Any ideas?
> >
> > Nikita Ivanov wrote:
> > >> SAP Hana does the compression by 1) compressing SQL parameters before
> > execution...
> > Looks interesting, but my initial point was about compression of cache
> > data, not SQL queries.
> > My idea was to make compression transparent for SQL engine when it will
> > lookup for data.
> >
> > But idea of compressing SQL queries result looks very interesting,
> because
> > it is known fact, that SQL engine could consume quite a lot of heap for
> > storing result sets.
> > I think this should be discussed in separate thread.
> >
> > Just for you information, in first message I mentioned that DB2 has
> > compression by dictionary and according to them it is possible to
> > compress usual data to 50-80%.
> > I have some experience with DB2 and can confirm this.
> >
> > --
> > Alexey Kuznetsov
>


-- 
Alexey Kuznetsov


Re: Data compression in Ignite 2.0

2016-07-26 Thread Nikita Ivanov
Very good points indeed. I get the compression-in-Ignite question quite
often, and the Hana reference is a typical lead-in.

My personal opinion is still that in Ignite *specifically* the compression
is best left to the end-user. But we may need to provide a better facility
to inject user's logic here...

--
Nikita Ivanov


On Tue, Jul 26, 2016 at 9:53 PM, Andrey Kornev <andrewkor...@hotmail.com>
wrote:

> Dictionary compression requires some knowledge about data being
> compressed. For example, for numeric types a range of values must be known
> so that the dictionary can be generated. For strings, the number of unique
> values of the column is the key piece of input into the dictionary
> generation.
> SAP HANA is a column-based database system: it stores the fields of the
> data tuple individually using the best compression for the given data type
> and the particular set of values. HANA has been specifically built as a
> general purpose database, rather than as an afterthought layer on top of an
> already existing distributed cache.
> On the other hand, Ignite is a distributed cache implementation (a pretty
> good one!) that in general requires no schema and stores its data in the
> row-based fashion. Its current design doesn't lend itself readily to the
> kind of optimizations HANA provides out of the box.
> For the curious types among us, the implementation details of HANA are well
> documented in "In-memory Data Management", by Hasso Plattner & Alexander
> Zeier.
> Cheers
> Andrey
> _
> From: Alexey Kuznetsov <akuznet...@gridgain.com<mailto:akuznet...@gridgain.com>>
> Sent: Tuesday, July 26, 2016 5:36 AM
> Subject: Re: Data compression in Ignite 2.0
> To: <dev@ignite.apache.org<mailto:dev@ignite.apache.org>>
>
>
> Sergey Kozlov wrote:
> >> For approach 1: Put a large object into a partition cache will
> force to update
> the dictionary placed on replication cache. It may be time-expense
> operation.
> The dictionary will be built only once. And we could control what should be
> put into dictionary, for example, we could check min and max size and
> decide - put value to dictionary or not.
>
> >> Approach 2-3 are make sense for rare cases as Sergi commented.
> But it is better at least have a possibility to plug user code for
> compression than not to have it at all.
>
> >> Also I see a danger of OOM if we've got high compression level and try
> to restore original value in memory.
> We could easily get OOM with many other operations right now without
> compression, I think it is not an issue, we could add a NOTE to
> documentation about such possibility.
>
> Andrey Kornev wrote:
> >> ... in general I think compression is a great data. The cleanest way to
> achieve that would be to just make it possible to chain the marshallers...
> I think it is also good idea. And looks like it could be used for
> compression with some sort of ZIP algorithm, but how to deal with
> compression by dictionary substitution?
> We need to build dictionary first. Any ideas?
>
> Nikita Ivanov wrote:
> >> SAP Hana does the compression by 1) compressing SQL parameters before
> execution...
> Looks interesting, but my initial point was about compression of cache
> data, not SQL queries.
> My idea was to make compression transparent for SQL engine when it will
> lookup for data.
>
> But idea of compressing SQL queries result looks very interesting, because
> it is known fact, that SQL engine could consume quite a lot of heap for
> storing result sets.
> I think this should be discussed in separate thread.
>
> Just for you information, in first message I mentioned that DB2 has
> compression by dictionary and according to them it is possible to
> compress usual data to 50-80%.
> I have some experience with DB2 and can confirm this.
>
> --
> Alexey Kuznetsov
>
>
>


Re: Data compression in Ignite 2.0

2016-07-26 Thread Andrey Kornev
Dictionary compression requires some knowledge about data being compressed. For 
example, for numeric types a range of values must be known so that the 
dictionary can be generated. For strings, the number of unique values of the 
column is the key piece of input into the dictionary generation.
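The dictionary-building step described above can be sketched in a few lines of plain Java. Everything below is a hypothetical illustration of the substitution technique; none of the class or method names come from Ignite or HANA:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Minimal sketch of dictionary (substitution) compression for strings. */
public class StringDictionary {
    private final Map<String, Integer> codes = new HashMap<>();
    private final List<String> values = new ArrayList<>();

    /** Builds the dictionary from the distinct values seen in a column. */
    public static StringDictionary build(Collection<String> column) {
        StringDictionary d = new StringDictionary();
        for (String s : column) {
            d.codes.computeIfAbsent(s, k -> {
                d.values.add(k);
                return d.values.size() - 1;
            });
        }
        return d;
    }

    /** Encodes a value as its small integer code. */
    public int encode(String s) {
        return codes.get(s);
    }

    /** Decodes a code back to the original value. */
    public String decode(int code) {
        return values.get(code);
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("NY", "LA", "NY", "NY", "SF", "LA");
        StringDictionary d = build(column);

        // Each row now stores a small integer code instead of the full string.
        int[] encoded = column.stream().mapToInt(d::encode).toArray();

        // Equality predicates can run on the codes directly, no decompression.
        int ny = d.encode("NY");
        long matches = Arrays.stream(encoded).filter(c -> c == ny).count();
        System.out.println(matches); // 3
    }
}
```

With few distinct values, a small integer code replaces each repeated string, which is roughly the effect behind the dictionary-compression ratios mentioned in this thread.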
SAP HANA is a column-based database system: it stores the fields of the data 
tuple individually using the best compression for the given data type and the 
particular set of values. HANA has been specifically built as a general purpose 
database, rather than as an afterthought layer on top of an already existing 
distributed cache.
On the other hand, Ignite is a distributed cache implementation (a pretty good 
one!) that in general requires no schema and stores its data in the row-based 
fashion. Its current design doesn't lend itself readily to the kind of
optimizations HANA provides out of the box.
For the curious types among us, the implementation details of HANA are well
documented in "In-memory Data Management", by Hasso Plattner & Alexander Zeier.
Cheers
Andrey
_
From: Alexey Kuznetsov <akuznet...@gridgain.com<mailto:akuznet...@gridgain.com>>
Sent: Tuesday, July 26, 2016 5:36 AM
Subject: Re: Data compression in Ignite 2.0
To: <dev@ignite.apache.org<mailto:dev@ignite.apache.org>>


Sergey Kozlov wrote:
>> For approach 1: Put a large object into a partition cache will
force to update
the dictionary placed on replication cache. It may be time-expense
operation.
The dictionary will be built only once. And we could control what should be
put into dictionary, for example, we could check min and max size and
decide - put value to dictionary or not.

>> Approach 2-3 are make sense for rare cases as Sergi commented.
But it is better at least have a possibility to plug user code for
compression than not to have it at all.

>> Also I see a danger of OOM if we've got high compression level and try
to restore original value in memory.
We could easily get OOM with many other operations right now without
compression, I think it is not an issue, we could add a NOTE to
documentation about such possibility.

Andrey Kornev wrote:
>> ... in general I think compression is a great data. The cleanest way to
achieve that would be to just make it possible to chain the marshallers...
I think it is also good idea. And looks like it could be used for
compression with some sort of ZIP algorithm, but how to deal with
compression by dictionary substitution?
We need to build dictionary first. Any ideas?

Nikita Ivanov wrote:
>> SAP Hana does the compression by 1) compressing SQL parameters before
execution...
Looks interesting, but my initial point was about compression of cache
data, not SQL queries.
My idea was to make compression transparent for SQL engine when it will
lookup for data.

But idea of compressing SQL queries result looks very interesting, because
it is known fact, that SQL engine could consume quite a lot of heap for
storing result sets.
I think this should be discussed in separate thread.

Just for you information, in first message I mentioned that DB2 has
compression by dictionary and according to them it is possible to
compress usual data to 50-80%.
I have some experience with DB2 and can confirm this.

--
Alexey Kuznetsov




Re: Data compression in Ignite 2.0

2016-07-26 Thread Alexey Kuznetsov
Sergey Kozlov wrote:
>> For approach 1: Putting a large object into a partitioned cache will
force an update of
the dictionary placed in a replicated cache. It may be a time-expensive
operation.
The dictionary will be built only once. And we could control what should be
put into the dictionary; for example, we could check min and max sizes and
decide whether to put a value into the dictionary or not.

>> Approaches 2-3 make sense for rare cases, as Sergi commented.
But it is better to at least have the possibility to plug in user code for
compression than not to have it at all.

>> Also I see a danger of OOM if we've got a high compression level and try
to restore the original value in memory.
We could easily get an OOM with many other operations right now, even without
compression. I think it is not an issue; we could add a NOTE to the
documentation about this possibility.

Andrey Kornev wrote:
>> ... in general I think compression is a great idea. The cleanest way to
achieve that would be to just make it possible to chain the marshallers...
I think it is also a good idea. And it looks like it could be used for
compression with some sort of ZIP algorithm, but how do we deal with
compression by dictionary substitution?
We need to build the dictionary first. Any ideas?

Nikita Ivanov wrote:
>> SAP Hana does the compression by 1) compressing SQL parameters before
execution...
Looks interesting, but my initial point was about compression of cache
data, not SQL queries.
My idea was to make compression transparent to the SQL engine when it
looks up data.

But the idea of compressing SQL query results looks very interesting, because
it is a known fact that the SQL engine can consume quite a lot of heap for
storing result sets.
I think this should be discussed in a separate thread.

Just for your information: in the first message I mentioned that DB2 has
dictionary compression, and according to them it is possible to
compress typical data by 50-80%.
I have some experience with DB2 and can confirm this.

-- 
Alexey Kuznetsov


Re: Data compression in Ignite 2.0

2016-07-25 Thread Andrey Kornev
I'm guessing the suggestion here is to use the compressed form directly for 
WHERE clause evaluation. If that's the case I think there are a couple of 
issues:

1) the LIKE predicate.

2) predicates other than equality (for example, <, >, etc.)


But since Ignite isn't just about SQL queries (surprisingly, some people still
use it just as a distributed cache!), in general I think compression is a great
idea. The cleanest way to achieve that would be to just make it possible to
chain the marshallers. It is possible to do this already without any Ignite code
changes, but unfortunately it would force people to use the non-public
BinaryMarshaller class directly (as the first element of the chain).
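A chained, compressing marshaller along these lines could look roughly like this. The `Marshaller` interface below is a simplified stand-in for illustration, not the actual Ignite interface; the chain element simply GZIPs whatever bytes its delegate produces:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Collections;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Simplified stand-in for a marshaller interface (not Ignite's). */
interface Marshaller {
    byte[] marshal(Object obj) throws Exception;
    Object unmarshal(byte[] bytes) throws Exception;
}

/** First element of the chain: plain JDK serialization. */
class JdkMarshaller implements Marshaller {
    public byte[] marshal(Object obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        return bos.toByteArray();
    }

    public Object unmarshal(byte[] bytes) throws Exception {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }
}

/** Chain element: GZIPs whatever its delegate produced. */
class GzipMarshaller implements Marshaller {
    private final Marshaller delegate;

    GzipMarshaller(Marshaller delegate) { this.delegate = delegate; }

    public byte[] marshal(Object obj) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(delegate.marshal(obj));
        }
        return bos.toByteArray();
    }

    public Object unmarshal(byte[] bytes) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(bytes))) {
            byte[] buf = new byte[4096];
            for (int n; (n = gz.read(buf)) > 0; )
                bos.write(buf, 0, n);
        }
        return delegate.unmarshal(bos.toByteArray());
    }
}

public class ChainDemo {
    public static void main(String[] args) throws Exception {
        Marshaller chained = new GzipMarshaller(new JdkMarshaller());

        // A highly repetitive value compresses well.
        String value = String.join(",", Collections.nCopies(1000, "ignite"));

        byte[] plain = new JdkMarshaller().marshal(value);
        byte[] packed = chained.marshal(value);

        System.out.println(packed.length < plain.length);            // true
        System.out.println(value.equals(chained.unmarshal(packed))); // true
    }
}
```

Wrapping the real marshaller this way trades CPU for memory on every cache operation, which matches the slowdown concern raised earlier in the thread.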


Cheers

Andrey


From: Dmitriy Setrakyan <dsetrak...@apache.org>
Sent: Monday, July 25, 2016 1:53 PM
To: dev@ignite.apache.org
Subject: Re: Data compression in Ignite 2.0

Nikita, this sounds like a pretty elegant approach.

Does anyone in the community see a problem with this design?

On Mon, Jul 25, 2016 at 4:49 PM, Nikita Ivanov <nivano...@gmail.com> wrote:

> SAP Hana does the compression by 1) compressing SQL parameters before
> execution, and 2) storing only compressed data in memory. This way all SQL
> queries work as normal with zero modifications or performance overhead.
> Only results of the query can be (optionally) decompressed back before
> returning to the user.
>
> --
> Nikita Ivanov
>
>
> On Mon, Jul 25, 2016 at 1:40 PM, Sergey Kozlov <skoz...@gridgain.com>
> wrote:
>
> > Hi
> >
> > For approach 1: Putting a large object into a partitioned cache will force an
> > update of the dictionary placed in a replicated cache. It seems it may be a
> > time-expensive operation.
> > Approaches 2-3 make sense for rare cases, as Sergi commented.
> > Also I see a danger of OOM if we've got a high compression level and try to
> > restore the original value in memory.
> >
> > On Mon, Jul 25, 2016 at 10:39 AM, Alexey Kuznetsov <
> > akuznet...@gridgain.com>
> > wrote:
> >
> > > Sergi,
> > >
> > > Of course it will introduce some slowdown, but with compression more
> data
> > > could be stored in memory
> > > and not will be evicted to disk. In case of compress by dictionary
> > > substitution it will be only one more lookup
> > > and should be fast.
> > >
> > > In general we could provide only API for compression out of the box,
> and
> > > users that really need some sort of compression
> > > will implement it by them self. This will not require much effort I
> > think.
> > >
> > >
> > >
> > > On Mon, Jul 25, 2016 at 2:18 PM, Sergi Vladykin <
> > sergi.vlady...@gmail.com>
> > > wrote:
> > >
> > > > This will make sense only for rare cases when you have very large
> > objects
> > > > stored, which can be effectively compressed. And even then it will
> > > > introduce slowdown on all the operations, which often will not be
> > > > acceptable. I guess only few users will find this feature useful,
> thus
> > I
> > > > think it does not worth the effort.
> > > >
> > > > Sergi
> > > >
> > > >
> > > >
> > > > 2016-07-25 9:28 GMT+03:00 Alexey Kuznetsov <akuznet...@gridgain.com
> >:
> > > >
> > > > > Hi, All!
> > > > >
> > > > > I would like to propose one more feature for Ignite 2.0.
> > > > >
> > > > > Data compression for data in binary format.
> > > > >
> > > > > Binary format is stored as field name + field data.
> > > > > So we have a description.
> > > > > How about to add one more byte to binary data descriptor:
> > > > >
> > > > > *Compressed*:
> > > > >  0 - Data stored as is (no compression).
> > > > >  1 - Data compressed by dictionary (something like DB2 row
> > compression
> > > > [1],
> > > > >  but for all binary types). We could have system or user defined
> > > > replicated
> > > > > cache for such dictionary and *cache.compact()* method that will
> scan
> > > > > cache, build dictionary and compact data.
> > > > >  2 - Data compressed by Java built in ZIP.
> > > > >  3 - Data compressed by some user custom algorithm.
> > > > >
> > > > > Of course it is possible to compress data in current Ignite 1.x but
> > in
> > > > this
> > > 

Re: Data compression in Ignite 2.0

2016-07-25 Thread Dmitriy Setrakyan
Nikita, this sounds like a pretty elegant approach.

Does anyone in the community see a problem with this design?

On Mon, Jul 25, 2016 at 4:49 PM, Nikita Ivanov <nivano...@gmail.com> wrote:

> SAP Hana does the compression by 1) compressing SQL parameters before
> execution, and 2) storing only compressed data in memory. This way all SQL
> queries work as normal with zero modifications or performance overhead.
> Only results of the query can be (optionally) decompressed back before
> returning to the user.
>
> --
> Nikita Ivanov
>
>
> On Mon, Jul 25, 2016 at 1:40 PM, Sergey Kozlov <skoz...@gridgain.com>
> wrote:
>
> > Hi
> >
> > For approach 1: Putting a large object into a partitioned cache will force an
> > update of the dictionary placed in a replicated cache. It seems it may be a
> > time-expensive operation.
> > Approaches 2-3 make sense for rare cases, as Sergi commented.
> > Also I see a danger of OOM if we've got a high compression level and try to
> > restore the original value in memory.
> >
> > On Mon, Jul 25, 2016 at 10:39 AM, Alexey Kuznetsov <
> > akuznet...@gridgain.com>
> > wrote:
> >
> > > Sergi,
> > >
> > > Of course it will introduce some slowdown, but with compression more
> data
> > > could be stored in memory
> > > and not will be evicted to disk. In case of compress by dictionary
> > > substitution it will be only one more lookup
> > > and should be fast.
> > >
> > > In general we could provide only API for compression out of the box,
> and
> > > users that really need some sort of compression
> > > will implement it by them self. This will not require much effort I
> > think.
> > >
> > >
> > >
> > > On Mon, Jul 25, 2016 at 2:18 PM, Sergi Vladykin <
> > sergi.vlady...@gmail.com>
> > > wrote:
> > >
> > > > This will make sense only for rare cases when you have very large
> > objects
> > > > stored, which can be effectively compressed. And even then it will
> > > > introduce slowdown on all the operations, which often will not be
> > > > acceptable. I guess only few users will find this feature useful,
> thus
> > I
> > > > think it does not worth the effort.
> > > >
> > > > Sergi
> > > >
> > > >
> > > >
> > > > 2016-07-25 9:28 GMT+03:00 Alexey Kuznetsov <akuznet...@gridgain.com
> >:
> > > >
> > > > > Hi, All!
> > > > >
> > > > > I would like to propose one more feature for Ignite 2.0.
> > > > >
> > > > > Data compression for data in binary format.
> > > > >
> > > > > Binary format is stored as field name + field data.
> > > > > So we have a description.
> > > > > How about to add one more byte to binary data descriptor:
> > > > >
> > > > > *Compressed*:
> > > > >  0 - Data stored as is (no compression).
> > > > >  1 - Data compressed by dictionary (something like DB2 row
> > compression
> > > > [1],
> > > > >  but for all binary types). We could have system or user defined
> > > > replicated
> > > > > cache for such dictionary and *cache.compact()* method that will
> scan
> > > > > cache, build dictionary and compact data.
> > > > >  2 - Data compressed by Java built in ZIP.
> > > > >  3 - Data compressed by some user custom algorithm.
> > > > >
> > > > > Of course it is possible to compress data in current Ignite 1.x but
> > in
> > > > this
> > > > > case compressed data cannot be accessed from SQL engine, if we
> > > implement
> > > > > support for compression on Ignite core level SQL engine will be
> able
> > to
> > > > > detect that data is compressed and properly handle such data.
> > > > >
> > > > > What do you think?
> > > > > If community consider this feature useful I will create issue in
> > JIRA.
> > > > >
> > > > > [1]
> > > > >
> > > > >
> > > >
> > >
> >
> http://www.ibm.com/developerworks/data/library/techarticle/dm-1205db210compression/
> > > > >
> > > > > --
> > > > > Alexey Kuznetsov
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Alexey Kuznetsov
> > >
> >
> >
> >
> > --
> > Sergey Kozlov
> > GridGain Systems
> > www.gridgain.com
> >
>


Re: Data compression in Ignite 2.0

2016-07-25 Thread Nikita Ivanov
SAP Hana does the compression by 1) compressing SQL parameters before
execution, and 2) storing only compressed data in memory. This way all SQL
queries work as normal with zero modifications or performance overhead.
Only results of the query can be (optionally) decompressed back before
returning to the user.
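As a toy illustration of why this works for queries (a hypothetical sketch, not HANA's actual format): if dictionary codes are assigned in sorted order, the query parameter can be encoded once and then equality and even range predicates evaluated on codes alone, with no decompression. The sketch assumes the parameter itself occurs in the dictionary; handling absent values is deliberately omitted.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.TreeMap;
import java.util.TreeSet;

/** Order-preserving string dictionary: smaller code implies smaller string. */
public class OrderPreservingDict {
    private final TreeMap<String, Integer> codes = new TreeMap<>();

    public OrderPreservingDict(Collection<String> distinctValues) {
        int i = 0;
        // Assign codes in sorted order so code comparisons mirror string order.
        for (String v : new TreeSet<>(distinctValues))
            codes.put(v, i++);
    }

    public int encode(String v) {
        return codes.get(v); // assumes v is present in the dictionary
    }

    public static void main(String[] args) {
        OrderPreservingDict d = new OrderPreservingDict(
            Arrays.asList("cherry", "apple", "banana"));

        // WHERE fruit < 'cherry' becomes: code < encode("cherry").
        int bound = d.encode("cherry");
        int[] stored = { d.encode("apple"), d.encode("banana"), d.encode("cherry") };
        long hits = Arrays.stream(stored).filter(c -> c < bound).count();
        System.out.println(hits); // 2
    }
}
```

Pattern predicates such as LIKE still need the decoded strings, as noted elsewhere in the thread.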

--
Nikita Ivanov


On Mon, Jul 25, 2016 at 1:40 PM, Sergey Kozlov <skoz...@gridgain.com> wrote:

> Hi
>
> For approach 1: Putting a large object into a partitioned cache will force an
> update of the dictionary placed in a replicated cache. It seems it may be a
> time-expensive operation.
> Approaches 2-3 make sense for rare cases, as Sergi commented.
> Also I see a danger of OOM if we've got a high compression level and try to
> restore the original value in memory.
>
> On Mon, Jul 25, 2016 at 10:39 AM, Alexey Kuznetsov <
> akuznet...@gridgain.com>
> wrote:
>
> > Sergi,
> >
> > Of course it will introduce some slowdown, but with compression more data
> > could be stored in memory
> > and not will be evicted to disk. In case of compress by dictionary
> > substitution it will be only one more lookup
> > and should be fast.
> >
> > In general we could provide only API for compression out of the box, and
> > users that really need some sort of compression
> > will implement it by them self. This will not require much effort I
> think.
> >
> >
> >
> > On Mon, Jul 25, 2016 at 2:18 PM, Sergi Vladykin <
> sergi.vlady...@gmail.com>
> > wrote:
> >
> > > This will make sense only for rare cases when you have very large
> objects
> > > stored, which can be effectively compressed. And even then it will
> > > introduce slowdown on all the operations, which often will not be
> > > acceptable. I guess only few users will find this feature useful, thus
> I
> > > think it does not worth the effort.
> > >
> > > Sergi
> > >
> > >
> > >
> > > 2016-07-25 9:28 GMT+03:00 Alexey Kuznetsov <akuznet...@gridgain.com>:
> > >
> > > > Hi, All!
> > > >
> > > > I would like to propose one more feature for Ignite 2.0.
> > > >
> > > > Data compression for data in binary format.
> > > >
> > > > Binary format is stored as field name + field data.
> > > > So we have a description.
> > > > How about to add one more byte to binary data descriptor:
> > > >
> > > > *Compressed*:
> > > >  0 - Data stored as is (no compression).
> > > >  1 - Data compressed by dictionary (something like DB2 row
> compression
> > > [1],
> > > >  but for all binary types). We could have system or user defined
> > > replicated
> > > > cache for such dictionary and *cache.compact()* method that will scan
> > > > cache, build dictionary and compact data.
> > > >  2 - Data compressed by Java built in ZIP.
> > > >  3 - Data compressed by some user custom algorithm.
> > > >
> > > > Of course it is possible to compress data in current Ignite 1.x but
> in
> > > this
> > > > case compressed data cannot be accessed from SQL engine, if we
> > implement
> > > > support for compression on Ignite core level SQL engine will be able
> to
> > > > detect that data is compressed and properly handle such data.
> > > >
> > > > What do you think?
> > > > If community consider this feature useful I will create issue in
> JIRA.
> > > >
> > > > [1]
> > > >
> > > >
> > >
> >
> http://www.ibm.com/developerworks/data/library/techarticle/dm-1205db210compression/
> > > >
> > > > --
> > > > Alexey Kuznetsov
> > > >
> > >
> >
> >
> >
> > --
> > Alexey Kuznetsov
> >
>
>
>
> --
> Sergey Kozlov
> GridGain Systems
> www.gridgain.com
>


Re: Data compression in Ignite 2.0

2016-07-25 Thread Sergey Kozlov
Hi,

For approach 1: putting a large object into a partitioned cache will force
an update of the dictionary placed in the replicated cache. It seems this
may be a time-expensive operation.
Approaches 2-3 make sense only for rare cases, as Sergi commented.
Also, I see a danger of OOM if we have a high compression ratio and try to
restore the original value in memory.
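One way to contain that OOM risk is to cap how far a stored value is allowed
to inflate and fail fast instead of allocating an unbounded buffer. A rough
sketch under that assumption (the class, method name, and limit are
illustrative, not Ignite API):

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical guard: refuse to restore a value whose decompressed size
// exceeds a configured limit, so a highly compressed entry cannot blow up
// the heap on read.
class BoundedInflate {
    /** Inflates {@code compressed}, throwing if the restored size exceeds maxLen. */
    static byte[] inflate(byte[] compressed, int maxLen) {
        Inflater inf = new Inflater();
        inf.setInput(compressed);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[512];
        try {
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (bos.size() + n > maxLen)
                    throw new IllegalStateException("Value inflates beyond " + maxLen + " bytes");
                bos.write(buf, 0, n);
                if (n == 0 && inf.needsInput())
                    break; // truncated input: stop rather than loop forever
            }
        }
        catch (DataFormatException e) {
            throw new IllegalStateException("Corrupted compressed value", e);
        }
        inf.end();
        return bos.toByteArray();
    }
}
```

With such a cap, a read of a pathological entry fails with a clear error
instead of an OutOfMemoryError.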

On Mon, Jul 25, 2016 at 10:39 AM, Alexey Kuznetsov <akuznet...@gridgain.com>
wrote:

> Sergi,
>
> Of course it will introduce some slowdown, but with compression more data
> could be stored in memory
> and not will be evicted to disk. In case of compress by dictionary
> substitution it will be only one more lookup
> and should be fast.
>
> In general we could provide only API for compression out of the box, and
> users that really need some sort of compression
> will implement it by them self. This will not require much effort I think.
>
>
>
> On Mon, Jul 25, 2016 at 2:18 PM, Sergi Vladykin <sergi.vlady...@gmail.com>
> wrote:
>
> > This will make sense only for rare cases when you have very large objects
> > stored, which can be effectively compressed. And even then it will
> > introduce slowdown on all the operations, which often will not be
> > acceptable. I guess only few users will find this feature useful, thus I
> > think it does not worth the effort.
> >
> > Sergi
> >
> >
> >
> > 2016-07-25 9:28 GMT+03:00 Alexey Kuznetsov <akuznet...@gridgain.com>:
> >
> > > Hi, All!
> > >
> > > I would like to propose one more feature for Ignite 2.0.
> > >
> > > Data compression for data in binary format.
> > >
> > > Binary format is stored as field name + field data.
> > > So we have a description.
> > > How about to add one more byte to binary data descriptor:
> > >
> > > *Compressed*:
> > >  0 - Data stored as is (no compression).
> > >  1 - Data compressed by dictionary (something like DB2 row compression
> > [1],
> > >  but for all binary types). We could have system or user defined
> > replicated
> > > cache for such dictionary and *cache.compact()* method that will scan
> > > cache, build dictionary and compact data.
> > >  2 - Data compressed by Java built in ZIP.
> > >  3 - Data compressed by some user custom algorithm.
> > >
> > > Of course it is possible to compress data in current Ignite 1.x but in
> > this
> > > case compressed data cannot be accessed from SQL engine, if we
> implement
> > > support for compression on Ignite core level SQL engine will be able to
> > > detect that data is compressed and properly handle such data.
> > >
> > > What do you think?
> > > If community consider this feature useful I will create issue in JIRA.
> > >
> > > [1]
> > >
> > >
> >
> http://www.ibm.com/developerworks/data/library/techarticle/dm-1205db210compression/
> > >
> > > --
> > > Alexey Kuznetsov
> > >
> >
>
>
>
> --
> Alexey Kuznetsov
>



-- 
Sergey Kozlov
GridGain Systems
www.gridgain.com


Re: Data compression in Ignite 2.0

2016-07-25 Thread Alexey Kuznetsov
Sergi,

Of course it will introduce some slowdown, but with compression more data
could be stored in memory and will not be evicted to disk. In the case of
compression by dictionary substitution, it will be only one more lookup
and should be fast.

In general, we could provide only an API for compression out of the box,
and users that really need some sort of compression will implement it
themselves. This will not require much effort, I think.
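The "one more lookup" claim can be illustrated with a minimal
dictionary-substitution sketch (the class and method names are hypothetical,
not Ignite code): frequent field values are replaced by short integer codes,
and both the write and read paths cost a single map access.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of dictionary substitution. In the proposal, such a
// dictionary would live in a replicated cache shared by all nodes; here a
// plain in-memory map stands in for it.
class ValueDictionary {
    private final Map<String, Integer> toCode = new HashMap<>();
    private final Map<Integer, String> toValue = new HashMap<>();

    /** Registers a value if needed and returns its code (one lookup on write). */
    int encode(String value) {
        Integer code = toCode.get(value);
        if (code == null) {
            code = toCode.size();
            toCode.put(value, code);
            toValue.put(code, value);
        }
        return code;
    }

    /** Restores the original value (one lookup on read). */
    String decode(int code) {
        return toValue.get(code);
    }
}
```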



On Mon, Jul 25, 2016 at 2:18 PM, Sergi Vladykin <sergi.vlady...@gmail.com>
wrote:

> This will make sense only for rare cases when you have very large objects
> stored, which can be effectively compressed. And even then it will
> introduce slowdown on all the operations, which often will not be
> acceptable. I guess only few users will find this feature useful, thus I
> think it does not worth the effort.
>
> Sergi
>
>
>
> 2016-07-25 9:28 GMT+03:00 Alexey Kuznetsov <akuznet...@gridgain.com>:
>
> > Hi, All!
> >
> > I would like to propose one more feature for Ignite 2.0.
> >
> > Data compression for data in binary format.
> >
> > Binary format is stored as field name + field data.
> > So we have a description.
> > How about to add one more byte to binary data descriptor:
> >
> > *Compressed*:
> >  0 - Data stored as is (no compression).
> >  1 - Data compressed by dictionary (something like DB2 row compression
> [1],
> >  but for all binary types). We could have system or user defined
> replicated
> > cache for such dictionary and *cache.compact()* method that will scan
> > cache, build dictionary and compact data.
> >  2 - Data compressed by Java built in ZIP.
> >  3 - Data compressed by some user custom algorithm.
> >
> > Of course it is possible to compress data in current Ignite 1.x but in
> this
> > case compressed data cannot be accessed from SQL engine, if we implement
> > support for compression on Ignite core level SQL engine will be able to
> > detect that data is compressed and properly handle such data.
> >
> > What do you think?
> > If community consider this feature useful I will create issue in JIRA.
> >
> > [1]
> >
> >
> http://www.ibm.com/developerworks/data/library/techarticle/dm-1205db210compression/
> >
> > --
> > Alexey Kuznetsov
> >
>



-- 
Alexey Kuznetsov


Re: Data compression in Ignite 2.0

2016-07-25 Thread Sergi Vladykin
This will make sense only for rare cases when you have very large objects
stored, which can be effectively compressed. And even then it will
introduce a slowdown on all operations, which often will not be
acceptable. I guess only a few users will find this feature useful, thus I
think it is not worth the effort.

Sergi



2016-07-25 9:28 GMT+03:00 Alexey Kuznetsov <akuznet...@gridgain.com>:

> Hi, All!
>
> I would like to propose one more feature for Ignite 2.0.
>
> Data compression for data in binary format.
>
> Binary format is stored as field name + field data.
> So we have a description.
> How about to add one more byte to binary data descriptor:
>
> *Compressed*:
>  0 - Data stored as is (no compression).
>  1 - Data compressed by dictionary (something like DB2 row compression [1],
>  but for all binary types). We could have system or user defined replicated
> cache for such dictionary and *cache.compact()* method that will scan
> cache, build dictionary and compact data.
>  2 - Data compressed by Java built in ZIP.
>  3 - Data compressed by some user custom algorithm.
>
> Of course it is possible to compress data in current Ignite 1.x but in this
> case compressed data cannot be accessed from SQL engine, if we implement
> support for compression on Ignite core level SQL engine will be able to
> detect that data is compressed and properly handle such data.
>
> What do you think?
> If community consider this feature useful I will create issue in JIRA.
>
> [1]
>
> http://www.ibm.com/developerworks/data/library/techarticle/dm-1205db210compression/
>
> --
> Alexey Kuznetsov
>


Data compression in Ignite 2.0

2016-07-25 Thread Alexey Kuznetsov
Hi, All!

I would like to propose one more feature for Ignite 2.0.

Data compression for data in binary format.

The binary format is stored as field name + field data,
so we have a descriptor.
How about adding one more byte to the binary data descriptor:

*Compressed*:
 0 - Data stored as is (no compression).
 1 - Data compressed by dictionary (something like DB2 row compression [1],
but for all binary types). We could have a system- or user-defined replicated
cache for such a dictionary and a *cache.compact()* method that would scan
the cache, build the dictionary, and compact the data.
 2 - Data compressed by Java's built-in ZIP (Deflate).
 3 - Data compressed by a user-defined custom algorithm.
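The descriptor byte could be sketched roughly as follows. This is only an
illustration in plain Java, and all names here (CompressedValue, the
constants, write/read) are hypothetical, not Ignite API; mode 2 uses
java.util.zip's Deflater/Inflater:

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch: one marker byte prepended to the serialized value selects the
// compression scheme, mirroring the proposed descriptor values 0-3.
class CompressedValue {
    static final byte NONE = 0;       // data stored as is
    static final byte DICTIONARY = 1; // dictionary substitution (not shown here)
    static final byte ZIP = 2;        // Java's built-in Deflate
    static final byte CUSTOM = 3;     // user-defined algorithm

    /** Writes the marker byte, deflating the payload for mode ZIP. */
    static byte[] write(byte[] data, byte mode) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        bos.write(mode);
        if (mode == NONE)
            bos.write(data, 0, data.length);
        else if (mode == ZIP) {
            Deflater def = new Deflater();
            def.setInput(data);
            def.finish();
            byte[] buf = new byte[512];
            while (!def.finished())
                bos.write(buf, 0, def.deflate(buf));
            def.end();
        }
        else
            throw new UnsupportedOperationException("mode: " + mode);
        return bos.toByteArray();
    }

    /** Reads the marker byte and restores the original payload. */
    static byte[] read(byte[] stored) {
        byte mode = stored[0];
        if (mode == NONE)
            return Arrays.copyOfRange(stored, 1, stored.length);
        if (mode == ZIP) {
            try {
                Inflater inf = new Inflater();
                inf.setInput(stored, 1, stored.length - 1);
                ByteArrayOutputStream bos = new ByteArrayOutputStream();
                byte[] buf = new byte[512];
                while (!inf.finished())
                    bos.write(buf, 0, inf.inflate(buf));
                inf.end();
                return bos.toByteArray();
            }
            catch (DataFormatException e) {
                throw new IllegalStateException("Corrupted compressed value", e);
            }
        }
        throw new UnsupportedOperationException("mode: " + mode);
    }
}
```

Since the marker byte travels with the value, a reader (including the SQL
engine) can detect the scheme without any out-of-band metadata.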

Of course it is possible to compress data in the current Ignite 1.x, but in
this case the compressed data cannot be accessed from the SQL engine. If we
implement support for compression at the Ignite core level, the SQL engine
will be able to detect that data is compressed and handle it properly.

What do you think?
If the community considers this feature useful, I will create an issue in JIRA.

[1]
http://www.ibm.com/developerworks/data/library/techarticle/dm-1205db210compression/

-- 
Alexey Kuznetsov