Re: Data compression in Ignite 2.0

2017-08-28 Thread Vladimir Ozerov
Hi Vyacheslav, Yes, I would suggest you do so. On Fri, Aug 25, 2017 at 2:51 PM, Vyacheslav Daradur wrote: > Hi, should I close the initial ticket [1] as "Won't Fix" and add a link to > the new discussion about storage compression [2] in comments? > > [1]

Re: Data compression in Ignite 2.0

2017-08-25 Thread Vyacheslav Daradur
Hi, should I close the initial ticket [1] as "Won't Fix" and add a link to the new discussion about storage compression [2] in comments? [1] https://issues.apache.org/jira/browse/IGNITE-3592 [2] http://apache-ignite-developers.2346864.n4.nabble.com/Data-compression-in-Ignite-td20679.html 2017-08-09

Re: Data compression in Ignite 2.0

2017-08-09 Thread Vyacheslav Daradur
Vladimir, thank you for the detailed explanation. I think I've understood the main idea of the described storage compression. I'll join the new discussion after researching the given material and completing the varint optimization [1]. [1] https://issues.apache.org/jira/browse/IGNITE-5097 2017-08-02

Re: Data compression in Ignite 2.0

2017-08-02 Thread Alexey Kuznetsov
Vova, Finally we are back to my initial idea - to look at how "big databases compress" data :) Just a reminder of how IBM DB2 does this [1]. [1] http://www.ibm.com/developerworks/data/library/techarticle/dm-1205db210compression/ On Tue, Aug 1, 2017 at 4:15 PM, Vladimir Ozerov wrote:

Re: Data compression in Ignite 2.0

2017-08-01 Thread dsetrakyan
I would prefer that we reuse an existing compression protocol, but at the table level. If not possible, then we should go with a shared mapping approach. Any idea how hard? D. On Aug 1, 2017, at 11:15 AM, Vladimir Ozerov wrote: >Vyacheslav, > >This is not

Re: Data compression in Ignite 2.0

2017-08-01 Thread Vladimir Ozerov
Vyacheslav, This is not about my needs, but about the product :-) BinaryObject is a central entity used for both data transfer and data storage. This is both good and bad at the same time. The good thing is that as we optimize the binary protocol, we improve both network and storage performance at the

Re: Data compression in Ignite 2.0

2017-06-15 Thread Vyacheslav Daradur
Hi Igniters. Vladimir, I want to propose another design for the implementation of per-field compression. 1) We will add a new step in the method prepareForCache (for example) of CacheObject, or in GridCacheMapEntry. At this step, after marshalling an object, we will compress fields of the

Re: Data compression in Ignite 2.0

2017-06-09 Thread Sergey Kozlov
Hi * "Per-field compression" is applicable to huge BLOB fields and will impose restrictions such as being unable to index such fields, slower data retrieval, and potential OOM issues if the compression ratio is too high. But for some cases it makes sense. On Fri, Jun 9, 2017 at 11:11 AM, Антон Чураев

Re: Data compression in Ignite 2.0

2017-06-09 Thread Sergi Vladykin
+1 to Vladimir. Field encryption is a user responsibility. I see no reason to introduce additional complexity to Ignite. Sergi 2017-06-09 11:11 GMT+03:00 Антон Чураев : > Seems that Dmitry is referring to transparent data encryption. It is used > throughout the whole

Re: Data compression in Ignite 2.0

2017-06-09 Thread Антон Чураев
Seems that Dmitry is referring to transparent data encryption. It is used throughout the whole database industry. 2017-06-09 10:50 GMT+03:00 Vladimir Ozerov : > Dima, > > Encryption of certain fields is as bad as compression. First, it is a huge > change, which makes

Re: Data compression in Ignite 2.0

2017-06-09 Thread Vyacheslav Daradur
Vladimir, >> Nobody in a sane mind will >> store passwords in plain form. Instead, user should encrypt it on his own, >> choosing proper encryption parameters - algorithms, key lengths, salts, etc. Sounds reasonable to me. But if someone wants to have this feature OOTB we can continue

Re: Data compression in Ignite 2.0

2017-06-09 Thread Vladimir Ozerov
Dima, Encryption of certain fields is as bad as compression. First, it is a huge change, which makes the already complex binary protocol even more complex. Second, it has to be ported to the CPP and .NET platforms, as well as to JDBC and ODBC. Last, but most important - this is not our headache to

Re: Data compression in Ignite 2.0

2017-06-09 Thread Vyacheslav Daradur
>> which is much less useful. I note that in some cases the gain is more than a twofold reduction in object size. >> Would it be possible to change your implementation to handle the encryption instead? Yes, of course, there's not much difference between compression and encryption, including in my

Re: Data compression in Ignite 2.0

2017-06-08 Thread Dmitriy Setrakyan
Vyacheslav, When this feature started out as data compression in Ignite, it sounded very useful. Now it is unfolding as a per-field compression, which is much less useful. In fact, it is questionable whether it is useful at all. The fact that this feature is implemented does not make it mandatory

Re: Data compression in Ignite 2.0

2017-06-08 Thread Vyacheslav Daradur
Guys, I want to be clear: * The "per-field compression" design is the result of researching the binary infrastructure of Ignite and some of its other parts (querying, indexing, etc.) * Full compression of an object would be more effective, but in that case there is no compatibility with querying and indexing

Re: Data compression in Ignite 2.0

2017-06-08 Thread Dmitriy Setrakyan
Igniters, I have never seen a single Ignite user asking about compressing a single field. However, we have had requests to secure certain fields, e.g. passwords. I personally do not think per-field compression is needed, unless we can point out some concrete real life use cases. D. On Thu, Jun

Re: Data compression in Ignite 2.0

2017-06-08 Thread Vyacheslav Daradur
Anton, >> I thought that if compressed data is stored in memory, the data >> will also be transmitted over the wire in compressed form. Is that right? In the per-field compression case, yes. 2017-06-08 13:36 GMT+03:00 Антон Чураев : > Guys, could you please help me. > I thought that

Re: Data compression in Ignite 2.0

2017-06-08 Thread Антон Чураев
Guys, could you please help me. I thought that if compressed data is stored in memory, the data will also be transmitted over the wire in compressed form. Is that right? 2017-06-08 13:30 GMT+03:00 Vyacheslav Daradur : > Vladimir, > > The main problem which I am trying to solve is

Re: Data compression in Ignite 2.0

2017-06-08 Thread Vyacheslav Daradur
Vladimir, The main problem which I am trying to solve is storing data in memory in a compressed form via Ignite. The main goal is using memory more effectively. >> here the much simpler step would be to do full compression on a per-cache basis rather than dealing with the per-field case. Please

Re: Data compression in Ignite 2.0

2017-06-08 Thread Vladimir Ozerov
Igniters, Honestly I still do not see how to apply this feature gracefully to Ignite. And the overall approach of compressing only particular fields looks overcomplicated to me. Remember that our main use case is an application without classes on the server. It means that any kind of annotations are

Re: Data compression in Ignite 2.0

2017-06-08 Thread Vyacheslav Daradur
Valentin, Yes, I have the prototype [1][2]. You can see an example of the Java class [3] that I used in my benchmark. For example: class Foo { @BinaryCompression String data; } If a user decides to store the object in compressed form, they can use the annotation @BinaryCompression as shown above. It

Re: Data compression in Ignite 2.0

2017-06-07 Thread Valentin Kulichenko
Vyacheslav, Anton, Are there any ideas and/or prototypes for the API? Your design suggestions seem to make sense, but I would like to see how all this will look from the user's standpoint. -Val On Wed, Jun 7, 2017 at 1:06 AM, Антон Чураев wrote: > Vyacheslav, correct me

Re: Data compression in Ignite 2.0

2017-06-07 Thread Антон Чураев
Vyacheslav, correct me if something is wrong. We could give users the opportunity to choose between CPU usage and MEM/NET usage by compressing some attributes of stored objects. You have studied the design, and it is possible to localize changes in marshalling without affecting performance and current

Re: Data compression in Ignite 2.0

2017-06-06 Thread Vyacheslav Daradur
In short, During marshalling, a field is represented as a BinaryFieldAccessor which manages its marshalling. It checks whether the field is marked with the annotation @BinaryCompression; in that case the binary representation of the field (byte array) will be compressed. It will be marked as compressed by types
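The flow described in this message (check the field for the annotation, compress its byte representation, mark the result as compressed) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the PR: the annotation definition, the `PLAIN`/`COMPRESSED` marker bytes, and the `FieldCompressionSketch` class are hypothetical, and DEFLATE from `java.util.zip` stands in for whatever algorithm the prototype actually uses.

```java
import java.io.ByteArrayOutputStream;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Marker annotation; the name comes from the thread, this definition is assumed. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface BinaryCompression {}

/** Example value class with one field marked for compression. */
class Foo {
    @BinaryCompression
    String data;

    String name;
}

final class FieldCompressionSketch {
    /** Assumed marker bytes; these are NOT Ignite's real binary type codes. */
    static final byte PLAIN = 0;
    static final byte COMPRESSED = 1;

    /** Writes one String field as [marker][payload], compressing only annotated fields. */
    static byte[] writeField(Object owner, String fieldName) {
        try {
            Field field = owner.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            byte[] raw = ((String) field.get(owner)).getBytes(StandardCharsets.UTF_8);
            boolean compress = field.isAnnotationPresent(BinaryCompression.class);
            byte[] payload = compress ? deflate(raw) : raw;
            byte[] out = new byte[payload.length + 1];
            out[0] = compress ? COMPRESSED : PLAIN;
            System.arraycopy(payload, 0, out, 1, payload.length);
            return out;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read field " + fieldName, e);
        }
    }

    /** Reads a field written by writeField, inflating it when the marker says so. */
    static String readField(byte[] bytes) {
        byte[] payload = Arrays.copyOfRange(bytes, 1, bytes.length);
        byte[] raw = bytes[0] == COMPRESSED ? inflate(payload) : payload;
        return new String(raw, StandardCharsets.UTF_8);
    }

    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] inflate(byte[] input) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // Stop instead of spinning if the input is truncated or needs a dictionary.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) break;
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Corrupt compressed field", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}
```

Because the marker travels with each field, a reader can decide per field whether to inflate, which is what lets uncompressed fields stay directly readable.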

Re: Data compression in Ignite 2.0

2017-06-06 Thread Антон Чураев
Looks good to me. Could you propose the implementation design in a couple of sentences, so that we can estimate the completeness and complexity of the proposal? 2017-06-06 15:26 GMT+03:00 Vyacheslav Daradur : > Anton, > > Of course, the solution does not affect existing

Re: Data compression in Ignite 2.0

2017-06-06 Thread Антон Чураев
Vyacheslav, Is it possible to propose an implementation that can be switched on on demand? In this case it should not affect the performance of the current solution. I mean that users should decide what is more important for them: throughput or memory/net usage. Maybe they will choose not all

Re: Data compression in Ignite 2.0

2017-06-06 Thread Vyacheslav Daradur
I wish to note that the benchmarking results show metrics from stress testing. I mean that in real scenarios, for example business operations that take milliseconds or seconds, the increase in put-get operation time will be insignificant. 2017-06-06 14:48 GMT+03:00 Vyacheslav Daradur

Re: Data compression in Ignite 2.0

2017-06-06 Thread Vyacheslav Daradur
Conclusion: The provided solution allows reducing the size of an object in IgniteCache at the cost of a throughput reduction (small in some cases); it depends on the part of the object which will be compressed and on the compression algorithm. I mean, we can make more effective use of memory, and in some cases it can

Re: Data compression in Ignite 2.0

2017-06-06 Thread Антон Чураев
Vyacheslav, thank you! But could you please provide conclusions or proposals based on these benchmarks? 2017-06-06 11:28 GMT+03:00 Vyacheslav Daradur : > Dmitry, > > Excel-pages: > > 1). "Compression ratio (2)" - shows object size, with compression and > without

Re: Data compression in Ignite 2.0

2017-06-06 Thread Vyacheslav Daradur
All metrics are taken from app based on custom assembly of AI, containing the provided PR. 2017-06-06 11:28 GMT+03:00 Vyacheslav Daradur : > Dmitry, > > Excel-pages: > > 1). "Compression ratio (2)" - shows object size, with compression and > without compression. (Conditions:

Re: Data compression in Ignite 2.0

2017-06-06 Thread Vyacheslav Daradur
Dmitry, Excel-pages: 1). "Compression ratio (2)" - shows object size, with compression and without compression. (Conditions: literal text) 1st graph shows compression ratios of different compression algorithms depending on the size of the compressed field. 2nd graph shows evaluation of size of

Re: Data compression in Ignite 2.0

2017-06-06 Thread Dmitriy Setrakyan
Vladimir, I am not sure how to interpret the graphs. What are we looking at? On Tue, Jun 6, 2017 at 12:33 AM, Vyacheslav Daradur wrote: > Hi, Igniters. > > I've prepared some benchmarking. Results [1]. > > And I've prepared the evaluation in the form of diagrams [2]. > > I

Re: Data compression in Ignite 2.0

2017-06-06 Thread Vyacheslav Daradur
Hi, Igniters. I've prepared some benchmarking. Results [1]. And I've prepared the evaluation in the form of diagrams [2]. I hope that helps to interest the community and accelerates a reaction to this improvement :) [1]

Re: Data compression in Ignite 2.0

2017-05-24 Thread Vyacheslav Daradur
Guys, any thoughts? 2017-05-16 13:40 GMT+03:00 Vyacheslav Daradur : > Hi guys, > > I've prepared the PR to show my idea. > https://github.com/apache/ignite/pull/1951/files > > About querying - I've just copied existing tests and have annotated the > testing data. >

Re: Data compression in Ignite 2.0

2017-05-16 Thread Vyacheslav Daradur
Hi guys, I've prepared the PR to show my idea. https://github.com/apache/ignite/pull/1951/files About querying - I've just copied existing tests and have annotated the testing data. https://github.com/apache/ignite/pull/1951/files#diff-c19a9df4058141d059bb577e75244764 It means fields which will

Re: Data compression in Ignite 2.0

2017-05-15 Thread Vyacheslav Daradur
Dmitriy, I have a prototype ready. I want to show it. It is always easier to discuss with an example. 2017-05-15 19:02 GMT+03:00 Dmitriy Setrakyan : > Vyacheslav, > > I think it is a bit premature to provide a PR without getting a community > consensus on the dev list. Please

Re: Data compression in Ignite 2.0

2017-05-15 Thread Dmitriy Setrakyan
Vyacheslav, I think it is a bit premature to provide a PR without getting a community consensus on the dev list. Please allow some time for the community to respond. D. On Mon, May 15, 2017 at 6:36 AM, Vyacheslav Daradur wrote: > I created the ticket:

Re: Data compression in Ignite 2.0

2017-05-15 Thread Vyacheslav Daradur
I created the ticket: https://issues.apache.org/jira/browse/IGNITE-5226 I'll prepare a PR with the described solution in a couple of days. 2017-05-15 15:05 GMT+03:00 Vyacheslav Daradur : > Hi, Igniters! > > Apache Ignite 2.0 is released. > > Let's continue the discussion about a

Re: Data compression in Ignite 2.0

2017-05-15 Thread Vyacheslav Daradur
Hi, Igniters! Apache Ignite 2.0 is released. Let's continue the discussion about a compression design. At the moment, I have found only one solution which is compatible with querying and indexing: per-object-field compression. Per-field compression means that metadata (a header) of an object

Re: Data compression in Ignite 2.0

2017-04-10 Thread Vyacheslav Daradur
Alexey, Yes, I've read it. OK, let's discuss the public API design. I think we need to add a configuration entity to CacheConfiguration, which will contain the Compressor interface implementation and some useful parameters. Or maybe provide a BinaryMarshaller decorator, which will be
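To make the first option in this message more concrete, here is one way a pluggable compressor and its settings object might look. Everything in it is an assumption for discussion: the thread names a "Compressor interface" but never defines it, and `CompressionConfiguration`, `DeflateCompressor`, and the `minSizeBytes` threshold are hypothetical names that do not exist in Ignite's `CacheConfiguration`.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical plug-in point for user-supplied compression. */
interface Compressor {
    byte[] compress(byte[] data);

    byte[] decompress(byte[] data);
}

/** One possible implementation backed by java.util.zip (DEFLATE). */
final class DeflateCompressor implements Compressor {
    private final int level;

    DeflateCompressor(int level) {
        this.level = level;
    }

    @Override
    public byte[] compress(byte[] data) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    @Override
    public byte[] decompress(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // Stop instead of spinning on truncated input.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) break;
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("Corrupt compressed data", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }
}

/** Hypothetical per-cache settings holder; not a real CacheConfiguration property. */
final class CompressionConfiguration {
    private Compressor compressor = new DeflateCompressor(Deflater.DEFAULT_COMPRESSION);
    private int minSizeBytes = 64; // assumed default: smaller values are stored as-is

    CompressionConfiguration setCompressor(Compressor compressor) {
        this.compressor = compressor;
        return this;
    }

    CompressionConfiguration setMinSizeBytes(int minSizeBytes) {
        this.minSizeBytes = minSizeBytes;
        return this;
    }

    Compressor getCompressor() {
        return compressor;
    }

    /** Skips values too small for compression to pay off. */
    boolean shouldCompress(byte[] value) {
        return value.length >= minSizeBytes;
    }
}
```

Keeping the algorithm behind an interface would let users trade CPU for memory on their own terms, which is the choice several participants in the thread ask for.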

Re: Data compression in Ignite 2.0

2017-04-10 Thread Alexey Kuznetsov
Vyacheslav, Did you read initial discussion [1] about compression? As far as I remember we agreed to add only some "top-level" API in order to provide a way for Ignite users to inject some sort of custom compression. [1]

Re: Data compression in Ignite 2.0

2017-04-10 Thread daradurvs
Hi Igniters! I am interested in this task: "Provide some kind of pluggable compression SPI support". I developed a solution at the BinaryMarshaller level, but the reviewer rejected it. Let's continue the discussion of the task goals and solution design. As

Re: Data compression in Ignite 2.0

2016-07-27 Thread Alexey Kuznetsov
ng distributed cache. > > > > > On the other hand, Ignite is a distributed cache implementation (a > > > pretty > > > > > good one!) that in general requires no schema and stores its data > in > > > the > > > > > row-based fashion. Its curre

Re: Data compression in Ignite 2.0

2016-07-27 Thread Sergi Vladykin
e is a distributed cache implementation (a > > pretty > > > > good one!) that in general requires no schema and stores its data in > > the > > > > row-based fashion. Its current design doesn't lend itself readily to > > the > > > > kind of optimiza

Re: Data compression in Ignite 2.0

2016-07-27 Thread Sebastien DIAZ
etails of HANA are > > well > > > documented in "In-memory Data Management", by Hasso Plattner & > Alexander > > > Zeier. > > > Cheers > > > Andrey > > > _ > > > From: Alexey Kuznetsov <

Re: Data compression in Ignite 2.0

2016-07-27 Thread Alexey Kuznetsov
rovides out of the box. > > For the curious types among us, the implementation details of HANA are > well > > documented in "In-memory Data Management", by Hasso Plattner & Alexander > > Zeier. > > Cheers > > Andrey > > ___

Re: Data compression in Ignite 2.0

2016-07-26 Thread Nikita Ivanov
anagement", by Hasso Plattner & Alexander > Zeier. > Cheers > Andrey > _ > From: Alexey Kuznetsov <akuznet...@gridgain.com akuznet...@gridgain.com>> > Sent: Tuesday, July 26, 2016 5:36 AM > Subject: Re: Data compression i

Re: Data compression in Ignite 2.0

2016-07-26 Thread Andrey Kornev
@gridgain.com>> Sent: Tuesday, July 26, 2016 5:36 AM Subject: Re: Data compression in Ignite 2.0 To: <dev@ignite.apache.org<mailto:dev@ignite.apache.org>> Sergey Kozlov wrote: >> For approach 1: Putting a large object into a partitioned cache will force an update of the dictionary placed

Re: Data compression in Ignite 2.0

2016-07-26 Thread Alexey Kuznetsov
Sergey Kozlov wrote: >> For approach 1: Putting a large object into a partitioned cache will force an update of the dictionary placed in a replicated cache. It may be a time-expensive operation. The dictionary will be built only once. And we could control what should be put into the dictionary, for example, we

Re: Data compression in Ignite 2.0

2016-07-25 Thread Andrey Kornev
would force people to use the non-public BinaryMarshaller class directly (as the first element of the chain). Cheers Andrey From: Dmitriy Setrakyan <dsetrak...@apache.org> Sent: Monday, July 25, 2016 1:53 PM To: dev@ignite.apache.org Subject: Re: Data

Re: Data compression in Ignite 2.0

2016-07-25 Thread Dmitriy Setrakyan
Nikita, this sounds like a pretty elegant approach. Does anyone in the community see a problem with this design? On Mon, Jul 25, 2016 at 4:49 PM, Nikita Ivanov wrote: > SAP Hana does the compression by 1) compressing SQL parameters before > execution, and 2) storing only

Re: Data compression in Ignite 2.0

2016-07-25 Thread Nikita Ivanov
SAP Hana does the compression by 1) compressing SQL parameters before execution, and 2) storing only compressed data in memory. This way all SQL queries work as normal with zero modifications or performance overhead. Only results of the query can be (optionally) decompressed back before returning

Re: Data compression in Ignite 2.0

2016-07-25 Thread Sergey Kozlov
Hi For approach 1: Putting a large object into a partitioned cache will force an update of the dictionary placed in a replicated cache. It seems it may be a time-expensive operation. Approaches 2-3 make sense for rare cases, as Sergi commented. Also I see a danger of OOM if we've got a high compression level and

Re: Data compression in Ignite 2.0

2016-07-25 Thread Alexey Kuznetsov
Sergi, Of course it will introduce some slowdown, but with compression more data could be stored in memory and will not be evicted to disk. In the case of compression by dictionary substitution it will be only one more lookup and should be fast. In general we could provide only an API for compression out
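The "one more lookup" cost of dictionary substitution mentioned here can be illustrated with a very small sketch. It assumes a dictionary that is built once from a known list of frequent values, as suggested elsewhere in the thread; the `ValueDictionary` class and the `-1` "not in dictionary" convention are hypothetical and are not taken from any Ignite code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical dictionary that replaces frequent string values with small integer codes. */
final class ValueDictionary {
    private final Map<String, Integer> codes = new HashMap<>();
    private final List<String> values = new ArrayList<>();

    /** Built once from the frequent values; duplicates are ignored. */
    ValueDictionary(List<String> frequentValues) {
        for (String v : frequentValues) {
            if (!codes.containsKey(v)) {
                codes.put(v, values.size());
                values.add(v);
            }
        }
    }

    /** Returns the code for a value, or -1 if the caller should store the value as-is. */
    int encode(String value) {
        Integer code = codes.get(value);
        return code == null ? -1 : code;
    }

    /** Decoding is a single index lookup. */
    String decode(int code) {
        return values.get(code);
    }
}
```

Because equal values always map to the same code, an equality comparison can run on the codes without decoding, which is similar in spirit to the SAP HANA approach described in another message of this thread.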

Re: Data compression in Ignite 2.0

2016-07-25 Thread Sergi Vladykin
This will make sense only for rare cases when you have very large objects stored, which can be effectively compressed. And even then it will introduce slowdown on all the operations, which often will not be acceptable. I guess only a few users will find this feature useful, thus I think it does not