[jira] [Commented] (IGNITE-11704) Write tombstones during rebalance to get rid of deferred delete buffer

2019-10-07 Thread Alexei Scherbakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-11704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16946165#comment-16946165
 ] 

Alexei Scherbakov commented on IGNITE-11704:


[~jokser] [~sboikov]

I've reviewed the changes. Overall they look good, but I still have some questions.

1. My main concern is whether the 5-byte tombstoneBytes object is necessary. 
It seems possible to implement a tombstone by treating the absence of a value as a 
tombstone.
For example, valLen=0 could be treated as the presence of a tombstone. Doing so, we can 
get rid of the 5-byte comparison. Currently the check looks like this:
{noformat}
private Boolean isTombstone(ByteBuffer buf, int offset) {
    int valLen = buf.getInt(buf.position() + offset);

    if (valLen != tombstoneBytes.length)
        return Boolean.FALSE;

    ...
}
{noformat}

Instead, we could do a simple check such as {{if (valLen == 0) return true}}.
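Here is a minimal sketch of the suggested check. It assumes, as in the snippet above, that the value length is stored as an `int` at `buf.position() + offset`. It also assumes that no live value is ever serialized with zero length. `TombstoneCheck` is a hypothetical holder class, not Ignite code.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch: a zero-length value marks a tombstone, so no comparison
 * against a 5-byte marker array is needed. Assumes the value length is an int
 * stored at buf.position() + offset; the real Ignite row layout may differ.
 */
final class TombstoneCheck {
    private TombstoneCheck() {
    }

    /** Returns true if the value length stored at the given offset is zero. */
    static boolean isTombstone(ByteBuffer buf, int offset) {
        // Absolute read: does not move the buffer position.
        int valLen = buf.getInt(buf.position() + offset);

        return valLen == 0;
    }
}
```

This check reads a single `int`, whereas the current version also has to compare the marker bytes.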

2. With the new changes in PartitionsEvictManager, it's possible to have two tasks 
of different types for the same partition.
Consider the following scenario: 
* a node finishes rebalancing and starts to clear tombstones;
* another node joins the topology and becomes an owner of the partition being cleared;
* eviction is started for the partition that is already being cleared.

Probably this should not be allowed.
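One way to rule out the scenario above is to let at most one task of any type own a partition at a time. The following is a hypothetical sketch. `PartitionTaskRegistry` and `PartitionTaskType` are illustrative names and are not part of `PartitionsEvictManager`. A real fix might instead cancel the tombstone-clearing task before eviction starts.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical task kinds; the names are illustrative only. */
enum PartitionTaskType { CLEAR_TOMBSTONES, EVICT }

/** Sketch of a registry allowing at most one in-flight task per partition, regardless of type. */
final class PartitionTaskRegistry {
    private final ConcurrentMap<Integer, PartitionTaskType> active = new ConcurrentHashMap<>();

    /** Returns true if the task was registered, false if another task already owns the partition. */
    boolean tryStart(int partId, PartitionTaskType type) {
        return active.putIfAbsent(partId, type) == null;
    }

    /** Releases the partition only if it is still held by the given task type. */
    void finish(int partId, PartitionTaskType type) {
        active.remove(partId, type);
    }

    /** Returns the task currently owning the partition, or null if it is free. */
    PartitionTaskType activeTask(int partId) {
        return active.get(partId);
    }
}
```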

3. I see changes that have no obvious relation to this contribution, for example: 
static String cacheGroupMetricsRegistryName(String cacheGrp)
DropCacheContextDuringEvictionTest.java
GridCommandHandlerIndexingTest.java

What is the purpose of these?

4. Could you explain the modification in 
org.apache.ignite.internal.processors.cache.GridCacheMapEntry#initialValue:

{{update0 |= (!preload && val == null);}}



> Write tombstones during rebalance to get rid of deferred delete buffer
> --
>
> Key: IGNITE-11704
> URL: https://issues.apache.org/jira/browse/IGNITE-11704
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Alexey Goncharuk
>Assignee: Pavel Kovalenko
>Priority: Major
>  Labels: rebalance
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, Ignite relies on the deferred delete buffer to handle 
> write-remove conflicts during rebalance. Given the limited size of the buffer, 
> this approach is fundamentally flawed, especially when persistence is 
> enabled.
> I suggest extending the data storage logic to be able to store key 
> tombstones, i.e. to keep versions of deleted entries. The tombstones will be 
> stored while rebalance is in progress and should be cleaned up when rebalance 
> is completed.
> Later, this approach may be used to implement fast partition rebalance based 
> on Merkle trees (in this case, tombstones should be written on an incomplete 
> baseline).
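The following is a minimal in-memory model of the proposal. `TombstonePartition` is a hypothetical class, and plain `long` versions stand in for Ignite's `GridCacheVersion`. While rebalance is running, a remove leaves a versioned tombstone instead of deleting the key. As a result, an older value arriving from the supplier cannot resurrect the key. Tombstones are purged once rebalance completes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of versioned entries with tombstones; not Ignite's storage code. */
final class TombstonePartition {
    /** A stored entry; val == null marks a tombstone. */
    private static final class Entry {
        final long ver;
        final String val;

        Entry(long ver, String val) {
            this.ver = ver;
            this.val = val;
        }
    }

    private final Map<String, Entry> data = new HashMap<>();
    private boolean rebalancing;

    void startRebalance() {
        rebalancing = true;
    }

    /** Regular user update. */
    void put(String key, String val, long ver) {
        data.put(key, new Entry(ver, val));
    }

    /** User remove: keeps a versioned tombstone while rebalance is in progress. */
    void remove(String key, long ver) {
        if (rebalancing)
            data.put(key, new Entry(ver, null));
        else
            data.remove(key);
    }

    /** Applies a supplied entry only if it is newer than what is stored, tombstones included. */
    void applyRebalanced(String key, String val, long ver) {
        Entry cur = data.get(key);

        if (cur == null || cur.ver < ver)
            data.put(key, new Entry(ver, val));
    }

    /** Ends rebalance and purges all tombstones. */
    void finishRebalance() {
        rebalancing = false;
        data.values().removeIf(e -> e.val == null);
    }

    String get(String key) {
        Entry e = data.get(key);

        return e == null ? null : e.val;
    }

    int size() {
        return data.size();
    }
}
```

Unlike a bounded deferred delete buffer, the tombstone lives in the same storage as the data, so it cannot be evicted before rebalance finishes.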



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-10683) Prepare process of packaging and delivering thin clients

2019-10-07 Thread Denis A. Magda (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-10683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16946096#comment-16946096
 ] 

Denis A. Magda commented on IGNITE-10683:
-

[~isapego], I believe you are working on the clients' modularization. Will you be 
able to complete it within the 2.8 timeframe?

> Prepare process of packaging and delivering thin clients
> 
>
> Key: IGNITE-10683
> URL: https://issues.apache.org/jira/browse/IGNITE-10683
> Project: Ignite
>  Issue Type: Task
>Reporter: Peter Ivanov
>Assignee: Peter Ivanov
>Priority: Major
> Fix For: 2.8
>
>
> # **NodeJs client**
> #* +Instruction+: 
> https://github.com/nobitlost/ignite/blob/ignite--docs/modules/platforms/nodejs/README.md#publish-ignite-nodejs-client-on-npmjscom-instruction
> #* +Uploaded+: https://www.npmjs.com/package/apache-ignite-client
> # **PHP client**
> #* +Instruction+: 
> https://github.com/nobitlost/ignite/blob/ignite-7783-docs/modules/platforms/php/README.md#release-the-client-in-the-php-package-repository-instruction
> {panel}
> Cannot be uploaded on Packagist as the client should be in a dedicated 
> repository for that - 
> https://issues.apache.org/jira/browse/IGNITE-7783?focusedCommentId=16595476&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16595476
> Installation from the sources works.
> {panel}
> # **Python client**
> I have already registered the package `pyignite` on PyPI[1]. The person who 
> is going to take responsibility for maintaining it should create an 
> account on PyPI and email me privately, so that I can grant them the 
> necessary rights. They must also install twine[3].
> The packaging process is well described in the packaging tutorial[2]. In 
> a nutshell, the maintainer must do the following:
> ## Clone/pull the sources from the git repository.
> ## Enter the directory in which `setup.py` resides (“the setup 
> directory”); in our case it is `modules/platforms/python`.
> ## Create the packages with the command `python3 setup.py sdist bdist_wheel`. 
> The packages will be created in the `modules/platforms/python/dist` folder.
> ## Upload the packages with twine: `twine upload dist/*`.
> It is very useful to have a dedicated Python virtual environment prepared to 
> perform steps 3-4. Just do an editable install of `pyignite` into that 
> environment from the setup directory: `pip3 install -e .` You can also 
> install twine (`pip install twine`) in it.
> Consider also creating a `.pypirc` file to save time logging in to PyPI. 
> The newest version of `twine` is said to support keyrings on Linux and Mac, but I 
> have not tried this yet.
> [1] https://pypi.org/project/pyignite/
> [2] https://packaging.python.org/tutorials/packaging-projects/
> [3] https://twine.readthedocs.io/en/latest/
> Some other notes on PyPI and versioning.
> - The package version is set in `setup.py` as the `version` 
> argument of the `setuptools.setup()` function. Editing `setup.py` is the 
> only way to set the package version.
> - You absolutely cannot replace a package on PyPI (hijacking prevention). If 
> you have published the package by mistake, all you can do is delete the 
> unwanted package, increment the version counter in `setup.py`, and try again.
> - If you upload the package through the PyPI web interface (without 
> twine), the package description will be garbled. The web interface does not 
> support Markdown.
> Anyway, I would like to join in the congratulations on successful release. 
> Kudos to the team.





[jira] [Updated] (IGNITE-12263) Introduce native persistence compaction operation

2019-10-07 Thread Denis A. Magda (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Denis A. Magda updated IGNITE-12263:


Dev list discussion: 
http://apache-ignite-developers.2346864.n4.nabble.com/How-to-free-up-space-on-disc-after-removing-entries-from-IgniteCache-with-enabled-PDS-td39839.html

> Introduce native persistence compaction operation
> -
>
> Key: IGNITE-12263
> URL: https://issues.apache.org/jira/browse/IGNITE-12263
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Alexey Goncharuk
>Priority: Critical
>
> Currently, Ignite native persistence does not shrink storage files after 
> key-value pairs are removed.
> The causes of this behavior are:
>  * The absence of a mechanism that allows Ignite to track the highest non-empty 
> page position in a partition file
>  * The absence of a mechanism which allows Ignite to select a page closest to 
> the file beginning for write
>  * The absence of a mechanism which allows Ignite to move a key-value pair 
> from page to page during defragmentation
> As an initial change, I suggest introducing a new node startup mode, which 
> will run a defragmentation procedure allowing the node to shrink storage 
> files. The procedure will not mutate the logical state of a partition 
> allowing further historical rebalance to quickly catch up the node. Since the 
> procedure will run during the node startup (during the final stages of 
> recovery), there will be no concurrent load, thus the entries can be freely 
> moved from page to page with no tricky synchronization.
> If the procedure is applied during a full cluster restart, then all nodes 
> will be defragmented simultaneously, allowing a quicker parallel 
> defragmentation at the cost of downtime.
> The procedure should accept an optional list of cache groups to defragment to 
> allow arbitrary cache group selection for defragmentation.
> An idea of the actions taken during the run for each partition selected for 
> defragmentation:
>  * Partition pages are preloaded into memory, if possible, to avoid excessive 
> page replacement. During the scan, the high-water mark (HWM) of the written data is detected 
> (empty pages are skipped)
>  * Page references in the free list are sorted so that the pages 
> closest to the file start can be picked first
>  * The partition is scanned in reverse order, key-value pairs are moved 
> closer to the file start, HWM is updated accordingly. This step is 
> particularly open for various optimizations because different strategies will 
> work well for different fragmentation patterns.
>  * After the scan iteration is completed, the file size can be updated 
> according to the HWM
> As a further improvement, this partition defragmentation procedure can be 
> later run in online mode, after proper cache update protocol changes are 
> designed.
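The per-partition steps above can be sketched on an in-memory model. This is a hypothetical illustration, not Ignite code. It assumes that a partition is a list of pages, that each page is a list of entries with a fixed capacity, and that a sorted set of page indices stands in for the free list. The sketch moves entries from the tail toward the lowest page with free space, lowers the HWM as tail pages drain, and then truncates.

```java
import java.util.List;
import java.util.TreeSet;

/** Hypothetical offline defragmentation of one partition modelled as a list of pages. */
final class PartitionDefragmenter {
    private PartitionDefragmenter() {
    }

    /**
     * Compacts pages in place and returns the new HWM (number of pages kept).
     * The outer list must be mutable; each page holds at most pageCapacity entries.
     */
    static int defragment(List<List<String>> pages, int pageCapacity) {
        // 1. Detect the HWM: one past the last non-empty page.
        int hwm = pages.size();
        while (hwm > 0 && pages.get(hwm - 1).isEmpty())
            hwm--;

        // 2. "Free list" ordered by page index, so pages closest to the file start are picked first.
        TreeSet<Integer> free = new TreeSet<>();
        for (int i = 0; i < hwm; i++)
            if (pages.get(i).size() < pageCapacity)
                free.add(i);

        // 3. Scan in reverse order, moving entries toward the file start.
        for (int src = hwm - 1; src > 0; src--) {
            List<String> srcPage = pages.get(src);
            free.remove(src); // Never move entries into the page being drained.

            while (!srcPage.isEmpty()) {
                Integer dst = free.isEmpty() ? null : free.first();
                if (dst == null || dst >= src)
                    break; // No free slot below the current page.

                List<String> dstPage = pages.get(dst);
                dstPage.add(srcPage.remove(srcPage.size() - 1));
                if (dstPage.size() == pageCapacity)
                    free.remove(dst);
            }

            if (!srcPage.isEmpty())
                break; // All pages below src are full; nothing more to gain.

            hwm = src; // Page drained: lower the HWM.
        }

        // 4. "Truncate the file" according to the HWM.
        pages.subList(hwm, pages.size()).clear();

        return hwm;
    }
}
```

The sketch always fills the lowest free page first. The description notes that other strategies may suit other fragmentation patterns better.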





[jira] [Comment Edited] (IGNITE-12141) Ignite Spark Integration Support Schema on Table Write

2019-10-07 Thread Manoj G T (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16945983#comment-16945983
 ] 

Manoj G T edited comment on IGNITE-12141 at 10/7/19 3:57 PM:
-

Dear [~zaleslaw], if I understood correctly, as per the implementation of 
https://issues.apache.org/jira/browse/IGNITE-9228 we can only specify a schema name 
when reading data, not when writing it.

This feature was implemented during the Ignite 2.6 timeline, and at that point in 
time Ignite didn't allow creating a table in any schema other than the PUBLIC 
schema; this is the reason "OPTION_SCHEMA" is not supported in 
Overwrite mode. (Reference: 
[https://github.com/apache/ignite/pull/4551])

Kindly let me know if this is also addressed so that I can close this bug.


was (Author: gtmanoj235):
Dear [~zaleslaw], if I understood correctly, as per the implementation of 
https://issues.apache.org/jira/browse/IGNITE-9228 we can only specify a schema name 
when reading data, not when writing it.

If my understanding is correct, this feature was implemented during the Ignite 2.6 
timeline, and at that point in time Ignite didn't allow creating a table in any 
schema other than the PUBLIC schema; this is the reason "OPTION_SCHEMA" is not 
supported in Overwrite mode. (Reference: 
[https://github.com/apache/ignite/pull/4551])

Kindly let me know if this is also addressed so that I can close this bug.

> Ignite Spark Integration Support Schema on Table Write
> --
>
> Key: IGNITE-12141
> URL: https://issues.apache.org/jira/browse/IGNITE-12141
> Project: Ignite
>  Issue Type: Improvement
>  Components: spark
>Affects Versions: 2.7.5
>Reporter: Manoj G T
>Assignee: Alexey Zinoviev
>Priority: Critical
> Fix For: 2.8
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Ignite 2.6 doesn't allow creating a table in any schema other than the PUBLIC 
> schema, and this is the reason "OPTION_SCHEMA" is not supported in 
> Overwrite mode. Now that Ignite supports creating a table in any given 
> schema, it would be great if we could incorporate the changes to support 
> "OPTION_SCHEMA" in Overwrite mode and make it available as part of the next 
> Ignite release.
>  
> +Related Issue:+
> [https://stackoverflow.com/questions/57782033/apache-ignite-spark-integration-not-working-with-schema-name]





[jira] [Commented] (IGNITE-12141) Ignite Spark Integration Support Schema on Table Write

2019-10-07 Thread Manoj G T (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16945983#comment-16945983
 ] 

Manoj G T commented on IGNITE-12141:


Dear [~zaleslaw], if I understood correctly, as per the implementation of 
https://issues.apache.org/jira/browse/IGNITE-9228 we can only specify a schema name 
when reading data, not when writing it.

If my understanding is correct, this feature was implemented during the Ignite 2.6 
timeline, and at that point in time Ignite didn't allow creating a table in any 
schema other than the PUBLIC schema; this is the reason "OPTION_SCHEMA" is not 
supported in Overwrite mode. (Reference: 
[https://github.com/apache/ignite/pull/4551])

Kindly let me know if this is also addressed so that I can close this bug.

> Ignite Spark Integration Support Schema on Table Write
> --
>
> Key: IGNITE-12141
> URL: https://issues.apache.org/jira/browse/IGNITE-12141
> Project: Ignite
>  Issue Type: Improvement
>  Components: spark
>Affects Versions: 2.7.5
>Reporter: Manoj G T
>Assignee: Alexey Zinoviev
>Priority: Critical
> Fix For: 2.8
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Ignite 2.6 doesn't allow creating a table in any schema other than the PUBLIC 
> schema, and this is the reason "OPTION_SCHEMA" is not supported in 
> Overwrite mode. Now that Ignite supports creating a table in any given 
> schema, it would be great if we could incorporate the changes to support 
> "OPTION_SCHEMA" in Overwrite mode and make it available as part of the next 
> Ignite release.
>  
> +Related Issue:+
> [https://stackoverflow.com/questions/57782033/apache-ignite-spark-integration-not-working-with-schema-name]





[jira] [Commented] (IGNITE-9489) CorruptedTreeException on index create.

2019-10-07 Thread Ivan Rakov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-9489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16945940#comment-16945940
 ] 

Ivan Rakov commented on IGNITE-9489:


Should be fixed by https://issues.apache.org/jira/browse/IGNITE-12061 and 
https://issues.apache.org/jira/browse/IGNITE-11541.
[~tledkov-gridgain], can you please verify and close the ticket if the problem 
is no longer relevant?
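The failure mode in the quoted issue description can be illustrated with a toy model. In that failure, the root pages of a dropped index are left in the meta tree and reused on the next dynamic index creation. `ToyIndexStorage` and its methods are hypothetical. Only the name `getOrAllocateForTree` is borrowed from `IndexStorageImpl`, and its real signature differs. The model shows that a drop which skips `destroy` hands the old, non-empty root page to the new index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model of a meta tree mapping index names to root pages. */
final class ToyIndexStorage {
    private final Map<String, Long> metaTree = new HashMap<>();
    private final Map<Long, List<String>> pages = new HashMap<>();
    private long nextPageId = 1;

    /** Returns the existing root page for the index name, or allocates a fresh empty one. */
    long getOrAllocateForTree(String idxName) {
        return metaTree.computeIfAbsent(idxName, n -> {
            long id = nextPageId++;
            pages.put(id, new ArrayList<>());
            return id;
        });
    }

    /** Items (row links) stored on a root page. */
    List<String> rootItems(long rootId) {
        return pages.get(rootId);
    }

    /** Correct drop: removes the meta-tree entry and frees the root page. */
    void destroy(String idxName) {
        Long root = metaTree.remove(idxName);
        if (root != null)
            pages.remove(root);
    }
}
```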

> CorruptedTreeException on index create.
> ---
>
> Key: IGNITE-9489
> URL: https://issues.apache.org/jira/browse/IGNITE-9489
> Project: Ignite
>  Issue Type: Bug
>  Components: cache, sql
>Affects Versions: 2.4, 2.5, 2.6
>Reporter: Igor Seliverstov
>Priority: Blocker
> Fix For: 2.8
>
> Attachments: Test.java
>
>
> Currently, when an index is dropped dynamically with persistence enabled, H2TreeIndex 
> instances aren't destroyed. This means that their root pages aren't removed 
> from the meta tree (see 
> {{org.apache.ignite.internal.processors.cache.persistence.IndexStorageImpl#getOrAllocateForTree}})
>  and are reused by a subsequent dynamic index creation, which leads to a 
> CorruptedTreeException during the initial index rebuild because the root page 
> contains items with broken links.
> Reproducer attached.
> Error log:
> {noformat}
> Error during parallel index create/rebuild.
> org.h2.message.DbException: Internal error: "class 
> org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
>  Runtime failure on row: Row@7745722d[ key: 
> org.apache.ignite.internal.processors.cache.index.AbstractSchemaSelfTest$KeyClass
>  [idHash=2038596277, hash=-1388553726, id=1], val: 
> org.apache.ignite.internal.processors.cache.index.AbstractSchemaSelfTest$ValueClass
>  [idHash=2109544797, hash=-898815788, field1=val1], ver: GridCacheVersion 
> [topVer=147733489, order=1536253488473, nodeOrder=2] ][ 1, val1, null ]"
> General error: "class 
> org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
>  Runtime failure on row: Row@7745722d[ key: 
> org.apache.ignite.internal.processors.cache.index.AbstractSchemaSelfTest$KeyClass
>  [idHash=2038596277, hash=-1388553726, id=1], val: 
> org.apache.ignite.internal.processors.cache.index.AbstractSchemaSelfTest$ValueClass
>  [idHash=2109544797, hash=-898815788, field1=val1], ver: GridCacheVersion 
> [topVer=147733489, order=1536253488473, nodeOrder=2] ][ 1, val1, null ]" 
> [5-195]
>   at org.h2.message.DbException.get(DbException.java:168)
>   at org.h2.message.DbException.convert(DbException.java:295)
>   at 
> org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.putx(H2TreeIndex.java:251)
>   at 
> org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$3.apply(IgniteH2Indexing.java:890)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheMapEntry.updateIndex(GridCacheMapEntry.java:4320)
>   at 
> org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.processKey(SchemaIndexCacheVisitorImpl.java:244)
>   at 
> org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.processPartition(SchemaIndexCacheVisitorImpl.java:207)
>   at 
> org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.processPartitions(SchemaIndexCacheVisitorImpl.java:166)
>   at 
> org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl.access$100(SchemaIndexCacheVisitorImpl.java:50)
>   at 
> org.apache.ignite.internal.processors.query.schema.SchemaIndexCacheVisitorImpl$AsyncWorker.body(SchemaIndexCacheVisitorImpl.java:317)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: org.h2.jdbc.JdbcSQLException: Internal error: "class 
> org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
>  Runtime failure on row: Row@7745722d[ key: 
> org.apache.ignite.internal.processors.cache.index.AbstractSchemaSelfTest$KeyClass
>  [idHash=2038596277, hash=-1388553726, id=1], val: 
> org.apache.ignite.internal.processors.cache.index.AbstractSchemaSelfTest$ValueClass
>  [idHash=2109544797, hash=-898815788, field1=val1], ver: GridCacheVersion 
> [topVer=147733489, order=1536253488473, nodeOrder=2] ][ 1, val1, null ]"
> General error: "class 
> org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
>  Runtime failure on row: Row@7745722d[ key: 
> org.apache.ignite.internal.processors.cache.index.AbstractSchemaSelfTest$KeyClass
>  [idHash=2038596277, hash=-1388553726, id=1], val: 
> org.apache.ignite.internal.processors.cache.index.AbstractSchemaSelfTest$ValueClass
>  [idHash=2109544797, hash=-898815788, field1=val1], ver: GridCacheVersion 
> [topVer=14773

[jira] [Assigned] (IGNITE-12261) Issue with adding nested index dynamically

2019-10-07 Thread Hemambara (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hemambara reassigned IGNITE-12261:
--

Assignee: Hemambara

>   Issue with adding nested index dynamically 
> -
>
> Key: IGNITE-12261
> URL: https://issues.apache.org/jira/browse/IGNITE-12261
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7.6
>Reporter: Hemambara
>Assignee: Hemambara
>Priority: Major
>
> [http://apache-ignite-users.70518.x6.nabble.com/Issue-with-adding-nested-index-dynamically-tt29571.html]
>  





[jira] [Commented] (IGNITE-12266) Add limit parameter to Platforms for processing TextQuery

2019-10-07 Thread Yuriy Shuliha (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16945907#comment-16945907
 ] 

Yuriy Shuliha  commented on IGNITE-12266:
-

[~Pavlukhin], yes, you are right, a proper description should be added to the 
ticket.

I planned to add it immediately after creation, but now I see that it will 
require more clarification.

There is a place in PlatformCache where a TextQuery is created via a binary reader: 
https://github.com/apache/ignite/blob/dc58d91a866d59eb268a60f829a47ab5f259de38/modules/core/src/main/java/org/apache/ignite/internal/processors/platform/cache/PlatformCache.java#L1425

But I cannot find the point where the corresponding TextQuery is written 
into the stream.
[~amashenkov], could you assist with this?

CC: [~avinogradov], [~agoncharuk]
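This sketch illustrates why the writing side matters. The code that writes the query (presumably the platform side, e.g. .NET) and the reader in `PlatformCache` must agree on field order. A new limit field therefore has to be added on both sides at the same position. The sketch is hypothetical. `TextQueryWire`, its fields, and their order are assumptions, not the actual platform wire format. Plain `DataOutputStream`/`DataInputStream` stand in for Ignite's binary writer and reader.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical wire layout of a text query carrying a limit; not the real platform protocol. */
final class TextQueryWire {
    final String type;
    final String text;
    final int limit;
    final int pageSize;

    TextQueryWire(String type, String text, int limit, int pageSize) {
        this.type = type;
        this.text = text;
        this.limit = limit;
        this.pageSize = pageSize;
    }

    /** Writer side (in the real flow, the platform code that serializes the query). */
    byte[] write() {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeUTF(type);
            out.writeUTF(text);
            out.writeInt(limit); // The new field is written at a fixed position...
            out.writeInt(pageSize);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Reader side (in the real flow, PlatformCache): fields must be read in the same order. */
    static TextQueryWire read(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            String type = in.readUTF();
            String text = in.readUTF();
            int limit = in.readInt(); // ...and read back from the same position.
            int pageSize = in.readInt();
            return new TextQueryWire(type, text, limit, pageSize);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```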

> Add limit parameter to Platforms for processing TextQuery
> -
>
> Key: IGNITE-12266
> URL: https://issues.apache.org/jira/browse/IGNITE-12266
> Project: Ignite
>  Issue Type: Improvement
>  Components: platforms
>Reporter: Yuriy Shuliha 
>Assignee: Yuriy Shuliha 
>Priority: Major
> Fix For: 2.8
>
>






[jira] [Commented] (IGNITE-11942) IGFS and Hadoop Accelerator Discontinuation

2019-10-07 Thread Alexey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-11942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16945887#comment-16945887
 ] 

Alexey Zinoviev commented on IGNITE-11942:
--

[~dmagda] I've started the decoupling work and will report the results later.

> IGFS and Hadoop Accelerator Discontinuation
> ---
>
> Key: IGNITE-11942
> URL: https://issues.apache.org/jira/browse/IGNITE-11942
> Project: Ignite
>  Issue Type: Task
>Reporter: Denis A. Magda
>Priority: Blocker
> Fix For: 2.8
>
>
> The community has voted for the following decision:
> * IGFS and In-Memory Hadoop Accelerator components are to be discontinued and 
> no longer supported by the community 
> * The existing source code of IGFS and In-Memory Hadoop Accelerator is to be 
> removed from Ignite master. Before that, a special branch such as 
> "ignite-igfs-and-hadoop-accelerator" is to be forked off master in order to 
> preserve the sources in Git history for those who might need them. 
> The voting thread:
> http://apache-ignite-developers.2346864.n4.nabble.com/VOTE-Complete-Discontinuation-of-IGFS-and-Hadoop-Accelerator-td42405.html
> Once the changes are made for Ignite 2.8, please contact Denis Magda to 
> update the public documentation.





[jira] [Updated] (IGNITE-10292) ML: Replace IGFS by model storage for TensorFlow

2019-10-07 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-10292:
-
Labels: await  (was: )

> ML: Replace IGFS by model storage for TensorFlow
> 
>
> Key: IGNITE-10292
> URL: https://issues.apache.org/jira/browse/IGNITE-10292
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Affects Versions: 2.9
>Reporter: Anton Dmitriev
>Assignee: Alexey Zinoviev
>Priority: Major
>  Labels: await
> Fix For: 2.8
>
>
> Currently, we have a TensorFlow IGFS plugin that provides file system 
> functionality (see 
> https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/ignite).
>  At the same time, IGFS is deprecated, and it would be great to replace it with a 
> simple model storage based on a cache.





[jira] [Updated] (IGNITE-10292) ML: Replace IGFS by model storage for TensorFlow

2019-10-07 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-10292:
-
Ignite Flags: Release Notes Required  (was: Docs Required)

> ML: Replace IGFS by model storage for TensorFlow
> 
>
> Key: IGNITE-10292
> URL: https://issues.apache.org/jira/browse/IGNITE-10292
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Affects Versions: 2.8
>Reporter: Anton Dmitriev
>Assignee: Alexey Zinoviev
>Priority: Major
>  Labels: await
> Fix For: 2.8
>
>
> Currently, we have a TensorFlow IGFS plugin that provides file system 
> functionality (see 
> https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/ignite).
>  At the same time, IGFS is deprecated, and it would be great to replace it with a 
> simple model storage based on a cache.





[jira] [Updated] (IGNITE-10292) ML: Replace IGFS by model storage for TensorFlow

2019-10-07 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-10292:
-
Affects Version/s: (was: 2.9)
   2.8

> ML: Replace IGFS by model storage for TensorFlow
> 
>
> Key: IGNITE-10292
> URL: https://issues.apache.org/jira/browse/IGNITE-10292
> Project: Ignite
>  Issue Type: Improvement
>  Components: ml
>Affects Versions: 2.8
>Reporter: Anton Dmitriev
>Assignee: Alexey Zinoviev
>Priority: Major
>  Labels: await
> Fix For: 2.8
>
>
> Currently, we have a TensorFlow IGFS plugin that provides file system 
> functionality (see 
> https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/ignite).
>  At the same time, IGFS is deprecated, and it would be great to replace it with a 
> simple model storage based on a cache.





[jira] [Commented] (IGNITE-12267) ClassCastException after change column type (drop, add)

2019-10-07 Thread Kirill Tkalenko (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16945882#comment-16945882
 ] 

Kirill Tkalenko commented on IGNITE-12267:
--

I started TC.

> ClassCastException after change column type (drop, add)
> ---
>
> Key: IGNITE-12267
> URL: https://issues.apache.org/jira/browse/IGNITE-12267
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Tkalenko
>Assignee: Kirill Tkalenko
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Changing the type of an SQL column is not supported, but it is possible to drop 
> the column and re-create it with a new type.
> The migration script applies without errors.
> However, the error occurs whenever the column is accessed afterwards.





[jira] [Created] (IGNITE-12267) ClassCastException after change column type (drop, add)

2019-10-07 Thread Kirill Tkalenko (Jira)
Kirill Tkalenko created IGNITE-12267:


 Summary: ClassCastException after change column type (drop, add)
 Key: IGNITE-12267
 URL: https://issues.apache.org/jira/browse/IGNITE-12267
 Project: Ignite
  Issue Type: Improvement
Reporter: Kirill Tkalenko
Assignee: Kirill Tkalenko
 Fix For: 2.8


Changing the type of an SQL column is not supported, but it is possible to drop 
the column and re-create it with a new type.
The migration script applies without errors.
However, the error occurs whenever the column is accessed afterwards.
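The drop-and-re-add scenario can be sketched as a statement sequence. The table and column names, the sample row, and the INT to VARCHAR change are all assumptions, since the ticket does not include the actual migration script. Per the ticket, the migration statements succeed and the failure appears only when the column is accessed. It is only a guess that existing rows still carry the old field type. `run` executes the statements over any `java.sql.Connection`, for example one opened through the Ignite JDBC thin driver with a URL such as `jdbc:ignite:thin://127.0.0.1/`.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

/** Hypothetical reproducer outline for the drop-and-re-add column type change. */
final class ColumnTypeChangeRepro {
    private ColumnTypeChangeRepro() {
    }

    /** Assumed migration: there is no ALTER COLUMN ... TYPE, so the column is dropped and re-added. */
    static List<String> script() {
        return List.of(
            "CREATE TABLE person (id INT PRIMARY KEY, age INT)",
            "INSERT INTO person (id, age) VALUES (1, 42)",
            "ALTER TABLE person DROP COLUMN age",
            "ALTER TABLE person ADD COLUMN age VARCHAR",
            // Per the ticket, accessing the column is where the ClassCastException surfaces.
            "SELECT age FROM person");
    }

    /** Executes the script and reads every returned value. */
    static void run(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            for (String sql : script()) {
                if (stmt.execute(sql)) {
                    try (ResultSet rs = stmt.getResultSet()) {
                        while (rs.next())
                            rs.getString(1); // Reading the re-typed column.
                    }
                }
            }
        }
    }
}
```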





[jira] [Commented] (IGNITE-12141) Ignite Spark Integration Support Schema on Table Write

2019-10-07 Thread Alexey Zinoviev (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16945842#comment-16945842
 ] 

Alexey Zinoviev commented on IGNITE-12141:
--

Dear [~gtmanoj235], it was fixed in 
https://issues.apache.org/jira/browse/IGNITE-9228 and will be part of the 2.8 
release. Can we close this bug?

> Ignite Spark Integration Support Schema on Table Write
> --
>
> Key: IGNITE-12141
> URL: https://issues.apache.org/jira/browse/IGNITE-12141
> Project: Ignite
>  Issue Type: Improvement
>  Components: spark
>Affects Versions: 2.7.5
>Reporter: Manoj G T
>Assignee: Alexey Zinoviev
>Priority: Critical
> Fix For: 2.8
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> Ignite 2.6 doesn't allow creating a table in any schema other than the PUBLIC 
> schema, and this is the reason "OPTION_SCHEMA" is not supported in 
> Overwrite mode. Now that Ignite supports creating a table in any given 
> schema, it would be great if we could incorporate the changes to support 
> "OPTION_SCHEMA" in Overwrite mode and make it available as part of the next 
> Ignite release.
>  
> +Related Issue:+
> [https://stackoverflow.com/questions/57782033/apache-ignite-spark-integration-not-working-with-schema-name]





[jira] [Resolved] (IGNITE-10527) [ML] DenseMatrix(double[] mtx, int rows) mixes args

2019-10-07 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev resolved IGNITE-10527.
--
Release Note: Fixed in IGNITE-10480
  Resolution: Fixed

> [ML] DenseMatrix(double[] mtx, int rows) mixes args
> ---
>
> Key: IGNITE-10527
> URL: https://issues.apache.org/jira/browse/IGNITE-10527
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Affects Versions: 2.8
>Reporter: Artem Malykh
>Assignee: Alexey Zinoviev
>Priority: Major
>  Labels: await
> Fix For: 2.8
>
>
> this(mtx, StorageConstants.ROW_STORAGE_MODE, rows) -> 
> this(mtx, rows, StorageConstants.ROW_STORAGE_MODE);
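This bug is easy to introduce because the row count and the storage mode are both `int`s, so the swapped call compiles without any warning. The hypothetical stand-in class below is not Ignite ML's `DenseMatrix`, and its constant values are arbitrary. It shows the corrected delegation.

```java
/** Hypothetical stand-in for the constructor pair in the ticket; constants are arbitrary. */
final class MiniDenseMatrix {
    static final int ROW_STORAGE_MODE = 0;
    static final int COLUMN_STORAGE_MODE = 1;

    final int rows;
    final int cols;
    final int acsMode;

    MiniDenseMatrix(double[] mtx, int rows, int acsMode) {
        if (rows <= 0 || mtx.length % rows != 0)
            throw new IllegalArgumentException("Array length must be divisible by rows: " + rows);

        this.rows = rows;
        this.cols = mtx.length / rows;
        this.acsMode = acsMode;
    }

    /** Fixed delegation: the row count first, then the storage mode. */
    MiniDenseMatrix(double[] mtx, int rows) {
        this(mtx, rows, ROW_STORAGE_MODE);
        // The buggy form this(mtx, ROW_STORAGE_MODE, rows) compiles too, but passes the mode as the row count.
    }
}
```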





[jira] [Closed] (IGNITE-10527) [ML] DenseMatrix(double[] mtx, int rows) mixes args

2019-10-07 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev closed IGNITE-10527.


> [ML] DenseMatrix(double[] mtx, int rows) mixes args
> ---
>
> Key: IGNITE-10527
> URL: https://issues.apache.org/jira/browse/IGNITE-10527
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Affects Versions: 2.8
>Reporter: Artem Malykh
>Assignee: Alexey Zinoviev
>Priority: Major
>  Labels: await
> Fix For: 2.8
>
>
> this(mtx, StorageConstants.ROW_STORAGE_MODE, rows) -> 
> this(mtx, rows, StorageConstants.ROW_STORAGE_MODE);





[jira] [Updated] (IGNITE-10527) [ML] DenseMatrix(double[] mtx, int rows) mixes args

2019-10-07 Thread Alexey Zinoviev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-10527:
-
Ignite Flags:   (was: Docs Required)

> [ML] DenseMatrix(double[] mtx, int rows) mixes args
> ---
>
> Key: IGNITE-10527
> URL: https://issues.apache.org/jira/browse/IGNITE-10527
> Project: Ignite
>  Issue Type: Bug
>  Components: ml
>Affects Versions: 2.8
>Reporter: Artem Malykh
>Assignee: Alexey Zinoviev
>Priority: Major
>  Labels: await
> Fix For: 2.8
>
>
> this(mtx, StorageConstants.ROW_STORAGE_MODE, rows) -> 
> this(mtx, rows, StorageConstants.ROW_STORAGE_MODE);





[jira] [Commented] (IGNITE-12266) Add limit parameter to Platforms for processing TextQuery

2019-10-07 Thread Ivan Pavlukhin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16945832#comment-16945832
 ] 

Ivan Pavlukhin commented on IGNITE-12266:
-

[~Yuriy_Shuliha], could you please fill in a ticket description? We strive to 
create tickets ready to be taken for development by any community member. 
Usually a ticket description helps with it.

> Add limit parameter to Platforms for processing TextQuery
> -
>
> Key: IGNITE-12266
> URL: https://issues.apache.org/jira/browse/IGNITE-12266
> Project: Ignite
>  Issue Type: Improvement
>  Components: platforms
>Reporter: Yuriy Shuliha 
>Assignee: Yuriy Shuliha 
>Priority: Major
> Fix For: 2.8
>
>






[jira] [Created] (IGNITE-12266) Add limit parameter to Platforms for processing TextQuery

2019-10-07 Thread Yuriy Shuliha (Jira)
Yuriy Shuliha  created IGNITE-12266:
---

 Summary: Add limit parameter to Platforms for processing TextQuery
 Key: IGNITE-12266
 URL: https://issues.apache.org/jira/browse/IGNITE-12266
 Project: Ignite
  Issue Type: Improvement
  Components: platforms
Reporter: Yuriy Shuliha 
Assignee: Yuriy Shuliha 
 Fix For: 2.8








[jira] [Commented] (IGNITE-12265) JavaDoc doesn't have documentation for the org.apache.ignite.client package

2019-10-07 Thread Alexey Goncharuk (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16945817#comment-16945817
 ] 

Alexey Goncharuk commented on IGNITE-12265:
---

Also, there is an exclude in the parent pom file: {{org.apache.ignite -exclude 
org.apache.ignite.client}}

> JavaDoc doesn't have documentation for the org.apache.ignite.client package
> ---
>
> Key: IGNITE-12265
> URL: https://issues.apache.org/jira/browse/IGNITE-12265
> Project: Ignite
>  Issue Type: Bug
>Reporter: Denis Mekhanikov
>Priority: Major
>
> JavaDoc published on the website doesn't have documentation for the 
> {{org.apache.ignite.client}} package. Link to the website: 
> [https://ignite.apache.org/releases/2.7.6/javadoc/]
> A missing {{package-info.java}} file or an exclusion in the 
> {{maven-javadoc-plugin}} configuration of the root {{pom.xml}} may be the reason.





[jira] [Updated] (IGNITE-12265) JavaDoc doesn't have documentation for the org.apache.ignite.client package

2019-10-07 Thread Ivan Pavlukhin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Pavlukhin updated IGNITE-12265:

Labels: newbie  (was: )

> JavaDoc doesn't have documentation for the org.apache.ignite.client package
> ---
>
> Key: IGNITE-12265
> URL: https://issues.apache.org/jira/browse/IGNITE-12265
> Project: Ignite
>  Issue Type: Bug
>Reporter: Denis Mekhanikov
>Priority: Major
>  Labels: newbie
>
> JavaDoc published on the website doesn't have documentation for the 
> {{org.apache.ignite.client}} package. Link to the website: 
> [https://ignite.apache.org/releases/2.7.6/javadoc/]
> A missing {{package-info.java}} file or an exclusion in the 
> {{maven-javadoc-plugin}} configuration of the root {{pom.xml}} may be the reason.





[jira] [Updated] (IGNITE-12265) JavaDoc doesn't have documentation for the org.apache.ignite.client package

2019-10-07 Thread Ivan Pavlukhin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Pavlukhin updated IGNITE-12265:

Labels: newbi  (was: )

> JavaDoc doesn't have documentation for the org.apache.ignite.client package
> ---
>
> Key: IGNITE-12265
> URL: https://issues.apache.org/jira/browse/IGNITE-12265
> Project: Ignite
>  Issue Type: Bug
>Reporter: Denis Mekhanikov
>Priority: Major
>  Labels: newbi
>
> JavaDoc published on the website doesn't have documentation for the 
> {{org.apache.ignite.client}} package. Link to the website: 
> [https://ignite.apache.org/releases/2.7.6/javadoc/]
> A missing {{package-info.java}} file or an exclusion in the 
> {{maven-javadoc-plugin}} configuration of the root {{pom.xml}} may be the reason.





[jira] [Updated] (IGNITE-12265) JavaDoc doesn't have documentation for the org.apache.ignite.client package

2019-10-07 Thread Ivan Pavlukhin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Pavlukhin updated IGNITE-12265:

Labels:   (was: newbi)

> JavaDoc doesn't have documentation for the org.apache.ignite.client package
> ---
>
> Key: IGNITE-12265
> URL: https://issues.apache.org/jira/browse/IGNITE-12265
> Project: Ignite
>  Issue Type: Bug
>Reporter: Denis Mekhanikov
>Priority: Major
>
> JavaDoc published on the website doesn't have documentation for the 
> {{org.apache.ignite.client}} package. Link to the website: 
> [https://ignite.apache.org/releases/2.7.6/javadoc/]
> A missing {{package-info.java}} file or an exclusion in the 
> {{maven-javadoc-plugin}} configuration of the root {{pom.xml}} may be the reason.





[jira] [Created] (IGNITE-12265) JavaDoc doesn't have documentation for the org.apache.ignite.client package

2019-10-07 Thread Denis Mekhanikov (Jira)
Denis Mekhanikov created IGNITE-12265:
-

 Summary: JavaDoc doesn't have documentation for the 
org.apache.ignite.client package
 Key: IGNITE-12265
 URL: https://issues.apache.org/jira/browse/IGNITE-12265
 Project: Ignite
  Issue Type: Bug
Reporter: Denis Mekhanikov


JavaDoc published on the website doesn't have documentation for the 
{{org.apache.ignite.client}} package. Link to the website: 
[https://ignite.apache.org/releases/2.7.6/javadoc/]

A missing {{package-info.java}} file or an exclusion in the 
{{maven-javadoc-plugin}} configuration of the root {{pom.xml}} may be the reason.
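
If a missing {{package-info.java}} turns out to be the cause, the fix is a short file in the {{org.apache.ignite.client}} source directory. Here is a minimal sketch. The Javadoc wording is an assumption. The file alone would not help while the package is still excluded in the {{maven-javadoc-plugin}} configuration.

```java
// package-info.java for org.apache.ignite.client (sketch; the description text is assumed).
/**
 * Contains the Ignite thin client API.
 */
package org.apache.ignite.client;
```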





[jira] [Commented] (IGNITE-7820) Investigate and fix performance drop of WAL for FSYNC mode

2019-10-07 Thread Andrey N. Gura (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16945814#comment-16945814
 ] 

Andrey N. Gura commented on IGNITE-7820:


[~mmuzaf] Yes, the issue is still relevant, but it isn't a blocker for the release. 
See the affected version: it is 2.4.

> Investigate and fix performance drop of WAL for FSYNC mode
> --
>
> Key: IGNITE-7820
> URL: https://issues.apache.org/jira/browse/IGNITE-7820
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.4
>Reporter: Andrey N. Gura
>Assignee: Andrey N. Gura
>Priority: Critical
> Fix For: 2.8
>
>
> The WAL performance drop was introduced by the fix for 
> https://issues.apache.org/jira/browse/IGNITE-6339. To provide better 
> performance for the {{FSYNC}} WAL mode, the 
> {{FsyncModeFileWriteAheadLogManager}} implementation was added as a result of 
> the fix for https://issues.apache.org/jira/browse/IGNITE-7594.
> *What we know about this performance drop:*
> * It affects {{IgnitePutAllBenchmark}} and {{IgnitePutAllTxBenchmark}}; 
> measurements show a 10-15% drop and a ~50% drop, respectively.
> * It is not reproducible on all hardware configurations: on some 
> configurations we see a performance improvement instead of a drop.
> * It is reproducible on the [Many clients --> One server] topology.
> * If {{IGNITE_WAL_MMAP == false}} then we have better performance.
> * If {{fsyncDelay == 0}} then we have better performance.
> *What were tried during initial investigation:*
> * Replacing {{LockSupport.park/unpark}} with spinning gives an improvement of 
> about 2%.
> * Using {{FileWriteHandle.fsync(null)}} (unconditional flush) instead of 
> {{FileWriteHandle.fsync(position)}} (conditional flush) doesn't affect 
> benchmarks.
> *What should we do:*
> Investigate the problem and provide a fix or a recommendation for system tuning.
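
The difference between the two flush calls mentioned above can be modeled in a few lines. The minimal sketch below makes two assumptions: {{fsync(position)}} skips the flush when everything up to {{position}} is already durable, and {{fsync(null)}} always flushes. {{FsyncModel}} is a hypothetical class, not Ignite's {{FileWriteHandle}}, and its counter stands in for a real {{FileChannel.force}} call.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical model of conditional vs. unconditional WAL flushes. */
class FsyncModel {
    /** Bytes appended to the segment so far. */
    private final AtomicLong written = new AtomicLong();

    /** Highest position known to be durable. */
    private long fsynced;

    /** Number of real flushes performed (stands in for FileChannel.force(false)). */
    int fsyncCalls;

    /** Appends a record of the given length and returns the position right after it. */
    long append(int len) {
        return written.addAndGet(len);
    }

    /** position == null: unconditional flush; otherwise flush only if position is not yet durable. */
    synchronized void fsync(Long position) {
        if (position != null && position <= fsynced)
            return; // an earlier flush already covered this record

        fsyncCalls++;
        fsynced = written.get();
    }
}
```

Under this model, one real flush can satisfy several conditional requests for nearby positions. The ticket, however, reports that switching to the unconditional variant did not change the benchmark results.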





[jira] [Resolved] (IGNITE-12208) If JAVA_HOME is not configured, the ignitevisorcmd.sh script cannot run

2019-10-07 Thread Ilya Kasnacheev (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ilya Kasnacheev resolved IGNITE-12208.
--
Fix Version/s: (was: 2.8)
   Resolution: Duplicate

> If JAVA_HOME is not configured, the ignitevisorcmd.sh script cannot run
> ---
>
> Key: IGNITE-12208
> URL: https://issues.apache.org/jira/browse/IGNITE-12208
> Project: Ignite
>  Issue Type: Bug
>  Components: visor
>Affects Versions: 2.7.6
> Environment: Debian Linux OS
> sudo apt install openjdk-8-jdk
>Reporter: YuJue Li
>Priority: Critical
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If JAVA_HOME is not configured, the ignitevisorcmd.sh script cannot run.
> No error message is printed, and the documentation does not seem to state 
> that JAVA_HOME must be configured.
> Version 2.7.0 does not have this problem, but versions 2.7.5 and 2.7.6 do.





[jira] [Created] (IGNITE-12264) Private application data should not be lit in the logs, exceptions, ERROR, WARN etc.

2019-10-07 Thread Pushenko Kirill (Jira)
Pushenko Kirill created IGNITE-12264:


 Summary: Private application data should not be lit in the logs, 
exceptions, ERROR, WARN etc.
 Key: IGNITE-12264
 URL: https://issues.apache.org/jira/browse/IGNITE-12264
 Project: Ignite
  Issue Type: Improvement
Affects Versions: 2.7.6
Reporter: Pushenko Kirill


Private application data should not be exposed in logs, exception messages, 
ERROR and WARN output, etc.

The exceptions contained a value that included card numbers.





[jira] [Commented] (IGNITE-9964) SQL: query by affinity key fails when a table is created from a custom template

2019-10-07 Thread Andrey Mashenkov (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-9964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16945730#comment-16945730
 ] 

Andrey Mashenkov commented on IGNITE-9964:
--

IGNITE-5795 has been resolved.

Let's check this against the latest master and close it if it is fixed.

> SQL: query by affinity key fails when a table is created from a custom 
> template
> ---
>
> Key: IGNITE-9964
> URL: https://issues.apache.org/jira/browse/IGNITE-9964
> Project: Ignite
>  Issue Type: Bug
>Reporter: Stanislav Lukyanov
>Priority: Major
>
> SELECT by affinity key doesn't work on a table that was created with a custom 
> cache template if entries were added through JCache (using withKeepBinary()).
> If the first row was added via INSERT and after that another row was added 
> via withKeepBinary(), SELECT by affinity key works correctly.
> If the first row was added via withKeepBinary() and after that another row 
> was added via INSERT, SELECT by affinity key returns nothing.





[jira] [Created] (IGNITE-12263) Introduce native persistence compaction operation

2019-10-07 Thread Alexey Goncharuk (Jira)
Alexey Goncharuk created IGNITE-12263:
-

 Summary: Introduce native persistence compaction operation
 Key: IGNITE-12263
 URL: https://issues.apache.org/jira/browse/IGNITE-12263
 Project: Ignite
  Issue Type: Improvement
Reporter: Alexey Goncharuk


Currently, Ignite native persistence does not shrink storage files after 
key-value pairs are removed.
The causes of this behavior are:
 * The absence of a mechanism that allows Ignite to track the highest non-empty 
page position in a partition file
 * The absence of a mechanism that allows Ignite to select the page closest to 
the file beginning for writing
 * The absence of a mechanism that allows Ignite to move a key-value pair from 
page to page during defragmentation

As an initial change, I suggest introducing a new node startup mode that will 
run a defragmentation procedure allowing the node to shrink storage files. The 
procedure will not mutate the logical state of a partition, allowing a subsequent 
historical rebalance to quickly catch the node up. Since the procedure will run 
during the node startup (during the final stages of recovery), there will be no 
concurrent load, thus the entries can be freely moved from page to page with no 
tricky synchronization.

If the procedure is applied during a whole-cluster restart, all nodes will 
be defragmented simultaneously, allowing a quicker parallel defragmentation 
at the cost of downtime.

The procedure should accept an optional list of cache groups to defragment to 
allow arbitrary cache group selection for defragmentation.

The proposed actions for each partition selected for defragmentation are:
 * Partition pages are preloaded into memory, if possible, to avoid excessive page 
replacement. During the scan, the high-water mark (HWM) of the written data is 
detected (empty pages are skipped)
 * Page references in the free list are sorted so that the pages closest to the 
file start can be picked first
 * The partition is scanned in reverse order, key-value pairs are moved closer 
to the file start, HWM is updated accordingly. This step is particularly open 
for various optimizations because different strategies will work well for 
different fragmentation patterns.
 * After the scan iteration is completed, the file size can be updated 
according to the HWM
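
The per-partition steps listed above can be illustrated with a toy in-memory model. This is a minimal sketch with simplifying assumptions:
 * a partition is a list of pages, and each page is a list of entries with a fixed capacity
 * the free list is a sorted set of page indexes
 * truncating the list stands in for shrinking the file

{{PartitionCompactor}} is hypothetical and is not Ignite code. Real pages, free lists and entry relocation are far more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model of offline partition compaction (hypothetical, not Ignite code). */
class PartitionCompactor {
    /** Compacts the pages in place and returns the new HWM (number of pages kept). */
    static int compact(List<List<String>> pages, int cap) {
        // 1. Detect the HWM: one past the last non-empty page (trailing empty pages are skipped).
        int hwm = pages.size();
        while (hwm > 0 && pages.get(hwm - 1).isEmpty())
            hwm--;

        // 2. Free-list page indexes, ordered so the page closest to the file start comes first.
        TreeSet<Integer> free = new TreeSet<>();
        for (int i = 0; i < hwm; i++)
            if (pages.get(i).size() < cap)
                free.add(i);

        // 3. Scan in reverse order, moving entries into the lowest page that has room.
        for (int p = hwm - 1; p >= 0; p--) {
            List<String> src = pages.get(p);
            while (!src.isEmpty() && !free.isEmpty() && free.first() < p) {
                int dst = free.first();
                List<String> target = pages.get(dst);
                target.add(src.remove(src.size() - 1));
                if (target.size() == cap)
                    free.remove(dst);
            }
            if (!src.isEmpty())
                break; // no page below p has room, so earlier pages cannot move either
            hwm = p;   // page p and every page after it are now empty
        }

        // 4. "Shrink the file" to the new HWM.
        while (pages.size() > hwm)
            pages.remove(pages.size() - 1);
        return hwm;
    }
}
```

The reverse scan mirrors the proposal: entries from the tail are packed into the lowest free pages first, so the HWM can only move toward the file start.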

As a further improvement, this partition defragmentation procedure can be later 
run in online mode, after proper cache update protocol changes are designed.





[jira] [Updated] (IGNITE-12263) Introduce native persistence compaction operation

2019-10-07 Thread Alexey Goncharuk (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Goncharuk updated IGNITE-12263:
--
Priority: Critical  (was: Major)

> Introduce native persistence compaction operation
> -
>
> Key: IGNITE-12263
> URL: https://issues.apache.org/jira/browse/IGNITE-12263
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Alexey Goncharuk
>Priority: Critical
>
> Currently, Ignite native persistence does not shrink storage files after 
> key-value pairs are removed.
> The causes of this behavior are:
>  * The absence of a mechanism that allows Ignite to track the highest non-empty 
> page position in a partition file
>  * The absence of a mechanism that allows Ignite to select the page closest to 
> the file beginning for writing
>  * The absence of a mechanism that allows Ignite to move a key-value pair 
> from page to page during defragmentation
> As an initial change, I suggest introducing a new node startup mode that 
> will run a defragmentation procedure allowing the node to shrink storage 
> files. The procedure will not mutate the logical state of a partition, 
> allowing a subsequent historical rebalance to quickly catch the node up. Since the 
> procedure will run during the node startup (during the final stages of 
> recovery), there will be no concurrent load, thus the entries can be freely 
> moved from page to page with no tricky synchronization.
> If the procedure is applied during a whole-cluster restart, all nodes 
> will be defragmented simultaneously, allowing a quicker parallel 
> defragmentation at the cost of downtime.
> The procedure should accept an optional list of cache groups to defragment to 
> allow arbitrary cache group selection for defragmentation.
> The proposed actions for each partition selected for defragmentation are:
>  * Partition pages are preloaded into memory, if possible, to avoid excessive 
> page replacement. During the scan, the high-water mark (HWM) of the written 
> data is detected (empty pages are skipped)
>  * Page references in the free list are sorted so that the pages closest to 
> the file start can be picked first
>  * The partition is scanned in reverse order, key-value pairs are moved 
> closer to the file start, HWM is updated accordingly. This step is 
> particularly open for various optimizations because different strategies will 
> work well for different fragmentation patterns.
>  * After the scan iteration is completed, the file size can be updated 
> according to the HWM
> As a further improvement, this partition defragmentation procedure can be 
> later run in online mode, after proper cache update protocol changes are 
> designed.


