[jira] [Comment Edited] (IGNITE-12638) Classes persisted by DistributedMetaStorage are not IgniteDTO

2020-02-11 Thread Dmitriy Govorukhin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17034395#comment-17034395
 ] 

Dmitriy Govorukhin edited comment on IGNITE-12638 at 2/11/20 12:17 PM:
---

[~ibessonov] Looks good to me! Thanks!


was (Author: dmitriygovorukhin):
[~ibessonov]Looks good to me! Thanks!

> Classes persisted by DistributedMetaStorage are not IgniteDTO
> -
>
> Key: IGNITE-12638
> URL: https://issues.apache.org/jira/browse/IGNITE-12638
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This has to be fixed to simplify future modification of the component.
> DistributedMetaStorageHistoryItem and DistributedMetaStorageVersion will be 
> persisted on disk, so we need a reliable way to read them even after the 
> classes are updated in the future. IgniteDataTransferObject is the standard 
> option for such cases - it allows versioning of the serialization format.
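The version-tolerant serialization idea behind IgniteDataTransferObject can be sketched generically. The class and method names below are illustrative assumptions, not the real Ignite API: a version byte is written first, and fields added in later versions are read conditionally.

```java
import java.io.*;

// Illustrative sketch of version-tolerant serialization in the spirit of
// IgniteDataTransferObject. Class layout and method names here are
// assumptions for illustration, not the real Ignite API.
public class VersionedRecord {
    /** The writer always emits the latest protocol version first. */
    static final byte PROTO_VER = 2;

    final String name;     // present since version 1
    final long updateTime; // field added in version 2

    VersionedRecord(String name, long updateTime) {
        this.name = name;
        this.updateTime = updateTime;
    }

    static byte[] write(VersionedRecord rec) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeByte(PROTO_VER); // version prefix keeps future formats readable
            out.writeUTF(rec.name);
            out.writeLong(rec.updateTime);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    static VersionedRecord read(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte ver = in.readByte();
            String name = in.readUTF();
            // A field added in a later version is read only if the stored data has it.
            long updateTime = ver >= 2 ? in.readLong() : 0L;
            return new VersionedRecord(name, updateTime);
        }
        catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Old data written with version 1 stays readable after the class gains new fields, which is exactly the property needed for classes persisted on disk.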



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (IGNITE-12638) Classes persisted by DistributedMetaStorage are not IgniteDTO

2020-02-11 Thread Dmitriy Govorukhin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17034395#comment-17034395
 ] 

Dmitriy Govorukhin commented on IGNITE-12638:
-

[~ibessonov] Looks good to me! Thanks!

> Classes persisted by DistributedMetaStorage are not IgniteDTO
> -
>
> Key: IGNITE-12638
> URL: https://issues.apache.org/jira/browse/IGNITE-12638
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This has to be fixed to simplify future modification of the component.
> DistributedMetaStorageHistoryItem and DistributedMetaStorageVersion will be 
> persisted on disk, so we need a reliable way to read them even after the 
> classes are updated in the future. IgniteDataTransferObject is the standard 
> option for such cases - it allows versioning of the serialization format.





[jira] [Updated] (IGNITE-12351) Append additional cp tracking activity - pages sort.

2019-11-12 Thread Dmitriy Govorukhin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-12351:

Fix Version/s: 2.8

> Append additional cp tracking activity - pages sort.
> 
>
> Key: IGNITE-12351
> URL: https://issues.apache.org/jira/browse/IGNITE-12351
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladimir Malinovskiy
>Assignee: Vladimir Malinovskiy
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> CheckpointMetricsTracker has no info about the _splitAndSortCpPagesIfNeeded_ 
> stage, so with a huge number of dirty pages one can observe in the log:
> 10:08:00 checkpoint started
> 10:10:00 checkpoint finished
> <--- ?? 
> 10:10:20 checkpoint started
> If checkpointFrequency = 3, then the tracker durations (beforeLockDuration, 
> lockWaitDuration) give no clue what kind of work (20 sec) the Checkpointer 
> thread is waiting for.
> Additionally (hopefully not a big deal), the redundant effectivePageId 
> computation needs to be fixed, because FullPageId already carries the 
> effectivePageId info:
>  
> {{return Long.compare(PageIdUtils.effectivePageId(o1.pageId()),
> PageIdUtils.effectivePageId(o2.pageId()));}}
> The writeCheckpointEntry() duration should also be logged.
>  
>  
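The redundant computation noted above can be avoided by comparing a value cached on the object itself. A minimal illustration follows; FullPageId's real internals differ, so the field and mask here are stand-in assumptions:

```java
import java.util.Comparator;

// Sketch: avoid recomputing a derived key inside a comparator when the
// object can cache it. "effectivePageId" here is a stand-in field and the
// masking is hypothetical; the real logic lives in PageIdUtils/FullPageId.
public class PageRef {
    final long pageId;
    final long effectivePageId; // computed once at construction, not per comparison

    PageRef(long pageId) {
        this.pageId = pageId;
        this.effectivePageId = pageId & 0x0000FFFFFFFFFFFFL; // illustrative mask
    }

    /** Comparator that reads the cached value instead of recomputing it on every call. */
    static final Comparator<PageRef> BY_EFFECTIVE_ID =
        Comparator.comparingLong(p -> p.effectivePageId);
}
```

During a sort over millions of dirty pages, the comparator is invoked O(n log n) times, so even a cheap recomputation adds up.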





[jira] [Commented] (IGNITE-12351) Append additional cp tracking activity - pages sort.

2019-11-12 Thread Dmitriy Govorukhin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972179#comment-16972179
 ] 

Dmitriy Govorukhin commented on IGNITE-12351:
-

[~vmalinovskiy] IgniteSpiCheckpointSelfTestSuite is not a good place for 
CheckCheckpointStartLoggingTest, because IgniteSpiCheckpoint* is not about the 
database checkpoint. Please move CheckCheckpointStartLoggingTest to one of the 
PDS suites.

> Append additional cp tracking activity - pages sort.
> 
>
> Key: IGNITE-12351
> URL: https://issues.apache.org/jira/browse/IGNITE-12351
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladimir Malinovskiy
>Assignee: Vladimir Malinovskiy
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> CheckpointMetricsTracker has no info about the _splitAndSortCpPagesIfNeeded_ 
> stage, so with a huge number of dirty pages one can observe in the log:
> 10:08:00 checkpoint started
> 10:10:00 checkpoint finished
> <--- ?? 
> 10:10:20 checkpoint started
> If checkpointFrequency = 3, then the tracker durations (beforeLockDuration, 
> lockWaitDuration) give no clue what kind of work (20 sec) the Checkpointer 
> thread is waiting for.
> Additionally (hopefully not a big deal), the redundant effectivePageId 
> computation needs to be fixed, because FullPageId already carries the 
> effectivePageId info:
>  
> {{return Long.compare(PageIdUtils.effectivePageId(o1.pageId()),
> PageIdUtils.effectivePageId(o2.pageId()));}}
> The writeCheckpointEntry() duration should also be logged.
>  
>  





[jira] [Commented] (IGNITE-12108) [IEP-35] Migrate Communication Metrics.

2019-10-25 Thread Dmitriy Govorukhin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959571#comment-16959571
 ] 

Dmitriy Govorukhin commented on IGNITE-12108:
-

[~nizhikov] Do you have any objections? As I can see, Ivan fixed all comments.

> [IEP-35] Migrate Communication Metrics.
> ---
>
> Key: IGNITE-12108
> URL: https://issues.apache.org/jira/browse/IGNITE-12108
> Project: Ignite
>  Issue Type: New Feature
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: IEP-35, await
> Fix For: 2.8
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> ||*Name*||*Description*||
> |communication.tcp.outboundMessagesQueueSize|Number of messages waiting to be sent|
> |communication.tcp.sentBytes|Total number of bytes sent by current node|
> |communication.tcp.receivedBytes|Total number of bytes received by current node|
> |communication.tcp.sentMessagesCount|Total number of messages sent by current node|
> |communication.tcp.receivedMessagesCount|Total number of messages received by current node|
> |communication.tcp.sentMessagesByType.|Total number of messages with given type sent by current node|
> |communication.tcp.receivedMessagesByType.|Total number of messages with given type received by current node|
> |communication.tcp..sentMessagesToNode|Total number of messages sent by current node to the given node|
> |communication.tcp..receivedMessagesFromNode|Total number of messages received by current node from the given node|
>  
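The last four rows of the table are parameterized names: a message type or a node identifier is substituted into the dotted name. The placeholder positions below are inferred from the descriptions; the exact naming scheme Ignite uses may differ, so treat this as an illustration of how such names compose:

```java
// Sketch of how the parameterized metric names in the table above compose.
// The placeholder positions are inferred from the row descriptions; the
// exact format used by Ignite's metric registry may differ.
public class CommMetricNames {
    static final String PREFIX = "communication.tcp";

    /** Per-message-type counter, e.g. keyed by the message's direct type id. */
    static String sentByType(short directType) {
        return PREFIX + ".sentMessagesByType." + directType;
    }

    /** Per-remote-node counter, keyed by some node identifier. */
    static String sentToNode(String nodeId) {
        return PREFIX + "." + nodeId + ".sentMessagesToNode";
    }
}
```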





[jira] [Commented] (IGNITE-12199) WAL record with data entries doesn't flushes on backups for transactional cache

2019-09-21 Thread Dmitriy Govorukhin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935044#comment-16935044
 ] 

Dmitriy Govorukhin commented on IGNITE-12199:
-

[~agura] Looks good to me too. Proceed to merge.

> WAL record with data entries doesn't flushes on backups for transactional 
> cache
> ---
>
> Key: IGNITE-12199
> URL: https://issues.apache.org/jira/browse/IGNITE-12199
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Andrey Gura
>Assignee: Andrey Gura
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The WAL record with data entries is not flushed on backups for a transactional 
> cache.
> This issue can be reproduced, for example, by the 
> {{TxPartitionCounterStateConsistencyTest.testSingleThreadedUpdateOrder}} test 
> with MMAP mode disabled.
> The problematic place in the code is {{GridDistributedTxRemoteAdapter#commitIfLocked}}, 
> where the file pointer returned by {{wal.log()}} is not assigned to the {{ptr}} 
> variable.
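The fix described (keeping the pointer returned by {{wal.log()}}) follows a common WAL pattern: the returned pointer is what a later flush/fsync waits on, so discarding it silently turns the flush into a no-op. A minimal sketch with hypothetical stand-in types:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the fix pattern described above: the pointer returned
// by wal.log() must be kept so the entry can later be flushed to disk.
// WalManager/WalPointer are hypothetical stand-ins for Ignite's internal API.
public class TxCommitSketch {
    interface WalPointer { }

    interface WalManager {
        WalPointer log(String record);
        void flush(WalPointer ptr);
    }

    static WalPointer commit(WalManager wal, String dataRecord) {
        WalPointer ptr = null;

        if (dataRecord != null)
            ptr = wal.log(dataRecord); // the bug: this return value was discarded

        if (ptr != null)
            wal.flush(ptr);            // without the assignment above, never reached

        return ptr;
    }

    /** Tiny in-memory WAL used only to exercise the sketch. */
    static class RecordingWal implements WalManager {
        final List<String> flushedRecords = new ArrayList<>();

        public WalPointer log(String record) {
            return new WalPointer() { public String toString() { return record; } };
        }

        public void flush(WalPointer ptr) { flushedRecords.add(ptr.toString()); }
    }
}
```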





[jira] [Updated] (IGNITE-12179) Test and javadoc fixes

2019-09-19 Thread Dmitriy Govorukhin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-12179:

Fix Version/s: 2.8

> Test and javadoc fixes
> --
>
> Key: IGNITE-12179
> URL: https://issues.apache.org/jira/browse/IGNITE-12179
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> *.testTtlNoTx failed flakily on TC
> TcpCommunicationSpiFreezingClientTest failed
> TcpCommunicationSpiFaultyClientSslTest.testNotAcceptedConnection failed
> testCacheIdleVerifyPrintLostPartitions failed





[jira] [Commented] (IGNITE-12135) Rework GridCommandHandlerTest

2019-09-11 Thread Dmitriy Govorukhin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16927395#comment-16927395
 ] 

Dmitriy Govorukhin commented on IGNITE-12135:
-

[~ktkale...@gridgain.com] Merged to master. Thanks for your contribution!

> Rework GridCommandHandlerTest
> -
>
> Key: IGNITE-12135
> URL: https://issues.apache.org/jira/browse/IGNITE-12135
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Kirill Tkalenko
>Assignee: Kirill Tkalenko
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> There are 50+ tests, and each test starts and stops nodes. I think we could 
> split the tests into at least two groups:
>  # Tests of normal behaviour. We could start nodes once before all tests and 
> stop them after all tests.
>  # Tests that require starting a new cluster before each test.
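The two groups can be sketched as fixture strategies. "Cluster" below stands in for the Ignite cluster; in the real suite this maps to starting grids once per class (JUnit {{@BeforeClass}}-style) versus per test ({{@Before}}-style). The names are illustrative:

```java
import java.util.List;
import java.util.function.Consumer;

// Sketch of the proposed suite split: normal-behaviour tests share one
// cluster started once, while isolation-sensitive tests each start a fresh
// one. "Cluster" is a stand-in for the Ignite cluster lifecycle.
public class SuiteSplitSketch {
    static class Cluster {
        static int starts; // counts expensive cluster startups
        Cluster() { starts++; }
    }

    /** Group 1: normal-behaviour tests run against one shared cluster. */
    static void runShared(List<Consumer<Cluster>> tests) {
        Cluster shared = new Cluster(); // started once, @BeforeClass-style
        tests.forEach(t -> t.accept(shared));
    }

    /** Group 2: isolation-sensitive tests each get a fresh cluster. */
    static void runIsolated(List<Consumer<Cluster>> tests) {
        tests.forEach(t -> t.accept(new Cluster())); // @Before-style restart
    }
}
```

With 50+ tests, the shared-fixture group pays the startup cost once instead of 50+ times, which is the point of the split.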





[jira] [Commented] (IGNITE-12127) WAL writer may close file IO with unflushed changes when MMAP is disabled

2019-09-05 Thread Dmitriy Govorukhin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923423#comment-16923423
 ] 

Dmitriy Govorukhin commented on IGNITE-12127:
-

Cherry-picked to ignite-2.7.6 402c9450dafbb201708f66d8bdab0ade0b87bd4f

> WAL writer may close file IO with unflushed changes when MMAP is disabled
> -
>
> Key: IGNITE-12127
> URL: https://issues.apache.org/jira/browse/IGNITE-12127
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Critical
> Fix For: 2.7.6
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Most likely the issue manifests itself as the following critical error:
> {code}
> 2019-08-27 14:52:31.286 ERROR 26835 --- [wal-write-worker%null-#447] ROOT : 
> Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler, 
> failureCtx=FailureContext [type=CRITICAL_ERROR, err=class 
> o.a.i.i.processors.cache.persistence.StorageException: Failed to write 
> buffer.]]
> org.apache.ignite.internal.processors.cache.persistence.StorageException: 
> Failed to write buffer.
> at 
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$WALWriter.writeBuffer(FileWriteAheadLogManager.java:3444)
>  [ignite-core-2.5.7.jar!/:2.5.7]
> at 
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$WALWriter.body(FileWriteAheadLogManager.java:3249)
>  [ignite-core-2.5.7.jar!/:2.5.7]
> at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) 
> [ignite-core-2.5.7.jar!/:2.5.7]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_201]
> Caused by: java.nio.channels.ClosedChannelException: null
> at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110) 
> ~[na:1.8.0_201]
> at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:253) 
> ~[na:1.8.0_201]
> at 
> org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIO.position(RandomAccessFileIO.java:48)
>  ~[ignite-core-2.5.7.jar!/:2.5.7]
> at 
> org.apache.ignite.internal.processors.cache.persistence.file.FileIODecorator.position(FileIODecorator.java:41)
>  ~[ignite-core-2.5.7.jar!/:2.5.7]
> at 
> org.apache.ignite.internal.processors.cache.persistence.file.AbstractFileIO.writeFully(AbstractFileIO.java:111)
>  ~[ignite-core-2.5.7.jar!/:2.5.7]
> at 
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$WALWriter.writeBuffer(FileWriteAheadLogManager.java:3437)
>  [ignite-core-2.5.7.jar!/:2.5.7]
> ... 3 common frames omitted
> {code}
> It appears that the following sequence is possible:
>  * Thread A attempts to log a large record which does not fit the segment, 
> {{addRecord}} fails and thread A starts a segment rollover. It successfully 
> runs {{flushOrWait(null)}} and gets de-scheduled before adding the switch 
> segment record
>  * Thread B attempts to log another record, which fits exactly up to the end 
> of the current segment. The record is added to the buffer
>  * Thread A resumes and fails to add the switch segment record. No flush is 
> performed and the thread immediately proceeds to close the WAL writer
>  * The WAL writer thread wakes up, sees that there is a CLOSE request, closes 
> the file IO and immediately proceeds to write the unflushed changes, causing 
> the exception.
> An unconditional flush after writing the switch segment record should fix the 
> issue.
> Besides the bug itself, I suggest the following changes to 
> {{FileWriteHandleImpl}} ({{FileWriteAheadLogManager}} in earlier versions):
>  * There is an {{fsync(filePtr)}} call inside {{close()}}; however, 
> {{fsync()}} checks the {{stop}} flag (which is set inside {{close}}) and 
> returns immediately after {{flushOrWait()}} if the flag is set - this is very 
> confusing. After all, {{close()}} itself explicitly calls {{force}} after the 
> flush
>  * There is an ignored IO exception in mmap mode - it should be propagated 
> to the failure handler
>  * In the WAL writer, we check for file CLOSE and then attempt to write to 
> (possibly) the same write handle - the write should always happen before the 
> close
>  * In the WAL writer, there are racy reads of the current handle - it would be 
> better to read the current handle once and then operate on it during the whole 
> loop iteration
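The proposed "unconditional flush after the switch segment record" fix can be sketched in miniature. The buffer/segment model below is a deliberately simplified stand-in for Ignite's real WAL internals; the point is only the invariant that rollover leaves no unflushed data behind, even when the switch record itself does not fit:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed fix: flush unconditionally after attempting the
// switch-segment record, so the buffer never reaches close() with unflushed
// data. All types here are illustrative, not Ignite's real internals.
public class SegmentRolloverSketch {
    final List<String> buffer = new ArrayList<>(); // records not yet on disk
    final List<String> disk = new ArrayList<>();   // records flushed to disk

    boolean addRecord(String rec, int segmentCapacity) {
        if (buffer.size() + disk.size() >= segmentCapacity)
            return false;                          // record does not fit the segment
        buffer.add(rec);
        return true;
    }

    void flush() { disk.addAll(buffer); buffer.clear(); }

    /** Rollover with the fix applied: flush regardless of addRecord's result. */
    void rollover(int segmentCapacity) {
        flush();                                   // flushOrWait(null) equivalent
        addRecord("SWITCH_SEGMENT", segmentCapacity); // may fail, as in the race
        flush();  // the fix: unconditional flush, even when the switch record failed
        // close() can now safely close the file IO: the buffer is empty
    }
}
```

In the buggy sequence, the second flush was skipped when the switch record failed, so thread B's record was still in the buffer when the file IO was closed.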





[jira] [Commented] (IGNITE-12127) WAL writer may close file IO with unflushed changes when MMAP is disabled

2019-09-05 Thread Dmitriy Govorukhin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16923412#comment-16923412
 ] 

Dmitriy Govorukhin commented on IGNITE-12127:
-

Merged to master a13337d94755d7e1cc097c6f00311552fea25ae6

> WAL writer may close file IO with unflushed changes when MMAP is disabled
> -
>
> Key: IGNITE-12127
> URL: https://issues.apache.org/jira/browse/IGNITE-12127
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Critical
> Fix For: 2.7.6
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>





[jira] [Updated] (IGNITE-12127) WAL writer may close file IO with unflushed changes when MMAP is disabled

2019-09-04 Thread Dmitriy Govorukhin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-12127:

Ignite Flags:   (was: Docs Required,Release Notes Required)

> WAL writer may close file IO with unflushed changes when MMAP is disabled
> -
>
> Key: IGNITE-12127
> URL: https://issues.apache.org/jira/browse/IGNITE-12127
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Critical
> Fix For: 2.7.6
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Updated] (IGNITE-12121) Double checkpoint triggering due to incorrect place of update current checkpoint

2019-09-04 Thread Dmitriy Govorukhin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-12121:

Ignite Flags:   (was: Docs Required,Release Notes Required)

> Double checkpoint triggering due to incorrect place of update current 
> checkpoint
> 
>
> Key: IGNITE-12121
> URL: https://issues.apache.org/jira/browse/IGNITE-12121
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Double checkpoint triggering due to an incorrect place of updating the current 
> checkpoint. This can lead to two checkpoints running one after another if the 
> checkpoint trigger was 'too many dirty pages'.





[jira] [Updated] (IGNITE-12121) Double checkpoint triggering due to incorrect place of update current checkpoint

2019-09-04 Thread Dmitriy Govorukhin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-12121:

Fix Version/s: 2.8

> Double checkpoint triggering due to incorrect place of update current 
> checkpoint
> 
>
> Key: IGNITE-12121
> URL: https://issues.apache.org/jira/browse/IGNITE-12121
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Double checkpoint triggering due to an incorrect place of updating the current 
> checkpoint. This can lead to two checkpoints running one after another if the 
> checkpoint trigger was 'too many dirty pages'.





[jira] [Assigned] (IGNITE-12128) Potentially pds corruption on a failed node during checkpoint

2019-08-30 Thread Dmitriy Govorukhin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin reassigned IGNITE-12128:
---

Assignee: Anton Kalashnikov

> Potentially pds corruption on a failed node during checkpoint
> -
>
> Key: IGNITE-12128
> URL: https://issues.apache.org/jira/browse/IGNITE-12128
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Anton Kalashnikov
>Priority: Critical
> Fix For: 2.7.6
>
>
> There is a case when we start a checkpoint but do not create the CP file 
> marker, while PageMemory may already start flushing dirty checkpoint pages to 
> the page store. If the node crashes at this moment, we can get an inconsistent 
> state, because the checkpoint marker was never written to disk while some 
> pages for this checkpoint already were. If we try to recover from this state, 
> we can get any sort of corruption problem: the recovery logic may not 
> recognize that the crash happened during a checkpoint, because the file 
> marker was not written at checkpoint start but some checkpoint pages were.
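The safe ordering implied by the report is that the start marker must be durable before any checkpoint page touches the page store, so recovery can always detect a mid-checkpoint crash. A minimal sketch of that invariant, with all names as illustrative stand-ins for Ignite's internal checkpointer:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the ordering the report calls for: the checkpoint start marker
// must be on disk before any checkpoint page is written. All names are
// illustrative stand-ins for Ignite's internal checkpointer.
public class CheckpointOrderSketch {
    final List<String> diskOps = new ArrayList<>(); // ordered record of disk writes

    void writeStartMarker(String cpId) { diskOps.add("MARKER:" + cpId); }

    void writePage(long pageId) {
        // Invariant: no page may hit the store before the marker is durable.
        if (diskOps.isEmpty() || !diskOps.get(0).startsWith("MARKER:"))
            throw new IllegalStateException("checkpoint page written before start marker");
        diskOps.add("PAGE:" + pageId);
    }

    void checkpoint(String cpId, long[] dirtyPages) {
        writeStartMarker(cpId); // must come first (and be fsync'ed in a real impl)
        for (long p : dirtyPages)
            writePage(p);
    }
}
```

The described bug is the opposite ordering: pages reach the store while the marker write is still pending, so after a crash the recovery logic sees partially written pages with no marker telling it a checkpoint was in progress.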





[jira] [Created] (IGNITE-12128) Potentially pds corruption on a failed node during checkpoint

2019-08-30 Thread Dmitriy Govorukhin (Jira)
Dmitriy Govorukhin created IGNITE-12128:
---

 Summary: Potentially pds corruption on a failed node during 
checkpoint
 Key: IGNITE-12128
 URL: https://issues.apache.org/jira/browse/IGNITE-12128
 Project: Ignite
  Issue Type: Bug
Reporter: Dmitriy Govorukhin


There is a case when we start a checkpoint but do not create the CP file 
marker, while PageMemory may already start flushing dirty checkpoint pages to 
the page store. If the node crashes at this moment, we can get an inconsistent 
state, because the checkpoint marker was never written to disk while some pages 
for this checkpoint already were. If we try to recover from this state, we can 
get any sort of corruption problem: the recovery logic may not recognize that 
the crash happened during a checkpoint, because the file marker was not written 
at checkpoint start but some checkpoint pages were.





[jira] [Updated] (IGNITE-12128) Potentially pds corruption on a failed node during checkpoint

2019-08-30 Thread Dmitriy Govorukhin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-12128:

Fix Version/s: 2.7.6

> Potentially pds corruption on a failed node during checkpoint
> -
>
> Key: IGNITE-12128
> URL: https://issues.apache.org/jira/browse/IGNITE-12128
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Priority: Critical
> Fix For: 2.7.6
>
>
> There is a case when we start a checkpoint but do not create the CP file 
> marker, while PageMemory may already start flushing dirty checkpoint pages to 
> the page store. If the node crashes at this moment, we can get an inconsistent 
> state, because the checkpoint marker was never written to disk while some 
> pages for this checkpoint already were. If we try to recover from this state, 
> we can get any sort of corruption problem: the recovery logic may not 
> recognize that the crash happened during a checkpoint, because the file 
> marker was not written at checkpoint start but some checkpoint pages were.





[jira] [Updated] (IGNITE-12127) WAL writer may close file IO with unflushed changes when MMAP is disabled

2019-08-30 Thread Dmitriy Govorukhin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-12127:

Fix Version/s: 2.7.6

> WAL writer may close file IO with unflushed changes when MMAP is disabled
> -
>
> Key: IGNITE-12127
> URL: https://issues.apache.org/jira/browse/IGNITE-12127
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Critical
> Fix For: 2.7.6
>
>
> Most likely the issue manifests itself as the following critical error:
> {code}
> 2019-08-27 14:52:31.286 ERROR 26835 --- [wal-write-worker%null-#447] ROOT : 
> Critical system error detected. Will be handled accordingly to configured 
> handler [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler, 
> failureCtx=FailureContext [type=CRITICAL_ERROR, err=class 
> o.a.i.i.processors.cache.persistence.StorageException: Failed to write 
> buffer.]]
> org.apache.ignite.internal.processors.cache.persistence.StorageException: 
> Failed to write buffer.
> at 
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$WALWriter.writeBuffer(FileWriteAheadLogManager.java:3444)
>  [ignite-core-2.5.7.jar!/:2.5.7]
> at 
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$WALWriter.body(FileWriteAheadLogManager.java:3249)
>  [ignite-core-2.5.7.jar!/:2.5.7]
> at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) 
> [ignite-core-2.5.7.jar!/:2.5.7]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_201]
> Caused by: java.nio.channels.ClosedChannelException: null
> at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110) 
> ~[na:1.8.0_201]
> at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:253) 
> ~[na:1.8.0_201]
> at 
> org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIO.position(RandomAccessFileIO.java:48)
>  ~[ignite-core-2.5.7.jar!/:2.5.7]
> at 
> org.apache.ignite.internal.processors.cache.persistence.file.FileIODecorator.position(FileIODecorator.java:41)
>  ~[ignite-core-2.5.7.jar!/:2.5.7]
> at 
> org.apache.ignite.internal.processors.cache.persistence.file.AbstractFileIO.writeFully(AbstractFileIO.java:111)
>  ~[ignite-core-2.5.7.jar!/:2.5.7]
> at 
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$WALWriter.writeBuffer(FileWriteAheadLogManager.java:3437)
>  [ignite-core-2.5.7.jar!/:2.5.7]
> ... 3 common frames omitted
> {code}
> It appears that the following sequence is possible:
>  * Thread A attempts to log a large record which does not fit the segment; 
> {{addRecord}} fails and thread A starts segment rollover. It successfully 
> runs {{flushOrWait(null)}} and gets de-scheduled before adding the switch 
> segment record
>  * Thread B attempts to log another record, which fits exactly up to the end 
> of the current segment. The record is added to the buffer
>  * Thread A resumes and fails to add the switch segment record. No flush is 
> performed and the thread immediately proceeds to wal-writer close
>  * The WAL writer thread wakes up, sees that there is a CLOSE request, closes 
> the file IO and immediately proceeds to write the unflushed changes, causing 
> the exception.
> An unconditional flush after writing the switch segment record should fix 
> the issue.
> Besides the bug itself, I suggest the following changes to 
> {{FileWriteHandleImpl}} ({{FileWriteAheadLogManager}} in earlier versions):
>  * There is a {{fsync(filePtr)}} call inside {{close()}}; however, 
> {{fsync()}} checks the {{stop}} flag (which is set inside {{close}}) and 
> returns immediately after {{flushOrWait()}} if the flag is set - this is very 
> confusing. After all, {{close()}} itself explicitly calls {{force}} after the 
> flush
>  * There is an ignored IO exception in mmap mode - it should be propagated 
> to the failure handler
>  * In the WAL writer, we check for file CLOSE and then attempt to write to 
> (possibly) the same write handle - the write should always happen before the 
> close
>  * In the WAL writer, there are racy reads of the current handle - it would 
> be better to read the current handle once and then operate on it during the 
> whole loop iteration



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (IGNITE-12127) WAL writer may close file IO with unflushed changes when MMAP is disabled

2019-08-30 Thread Dmitriy Govorukhin (Jira)
Dmitriy Govorukhin created IGNITE-12127:
---

 Summary: WAL writer may close file IO with unflushed changes when 
MMAP is disabled
 Key: IGNITE-12127
 URL: https://issues.apache.org/jira/browse/IGNITE-12127
 Project: Ignite
  Issue Type: Bug
Reporter: Dmitriy Govorukhin
Assignee: Dmitriy Govorukhin


Most likely the issue manifests itself as the following critical error:
{code}
2019-08-27 14:52:31.286 ERROR 26835 --- [wal-write-worker%null-#447] ROOT : 
Critical system error detected. Will be handled accordingly to configured 
handler [hnd=class o.a.i.failure.StopNodeOrHaltFailureHandler, 
failureCtx=FailureContext [type=CRITICAL_ERROR, err=class 
o.a.i.i.processors.cache.persistence.StorageException: Failed to write buffer.]]
org.apache.ignite.internal.processors.cache.persistence.StorageException: 
Failed to write buffer.
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$WALWriter.writeBuffer(FileWriteAheadLogManager.java:3444)
 [ignite-core-2.5.7.jar!/:2.5.7]
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$WALWriter.body(FileWriteAheadLogManager.java:3249)
 [ignite-core-2.5.7.jar!/:2.5.7]
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110) 
[ignite-core-2.5.7.jar!/:2.5.7]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_201]
Caused by: java.nio.channels.ClosedChannelException: null
at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110) 
~[na:1.8.0_201]
at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:253) 
~[na:1.8.0_201]
at 
org.apache.ignite.internal.processors.cache.persistence.file.RandomAccessFileIO.position(RandomAccessFileIO.java:48)
 ~[ignite-core-2.5.7.jar!/:2.5.7]
at 
org.apache.ignite.internal.processors.cache.persistence.file.FileIODecorator.position(FileIODecorator.java:41)
 ~[ignite-core-2.5.7.jar!/:2.5.7]
at 
org.apache.ignite.internal.processors.cache.persistence.file.AbstractFileIO.writeFully(AbstractFileIO.java:111)
 ~[ignite-core-2.5.7.jar!/:2.5.7]
at 
org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$WALWriter.writeBuffer(FileWriteAheadLogManager.java:3437)
 [ignite-core-2.5.7.jar!/:2.5.7]
... 3 common frames omitted
{code}

It appears that the following sequence is possible:
 * Thread A attempts to log a large record which does not fit the segment; 
{{addRecord}} fails and thread A starts segment rollover. It successfully runs 
{{flushOrWait(null)}} and gets de-scheduled before adding the switch segment 
record
 * Thread B attempts to log another record, which fits exactly up to the end of 
the current segment. The record is added to the buffer
 * Thread A resumes and fails to add the switch segment record. No flush is 
performed and the thread immediately proceeds to wal-writer close
 * The WAL writer thread wakes up, sees that there is a CLOSE request, closes 
the file IO and immediately proceeds to write the unflushed changes, causing 
the exception.

An unconditional flush after writing the switch segment record should fix the 
issue.
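The fix can be sketched with a toy model. This is a hedged illustration of the ordering the ticket proposes (flush unconditionally after the switch-segment attempt, before close), not Ignite's actual `FileWriteAheadLogManager` code; the names `SegmentBuffer` and `rollover` are invented:

```java
import java.util.ArrayList;
import java.util.List;

public class SegmentRollover {
    static class SegmentBuffer {
        final List<byte[]> unflushed = new ArrayList<>();
        boolean closed;

        boolean addRecord(byte[] rec, int capacityLeft) {
            if (rec.length > capacityLeft)
                return false;          // record does not fit the segment
            unflushed.add(rec);
            return true;
        }

        void flush() {
            // Pretend these bytes hit the file IO.
            unflushed.clear();
        }

        void close() {
            // The bug: closing with unflushed records means the WAL writer
            // later hits a ClosedChannelException when it tries to flush.
            if (!unflushed.isEmpty())
                throw new IllegalStateException("unflushed changes on close");
            closed = true;
        }
    }

    /** Rollover with the unconditional flush the ticket proposes. */
    static void rollover(SegmentBuffer seg, byte[] switchRec, int capacityLeft) {
        seg.addRecord(switchRec, capacityLeft); // may fail if the segment is full
        seg.flush();                            // unconditional flush closes the race
        seg.close();                            // safe: nothing unflushed remains
    }

    public static void main(String[] args) {
        SegmentBuffer seg = new SegmentBuffer();
        seg.addRecord(new byte[8], 16);         // thread B's record fills the tail
        rollover(seg, new byte[32], 8);         // switch record does not fit
        System.out.println(seg.closed);         // true: close succeeds after flush
    }
}
```

Without the unconditional `flush()` call, `close()` in this model throws, mirroring the "Failed to write buffer" crash above.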

Besides the bug itself, I suggest the following changes to 
{{FileWriteHandleImpl}} ({{FileWriteAheadLogManager}} in earlier versions):
 * There is a {{fsync(filePtr)}} call inside {{close()}}; however, {{fsync()}} 
checks the {{stop}} flag (which is set inside {{close}}) and returns 
immediately after {{flushOrWait()}} if the flag is set - this is very 
confusing. After all, {{close()}} itself explicitly calls {{force}} after the 
flush
 * There is an ignored IO exception in mmap mode - it should be propagated to 
the failure handler
 * In the WAL writer, we check for file CLOSE and then attempt to write to 
(possibly) the same write handle - the write should always happen before the 
close
 * In the WAL writer, there are racy reads of the current handle - it would be 
better to read the current handle once and then operate on it during the whole 
loop iteration





[jira] [Updated] (IGNITE-12116) Cache doesn't support array as key

2019-08-29 Thread Dmitriy Govorukhin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-12116:

Ignite Flags: Release Notes Required  (was: Docs Required,Release Notes 
Required)

> Cache doesn't support array as key
> --
>
> Key: IGNITE-12116
> URL: https://issues.apache.org/jira/browse/IGNITE-12116
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache
>Reporter: Stepachev Maksim
>Assignee: Stepachev Maksim
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Ignite cache doesn't support arrays as keys. The basic cache operations 
> do not work with array keys.
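For context, the underlying difficulty is general JDK behavior, not Ignite-specific: Java arrays inherit identity-based `hashCode`/`equals` from `Object`, so two arrays with equal contents are different map keys. A minimal demonstration:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ArrayKeyDemo {
    /** Lookup by a fresh equal-content array misses: identity hashCode differs. */
    static String arrayLookup() {
        Map<int[], String> map = new HashMap<>();
        map.put(new int[] {1, 2, 3}, "value");
        return map.get(new int[] {1, 2, 3});
    }

    /** Wrapping the contents in a List restores value semantics. */
    static String listLookup() {
        Map<List<Integer>, String> map = new HashMap<>();
        map.put(Arrays.asList(1, 2, 3), "value");
        return map.get(Arrays.asList(1, 2, 3));
    }

    public static void main(String[] args) {
        System.out.println(arrayLookup()); // null
        System.out.println(listLookup());  // value
    }
}
```

Supporting array keys in the cache therefore requires content-based hashing and comparison of the key bytes rather than relying on the array object's own `hashCode`/`equals`.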





[jira] [Updated] (IGNITE-12116) Cache doesn't support array as key

2019-08-29 Thread Dmitriy Govorukhin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-12116:

Fix Version/s: 2.8

> Cache doesn't support array as key
> --
>
> Key: IGNITE-12116
> URL: https://issues.apache.org/jira/browse/IGNITE-12116
> Project: Ignite
>  Issue Type: Improvement
>  Components: cache
>Reporter: Stepachev Maksim
>Assignee: Stepachev Maksim
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The Ignite cache doesn't support arrays as keys. The basic cache operations 
> do not work with array keys.





[jira] [Assigned] (IGNITE-12110) Bugs & tests fixes

2019-08-27 Thread Dmitriy Govorukhin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin reassigned IGNITE-12110:
---

Assignee: Dmitriy Govorukhin

>  Bugs & tests fixes
> ---
>
> Key: IGNITE-12110
> URL: https://issues.apache.org/jira/browse/IGNITE-12110
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Major
> Fix For: 2.8
>
>
> Fixed tests:
> - *.testMassiveServersShutdown2
> - Fixed: the isStopped flag wasn't set during disconnect
> - Don't allow "NodeStoppingException" to fail the checkpoint thread on stop
> Test improvements:
> - Add a page lock tracker and failure handler for all BPlusTree tests





[jira] [Updated] (IGNITE-12110) Bugs & tests fixes

2019-08-27 Thread Dmitriy Govorukhin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-12110:

Description: 
Fixed tests:
- *.testMassiveServersShutdown2
- Fixed: the isStopped flag wasn't set during disconnect
- Don't allow "NodeStoppingException" to fail the checkpoint thread on stop

Test improvements:
- Add a page lock tracker and failure handler for all BPlusTree tests

>  Bugs & tests fixes
> ---
>
> Key: IGNITE-12110
> URL: https://issues.apache.org/jira/browse/IGNITE-12110
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Priority: Major
>
> Fixed tests:
> - *.testMassiveServersShutdown2
> - Fixed: the isStopped flag wasn't set during disconnect
> - Don't allow "NodeStoppingException" to fail the checkpoint thread on stop
> Test improvements:
> - Add a page lock tracker and failure handler for all BPlusTree tests





[jira] [Updated] (IGNITE-12110) Bugs & tests fixes

2019-08-27 Thread Dmitriy Govorukhin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-12110:

Fix Version/s: 2.8

>  Bugs & tests fixes
> ---
>
> Key: IGNITE-12110
> URL: https://issues.apache.org/jira/browse/IGNITE-12110
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Priority: Major
> Fix For: 2.8
>
>
> Fixed tests:
> - *.testMassiveServersShutdown2
> - Fixed: the isStopped flag wasn't set during disconnect
> - Don't allow "NodeStoppingException" to fail the checkpoint thread on stop
> Test improvements:
> - Add a page lock tracker and failure handler for all BPlusTree tests





[jira] [Created] (IGNITE-12110) Bugs & tests fixes

2019-08-27 Thread Dmitriy Govorukhin (Jira)
Dmitriy Govorukhin created IGNITE-12110:
---

 Summary:  Bugs & tests fixes
 Key: IGNITE-12110
 URL: https://issues.apache.org/jira/browse/IGNITE-12110
 Project: Ignite
  Issue Type: Bug
Reporter: Dmitriy Govorukhin








[jira] [Commented] (IGNITE-12102) idle_verify should show info about lost partitions

2019-08-27 Thread Dmitriy Govorukhin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-12102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916558#comment-16916558
 ] 

Dmitriy Govorukhin commented on IGNITE-12102:
-

[~akalashnikov] Looks good to me, merged to master. Thanks for the contribution!

> idle_verify should show info about lost partitions
> --
>
> Key: IGNITE-12102
> URL: https://issues.apache.org/jira/browse/IGNITE-12102
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Dmitriy Govorukhin
>Assignee: Anton Kalashnikov
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In the current implementation, idle_verify does not show lost partitions; the 
> check reports that everything is fine, which is not true.





[jira] [Assigned] (IGNITE-12102) idle_verify should show info about lost partitions

2019-08-26 Thread Dmitriy Govorukhin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin reassigned IGNITE-12102:
---

Assignee: Dmitriy Govorukhin

> idle_verify should show info about lost partitions
> --
>
> Key: IGNITE-12102
> URL: https://issues.apache.org/jira/browse/IGNITE-12102
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Major
> Fix For: 2.8
>
>
> In the current implementation, idle_verify does not show lost partitions; the 
> check reports that everything is fine, which is not true.





[jira] [Assigned] (IGNITE-12102) idle_verify should show info about lost partitions

2019-08-26 Thread Dmitriy Govorukhin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin reassigned IGNITE-12102:
---

Assignee: Anton Kalashnikov  (was: Dmitriy Govorukhin)

> idle_verify should show info about lost partitions
> --
>
> Key: IGNITE-12102
> URL: https://issues.apache.org/jira/browse/IGNITE-12102
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Dmitriy Govorukhin
>Assignee: Anton Kalashnikov
>Priority: Major
> Fix For: 2.8
>
>
> In the current implementation, idle_verify does not show lost partitions; the 
> check reports that everything is fine, which is not true.





[jira] [Updated] (IGNITE-12102) idle_verify should show info about lost partitions

2019-08-26 Thread Dmitriy Govorukhin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-12102:

Fix Version/s: 2.8

> idle_verify should show info about lost partitions
> --
>
> Key: IGNITE-12102
> URL: https://issues.apache.org/jira/browse/IGNITE-12102
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Dmitriy Govorukhin
>Priority: Major
> Fix For: 2.8
>
>
> In the current implementation, idle_verify does not show lost partitions; the 
> check reports that everything is fine, which is not true.





[jira] [Updated] (IGNITE-12102) idle_verify should show info about lost partitions

2019-08-26 Thread Dmitriy Govorukhin (Jira)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-12102:

Ignite Flags:   (was: Docs Required)

> idle_verify should show info about lost partitions
> --
>
> Key: IGNITE-12102
> URL: https://issues.apache.org/jira/browse/IGNITE-12102
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Dmitriy Govorukhin
>Priority: Major
>
> In the current implementation, idle_verify does not show lost partitions; the 
> check reports that everything is fine, which is not true.





[jira] [Created] (IGNITE-12102) idle_verify should show info about lost partitions

2019-08-26 Thread Dmitriy Govorukhin (Jira)
Dmitriy Govorukhin created IGNITE-12102:
---

 Summary: idle_verify should show info about lost partitions
 Key: IGNITE-12102
 URL: https://issues.apache.org/jira/browse/IGNITE-12102
 Project: Ignite
  Issue Type: Improvement
Reporter: Dmitriy Govorukhin


In the current implementation, idle_verify does not show lost partitions; the 
check reports that everything is fine, which is not true.





[jira] [Commented] (IGNITE-12081) Page replacement can reload invalid page during checkpoint

2019-08-17 Thread Dmitriy Govorukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-12081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16909794#comment-16909794
 ] 

Dmitriy Govorukhin commented on IGNITE-12081:
-

[~dpavlov] Please review my changes.

> Page replacement can reload invalid page during checkpoint
> --
>
> Key: IGNITE-12081
> URL: https://issues.apache.org/jira/browse/IGNITE-12081
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Critical
> Fix For: 2.7.6
>
>
> There is a race between {{writeCheckpointPages}} and page replacement process:
>  * Checkpointer thread begins a checkpoint
>  * Checkpointer thread calls {{getPageForCheckpoint()}}, which will copy page 
> content *and clear dirty flag*
>  * Page replacement tries to find a page for replacement and chooses this 
> page, the page is thrown away
>  * Before the page is written back to the store, the page is acquired again.
> As a result, an older copy of the page is brought back to memory, which 
> causes all kinds of corruption exceptions and assertions.
> The attached unit test demonstrates the issue. It is likely that all 
> baselines are affected, starting from 2.4.
> As a part of this ticket, we must add more unit tests for the checkpointing 
> protocol invariants we rely on.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (IGNITE-12081) Page replacement can reload invalid page during checkpoint

2019-08-17 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-12081:

Reviewer: Dmitriy Pavlov

> Page replacement can reload invalid page during checkpoint
> --
>
> Key: IGNITE-12081
> URL: https://issues.apache.org/jira/browse/IGNITE-12081
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Critical
> Fix For: 2.7.6
>
>
> There is a race between {{writeCheckpointPages}} and page replacement process:
>  * Checkpointer thread begins a checkpoint
>  * Checkpointer thread calls {{getPageForCheckpoint()}}, which will copy page 
> content *and clear dirty flag*
>  * Page replacement tries to find a page for replacement and chooses this 
> page, the page is thrown away
>  * Before the page is written back to the store, the page is acquired again.
> As a result, an older copy of the page is brought back to memory, which 
> causes all kinds of corruption exceptions and assertions.
> The attached unit test demonstrates the issue. It is likely that all 
> baselines are affected, starting from 2.4.
> As a part of this ticket, we must add more unit tests for the checkpointing 
> protocol invariants we rely on.





[jira] [Updated] (IGNITE-10808) Discovery message queue may build up with TcpDiscoveryMetricsUpdateMessage

2019-08-16 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-10808:

Reviewer: Sergey Chugunov  (was: Alexey Goncharuk)

> Discovery message queue may build up with TcpDiscoveryMetricsUpdateMessage
> --
>
> Key: IGNITE-10808
> URL: https://issues.apache.org/jira/browse/IGNITE-10808
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Stanislav Lukyanov
>Assignee: Denis Mekhanikov
>Priority: Major
>  Labels: discovery
> Fix For: 2.8
>
> Attachments: IgniteMetricsOverflowTest.java
>
>
> A node receives a new metrics update message every `metricsUpdateFrequency` 
> milliseconds, and the message will be put at the top of the queue (because it 
> is a high priority message).
> If processing one message takes more than `metricsUpdateFrequency`, then 
> multiple `TcpDiscoveryMetricsUpdateMessage`s will be in the queue. A long 
> enough delay (e.g. caused by a network glitch or GC) may lead to the queue 
> building up tens of metrics update messages which are essentially useless to 
> process. Finally, if processing a message on average takes a little more 
> than `metricsUpdateFrequency` (even for a relatively short period of time, 
> say, for a minute due to network issues), then the message worker will end up 
> processing only the metrics updates and the cluster will essentially hang.
> Reproducer is attached. In the test, the queue first builds up and is then 
> torn down very slowly, causing "Failed to wait for PME" messages.
> Need to change ServerImpl's SocketReader so that it does not put another 
> metrics update message at the top of the queue if one is already there (or 
> replaces the one at the top with the new one).
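The dedup idea from the description can be modeled with a plain `Deque`. This is an illustrative sketch, not `ServerImpl`'s `SocketReader` code; `offerHighPriority` and `METRICS` are invented names:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class MetricsQueue {
    static final String METRICS = "TcpDiscoveryMetricsUpdateMessage";

    /**
     * High-priority messages go to the head of the queue. When a metrics
     * update arrives and one is already at the head, replace it instead of
     * stacking another: stale metrics updates carry no extra information.
     */
    static void offerHighPriority(Deque<String> queue, String msg) {
        if (METRICS.equals(msg) && METRICS.equals(queue.peekFirst()))
            queue.pollFirst();   // drop the stale metrics update at the head
        queue.addFirst(msg);
    }

    public static void main(String[] args) {
        Deque<String> queue = new ArrayDeque<>();
        offerHighPriority(queue, METRICS);
        offerHighPriority(queue, METRICS);
        offerHighPriority(queue, METRICS);
        System.out.println(queue.size()); // 1: updates collapse, no pile-up
    }
}
```

With this policy the queue holds at most one pending metrics update at the head, so a slow message worker can no longer be starved by a backlog of them.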





[jira] [Updated] (IGNITE-12081) Page replacement can reload invalid page during checkpoint

2019-08-16 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-12081:

Fix Version/s: 2.7.6

> Page replacement can reload invalid page during checkpoint
> --
>
> Key: IGNITE-12081
> URL: https://issues.apache.org/jira/browse/IGNITE-12081
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Critical
> Fix For: 2.7.6
>
>
> There is a race between {{writeCheckpointPages}} and page replacement process:
>  * Checkpointer thread begins a checkpoint
>  * Checkpointer thread calls {{getPageForCheckpoint()}}, which will copy page 
> content *and clear dirty flag*
>  * Page replacement tries to find a page for replacement and chooses this 
> page, the page is thrown away
>  * Before the page is written back to the store, the page is acquired again.
> As a result, an older copy of the page is brought back to memory, which 
> causes all kinds of corruption exceptions and assertions.
> The attached unit test demonstrates the issue. It is likely that all 
> baselines are affected starting from 2.4
> As a part of this ticket, we must add more unit-tests for checkpointing 
> protocol invariants we rely on.





[jira] [Created] (IGNITE-12081) Page replacement can reload invalid page during checkpoint

2019-08-16 Thread Dmitriy Govorukhin (JIRA)
Dmitriy Govorukhin created IGNITE-12081:
---

 Summary: Page replacement can reload invalid page during checkpoint
 Key: IGNITE-12081
 URL: https://issues.apache.org/jira/browse/IGNITE-12081
 Project: Ignite
  Issue Type: Bug
Reporter: Dmitriy Govorukhin
Assignee: Dmitriy Govorukhin


There is a race between {{writeCheckpointPages}} and page replacement process:
 * Checkpointer thread begins a checkpoint
 * Checkpointer thread calls {{getPageForCheckpoint()}}, which will copy page 
content *and clear dirty flag*
 * Page replacement tries to find a page for replacement and chooses this page, 
the page is thrown away
 * Before the page is written back to the store, the page is acquired again.

As a result, an older copy of the page is brought back to memory, which causes 
all kinds of corruption exceptions and assertions.

The attached unit test demonstrates the issue. It is likely that all baselines 
are affected, starting from 2.4.

As a part of this ticket, we must add more unit tests for the checkpointing 
protocol invariants we rely on.
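The interleaving above can be modeled with a few variables. This is a toy sketch of the race, not Ignite's page memory code; all names are invented:

```java
public class PageReplacementRace {
    /** Models the race; returns the page content visible in memory at the end. */
    static int raceOutcome() {
        int storeContent = 1;      // page content on disk (old version)
        Integer inMemory = 2;      // newer, dirty in-memory content
        boolean dirty = true;

        // Checkpointer: copy the page content and clear the dirty flag.
        int checkpointCopy = inMemory;
        dirty = false;

        // Replacement: the page now looks clean, so it is thrown away
        // without being written back.
        if (!dirty)
            inMemory = null;

        // The page is acquired again BEFORE the checkpoint writes its copy:
        // stale content is reloaded from the store.
        if (inMemory == null)
            inMemory = storeContent;

        // Checkpoint finally writes its copy (too late for the reader above).
        storeContent = checkpointCopy;

        return inMemory; // 1: the stale version; update 2 was lost in memory
    }

    public static void main(String[] args) {
        System.out.println(raceOutcome()); // 1
    }
}
```

The essence is that clearing the dirty flag before the checkpoint write makes the page eligible for replacement during the window where disk still holds the old version.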





[jira] [Commented] (IGNITE-12057) Persistence files are stored to temp dir

2019-08-16 Thread Dmitriy Govorukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16908994#comment-16908994
 ] 

Dmitriy Govorukhin commented on IGNITE-12057:
-

Merged to master and ignite-2.7.6 

> Persistence files are stored to temp dir
> 
>
> Key: IGNITE-12057
> URL: https://issues.apache.org/jira/browse/IGNITE-12057
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Anton Kalashnikov
>Priority: Critical
> Fix For: 2.7.6
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> h2. Description
> Check this thread:
> [https://stackoverflow.com/questions/56951913/ignite-persistent-schema-tables-disappeared-sometimes/56977212#56977212]
> This prospect almost dropped us because the company couldn't figure out why 
> persistence files disappear upon restarts. They had turned off the WARN 
> logging level and so couldn't see our warning saying that the files are 
> written to such a directory.
> I've updated Ignite docs:
> [https://apacheignite.readme.io/docs/distributed-persistent-store#section-persistence-path-management]





[jira] [Commented] (IGNITE-10808) Discovery message queue may build up with TcpDiscoveryMetricsUpdateMessage

2019-08-14 Thread Dmitriy Govorukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-10808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16907346#comment-16907346
 ] 

Dmitriy Govorukhin commented on IGNITE-10808:
-

[~sergey-chugunov] Could you please help with the review?

> Discovery message queue may build up with TcpDiscoveryMetricsUpdateMessage
> --
>
> Key: IGNITE-10808
> URL: https://issues.apache.org/jira/browse/IGNITE-10808
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.7
>Reporter: Stanislav Lukyanov
>Assignee: Denis Mekhanikov
>Priority: Major
>  Labels: discovery
> Fix For: 2.8
>
> Attachments: IgniteMetricsOverflowTest.java
>
>
> A node receives a new metrics update message every `metricsUpdateFrequency` 
> milliseconds, and the message will be put at the top of the queue (because it 
> is a high priority message).
> If processing one message takes more than `metricsUpdateFrequency`, then 
> multiple `TcpDiscoveryMetricsUpdateMessage`s will be in the queue. A long 
> enough delay (e.g. caused by a network glitch or GC) may lead to the queue 
> building up tens of metrics update messages which are essentially useless to 
> process. Finally, if processing a message on average takes a little more 
> than `metricsUpdateFrequency` (even for a relatively short period of time, 
> say, for a minute due to network issues), then the message worker will end up 
> processing only the metrics updates and the cluster will essentially hang.
> Reproducer is attached. In the test, the queue first builds up and is then 
> torn down very slowly, causing "Failed to wait for PME" messages.
> Need to change ServerImpl's SocketReader so that it does not put another 
> metrics update message at the top of the queue if one is already there (or 
> replaces the one at the top with the new one).





[jira] [Commented] (IGNITE-5714) Implementation of suspend/resume for pessimistic transactions

2019-08-13 Thread Dmitriy Govorukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-5714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905979#comment-16905979
 ] 

Dmitriy Govorukhin commented on IGNITE-5714:


[~alex_pl] Why was the issue resolved without a fix version?

> Implementation of suspend/resume for pessimistic transactions
> -
>
> Key: IGNITE-5714
> URL: https://issues.apache.org/jira/browse/IGNITE-5714
> Project: Ignite
>  Issue Type: Sub-task
>  Components: general
>Reporter: Alexey Kuznetsov
>Assignee: Aleksey Plekhanov
>Priority: Major
>  Labels: iep-34
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> Support transaction suspend()\resume() operations for pessimistic 
> transactions. Resume can be called in another thread.
>_+But there is a problem+_: Imagine we started a pessimistic transaction in 
> thread T1 and then performed a put operation, which leads to sending a 
> GridDistributedLockRequest to another node. The lock request contains the 
> thread id of the transaction. If we then call suspend and resume in another 
> thread, we must also send messages to other nodes to change the thread id. 
> This seems like a complicated task; it's better to get rid of sending the 
> thread id to the nodes.
> We can use the transaction xid on other nodes instead of the thread id. The 
> xid is sent to nodes in GridDistributedLockRequest#nearXidVer
>_+Proposed solution+_: On remote nodes, instead of the thread id of the near 
> transaction GridDistributedLockRequest#threadId, use its xid 
> GridDistributedLockRequest#nearXidVer.
> Remove usages of the near transaction's thread id on remote nodes.
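The proposed solution amounts to keying lock ownership by the transaction xid rather than the thread id, so resuming in another thread changes nothing on remote nodes. A hedged sketch with invented types (this is not Ignite's GridDistributedLockRequest handling):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public class LockOwnership {
    // Ownership tracked per transaction xid: stable across threads, unlike
    // a thread-id key, which would break when the tx resumes in a new thread.
    static final Map<String, UUID> lockOwnerByKey = new HashMap<>();

    /** Acquire (or re-enter) the lock on behalf of the near tx's xid. */
    static boolean acquire(String key, UUID nearXidVer) {
        UUID owner = lockOwnerByKey.putIfAbsent(key, nearXidVer);
        return owner == null || owner.equals(nearXidVer);
    }

    public static void main(String[] args) {
        UUID xid = UUID.randomUUID();

        // Thread T1 acquires the lock under the tx xid.
        System.out.println(acquire("k", xid));  // true

        // After suspend/resume, thread T2 re-enters with the SAME xid:
        // ownership still matches; no remote thread-id rewrite is needed.
        System.out.println(acquire("k", xid));  // true
    }
}
```

A different transaction (different xid) attempting `acquire("k", …)` is rejected, which is exactly the property the thread-id key provided, now without coupling ownership to a thread.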





[jira] [Commented] (IGNITE-12060) Incorrect row size calculation, lead to tree corruption

2019-08-12 Thread Dmitriy Govorukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-12060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905113#comment-16905113
 ] 

Dmitriy Govorukhin commented on IGNITE-12060:
-

Merged to master.

 

> Incorrect row size calculation, lead to tree corruption
> ---
>
> Key: IGNITE-12060
> URL: https://issues.apache.org/jira/browse/IGNITE-12060
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Critical
> Fix For: 2.8
>
>
> We do not correctly calculate the old and new row sizes when checking for an 
> in-place update. One of them may include cacheId while the other does not. 
> The size depends on whether the cache is in a shared group.
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.CacheDataStoreImpl#canUpdateOldRow
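The consistency problem behind the check can be sketched as follows. This is a hypothetical illustration of size normalization; `canUpdateInPlace`, `normalize`, and `CACHE_ID_SIZE` are invented names, not Ignite's actual `canUpdateOldRow` logic:

```java
public class RowSizeCheck {
    static final int CACHE_ID_SIZE = 4; // illustrative cacheId overhead in bytes

    /** Normalize a size so both sides consistently include or exclude cacheId. */
    static int normalize(int size, boolean includesCacheId, boolean sharedGroup) {
        if (sharedGroup && !includesCacheId)
            return size + CACHE_ID_SIZE;
        if (!sharedGroup && includesCacheId)
            return size - CACHE_ID_SIZE;
        return size;
    }

    /** In-place update is allowed only if the new row fits in the old slot. */
    static boolean canUpdateInPlace(int oldSize, boolean oldHasCacheId,
                                    int newSize, boolean newHasCacheId,
                                    boolean sharedGroup) {
        return normalize(newSize, newHasCacheId, sharedGroup)
            <= normalize(oldSize, oldHasCacheId, sharedGroup);
    }

    public static void main(String[] args) {
        // Same logical row size, but only one side counted cacheId: a raw
        // comparison (40 <= 36) wrongly fails; the normalized one succeeds.
        System.out.println(canUpdateInPlace(36, false, 40, true, true)); // true
    }
}
```

Comparing sizes computed under different cacheId conventions is what lets a too-large row be written in place, corrupting the tree.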





[jira] [Updated] (IGNITE-12060) Incorrect row size calculation, lead to tree corruption

2019-08-12 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-12060:

Summary: Incorrect row size calculation, lead to tree corruption  (was: 
Incorrect row size calculation, lead to tree corruption.)

> Incorrect row size calculation, lead to tree corruption
> ---
>
> Key: IGNITE-12060
> URL: https://issues.apache.org/jira/browse/IGNITE-12060
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Critical
> Fix For: 2.8
>
>
> We do not correctly calculate the old and new row sizes when checking for an 
> in-place update. One of them may include cacheId while the other does not. 
> The size depends on whether the cache is in a shared group.
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.CacheDataStoreImpl#canUpdateOldRow





[jira] [Updated] (IGNITE-12060) Incorrect row size calculation, lead to tree corruption.

2019-08-12 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-12060:

Description: 
We do not correctly calculate the old and new row sizes when checking for an 
in-place update. One of them may include cacheId while the other does not. The 
size depends on whether the cache is in a shared group.

org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.CacheDataStoreImpl#canUpdateOldRow

  was:
We do not correctly calculate old row size and new row size for check in-place 
update. One of them may include cacheId but other not.

org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.CacheDataStoreImpl#canUpdateOldRow


> Incorrect row size calculation, lead to tree corruption.
> 
>
> Key: IGNITE-12060
> URL: https://issues.apache.org/jira/browse/IGNITE-12060
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Critical
> Fix For: 2.8
>
>
> We do not correctly calculate the old and new row sizes when checking for an 
> in-place update. One of them may include cacheId while the other does not. 
> The size depends on whether the cache is in a shared group.
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.CacheDataStoreImpl#canUpdateOldRow





[jira] [Updated] (IGNITE-12060) Incorrect row size calculation, lead to tree corruption.

2019-08-12 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-12060:

Ignite Flags:   (was: Docs Required)

> Incorrect row size calculation, lead to tree corruption.
> 
>
> Key: IGNITE-12060
> URL: https://issues.apache.org/jira/browse/IGNITE-12060
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Critical
> Fix For: 2.8
>
>
> We do not correctly calculate the old row size and the new row size when 
> checking whether an in-place update is possible. One of them may include the 
> cacheId while the other does not.





[jira] [Updated] (IGNITE-12060) Incorrect row size calculation, lead to tree corruption.

2019-08-12 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-12060:

Description: 
We do not correctly calculate the old row size and the new row size when checking 
whether an in-place update is possible. One of them may include the cacheId while 
the other does not.

org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.CacheDataStoreImpl#canUpdateOldRow

  was:We do not correctly calculate old row size and new row size for check 
in-place update. One of them may include cacheId but other not.


> Incorrect row size calculation, lead to tree corruption.
> 
>
> Key: IGNITE-12060
> URL: https://issues.apache.org/jira/browse/IGNITE-12060
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Critical
> Fix For: 2.8
>
>
> We do not correctly calculate the old row size and the new row size when 
> checking whether an in-place update is possible. One of them may include the 
> cacheId while the other does not.
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.CacheDataStoreImpl#canUpdateOldRow





[jira] [Updated] (IGNITE-12060) Incorrect row size calculation, lead to tree corruption.

2019-08-12 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-12060:

Description: We do not correctly calculate the old row size and the new row size 
when checking whether an in-place update is possible. One of them may include the 
cacheId while the other does not.

> Incorrect row size calculation, lead to tree corruption.
> 
>
> Key: IGNITE-12060
> URL: https://issues.apache.org/jira/browse/IGNITE-12060
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Critical
> Fix For: 2.8
>
>
> We do not correctly calculate the old row size and the new row size when 
> checking whether an in-place update is possible. One of them may include the 
> cacheId while the other does not.





[jira] [Created] (IGNITE-12060) Incorrect row size calculation, lead to tree corruption.

2019-08-12 Thread Dmitriy Govorukhin (JIRA)
Dmitriy Govorukhin created IGNITE-12060:
---

 Summary: Incorrect row size calculation, lead to tree corruption.
 Key: IGNITE-12060
 URL: https://issues.apache.org/jira/browse/IGNITE-12060
 Project: Ignite
  Issue Type: Bug
Reporter: Dmitriy Govorukhin
Assignee: Dmitriy Govorukhin
 Fix For: 2.8








[jira] [Updated] (IGNITE-12059) DiskPageCompressionConfigValidationTest.testIncorrectStaticCacheConfiguration fails

2019-08-12 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-12059:

Fix Version/s: 2.8

> DiskPageCompressionConfigValidationTest.testIncorrectStaticCacheConfiguration 
> fails
> ---
>
> Key: IGNITE-12059
> URL: https://issues.apache.org/jira/browse/IGNITE-12059
> Project: Ignite
>  Issue Type: Bug
>Reporter: Eduard Shangareev
>Assignee: Eduard Shangareev
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> DiskPageCompressionConfigValidationTest.testIncorrectStaticCacheConfiguration 
> fails because the validation was removed in IGNITE-9562.
> We need to restore this validation.





[jira] [Created] (IGNITE-12057) Persistence files are stored to temp dir

2019-08-10 Thread Dmitriy Govorukhin (JIRA)
Dmitriy Govorukhin created IGNITE-12057:
---

 Summary: Persistence files are stored to temp dir
 Key: IGNITE-12057
 URL: https://issues.apache.org/jira/browse/IGNITE-12057
 Project: Ignite
  Issue Type: Bug
Reporter: Dmitriy Govorukhin


h2. Description
Check this thread:
[https://stackoverflow.com/questions/56951913/ignite-persistent-schema-tables-disappeared-sometimes/56977212#56977212]

This prospect almost dropped us because the company could not figure out why 
persistence files disappeared upon restarts. They had turned off the WARN logging 
level and so could not see our warning saying that the files are written to a 
temporary directory.

I've updated Ignite docs:
[https://apacheignite.readme.io/docs/distributed-persistent-store#section-persistence-path-management]





[jira] [Updated] (IGNITE-12057) Persistence files are stored to temp dir

2019-08-10 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-12057:

Fix Version/s: 2.8

> Persistence files are stored to temp dir
> 
>
> Key: IGNITE-12057
> URL: https://issues.apache.org/jira/browse/IGNITE-12057
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Priority: Critical
> Fix For: 2.8
>
>
> h2. Description
> Check this thread:
> [https://stackoverflow.com/questions/56951913/ignite-persistent-schema-tables-disappeared-sometimes/56977212#56977212]
> This prospect almost dropped us because the company could not figure out why 
> persistence files disappeared upon restarts. They had turned off the WARN 
> logging level and so could not see our warning saying that the files are 
> written to a temporary directory.
> I've updated Ignite docs:
> [https://apacheignite.readme.io/docs/distributed-persistent-store#section-persistence-path-management]





[jira] [Commented] (IGNITE-12048) Bugs & tests fixes

2019-08-08 Thread Dmitriy Govorukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-12048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902938#comment-16902938
 ] 

Dmitriy Govorukhin commented on IGNITE-12048:
-

Looks like it is the same test as in master: [link to 
test|https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-1194594567102543889=testDetails].
 The test name is now generated from the parameters.

> Bugs & tests fixes
> --
>
> Key: IGNITE-12048
> URL: https://issues.apache.org/jira/browse/IGNITE-12048
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Page replacement can reload invalid page during checkpoint
> There is a race between {{writeCheckpointPages}} and page replacement process:
>  * Checkpointer thread begins a checkpoint
>  * Checkpointer thread calls {{getPageForCheckpoint()}}, which will copy page 
> content *and clear dirty flag*
>  * Page replacement tries to find a page for replacement and chooses this 
> page, the page is thrown away
>  * Before the page is written back to the store, the page is acquired again.
> As a result, an older copy of the page is brought back to memory, which 
> causes all kinds of corruption exceptions and assertions.
> 
> checkpointReadLock() may hang during node stop
> I got this hang during one of PDS (Indexing) runs (thread-dump is attached). 
>  The following code hung:
> {code:java}
> checkpointer.wakeupForCheckpoint(0, "too many dirty pages").cpBeginFut
> .getUninterruptibly();
> {code}
> It looks like {{wakeupForCheckpoint}} can be called after the checkpointer is 
> stopped, in which case {{cpBeginFut}} will never be completed.
> 
> Fixed 
> ZookeeperDiscoveryCommunicationFailureTest.testCommunicationFailureResolve_CachesInfo1
> Fixed  *.testFailAfterStart
> Reduced test execution time (scale factor for long-running tests)





[jira] [Comment Edited] (IGNITE-12048) Bugs & tests fixes

2019-08-08 Thread Dmitriy Govorukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-12048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902938#comment-16902938
 ] 

Dmitriy Govorukhin edited comment on IGNITE-12048 at 8/8/19 12:50 PM:
--

Looks like it is the same test as in master 
[https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-1194594567102543889=testDetails].
 The test name is now generated from the parameters.


was (Author: dmitriygovorukhin):
Looks like it is the same test as in master [link to 
test|[https://ci.ignite.apache.org/project.html?projectId=IgniteTests24Java8=-1194594567102543889=testDetails]].
 Now name generated by parameters.

> Bugs & tests fixes
> --
>
> Key: IGNITE-12048
> URL: https://issues.apache.org/jira/browse/IGNITE-12048
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Page replacement can reload invalid page during checkpoint
> There is a race between {{writeCheckpointPages}} and page replacement process:
>  * Checkpointer thread begins a checkpoint
>  * Checkpointer thread calls {{getPageForCheckpoint()}}, which will copy page 
> content *and clear dirty flag*
>  * Page replacement tries to find a page for replacement and chooses this 
> page, the page is thrown away
>  * Before the page is written back to the store, the page is acquired again.
> As a result, an older copy of the page is brought back to memory, which 
> causes all kinds of corruption exceptions and assertions.
> 
> checkpointReadLock() may hang during node stop
> I got this hang during one of PDS (Indexing) runs (thread-dump is attached). 
>  The following code hung:
> {code:java}
> checkpointer.wakeupForCheckpoint(0, "too many dirty pages").cpBeginFut
> .getUninterruptibly();
> {code}
> It looks like {{wakeupForCheckpoint}} can be called after the checkpointer is 
> stopped, in which case {{cpBeginFut}} will never be completed.
> 
> Fixed 
> ZookeeperDiscoveryCommunicationFailureTest.testCommunicationFailureResolve_CachesInfo1
> Fixed  *.testFailAfterStart
> Reduced test execution time (scale factor for long-running tests)





[jira] [Updated] (IGNITE-12048) Bugs & tests fixes

2019-08-07 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-12048:

Description: 
Page replacement can reload invalid page during checkpoint

There is a race between {{writeCheckpointPages}} and page replacement process:
 * Checkpointer thread begins a checkpoint
 * Checkpointer thread calls {{getPageForCheckpoint()}}, which will copy page 
content *and clear dirty flag*
 * Page replacement tries to find a page for replacement and chooses this page, 
the page is thrown away
 * Before the page is written back to the store, the page is acquired again.

As a result, an older copy of the page is brought back to memory, which causes 
all kinds of corruption exceptions and assertions.

checkpointReadLock() may hang during node stop

I got this hang during one of PDS (Indexing) runs (thread-dump is attached). 
 The following code hung:
{code:java}
checkpointer.wakeupForCheckpoint(0, "too many dirty pages").cpBeginFut
.getUninterruptibly();
{code}
It looks like {{wakeupForCheckpoint}} can be called after the checkpointer is 
stopped, in which case {{cpBeginFut}} will never be completed.

Fixed 
ZookeeperDiscoveryCommunicationFailureTest.testCommunicationFailureResolve_CachesInfo1

Fixed  *.testFailAfterStart

Reduced test execution time (scale factor for long-running tests)

  was:
Page replacement can reload invalid page during checkpoint

There is a race between {{writeCheckpointPages}} and page replacement process:
 * Checkpointer thread begins a checkpoint
 * Checkpointer thread calls {{getPageForCheckpoint()}}, which will copy page 
content *and clear dirty flag*
 * Page replacement tries to find a page for replacement and chooses this page, 
the page is thrown away
 * Before the page is written back to the store, the page is acquired again.

As a result, an older copy of the page is brought back to memory, which causes 
all kinds of corruption exceptions and assertions.

checkpointReadLock() may hang during node stop

I got this hang during one of PDS (Indexing) runs (thread-dump is attached). 
 The following code hang:
{code:java}
checkpointer.wakeupForCheckpoint(0, "too many dirty pages").cpBeginFut
.getUninterruptibly();
{code}
It looks like {{wakeupForCheckpoint}} can be called after the checkpointer is 
stopped and {{cpBeginFut}} will be never completed.

Fixed 
ZookeeperDiscoveryCommunicationFailureTest.testCommunicationFailureResolve_CachesInfo1

Fixed  *.testFailAfterStart


> Bugs & tests fixes
> --
>
> Key: IGNITE-12048
> URL: https://issues.apache.org/jira/browse/IGNITE-12048
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Major
> Fix For: 2.8
>
>
> Page replacement can reload invalid page during checkpoint
> There is a race between {{writeCheckpointPages}} and page replacement process:
>  * Checkpointer thread begins a checkpoint
>  * Checkpointer thread calls {{getPageForCheckpoint()}}, which will copy page 
> content *and clear dirty flag*
>  * Page replacement tries to find a page for replacement and chooses this 
> page, the page is thrown away
>  * Before the page is written back to the store, the page is acquired again.
> As a result, an older copy of the page is brought back to memory, which 
> causes all kinds of corruption exceptions and assertions.
> 
> checkpointReadLock() may hang during node stop
> I got this hang during one of PDS (Indexing) runs (thread-dump is attached). 
>  The following code hung:
> {code:java}
> checkpointer.wakeupForCheckpoint(0, "too many dirty pages").cpBeginFut
> .getUninterruptibly();
> {code}
> It looks like {{wakeupForCheckpoint}} can be called after the checkpointer is 
> stopped, in which case {{cpBeginFut}} will never be completed.
> 
> Fixed 
> ZookeeperDiscoveryCommunicationFailureTest.testCommunicationFailureResolve_CachesInfo1
> Fixed  *.testFailAfterStart
> Reduced test execution time (scale factor for long-running tests)





[jira] [Updated] (IGNITE-12048) Bugs & tests fixes

2019-08-07 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-12048:

Description: 
Page replacement can reload invalid page during checkpoint

There is a race between {{writeCheckpointPages}} and page replacement process:
 * Checkpointer thread begins a checkpoint
 * Checkpointer thread calls {{getPageForCheckpoint()}}, which will copy page 
content *and clear dirty flag*
 * Page replacement tries to find a page for replacement and chooses this page, 
the page is thrown away
 * Before the page is written back to the store, the page is acquired again.

As a result, an older copy of the page is brought back to memory, which causes 
all kinds of corruption exceptions and assertions.

checkpointReadLock() may hang during node stop

I got this hang during one of PDS (Indexing) runs (thread-dump is attached). 
 The following code hung:
{code:java}
checkpointer.wakeupForCheckpoint(0, "too many dirty pages").cpBeginFut
.getUninterruptibly();
{code}
It looks like {{wakeupForCheckpoint}} can be called after the checkpointer is 
stopped, in which case {{cpBeginFut}} will never be completed.

Fixed 
ZookeeperDiscoveryCommunicationFailureTest.testCommunicationFailureResolve_CachesInfo1

Fixed  *.testFailAfterStart

  was:
Page replacement can reload invalid page during checkpoint

There is a race between {{writeCheckpointPages}} and page replacement process:
 * Checkpointer thread begins a checkpoint
 * Checkpointer thread calls {{getPageForCheckpoint()}}, which will copy page 
content *and clear dirty flag*
 * Page replacement tries to find a page for replacement and chooses this page, 
the page is thrown away
 * Before the page is written back to the store, the page is acquired again.

As a result, an older copy of the page is brought back to memory, which causes 
all kinds of corruption exceptions and assertions.

-

checkpointReadLock() may hang during node stop

I got this hang during one of PDS (Indexing) runs (thread-dump is attached). 
The following code hang:
{code:java}
checkpointer.wakeupForCheckpoint(0, "too many dirty pages").cpBeginFut
.getUninterruptibly();
{code}
It looks like {{wakeupForCheckpoint}} can be called after the checkpointer is 
stopped and {{cpBeginFut}} will be never completed.

-

Fixed 
ZookeeperDiscoveryCommunicationFailureTest.testCommunicationFailureResolve_CachesInfo1

Fixed  *.testFailAfterStart


> Bugs & tests fixes
> --
>
> Key: IGNITE-12048
> URL: https://issues.apache.org/jira/browse/IGNITE-12048
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Priority: Major
>
> Page replacement can reload invalid page during checkpoint
> There is a race between {{writeCheckpointPages}} and page replacement process:
>  * Checkpointer thread begins a checkpoint
>  * Checkpointer thread calls {{getPageForCheckpoint()}}, which will copy page 
> content *and clear dirty flag*
>  * Page replacement tries to find a page for replacement and chooses this 
> page, the page is thrown away
>  * Before the page is written back to the store, the page is acquired again.
> As a result, an older copy of the page is brought back to memory, which 
> causes all kinds of corruption exceptions and assertions.
> 
> checkpointReadLock() may hang during node stop
> I got this hang during one of PDS (Indexing) runs (thread-dump is attached). 
>  The following code hung:
> {code:java}
> checkpointer.wakeupForCheckpoint(0, "too many dirty pages").cpBeginFut
> .getUninterruptibly();
> {code}
> It looks like {{wakeupForCheckpoint}} can be called after the checkpointer is 
> stopped, in which case {{cpBeginFut}} will never be completed.
> 
> Fixed 
> ZookeeperDiscoveryCommunicationFailureTest.testCommunicationFailureResolve_CachesInfo1
> Fixed  *.testFailAfterStart





[jira] [Created] (IGNITE-12048) Bugs & tests fixes

2019-08-07 Thread Dmitriy Govorukhin (JIRA)
Dmitriy Govorukhin created IGNITE-12048:
---

 Summary: Bugs & tests fixes
 Key: IGNITE-12048
 URL: https://issues.apache.org/jira/browse/IGNITE-12048
 Project: Ignite
  Issue Type: Bug
Reporter: Dmitriy Govorukhin


Page replacement can reload invalid page during checkpoint

There is a race between {{writeCheckpointPages}} and page replacement process:
 * Checkpointer thread begins a checkpoint
 * Checkpointer thread calls {{getPageForCheckpoint()}}, which will copy page 
content *and clear dirty flag*
 * Page replacement tries to find a page for replacement and chooses this page, 
the page is thrown away
 * Before the page is written back to the store, the page is acquired again.

As a result, an older copy of the page is brought back to memory, which causes 
all kinds of corruption exceptions and assertions.
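The interleaving above can be modeled with a deliberately simplified sketch (illustrative names only, not Ignite code): once the checkpointer clears the dirty flag, the replacement logic treats the page as clean and evicts it without a write-back, so a read that happens before the checkpoint copy reaches the store brings back stale content.

```java
// Deliberately simplified model of the checkpoint / page-replacement race;
// illustrative names only, not Ignite code.
public class ReplacementRaceSketch {
    public static class Page {
        public int content;     // in-memory content
        public boolean dirty;   // true when content differs from the store
    }

    // Page replacement evicts clean pages without a write-back, assuming
    // their content already matches the store.
    public static boolean canEvictWithoutWriteBack(Page p) {
        return !p.dirty;
    }

    public static void main(String[] args) {
        int store = 1;          // content as persisted on disk

        Page page = new Page();
        page.content = 2;       // updated in memory
        page.dirty = true;

        // 1. Checkpointer copies the page content and clears the dirty flag;
        //    the copy will be written to the store only later.
        int checkpointCopy = page.content;
        page.dirty = false;

        // 2. Replacement now sees a "clean" page and throws it away.
        boolean evicted = canEvictWithoutWriteBack(page);

        // 3. The page is acquired again before the checkpoint write-back,
        //    so it is re-read from the store: stale content comes back.
        int reloaded = store;

        System.out.println(evicted);                    // true
        System.out.println(reloaded == checkpointCopy); // false: stale copy
    }
}
```

The sketch shows why clearing the dirty flag before the copy is persisted makes the "clean pages match the store" assumption unsafe.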

-

checkpointReadLock() may hang during node stop

I got this hang during one of PDS (Indexing) runs (thread-dump is attached). 
The following code hung:
{code:java}
checkpointer.wakeupForCheckpoint(0, "too many dirty pages").cpBeginFut
.getUninterruptibly();
{code}
It looks like {{wakeupForCheckpoint}} can be called after the checkpointer is 
stopped, in which case {{cpBeginFut}} will never be completed.
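One hedged sketch of the fix idea, assuming a stopping flag is visible to callers (names are illustrative, not Ignite's actual API): check the flag before waiting on the checkpoint-begin future, since a stopped checkpointer will never complete it.

```java
// Hedged sketch of one way to avoid the hang: bail out instead of waiting
// on cpBeginFut when the checkpointer is already stopping. Names are
// illustrative, not Ignite's actual API.
import java.util.concurrent.CompletableFuture;

public class CheckpointWakeupSketch {
    public static volatile boolean checkpointerStopping = false;
    public static final CompletableFuture<Void> cpBeginFut = new CompletableFuture<>();

    public static void waitForCheckpointBegin() {
        // A stopped checkpointer will never complete cpBeginFut, so a caller
        // that waits unconditionally would hang forever.
        if (checkpointerStopping)
            return;
        cpBeginFut.join();
    }

    public static void main(String[] args) {
        checkpointerStopping = true;
        waitForCheckpointBegin(); // returns immediately instead of hanging
        System.out.println("done");
    }
}
```

A production fix would also need to handle the flag being set between the check and the wait, e.g. by completing the future exceptionally on stop.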

-

Fixed 
ZookeeperDiscoveryCommunicationFailureTest.testCommunicationFailureResolve_CachesInfo1

Fixed  *.testFailAfterStart





[jira] [Updated] (IGNITE-12048) Bugs & tests fixes

2019-08-07 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-12048:

Fix Version/s: 2.8

> Bugs & tests fixes
> --
>
> Key: IGNITE-12048
> URL: https://issues.apache.org/jira/browse/IGNITE-12048
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Major
> Fix For: 2.8
>
>
> Page replacement can reload invalid page during checkpoint
> There is a race between {{writeCheckpointPages}} and page replacement process:
>  * Checkpointer thread begins a checkpoint
>  * Checkpointer thread calls {{getPageForCheckpoint()}}, which will copy page 
> content *and clear dirty flag*
>  * Page replacement tries to find a page for replacement and chooses this 
> page, the page is thrown away
>  * Before the page is written back to the store, the page is acquired again.
> As a result, an older copy of the page is brought back to memory, which 
> causes all kinds of corruption exceptions and assertions.
> 
> checkpointReadLock() may hang during node stop
> I got this hang during one of PDS (Indexing) runs (thread-dump is attached). 
>  The following code hung:
> {code:java}
> checkpointer.wakeupForCheckpoint(0, "too many dirty pages").cpBeginFut
> .getUninterruptibly();
> {code}
> It looks like {{wakeupForCheckpoint}} can be called after the checkpointer is 
> stopped, in which case {{cpBeginFut}} will never be completed.
> 
> Fixed 
> ZookeeperDiscoveryCommunicationFailureTest.testCommunicationFailureResolve_CachesInfo1
> Fixed  *.testFailAfterStart





[jira] [Assigned] (IGNITE-12048) Bugs & tests fixes

2019-08-07 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin reassigned IGNITE-12048:
---

Assignee: Dmitriy Govorukhin

> Bugs & tests fixes
> --
>
> Key: IGNITE-12048
> URL: https://issues.apache.org/jira/browse/IGNITE-12048
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Major
>
> Page replacement can reload invalid page during checkpoint
> There is a race between {{writeCheckpointPages}} and page replacement process:
>  * Checkpointer thread begins a checkpoint
>  * Checkpointer thread calls {{getPageForCheckpoint()}}, which will copy page 
> content *and clear dirty flag*
>  * Page replacement tries to find a page for replacement and chooses this 
> page, the page is thrown away
>  * Before the page is written back to the store, the page is acquired again.
> As a result, an older copy of the page is brought back to memory, which 
> causes all kinds of corruption exceptions and assertions.
> 
> checkpointReadLock() may hang during node stop
> I got this hang during one of PDS (Indexing) runs (thread-dump is attached). 
>  The following code hang:
> {code:java}
> checkpointer.wakeupForCheckpoint(0, "too many dirty pages").cpBeginFut
> .getUninterruptibly();
> {code}
> It looks like {{wakeupForCheckpoint}} can be called after the checkpointer is 
> stopped, in which case {{cpBeginFut}} will never be completed.
> 
> Fixed 
> ZookeeperDiscoveryCommunicationFailureTest.testCommunicationFailureResolve_CachesInfo1
> Fixed  *.testFailAfterStart





[jira] [Updated] (IGNITE-12048) Bugs & tests fixes

2019-08-07 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-12048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-12048:

Ignite Flags:   (was: Docs Required)

> Bugs & tests fixes
> --
>
> Key: IGNITE-12048
> URL: https://issues.apache.org/jira/browse/IGNITE-12048
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Major
> Fix For: 2.8
>
>
> Page replacement can reload invalid page during checkpoint
> There is a race between {{writeCheckpointPages}} and page replacement process:
>  * Checkpointer thread begins a checkpoint
>  * Checkpointer thread calls {{getPageForCheckpoint()}}, which will copy page 
> content *and clear dirty flag*
>  * Page replacement tries to find a page for replacement and chooses this 
> page, the page is thrown away
>  * Before the page is written back to the store, the page is acquired again.
> As a result, an older copy of the page is brought back to memory, which 
> causes all kinds of corruption exceptions and assertions.
> 
> checkpointReadLock() may hang during node stop
> I got this hang during one of PDS (Indexing) runs (thread-dump is attached). 
>  The following code hung:
> {code:java}
> checkpointer.wakeupForCheckpoint(0, "too many dirty pages").cpBeginFut
> .getUninterruptibly();
> {code}
> It looks like {{wakeupForCheckpoint}} can be called after the checkpointer is 
> stopped, in which case {{cpBeginFut}} will never be completed.
> 
> Fixed 
> ZookeeperDiscoveryCommunicationFailureTest.testCommunicationFailureResolve_CachesInfo1
> Fixed  *.testFailAfterStart





[jira] [Commented] (IGNITE-12006) Threads may be parked for indefinite time during throttling after spurious wakeups

2019-07-26 Thread Dmitriy Govorukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-12006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893692#comment-16893692
 ] 

Dmitriy Govorukhin commented on IGNITE-12006:
-

Other changes look good.
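For context on the issue title, a deadline-based parking loop that tolerates spurious wakeups can be sketched as follows (illustrative names, not Ignite's actual throttling implementation): recomputing the remaining time after each wakeup bounds the total park time by the original timeout.

```java
// Sketch of parking with a fixed deadline so that spurious wakeups cannot
// extend the total park time. Illustrative only, not Ignite's actual code.
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

public class ThrottleParkSketch {
    // Park the current thread for up to timeoutMs, re-parking after
    // spurious wakeups until the original deadline is reached.
    public static void parkWithDeadline(long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        long remaining;
        while ((remaining = deadline - System.nanoTime()) > 0) {
            // A spurious wakeup returns early; recomputing `remaining`
            // keeps the total park time bounded by the original timeout.
            LockSupport.parkNanos(remaining);
            if (Thread.currentThread().isInterrupted())
                break;
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        parkWithDeadline(50);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(elapsedMs >= 50); // parked at least the full timeout
    }
}
```

The bug pattern being avoided is re-parking for the *full* timeout after each wakeup, which can keep a thread parked far longer than intended.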

> Threads may be parked for indefinite time during throttling after spurious 
> wakeups
> --
>
> Key: IGNITE-12006
> URL: https://issues.apache.org/jira/browse/IGNITE-12006
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Antonov
>Assignee: Sergey Antonov
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In the log we see the following behavior:
> {noformat}
> 2019-07-04 06:29:03.649[WARN 
> ][sys-#328%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#328%NODE%xyzGridNodeName% for timeout(ms)=16335
> 2019-07-04 06:29:03.649[WARN 
> ][sys-#326%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#326%NODE%xyzGridNodeName% for timeout(ms)=13438
> 2019-07-04 06:29:03.649[WARN 
> ][sys-#277%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#277%NODE%xyzGridNodeName% for timeout(ms)=11609
> 2019-07-04 06:29:03.649[WARN 
> ][sys-#331%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#331%NODE%xyzGridNodeName% for timeout(ms)=18009
> 2019-07-04 06:29:03.649[WARN 
> ][sys-#321%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#321%NODE%xyzGridNodeName% for timeout(ms)=15557
> 2019-07-04 06:29:03.650[WARN 
> ][sys-#307%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#307%NODE%xyzGridNodeName% for timeout(ms)=27938
> 2019-07-04 06:29:03.649[WARN 
> ][sys-#316%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#316%NODE%xyzGridNodeName% for timeout(ms)=12189
> 2019-07-04 06:29:03.649[WARN 
> ][sys-#311%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#311%NODE%xyzGridNodeName% for timeout(ms)=11056
> 2019-07-04 06:29:03.650[WARN 
> ][sys-#295%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#295%NODE%xyzGridNodeName% for timeout(ms)=20848
> 2019-07-04 06:29:03.649[WARN 
> ][sys-#290%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#290%NODE%xyzGridNodeName% for timeout(ms)=14816
> 2019-07-04 06:29:03.649[WARN 
> ][sys-#332%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#332%NODE%xyzGridNodeName% for timeout(ms)=14110
> 2019-07-04 06:29:03.649[WARN 
> ][sys-#298%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#298%NODE%xyzGridNodeName% for timeout(ms)=10028
> 2019-07-04 06:29:03.650[WARN 
> ][sys-#304%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#304%NODE%xyzGridNodeName% for timeout(ms)=19855
> 2019-07-04 06:29:03.650[WARN 
> ][sys-#331%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#331%NODE%xyzGridNodeName% for timeout(ms)=41277
> 2019-07-04 06:29:03.650[WARN 
> ][sys-#291%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#291%NODE%xyzGridNodeName% for timeout(ms)=17151
> 2019-07-04 06:29:03.650[WARN 
> ][sys-#308%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#308%NODE%xyzGridNodeName% for timeout(ms)=39312
> 2019-07-04 06:29:03.650[WARN 
> ][sys-#322%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#322%NODE%xyzGridNodeName% for timeout(ms)=43341
> 2019-07-04 06:29:03.650[WARN 
> ][sys-#306%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#306%NODE%xyzGridNodeName% for timeout(ms)=21890
> 2019-07-04 06:29:03.650[WARN 
> ][sys-#315%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#315%NODE%xyzGridNodeName% for timeout(ms)=18909
> 2019-07-04 06:29:03.650[WARN 
> ][sys-#321%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#321%NODE%xyzGridNodeName% for timeout(ms)=74129
> 2019-07-04 06:29:03.650[WARN 
> ][sys-#305%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#305%NODE%xyzGridNodeName% for timeout(ms)=26608
> 2019-07-04 06:29:03.650[WARN 
> ][sys-#309%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#309%NODE%xyzGridNodeName% for timeout(ms)=77835
> 2019-07-04 06:29:03.650[WARN 
> ][sys-#291%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#291%NODE%xyzGridNodeName% for timeout(ms)=90104
> 2019-07-04 06:29:03.650[WARN 
> 

[jira] [Commented] (IGNITE-12006) Threads may be parked for indefinite time during throttling after spurious wakeups

2019-07-26 Thread Dmitriy Govorukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-12006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16893691#comment-16893691
 ] 

Dmitriy Govorukhin commented on IGNITE-12006:
-

[~antonovsergey93] Looks like the 
org.apache.ignite.internal.processors.cache.persistence.pagemem.IgniteThrottlingUnitTest#wakeupThrottledThread
 method declares an unused exception, and there is also a code formatting issue in the first loop.

> Threads may be parked for indefinite time during throttling after spurious 
> wakeups
> --
>
> Key: IGNITE-12006
> URL: https://issues.apache.org/jira/browse/IGNITE-12006
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Antonov
>Assignee: Sergey Antonov
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> In the log we see the following behavior:
> {noformat}
> 2019-07-04 06:29:03.649[WARN 
> ][sys-#328%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#328%NODE%xyzGridNodeName% for timeout(ms)=16335
> 2019-07-04 06:29:03.649[WARN 
> ][sys-#326%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#326%NODE%xyzGridNodeName% for timeout(ms)=13438
> 2019-07-04 06:29:03.649[WARN 
> ][sys-#277%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#277%NODE%xyzGridNodeName% for timeout(ms)=11609
> 2019-07-04 06:29:03.649[WARN 
> ][sys-#331%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#331%NODE%xyzGridNodeName% for timeout(ms)=18009
> 2019-07-04 06:29:03.649[WARN 
> ][sys-#321%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#321%NODE%xyzGridNodeName% for timeout(ms)=15557
> 2019-07-04 06:29:03.650[WARN 
> ][sys-#307%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#307%NODE%xyzGridNodeName% for timeout(ms)=27938
> 2019-07-04 06:29:03.649[WARN 
> ][sys-#316%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#316%NODE%xyzGridNodeName% for timeout(ms)=12189
> 2019-07-04 06:29:03.649[WARN 
> ][sys-#311%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#311%NODE%xyzGridNodeName% for timeout(ms)=11056
> 2019-07-04 06:29:03.650[WARN 
> ][sys-#295%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#295%NODE%xyzGridNodeName% for timeout(ms)=20848
> 2019-07-04 06:29:03.649[WARN 
> ][sys-#290%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#290%NODE%xyzGridNodeName% for timeout(ms)=14816
> 2019-07-04 06:29:03.649[WARN 
> ][sys-#332%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#332%NODE%xyzGridNodeName% for timeout(ms)=14110
> 2019-07-04 06:29:03.649[WARN 
> ][sys-#298%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#298%NODE%xyzGridNodeName% for timeout(ms)=10028
> 2019-07-04 06:29:03.650[WARN 
> ][sys-#304%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#304%NODE%xyzGridNodeName% for timeout(ms)=19855
> 2019-07-04 06:29:03.650[WARN 
> ][sys-#331%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#331%NODE%xyzGridNodeName% for timeout(ms)=41277
> 2019-07-04 06:29:03.650[WARN 
> ][sys-#291%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#291%NODE%xyzGridNodeName% for timeout(ms)=17151
> 2019-07-04 06:29:03.650[WARN 
> ][sys-#308%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#308%NODE%xyzGridNodeName% for timeout(ms)=39312
> 2019-07-04 06:29:03.650[WARN 
> ][sys-#322%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#322%NODE%xyzGridNodeName% for timeout(ms)=43341
> 2019-07-04 06:29:03.650[WARN 
> ][sys-#306%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#306%NODE%xyzGridNodeName% for timeout(ms)=21890
> 2019-07-04 06:29:03.650[WARN 
> ][sys-#315%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#315%NODE%xyzGridNodeName% for timeout(ms)=18909
> 2019-07-04 06:29:03.650[WARN 
> ][sys-#321%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#321%NODE%xyzGridNodeName% for timeout(ms)=74129
> 2019-07-04 06:29:03.650[WARN 
> ][sys-#305%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#305%NODE%xyzGridNodeName% for timeout(ms)=26608
> 2019-07-04 06:29:03.650[WARN 
> ][sys-#309%NODE%xyzGridNodeName%][o.a.i.i.p.c.p.pagemem.PageMemoryImpl] 
> Parking thread=sys-#309%NODE%xyzGridNodeName% for timeout(ms)=77835
> 2019-07-04 06:29:03.650[WARN 
> 
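The failure mode above — a single LockSupport.parkNanos call returning early on a spurious wakeup while the caller treats it as a completed wait — is commonly fixed by re-parking against an absolute deadline. A minimal sketch (ThrottleParker is a hypothetical name, not Ignite's throttling API):

```java
import java.util.concurrent.locks.LockSupport;

/** Sketch of deadline-based parking that tolerates spurious wakeups. */
final class ThrottleParker {
    /**
     * Parks the current thread for approximately {@code timeoutNanos}.
     * A single {@link LockSupport#parkNanos(long)} call may return early
     * ("spuriously"), so the remaining time is recomputed and parking is
     * retried until the absolute deadline passes.
     */
    static void parkUntilDeadline(long timeoutNanos) {
        long deadline = System.nanoTime() + timeoutNanos;

        long remaining;
        // Loop: each wakeup re-checks how much of the timeout is actually left.
        while ((remaining = deadline - System.nanoTime()) > 0)
            LockSupport.parkNanos(remaining);
    }
}
```

A real throttler would also re-check its exit condition (e.g. checkpoint progress) after each wakeup, so that an intentional unpark is not immediately followed by another full park.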

[jira] [Commented] (IGNITE-11998) Fix DataPageScan for fragmented pages.

2019-07-20 Thread Dmitriy Govorukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889420#comment-16889420
 ] 

Dmitriy Govorukhin commented on IGNITE-11998:
-

Muted tests in commit 5b2948bfc2d78eff04eb4d30425c8d18d6e6e26b.

> Fix DataPageScan for fragmented pages.
> --
>
> Key: IGNITE-11998
> URL: https://issues.apache.org/jira/browse/IGNITE-11998
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Priority: Major
> Fix For: 2.8
>
>
> Fragmented pages crash the JVM when accessed by the DataPageScan 
> scanner/query-optimized scanner. It happens when the scanner accesses data in 
> a later chunk of a fragmented entry but treats it like the first one, 
> expecting a payload length that is absent and replaced with raw entry data.
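To make the failure mode concrete, here is a toy model (hypothetical names, not Ignite's actual page layout): only the head chunk of a fragmented entry carries the length prefix, so interpreting a non-head chunk's raw bytes as a length yields garbage and leads to out-of-bounds access.

```java
import java.util.List;

/** Toy model of a fragmented data entry; names are illustrative, not Ignite's page layout. */
final class FragmentedEntrySketch {
    /** One chunk of a fragmented row; only the head chunk starts with a length prefix. */
    record Chunk(boolean head, byte[] bytes) { }

    /**
     * Reads the payload length from the head chunk only. Treating a non-head
     * chunk as the head (the bug described above) would misinterpret raw entry
     * bytes as a length.
     */
    static int payloadLength(List<Chunk> chunks) {
        for (Chunk c : chunks) {
            if (c.head())
                return c.bytes()[0]; // toy stand-in for the length prefix
        }
        throw new IllegalStateException("no head chunk: raw fragment bytes are not a length");
    }
}
```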



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (IGNITE-11998) Fix DataPageScan for fragmented pages.

2019-07-20 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-11998:

Priority: Blocker  (was: Major)

> Fix DataPageScan for fragmented pages.
> --
>
> Key: IGNITE-11998
> URL: https://issues.apache.org/jira/browse/IGNITE-11998
> Project: Ignite
>  Issue Type: Bug
>Reporter: Ivan Bessonov
>Priority: Blocker
> Fix For: 2.8
>
>
> Fragmented pages crash the JVM when accessed by the DataPageScan 
> scanner/query-optimized scanner. It happens when the scanner accesses data in 
> a later chunk of a fragmented entry but treats it like the first one, 
> expecting a payload length that is absent and replaced with raw entry data.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (IGNITE-11953) BTree corruption caused by byte array values

2019-07-17 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin resolved IGNITE-11953.
-
Resolution: Fixed

Fixed in IGNITE-11982

> BTree corruption caused by byte array values
> 
>
> Key: IGNITE-11953
> URL: https://issues.apache.org/jira/browse/IGNITE-11953
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Major
> Fix For: 2.8
>
>
> In some cases, for caches in a cache group, we can get a BTree corruption 
> exception.
> {code}
> 09:53:58,890][SEVERE][sys-stripe-10-#11][] Critical system error detected. 
> Will be handled accordingly to configured handler [hnd=CustomFailureHandler 
> [ignoreCriticalErrors=false, disabled=false][StopNodeOrHaltFailureHandler 
> [tryStop=false, timeout=0]], failureCtx=FailureContext [type=CRITICAL_ERROR, 
> err=class o.a.i.i.transactions.IgniteTxHeuristicCheckedException: Committing 
> a transaction has produced runtime exception]]class 
> org.apache.ignite.internal.transactions.IgniteTxHeuristicCheckedException: 
> Committing a transaction has produced runtime exception
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.heuristicException(IgniteTxAdapter.java:800)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.userCommit(IgniteTxLocalAdapter.java:922)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocalAdapter.localFinish(GridDhtTxLocalAdapter.java:799)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.localFinish(GridDhtTxLocal.java:608)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.finishTx(GridDhtTxLocal.java:478)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.commitDhtLocalAsync(GridDhtTxLocal.java:535)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finishDhtLocal(IgniteTxHandler.java:1055)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finish(IgniteTxHandler.java:931)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxFinishRequest(IgniteTxHandler.java:887)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$200(IgniteTxHandler.java:117)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:209)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:207)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1129)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:594)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:393)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:319)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1568)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1196)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1092)
>   at 
> org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:504)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: class 
> org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
>  Runtime failure on search row: SearchRow [key=KeyCacheObjectImpl [part=427, 
> val=Grkg1DUF3yQE6tC9Se50mi5w.T, hasValBytes=true], hash=1872857770, 
> cacheId=-420893003]
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1811)
>   at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1620)
>   at 
> 

[jira] [Updated] (IGNITE-11982) Fix bugs of pds

2019-07-15 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-11982:

Ignite Flags:   (was: Docs Required)

> Fix bugs of pds
> ---
>
> Key: IGNITE-11982
> URL: https://issues.apache.org/jira/browse/IGNITE-11982
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Fixed PDS crashes:
> * Failure during logical recovery
> * JVM crash in all compatibility LFS tests
> * WAL segments serialization problem
> * Unable to read the last WAL record after a crash during checkpoint
> * Node failed on detecting storage block size if page compression is enabled 
> on many caches
> * Cannot change baseline for an in-memory cluster
> * SqlFieldsQuery DELETE FROM causes JVM crash
> * Fixed IgniteCheckedException: Compound exception for CountDownFuture.
> Fixed tests:
> * WalCompactionAndPageCompressionTest
> * IgnitePdsRestartAfterFailedToWriteMetaPageTest.test
>  * GridPointInTimeRecoveryRebalanceTest.testRecoveryNotFailsIfWalSomewhereEnab
> * 
> IgniteClusterActivateDeactivateTest.testDeactivateSimple_5_Servers_5_Clients_Fro
> * IgniteCacheReplicatedQuerySelfTest.testNodeLeft 
> * .NET tests
> Optimizations:
> * Replace TcpDiscoveryNode with nodeId in TcpDiscoveryMessages
> * Failures to deserialize discovery data should be handled by a failure 
> handler
> * Optimize GridToStringBuilder



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (IGNITE-11982) Fix bugs of pds

2019-07-15 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-11982:

Fix Version/s: 2.8

> Fix bugs of pds
> ---
>
> Key: IGNITE-11982
> URL: https://issues.apache.org/jira/browse/IGNITE-11982
> Project: Ignite
>  Issue Type: Bug
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Fixed PDS crashes:
> * Failure during logical recovery
> * JVM crash in all compatibility LFS tests
> * WAL segments serialization problem
> * Unable to read the last WAL record after a crash during checkpoint
> * Node failed on detecting storage block size if page compression is enabled 
> on many caches
> * Cannot change baseline for an in-memory cluster
> * SqlFieldsQuery DELETE FROM causes JVM crash
> * Fixed IgniteCheckedException: Compound exception for CountDownFuture.
> Fixed tests:
> * WalCompactionAndPageCompressionTest
> * IgnitePdsRestartAfterFailedToWriteMetaPageTest.test
>  * GridPointInTimeRecoveryRebalanceTest.testRecoveryNotFailsIfWalSomewhereEnab
> * 
> IgniteClusterActivateDeactivateTest.testDeactivateSimple_5_Servers_5_Clients_Fro
> * IgniteCacheReplicatedQuerySelfTest.testNodeLeft 
> * .NET tests
> Optimizations:
> * Replace TcpDiscoveryNode with nodeId in TcpDiscoveryMessages
> * Failures to deserialize discovery data should be handled by a failure 
> handler
> * Optimize GridToStringBuilder



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (IGNITE-11969) Incorrect DefaultConcurrencyLevel value in .net test

2019-07-09 Thread Dmitriy Govorukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881565#comment-16881565
 ] 

Dmitriy Govorukhin commented on IGNITE-11969:
-

[~akalashnikov] LGTM, merged to master.

> Incorrect DefaultConcurrencyLevel value in .net test
> 
>
> Key: IGNITE-11969
> URL: https://issues.apache.org/jira/browse/IGNITE-11969
> Project: Ignite
>  Issue Type: Test
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Incorrect DefaultConcurrencyLevel value in the .NET test after the default 
> configuration in Java was changed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11969) Incorrect DefaultConcurrencyLevel value in .net test

2019-07-09 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-11969:

Fix Version/s: 2.8

> Incorrect DefaultConcurrencyLevel value in .net test
> 
>
> Key: IGNITE-11969
> URL: https://issues.apache.org/jira/browse/IGNITE-11969
> Project: Ignite
>  Issue Type: Test
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
> Fix For: 2.8
>
>
> Incorrect DefaultConcurrencyLevel value in the .NET test after the default 
> configuration in Java was changed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11969) Incorrect DefaultConcurrencyLevel value in .net test

2019-07-09 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-11969:

Ignite Flags:   (was: Docs Required)

> Incorrect DefaultConcurrencyLevel value in .net test
> 
>
> Key: IGNITE-11969
> URL: https://issues.apache.org/jira/browse/IGNITE-11969
> Project: Ignite
>  Issue Type: Test
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
> Fix For: 2.8
>
>
> Incorrect DefaultConcurrencyLevel value in the .NET test after the default 
> configuration in Java was changed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-11952) Bug fixes and improvements in console utilities & test fixes

2019-07-08 Thread Dmitriy Govorukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880685#comment-16880685
 ] 

Dmitriy Govorukhin commented on IGNITE-11952:
-

[~sergey-chugunov] Looks good to me, merged to master. Looks like .NET failed 
in master too, without your changes.

> Bug fixes and improvements in console utilities & test fixes
> 
>
> Key: IGNITE-11952
> URL: https://issues.apache.org/jira/browse/IGNITE-11952
> Project: Ignite
>  Issue Type: Bug
>Reporter: Sergey Chugunov
>Assignee: Sergey Chugunov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Contains the following fixes:
> * U.currentTimeMillis() must not be used for timeout calculation in 
> Discovery and Communication SPI
> * control.sh --baseline shows incorrect information
> * control.sh prints an unclear message (secured cluster)
> * Client disconnect detection time depends linearly on the number of network 
> interfaces of the remote node
> * Various improvements in javadocs



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11963) Remove ContinuousQueryDeserializationErrorOnNodeJoinTest

2019-07-04 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-11963:

Fix Version/s: 2.8

> Remove ContinuousQueryDeserializationErrorOnNodeJoinTest
> 
>
> Key: IGNITE-11963
> URL: https://issues.apache.org/jira/browse/IGNITE-11963
> Project: Ignite
>  Issue Type: Test
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Test ContinuousQueryDeserializationErrorOnNodeJoinTest is invalid after 
> IGNITE-11914 and should be removed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11957) IdleVerify command should print end time of execution.

2019-07-03 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-11957:

Fix Version/s: 2.8

> IdleVerify command should print end time of execution.
> --
>
> Key: IGNITE-11957
> URL: https://issues.apache.org/jira/browse/IGNITE-11957
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Andrey Kalinin
>Assignee: Andrey Kalinin
>Priority: Minor
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-9934) Improve logging on partition map exchange

2019-07-03 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-9934:
---
Fix Version/s: 2.8

> Improve logging on partition map exchange
> -
>
> Key: IGNITE-9934
> URL: https://issues.apache.org/jira/browse/IGNITE-9934
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Andrey Kalinin
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Partition Map Exchange (PME) is a cluster-wide process: it cannot complete 
> until every node has finished its part of the job.
> The coordinator, as the node that manages the process, could print how many 
> nodes have finished their stage of PME and which ones have not yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-9934) Improve logging on partition map exchange

2019-07-03 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-9934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-9934:
---
Ignite Flags:   (was: Docs Required)

> Improve logging on partition map exchange
> -
>
> Key: IGNITE-9934
> URL: https://issues.apache.org/jira/browse/IGNITE-9934
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Vladislav Pyatkov
>Assignee: Andrey Kalinin
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Partition Map Exchange (PME) is a cluster-wide process: it cannot complete 
> until every node has finished its part of the job.
> The coordinator, as the node that manages the process, could print how many 
> nodes have finished their stage of PME and which ones have not yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11955) Fix control.sh issues related to IGNITE-11876 and IGNITE-11913

2019-07-03 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-11955:

Fix Version/s: 2.8

> Fix control.sh issues related to IGNITE-11876 and IGNITE-11913
> --
>
> Key: IGNITE-11955
> URL: https://issues.apache.org/jira/browse/IGNITE-11955
> Project: Ignite
>  Issue Type: Bug
>Reporter: Denis Chudov
>Assignee: Denis Chudov
>Priority: Major
> Fix For: 2.8
>
>
> Umbrella ticket for control.sh issues related to IGNITE-11876 and IGNITE-11913



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (IGNITE-11874) Fix mismatch between idle_verify results with and without -dump option.

2019-07-02 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-11874:

Fix Version/s: 2.8

> Fix mismatch between idle_verify results with and without -dump option.
> ---
>
> Key: IGNITE-11874
> URL: https://issues.apache.org/jira/browse/IGNITE-11874
> Project: Ignite
>  Issue Type: Bug
>Reporter: Denis Chudov
>Assignee: Denis Chudov
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> In VerifyBackupPartitionsTaskV2, when the argument is not an instance of 
> VisorIdleVerifyDumpTaskArg (i.e. idle_verify is launched without the -dump 
> option), the set of filtered caches contains all caches, including system 
> ones, while by default system caches should be excluded.
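The intended default behavior can be sketched as a simple filter predicate (names are hypothetical; the real check uses Ignite's internal cache utilities rather than a hard-coded set):

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Sketch of the intended default: exclude system caches unless explicitly requested. */
final class IdleVerifyCacheFilter {
    // Hypothetical system cache name; Ignite resolves this via internal utilities.
    private static final Set<String> SYS_CACHES = Set.of("ignite-sys-cache");

    /** Returns the caches to verify; system caches are dropped unless {@code includeSys} is set. */
    static List<String> filter(List<String> caches, boolean includeSys) {
        return caches.stream()
            .filter(name -> includeSys || !SYS_CACHES.contains(name))
            .collect(Collectors.toList());
    }
}
```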



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (IGNITE-11953) BTree corruption caused by byte array values

2019-07-02 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin reassigned IGNITE-11953:
---

Assignee: Dmitriy Govorukhin

> BTree corruption caused by byte array values
> 
>
> Key: IGNITE-11953
> URL: https://issues.apache.org/jira/browse/IGNITE-11953
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Major
> Fix For: 2.8
>
>
> In some cases, for caches in a cache group, we can get a BTree corruption 
> exception.
> {code}
> 09:53:58,890][SEVERE][sys-stripe-10-#11][] Critical system error detected. 
> Will be handled accordingly to configured handler [hnd=CustomFailureHandler 
> [ignoreCriticalErrors=false, disabled=false][StopNodeOrHaltFailureHandler 
> [tryStop=false, timeout=0]], failureCtx=FailureContext [type=CRITICAL_ERROR, 
> err=class o.a.i.i.transactions.IgniteTxHeuristicCheckedException: Committing 
> a transaction has produced runtime exception]]class 
> org.apache.ignite.internal.transactions.IgniteTxHeuristicCheckedException: 
> Committing a transaction has produced runtime exception
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.heuristicException(IgniteTxAdapter.java:800)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.userCommit(IgniteTxLocalAdapter.java:922)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocalAdapter.localFinish(GridDhtTxLocalAdapter.java:799)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.localFinish(GridDhtTxLocal.java:608)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.finishTx(GridDhtTxLocal.java:478)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.commitDhtLocalAsync(GridDhtTxLocal.java:535)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finishDhtLocal(IgniteTxHandler.java:1055)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finish(IgniteTxHandler.java:931)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxFinishRequest(IgniteTxHandler.java:887)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$200(IgniteTxHandler.java:117)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:209)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:207)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1129)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:594)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:393)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:319)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1568)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1196)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1092)
>   at 
> org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:504)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: class 
> org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
>  Runtime failure on search row: SearchRow [key=KeyCacheObjectImpl [part=427, 
> val=Grkg1DUF3yQE6tC9Se50mi5w.T, hasValBytes=true], hash=1872857770, 
> cacheId=-420893003]
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1811)
>   at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1620)
>   at 
> 

[jira] [Created] (IGNITE-11953) BTree corruption caused by byte array values

2019-07-02 Thread Dmitriy Govorukhin (JIRA)
Dmitriy Govorukhin created IGNITE-11953:
---

 Summary: BTree corruption caused by byte array values
 Key: IGNITE-11953
 URL: https://issues.apache.org/jira/browse/IGNITE-11953
 Project: Ignite
  Issue Type: Bug
Reporter: Dmitriy Govorukhin


In some cases, for caches in a cache group, we can get a BTree corruption 
exception.

{code}
09:53:58,890][SEVERE][sys-stripe-10-#11][] Critical system error detected. Will 
be handled accordingly to configured handler [hnd=CustomFailureHandler 
[ignoreCriticalErrors=false, disabled=false][StopNodeOrHaltFailureHandler 
[tryStop=false, timeout=0]], failureCtx=FailureContext [type=CRITICAL_ERROR, 
err=class o.a.i.i.transactions.IgniteTxHeuristicCheckedException: Committing a 
transaction has produced runtime exception]]class 
org.apache.ignite.internal.transactions.IgniteTxHeuristicCheckedException: 
Committing a transaction has produced runtime exception
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.heuristicException(IgniteTxAdapter.java:800)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.userCommit(IgniteTxLocalAdapter.java:922)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocalAdapter.localFinish(GridDhtTxLocalAdapter.java:799)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.localFinish(GridDhtTxLocal.java:608)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.finishTx(GridDhtTxLocal.java:478)
at 
org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.commitDhtLocalAsync(GridDhtTxLocal.java:535)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finishDhtLocal(IgniteTxHandler.java:1055)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finish(IgniteTxHandler.java:931)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxFinishRequest(IgniteTxHandler.java:887)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$200(IgniteTxHandler.java:117)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:209)
at 
org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:207)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1129)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:594)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:393)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:319)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
at 
org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1568)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1196)
at 
org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
at 
org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1092)
at 
org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:504)
at 
org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
at java.lang.Thread.run(Thread.java:748)
Caused by: class 
org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
 Runtime failure on search row: SearchRow [key=KeyCacheObjectImpl [part=427, 
val=Grkg1DUF3yQE6tC9Se50mi5w.T, hasValBytes=true], hash=1872857770, 
cacheId=-420893003]
at 
org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1811)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1620)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1603)
at 
org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.invoke(GridCacheOffheapManager.java:2131)
at 
org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.invoke(IgniteCacheOffheapManagerImpl.java:442)
at 

[jira] [Updated] (IGNITE-11953) BTree corruption caused by byte array values

2019-07-02 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-11953:

Fix Version/s: 2.8

> BTree corruption caused by byte array values
> 
>
> Key: IGNITE-11953
> URL: https://issues.apache.org/jira/browse/IGNITE-11953
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Priority: Major
> Fix For: 2.8
>
>
> In some cases for caches with cache group, we can get BTree corruption 
> exception.
> {code}
> 09:53:58,890][SEVERE][sys-stripe-10-#11][] Critical system error detected. 
> Will be handled accordingly to configured handler [hnd=CustomFailureHandler 
> [ignoreCriticalErrors=false, disabled=false][StopNodeOrHaltFailureHandler 
> [tryStop=false, timeout=0]], failureCtx=FailureContext [type=CRITICAL_ERROR, 
> err=class o.a.i.i.transactions.IgniteTxHeuristicCheckedException: Committing 
> a transaction has produced runtime exception]]class 
> org.apache.ignite.internal.transactions.IgniteTxHeuristicCheckedException: 
> Committing a transaction has produced runtime exception
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.heuristicException(IgniteTxAdapter.java:800)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.userCommit(IgniteTxLocalAdapter.java:922)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocalAdapter.localFinish(GridDhtTxLocalAdapter.java:799)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.localFinish(GridDhtTxLocal.java:608)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.finishTx(GridDhtTxLocal.java:478)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.commitDhtLocalAsync(GridDhtTxLocal.java:535)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finishDhtLocal(IgniteTxHandler.java:1055)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finish(IgniteTxHandler.java:931)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxFinishRequest(IgniteTxHandler.java:887)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$200(IgniteTxHandler.java:117)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:209)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:207)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1129)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:594)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:393)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:319)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1568)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1196)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1092)
>   at 
> org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:504)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: class 
> org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
>  Runtime failure on search row: SearchRow [key=KeyCacheObjectImpl [part=427, 
> val=Grkg1DUF3yQE6tC9Se50mi5w.T, hasValBytes=true], hash=1872857770, 
> cacheId=-420893003]
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1811)
>   at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1620)
>   at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1603)
>   at 
> 

[jira] [Updated] (IGNITE-11953) BTree corruption caused by byte array values

2019-07-02 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-11953:

Ignite Flags:   (was: Docs Required)

> BTree corruption caused by byte array values
> 
>
> Key: IGNITE-11953
> URL: https://issues.apache.org/jira/browse/IGNITE-11953
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Priority: Major
> Fix For: 2.8
>
>
> In some cases for caches with cache group, we can get BTree corruption 
> exception.
> {code}
> 09:53:58,890][SEVERE][sys-stripe-10-#11][] Critical system error detected. 
> Will be handled accordingly to configured handler [hnd=CustomFailureHandler 
> [ignoreCriticalErrors=false, disabled=false][StopNodeOrHaltFailureHandler 
> [tryStop=false, timeout=0]], failureCtx=FailureContext [type=CRITICAL_ERROR, 
> err=class o.a.i.i.transactions.IgniteTxHeuristicCheckedException: Committing 
> a transaction has produced runtime exception]]class 
> org.apache.ignite.internal.transactions.IgniteTxHeuristicCheckedException: 
> Committing a transaction has produced runtime exception
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxAdapter.heuristicException(IgniteTxAdapter.java:800)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxLocalAdapter.userCommit(IgniteTxLocalAdapter.java:922)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocalAdapter.localFinish(GridDhtTxLocalAdapter.java:799)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.localFinish(GridDhtTxLocal.java:608)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.finishTx(GridDhtTxLocal.java:478)
>   at 
> org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtTxLocal.commitDhtLocalAsync(GridDhtTxLocal.java:535)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finishDhtLocal(IgniteTxHandler.java:1055)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.finish(IgniteTxHandler.java:931)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.processNearTxFinishRequest(IgniteTxHandler.java:887)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler.access$200(IgniteTxHandler.java:117)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:209)
>   at 
> org.apache.ignite.internal.processors.cache.transactions.IgniteTxHandler$3.apply(IgniteTxHandler.java:207)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.processMessage(GridCacheIoManager.java:1129)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.onMessage0(GridCacheIoManager.java:594)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:393)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.handleMessage(GridCacheIoManager.java:319)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager.access$100(GridCacheIoManager.java:109)
>   at 
> org.apache.ignite.internal.processors.cache.GridCacheIoManager$1.onMessage(GridCacheIoManager.java:308)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.invokeListener(GridIoManager.java:1568)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.processRegularMessage0(GridIoManager.java:1196)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager.access$4200(GridIoManager.java:126)
>   at 
> org.apache.ignite.internal.managers.communication.GridIoManager$9.run(GridIoManager.java:1092)
>   at 
> org.apache.ignite.internal.util.StripedExecutor$Stripe.body(StripedExecutor.java:504)
>   at 
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:119)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: class 
> org.apache.ignite.internal.processors.cache.persistence.tree.CorruptedTreeException:
>  Runtime failure on search row: SearchRow [key=KeyCacheObjectImpl [part=427, 
> val=Grkg1DUF3yQE6tC9Se50mi5w.T, hasValBytes=true], hash=1872857770, 
> cacheId=-420893003]
>   at 
> org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.invoke(BPlusTree.java:1811)
>   at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke0(IgniteCacheOffheapManagerImpl.java:1620)
>   at 
> org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.invoke(IgniteCacheOffheapManagerImpl.java:1603)
>   at 
> 

[jira] [Updated] (IGNITE-11844) Should filter indexes by cache name instead of validating all caches in group

2019-07-01 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-11844:

Fix Version/s: 2.8

> Should filter indexes by cache name instead of validating all caches in 
> group
> 
>
> Key: IGNITE-11844
> URL: https://issues.apache.org/jira/browse/IGNITE-11844
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladislav Pyatkov
>Assignee: Andrey Kalinin
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The control.sh utility method validate_indexes checks all indexes of all caches 
> in a group. If just one cache (from a shared group) is specified in the caches 
> list, the indexes of every cache in that group are validated, which can take 
> much more time than checking the indexes of the specified caches only.
> It would be correct to validate only the indexes of the specified caches; to do 
> so, the caches in the shared group need to be filtered by the list passed in the 
> parameters.
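The filtering proposed above can be sketched as a simple set intersection: keep only the caches of the shared group that the user actually named. This is a minimal illustrative sketch, not the actual Ignite implementation; the class and method names are assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: given the caches that live in a shared cache group and the
// cache names passed to validate_indexes, validate only the requested caches
// instead of the whole group.
public class CacheGroupFilter {
    /** Returns the subset of {@code groupCaches} that was explicitly requested. */
    static Set<String> cachesToValidate(Set<String> groupCaches, Set<String> requested) {
        return groupCaches.stream()
            .filter(requested::contains)
            .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<String> group = new HashSet<>(Arrays.asList("cacheA", "cacheB", "cacheC"));
        Set<String> requested = new HashSet<>(Arrays.asList("cacheB"));
        // Only cacheB's indexes would be validated, not all three caches of the group.
        System.out.println(cachesToValidate(group, requested)); // [cacheB]
    }
}
```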



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IGNITE-3653) P2P doesn't work for remote filter and filter factory.

2019-07-01 Thread Dmitriy Govorukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-3653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876009#comment-16876009
 ] 

Dmitriy Govorukhin commented on IGNITE-3653:


[~dmekhanikov] Looks good to me. 

> P2P doesn't work for remote filter and filter factory.
> --
>
> Key: IGNITE-3653
> URL: https://issues.apache.org/jira/browse/IGNITE-3653
> Project: Ignite
>  Issue Type: Bug
>  Components: cache
>Affects Versions: 1.6
>Reporter: Nikolai Tikhonov
>Assignee: Denis Mekhanikov
>Priority: Major
> Fix For: 2.8
>
> Attachments: CCP2PTest.patch
>
>
> Remote filter and filter factory classes were not deployed on nodes which 
> join to cluster after their initialization. Test attached.





[jira] [Updated] (IGNITE-11869) Rework control.sh tests structure

2019-06-27 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-11869:

Fix Version/s: 2.8

> Rework control.sh tests structure
> -
>
> Key: IGNITE-11869
> URL: https://issues.apache.org/jira/browse/IGNITE-11869
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.8
>Reporter: Sergey Antonov
>Assignee: Sergey Antonov
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> GridCommandHandlerIndexingTest extends GridCommandHandlerTest. This is bad 
> design. We should create a common abstract test class and extend it in 
> GridCommandHandlerIndexingTest and GridCommandHandlerTest.
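The proposed hierarchy could look like the following sketch. The two concrete test class names come from the ticket; the abstract class name and the shared fixture are assumptions for illustration.

```java
// Sketch of the proposed test hierarchy: shared fixtures live in an abstract
// base class, and the two suites no longer inherit from each other.
abstract class GridCommandHandlerAbstractTest {
    /** Shared setup used by both concrete suites (illustrative stand-in). */
    protected String startTestGrid() { return "grid-started"; }
}

class GridCommandHandlerTest extends GridCommandHandlerAbstractTest {
    String run() { return startTestGrid() + ":generic"; }
}

class GridCommandHandlerIndexingTest extends GridCommandHandlerAbstractTest {
    String run() { return startTestGrid() + ":indexing"; }
}

public class TestHierarchyDemo {
    public static void main(String[] args) {
        System.out.println(new GridCommandHandlerTest().run());         // grid-started:generic
        System.out.println(new GridCommandHandlerIndexingTest().run()); // grid-started:indexing
    }
}
```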





[jira] [Updated] (IGNITE-11878) Rebuild index skips MOVING partitions during historical rebalance

2019-06-19 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-11878:

Fix Version/s: 2.8

> Rebuild index skips MOVING partitions during historical rebalance
> 
>
> Key: IGNITE-11878
> URL: https://issues.apache.org/jira/browse/IGNITE-11878
> Project: Ignite
>  Issue Type: Bug
>Affects Versions: 2.5, 2.6, 2.7
>Reporter: Stepachev Maksim
>Assignee: Stepachev Maksim
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Rebuild index skips MOVING partitions during historical rebalance.





[jira] [Assigned] (IGNITE-11934) Bugs & tests fixes

2019-06-18 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin reassigned IGNITE-11934:
---

Assignee: Dmitriy Govorukhin

>  Bugs & tests fixes
> ---
>
> Key: IGNITE-11934
> URL: https://issues.apache.org/jira/browse/IGNITE-11934
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Major
> Fix For: 2.8
>
>
> This issue contains fixes for several issues:
>  * AssertionError occurs on the client when coordinator killed (with ZK 
> discovery)
>  * IgniteVersionUtils#BUILD_TSTAMP_DATE_FORMATTER is used in a non 
> thread-safe manner.
>  * Possible discovery race on node joining with Authenticator.
> * PageLocksCommand#parseArguments cannot properly parse the user and password 
> arguments if they are at the end of the argument list.
>  * Test CheckpointFreeListTest.testRestoreFreeListCorrectlyAfterRandomStop 
> failed on TC
>  * IgniteWalFlushBackgroundSelfTest.testFailWhileStart & 
> IgniteWalFlushLogOnlySelfTest.testFailWhileStart fail in disk compression 
> suite.
>  * IgniteClientConnectAfterCommunicationFailureTest fails
>  * Add scale factor for PageLockTrackerTests





[jira] [Updated] (IGNITE-11934) Bugs & tests fixes

2019-06-18 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-11934:

Ignite Flags:   (was: Docs Required)

>  Bugs & tests fixes
> ---
>
> Key: IGNITE-11934
> URL: https://issues.apache.org/jira/browse/IGNITE-11934
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Major
> Fix For: 2.8
>
>
> This issue contains fixes for several issues:
>  * AssertionError occurs on the client when coordinator killed (with ZK 
> discovery)
>  * IgniteVersionUtils#BUILD_TSTAMP_DATE_FORMATTER is used in a non 
> thread-safe manner.
>  * Possible discovery race on node joining with Authenticator.
> * PageLocksCommand#parseArguments cannot properly parse the user and password 
> arguments if they are at the end of the argument list.
>  * Test CheckpointFreeListTest.testRestoreFreeListCorrectlyAfterRandomStop 
> failed on TC
>  * IgniteWalFlushBackgroundSelfTest.testFailWhileStart & 
> IgniteWalFlushLogOnlySelfTest.testFailWhileStart fail in disk compression 
> suite.
>  * IgniteClientConnectAfterCommunicationFailureTest fails
>  * Add scale factor for PageLockTrackerTests





[jira] [Created] (IGNITE-11934) Bugs & tests fixes

2019-06-18 Thread Dmitriy Govorukhin (JIRA)
Dmitriy Govorukhin created IGNITE-11934:
---

 Summary:  Bugs & tests fixes
 Key: IGNITE-11934
 URL: https://issues.apache.org/jira/browse/IGNITE-11934
 Project: Ignite
  Issue Type: Bug
Reporter: Dmitriy Govorukhin


This issue contains fixes for several issues:
 * AssertionError occurs on the client when coordinator killed (with ZK 
discovery)
 * IgniteVersionUtils#BUILD_TSTAMP_DATE_FORMATTER is used in a non thread-safe 
manner.
 * Possible discovery race on node joining with Authenticator.
 * PageLocksCommand#parseArguments cannot properly parse the user and password 
arguments if they are at the end of the argument list.
 * Test CheckpointFreeListTest.testRestoreFreeListCorrectlyAfterRandomStop 
failed on TC

 * IgniteWalFlushBackgroundSelfTest.testFailWhileStart & 
IgniteWalFlushLogOnlySelfTest.testFailWhileStart fail in disk compression suite.
 * IgniteClientConnectAfterCommunicationFailureTest fails
 * Add scale factor for PageLockTrackerTests





[jira] [Updated] (IGNITE-11934) Bugs & tests fixes

2019-06-18 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-11934:

Fix Version/s: 2.8

>  Bugs & tests fixes
> ---
>
> Key: IGNITE-11934
> URL: https://issues.apache.org/jira/browse/IGNITE-11934
> Project: Ignite
>  Issue Type: Bug
>Reporter: Dmitriy Govorukhin
>Priority: Major
> Fix For: 2.8
>
>
> This issue contains fixes for several issues:
>  * AssertionError occurs on the client when coordinator killed (with ZK 
> discovery)
>  * IgniteVersionUtils#BUILD_TSTAMP_DATE_FORMATTER is used in a non 
> thread-safe manner.
>  * Possible discovery race on node joining with Authenticator.
> * PageLocksCommand#parseArguments cannot properly parse the user and password 
> arguments if they are at the end of the argument list.
>  * Test CheckpointFreeListTest.testRestoreFreeListCorrectlyAfterRandomStop 
> failed on TC
>  * IgniteWalFlushBackgroundSelfTest.testFailWhileStart & 
> IgniteWalFlushLogOnlySelfTest.testFailWhileStart fail in disk compression 
> suite.
>  * IgniteClientConnectAfterCommunicationFailureTest fails
>  * Add scale factor for PageLockTrackerTests





[jira] [Updated] (IGNITE-11875) Thin client is unable to authenticate with long password

2019-06-18 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-11875:

Reviewer: Sergey Chugunov  (was: Dmitriy Govorukhin)

> Thin client is unable to authenticate with long password
> 
>
> Key: IGNITE-11875
> URL: https://issues.apache.org/jira/browse/IGNITE-11875
> Project: Ignite
>  Issue Type: Bug
>  Components: jdbc, odbc, thin client
>Affects Versions: 2.7
>Reporter: Igor Sapego
>Assignee: Igor Sapego
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Token authentication may use long usernames/passwords, which leads to an 
> "Invalid handshake message" error in 
> ClientListenerNioServerBuffer:
> {code:java}
> if (cnt == msgSize) {
> byte[] data0 = data;
> reset();
> return data0;
> }
> else {
> if (checkHandshake && cnt > 0 && (msgSize > 
> ClientListenerNioListener.MAX_HANDSHAKE_MSG_SIZE
> || data[0] != ClientListenerRequest.HANDSHAKE))
> throw new IgniteCheckedException("Invalid handshake message");
> return null;
> }
> {code}
> The reproducer is attached.
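The size check quoted above can be modeled in isolation as follows. This is a minimal sketch, not the actual Ignite internals: the MAX_HANDSHAKE_MSG_SIZE value and the method name are assumptions. It shows why a well-formed handshake carrying a long token is still rejected once the message size exceeds the limit.

```java
// Minimal model of the handshake size check: a handshake message larger than the
// limit is rejected even when it is well-formed, which is what happens with long
// token usernames/passwords. Constants below are assumed values for the sketch.
public class HandshakeCheck {
    static final int MAX_HANDSHAKE_MSG_SIZE = 128; // assumed limit
    static final byte HANDSHAKE = 1;               // assumed message type marker

    /** Returns true if the partially read message should be rejected. */
    static boolean rejects(byte[] data, int cnt, int msgSize, boolean checkHandshake) {
        return checkHandshake && cnt > 0
            && (msgSize > MAX_HANDSHAKE_MSG_SIZE || data[0] != HANDSHAKE);
    }

    public static void main(String[] args) {
        byte[] msg = new byte[256];
        msg[0] = HANDSHAKE;
        // A valid handshake that exceeds the limit is still rejected.
        System.out.println(rejects(msg, 10, 256, true)); // true
        System.out.println(rejects(msg, 10, 64, true));  // false
    }
}
```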





[jira] [Commented] (IGNITE-11931) Rewrite @WithSystemProperty handling using JUnit rules.

2019-06-18 Thread Dmitriy Govorukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16866359#comment-16866359
 ] 

Dmitriy Govorukhin commented on IGNITE-11931:
-

[~ibessonov] LGTM, merged to master.

> Rewrite @WithSystemProperty handling using JUnit rules.
> ---
>
> Key: IGNITE-11931
> URL: https://issues.apache.org/jira/browse/IGNITE-11931
> Project: Ignite
>  Issue Type: Test
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> @WithSystemProperty can only be used in classes that inherit 
> GridAbstractTest. This should be changed.
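The core mechanic behind such a rule can be sketched without a JUnit dependency: set the property before the test body runs and restore the previous value afterwards. This is a simplified illustration (no annotation scanning, hypothetical names), not the actual fix.

```java
// Sketch of the core mechanic behind a @WithSystemProperty-style JUnit rule:
// set the property around the test body and restore the old value in a finally
// block, so tests outside GridAbstractTest could use the same behavior.
public class SystemPropertyRule {
    /** Runs {@code body} with {@code key=value} set, restoring the old value after. */
    static void withProperty(String key, String value, Runnable body) {
        String old = System.getProperty(key);
        System.setProperty(key, value);
        try {
            body.run();
        }
        finally {
            if (old == null)
                System.clearProperty(key);
            else
                System.setProperty(key, old);
        }
    }

    public static void main(String[] args) {
        withProperty("demo.flag", "on",
            () -> System.out.println(System.getProperty("demo.flag"))); // on
        System.out.println(System.getProperty("demo.flag"));            // null
    }
}
```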





[jira] [Updated] (IGNITE-11931) Rewrite @WithSystemProperty handling using JUnit rules.

2019-06-18 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-11931:

Fix Version/s: (was: 3.0)
   2.8

> Rewrite @WithSystemProperty handling using JUnit rules.
> ---
>
> Key: IGNITE-11931
> URL: https://issues.apache.org/jira/browse/IGNITE-11931
> Project: Ignite
>  Issue Type: Test
>Reporter: Ivan Bessonov
>Assignee: Ivan Bessonov
>Priority: Major
>  Labels: MakeTeamcityGreenAgain
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> @WithSystemProperty can only be used in classes that inherit 
> GridAbstractTest. This should be changed.





[jira] [Commented] (IGNITE-11869) control.sh idle_verify/validate_indexes shouldn't throw GridNotIdleException, if user pages wasn't modified in checkpoint.

2019-06-11 Thread Dmitriy Govorukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16861397#comment-16861397
 ] 

Dmitriy Govorukhin commented on IGNITE-11869:
-

[~antonovsergey93] I reviewed your changes, and I have a serious concern. It is 
absolutely incorrect to store the dirtyUserPagesPresent flag in PageMemory. 

PageMemory should not know anything that is not related to pages; setting this 
flag to true for groups other than UTILITY_CACHE_GROUP_ID is an abstraction leak.

> control.sh idle_verify/validate_indexes shouldn't throw GridNotIdleException, 
> if user pages wasn't modified in checkpoint.
> --
>
> Key: IGNITE-11869
> URL: https://issues.apache.org/jira/browse/IGNITE-11869
> Project: Ignite
>  Issue Type: Improvement
>Affects Versions: 2.8
>Reporter: Sergey Antonov
>Assignee: Sergey Antonov
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We shouldn't throw GridNotIdleException, if checkpoint contains dirty pages 
> related to ignite-sys-cache (system background activities) only.





[jira] [Updated] (IGNITE-10281) Log to file all jars in classpath on start node.

2019-06-11 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-10281:

Ignite Flags:   (was: Docs Required)

> Log to file all jars in classpath on start node.
> 
>
> Key: IGNITE-10281
> URL: https://issues.apache.org/jira/browse/IGNITE-10281
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Sergey Antonov
>Assignee: Denis Chudov
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should print all jars in the classpath to help analyze jar hell.
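The idea above can be sketched in a few lines: on node start, split the JVM classpath into its entries and log each one. This is illustrative only; the actual Ignite implementation may log differently.

```java
import java.io.File;

// Sketch: log every classpath entry on node start so jar conflicts ("jar hell")
// can be analyzed from the log file alone.
public class ClasspathLogger {
    /** Splits java.class.path into individual entries. */
    static String[] classpathEntries() {
        return System.getProperty("java.class.path").split(File.pathSeparator);
    }

    public static void main(String[] args) {
        for (String entry : classpathEntries())
            System.out.println("Classpath entry: " + entry);
    }
}
```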





[jira] [Commented] (IGNITE-11892) Incorrect assert in wal scanner test

2019-06-05 Thread Dmitriy Govorukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16856723#comment-16856723
 ] 

Dmitriy Govorukhin commented on IGNITE-11892:
-

[~akalashnikov] Looks good to me, merged to master.

> Incorrect assert  in wal scanner test
> -
>
> Key: IGNITE-11892
> URL: https://issues.apache.org/jira/browse/IGNITE-11892
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://ci.ignite.apache.org/viewLog.html?buildId=4038516=IgniteTests24Java8_Pds2
> {noformat}
> junit.framework.AssertionFailedError: Next WAL record :: Record : PAGE_RECORD 
> - Unable to convert to string representation.
>   at 
> org.apache.ignite.internal.processors.cache.persistence.wal.scanner.WalScannerTest.shouldDumpToFileFoundRecord(WalScannerTest.java:254)
> {noformat}





[jira] [Updated] (IGNITE-11892) Incorrect assert in wal scanner test

2019-06-05 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-11892:

Ignite Flags:   (was: Docs Required)

> Incorrect assert  in wal scanner test
> -
>
> Key: IGNITE-11892
> URL: https://issues.apache.org/jira/browse/IGNITE-11892
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://ci.ignite.apache.org/viewLog.html?buildId=4038516=IgniteTests24Java8_Pds2
> {noformat}
> junit.framework.AssertionFailedError: Next WAL record :: Record : PAGE_RECORD 
> - Unable to convert to string representation.
>   at 
> org.apache.ignite.internal.processors.cache.persistence.wal.scanner.WalScannerTest.shouldDumpToFileFoundRecord(WalScannerTest.java:254)
> {noformat}





[jira] [Updated] (IGNITE-11892) Incorrect assert in wal scanner test

2019-06-05 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-11892:

Fix Version/s: 2.8

> Incorrect assert  in wal scanner test
> -
>
> Key: IGNITE-11892
> URL: https://issues.apache.org/jira/browse/IGNITE-11892
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Anton Kalashnikov
>Assignee: Anton Kalashnikov
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://ci.ignite.apache.org/viewLog.html?buildId=4038516=IgniteTests24Java8_Pds2
> {noformat}
> junit.framework.AssertionFailedError: Next WAL record :: Record : PAGE_RECORD 
> - Unable to convert to string representation.
>   at 
> org.apache.ignite.internal.processors.cache.persistence.wal.scanner.WalScannerTest.shouldDumpToFileFoundRecord(WalScannerTest.java:254)
> {noformat}





[jira] [Commented] (IGNITE-11750) Implement locked pages info dump for long-running B+Tree operations

2019-06-05 Thread Dmitriy Govorukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16856559#comment-16856559
 ] 

Dmitriy Govorukhin commented on IGNITE-11750:
-

[~sergey-chugunov] Thanks for the review! Merged to master.

> Implement locked pages info dump for long-running B+Tree operations
> ---
>
> Key: IGNITE-11750
> URL: https://issues.apache.org/jira/browse/IGNITE-11750
> Project: Ignite
>  Issue Type: Improvement
>Reporter: Alexey Goncharuk
>Assignee: Dmitriy Govorukhin
>Priority: Major
> Fix For: 2.8
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I've stumbled upon an incident where a batch of Ignite threads were hanging 
> on BPlusTree operations trying to acquire read or write lock on pages. From 
> the thread dump it is impossible to check if there is an issue with 
> {{OffheapReadWriteLock}} or there is a subtle deadlock in the tree.
> I suggest we implement a timeout for page lock acquire and tracking of locked 
> pages. This should be relatively easy to implement in {{PageHandler}} (the 
> only thing to consider is performance degradation). If a timeout occurs, we 
> should print all the locks currently owned by a thread. This way we should be 
> able to determine if there is a deadlock in the {{BPlusTree}}.
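The suggested diagnostic can be sketched as follows: try to take a page lock with a timeout and, on timeout, dump the page ids the current thread already holds. All names and the structure here are assumptions, not the actual PageHandler change.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of a timed page lock with per-thread lock tracking: if the lock cannot
// be taken within the timeout, print the pages this thread already holds so a
// possible B+Tree deadlock can be diagnosed from the output.
public class PageLockWithTimeout {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final ThreadLocal<Deque<Long>> held = ThreadLocal.withInitial(ArrayDeque::new);

    /** Returns true if the read lock on {@code pageId} was acquired in time. */
    boolean readLock(long pageId, long timeoutMs) throws InterruptedException {
        if (!lock.readLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            System.err.println("Timeout on page " + pageId
                + "; locks held by this thread: " + held.get());
            return false;
        }
        held.get().push(pageId);
        return true;
    }

    void readUnlock(long pageId) {
        held.get().remove(pageId);
        lock.readLock().unlock();
    }

    public static void main(String[] args) throws InterruptedException {
        PageLockWithTimeout pages = new PageLockWithTimeout();
        System.out.println(pages.readLock(1L, 100)); // true: uncontended lock
        pages.readUnlock(1L);
    }
}
```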





[jira] [Updated] (IGNITE-11835) Support JMX/control.sh API for page lock dump

2019-06-03 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-11835:

Description: 
Support JMX/control.sh API for page lock dump

JMX
{code}
public interface PageLockMXBean  {
String dumpLocks();

void dumpLocksToLog();

String dumpLocksToFile();

String dumpLocksToFile(String path);
}
{code}

control.sh
{code}
--diagnostic pageLocks dump [--path path_to_file] [--all|--nodes 
nodeId1,nodeId2,..|--nodes consistentId1,consistentId2,..]
--diagnostic pageLocks dump_log [--all|--nodes nodeId1,nodeId2,..|--nodes 
consistentId1,consistentId2,..]
{code}

HeapArrayLockStack and HeapArrayLockStack output:
org.apache.ignite.internal.processors.cache.persistence.diagnostic.PageLockStackTest#testThreeReadPageLock_3
{code}
Locked pages = []
Locked pages stack: main time=(1559050284306, 2019-05-28 16:31:24.306)
-> Try Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages stack: main time=(1559050284393, 2019-05-28 16:31:24.393)
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
-> Try Read lock structureId=123 pageId=11 [pageIdHex=000b, 
partId=11, pageIdx=11, flags=]
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [11(r=1|w=0),1(r=1|w=0)]
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
Read lock structureId=123 pageId=11 [pageIdHex=000b, 
partId=11, pageIdx=11, flags=]
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
-> Try Read lock structureId=123 pageId=111 
[pageIdHex=006f, partId=111, pageIdx=111, flags=]
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [111(r=1|w=0),1(r=1|w=0)]
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
Read lock structureId=123 pageId=111 [pageIdHex=006f, 
partId=111, pageIdx=111, flags=]
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = []
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
{code}

HeapArrayLockLog and OffHeapLockLog
org.apache.ignite.internal.processors.cache.persistence.diagnostic.PageLockLogTest#testThreeReadPageLock_3
{code}
Locked pages = []
Locked pages log: main time=(1559049634782, 2019-05-28 16:20:34.782)
-> Try Read lock nextOpPageId=1, nextOpStructureId=123 
[pageIdHex=0001, partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages log: main time=(1559049634782, 2019-05-28 16:20:34.782)
L=1 -> Read lock pageId=1, structureId=123 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages log: main time=(1559049634782, 2019-05-28 16:20:34.782)
L=1 -> Read lock pageId=1, structureId=123 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]
-> Try Read lock nextOpPageId=11, nextOpStructureId=123 
[pageIdHex=000b, partId=11, pageIdx=11, flags=]


Locked pages = [1(r=1|w=0),11(r=1|w=0)]
Locked pages log: main time=(1559049634782, 2019-05-28 16:20:34.782)
L=1 -> Read lock pageId=1, structureId=123 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]
L=2 -> Read lock pageId=11, structureId=123 [pageIdHex=000b, 
partId=11, pageIdx=11, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages log: main time=(1559049634782, 2019-05-28 16:20:34.782)
L=1 -> Read lock pageId=1, structureId=123 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]
L=2 -> Read lock pageId=11, structureId=123 [pageIdHex=000b, 
partId=11, pageIdx=11, flags=]
L=1 <- Read unlock pageId=11, structureId=123 [pageIdHex=000b, 
partId=11, pageIdx=11, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages log: main time=(1559049634782, 2019-05-28 16:20:34.782)
L=1 -> Read lock pageId=1, structureId=123 

[jira] [Commented] (IGNITE-11835) Support JMX/control.sh API for page lock dump

2019-06-03 Thread Dmitriy Govorukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/IGNITE-11835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854665#comment-16854665
 ] 

Dmitriy Govorukhin commented on IGNITE-11835:
-

[~skozlov] Agree, fixed.


> Support JMX/control.sh API for page lock dump
> -
>
> Key: IGNITE-11835
> URL: https://issues.apache.org/jira/browse/IGNITE-11835
> Project: Ignite
>  Issue Type: Sub-task
>Reporter: Dmitriy Govorukhin
>Assignee: Dmitriy Govorukhin
>Priority: Major
> Fix For: 2.8
>
>
> Support JMX/control.sh API for page lock dump
> JMX
> {code}
> public interface PageLockMXBean  {
> String dumpLocks();
> void dumpLocksToLog();
> String dumpLocksToFile();
> String dumpLocksToFile(String path);
> }
> {code}
> control.sh
> {code}
> --diagnostic pageLocks dump // Save dump to file generated in 
> IGNITE_HOME/work dir.
> --diagnostic pageLocks dump log // Print dump to console on node.
> --diagnostic pageLocks dump {path} // Save dump to specific path.
> --diagnostic pageLocks dump -a  or --all //  Dump on all nodes.
> --diagnostic pageLocks dump {UUID} {UUID} or {constId} {constId} // Dump 
> on a subset of nodes.
> {code}
> HeapArrayLockStack and OffHeapLockStack output:
> org.apache.ignite.internal.processors.cache.persistence.diagnostic.PageLockStackTest#testThreeReadPageLock_3
> {code}
> Locked pages = []
> Locked pages stack: main time=(1559050284306, 2019-05-28 16:31:24.306)
>   -> Try Read lock structureId=123 pageId=1 [pageIdHex=0001, 
> partId=1, pageIdx=1, flags=]
> Locked pages = [1(r=1|w=0)]
> Locked pages stack: main time=(1559050284393, 2019-05-28 16:31:24.393)
>   Read lock structureId=123 pageId=1 [pageIdHex=0001, 
> partId=1, pageIdx=1, flags=]
> Locked pages = [1(r=1|w=0)]
> Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
>   -> Try Read lock structureId=123 pageId=11 [pageIdHex=000b, 
> partId=11, pageIdx=11, flags=]
>   Read lock structureId=123 pageId=1 [pageIdHex=0001, 
> partId=1, pageIdx=1, flags=]
> Locked pages = [11(r=1|w=0),1(r=1|w=0)]
> Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
>   Read lock structureId=123 pageId=11 [pageIdHex=000b, 
> partId=11, pageIdx=11, flags=]
>   Read lock structureId=123 pageId=1 [pageIdHex=0001, 
> partId=1, pageIdx=1, flags=]
> Locked pages = [1(r=1|w=0)]
> Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
>   Read lock structureId=123 pageId=1 [pageIdHex=0001, 
> partId=1, pageIdx=1, flags=]
> Locked pages = [1(r=1|w=0)]
> Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
>   -> Try Read lock structureId=123 pageId=111 
> [pageIdHex=006f, partId=111, pageIdx=111, flags=]
>   Read lock structureId=123 pageId=1 [pageIdHex=0001, 
> partId=1, pageIdx=1, flags=]
> Locked pages = [111(r=1|w=0),1(r=1|w=0)]
> Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
>   Read lock structureId=123 pageId=111 [pageIdHex=006f, 
> partId=111, pageIdx=111, flags=]
>   Read lock structureId=123 pageId=1 [pageIdHex=0001, 
> partId=1, pageIdx=1, flags=]
> Locked pages = [1(r=1|w=0)]
> Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
>   Read lock structureId=123 pageId=1 [pageIdHex=0001, 
> partId=1, pageIdx=1, flags=]
> Locked pages = []
> Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
> {code}
> HeapArrayLockLog and OffHeapLockLog
> org.apache.ignite.internal.processors.cache.persistence.diagnostic.PageLockLogTest#testThreeReadPageLock_3
> {code}
> Locked pages = []
> Locked pages log: main time=(1559049634782, 2019-05-28 16:20:34.782)
> -> Try Read lock nextOpPageId=1, nextOpStructureId=123 
> [pageIdHex=0001, partId=1, pageIdx=1, flags=]
> Locked pages = [1(r=1|w=0)]
> Locked pages log: main time=(1559049634782, 2019-05-28 16:20:34.782)
> L=1 -> Read lock pageId=1, structureId=123 [pageIdHex=0001, 
> partId=1, pageIdx=1, flags=]
> Locked pages = [1(r=1|w=0)]
> Locked pages log: main time=(1559049634782, 2019-05-28 16:20:34.782)
> L=1 -> Read lock pageId=1, structureId=123 [pageIdHex=0001, 
> partId=1, pageIdx=1, flags=]
> -> Try Read lock nextOpPageId=11, nextOpStructureId=123 
> [pageIdHex=000b, partId=11, pageIdx=11, flags=]
> Locked pages = [1(r=1|w=0),11(r=1|w=0)]
> Locked pages log: main time=(1559049634782, 2019-05-28 16:20:34.782)
> L=1 -> Read lock pageId=1, structureId=123 [pageIdHex=0001, 
> partId=1, pageIdx=1, 

[jira] [Updated] (IGNITE-11835) Support JMX/control.sh API for page lock dump

2019-06-03 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-11835:

Description: 
Support JMX/control.sh API for page lock dump

JMX
{code}
public interface PageLockMXBean  {
String dumpLocks();

void dumpLocksToLog();

String dumpLocksToFile();

String dumpLocksToFile(String path);
}
{code}
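The PageLockMXBean interface above is exposed over standard JMX, so it can be consumed with a typed client-side proxy. Below is a minimal, self-contained sketch of that pattern: the stub implementation and the ObjectName (`org.apache:group=Diagnostic,name=PageLockTracker`) are illustrative assumptions for demonstration only; a real Ignite node registers its own bean under its own name.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMX;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class PageLockMXBeanDemo {
    // Interface copied from the issue description above.
    public interface PageLockMXBean {
        String dumpLocks();
        void dumpLocksToLog();
        String dumpLocksToFile();
        String dumpLocksToFile(String path);
    }

    // Stub standing in for the real tracker; on a live node the bean
    // is registered by Ignite itself, not by client code.
    public static class StubPageLockTracker implements PageLockMXBean {
        @Override public String dumpLocks() { return "Locked pages = []"; }
        @Override public void dumpLocksToLog() { /* would write to the node log */ }
        @Override public String dumpLocksToFile() { return "page_locks.dump"; }
        @Override public String dumpLocksToFile(String path) { return path; }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer srv = ManagementFactory.getPlatformMBeanServer();

        // Hypothetical ObjectName; the domain/group Ignite actually uses may differ.
        ObjectName name = new ObjectName("org.apache:group=Diagnostic,name=PageLockTracker");
        srv.registerMBean(new StubPageLockTracker(), name);

        // Typed proxy over the MXBean, as a remote JMX client would build it.
        PageLockMXBean proxy = JMX.newMXBeanProxy(srv, name, PageLockMXBean.class);
        System.out.println(proxy.dumpLocks()); // prints "Locked pages = []"
    }
}
```

The same `JMX.newMXBeanProxy` call works against a remote `MBeanServerConnection` obtained via `JMXConnectorFactory`, which is how a monitoring tool would trigger a dump on a running node.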

control.sh
{code}
--diagnostic pageLocks dump // Save dump to file generated in IGNITE_HOME/work 
dir.
--diagnostic pageLocks dump log // Print dump to console on node.
--diagnostic pageLocks dump {path} // Save dump to specific path.
--diagnostic pageLocks dump -a  or --all //  Dump on all nodes.
--diagnostic pageLocks dump {UUID} {UUID} or {constId} {constId} // Dump on 
a subset of nodes.
{code}

HeapArrayLockStack and OffHeapLockStack output:
org.apache.ignite.internal.processors.cache.persistence.diagnostic.PageLockStackTest#testThreeReadPageLock_3
{code}
Locked pages = []
Locked pages stack: main time=(1559050284306, 2019-05-28 16:31:24.306)
-> Try Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages stack: main time=(1559050284393, 2019-05-28 16:31:24.393)
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
-> Try Read lock structureId=123 pageId=11 [pageIdHex=000b, 
partId=11, pageIdx=11, flags=]
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [11(r=1|w=0),1(r=1|w=0)]
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
Read lock structureId=123 pageId=11 [pageIdHex=000b, 
partId=11, pageIdx=11, flags=]
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
-> Try Read lock structureId=123 pageId=111 
[pageIdHex=006f, partId=111, pageIdx=111, flags=]
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [111(r=1|w=0),1(r=1|w=0)]
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
Read lock structureId=123 pageId=111 [pageIdHex=006f, 
partId=111, pageIdx=111, flags=]
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = []
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
{code}
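Reading the dumps above, each entry in the `Locked pages = [...]` line appears to pair a page id with its held lock counts, e.g. `11(r=1|w=0)` reads as page 11 held under one read lock and no write lock. As a hedged sketch (my own helper, not part of Ignite), that line can be parsed like this:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LockedPagesParser {
    // Matches one entry of the form "<pageId>(r=<reads>|w=<writes>)".
    private static final Pattern ENTRY = Pattern.compile("(\\d+)\\(r=(\\d+)\\|w=(\\d+)\\)");

    /** Parses e.g. "Locked pages = [11(r=1|w=0),1(r=1|w=0)]" into pageId -> {reads, writes}. */
    public static Map<Long, int[]> parse(String line) {
        Map<Long, int[]> locks = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(line);

        while (m.find()) {
            locks.put(Long.parseLong(m.group(1)),
                new int[] {Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3))});
        }

        return locks;
    }

    public static void main(String[] args) {
        Map<Long, int[]> locks = parse("Locked pages = [11(r=1|w=0),1(r=1|w=0)]");

        System.out.println(locks.size());      // 2
        System.out.println(locks.get(11L)[0]); // 1 (read locks held on page 11)
    }
}
```

A helper like this makes it easy to diff consecutive dump snapshots and spot pages whose lock counts never return to zero.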

HeapArrayLockLog and OffHeapLockLog
org.apache.ignite.internal.processors.cache.persistence.diagnostic.PageLockLogTest#testThreeReadPageLock_3
{code}
Locked pages = []
Locked pages log: main time=(1559049634782, 2019-05-28 16:20:34.782)
-> Try Read lock nextOpPageId=1, nextOpStructureId=123 
[pageIdHex=0001, partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages log: main time=(1559049634782, 2019-05-28 16:20:34.782)
L=1 -> Read lock pageId=1, structureId=123 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages log: main time=(1559049634782, 2019-05-28 16:20:34.782)
L=1 -> Read lock pageId=1, structureId=123 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]
-> Try Read lock nextOpPageId=11, nextOpStructureId=123 
[pageIdHex=000b, partId=11, pageIdx=11, flags=]


Locked pages = [1(r=1|w=0),11(r=1|w=0)]
Locked pages log: main time=(1559049634782, 2019-05-28 16:20:34.782)
L=1 -> Read lock pageId=1, structureId=123 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]
L=2 -> Read lock pageId=11, structureId=123 [pageIdHex=000b, 
partId=11, pageIdx=11, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages log: main time=(1559049634782, 2019-05-28 16:20:34.782)
L=1 -> Read lock pageId=1, structureId=123 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]
L=2 -> Read lock pageId=11, structureId=123 [pageIdHex=000b, 
partId=11, pageIdx=11, flags=]
L=1 <- Read unlock pageId=11, structureId=123 [pageIdHex=000b, 
partId=11, pageIdx=11, flags=]


Locked pages = 

[jira] [Updated] (IGNITE-11835) Support JMX/control.sh API for page lock dump

2019-06-03 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-11835:

Description: 
Support JMX/control.sh API for page lock dump

JMX
{code}
public interface PageLockMXBean  {
String dumpLocks();

void dumpLocksToLog();

String dumpLocksToFile();

String dumpLocksToFile(String path);
}
{code}

control.sh
{code}
--diagnostic pageLocks dump // Save dump to file generated in IGNITE_HOME/work 
dir.
--diagnostic pageLocks dump log // Print dump to console on node.
--diagnostic pageLocks dump {path} // Save dump to specific path.
--diagnostic pageLocks dump -a  // Dump on all nodes.
--diagnostic pageLocks dump {UUID} {UUID} or {constId} {constId} // Dump on 
a subset of nodes.
{code}

HeapArrayLockStack and OffHeapLockStack output:
org.apache.ignite.internal.processors.cache.persistence.diagnostic.PageLockStackTest#testThreeReadPageLock_3
{code}
Locked pages = []
Locked pages stack: main time=(1559050284306, 2019-05-28 16:31:24.306)
-> Try Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages stack: main time=(1559050284393, 2019-05-28 16:31:24.393)
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
-> Try Read lock structureId=123 pageId=11 [pageIdHex=000b, 
partId=11, pageIdx=11, flags=]
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [11(r=1|w=0),1(r=1|w=0)]
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
Read lock structureId=123 pageId=11 [pageIdHex=000b, 
partId=11, pageIdx=11, flags=]
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
-> Try Read lock structureId=123 pageId=111 
[pageIdHex=006f, partId=111, pageIdx=111, flags=]
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [111(r=1|w=0),1(r=1|w=0)]
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
Read lock structureId=123 pageId=111 [pageIdHex=006f, 
partId=111, pageIdx=111, flags=]
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = []
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
{code}

HeapArrayLockLog and OffHeapLockLog
org.apache.ignite.internal.processors.cache.persistence.diagnostic.PageLockLogTest#testThreeReadPageLock_3
{code}
Locked pages = []
Locked pages log: main time=(1559049634782, 2019-05-28 16:20:34.782)
-> Try Read lock nextOpPageId=1, nextOpStructureId=123 
[pageIdHex=0001, partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages log: main time=(1559049634782, 2019-05-28 16:20:34.782)
L=1 -> Read lock pageId=1, structureId=123 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages log: main time=(1559049634782, 2019-05-28 16:20:34.782)
L=1 -> Read lock pageId=1, structureId=123 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]
-> Try Read lock nextOpPageId=11, nextOpStructureId=123 
[pageIdHex=000b, partId=11, pageIdx=11, flags=]


Locked pages = [1(r=1|w=0),11(r=1|w=0)]
Locked pages log: main time=(1559049634782, 2019-05-28 16:20:34.782)
L=1 -> Read lock pageId=1, structureId=123 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]
L=2 -> Read lock pageId=11, structureId=123 [pageIdHex=000b, 
partId=11, pageIdx=11, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages log: main time=(1559049634782, 2019-05-28 16:20:34.782)
L=1 -> Read lock pageId=1, structureId=123 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]
L=2 -> Read lock pageId=11, structureId=123 [pageIdHex=000b, 
partId=11, pageIdx=11, flags=]
L=1 <- Read unlock pageId=11, structureId=123 [pageIdHex=000b, 
partId=11, pageIdx=11, flags=]


Locked pages = [1(r=1|w=0)]

[jira] [Updated] (IGNITE-11835) Support JMX/control.sh API for page lock dump

2019-06-03 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-11835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin updated IGNITE-11835:

Description: 
Support JMX/control.sh API for page lock dump

JMX
{code}
public interface PageLockMXBean  {
String dumpLocks();

void dumpLocksToLog();

String dumpLocksToFile();

String dumpLocksToFile(String path);
}
{code}

control.sh
{code}
--diagnostic pageLocksTracker dump // Save dump to file generated in 
IGNITE_HOME/work dir.
--diagnostic pageLocksTracker dump log // Print dump to console on node.
--diagnostic pageLocksTracker dump {path} // Save dump to specific path.
--diagnostic pageLocksTracker dump -a  // Dump on all nodes.
--diagnostic pageLocksTracker dump {UUID} {UUID} or {constId} {constId} // 
Dump on a subset of nodes.
{code}

HeapArrayLockStack and OffHeapLockStack output:
org.apache.ignite.internal.processors.cache.persistence.diagnostic.PageLockStackTest#testThreeReadPageLock_3
{code}
Locked pages = []
Locked pages stack: main time=(1559050284306, 2019-05-28 16:31:24.306)
-> Try Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages stack: main time=(1559050284393, 2019-05-28 16:31:24.393)
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
-> Try Read lock structureId=123 pageId=11 [pageIdHex=000b, 
partId=11, pageIdx=11, flags=]
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [11(r=1|w=0),1(r=1|w=0)]
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
Read lock structureId=123 pageId=11 [pageIdHex=000b, 
partId=11, pageIdx=11, flags=]
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
-> Try Read lock structureId=123 pageId=111 
[pageIdHex=006f, partId=111, pageIdx=111, flags=]
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [111(r=1|w=0),1(r=1|w=0)]
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
Read lock structureId=123 pageId=111 [pageIdHex=006f, 
partId=111, pageIdx=111, flags=]
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
Read lock structureId=123 pageId=1 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = []
Locked pages stack: main time=(1559050284394, 2019-05-28 16:31:24.394)
{code}

HeapArrayLockLog and OffHeapLockLog
org.apache.ignite.internal.processors.cache.persistence.diagnostic.PageLockLogTest#testThreeReadPageLock_3
{code}
Locked pages = []
Locked pages log: main time=(1559049634782, 2019-05-28 16:20:34.782)
-> Try Read lock nextOpPageId=1, nextOpStructureId=123 
[pageIdHex=0001, partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages log: main time=(1559049634782, 2019-05-28 16:20:34.782)
L=1 -> Read lock pageId=1, structureId=123 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages log: main time=(1559049634782, 2019-05-28 16:20:34.782)
L=1 -> Read lock pageId=1, structureId=123 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]
-> Try Read lock nextOpPageId=11, nextOpStructureId=123 
[pageIdHex=000b, partId=11, pageIdx=11, flags=]


Locked pages = [1(r=1|w=0),11(r=1|w=0)]
Locked pages log: main time=(1559049634782, 2019-05-28 16:20:34.782)
L=1 -> Read lock pageId=1, structureId=123 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]
L=2 -> Read lock pageId=11, structureId=123 [pageIdHex=000b, 
partId=11, pageIdx=11, flags=]


Locked pages = [1(r=1|w=0)]
Locked pages log: main time=(1559049634782, 2019-05-28 16:20:34.782)
L=1 -> Read lock pageId=1, structureId=123 [pageIdHex=0001, 
partId=1, pageIdx=1, flags=]
L=2 -> Read lock pageId=11, structureId=123 [pageIdHex=000b, 
partId=11, pageIdx=11, flags=]
L=1 <- Read unlock pageId=11, structureId=123 [pageIdHex=000b, 
partId=11, pageIdx=11, 

[jira] [Assigned] (IGNITE-6324) Transactional cache data partially available after crash.

2019-05-31 Thread Dmitriy Govorukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/IGNITE-6324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy Govorukhin reassigned IGNITE-6324:
--

Assignee: (was: Dmitriy Govorukhin)

> Transactional cache data partially available after crash.
> -
>
> Key: IGNITE-6324
> URL: https://issues.apache.org/jira/browse/IGNITE-6324
> Project: Ignite
>  Issue Type: Bug
>  Components: persistence
>Affects Versions: 1.9, 2.1
>Reporter: Stanilovsky Evgeny
>Priority: Major
> Fix For: 2.8
>
> Attachments: InterruptCommitedThreadTest.java
>
>
> If an InterruptedException is raised in client code during PDS store operations, 
> we can end up with an inconsistent cache after restart. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

