[jira] [Commented] (IGNITE-17369) Snapshot is inconsistent under streamed loading with 'allowOverwrite==false'.

2022-11-07 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-17369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17629681#comment-17629681
 ] 

Ignite TC Bot commented on IGNITE-17369:


{panel:title=Branch: [pull/10286/head] Base: [master] : No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/10286/head] Base: [master] : New Tests 
(11)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#8b}Control Utility 1{color} [[tests 
2|https://ci2.ignite.apache.org/viewLog.html?buildId=6896927]]
* {color:#013220}IgniteControlUtilityTestSuite: 
GridCommandHandlerTest.testClusterCreateSnapshotWarning - PASSED{color}
* {color:#013220}IgniteControlUtilityTestSuite: 
GridCommandHandlerWithSSLTest.testClusterCreateSnapshotWarning - PASSED{color}

{color:#8b}Snapshots{color} [[tests 
8|https://ci2.ignite.apache.org/viewLog.html?buildId=6896977]]
* {color:#013220}IgniteSnapshotTestSuite: 
IgniteClusterSnapshotStreamerTest.testStreamingIntoInMememoryDoesntAffectSnapshot[Encryption=false]
 - PASSED{color}
* {color:#013220}IgniteSnapshotTestSuite: 
IgniteClusterSnapshotStreamerTest.testStreamerWhileSnapshotOverwriting[Encryption=false]
 - PASSED{color}
* {color:#013220}IgniteSnapshotTestSuite: 
IgniteClusterSnapshotStreamerTest.testStreamerWhileSnapshotDefault[Encryption=false]
 - PASSED{color}
* {color:#013220}IgniteSnapshotTestSuite: 
IgniteClusterSnapshotStreamerTest.testOtherCacheRestores[Encryption=false] - 
PASSED{color}
* {color:#013220}IgniteSnapshotTestSuite: 
IgniteClusterSnapshotStreamerTest.testStreamingIntoInMememoryDoesntAffectSnapshot[Encryption=true]
 - PASSED{color}
* {color:#013220}IgniteSnapshotTestSuite: 
IgniteClusterSnapshotStreamerTest.testStreamerWhileSnapshotOverwriting[Encryption=true]
 - PASSED{color}
* {color:#013220}IgniteSnapshotTestSuite: 
IgniteClusterSnapshotStreamerTest.testStreamerWhileSnapshotDefault[Encryption=true]
 - PASSED{color}
* {color:#013220}IgniteSnapshotTestSuite: 
IgniteClusterSnapshotStreamerTest.testOtherCacheRestores[Encryption=true] - 
PASSED{color}

{color:#8b}Control Utility (Zookeeper){color} [[tests 
1|https://ci2.ignite.apache.org/viewLog.html?buildId=6896928]]
* {color:#013220}ZookeeperIgniteControlUtilityTestSuite: 
GridCommandHandlerTest.testClusterCreateSnapshotWarning - PASSED{color}

{panel}
[TeamCity *-- Run :: All* 
Results|https://ci2.ignite.apache.org/viewLog.html?buildId=6896996&buildTypeId=IgniteTests24Java8_RunAll]

> Snapshot is inconsistent under streamed loading with 'allowOverwrite==false'.
> -
>
> Key: IGNITE-17369
> URL: https://issues.apache.org/jira/browse/IGNITE-17369
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladimir Steshin
>Assignee: Vladimir Steshin
>Priority: Major
>  Labels: ise, ise.lts
> Attachments: IgniteClusterShanpshotStreamerTest.java
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Ignite fails to restore snapshot created under streamed load:
> {code:java}
> Conflict partition: PartitionKeyV2 [grpId=109386747, 
> grpName=SQL_PUBLIC_TEST_TBL1, partId=148]
> Partition instances: [PartitionHashRecordV2 [isPrimary=false, 
> consistentId=snapshot.IgniteClusterShanpshotStreamerTest0, updateCntr=29, 
> partitionState=OWNING, size=29, partHash=827765854], PartitionHashRecordV2 
> [isPrimary=false, consistentId=snapshot.IgniteClusterShanpshotStreamerTest1, 
> updateCntr=9, partitionState=OWNING, size=9, partHash=-1515069105]]
> Conflict partition: PartitionKeyV2 [grpId=109386747, 
> grpName=SQL_PUBLIC_TEST_TBL1, partId=146]
> Partition instances: [PartitionHashRecordV2 [isPrimary=false, 
> consistentId=snapshot.IgniteClusterShanpshotStreamerTest0, updateCntr=28, 
> partitionState=OWNING, size=28, partHash=1497908810], PartitionHashRecordV2 
> [isPrimary=false, consistentId=snapshot.IgniteClusterShanpshotStreamerTest1, 
> updateCntr=5, partitionState=OWNING, size=5, partHash=821195757]]
> {code}
> Test (attached):
> {code:java}
> public void testClusterSnapshotConsistencyWithStreamer() throws Exception {
>     int grids = 2;
>     CountDownLatch loadNumberBeforeSnapshot = new CountDownLatch(60_000);
>     AtomicBoolean stopLoading = new AtomicBoolean(false);
>     dfltCacheCfg = null;
>     Class.forName("org.apache.ignite.IgniteJdbcDriver");
>     String tableName = "TEST_TBL1";
>     startGrids(grids);
>     grid(0).cluster().state(ACTIVE);
>     IgniteInternalFuture load1 = runLoad(tableName, false, 1, true,
>         stopLoading, loadNumberBeforeSnapshot);
>     loadNumberBeforeSnapshot.await();
>     grid(0).snapshot().createSnapshot(SNAPSHOT_NAME).get();
>     stopLoading.set(true);
>     load1.get();
>     grid(0).cache("SQL_PUBLIC_" + 

[jira] [Commented] (IGNITE-17369) Snapshot is inconsistent under streamed loading with 'allowOverwrite==false'.

2022-11-03 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-17369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17628395#comment-17628395
 ] 

Ignite TC Bot commented on IGNITE-17369:


{panel:title=Branch: [pull/10286/head] Base: [master] : No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/10286/head] Base: [master] : New Tests 
(11)|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}
{color:#8b}Control Utility 1{color} [[tests 
2|https://ci2.ignite.apache.org/viewLog.html?buildId=6890577]]
* {color:#013220}IgniteControlUtilityTestSuite: 
GridCommandHandlerTest.testClusterCreateSnapshotWarning - PASSED{color}
* {color:#013220}IgniteControlUtilityTestSuite: 
GridCommandHandlerWithSSLTest.testClusterCreateSnapshotWarning - PASSED{color}

{color:#8b}Snapshots{color} [[tests 
8|https://ci2.ignite.apache.org/viewLog.html?buildId=6890627]]
* {color:#013220}IgniteSnapshotTestSuite: 
IgniteClusterSnapshotStreamerTest.testStreamingIntoInMememoryDoesntAffectSnapshot[Encryption=false]
 - PASSED{color}
* {color:#013220}IgniteSnapshotTestSuite: 
IgniteClusterSnapshotStreamerTest.testStreamerWhileSnapshotOverwriting[Encryption=false]
 - PASSED{color}
* {color:#013220}IgniteSnapshotTestSuite: 
IgniteClusterSnapshotStreamerTest.testStreamerWhileSnapshotDefault[Encryption=false]
 - PASSED{color}
* {color:#013220}IgniteSnapshotTestSuite: 
IgniteClusterSnapshotStreamerTest.testOtherCacheRestores[Encryption=false] - 
PASSED{color}
* {color:#013220}IgniteSnapshotTestSuite: 
IgniteClusterSnapshotStreamerTest.testStreamingIntoInMememoryDoesntAffectSnapshot[Encryption=true]
 - PASSED{color}
* {color:#013220}IgniteSnapshotTestSuite: 
IgniteClusterSnapshotStreamerTest.testStreamerWhileSnapshotOverwriting[Encryption=true]
 - PASSED{color}
* {color:#013220}IgniteSnapshotTestSuite: 
IgniteClusterSnapshotStreamerTest.testStreamerWhileSnapshotDefault[Encryption=true]
 - PASSED{color}
* {color:#013220}IgniteSnapshotTestSuite: 
IgniteClusterSnapshotStreamerTest.testOtherCacheRestores[Encryption=true] - 
PASSED{color}

{color:#8b}Control Utility (Zookeeper){color} [[tests 
1|https://ci2.ignite.apache.org/viewLog.html?buildId=6890578]]
* {color:#013220}ZookeeperIgniteControlUtilityTestSuite: 
GridCommandHandlerTest.testClusterCreateSnapshotWarning - PASSED{color}

{panel}
[TeamCity *-- Run :: All* 
Results|https://ci2.ignite.apache.org/viewLog.html?buildId=6890646&buildTypeId=IgniteTests24Java8_RunAll]


[jira] [Commented] (IGNITE-17369) Snapshot is inconsistent under streamed loading with 'allowOverwrite==false'.

2022-10-03 Thread Vladimir Steshin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-17369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612278#comment-17612278
 ] 

Vladimir Steshin commented on IGNITE-17369:
---

A snapshot can start while the copies of the same partition are in different 
states. The snapshot process waits for the datastreamer futures 
(GridCacheMvccManager.addDataStreamerFuture()). The problem is that these 
futures are created separately and concurrently on the primary and backup nodes 
by IsolatedUpdater. As a result, at the checkpoint some backups might be 
written without the primaries, and vice versa. No updates are accepted during 
the checkpoint, so late streamer updates are not written to the partitions 
being snapshotted.
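
To make the race above concrete, here is a minimal sketch in plain Java. It is 
not Ignite code: PartitionCopy and CheckpointRaceSketch are hypothetical names, 
and the batch sizes are picked only to mirror the updateCntr=29 / updateCntr=9 
pair from the conflict report. It assumes each copy applies a streamer batch on 
its own, with no coordination, and that the checkpoint simply records whatever 
counter each copy has at that moment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of one copy (primary or backup) of a partition. */
final class PartitionCopy {
    private long updateCntr;

    /** Applies a streamer batch locally, without coordinating with other copies. */
    void applyBatch(int entries) {
        updateCntr += entries;
    }

    long updateCounter() {
        return updateCntr;
    }
}

final class CheckpointRaceSketch {
    /** Records the counter of every copy as it is at the checkpoint. */
    static List<Long> checkpoint(List<PartitionCopy> copies) {
        List<Long> res = new ArrayList<>();
        for (PartitionCopy c : copies)
            res.add(c.updateCounter());
        return res;
    }

    /** The copies are consistent only if all recorded counters are equal. */
    static boolean consistent(List<Long> counters) {
        return counters.stream().distinct().count() <= 1;
    }

    /** Replays the race: the second batch reaches only one copy before the checkpoint. */
    static List<Long> demo() {
        PartitionCopy first = new PartitionCopy();
        PartitionCopy second = new PartitionCopy();

        first.applyBatch(9);
        second.applyBatch(9);

        first.applyBatch(20);                        // arrives before the checkpoint
        List<Long> snap = checkpoint(Arrays.asList(first, second));
        second.applyBatch(20);                       // arrives too late for the snapshot

        return snap;
    }

    public static void main(String[] args) {
        List<Long> snap = demo();
        System.out.println("counters=" + snap + ", consistent=" + consistent(snap));
    }
}
```

Both copies converge after the checkpoint, but the snapshot has already 
captured the diverged counters.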

1) V1 (10285). 
The PR makes the snapshot process watch the DataStreamer futures. The futures 
are created before a streamer batch is written on any node. We cannot rely on 
them as a sign of a final and consistent write of the current/latest streamer 
batch or of a certain record/entry. But we do know that the datastreamer is in 
progress at the checkpoint and that it is paused. We can invalidate the 
snapshot at this moment.
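
A rough sketch of the V1 idea, in plain Java rather than Ignite internals. 
StreamerFutureTracker and SnapshotCheckpointSketch are hypothetical names; the 
tracker stands in for the set of datastreamer futures, and the only assumption 
is that a future is registered before a batch is written and completed after.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the registered datastreamer futures on a node. */
final class StreamerFutureTracker {
    private final AtomicInteger active = new AtomicInteger();

    /** Called before the node starts writing a streamer batch. */
    void batchStarted() {
        active.incrementAndGet();
    }

    /** Called once the batch has been written on this node. */
    void batchFinished() {
        active.decrementAndGet();
    }

    boolean streamerInProgress() {
        return active.get() > 0;
    }
}

final class SnapshotCheckpointSketch {
    /**
     * The snapshot is treated as valid only if no streamer batch is in flight
     * at the checkpoint. An in-flight batch says nothing about which copies
     * were already written, so the snapshot is invalidated.
     */
    static boolean validAtCheckpoint(StreamerFutureTracker tracker) {
        return !tracker.streamerInProgress();
    }
}
```

The check is deliberately coarse: it does not try to tell which entries are 
consistent, only that a streamer was active when the checkpoint happened.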

2) V2 (10286).
IsolatedUpdater could just notify the snapshot process, if one exists, that a 
concurrent inconsistent update is in progress. A notification about at least 
one entry on any node would be enough. This should work in practice.
In theory the solution is not resilient. One streamer batch could have been 
entirely written before the snapshot, and a second batch after it. The first 
batch writes the partition on primaries or backups; the second writes the rest. 
The snapshot is inconsistent.
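
The gap in the notification approach can be shown with a small plain-Java 
sketch. SnapshotNotifierSketch is a hypothetical name and the methods are not 
Ignite API; the sketch only assumes that the updater raises a flag when it sees 
a running snapshot.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical model of "the updater notifies a running snapshot". */
final class SnapshotNotifierSketch {
    private final AtomicBoolean snapshotRunning = new AtomicBoolean();
    private final AtomicBoolean streamerUpdateSeen = new AtomicBoolean();

    void startSnapshot() {
        streamerUpdateSeen.set(false);
        snapshotRunning.set(true);
    }

    /** Called by the updater for every streamer entry it writes. */
    void onStreamerUpdate() {
        if (snapshotRunning.get())
            streamerUpdateSeen.set(true);
    }

    /** Returns true if no streamer update was noticed while the snapshot ran. */
    boolean finishSnapshot() {
        snapshotRunning.set(false);
        return !streamerUpdateSeen.get();
    }
}
```

If a batch is written while the snapshot runs, finishSnapshot() returns false 
and the snapshot is flagged. If one batch completes before startSnapshot() and 
the next one starts after finishSnapshot(), nothing is flagged, even though the 
first batch may have reached only some of the copies.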

3) V2 (10284).
We could mark the datastreamer as active when the first streamer batch is 
received, and unmark it somehow later. If the datastreamer is marked as active, 
the snapshot process could check this mark. Since the mark is set before 
writing data, it is set before the datastreamer future that the snapshot 
process waits for.

The problem is how to clear such a mark. When the node leaves? A node can live 
forever. Send a special closing request? The streamer node might not close the 
streamer at all, meaning close() is never invoked. Moreover, the datastreamer 
works through CommunicationSpi, which doesn't guarantee delivery. We cannot be 
sure that the closing request is delivered and the streamer is unmarked. A 
rebalance can happen while closing requests are in flight, and that should be 
handled too. Or the datastreamer can be canceled. It looks like we need a 
discovery closing message, which is much simpler and more reliable.

This solution looks hazardous.
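
A minimal sketch of the marking approach and of why clearing the mark is the 
hard part. StreamerMarkSketch and its methods are hypothetical and not part of 
Ignite; the streamer id is just a string here.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical per-node registry of datastreamers marked as active. */
final class StreamerMarkSketch {
    private final Set<String> activeStreamers = ConcurrentHashMap.newKeySet();

    /** Sets the mark when the first batch of a streamer is received. */
    void onBatchReceived(String streamerId) {
        activeStreamers.add(streamerId);
    }

    /** Clears the mark. Runs only if a closing request actually arrives. */
    void onCloseRequest(String streamerId) {
        activeStreamers.remove(streamerId);
    }

    /** The snapshot process would check this before trusting the data. */
    boolean anyStreamerActive() {
        return !activeStreamers.isEmpty();
    }
}
```

If close() is never invoked, or the closing request is lost, 
anyStreamerActive() stays true indefinitely and every later snapshot on that 
node would be reported as affected by the streamer.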


[jira] [Commented] (IGNITE-17369) Snapshot is inconsistent under streamed loading with 'allowOverwrite==false'.

2022-09-26 Thread Vladimir Steshin (Jira)


[ 
https://issues.apache.org/jira/browse/IGNITE-17369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17609543#comment-17609543
 ] 

Vladimir Steshin commented on IGNITE-17369:
---

The snapshot should warn the user about possible data corruption by the 
streamer. The snapshot task should finish with this message and a negative 
status.
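
One way this could look, as a plain-Java sketch. SnapshotOutcomeSketch, the 
status constants and the warning text are all hypothetical. The comment does 
not say how "negative status" maps to a concrete code, so the sketch assumes a 
non-zero value.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical result of a snapshot task that may carry warnings. */
final class SnapshotOutcomeSketch {
    static final int STATUS_OK = 0;

    /** Assumed non-zero code for "finished, but the data may be inconsistent". */
    static final int STATUS_WARNING = 1;

    private final List<String> warnings = new ArrayList<>();

    /** Records a warning, for example that a datastreamer was active. */
    void warn(String msg) {
        warnings.add(msg);
    }

    /** Any warning turns the final status into a non-OK one. */
    int status() {
        return warnings.isEmpty() ? STATUS_OK : STATUS_WARNING;
    }

    List<String> warnings() {
        return Collections.unmodifiableList(warnings);
    }
}
```

The snapshot files are still written in this model; the caller is simply told 
not to treat the result as a clean success.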

> Snapshot is inconsistent under streamed loading with 'allowOverwrite==false'.
> -
>
> Key: IGNITE-17369
> URL: https://issues.apache.org/jira/browse/IGNITE-17369
> Project: Ignite
>  Issue Type: Bug
>Reporter: Vladimir Steshin
>Assignee: Vladimir Steshin
>Priority: Major
>  Labels: ise, ise.lts
> Attachments: IgniteClusterShanpshotStreamerTest.java
>
>
> Ignite fails to restore snapshot created under streamed load:
> {code:java}
> Conflict partition: PartitionKeyV2 [grpId=109386747, 
> grpName=SQL_PUBLIC_TEST_TBL1, partId=148]
> Partition instances: [PartitionHashRecordV2 [isPrimary=false, 
> consistentId=snapshot.IgniteClusterShanpshotStreamerTest0, updateCntr=29, 
> partitionState=OWNING, size=29, partHash=827765854], PartitionHashRecordV2 
> [isPrimary=false, consistentId=snapshot.IgniteClusterShanpshotStreamerTest1, 
> updateCntr=9, partitionState=OWNING, size=9, partHash=-1515069105]]
> Conflict partition: PartitionKeyV2 [grpId=109386747, 
> grpName=SQL_PUBLIC_TEST_TBL1, partId=146]
> Partition instances: [PartitionHashRecordV2 [isPrimary=false, 
> consistentId=snapshot.IgniteClusterShanpshotStreamerTest0, updateCntr=28, 
> partitionState=OWNING, size=28, partHash=1497908810], PartitionHashRecordV2 
> [isPrimary=false, consistentId=snapshot.IgniteClusterShanpshotStreamerTest1, 
> updateCntr=5, partitionState=OWNING, size=5, partHash=821195757]]
> {code}
> Test (attached):
> {code:java}
> public void testClusterSnapshotConsistencyWithStreamer() throws Exception {
>     int grids = 2;
>     CountDownLatch loadNumberBeforeSnapshot = new CountDownLatch(60_000);
>     AtomicBoolean stopLoading = new AtomicBoolean(false);
>     dfltCacheCfg = null;
>     Class.forName("org.apache.ignite.IgniteJdbcDriver");
>     String tableName = "TEST_TBL1";
>     startGrids(grids);
>     grid(0).cluster().state(ACTIVE);
>     IgniteInternalFuture load1 = runLoad(tableName, false, 1, true,
>         stopLoading, loadNumberBeforeSnapshot);
>     loadNumberBeforeSnapshot.await();
>     grid(0).snapshot().createSnapshot(SNAPSHOT_NAME).get();
>     stopLoading.set(true);
>     load1.get();
>     grid(0).cache("SQL_PUBLIC_" + tableName).destroy();
>     grid(0).snapshot().restoreSnapshot(SNAPSHOT_NAME,
>         F.asList("SQL_PUBLIC_TEST_TBL1")).get();
> }
>
> /** */
> private IgniteInternalFuture runLoad(String tblName, boolean useCache,
>     int backups, boolean streaming, AtomicBoolean stop, CountDownLatch startSnp) {
>     return GridTestUtils.runMultiThreadedAsync(() -> {
>         if (useCache) {
>             String cacheName = "SQL_PUBLIC_" + tblName.toUpperCase();
>             IgniteCache cache = grid(0)
>                 .createCache(new CacheConfiguration<Integer, Object>(cacheName).setBackups(backups)
>                     .setCacheMode(CacheMode.REPLICATED));
>             try (IgniteDataStreamer ds = grid(0).dataStreamer(cacheName)) {
>                 for (int i = 0; !stop.get(); ++i) {
>                     if (streaming)
>                         ds.addData(i, new Account(i, i - 1));
>                     else
>                         cache.put(i, new Account(i, i - 1));
>                     if (startSnp.getCount() > 0)
>                         startSnp.countDown();
>                     Thread.yield();
>                 }
>             }
>         } else {
>             try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1/")) {
>                 createTable(conn, tblName, backups);
>                 try (PreparedStatement stmt = conn.prepareStatement("INSERT INTO " + tblName +
>                     "(id, name, orgid, dep) VALUES(?, ?, ?, ?)")) {
>                     if (streaming)
>                         conn.prepareStatement("SET STREAMING ON;").execute();
>                     int leftLimit = 97; // letter 'a'
>                     int rightLimit = 122; // letter 'z'
>                     int targetStringLength = 15;
>                     Random rand = new Random();
>                     //
>                     for (int i = 0; !stop.get(); ++i) {
>                         int orgid = rand.ints(1, 0, 5).findFirst().getAsInt();
>                         String val = rand.ints(leftLimit, rightLimit + 1).limit(targetStringLength)