Re: Cluster crash on 2.7.0

2019-07-09 Thread Ilya Kasnacheev
Hello!

I can't comment on your LDAP situation, but you can see errors like this
one in tests if you clear the Ignite work directory before the node has
completely stopped.

See, for example, https://issues.apache.org/jira/browse/IGNITE-8797
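The teardown order described above can be sketched with the JDK alone. This is a minimal sketch under stated assumptions: `FakeNode` is a hypothetical stand-in for an embedded node whose background worker keeps writing files into the work directory (the way the checkpointer does). It is not Ignite code. In a real test the "stop and wait" step would be Ignite's own `Ignite.close()` or `Ignition.stopAll(true)`; the point is only that the work directory is deleted after the node has fully stopped, never while it is still running.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

public class SafeTeardown {

    /** Hypothetical stand-in for an embedded node: a worker that keeps writing into the work dir. */
    static final class FakeNode implements AutoCloseable {
        private final ScheduledExecutorService worker = Executors.newSingleThreadScheduledExecutor();
        private final Path cpDir;
        final ConcurrentLinkedQueue<IOException> errors = new ConcurrentLinkedQueue<>();

        FakeNode(Path workDir) throws IOException {
            cpDir = Files.createDirectories(workDir.resolve("cp"));
            worker.scheduleWithFixedDelay(this::checkpoint, 0, 5, TimeUnit.MILLISECONDS);
        }

        // Write a temp file, then rename it into place (a simplified "checkpoint marker").
        private void checkpoint() {
            try {
                String name = System.nanoTime() + "-START.bin";
                Path tmp = Files.write(cpDir.resolve(name + ".tmp"), new byte[] {1});
                Files.move(tmp, cpDir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
            } catch (IOException e) {
                errors.add(e); // a work dir deleted under a running node would surface here
            }
        }

        /** Returns only after the background worker has fully stopped. */
        @Override
        public void close() throws InterruptedException {
            worker.shutdown(); // cancels future runs; a run already in progress may finish
            if (!worker.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("worker did not stop in time");
            }
        }
    }

    // Deletes children before parents by walking the tree in reverse order.
    static void deleteRecursively(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    /** Safe order: stop the node completely, then delete its work dir. Returns the worker's I/O error count. */
    static int runWithSafeTeardown() throws Exception {
        Path workDir = Files.createTempDirectory("ignitewd");
        FakeNode node = new FakeNode(workDir);
        Thread.sleep(50);            // let the worker run a few times
        node.close();                // 1) stop and wait for the worker
        deleteRecursively(workDir);  // 2) only then remove the directory
        if (Files.exists(workDir)) {
            throw new IllegalStateException("work dir still present");
        }
        return node.errors.size();
    }
}
```

Swapping steps 1 and 2 turns this into a race, which matches the observation that the crash shows up only occasionally.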

Regards,
-- 
Ilya Kasnacheev


On Tue, 9 Jul 2019 at 15:04, Andrey Davydov wrote:

>
>
> Hello all,
>
>
>
> Sometimes we get very strange ignite cluster crush during execution of
> tests for our system.
>
>
>
> 2019-07-09 11:02:59,710 [main] ERROR com.imperva.ddc.core.Driver:176 -
> Ldap Connection to nodelegateddomen.local failed
>
> 
>
> 2019-07-09 11:02:59,715 [main] ERROR com.imperva.ddc.core.Driver:116 -
> Test connection has failed. Results: Connection to host
> nodelegateddomen.dev002.local has failed. Reason:
> com.imperva.ddc.core.exceptions.InvalidConnectionException:
> org.apache.directory.ldap.client.api.exception.InvalidConnectionException:
> Cannot connect to the server: Hostname 'nodelegateddomen.local' could not
> be resolved.
>
> 2019-07-09 11:03:05,482 [db-checkpoint-thread-#10414] ERROR :134 -
> Critical system error detected. Will be handled accordingly to configured
> handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
> super=AbstractFailureHandler
> [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext
> [type=CRITICAL_ERROR, err=class
> o.a.i.i.processors.cache.persistence.StorageException: Failed to write
> checkpoint entry [ptr=FileWALPointer [idx=0, fileOff=2032270, len=31143],
> cpTs=1562670185474, cpId=1bab48e6-f29c-4fb3-acd0-385783244ad9, type=START]]]
>
> org.apache.ignite.internal.processors.cache.persistence.StorageException:
> Failed to write checkpoint entry [ptr=FileWALPointer [idx=0,
> fileOff=2032270, len=31143], cpTs=1562670185474,
> cpId=1bab48e6-f29c-4fb3-acd0-385783244ad9, type=START]
>
>  at
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.writeCheckpointEntry(GridCacheDatabaseSharedManager.java:2853)
> ~[ignite-core-2.7.0.jar:2.7.0]
>
>  at
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.markCheckpointBegin(GridCacheDatabaseSharedManager.java:3841)
> ~[ignite-core-2.7.0.jar:2.7.0]
>
>  at
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.doCheckpoint(GridCacheDatabaseSharedManager.java:3279)
> [ignite-core-2.7.0.jar:2.7.0]
>
>  at
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.body(GridCacheDatabaseSharedManager.java:3178)
> [ignite-core-2.7.0.jar:2.7.0]
>
>  at
> org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
> [ignite-core-2.7.0.jar:2.7.0]
>
>  at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
>
> Caused by: java.nio.file.NoSuchFileException:
> /buildir/testdir/ignitewd/db/storage/node00-98797cd6-5f62-4e0f-8bbd-08163bb111ae/cp/1562670185474-1bab48e6-f29c-4fb3-acd0-385783244ad9-START.bin.tmp
>
>  at
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> ~[?:1.8.0_212]
>
>  at
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> ~[?:1.8.0_212]
>
>  at
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> ~[?:1.8.0_212]
>
>  at
> sun.nio.fs.UnixFileSystemProvider.newAsynchronousFileChannel(UnixFileSystemProvider.java:196)
> ~[?:1.8.0_212]
>
>  at
> java.nio.channels.AsynchronousFileChannel.open(AsynchronousFileChannel.java:248)
> ~[?:1.8.0_212]
>
>  at
> java.nio.channels.AsynchronousFileChannel.open(AsynchronousFileChannel.java:301)
> ~[?:1.8.0_212]
>
>  at
> org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIO.(AsyncFileIO.java:57)
> ~[ignite-core-2.7.0.jar:2.7.0]
>
>  at
> org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIOFactory.create(AsyncFileIOFactory.java:53)
> ~[ignite-core-2.7.0.jar:2.7.0]
>
>  at
> org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.writeCheckpointEntry(GridCacheDatabaseSharedManager.java:2836)
> ~[ignite-core-2.7.0.jar:2.7.0]
>
>  ... 5 more
>
>
>
> I don’t know way to reproduce =(, But we got this crash twice during last
> month, on the same test. And this test don’t make any payload for cluster,
> this test is to check how our authorization subsystem handle case when
> ActiveDirectory is not available. No any manipulation with grid.
>
>
>
> I have no any idea how it works, but both cluster crush was on the same
> place.
>
>
>
> We use Imperva as AD client
>
>
>
> ddc-core
>
> com.imperva.ddc
>
> 7.3.3.0.0.0
>
>
>
> We use embedded Ignite with enabled percistence.
>
>
>
> Thanks for any help.
>
>
>
> Andrey.
>
>
>
> 2019-07-09 11:02:59,710 [main] ERROR 

Cluster crash on 2.7.0

2019-07-09 Thread Andrey Davydov

Hello all,

Sometimes we get a very strange Ignite cluster crash while running the tests
for our system.

2019-07-09 11:02:59,710 [main] ERROR com.imperva.ddc.core.Driver:176 - Ldap 
Connection to nodelegateddomen.local failed

2019-07-09 11:02:59,715 [main] ERROR com.imperva.ddc.core.Driver:116 - Test 
connection has failed. Results: Connection to host 
nodelegateddomen.dev002.local has failed. Reason: 
com.imperva.ddc.core.exceptions.InvalidConnectionException: 
org.apache.directory.ldap.client.api.exception.InvalidConnectionException: 
Cannot connect to the server: Hostname 'nodelegateddomen.local' could not be 
resolved.
2019-07-09 11:03:05,482 [db-checkpoint-thread-#10414] ERROR :134 - Critical system error detected. Will be handled accordingly to configured handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0, super=AbstractFailureHandler [ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]], failureCtx=FailureContext [type=CRITICAL_ERROR, err=class o.a.i.i.processors.cache.persistence.StorageException: Failed to write checkpoint entry [ptr=FileWALPointer [idx=0, fileOff=2032270, len=31143], cpTs=1562670185474, cpId=1bab48e6-f29c-4fb3-acd0-385783244ad9, type=START]]]
org.apache.ignite.internal.processors.cache.persistence.StorageException: Failed to write checkpoint entry [ptr=FileWALPointer [idx=0, fileOff=2032270, len=31143], cpTs=1562670185474, cpId=1bab48e6-f29c-4fb3-acd0-385783244ad9, type=START]
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.writeCheckpointEntry(GridCacheDatabaseSharedManager.java:2853) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.markCheckpointBegin(GridCacheDatabaseSharedManager.java:3841) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.doCheckpoint(GridCacheDatabaseSharedManager.java:3279) [ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.body(GridCacheDatabaseSharedManager.java:3178) [ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120) [ignite-core-2.7.0.jar:2.7.0]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
Caused by: java.nio.file.NoSuchFileException: /buildir/testdir/ignitewd/db/storage/node00-98797cd6-5f62-4e0f-8bbd-08163bb111ae/cp/1562670185474-1bab48e6-f29c-4fb3-acd0-385783244ad9-START.bin.tmp
	at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) ~[?:1.8.0_212]
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) ~[?:1.8.0_212]
	at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) ~[?:1.8.0_212]
	at sun.nio.fs.UnixFileSystemProvider.newAsynchronousFileChannel(UnixFileSystemProvider.java:196) ~[?:1.8.0_212]
	at java.nio.channels.AsynchronousFileChannel.open(AsynchronousFileChannel.java:248) ~[?:1.8.0_212]
	at java.nio.channels.AsynchronousFileChannel.open(AsynchronousFileChannel.java:301) ~[?:1.8.0_212]
	at org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIO.<init>(AsyncFileIO.java:57) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIOFactory.create(AsyncFileIOFactory.java:53) ~[ignite-core-2.7.0.jar:2.7.0]
	at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.writeCheckpointEntry(GridCacheDatabaseSharedManager.java:2836) ~[ignite-core-2.7.0.jar:2.7.0]
	... 5 more
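The "Caused by" line can be reproduced with the JDK alone: `AsynchronousFileChannel.open` with `CREATE` fails with `NoSuchFileException` when the parent `cp` directory no longer exists, which is what happens if the work directory is removed while the checkpointer is still running. A minimal sketch, assuming nothing about Ignite internals: the directory layout and file name only mirror the path in the log, and `writeMarker` is a hypothetical helper, not Ignite's implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Comparator;
import java.util.stream.Stream;

public class CheckpointMarkerDemo {

    // Hypothetical helper: opens <cpDir>/<name>.bin.tmp with an AsynchronousFileChannel
    // (CREATE + WRITE), the same JDK call seen in the trace, and writes a few bytes.
    static void writeMarker(Path cpDir, String name) throws Exception {
        Path tmp = cpDir.resolve(name + ".bin.tmp");
        try (AsynchronousFileChannel ch = AsynchronousFileChannel.open(
                tmp, StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ch.write(ByteBuffer.wrap(new byte[] {1, 2, 3}), 0).get();
        }
    }

    // Deletes children before parents by walking the tree in reverse order.
    static void deleteRecursively(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    // Unsafe order: the work dir is removed first, then the "checkpointer" tries to write.
    static boolean failsAfterWorkDirDeleted() throws Exception {
        Path workDir = Files.createTempDirectory("ignitewd");
        Path cpDir = Files.createDirectories(workDir.resolve("db/storage/node00/cp"));
        deleteRecursively(workDir);
        try {
            writeMarker(cpDir, "START");
            return false;
        } catch (NoSuchFileException e) {
            return true; // same exception type as the "Caused by" line above
        }
    }

    // Safe order: the write completes, then the work dir is removed.
    static boolean succeedsWhenWriteFinishesFirst() throws Exception {
        Path workDir = Files.createTempDirectory("ignitewd");
        Path cpDir = Files.createDirectories(workDir.resolve("db/storage/node00/cp"));
        writeMarker(cpDir, "START");
        boolean written = Files.exists(cpDir.resolve("START.bin.tmp"));
        deleteRecursively(workDir);
        return written && !Files.exists(workDir);
    }
}
```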

I don't know how to reproduce it =(, but we got this crash twice during the
last month, on the same test. This test doesn't put any load on the cluster;
it checks how our authorization subsystem handles the case when Active
Directory is not available. There is no manipulation of the grid at all.

I have no idea how this happens, but both cluster crashes happened at the same place.

We use Imperva as the AD client: com.imperva.ddc:ddc-core:7.3.3.0.0.0

We use embedded Ignite with persistence enabled.

Thanks for any help.

Andrey.

2019-07-09 11:02:59,710 [main] ERROR com.imperva.ddc.core.Driver:176 - Ldap Connection to nodelegateddomen.local failed
org.apache.directory.ldap.client.api.exception.InvalidConnectionException: Cannot connect to the server: Hostname 'nodelegateddomen.local' could not be resolved.
	at org.apache.directory.ldap.client.api.LdapNetworkConnection.connect(LdapNetworkConnection.java:758) ~[api-all-1.0.3.jar:1.0.3]
	at org.apache.directory.ldap.client.api.LdapNetworkConnection.bindAsync(LdapNetworkConnection.java:1368) ~[api-all-1.0.3.jar:1.0.3]
at