It seems that a system worker on your end was blocked for more
than 30 seconds, and the watchdog shut the node down as a result:
[2019-04-12T10:52:27,451][ERROR][tcp-disco-msg-worker-#2][G] Blocked
system-critical thread has been detected. This can lead to
cluster-wide undefined behaviour [threadName=db-checkpoint-thread,
blockedFor=32s]
[2019-04-12T10:52:27,451][WARN ][tcp-disco-msg-worker-#2][G] Thread
[name="db-checkpoint-thread-#61", id=115, state=WAITING, blockCnt=39,
waitCnt=309]
Lock
[object=java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@92173e8,
ownerName=null, ownerId=-1]
[2019-04-12T10:52:27,451][ERROR][tcp-disco-msg-worker-#2][] Critical
system error detected. Will be handled accordingly to configured
handler [hnd=StopNodeOrHaltFailureHandler [tryStop=false, timeout=0,
super=AbstractFailureHandler
[ignoredFailureTypes=[SYSTEM_WORKER_BLOCKED]]],
failureCtx=FailureContext [type=SYSTEM_WORKER_BLOCKED, err=class
o.a.i.IgniteException: GridWorker [name=db-checkpoint-thread,
igniteInstanceName=null, finished=false, heartbeatTs=1555066315438]]]
Try tuning this watchdog or disabling it. Here is what the docs say:
https://apacheignite.readme.io/docs/critical-failures-handling#section-critical-workers-health-check
Ignite has an internal mechanism for verifying that critical workers
are operational. Each worker is regularly checked whether it's alive
and is updating its heartbeat timestamp. If either of the conditions
is not observed for the configured period of time, the worker is
regarded as blocked and Ignite will output that information to the log
file. The period of inactivity is specified by the
IgniteConfiguration.systemWorkerBlockedTimeout property (in
milliseconds; the default value equals the failure detection timeout
<https://apacheignite.readme.io/docs/tcpip-discovery#section-failure-detection-timeout>).
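To make the heartbeat condition described above concrete, here is a minimal Java sketch of such a check. It is only an illustration, not Ignite's actual implementation: the class HeartbeatWatchdog and its methods are hypothetical names, timestamps are passed in explicitly to keep the example deterministic, and the separate "is the worker alive" check is left out. The 30-second timeout is an assumed value chosen to match the log above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Simplified heartbeat watchdog (hypothetical; not Ignite's implementation). */
final class HeartbeatWatchdog {
    /** Allowed period without a heartbeat, in milliseconds. */
    private final long blockedTimeoutMs;

    /** Last heartbeat timestamp (ms) recorded for each worker name. */
    private final Map<String, Long> heartbeats = new ConcurrentHashMap<>();

    HeartbeatWatchdog(long blockedTimeoutMs) {
        if (blockedTimeoutMs <= 0)
            throw new IllegalArgumentException("timeout must be positive");
        this.blockedTimeoutMs = blockedTimeoutMs;
    }

    /** Called by a worker each time it makes progress. */
    void heartbeat(String workerName, long nowMs) {
        heartbeats.put(workerName, nowMs);
    }

    /** Returns the workers whose last heartbeat is older than the timeout. */
    List<String> blockedWorkers(long nowMs) {
        List<String> blocked = new ArrayList<>();
        for (Map.Entry<String, Long> e : heartbeats.entrySet()) {
            if (nowMs - e.getValue() > blockedTimeoutMs)
                blocked.add(e.getKey());
        }
        return blocked;
    }

    public static void main(String[] args) {
        HeartbeatWatchdog watchdog = new HeartbeatWatchdog(30_000L);
        watchdog.heartbeat("db-checkpoint-thread", 0L);
        watchdog.heartbeat("tcp-disco-msg-worker", 31_000L);
        // 32 s after the checkpoint thread's last heartbeat, only that thread is reported.
        System.out.println(watchdog.blockedWorkers(32_000L));
    }
}
```

In a real deployment you would not write this yourself. The corresponding knob is the systemWorkerBlockedTimeout property quoted above: a larger value gives a slow checkpoint more time before the worker is regarded as blocked.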
This behavior will be revisited in Ignite soon:
http://apache-ignite-developers.2346864.n4.nabble.com/GridDhtInvalidPartitionException-takes-the-cluster-down-td41459.html
-
Denis
On Mon, Apr 15, 2019 at 9:13 PM shivakumar <[email protected]> wrote:
> Hi all,
> I created a table over a JDBC connection, with native persistence enabled in
> partitioned mode. I have 2 Ignite nodes (version 2.7.0) running in a
> Kubernetes environment. I ingested 1500000 records, and when I try to drop
> the table, both pods restart one after the other.
> Please find the thread dump logs attached.
> After this, the drop statement is unsuccessful:
>
> 0: jdbc:ignite:thin://ignite-service.cign.svc> !tables
> +-----------+-------------+------------+------------+---------+
> | TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS |
> +-----------+-------------+------------+------------+---------+
> |           | PUBLIC      | DEVICE     | TABLE      |         |
> |           | PUBLIC      | DIMENSIONS | TABLE      |         |
> |           | PUBLIC      | CELL       | TABLE      |         |
> +-----------+-------------+------------+------------+---------+
> 0: jdbc:ignite:thin://ignite-service.cign.svc> DROP TABLE IF EXISTS PUBLIC.DEVICE;
> Error: Statement is closed. (state=,code=0)
> java.sql.SQLException: Statement is closed.
> at org.apache.ignite.internal.jdbc.thin.JdbcThinStatement.ensureNotClosed(JdbcThinStatement.java:862)
> at org.apache.ignite.internal.jdbc.thin.JdbcThinStatement.getWarnings(JdbcThinStatement.java:454)
> at sqlline.Commands.execute(Commands.java:849)
> at sqlline.Commands.sql(Commands.java:733)
> at sqlline.SqlLine.dispatch(SqlLine.java:795)
> at sqlline.SqlLine.begin(SqlLine.java:668)
> at sqlline.SqlLine.start(SqlLine.java:373)
> at sqlline.SqlLine.main(SqlLine.java:265)
> 0: jdbc:ignite:thin://ignite-service.cign.svc> !quit
> Closing: org.apache.ignite.internal.jdbc.thin.JdbcThinConnection
> [root@vm-10-99-26-135 bin]# ./sqlline.sh --verbose=true -u "jdbc:ignite:thin://ignite-service.cign.svc.cluster.local:10800;user=ignite;password=ignite;"
> issuing: !connect jdbc:ignite:thin://ignite-service.cign.svc.cluster.local:10800;user=ignite;password=ignite; '' '' org.apache.ignite.IgniteJdbcThinDriver
> Connecting to jdbc:ignite:thin://ignite-service.cign.svc.cluster.local:10800;user=ignite;password=ignite;
> Connected to: Apache Ignite (version 2.7.0#19700101-sha1:00000000)
> Driver: Apache Ignite Thin JDBC Driver (version 2.7.0#20181130-sha1:256ae401)
> Autocommit status: true
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> sqlline version 1.3.0
> 0: jdbc:ignite:thin://ignite-service.cign.svc> !tables
> +-----------+-------------+------------+------------+---------+
> | TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS |
> +-----------+-------------+------------+------------+---------+
> |           | PUBLIC      | DEVICE     | TABLE      |         |
> |           | PUBLIC      | DIMENSIONS | TABLE      |         |
> |           | PUBLIC      | CELL       | TABLE      |         |
> +-----------+-------------+------------+------------+---------+
> 0: jdbc:ignite:thin://ignite-service.cign.svc> select count(*) from DEVICE;
> +--------------------------------+
> | COUNT(*) |
> +--------------------------------+
> | 1500000 |
> +--------------------------------+
> 1 row selected (5.665 seconds)
> 0: jdbc:ignite:thin://ignite-service.cign.svc>
>
> ignite_thread_dump.txt
> <http://apache-ignite-users.70518.x6.nabble.com/file/t2244/ignite_thread_dump.txt>
>
>
>
> shiva