It's hard to determine the problem from these messages. I don't see anything unhealthy regarding persistence; a checkpoint start is a regular event.

There have been cases where excessive GC load on the client side seriously affected the throughput and latency of data streaming. You may want to experiment with the following data streamer parameters:

public void perNodeBufferSize(int bufSize) - defines how many items are accumulated in the buffer before it is sent to the server node
public void perNodeParallelOperations(int parallelOps) - defines how many buffers can be sent to the node without an acknowledgement that they were processed
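
To illustrate how these two parameters interact, here is a minimal sketch in plain Java. It is a toy model and not Ignite code: the class StreamerBackPressureModel and all of its methods are hypothetical names made up for this example, and the real streamer may behave differently in detail. Under that assumption, it shows that the number of items held on the client per node is bounded roughly by the buffer size multiplied by the number of unacknowledged buffers, which is one plausible way these settings could affect client-side heap pressure.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (NOT Ignite code) of one node's buffer with a cap on unacknowledged batches. */
final class StreamerBackPressureModel {
    private final int bufSize;      // plays the role of perNodeBufferSize
    private final int parallelOps;  // plays the role of perNodeParallelOperations
    private final List<Integer> buffer = new ArrayList<>();
    private int inFlightBatches;    // batches sent but not yet acknowledged
    private int sentBatches;        // total batches sent so far

    StreamerBackPressureModel(int bufSize, int parallelOps) {
        if (bufSize <= 0 || parallelOps <= 0)
            throw new IllegalArgumentException("both parameters must be positive");
        this.bufSize = bufSize;
        this.parallelOps = parallelOps;
    }

    /** Buffers one item. Returns false where a real client would have to wait for an acknowledgement. */
    boolean add(int item) {
        // A full buffer left over from an earlier stalled send must go out first.
        if (buffer.size() == bufSize && !trySend())
            return false;
        buffer.add(item);
        if (buffer.size() == bufSize)
            trySend();
        return true;
    }

    /** Sends the full buffer as one batch if a slot for an unacknowledged batch is free. */
    private boolean trySend() {
        if (inFlightBatches == parallelOps)
            return false;
        buffer.clear();
        inFlightBatches++;
        sentBatches++;
        return true;
    }

    /** Models the server confirming that one batch was processed. */
    void acknowledge() {
        if (inFlightBatches > 0)
            inFlightBatches--;
    }

    int sentBatches() { return sentBatches; }

    int inFlightBatches() { return inFlightBatches; }

    int bufferedItems() { return buffer.size(); }

    /** Upper bound on items held in this model: in-flight batches plus the buffer being filled. */
    int maxItemsHeld() { return bufSize * (parallelOps + 1); }

    public static void main(String[] args) {
        // The values 512 and 16 are arbitrary illustration values, not recommendations.
        StreamerBackPressureModel m = new StreamerBackPressureModel(512, 16);
        System.out.println("max items held per node in this model: " + m.maxItemsHeld());
    }
}
```

In this model, lowering either value reduces how much data the client keeps around per node, at the cost of more frequent sends or more waiting for acknowledgements. Whether that helps in your case is something to measure rather than assume.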

Best Regards,
Ivan Rakov

On 19.02.2018 22:24, lawrencefinn wrote:
Okay I am trying to reproduce.  It hasn't got stuck yet, but the client got
disconnected and reconnected recently.  I don't think it is related to GC
because I am recording GC times and it does not jump up that much.  Could
the system get slow under heavy I/O?  I see this in the Ignite log:

[19:13:01,988][WARNING][grid-timeout-worker-#71][diagnostic] Found long
running cache future [startTime=19:11:56.656, curTime=19:13:01.911,
  fut=GridNearAtomicSingleUpdateFuture [reqState=Primary
[id=62a2a255-3320-4040-aa23-ffb86dec7586, opRes=false, expCnt=-1, rcvdCnt=0,
primaryRes=false, done=false, waitFor=null, rcvd=null],
super=GridNearAtomicAbstractUpdateFuture [remapCnt=100,
topVer=AffinityTopologyVersion [topVer=3, minorTopVer=14], remapTopVer=null,
err=null, futId=313296239, super=GridFutureAdapter [ignoreInterrupts=false,
state=INIT, res=null, hash=1229092316]]]]
[19:13:01,988][WARNING][grid-timeout-worker-#71][diagnostic] Found long
running cache future [startTime=19:11:39.917, curTime=19:13:01.911,
fut=GridNearAtomicSingleUpdateFuture [reqState=Primary
[id=62a2a255-3320-4040-aa23-ffb86dec7586, opRes=false, expCnt=-1, rcvdCnt=0,
primaryRes=false, done=false, waitFor=null, rcvd=null],
super=GridNearAtomicAbstractUpdateFuture [remapCnt=100,
topVer=AffinityTopologyVersion [topVer=3, minorTopVer=14], remapTopVer=null,
err=null, futId=312914655, super=GridFutureAdapter [ignoreInterrupts=false,
state=INIT, res=null, hash=15435296]]]]
[19:13:51,057][INFO][db-checkpoint-thread-#110][GridCacheDatabaseSharedManager]
Checkpoint started [checkpointId=77744626-04e6-4e17-bda7-23ecb50bbe19,
startPtr=FileWALPointer [idx=9600, fileOffset=35172819, len=124303,
forceFlush=true], checkpointLockWait=57708ms, checkpointLockHoldTime=64ms,
pages=3755135, reason='too many dirty pages']
[19:14:01,919][INFO][grid-timeout-worker-#71][IgniteKernal]
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
     ^-- Node [id=62a2a255, uptime=01:42:41.752]
     ^-- H/N/C [hosts=2, nodes=3, CPUs=64]
     ^-- CPU [cur=77.83%, avg=39.11%, GC=0.13%]
     ^-- PageMemory [pages=5111642]
     ^-- Heap [used=11669MB, free=43.02%, comm=20480MB]
     ^-- Non heap [used=67MB, free=95.56%, comm=69MB]
     ^-- Public thread pool [active=0, idle=0, qSize=0]
     ^-- System thread pool [active=0, idle=6, qSize=0]
     ^-- Outbound messages queue [size=0]
[19:15:03,470][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery
accepted incoming connection [rmtAddr=/127.0.0.1, rmtPort=33542]
[19:15:03,470][INFO][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery
spawning a new thread for connection [rmtAddr=/127.0.0.1, rmtPort=33542]


My app log has:
2018-02-19 19:15:02,176 [WARN] from
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi in
tcp-client-disco-reconnector-#5%cabooseGrid% - Timed out waiting for message
to be read (most probably, the reason is long GC pauses on remote node)
[curTimeout=5000, rmtAddr=/127.0.0.1:47500, rmtPort=47500]
2018-02-19 19:15:02,176 [ERROR] from
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi in
tcp-client-disco-reconnector-#5%cabooseGrid% - Exception on joining: Failed
to deserialize object with given class loader:
sun.misc.Launcher$AppClassLoader@28d93b30
org.apache.ignite.IgniteCheckedException: Failed to deserialize object with
given class loader: sun.misc.Launcher$AppClassLoader@28d93b30
         at
org.apache.ignite.marshaller.jdk.JdkMarshaller.unmarshal0(JdkMarshaller.java:129)
         at
org.apache.ignite.marshaller.AbstractNodeNameAwareMarshaller.unmarshal(AbstractNodeNameAwareMarshaller.java:94)
         at
org.apache.ignite.internal.util.IgniteUtils.unmarshal(IgniteUtils.java:9740)
         at
org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.readMessage(TcpDiscoverySpi.java:1590)
         at
org.apache.ignite.spi.discovery.tcp.ClientImpl.sendJoinRequest(ClientImpl.java:627)
         at
org.apache.ignite.spi.discovery.tcp.ClientImpl.joinTopology(ClientImpl.java:524)
         at
org.apache.ignite.spi.discovery.tcp.ClientImpl.access$900(ClientImpl.java:124)
         at
org.apache.ignite.spi.discovery.tcp.ClientImpl$Reconnector.body(ClientImpl.java:1377)
         at
org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)
Caused by: java.net.SocketTimeoutException: Read timed out



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
