client service fails on reconnect after all server nodes are stopped and restarted

2018-05-15 Thread wangsan
I see some related bugs, such as https://issues.apache.org/jira/browse/IGNITE-2766 and https://github.com/apache/ignite/pull/2627. When I apply the patch, the cache can be reused when the client reconnects. But

Re: ignite c++ support service grid?

2018-06-07 Thread wangsan
Can I use a C++ distributed closure on a Java Ignite node? -- Sent from: http://apache-ignite-users.70518.x6.nabble.com/

ignite c++ support service grid?

2018-05-27 Thread wangsan
From the docs at https://apacheignite-cpp.readme.io/docs, I did not find any description of the service grid. I just want to make sure that Ignite C++ doesn't support the service grid (Ignite service RPC) now, but maybe will later. I have some C++ nodes that do deep learning, which

Re: two data region with two nodes

2018-08-01 Thread wangsan
Thank you for your answer. Maybe I am using the region config the wrong way. There are two apps (and more apps with different roles) with different Ignite configs. First app: set the default region with persistence enabled; set cache a, with a node filter for the first app, in the default region. Second app: set

Re: two data region with two nodes

2018-08-10 Thread wangsan
What I want is this. Module A: nodes a1 and a2 keep cache cache_a persistent and balanced within module A only, using the persistent region p_region. Module B: nodes b1 and b2 keep cache cache_b non-persistent and distributed within module B. B will access cache_a and have the same region p_region for persistence even cache_b

Re: two data region with two nodes

2018-08-08 Thread wangsan
Thanks, I will try the oldest-node selection method to ensure that only one node processes the event message. As mentioned earlier, my project has several modules, e.g.: module A, a daemon node with caches nodecache and machinecache, all persistent; module B, a search node with cache
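
The oldest-node selection mentioned above can be sketched in plain Java. This is only an illustration under assumptions: the Node class, its order field, and the shouldHandle helper are hypothetical stand-ins, not Ignite API. In Ignite itself the join order would come from ClusterNode.order(), and ignite.cluster().forOldest() projects onto the oldest node.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: Node and shouldHandle are illustrative names, not Ignite API.
class OldestNodeSketch {
    // Stand-in for a cluster node; 'order' mimics the join order (smaller = joined earlier).
    static final class Node {
        final String id;
        final long order;

        Node(String id, long order) {
            this.id = id;
            this.order = order;
        }
    }

    // Returns the node with the smallest join order, if any node is alive.
    static Optional<Node> oldest(List<Node> alive) {
        return alive.stream().min(Comparator.comparingLong((Node n) -> n.order));
    }

    // True only on the oldest alive node, so exactly one node handles each event.
    static boolean shouldHandle(Node local, List<Node> alive) {
        return oldest(alive).map(n -> n.id.equals(local.id)).orElse(false);
    }

    public static void main(String[] args) {
        Node a = new Node("a", 1);
        Node b = new Node("b", 2);
        List<Node> alive = Arrays.asList(b, a);
        System.out.println("a handles: " + shouldHandle(a, alive));
        System.out.println("b handles: " + shouldHandle(b, alive));
    }
}
```

Every node evaluates the same rule against the same topology view, so no extra coordination is needed as long as the views agree.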

how ignite c++ node set baselinetopology

2018-08-13 Thread wangsan
I have a Java node with persistence. When a C++ node joins, how do I activate the cluster and set the baseline topology (BLT) through the API?

ignite cluster hang with GridCachePartitionExchangeManager

2018-08-24 Thread wangsan
My cluster topology is nodes a, b, c and d, all with persistence enabled and peer class loading disabled. b, c and d have different classes (cache b) from a. 1. When any node crashes with an OOM (memory or stack), all nodes hang with "Still waiting for initial partition map exchange". 2. When a starts first, b, c, d start

Re: ignite cluster hang with GridCachePartitionExchangeManager

2018-08-28 Thread wangsan
; import org.apache.ignite.logger.slf4j.Slf4jLogger; import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi; import org.apache.ignite.spi.discovery.tcp.ipfinder.multicast.TcpDiscoveryMulticastIpFinder; import org.junit.Test; import com.google.common.collect.Lists; /** * @author wangsan * @date

ignite cluster management

2018-08-27 Thread wangsan
I am using Ignite to manage my cluster. When a node joins, I save an item in my NodeCache; when a node leaves, I delete the item from my NodeCache in an Ignite event listener. I use a custom NodeCache rather than the Ignite cluster topology because I want to keep the node status persistent when a node leaves for some

Re: ignite cluster hang with GridCachePartitionExchangeManager

2018-09-11 Thread wangsan
Yes, it was blocked when doing cache operations in discovery event listeners while node-left events arrived concurrently. I now do the cache operation in another thread, so the listener is no longer blocked. The original cause may be that the discovery event processor holds the server latch. When doing cache
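
The workaround described above (moving the cache operation off the listener thread) might look roughly like this. It is a minimal, stdlib-only sketch: ListenerOffloadSketch, onNodeLeft, and the processed list are hypothetical names, and the list stands in for the real cache update. In an actual application the listener body would be registered through ignite.events().localListen(...), which is not shown here.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: class and method names are illustrative, not Ignite API.
class ListenerOffloadSketch {
    // A single worker thread keeps node-left updates in arrival order.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    // Stand-in for the real cache (e.g. removing the node's entry from NodeCache).
    final List<String> processed = new CopyOnWriteArrayList<>();

    // Called on the discovery thread: only enqueues work and returns immediately.
    boolean onNodeLeft(String nodeId) {
        worker.submit(() -> processed.add(nodeId)); // the slow cache update would go here
        return true; // keep the listener registered
    }

    // Drains queued work, e.g. on application shutdown; true if everything finished in time.
    boolean shutdown(long timeoutMs) throws InterruptedException {
        worker.shutdown();
        return worker.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ListenerOffloadSketch sketch = new ListenerOffloadSketch();
        sketch.onNodeLeft("b");
        sketch.onNodeLeft("c");
        sketch.shutdown(5000);
        System.out.println(sketch.processed);
    }
}
```

A single-thread executor is used so that updates for successive node-left events are applied in the order they arrived; a larger pool would give up that ordering.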

Re: npe: Failed to reinitialize local partition

2018-10-19 Thread wangsan
Thanks! The ticket description closely matches my scenario. I will apply the patch! I am a little confused about the persistence files. If I have a default region without persistence and cache2a uses the default region, the db file (directory) won't be created. But if I have region1 (with

zookeeper jd(joindata) never delete

2018-10-18 Thread wangsan
Ignite version: 2.5.0. Discovery SPI: ZookeeperDiscoverySpi. When I start a client node c, c can join the cluster. But if c just joins the cluster and never calls any cache method, the children of the jd path are never deleted:

Re: npe: Failed to reinitialize local partition

2018-10-23 Thread wangsan
Thanks! I applied the patch from the issue, and the NPE was fixed. But another problem appears. As the issue says, steps to reproduce: Ignite server: region1 (with persistence), region2 (without persistence). Client: cache1a from region1, with custom affinity; cache2a from region2, with custom affinity

Re: npe: Failed to reinitialize local partition

2018-10-24 Thread wangsan
IgniteTwoRegionsRebuildIndexTest.java: this file is the test case. I use getOrCreateCache with a name, but I had defined the cache template before. On reconnect, it looks like the cache

Re: npe: Failed to reinitialize local partition

2018-10-28 Thread wangsan
Hi, I ran the test on v2.5 (with the NPE patch); the in-memory cache's backups become 0 on reconnect. With v2.7 it is OK! When will v2.7 be released?

npe: Failed to reinitialize local partition

2018-10-19 Thread wangsan
When a C++ node with a cache (defined in C++) joins the Java server node (persistence enabled), it throws this exception. When I clear the server's persistence home and restart, the exception is not thrown. Why? o.a.i.i.p.c.d.d.p.GridDhtPartitionsExchangeFuture - Failed to reinitialize local partitions

ZookeeperDiscovery Waiting for local join event

2018-11-08 Thread wangsan
I use ZookeeperDiscovery, but sometimes the cluster behaves weirdly: clients cannot join the cluster, and the ZooKeeper node /jd has many child nodes. When I restart a server node, the exception is: 19:14:06.675 [zk-192_0_0_1_DAEMON_NODE_8991-EventThread] INFO

ZookeeperDiscovery block when communication error

2018-11-12 Thread wangsan
I have a server node in zone A, and I start a client from zone B. Access between A and B is controlled by a firewall; the ACL lets B access A, but A cannot access B. So when the client in zone B joins the cluster, communication fails because of the firewall. But when the client in zone B is closed, the

persistence when disk full

2018-11-07 Thread wangsan
I have two server nodes (persistence enabled) on different machines: node n1 on machine m1 and node n2 on machine m2. When m1's disk is full, both nodes n1 and n2 fail, not only n1. How can I make only node n1 fail?

Re: ignite cluster hang with GridCachePartitionExchangeManager

2018-09-03 Thread wangsan
Thanks! Can I do cache operations (updating a cache item) in another thread started from discovery event listeners? And will the operation (updating a cache item) execute concurrently with the partition map exchange, or before it?

Re: ignite cluster management

2018-09-03 Thread wangsan
1. Note that you will have problems writing to caches when the topology changes. Why not just query ignite.cluster() and expose node traits as node.attributes()? >> Yes, I use attributes for node join. But I want to know the stopped nodes (when a node leaves). Then I can do some operations such as

ignite.events().remoteListen unmarshal0 throw class not found

2018-09-20 Thread wangsan
With one server and one client, the client calls: line_1: CacheNodeFilter rmtFilter = new CacheNodeFilter(serviceType, uuid); line_2: ignite.events().remoteListen(listenerDispatcher, rmtFilter, EventType.EVT_CACHE_OBJECT_EXPIRED); I am quite sure that both the server and the client have the class

ignite zk: Possible starvation in striped pool

2019-01-22 Thread wangsan
10:38:31.577 [grid-timeout-worker-#55%DAEMON-NODE-10-153-106-16-8991%] WARN o.a.ignite.internal.util.typedef.G - >>> Possible starvation in striped pool. Thread name: sys-stripe-9-#10%DAEMON-NODE-10-153-106-16-8991% Queue: [] Deadlock: false Completed: 17156 Thread

Re: ignite zk: Possible starvation in striped pool

2019-01-22 Thread wangsan
Thank you! I see that this is the communication SPI, not the discovery SPI. But on other nodes there are many ZK session timeout messages or ZK reconnect failure messages. And the starvation message is printed only on the node that has the ZK server (not a cluster, just three ZK nodes on one machine) in the same

Re: NPE when start

2019-01-22 Thread wangsan
I will try to reproduce the exception. I am worried that it happens because of the cache remote listener's rmtFilter: @IgniteAsyncSupported public UUID remoteListen(@Nullable IgniteBiPredicate locLsnr, @Nullable IgnitePredicate rmtFilter, @Nullable int... types)

Re: Failed to read data from remote connection

2018-12-17 Thread wangsan
The cluster now has 100+ nodes. When 'Start check connection process' happens, some nodes throw an OOM with "Direct buffer memory" (Java NIO). When connections are checked, many NIO sockets are created; is that why the OOM happens? How can I fix the OOM other than with a larger Xmx? Thanks.

Re: Failed to read data from remote connection

2018-12-03 Thread wangsan
Do you shut down the C++ node properly prior to killing the process? Yeah, the C++ node was killed with kill -9, not SIGHUP. That was the wrong operation, and I will use kill instead. Does this exception impact the cluster's functionality in any way? I am not sure about the exceptions. My cluster will crash with OOM

Why GridDiscoveryManager onSegmentation use StopNodeFailureHandler?

2018-11-26 Thread wangsan
Why does GridDiscoveryManager onSegmentation use StopNodeFailureHandler rather than StopNodeOrHaltFailureHandler? In my case, when a network problem happens, I want the JVM to exit.

Re: Failed to read data from remote connection

2018-11-27 Thread wangsan
I have a cluster with ZooKeeper discovery, e.g. Java server node s1, Java client node jc1 and C++ client node cpp1. Sometimes when cpp1 restarts, s1 and jc1 throw this exception many times: Failed to process selector key [ses=GridSelectorNioSessionImpl And cpp1 will have many messages

Re: Failed to read data from remote connection

2018-11-27 Thread wangsan
As I restart the C++ client many times concurrently, maybe some node paths in the ZK cluster (Ignite) have been closed. From the C++ client logs, I can see ZK discovery watch 44 first, but the node has been closed

Failed to read data from remote connection

2018-11-27 Thread wangsan
When the client (C++ node) restarts multiple times, the server and the other clients throw this exception: ERROR o.a.i.s.c.tcp.TcpCommunicationSpi - Failed to read data from remote connection (will wait for 2000ms). org.apache.ignite.IgniteCheckedException: Failed to select events on selector.

Re: Why GridDiscoveryManager onSegmentation use StopNodeFailureHandler?

2018-11-27 Thread wangsan
Can I configure the FailureHandler used when a segmentation error happens? Is the IgniteConfiguration FailureHandler used only when other exceptions (errors) happen?

Re: Why GridDiscoveryManager onSegmentation use StopNodeFailureHandler?

2018-11-27 Thread wangsan
Can I use LifecycleEventType.AFTER_NODE_STOP to stop the JVM if a segmentation event happens?

RE: Failed to read data from remote connection

2019-01-11 Thread wangsan
Yeah, I set a larger MaxDirectMemorySize. But I am afraid that as the number of nodes grows, the direct memory usage will grow with it.

NPE when start

2019-01-11 Thread wangsan
When a client node starts with ZK discovery and persistence enabled, some null pointer exceptions are thrown (when the node starts on a new machine). The exception traces are as follows: 12:26:03.288 [zk-172_22_29_108_SEARCH_NODE_8100-EventThread] ERROR o.a.i.i.p.c.GridContinuousProcessor - Failed to

RE: Failed to read data from remote connection

2019-01-29 Thread wangsan
When connections are checked, many NIO sockets are created (one socket per node). Does direct memory then grow with the node count?

Re: Why GridDiscoveryManager onSegmentation use StopNodeFailureHandler?

2019-01-29 Thread wangsan
Thanks! The IgniteConfiguration.segmentationPolicy value RESTART_JVM is a little misleading: it exits Java with a certain exit code, but the Java application will ignore the exit code.

Re: ZookeeperDiscovery block when communication error

2019-01-29 Thread wangsan
Thank you! When I use ZK discovery, I find many nodes under the ZooKeeper path /jd/. As I understand it, when a new node joins, a new /jd/ child node is created, and when the node has joined the cluster successfully, that /jd/ path is removed. But in my cluster there are many leftover /jd/ nodes.

zk connect loss

2019-05-23 Thread wangsan
13:10:06.119 [main] ERROR - Failed to resolve default logging config file: config/java.util.logging.properties 13:10:37.097 [main] ERROR o.a.i.s.d.z.internal.ZookeeperClient - Operation failed with unexpected error, connection lost: org.apache.zookeeper.KeeperException$ConnectionLossException:

ignite multi thread stop bug

2018-08-28 Thread wangsan wang
With my test case (see the attachment) I can reproduce the bug: 1. testStartServer 2. testMultiClient 3. close testMultiClient 4. then testStartServer will fail with "Failed to wait for partition map exchange". I need to update the cache in a DiscoveryEvent. How can I fix this?

Re: ignite cluster hang with GridCachePartitionExchangeManager

2018-08-27 Thread wangsan wang
About question 2, at debug level it looks like this: I start node a, then nodes b, c, d, e, f in multiple threads, then close them all. In the debug log, a server latch is created with participantsSize=5 but there is only one countdown, so the latch hangs. The simplified log is: >>> ++ > >>> Topology snapshot. >