Hi team,

I ran a simple check of the CAP theorem against an Apache Ignite cluster and
observed a few things about the system's partition tolerance and availability.

Here are the steps:
1) Created a cluster of three Ignite servers: S1, S2 and S3. S1 was started
first, so it is the coordinator.
2) Topology version: 3
3) 13 clients (C1 to C13) connect to the cluster at irregular intervals.
4) Topology version: 16 (3 servers + 13 clients)
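
To make the version arithmetic in the steps above concrete, here is a toy
model in plain Java. It is not Ignite code and does not use any Ignite API.
It only assumes that every node join and every node departure (including a
failure) bumps the topology version by one. The class and method names are
made up for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model of a topology version counter. NOT Ignite code: it assumes
// each node join or departure increments the version by exactly one.
final class TopologyVersionSketch {
    private final Set<String> nodes = new LinkedHashSet<>();
    private long version = 0;

    // A node (server or client) joining bumps the version by one.
    void join(String nodeId) {
        if (nodes.add(nodeId)) {
            version++;
        }
    }

    // A node leaving or failing (for example after kill -9) bumps it as well.
    void leave(String nodeId) {
        if (nodes.remove(nodeId)) {
            version++;
        }
    }

    long version() {
        return version;
    }

    int nodeCount() {
        return nodes.size();
    }

    public static void main(String[] args) {
        TopologyVersionSketch topology = new TopologyVersionSketch();

        // Steps 1 and 2: three servers join.
        for (int i = 1; i <= 3; i++) {
            topology.join("S" + i);
        }
        System.out.println("after servers: " + topology.version()); // 3

        // Steps 3 and 4: thirteen clients join.
        for (int i = 1; i <= 13; i++) {
            topology.join("C" + i);
        }
        System.out.println("after clients: " + topology.version()); // 16

        // A later failure of one server, under the same assumption.
        topology.leave("S2");
        System.out.println("after S2 fails: " + topology.version()
                + " with " + topology.nodeCount() + " nodes"); // 17 with 15 nodes
    }
}
```

Under this assumption, the version only ever goes up, so a node failure shows
up as version 17 rather than a return to 15.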

The clients then start writing into their own distinct caches. About 7 or 8
minutes into this activity, I kill S2 with kill -9. After that, I get the
following error on every subsequent cache write:

50008_116305_11951_2_12472_978_1_0_2 javax.cache.CacheException: class org.apache.ignite.IgniteCheckedException: Some of DataStreamer operations failed [failedCount=1]
        at org.apache.ignite.internal.processors.cache.GridCacheUtils.convertToCacheException(GridCacheUtils.java:1337)
        at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl.close(DataStreamerImpl.java:1287)
        at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl.close(DataStreamerImpl.java:1388)
        at com.abc.datagrid.DataGridClient.writeAll(DataGridClient.java:209)
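
One client-side thing I could try is retrying a failed batch. Below is a
minimal sketch in plain Java, assuming the failure is transient while the
topology settles after the node loss (I have not confirmed that it is) and
that re-sending a batch is safe because the same keys are simply overwritten.
It uses no Ignite API: the Runnable stands in for one batch write such as my
writeAll call, and writeWithRetry and its parameters are hypothetical names.

```java
// Generic retry wrapper around one batch write. NOT Ignite code: the
// Runnable is a stand-in for a call such as DataGridClient.writeAll.
final class RetryingWriterSketch {

    // Runs batchWrite up to maxAttempts times with a linearly growing pause
    // between attempts. Returns the attempt number that succeeded, or
    // rethrows the last failure once all attempts are used up.
    static int writeWithRetry(Runnable batchWrite, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                batchWrite.run();
                return attempt;
            } catch (RuntimeException e) {
                // javax.cache.CacheException is a RuntimeException, so it lands here.
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(backoffMillis * attempt);
                    } catch (InterruptedException ie) {
                        // Preserve the interrupt flag and give up early.
                        Thread.currentThread().interrupt();
                        throw last;
                    }
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated write: fails on the first two calls, then succeeds.
        Runnable flakyWrite = () -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated write failure");
            }
        };
        int attempt = writeWithRetry(flakyWrite, 5, 10L);
        System.out.println("succeeded on attempt " + attempt); // succeeded on attempt 3
    }
}
```

If the write still fails after every attempt, the last exception is rethrown,
so a permanent failure is not hidden by the retries.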

So my observation is that the cluster does not appear to be partition or fault
tolerant: after a single server failure, the rest of the cluster does not seem
to be available for writes.

Can someone shed some light on this? I can share more logs.

--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/