Hi, I have a big problem. Ignite is failing catastrophically for me.
This is the scenario: we start a cluster of 15 Ignite server nodes, all initially empty. Then several Kafka feeds are enabled that stream data into 4 independent caches simultaneously (using DataStreamers). Each cache is PARTITIONED and configured with 1 primary and 2 backups. The feeds attempt to load ~0.5M entries into each cache, streamed from a client node on 4 threads.

Almost always, a node will fail during this operation, and this leads to a catastrophic, cascading failure of the entire cluster. But on the failing nodes there is no information whatsoever as to what caused the failure. Nothing. No OOM. No exceptions. The logs simply stop. I have GC logging enabled, and there are no long pauses. So I am baffled.

I have tried increasing memory. I have tried increasing timeouts to ridiculous numbers:

```
COMPUTE_TASK_TIMEOUT=5000
DISCOVERY_ACK_TIMEOUT=30000
DISCOVERY_JOIN_TIMEOUT=120000
DISCOVERY_MAX_ACK_TIMEOUT=37000
DISCOVERY_NETWORK_TIMEOUT=120000
FAILURE_DETECTION_TIMEOUT=120000
IGNITE_LOG_LEVEL=INFO
IGNITE_LONG_OPERATIONS_DUMP_TIMEOUT=200000
IGNITE_QUIET=false
```

But nothing helps. What can I do to get better information out of Ignite? It is basically failing silently. Are there some tuning parameters that I am missing? I would be happy to supply further configuration information; rough sketches of our cache/streamer setup and timeout settings are appended below.

This is with Ignite 2.0.0. We have invested quite a bit of effort to get Ignite running for our application, and this is a show-stopper for us.

NOTE: this does not happen with the smaller feeds that we have in our dev environment.

Thanks,
-- Chris
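For reference, here is a minimal sketch of how one of the four caches and its streaming thread are set up on the client node. The cache name, key/value types, and the synthetic payload loop are placeholders (the real threads push Kafka records into the streamer), but the cache mode, backup count, and DataStreamer usage match what we run.

```
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class FeedLoaderSketch {
    public static void main(String[] args) {
        // Client node that hosts the 4 streaming threads.
        Ignition.setClientMode(true);
        Ignite ignite = Ignition.start();

        // One of the 4 caches: PARTITIONED, 1 primary + 2 backup copies per key.
        CacheConfiguration<Long, byte[]> cacheCfg = new CacheConfiguration<>("feedCache1");
        cacheCfg.setCacheMode(CacheMode.PARTITIONED);
        cacheCfg.setBackups(2);
        ignite.getOrCreateCache(cacheCfg);

        // Each Kafka consumer thread streams its records into its own cache.
        try (IgniteDataStreamer<Long, byte[]> streamer = ignite.dataStreamer("feedCache1")) {
            for (long key = 0; key < 500_000; key++)   // ~0.5M entries per cache
                streamer.addData(key, new byte[256]);  // placeholder payload for a Kafka record
        }
    }
}
```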

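And this is roughly how the timeout values listed above are applied on each server node. The mapping of our variables onto the IgniteConfiguration and TcpDiscoverySpi setters is illustrative only; the remaining entries (COMPUTE_TASK_TIMEOUT, IGNITE_LOG_LEVEL, IGNITE_QUIET, IGNITE_LONG_OPERATIONS_DUMP_TIMEOUT) are set in our launch environment rather than in this code.

```
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi;

public class TimeoutConfigSketch {
    public static IgniteConfiguration configure() {
        TcpDiscoverySpi disco = new TcpDiscoverySpi();
        disco.setAckTimeout(30_000);       // DISCOVERY_ACK_TIMEOUT
        disco.setMaxAckTimeout(37_000);    // DISCOVERY_MAX_ACK_TIMEOUT
        disco.setJoinTimeout(120_000);     // DISCOVERY_JOIN_TIMEOUT
        disco.setNetworkTimeout(120_000);  // DISCOVERY_NETWORK_TIMEOUT

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDiscoverySpi(disco);
        cfg.setFailureDetectionTimeout(120_000);  // FAILURE_DETECTION_TIMEOUT
        return cfg;
    }
}
```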