Segfault in GridCacheEvictionManager.touch()

2019-02-18 Thread breischl
I've been having problems recently with nodes crashing out of a running, 18-node cluster. I managed to catch the following error log out of one of them. Unfortunately I did not manage to grab the core dump before the AWS instance was destroyed. I'll try to grab one if I can. This application in

Re: Deadlock during cache loading

2018-07-03 Thread breischl
Also, I probably should have mentioned this earlier, but we're not using WAL or any disk persistence. So everything should be in-memory, and generally on-heap. I think that makes it less likely that we were blocked on plain throughput of some hardware or virtual hardware.

Re: Deadlock during cache loading

2018-07-02 Thread breischl
(OT: Sorry about the duplicate posts, for some reason Nabble was refusing to show me new posts so I thought my earlier ones had been lost.) >Why did you decide, that cluster is deadlocked in the first place? Because all of the Datastreamer threads were stuck waiting on locks, and no progress was

Re: Deadlock during cache loading

2018-07-02 Thread breischl
Ah, I had not thought of that, thanks. Interestingly, going to a smaller cluster seems to have worked around the problem. We were running a 44-node cluster using 3 backups of the data. Switching to two separate 22-node clusters, each with 1 backup, seems to work just fine. Is there some limit to

Re: Deadlock during cache loading

2018-07-01 Thread breischl
@DaveHarvey, I'll look at that tomorrow. Seems potentially complicated, but if that's what has to happen we'll figure it out. Interestingly, cutting the cluster to half as many nodes (by reducing the number of backups) seems to have resolved the issue. Is there a guideline for how large a

Re: OutOfMemoryError while streaming

2018-07-01 Thread breischl
The keys are hashed, so in theory they should be distributed relatively evenly. I traced through the hashing logic once and it seemed OK, but it's pretty complicated, so I won't claim to know it that well. We use UUIDs and they do seem to distribute pretty evenly. Are you sure all the nodes
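(A minimal way to sanity-check that claim, as a sketch: ask the cache's Affinity for the primary node of a sample of random UUID keys and count how many land on each node. The cache name "myCache" and the sample size are placeholders, not taken from the thread.)

import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

import org.apache.ignite.Ignite;
import org.apache.ignite.cache.affinity.Affinity;
import org.apache.ignite.cluster.ClusterNode;

public class AffinityDistributionCheck {
    /** Maps a sample of random UUID keys to their primary nodes and counts them. */
    public static void printDistribution(Ignite ignite) {
        Affinity<UUID> aff = ignite.affinity("myCache"); // placeholder cache name
        Map<Object, Integer> counts = new HashMap<>();

        for (int i = 0; i < 100_000; i++) {
            ClusterNode primary = aff.mapKeyToNode(UUID.randomUUID());
            counts.merge(primary.consistentId(), 1, Integer::sum);
        }

        counts.forEach((nodeId, cnt) -> System.out.println(nodeId + " -> " + cnt + " keys"));
    }
}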

Re: OutOfMemoryError while streaming

2018-06-29 Thread breischl
Non-heap memory is different from off-heap memory. Non-heap is (roughly speaking) memory that the JVM itself uses. Off-heap is what Ignite uses for storage off the heap. So you're probably not looking at what you think you're looking at.
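(To make the distinction concrete, a rough sketch of reading the two numbers side by side: the JVM's non-heap usage from the MemoryMXBean versus Ignite's off-heap data region metrics. This assumes data region metrics are enabled; it is not code from the thread.)

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

import org.apache.ignite.DataRegionMetrics;
import org.apache.ignite.Ignite;

public class MemoryKindsCheck {
    public static void print(Ignite ignite) {
        // JVM-internal ("non-heap") memory: metaspace, code cache, etc. Not Ignite storage.
        MemoryUsage nonHeap = ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage();
        System.out.println("JVM non-heap used: " + nonHeap.getUsed() / (1024 * 1024) + " MB");

        // Ignite off-heap storage, reported per data region
        // (requires DataRegionConfiguration.setMetricsEnabled(true)).
        for (DataRegionMetrics m : ignite.dataRegionMetrics())
            System.out.println("Region " + m.getName() + ": " + m.getTotalAllocatedPages() + " pages allocated");
    }
}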

Re: OutOfMemoryError while streaming

2018-06-29 Thread breischl
You're probably just running out of memory, though the stack trace may tell you whether you're running out of heap or off-heap memory. If there's a call to Unsafe.something() in there, it's probably off-heap. Otherwise it's probably on-heap. You do seem to be configuring only a 3 GB

Re: Deadlock during cache loading

2018-06-29 Thread breischl
StreamTransformer does an invoke() pretty much exactly like what I'm doing, so that would not seem to change anything. https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/stream/StreamTransformer.java#L50-L53 I may try using a put(), but since I need to
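(For reference, a receiver shaped like the linked StreamTransformer code, calling invoke() once per streamed entry, looks roughly like the sketch below. The key/value types and the processor body are placeholders, not the code under discussion.)

import java.util.Collection;
import java.util.Map;

import javax.cache.processor.EntryProcessorException;
import javax.cache.processor.MutableEntry;

import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteException;
import org.apache.ignite.cache.CacheEntryProcessor;
import org.apache.ignite.stream.StreamReceiver;

public class InvokeReceiver implements StreamReceiver<String, Long> {
    @Override public void receive(IgniteCache<String, Long> cache,
        Collection<Map.Entry<String, Long>> entries) throws IgniteException {
        // One invoke() per streamed entry, mirroring StreamTransformer's loop.
        for (Map.Entry<String, Long> entry : entries)
            cache.invoke(entry.getKey(), new SetValueProcessor(), entry.getValue());
    }

    /** Runs on the node that owns the key and writes the value passed as an argument. */
    private static class SetValueProcessor implements CacheEntryProcessor<String, Long, Void> {
        @Override public Void process(MutableEntry<String, Long> e, Object... args)
            throws EntryProcessorException {
            e.setValue((Long)args[0]);
            return null;
        }
    }
}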

Re: Deadlock during cache loading

2018-06-29 Thread breischl
Hi Denis, It was not clear to me that we could do the update from within the StreamReceiver without some sort of cache operation. Would we just use the CacheEntry.setValue() method to do that? Something roughly like the following? Thanks! public void receive(IgniteCache cache, Collection>

RE: Deadlock during cache loading

2018-06-29 Thread breischl
That does seem to be what's happening, but we're only invoke()'ing on keys that were passed into receive(), so that should not require going off-box. Right? Here's the relevant code... @Override public void receive(IgniteCache cache, Collection> newEntries) throws IgniteException { for

Re: Deadlock during cache loading

2018-06-28 Thread breischl
Just found a bunch of these in my logs as well. Note this is showing starvation in the system threadpool, not the datastreamer threadpool, but perhaps they're related? [2018-06-28T17:39:55,728Z](grid-timeout-worker-#23)([]) WARN - G - >>> Possible starvation in striped pool. Thread name:
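(The pools mentioned in the thread are sized independently on IgniteConfiguration; as a hedged sketch of where each knob lives, with arbitrary example values rather than recommendations:)

import org.apache.ignite.configuration.IgniteConfiguration;

public class PoolSizingSketch {
    public static IgniteConfiguration config() {
        return new IgniteConfiguration()
            // Striped pool: processes cache operations, partitioned into stripes by key.
            .setStripedPoolSize(16)
            // System pool: internal Ignite messages and tasks.
            .setSystemThreadPoolSize(16)
            // Data streamer pool: runs streamer batches (and StreamReceivers) on the receiving node.
            .setDataStreamerThreadPoolSize(8);
    }
}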

RE: Deadlock during cache loading

2018-06-28 Thread breischl
Also... >What you showed that the stream receiver called invoke() and did not get an answer, not a deadlock. It's not that I'm getting back a null, it's that all the threads are blocked waiting on the invoke() call, and no progress is being made. That sounds a lot like a deadlock. I guess you

RE: Deadlock during cache loading

2018-06-28 Thread breischl
>our a stream receiver called invoke() and that in turn did another invoke, which was the actual bug. So Ignite's invoke() implementation called itself? >It was helpful when we did the invoke using a custom thread pool. I'm not sure I understand the concept here. Is the idea to have an

RE: Deadlock during cache loading

2018-06-28 Thread breischl
Thanks Dave. I am using Ignite v2.4.0. Would a newer version potentially help? This problem seems to come and go. I didn't hit it for a few days, and now we've hit it on two deployments in a row. It may be some sort of timing or external factor that provokes it. The most recent case we hit the

RE: Deadlock during cache loading

2018-06-22 Thread breischl
In our case we're only using the receiver as you describe, to update the key that it was invoked for. Our actual use case is that the incoming stream of data sometimes sends us old data, which we want to discard rather than cache. So the StreamReceiver examines the value already in the cache and
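(The discard-if-stale idea described here can be sketched as a receiver that compares the cached value with the incoming one before writing. The "version" field and the types are placeholders, not the project's actual model.)

import java.io.Serializable;
import java.util.Collection;
import java.util.Map;

import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteException;
import org.apache.ignite.stream.StreamReceiver;

public class DiscardStaleReceiver implements StreamReceiver<String, VersionedValue> {
    @Override public void receive(IgniteCache<String, VersionedValue> cache,
        Collection<Map.Entry<String, VersionedValue>> entries) throws IgniteException {
        for (Map.Entry<String, VersionedValue> entry : entries) {
            VersionedValue existing = cache.get(entry.getKey());

            // Only write if the incoming value is newer than what is already cached.
            if (existing == null || entry.getValue().version > existing.version)
                cache.put(entry.getKey(), entry.getValue());
        }
    }
}

/** Placeholder value type with an ordering field; stands in for the real cached object. */
class VersionedValue implements Serializable {
    long version;
    byte[] payload;
}

Note that the get()/put() pair above is not atomic; making that check-and-set atomic is presumably why the thread's receiver performs it inside invoke() instead.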

RE: Deadlock during cache loading

2018-06-22 Thread breischl
Hi Stan, Thanks for taking a look. I'm having trouble finding any documentation of what I can or can't call inside a receiver. Is it just put()/get() that are allowed? Also, I noticed that the default StreamTransformer implementation calls invoke() from within a receiver. So is that

Deadlock during cache loading

2018-06-21 Thread breischl
We've run into a problem recently where it appears our cache is deadlocking during loading. What I mean by "loading" is that we start up a new cluster in AWS, unconnected to any existing cluster, and then shove a bunch of data into it from Kafka. During this process it's not taking any significant
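(For context, the loading path being described here, a data streamer pushing Kafka-sourced entries into a cache through a custom receiver, looks roughly like the sketch below; the cache name, types, and receiver body are placeholders.)

import java.util.Map;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;

public class CacheLoader {
    public static void load(Ignite ignite, Iterable<Map.Entry<String, Long>> records) {
        try (IgniteDataStreamer<String, Long> streamer = ignite.dataStreamer("myCache")) {
            // Needed when the receiver may touch entries that already exist in the cache.
            streamer.allowOverwrite(true);

            // Per-entry logic runs on the node that owns each key.
            streamer.receiver((cache, entries) -> {
                for (Map.Entry<String, Long> e : entries)
                    cache.put(e.getKey(), e.getValue());
            });

            for (Map.Entry<String, Long> rec : records) // e.g. records consumed from Kafka
                streamer.addData(rec.getKey(), rec.getValue());
        } // close() flushes any remaining buffered entries
    }
}

The thread's actual receiver is a named class that calls invoke(); the inline lambda above only marks where that per-entry logic plugs in.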

Re: Setting DefaultDataRegion to zero size

2018-05-23 Thread breischl
Hi, Thanks for the reply. We have enabled onHeap cache, and I know there's no way to explicitly disable off-heap storage. But if I set the default DataRegion to zero size, will that /effectively/ disable off-heap storage for all of our data? And would doing that cause anything else to fail?
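(As a sketch of the configuration being discussed, not a confirmation that it works: on-heap caching is enabled per cache, and the default data region can be shrunk but, to my knowledge, not set to literally zero, since entries are still written to off-heap page memory first. The sizes below are illustrative only.)

import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class OnHeapConfigSketch {
    public static IgniteConfiguration config() {
        // Shrink the default data region; illustrative size, not literally zero.
        DataRegionConfiguration dfltRegion = new DataRegionConfiguration()
            .setName("Default_Region")
            .setInitialSize(100L * 1024 * 1024)
            .setMaxSize(100L * 1024 * 1024);

        // On-heap caching is a per-cache flag layered on top of off-heap page memory.
        CacheConfiguration<Object, Object> cacheCfg = new CacheConfiguration<>("myCache");
        cacheCfg.setOnheapCacheEnabled(true);

        return new IgniteConfiguration()
            .setDataStorageConfiguration(new DataStorageConfiguration()
                .setDefaultDataRegionConfiguration(dfltRegion))
            .setCacheConfiguration(cacheCfg);
    }
}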

Setting DefaultDataRegion to zero size

2018-05-23 Thread breischl
I'm working on a project where we're using Ignite primarily as a cache for large & complex objects that we need in their entirety. Based on reading elsewhere it seems like our performance would be best if we didn't use off-heap storage at all. To that end, would it make sense (or even work) for me