I've been having problems recently with nodes crashing out of a running
18-node cluster. I managed to catch the following error log from one of
them. Unfortunately I did not manage to grab the core dump before the AWS
instance was destroyed. I'll try to grab one if I can.
This application in
Also, I probably should have mentioned this earlier, but we're not using WAL
or any disk persistence. So everything should be in-memory, and generally
on-heap. I think that makes it less likely that we were bottlenecked on raw
throughput of the hardware or virtual hardware.
(OT: Sorry about the duplicate posts; for some reason Nabble was refusing to
show me new posts, so I thought my earlier ones had been lost.)
>Why did you decide that the cluster is deadlocked in the first place?
Because all of the data streamer threads were stuck waiting on locks, and no
progress was being made.
Ah, I had not thought of that, thanks.
Interestingly, going to a smaller cluster seems to have worked around the
problem. We were running a 44-node cluster using 3 backups of the data.
Switching to two separate 22-node clusters, each with 1 backup, seems to
work just fine. Is there some limit to how large a single cluster can get?
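For reference, the backup count mentioned above is just the per-cache
setting; a minimal sketch (cache name is a placeholder):

import org.apache.ignite.cache.CacheMode;
import org.apache.ignite.configuration.CacheConfiguration;

public class BackupConfig {
    // Was setBackups(3) on the 44-node cluster, setBackups(1) on each
    // 22-node cluster.
    static CacheConfiguration<Object, Object> cacheConfig() {
        return new CacheConfiguration<>("myCache")
            .setCacheMode(CacheMode.PARTITIONED)
            .setBackups(1);
    }
}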
@DaveHarvey, I'll look at that tomorrow. Seems potentially complicated, but
if that's what has to happen we'll figure it out.
Interestingly, cutting the cluster to half as many nodes (by reducing the
number of backups) seems to have resolved the issue. Is there a guideline
for how large a cluster should be?
The keys are hashed, so in theory they should be distributed relatively
evenly. I traced through the hashing logic once and it seemed OK, but it's
pretty complicated, so I won't claim to know it well. We use UUIDs as keys,
and the data does seem to be distributed pretty evenly.
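One way to sanity-check the distribution is to ask every node how many
entries it holds as primary; a rough sketch (cache name is a placeholder,
and it assumes a node is already running in this JVM):

import java.util.Collection;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.CachePeekMode;
import org.apache.ignite.lang.IgniteCallable;

public class DistributionCheck {
    public static void main(String[] args) {
        Ignite ignite = Ignition.ignite();

        // Count entries held as primary on whichever node runs the closure.
        IgniteCallable<Integer> countPrimary =
            () -> Ignition.localIgnite().cache("myCache").localSize(CachePeekMode.PRIMARY);

        // Wildly uneven numbers would point at a hot spot in the key distribution.
        Collection<Integer> primarySizes = ignite.compute().broadcast(countPrimary);
        System.out.println("Primary entries per node: " + primarySizes);
    }
}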
Are you sure all the nodes
Non-heap memory is different from off-heap memory. Non-heap is (roughly
speaking) memory that the JVM itself uses. Off-heap is what Ignite is using
for storage off the heap. So you're probably not looking at what you think
you're looking at.
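To illustrate the difference: heap and non-heap both come out of the JVM's
MemoryMXBean, while Ignite's off-heap usage is reported separately, per data
region. A rough sketch (assumes a node is already running in this JVM, and
that region metrics are enabled for meaningful numbers):

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import org.apache.ignite.DataRegionMetrics;
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;

public class MemoryBreakdown {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        System.out.println("Heap used:     " + mem.getHeapMemoryUsage().getUsed());
        // Non-heap = JVM internals: metaspace, code cache, etc.
        System.out.println("Non-heap used: " + mem.getNonHeapMemoryUsage().getUsed());

        Ignite ignite = Ignition.ignite();
        for (DataRegionMetrics m : ignite.dataRegionMetrics())
            System.out.println("Off-heap region '" + m.getName() + "': "
                + m.getTotalAllocatedPages() + " pages allocated");
    }
}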
You're probably just running out of memory, though if you examine the
stack trace it may tell you whether you're running out of heap or off-heap
memory. If there's a call to Unsafe.something() in there, it's probably
off-heap. Otherwise it's probably on-heap.
You do seem to be configuring only a 3 GB
StreamTransformer does an invoke() pretty much exactly like what I'm doing,
so that would not seem to change anything.
https://github.com/apache/ignite/blob/master/modules/core/src/main/java/org/apache/ignite/stream/StreamTransformer.java#L50-L53
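As far as I can tell, those lines boil down to roughly this; a paraphrase
from memory, not an exact copy, so don't hold me to the signatures:

import java.util.Collection;
import java.util.Map;
import javax.cache.processor.EntryProcessor;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteException;
import org.apache.ignite.stream.StreamReceiver;

// Hypothetical name; the shape of what the linked StreamTransformer lines do:
// each streamed entry becomes a cache.invoke(), with the transformer itself
// acting as the entry processor.
public abstract class TransformerSketch<K, V>
    implements StreamReceiver<K, V>, EntryProcessor<K, V, Object> {

    @Override public void receive(IgniteCache<K, V> cache,
        Collection<Map.Entry<K, V>> entries) throws IgniteException {
        for (Map.Entry<K, V> entry : entries)
            cache.invoke(entry.getKey(), this, entry.getValue());
    }
}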
I may try using a put(), but since I need to
Hi Denis,
It was not clear to me that we could do the update from within the
StreamReceiver without some sort of cache operation. Would we just use the
CacheEntry.setValue() method to do that? Something roughly like the
following?
Thanks!
public void receive(IgniteCache<K, V> cache,
    Collection<Map.Entry<K, V>> entries)
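Fleshing that out a little; a rough, untested sketch (Long stands in for our
real value type, and the "<" comparison stands in for the real "is this
newer?" check):

import java.util.Collection;
import java.util.Map;
import java.util.UUID;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteException;
import org.apache.ignite.stream.StreamReceiver;

// As far as I understand, the entries handed to receive() are just the
// streamed tuples (plain Map.Entry snapshots), not live cache entries, so
// calling setValue() on them would not touch the cache; the update still has
// to go through the cache API.
public class MergingReceiver implements StreamReceiver<UUID, Long> {
    @Override public void receive(IgniteCache<UUID, Long> cache,
        Collection<Map.Entry<UUID, Long>> entries) throws IgniteException {
        for (Map.Entry<UUID, Long> e : entries) {
            Long existing = cache.get(e.getKey());
            // Note: get-then-put is not atomic, which is exactly why invoke()
            // looked attractive in the first place.
            if (existing == null || existing < e.getValue())
                cache.put(e.getKey(), e.getValue());
        }
    }
}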
That does seem to be what's happening, but we're only invoke()'ing on keys
that were passed into receive(), so that should not require going off-box.
Right?
Here's the relevant code...
@Override
public void receive(IgniteCache<K, V> cache,
    Collection<Map.Entry<K, V>> newEntries) throws IgniteException {
    for (Map.Entry<K, V> entry : newEntries)
Just found a bunch of these in my logs as well. Note this one is showing
starvation in the system thread pool, not the data streamer pool, but
perhaps they're related?
[2018-06-28T17:39:55,728Z](grid-timeout-worker-#23)([]) WARN - G - >>>
Possible starvation in striped pool.
Thread name:
Also...
>What you showed is that the stream receiver called invoke() and did not get
>an answer, not a deadlock.
It's not that I'm getting back a null; it's that all the threads are blocked
waiting on the invoke() call, and no progress is being made. That sounds a
lot like a deadlock. I guess you
>Our stream receiver called invoke() and that in turn did another invoke,
which was the actual bug.
So Ignite's invoke() implementation called itself?
>It was helpful when we did the invoke using a custom thread pool,
I'm not sure I understand the concept here. Is the idea to have an
Thanks Dave. I am using Ignite v2.4.0. Would a newer version potentially
help?
This problem seems to come and go. I didn't hit it for a few days, and now
we've hit it on two deployments in a row. It may be some sort of timing or
external factor that provokes it. The most recent case we hit the
In our case we're only using the receiver as you describe, to update the key
that it was invoked for. Our actual use case is that the incoming stream of
data sometimes sends us old data, which we want to discard rather than
cache. So the StreamReceiver examines the value already in the cache and
keeps the incoming value only if it is newer.
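Concretely, the staleness check is roughly the following entry processor
(sketch only; Long stands in for the real value type, and the comparison
stands in for the real "is this newer?" test):

import java.util.UUID;
import javax.cache.processor.EntryProcessorException;
import javax.cache.processor.MutableEntry;
import org.apache.ignite.cache.CacheEntryProcessor;

public class KeepNewest implements CacheEntryProcessor<UUID, Long, Void> {
    @Override public Void process(MutableEntry<UUID, Long> entry, Object... args)
        throws EntryProcessorException {
        Long incoming = (Long) args[0];
        Long current = entry.getValue();
        if (current == null || current < incoming)
            entry.setValue(incoming); // newer data: store it
        // otherwise the incoming value is stale and is silently dropped
        return null;
    }
}

The receiver would then call cache.invoke(key, new KeepNewest(), incomingValue)
for each entry it was handed.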
Hi Stan,
Thanks for taking a look. I'm having trouble finding any documentation of
what I can and can't call inside a receiver. Is it just put()/get() that are
allowed?
Also, I noticed that the default StreamTransformer implementation calls
invoke() from within a receiver. So is that
We've run into a problem recently where it appears our cache is deadlocking
during loading. What I mean by "loading" is that we start up a new cluster
in AWS, unconnected to any existing cluster, and then shove a bunch of data
into it from Kafka. During this process it's not taking any significant
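For context, the loading path is basically the standard data-streamer
pattern, roughly like this sketch (cache name, key/value types and the batch
source are placeholders; the real code pulls batches from Kafka):

import java.util.Map;
import java.util.UUID;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;

public class Loader {
    static void load(Ignite ignite, Map<UUID, Long> batch) {
        try (IgniteDataStreamer<UUID, Long> streamer = ignite.dataStreamer("myCache")) {
            streamer.allowOverwrite(true);            // so the receiver can compare against existing values
            streamer.receiver(new MergingReceiver()); // the receiver sketched earlier in the thread
            for (Map.Entry<UUID, Long> e : batch.entrySet())
                streamer.addData(e.getKey(), e.getValue());
        } // close() flushes whatever is still buffered
    }
}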
Hi,
Thanks for the reply. We have enabled the on-heap cache, and I know there's
no way to explicitly disable off-heap storage. But if I set the default
DataRegion to zero size, will that /effectively/ disable off-heap storage for
all of our data? And would doing that cause anything else to fail?
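For concreteness, the kind of configuration I'm thinking about (a sketch;
the cache name and the eviction cap are placeholders):

import org.apache.ignite.cache.eviction.lru.LruEvictionPolicy;
import org.apache.ignite.configuration.CacheConfiguration;

public class OnHeapCacheConfig {
    // As far as I understand, the data region can be made small but not zero:
    // Ignite 2.x always keeps the primary copy of the data in off-heap page
    // memory, and onheapCacheEnabled only layers a deserialized on-heap cache
    // on top of it.
    static CacheConfiguration<Object, Object> cacheConfig() {
        return new CacheConfiguration<>("myCache")
            .setOnheapCacheEnabled(true)
            .setEvictionPolicy(new LruEvictionPolicy<>(1_000_000)); // cap the on-heap copy
    }
}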
I'm working on a project where we're using Ignite primarily as a cache for
large & complex objects that we need in their entirety. Based on reading
elsewhere it seems like our performance would be best if we didn't use
off-heap storage at all. To that end, would it make sense (or even work) for
me