AFAIU you simultaneously have about 1 million entries which correspond to 5-10 
groups (measurements).
Is that correct?
If so, that might be the reason for the distribution you see. Even though 
you have a lot of entries,
you only have a few affinity-mapped groups. Ignite tries to keep all entries of 
the same measurement together,
and each measurement has a ~50% chance of ending up on the first or the second node. 
With 5 measurements, that gives
a ~3% chance that all data will be stored on one particular node.
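The arithmetic behind that estimate can be sketched as follows (not from the original thread; it just spells out the (1/2)^5 figure for all five measurements landing on one particular node):

```java
// Back-of-the-envelope check: with 2 nodes and 5 independently placed
// measurements, each measurement lands on a given node with probability
// ~1/2, so the chance that all five land on one particular node is (1/2)^5.
class AffinitySkewOdds {
    public static void main(String[] args) {
        double pAllOnOneNode = Math.pow(0.5, 5);
        // prints "3.1%"
        System.out.printf("%.1f%%%n", pAllOnOneNode * 100);
    }
}
```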
To get a better distribution across the cluster, try restructuring your data 
to have more distinct values
of the affinity-mapped IDs. For example, you could split each measurement into, 
say, 1k- or 10k-entry batches, assign each
batch an ID and make that batch ID @AffinityKeyMapped instead of the 
measurementId.
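A minimal sketch of such a key class (the names and the batch-ID formula are illustrative, not from the thread; the real annotation is org.apache.ignite.cache.affinity.AffinityKeyMapped, shown here as a comment so the sketch compiles without the Ignite jar on the classpath):

```java
// Hypothetical key that spreads one measurement across many affinity groups.
class BatchedIgniteKey {
    private static final long BATCH_SIZE = 10_000L;

    private final String deviceId;
    private final long measurementId;
    private final long timestamp;

    // @AffinityKeyMapped  <- annotate the batch ID instead of measurementId
    private final long batchId;

    BatchedIgniteKey(String deviceId, long measurementId,
                     long timestamp, long entryIndex) {
        this.deviceId = deviceId;
        this.measurementId = measurementId;
        this.timestamp = timestamp;
        // Derive the batch from the entry's position within the measurement,
        // so a 100k-entry measurement yields ~10 distinct affinity groups.
        // The encoding below is one illustrative way to keep batch IDs
        // unique across measurements.
        this.batchId = measurementId * 1_000_000L + entryIndex / BATCH_SIZE;
    }

    long batchId() {
        return batchId;
    }
}
```

Entries that must be co-located for a computation should still share a batch ID; only entries that can be processed independently should be split into separate batches.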

Thanks,
Stan

From: svonn
Sent: January 27, 2018 15:46
To: [email protected]
Subject: RE: Question about data distribution

Hi!

My key class looks like this:

import org.apache.ignite.cache.affinity.AffinityKeyMapped;

public class IgniteKey {
    private String deviceId;

    @AffinityKeyMapped
    private long measurementId;

    private long timestamp;

    public IgniteKey(String deviceId, long measurementId, long timestamp) {
        this.deviceId = deviceId;
        this.measurementId = measurementId;
        this.timestamp = timestamp;
    }
}


One device can have multiple measurements, but as of now any calculation only
requires other entries from the same measurement, so only the measurementId
should be relevant.

One measurement contains 100k - 200k entries in one stream, and 500-1000 in
the other stream. Both streams use the same class for keys.

Whenever a new measurementId arrives, I print some output on the node it's
being processed on - I've had the following case:
Measurement 1 (short M1) -> node1
M2 -> node1
M3 -> node2
M4 -> node1
M5 -> node1
M6 -> node1

I expected that even M2 would already be placed on node2 - however,
performance-wise I don't think either node is close to its limit; I'm not
sure if that's also relevant.
Due to the 5-minute expiry policy, I can end up with one node holding ~1 million
cache entries while the other has 0.

- svonn





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
