Re: Transaction Throughput in Data Streamer

2018-08-13 Thread dkarachentsev
Hi,

It looks like most of the time the transactions in the receiver are waiting for
locks. Any lock serializes otherwise parallel code, and in your case I don't
think throughput can be tuned with settings, because ten transactions may be
waiting while one finishes. You need to change the algorithm.

The most effective approach would be to stream the data with the DataStreamer,
with allowOverwrite disabled and without any transactions. If possible, stream
the data independently, and avoid serial code and non-local cache
reads/writes.
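
A minimal sketch of that kind of plain load (the cache name, key/value types
and record count are just placeholders):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

public class PlainStreamLoad {
    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            ignite.getOrCreateCache("recordCache"); // streamer target must exist

            // Stream straight into the cache: no StreamReceiver, no transactions.
            try (IgniteDataStreamer<Long, String> streamer = ignite.dataStreamer("recordCache")) {
                streamer.allowOverwrite(false); // default; plain initial load of new keys
                streamer.perNodeBufferSize(1024);

                for (long id = 0; id < 1_000_000; id++)
                    streamer.addData(id, "payload-" + id);
            } // close() flushes the remaining buffers
        }
    }
}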

Thanks!
-Dmitry





Transaction Throughput in Data Streamer

2018-08-09 Thread Dave Harvey
We are trying to load and transform a large amount of data using the
IgniteDataStreamer with a custom StreamReceiver. We'd like this to run
a lot faster, and we cannot find anything that is close to saturated
except the data-streamer threads and queues. This is Ignite 2.5, with Ignite
persistence, and enough memory to fit all the data.

I was looking to turn some knob, like the size of a thread pool, to
increase the throughput, but I can't find any bottleneck. If I turn up
the demand, the throughput does not increase and the per-transaction
latency increases, which would indicate a bottleneck somewhere.
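
For reference, the pool-size knobs I've been trying are along these lines
(the values are just examples set in the node configuration):

import org.apache.ignite.configuration.IgniteConfiguration;

public class PoolSizing {
    public static IgniteConfiguration configure() {
        IgniteConfiguration cfg = new IgniteConfiguration();

        // Pool that runs StreamReceiver code on the receiving node.
        cfg.setDataStreamerThreadPoolSize(128);

        // Striped pool that processes cache messages (the gets/puts inside our transactions).
        cfg.setStripedPoolSize(64);

        return cfg;
    }
}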

The application has loaded about 900 million records of type A at this
point, and now we would like to load 2.5B records of type B. Records of
type A have a key and a unique ID. Records of type B have a different
key type, plus a foreign field that is A's unique ID. The key we use in
Ignite for record B is (B's key, A's key as affinity key). We also
maintain caches to map A's ID back to its key, and something similar for B.
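
For reference, the composite key we use for record B looks roughly like this
(class and field names simplified):

import org.apache.ignite.cache.affinity.AffinityKeyMapped;

public class RecordBKey {
    /** B's own key. */
    private final String bKey;

    /** A's key, used as the affinity key so B is colocated with its parent A. */
    @AffinityKeyMapped
    private final String aKey;

    public RecordBKey(String bKey, String aKey) {
        this.bKey = bKey;
        this.aKey = aKey;
    }

    @Override public boolean equals(Object o) {
        if (this == o)
            return true;
        if (!(o instanceof RecordBKey))
            return false;

        RecordBKey k = (RecordBKey)o;

        return bKey.equals(k.bKey) && aKey.equals(k.aKey);
    }

    @Override public int hashCode() {
        return 31 * bKey.hashCode() + aKey.hashCode();
    }
}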

For each record the stream receiver starts a pessimistic transaction; we end
up with 1 local get, 2-3 gets with no affinity (i.e. 50% local on two nodes),
and 2-4 puts before we commit the transaction (FULL_SYNC caches). There are
several fields with indices.
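
Schematically, the receiver does something like this per entry (a simplified
sketch; the cache names, key/value types and lookup are placeholders for what
we actually do):

import java.util.Collection;
import java.util.Map;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteException;
import org.apache.ignite.resources.IgniteInstanceResource;
import org.apache.ignite.stream.StreamReceiver;
import org.apache.ignite.transactions.Transaction;

import static org.apache.ignite.transactions.TransactionConcurrency.PESSIMISTIC;
import static org.apache.ignite.transactions.TransactionIsolation.REPEATABLE_READ;

public class RecordBReceiver implements StreamReceiver<String, String> {
    @IgniteInstanceResource
    private transient Ignite ignite;

    @Override public void receive(IgniteCache<String, String> cache,
        Collection<Map.Entry<String, String>> entries) throws IgniteException {
        // Placeholder lookup cache: maps A's unique ID back to A's key.
        IgniteCache<String, String> aIdToKey = ignite.cache("aIdToKey");

        for (Map.Entry<String, String> e : entries) {
            try (Transaction tx = ignite.transactions().txStart(PESSIMISTIC, REPEATABLE_READ)) {
                // Non-local get: resolve A's key from the foreign ID carried in the value.
                String aKey = aIdToKey.get(e.getValue());

                // Write into the target cache; the real key carries A's key as the affinity key.
                cache.put(e.getKey() + "|" + aKey, e.getValue());

                tx.commit();
            }
        }
    }
}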

I've simplified this down to two nodes, with 4 caches each with one
backup, all with WAL logging disabled. The two nodes have 256GB of memory,
32 CPUs, and local unmirrored SSDs (i3.8xlarge on AWS). The network is
supposed to be 10 Gb. The dataset is basically in memory, and with the WAL
disabled there is very little I/O.

Disabling WAL logging only pushed the transaction rate from about 1750
to about 2000 TPS.
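
For completeness, we toggle the WAL around the load roughly like this (a
sketch, assuming the cache lives in a persistent data region):

import org.apache.ignite.Ignite;

public final class WalToggle {
    private WalToggle() {
        // Utility class.
    }

    /** Disables WAL for the given cache, runs the load, then re-enables WAL. */
    public static void loadWithoutWal(Ignite ignite, String cacheName, Runnable load) {
        ignite.cluster().disableWal(cacheName);

        try {
            load.run();
        }
        finally {
            // Re-enabling WAL checkpoints the cache so the loaded data becomes durable.
            ignite.cluster().enableWal(cacheName);
        }
    }
}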

The CPU doesn't get above 20%, the network bandwidth is only about 6 MB/s
from each node, and there are only about 1500 packets per second per node.
The read wait time on the SSDs is only enough to tie up a single thread, and
there are no writes except during checkpoints.

When I look at thread dumps, there is no obvious bottleneck except for the
DataStreamer threads. Doubling the number of DataStreamer threads from the
current 64 to 128 has no effect on throughput.

Looking via MXBeans, where I have a fix for IGNITE-7616, the DataStreamer
pool is saturated. The striped executor is not. With the WAL enabled, the
StripedExecutor shows some bursty load; when it is disabled, the active
thread and queue counts stay low. The work is distributed across the
StripedExecutor threads. The non-DataStreamer thread pools all frequently
drop to 0 active threads, while the DataStreamer pool stays backed up.
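
What I'm doing is essentially polling the pool MBeans over JMX, roughly like
this (the "Thread Pools" group and the ActiveCount/QueueSize attribute names
are assumptions about how the node registers its executors):

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class PoolMetricsDump {
    public static void main(String[] args) {
        MBeanServer srv = ManagementFactory.getPlatformMBeanServer();

        // Walk every registered MBean and keep the ones that look like Ignite thread pools.
        for (ObjectName name : srv.queryNames(null, null)) {
            String group = name.getKeyProperty("group");

            if (group == null || !group.contains("Thread Pools"))
                continue;

            try {
                Object active = srv.getAttribute(name, "ActiveCount");
                Object queued = srv.getAttribute(name, "QueueSize");

                System.out.printf("%s: active=%s, queued=%s%n",
                    name.getKeyProperty("name"), active, queued);
            }
            catch (Exception ignored) {
                // Not a thread pool MBean after all; skip it.
            }
        }
    }
}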

With the WAL on and 64 DataStreamer threads, there tended to be about 53
"owner transactions" on the node.

A snapshot of transactions outstanding follows.

Is there another place to look? The DS threads tend to be waiting on
futures, and the other threads are consistent with the relatively light load
elsewhere.

Thanks
-DH

f0a49c53561--08a9-7ea9--0002=PREPARING, NEAR,
PRIMARY: [dae1a619-4886-4001-8ac5-6651339c67b7
[ip-172-17-0-1.ec2.internal, ip-10-32-98-209.ec2.internal]], DURATION:
104

33549c53561--08a9-7ea9--0002=PREPARING, NEAR,
PRIMARY: [6d3f06d6-3346-4ca7-8d5d-b5d8af2ad12e
[ip-172-17-0-1.ec2.internal, ip-10-32-97-243.ec2.internal],
dae1a619-4886-4001-8ac5-6651339c67b7 [ip-172-17-0-1.ec2.internal,
ip-10-32-98-209.ec2.internal]], DURATION: 134

b0949c53561--08a9-7ea9--0002=ACTIVE, NEAR, DURATION: 114

2ca49c53561--08a9-7ea9--0002=PREPARING, NEAR,
PRIMARY: [6d3f06d6-3346-4ca7-8d5d-b5d8af2ad12e
[ip-172-17-0-1.ec2.internal, ip-10-32-97-243.ec2.internal]], DURATION:
104

96349c53561--08a9-7ea9--0002=PREPARED, NEAR, DURATION: 134

9ca49c53561--08a9-7ea9--0002=ACTIVE, NEAR, DURATION: 104

28f39c53561--08a9-7ea9--0002=PREPARING, NEAR,
PRIMARY: [dae1a619-4886-4001-8ac5-6651339c67b7
[ip-172-17-0-1.ec2.internal, ip-10-32-98-209.ec2.internal]], DURATION:
215

a2649c53561--08a9-7ea9--0002=PREPARING, NEAR,
PRIMARY: [dae1a619-4886-4001-8ac5-6651339c67b7
[ip-172-17-0-1.ec2.internal, ip-10-32-98-209.ec2.internal]], DURATION:
124

e7849c53561--08a9-7ea9--0002=PREPARING, NEAR,
PRIMARY: [6d3f06d6-3346-4ca7-8d5d-b5d8af2ad12e
[ip-172-17-0-1.ec2.internal, ip-10-32-97-243.ec2.internal]], DURATION:
114

06849c53561--08a9-7ea9--0002=ACTIVE, NEAR, DURATION: 114

89849c53561--08a9-7ea9--0002=PREPARING, NEAR,
PRIMARY: [6d3f06d6-3346-4ca7-8d5d-b5d8af2ad12e
[ip-172-17-0-1.ec2.internal, ip-10-32-97-243.ec2.internal]], DURATION:
114

35549c53561--08a9-7ea9--0002=ACTIVE, NEAR, DURATION: 134

f0449c53561--08a9-7ea9--0002=PREPARING, NEAR,
PRIMARY: [dae1a619-4886-4001-8ac5-6651339c67b7
[ip-172-17-0-1.ec2.internal, ip-10-32-98-209.ec2.internal]], DURATION:
134