Hi all,
I have been experimenting with Ignite and have run into a problem
scaling up to larger clusters.
I am working with two use cases: 1) a Hadoop MapReduce accelerator,
and 2) an in-memory data grid (no secondary file system) accessed by
frameworks through the HDFS API.
Everything works fine with a smaller cluster (8 nodes), but with a
larger cluster (64 nodes) it takes a couple of minutes for all the
nodes to register with the cluster (which would be OK), and MapReduce
jobs just hang and never return.
I've compiled the latest Ignite 1.4 (with ignite.edition=hadoop) from
source and am using it with Hadoop 2.7.1, so far only to run things
like the pi estimator and wordcount examples.
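For reference, a minimal sketch of the Hadoop-side settings that this
kind of setup relies on. The property and class names are as I
understand them from the Ignite Hadoop accelerator documentation and
should be checked against the 1.4 docs; the host name node001 is a
placeholder, and 11211 is assumed to be the Ignite connector port.

```xml
<!-- core-site.xml (sketch): register the IGFS implementations of the
     Hadoop FileSystem API. -->
<property>
  <name>fs.igfs.impl</name>
  <value>org.apache.ignite.hadoop.fs.v1.IgniteHadoopFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.igfs.impl</name>
  <value>org.apache.ignite.hadoop.fs.v2.IgniteHadoopFileSystem</value>
</property>

<!-- mapred-site.xml (sketch): send MapReduce jobs to Ignite instead of YARN.
     node001 is a placeholder host. -->
<property>
  <name>mapreduce.framework.name</name>
  <value>ignite</value>
</property>
<property>
  <name>mapreduce.jobtracker.address</name>
  <value>node001:11211</value>
</property>
```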
I started with config/hadoop/default-config.xml.
I can't use multicast, so I've configured static IP-based discovery
with just a single node/port range.
I've increased the heartbeat frequency to 10000 (ms), and that seemed
to help make things more stable once all the nodes do join the cluster.
I've also played with increasing both the socket timeout and the ack
timeout but that seemed to just make it take longer for nodes to
attempt to join the cluster after a failed attempt.
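A minimal sketch of the discovery section described above, as it would
sit inside the IgniteConfiguration bean. The host name, port range, and
the two timeout values are placeholders for illustration, not my exact
settings; only the heartbeat value of 10000 is the one mentioned above.

```xml
<!-- Sketch only: node001, the port range, and the timeout values are placeholders. -->
<property name="discoverySpi">
  <bean class="org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi">
    <!-- Interval between heartbeats, in milliseconds. -->
    <property name="heartbeatFrequency" value="10000"/>
    <!-- Socket and acknowledgement timeouts, in milliseconds (illustrative values). -->
    <property name="socketTimeout" value="5000"/>
    <property name="ackTimeout" value="5000"/>
    <property name="ipFinder">
      <!-- Static IP finder: no multicast, a single host with a port range. -->
      <bean class="org.apache.ignite.spi.discovery.tcp.ipfinder.vm.TcpDiscoveryVmIpFinder">
        <property name="addresses">
          <list>
            <value>node001:47500..47509</value>
          </list>
        </property>
      </bean>
    </property>
  </bean>
</property>
```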
I have access to a couple of different clusters. We allocate resources
with Slurm, so I get a piece of a cluster to play with (hence the
no-multicast restriction). The nodes all have fast networks (FDR
InfiniBand) and a decent amount of memory (64GB-128GB) but no local
storage (or swap space).
As mentioned earlier, I disable the secondaryFilesystem.
Any advice/hints/example xml configs would be extremely welcome.
I also haven't been seeing the expected performance using the HDFS API
to access Ignite. I've tried both using the HDFS CLI to do some simple
timings of put/get and a little Java program that writes then reads a
file. Even with small files (500MB) that should be kept completely on
a single node, I only see about 250MB/s for writes, and reads are much
slower than that (4x to 10x). The writes are better than HDFS (our
HDFS is backed by pretty poor storage), but reads are much slower.
I haven't tried scaling this at all, but with an 8-node Ignite
cluster and a single "client" accessing a single file I would hope for
something closer to memory speeds. (If you would like me to split this
into another message to the list, just let me know. I'm assuming the
cause is the same: I missed a required config setting ;-) )
Thanks in advance for any help,
Joe