Another quick update:
Since moving back to HDFS 0.20.2 (with HBase still at 0.92), we have
made significant gains in throughput. However, we found that most of
each regionserver's IPC threads were stuck somewhere in HWal.append
(out of 50, 42 were in append, of which 20 were in sync), which limited
throughput despite significant free hardware resources.
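
For anyone who wants to check their own regionservers for the same
pattern, here is a rough sketch of how the handler counts can be pulled
out of a jstack thread dump. It is not the exact method we used. The
thread-name prefix ("IPC Server handler") and the frame markers
("HLog.append", "HLog.sync") are assumptions about what the dump looks
like, and the function name is made up for illustration; adjust them to
whatever your own dumps show.

```python
import re
from collections import Counter


def classify_handlers(dump,
                      handler_prefix="IPC Server handler",  # assumed thread name
                      append_marker="HLog.append",          # assumed frame text
                      sync_marker="HLog.sync"):             # assumed frame text
    """Count handler threads whose stack contains the append/sync frames."""
    counts = Counter()
    # jstack separates per-thread blocks with blank lines.
    for block in re.split(r"\n\s*\n", dump.strip()):
        lines = block.splitlines()
        # The first line of a block is the quoted thread name.
        if not lines or not lines[0].startswith('"' + handler_prefix):
            continue
        counts["handlers"] += 1
        # Stack frames are the lines that start with "at ".
        frames = [l.strip() for l in lines if l.strip().startswith("at ")]
        if any(append_marker in f for f in frames):
            counts["in_append"] += 1
            # Only count sync when it sits inside an append stack.
            if any(sync_marker in f for f in frames):
                counts["in_append_and_sync"] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage: jstack <regionserver pid> | python this_script.py
    print(dict(classify_handlers(sys.stdin.read())))
```

If most handlers show up under "in_append", the WAL is the likely
bottleneck rather than CPU or disk capacity.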
Because the WAL writes of a single RS all go sequentially to one HDFS
file, we assumed that we could improve throughput by spreading writes
across more WAL files and more disks. To do this, we ran multiple
regionservers on each node.
The scaling wasn't linear (we were in no way increasing hardware, just
the number of regionservers), but we are now getting significantly more
throughput.
Personally, I would not call this a great approach to have to take. It
would generally be better to build more, smaller servers, which would
not limit themselves by trying to push a lot of data per server through
a single WAL file.
Of course, there may be another solution to this that I'm not aware of.
If so, I'd love to hear it.