On 07/10/2012 02:36 AM, Brock Noland wrote:
If you ran the workload with file channel and then took 10 thread
dumps I think we'd have enough to understand what is going on.
Brock
I've taken some dumps and you can find them here:
http://people.apache.org/~juhanic/ca-flume-fc-dumps.tar.gz
I also included a png from visualvm's thread visualization where you can
confirm that the source is constantly busy (trying to get stuff into the
file channel), while the 5 sinks are pretty idle. Let me know if there's
anything else I can provide.
On Mon, Jul 9, 2012 at 11:49 AM, Juhani Connolly wrote:
It is currently pushing only 10 events per second or so (roughly 250 bytes
per event). This is with the data dir and checkpoint in the same directory.
Of course, the fact that there is a tail process running and that tomcat is
also writing out logs is without a doubt compounding the problem somewhat.
I haven't taken a serious look at thread dumps of the file channel since I
don't have a thorough understanding of it. However, analysis so far has
involved trying varying numbers of sinks (no throughput difference) and
replacing it with memory channel (which easily handles the 650-ish requests
per second we have per server for the particular api, no problems even with
a single sink).
Since you say there's heavy fsyncing: with 7200rpm disks, each seek will
have an average latency of about 4.16ms. So with seeks alternating between
the checkpoint and the data dir, and each of those writes happening in order,
you're already limited to a best case of barely more than 100 events per
second. Our experience so far has shown it to be significantly less.
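
The back-of-envelope estimate above can be reproduced with a short sketch. It assumes the 4.16ms figure is the average rotational latency of a 7200rpm disk (half a revolution) and that each event pays two serialized disk waits, one for the data dir and one for the checkpoint. Head travel and transfer time are ignored, so this is only an upper bound under those assumptions, and the function names are illustrative.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for the target sector: half of one revolution, in ms."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def max_events_per_second(rpm: float, waits_per_event: int) -> float:
    """Best-case rate if every event pays `waits_per_event` serialized waits."""
    ms_per_event = avg_rotational_latency_ms(rpm) * waits_per_event
    return 1000.0 / ms_per_event


if __name__ == "__main__":
    latency = avg_rotational_latency_ms(7200)
    print(f"average latency: {latency:.2f} ms")
    # Assumption: one wait for the data dir plus one for the checkpoint.
    rate = max_events_per_second(7200, waits_per_event=2)
    print(f"best case: {rate:.0f} events/s")
```

Under these assumptions the ceiling comes out at roughly 120 events per second, in line with the "barely more than 100" figure, and any additional head movement only lowers it.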
I do believe that batching a bunch of puts or takes into a single commit,
done as two seeks followed by writes (or one if we can only use a single
storage file), could give significant returns when paired with a batching
sink/source (which many already are, requesting multiple items at a time).
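
To illustrate the batching idea, here is a minimal sketch that is not Flume's FileChannel code: a hypothetical single-file append log where the number of events per commit is configurable and each commit costs exactly one fsync. The function name `write_events` and the file layout are assumptions made for the example.

```python
import os
import tempfile
from typing import Sequence


def write_events(path: str, events: Sequence[bytes], batch_size: int) -> int:
    """Append events to `path`, fsyncing once per batch; return the fsync count."""
    fsync_calls = 0
    with open(path, "ab") as log:
        for start in range(0, len(events), batch_size):
            for event in events[start:start + batch_size]:
                log.write(event + b"\n")
            # One commit per batch: flush Python's buffer, then force to disk.
            log.flush()
            os.fsync(log.fileno())
            fsync_calls += 1
    return fsync_calls


if __name__ == "__main__":
    # 100 events of about 250 bytes each, the event size quoted in this thread.
    events = [b"x" * 250 for _ in range(100)]
    with tempfile.TemporaryDirectory() as tmp:
        single = write_events(os.path.join(tmp, "single.log"), events, 1)
        batched = write_events(os.path.join(tmp, "batched.log"), events, 20)
    print(f"fsyncs with one commit per event: {single}")
    print(f"fsyncs with 20 events per commit: {batched}")
```

The same bytes end up on disk either way; only the number of forced syncs changes, which is where the disk latency is paid. The trade-off is that a whole batch becomes durable at once rather than each event individually.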
If there is any specific data you would like I would be happy to try and
provide it.
On 07/09/2012 05:22 PM, Brock Noland wrote:
On Mon, Jul 9, 2012 at 8:51 AM, Juhani Connolly wrote:
- Intended setup with flume was a file channel connected to an avro sink.
With only a single disk available, it is extremely slow. JDBC channel is
also extremely slow, and MemoryChannel will fill up and start refusing puts
as soon as a network issue comes up.
Have you taken a few thread dumps or done other analysis? When you say
"extremely slow", what do you mean? Configured for no data loss, FileChannel
is going to be doing a lot of fsync'ing, so I am not surprised it's slow. That
is a property of disks, not FileChannel. I think we should use group commit,
but that shouldn't make it 10x faster.
Brock