Out of interest, there is a JIRA for graceful shutdown - FLUME-1318. Please add your design thoughts in JIRA
--Mubarak

On Jul 9, 2012, at 10:36 AM, Brock Noland wrote:

> If you ran the workload with file channel and then took 10 thread
> dumps I think we'd have enough to understand what is going on.
>
> Brock
>
> On Mon, Jul 9, 2012 at 11:49 AM, Juhani Connolly
> <[email protected]> wrote:
>> It is currently pushing only 10 events per second or so (roughly 250 bytes
>> per event). This is with datadir/checkpoint on the same directory. Of course
>> the fact that there is a tail process running and that tomcat is also
>> writing out logs is without a doubt compounding the problem somewhat.
>>
>> I haven't taken a serious look at thread dumps of the file channel since I
>> don't have a thorough understanding of it. However analysis has involved
>> trying varying numbers of sinks (no throughput difference) and replacing with
>> memory channel (which easily handles the 650 ish requests per second we have
>> per server for the particular api, no problems even with a single sink).
>>
>> Since you say there's heavy fsyncing, and with 7200rpm disks, each seek will
>> have an average latency of 4.16ms, so for alternating seeks between the
>> checkpoint and the data dir, if each of those writes happens in order,
>> you're already limited to best case of barely more than 100 events per
>> second. Our experience so far has shown it to be significantly less.
>>
>> I do believe that batching a bunch of puts or takes with a single commit
>> together as two seeks followed by writes (or one if we can only use a single
>> storage file) could give significant returns when paired with a batching
>> sink/source (which many already do... Requesting multiple items at a time).
>>
>> If there is any specific data you would like I would be happy to try and
>> provide it.
>>
>>
>> On 07/09/2012 05:22 PM, Brock Noland wrote:
>>
>> On Mon, Jul 9, 2012 at 8:51 AM, Juhani Connolly
>> <[email protected]> wrote:
>>>
>>> - Intended setup with flume was a file channel connected to an avro sink.
>>> With only a single disk available, it is extremely slow. JDBC channel is
>>> also extremely slow, and MemoryChannel will fill up and start refusing puts
>>> as soon as a network issue comes up.
>>
>>
>> Have you taken a few thread dumps or done other analysis? When you say
>> "extremely slow" what do you mean? Configured for no dataloss FileChannel is
>> going to be doing a lot of fsync'ing so I am not surprised it's slow. That
>> is a property of disks not FileChannel. I think we should use group commit
>> but that shouldn't make it 10x faster.
>>
>> Brock
>>
>
> --
> Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
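
For reference, the back-of-the-envelope estimate in Juhani's message (4.16 ms average latency per seek on a 7200rpm disk, alternating between the checkpoint and the data dir) can be reproduced with a small sketch. This is only an illustration of the arithmetic, not a model of FileChannel: the function and parameter names are made up for this example, and it assumes that rotational latency dominates, ignoring head seek time, transfer time, and OS caching.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in ms: half of one full revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def max_events_per_second(rpm: float,
                          seeks_per_commit: int = 2,
                          events_per_commit: int = 1) -> float:
    """Best-case event rate when every commit pays for a fixed number of seeks.

    Assumption (hypothetical model): each commit costs exactly
    seeks_per_commit average rotational latencies and nothing else.
    """
    commit_cost_ms = seeks_per_commit * avg_rotational_latency_ms(rpm)
    return events_per_commit * 1000.0 / commit_cost_ms


if __name__ == "__main__":
    # One event per commit, two seeks (checkpoint + data dir).
    print(f"latency per seek: {avg_rotational_latency_ms(7200):.2f} ms")
    print(f"unbatched:        {max_events_per_second(7200):.0f} events/s")
    # Same two seeks amortised over a batch of 100 events.
    print(f"batch of 100:     "
          f"{max_events_per_second(7200, events_per_commit=100):.0f} events/s")
```

Under these assumptions the unbatched ceiling scales linearly with the number of events sharing one commit, which is the intuition behind the batching suggestion; real gains would depend on how the channel actually orders its writes and fsyncs.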
