Perfect. Again, thank you so much for your time. :) The timeout increase bought me some time, but it still ended up hitting the same exception. I love the multiple-sinks idea... I should have thought of that. :)
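For the archives, here's a rough sketch of what I think the source-side agent will look like with a second sink added and the bumped timeout. The component names, hostnames, and ports below are placeholders (and in reality there are 124 exec sources, not one); the keep-alive property is, I believe, the doCommit() timeout I raised to 10 seconds:

agent1.sources = tail1
agent1.channels = mem1
agent1.sinks = avro1 avro2

agent1.sources.tail1.type = exec
agent1.sources.tail1.command = tail -F /path/to/app.log
agent1.sources.tail1.batchSize = 1000
agent1.sources.tail1.channels = mem1

agent1.channels.mem1.type = memory
agent1.channels.mem1.capacity = 1000000
agent1.channels.mem1.transactionCapacity = 1000
# seconds the channel waits for free space before failing a commit
agent1.channels.mem1.keep-alive = 10

# two avro sinks on the same channel; each sink gets its own runner
# thread, so they drain the channel in parallel
agent1.sinks.avro1.type = avro
agent1.sinks.avro1.channel = mem1
agent1.sinks.avro1.hostname = collector1.example.com
agent1.sinks.avro1.port = 4141
agent1.sinks.avro1.batch-size = 1000

agent1.sinks.avro2.type = avro
agent1.sinks.avro2.channel = mem1
agent1.sinks.avro2.hostname = collector2.example.com
agent1.sinks.avro2.port = 4141
agent1.sinks.avro2.batch-size = 1000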
Chris

On Mon, Feb 4, 2013 at 8:22 PM, Juhani Connolly <juhani_conno...@cyberagent.co.jp> wrote:

> Hey
>
> On 02/02/2013 01:40 AM, Chris Neal wrote:
>
>> Thanks for the help Juhani :) I'll take a look with Ganglia and see
>> what things look like.
>>
>> Any thoughts on keeping the ExecSource.batchSize,
>> MemoryChannel.transactionCapacity, AvroSink.batch-size, and
>> HDFSSink.batchSize the same?
>
> It's not really important, as long as the avro batch size is less than
> or equal to the channel transaction capacity. The HDFS sink's batch size
> is independent of them both.
>
>> I looked at the MemoryChannel code and noticed that there is a timeout
>> parameter passed to doCommit(), where the exception is being thrown.
>> Just for fun, I increased it from the default to 10 seconds, and now
>> things are running smoothly with the same config as before. It's been
>> running for about 24 hours now. A step in the right direction anyway! :)
>
> If that fixed it, it sounds like your data is just very bursty and
> sometimes gets fed in faster than it's drained out. The solution would
> be either to enlarge your temporary buffer (the memory channel), to
> throttle the incoming data (probably not possible), or to increase the
> drain speed (more sinks running in parallel).
>
>> Thanks again.
>> Chris
>>
>> On Thu, Jan 31, 2013 at 8:12 PM, Juhani Connolly
>> <juhani_conno...@cyberagent.co.jp> wrote:
>>
>> Hi Chris,
>>
>> The most likely cause of that error is that the sinks are draining
>> events more slowly than your sources are feeding in fresh data. Over
>> time this will fill up the capacity of your memory channel, which will
>> then start refusing additional put requests.
>>
>> You can confirm this by connecting with JMX or Ganglia.
>>
>> If the writes are extremely bursty, it's possible that they just
>> temporarily exceed the sink consumption rate, and increasing the
>> channel capacity could work. Otherwise, increasing the avro batch size,
>> or adding additional avro sinks (more threads), may also help. I think
>> that setting up Ganglia monitoring and looking at the incoming and
>> outgoing event counts and channel fill states helps a lot in diagnosing
>> these bottlenecks, so you should look into doing that.
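>>
>> If you go the Ganglia route, I believe the reporting is just a couple
>> of system properties on the agent startup line, something like the
>> following (agent name and conf file are whatever you already use, and
>> the host:port is a placeholder for wherever your gmond is listening):
>>
>> flume-ng agent -n agent1 -c conf -f flume.conf \
>>   -Dflume.monitoring.type=ganglia \
>>   -Dflume.monitoring.hosts=gmond-host:8649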
>>
>> On 02/01/2013 02:01 AM, Chris Neal wrote:
>>
>> Hi all.
>>
>> I need some thoughts on sizing/tuning of the above (common) route in
>> Flume NG to maximize throughput. Here is my setup:
>>
>> *Source JVM (ExecSource/MemoryChannel/AvroSink):*
>> -Xmx4g
>> -Xms4g
>> -XX:MaxDirectMemorySize=256m
>>
>> Number of ExecSources in config: 124 (yes, it's a ton. Can't do
>> anything about it :) The write rate to the source files is fairly fast
>> and bursty.
>>
>> ExecSource.batchSize = 1000
>> (so each of the 124 "tail -F" instances dumps to the memory channel
>> once it has accumulated 1000 events)
>>
>> MemoryChannel.capacity = 1000000
>> MemoryChannel.transactionCapacity = 1000
>> (I'm somewhat unclear on what this is. The docs say "The number of
>> events stored in the channel per transaction", but what is a
>> "transaction" to a MemoryChannel?)
>>
>> AvroSink.batchSize = 1000
>>
>> *Destination JVM (AvroSource/FileChannel/HDFSSink)*
>> (A cluster of two JVMs on two servers, each configured the same as
>> below)
>> -Xms2g
>> -Xmx2g
>> -XX:MaxDirectMemorySize is not defined, so whatever the default is
>>
>> AvroSource.threads = 64
>> FileChannel.transactionCapacity = 1000
>> FileChannel.capacity = 32000000
>> HDFSSink.batchSize = 1000
>> HDFSSink.threadPoolSize = 64
>>
>> With this configuration, in about 5 minutes, I get the common exception
>> on the source JVM:
>>
>> "Space for commit to queue couldn't be acquired Sinks are likely not
>> keeping up with sources, or the buffer size is too tight"
>>
>> The JVM is nowhere near its 4g max; it's only at about 2.5g.
>>
>> I'm wondering about the logic of having all the batch sizes/transaction
>> sizes at 1000. My thought was that this would keep the data transfer
>> from fragmenting, but maybe that's flawed? Should the sizes be
>> different?
>>
>> I'm also curious about increasing MaxDirectMemorySize to something
>> larger than 256MB. I tried removing it altogether in my source JVM
>> (which makes the size unbounded), but that didn't seem to make a
>> difference.
>>
>> I'm having some trouble figuring out where the backup is happening, and
>> how to open up the gates. :)
>>
>> Thanks in advance for any suggestions.
>> Chris
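>>
>> P.S. For completeness, in properties form I think each destination
>> agent boils down to roughly the following (agent/component names,
>> ports, and paths here are placeholders, not the real ones):
>>
>> agent2.sources = avroIn
>> agent2.channels = fileCh
>> agent2.sinks = hdfsOut
>>
>> agent2.sources.avroIn.type = avro
>> agent2.sources.avroIn.bind = 0.0.0.0
>> agent2.sources.avroIn.port = 4141
>> agent2.sources.avroIn.threads = 64
>> agent2.sources.avroIn.channels = fileCh
>>
>> agent2.channels.fileCh.type = file
>> agent2.channels.fileCh.checkpointDir = /path/to/checkpoint
>> agent2.channels.fileCh.dataDirs = /path/to/data
>> agent2.channels.fileCh.capacity = 32000000
>> agent2.channels.fileCh.transactionCapacity = 1000
>>
>> agent2.sinks.hdfsOut.type = hdfs
>> agent2.sinks.hdfsOut.channel = fileCh
>> agent2.sinks.hdfsOut.hdfs.path = hdfs://namenode:8020/flume/events
>> agent2.sinks.hdfsOut.hdfs.batchSize = 1000
>> agent2.sinks.hdfsOut.hdfs.threadsPoolSize = 64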