Thanks very much, Jeff. This is exactly what I needed to know. I have also been experimenting with running multiple flows on the same agent, each writing to a different disk, to improve throughput.

On Mon, Oct 7, 2013 at 10:16 PM, Jeff Lord <jl...@cloudera.com> wrote:

> Yes, the file channel is designed to handle this and is what you should be
> using.
> You are also on the right track regarding sizing your file channel to
> account for the number of events that could accumulate in the event that
> your terminal sink is unable to complete transactions. With the amount of
> data that you would like to buffer, it will take a file channel somewhere
> around 72GB.
> Some other things you should consider here are the size of your hard
> drives, the drain rate of a single sink on that channel once the terminal
> destination is up again, durability in the event of a drive failure, and so
> on. For these reasons you may decide that you want to have a few agents on
> separate hosts that can help to spread the load.
>
> Hope this is helpful.
>
> -Jeff
>
>
> On Mon, Oct 7, 2013 at 6:54 AM, David Sinclair <
> dsincl...@chariotsolutions.com> wrote:
>
>> I am using an AMQP Source, so I don't know how changing to a JMS source
>> would make any difference.
>>
>> I am concerned about the volume of data and the file channel. Even if I
>> switched to JMS, my question would be the same.
>>
>>
>> On Fri, Oct 4, 2013 at 4:46 PM, Hari Shreedharan <
>> hshreedha...@cloudera.com> wrote:
>>
>>> Have you tried the JMS Source? It can pick up data directly into Flume.
>>>
>>>
>>> Thanks,
>>> Hari
>>>
>>> On Friday, October 4, 2013 at 11:59 AM, David Sinclair wrote:
>>>
>>> Hi,
>>>
>>> I have a question regarding the RollingFileSink and
>>> SpoolingDirectorySource. I was trying to write everything from an AMQP
>>> source to a file sink, then have the spooling directory source pick up
>>> these files. This won't work as the files aren't immutable.
>>>
>>> If I use a File Channel to store the events between my source and sink,
>>> is there a concern about the number of events in the channel if the sink is
>>> unable to deliver said events? For example, I will be getting around 5K
>>> messages/sec and the size is about 2K. So roughly 10MB a second. If the
>>> sink is unable to deliver the messages for 2 hours, that would be 36
>>> million events in the channel.
>>>
>>> Is the file channel designed to handle this? Or should I have a file
>>> sink in between?
>>>
>>> thanks
>>>
>>> dave
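
For anyone checking the sizing figures quoted in this thread, here is a rough sketch of the arithmetic in Python. It assumes "5K" means 5,000 messages/sec and "2K" means 2,000 bytes per event, and it counts payload bytes only. The function name `estimate_backlog` is made up for illustration and is not part of Flume; any per-event overhead the file channel adds on disk is not modelled, so treat the result as a lower bound.

```python
def estimate_backlog(events_per_sec, event_bytes, outage_secs):
    """Return (event count, payload bytes) accumulated while the sink is down.

    Hypothetical helper for illustration only; it ignores any on-disk
    overhead the channel adds per event.
    """
    events = events_per_sec * outage_secs
    return events, events * event_bytes


# Figures from the thread (assumed decimal units): 5K msgs/sec, ~2K each, 2 hours.
rate_bytes_per_sec = 5_000 * 2_000
events, size_bytes = estimate_backlog(5_000, 2_000, 2 * 60 * 60)

print(f"{rate_bytes_per_sec / 1e6:.1f} MB/s")  # 10.0 MB/s
print(f"{events:,} events")                    # 36,000,000 events
print(f"{size_bytes / 1e9:.1f} GB")            # 72.0 GB
```

The event count is the figure to compare against the file channel's `capacity` setting, and the byte total (plus headroom) against the free space on the disks holding the channel's data directories.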