Can you file the issues you found as a JIRA here: https://issues.apache.org/jira/browse/FLUME? If this is a real issue, we should fix it. Ideally the sink should reconnect to a broken HDFS, but probably only after the initial connection succeeds. I am not sure what happens if the HDFS connection fails.
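The reconnect behavior Hari describes could, in principle, be a retry-with-backoff loop around the failing HDFS operation. The sketch below is purely illustrative and is not Flume's actual sink code; the class and method names are assumptions.

```java
import java.util.concurrent.Callable;

// Hypothetical sketch of reconnect logic for a sink: retry a failed
// operation (create/append/close) with exponential backoff instead of
// giving up after the first failure. Not Flume's actual implementation.
public class RetrySketch {

    // Attempts op up to maxAttempts times, doubling the delay after each
    // failure; rethrows the last exception if the cluster never comes back.
    static <T> T withRetry(Callable<T> op, int maxAttempts, long baseDelayMs)
            throws Exception {
        Exception last = null;
        long delay = baseDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();          // e.g. an HDFS write or close
            } catch (Exception e) {
                last = e;                  // connection refused, timeout, ...
                if (attempt < maxAttempts) {
                    Thread.sleep(delay);
                    delay *= 2;            // exponential backoff
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulate an HDFS that is down for two calls, then recovers.
        final int[] calls = {0};
        String result = withRetry(() -> {
            if (++calls[0] < 3) throw new java.io.IOException("HDFS unavailable");
            return "written";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A real fix would also need to reopen the `FileSystem` handle and roll to a new file, since a half-written block generally cannot be resumed after the lease is lost.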
Thanks,
Hari

On Friday, October 4, 2013 at 8:09 AM, DSuiter RDX wrote:
> I can see that being an issue - hopefully your HDFS never hiccups, but if it
> does, or if you need to stop it, it seems like restarting the agent is the
> only way to recover...
>
> As a workaround, you may be able to set up a file channel, and then maybe
> some kind of trigger script to restart the agent if the HDFS service bounces?
> Just throwing spaghetti there...
>
> Have you explored Kafka as an alternative? I haven't gone deeply into it, but
> I know some people have found it to be better for their design than Flume.
>
> Well, hopefully you get the answers you need. If you rewrite the HDFS sink
> with this built in, I'm sure the project will be interested!
>
> Devin Suiter
> Jr. Data Solutions Software Engineer
>
> 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
> Google Voice: 412-256-8556 | www.rdx.com
>
> On Fri, Oct 4, 2013 at 10:42 AM, David Sinclair <[email protected]> wrote:
> > Thanks, Devin. I have looked at the source, and I can say for certain
> > that the connection is never re-established, because there is no code that
> > detects that type of error.
> >
> > What I was looking for from the devs was confirmation of my findings and
> > any workarounds besides writing my own HDFS sink.
> >
> > Not having this graceful recovery is a pain and may prevent us from using
> > Flume.
> >
> > On Fri, Oct 4, 2013 at 9:21 AM, DSuiter RDX <[email protected]> wrote:
> > > David,
> > >
> > > In experimenting with the file_roll sink for local logging, I noticed
> > > that the file it wrote to was created when the agent starts. If you start
> > > the agent, then remove the file, and attempt to write, no new file is
> > > created.
> > > Perhaps the HDFS sink is similar, in that when the sink starts,
> > > the destination is established, and then if that file chain is broken,
> > > Flume cannot gracefully detect and correct it. It may have something to
> > > do with how the sink is looking for the target? I'm not a developer for
> > > Flume, but that is my observed behavior with file_roll. I am working
> > > through kinks in the HDFS sink with remote TCP logging from rsyslog right
> > > now... maybe I will have some more insight for you in a few days.
> > >
> > > Devin Suiter
> > > Jr. Data Solutions Software Engineer
> > >
> > > 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
> > > Google Voice: 412-256-8556 | www.rdx.com
> > >
> > > On Fri, Oct 4, 2013 at 9:08 AM, David Sinclair <[email protected]> wrote:
> > > > Anyone?
> > > >
> > > > This is what I am seeing for the scenarios I asked about, but I wanted
> > > > confirmation from the devs on the expected behavior:
> > > >
> > > > HDFS isn't available before ever trying to create/write to a file -
> > > > the sink continually tries to create the file and finally succeeds when
> > > > the cluster is available.
> > > > HDFS becomes unavailable after already creating a file and starting to
> > > > write to it - the writer loses the connection, but even after the
> > > > cluster is available again it never re-establishes a connection. Data
> > > > loss occurs since it never recovers.
> > > > HDFS is unavailable when trying to close a file - suffers from the same
> > > > problems as above.
> > > >
> > > > On Tue, Oct 1, 2013 at 11:04 AM, David Sinclair <[email protected]> wrote:
> > > > > Hi all,
> > > > >
> > > > > I have created an AMQP source that is being used to feed an HDFS
> > > > > sink. Everything is working as expected, but I wanted to try out some
> > > > > error scenarios.
> > > > > After creating a file in HDFS and starting to write to it, I shut
> > > > > down HDFS. I saw the errors in the log as I would expect, and after
> > > > > the configured roll time it tried to close the file. Since HDFS
> > > > > wasn't running, it wasn't able to do so. I restarted HDFS in the hope
> > > > > that it would try the close again, but it did not.
> > > > >
> > > > > Can someone tell me the expected behavior under the following
> > > > > scenarios?
> > > > >
> > > > > HDFS isn't available before ever trying to create/write to a file
> > > > > HDFS becomes unavailable after already creating a file and starting
> > > > > to write to it
> > > > > HDFS is unavailable when trying to close a file
> > > > >
> > > > > I'd also be happy to contribute the AMQP source. I wrote the old
> > > > > version for the original Flume:
> > > > >
> > > > > https://github.com/stampy88/flume-amqp-plugin/
> > > > >
> > > > > Let me know if you'd be interested, and thanks for the answers.
> > > > >
> > > > > dave
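For reference, the file-channel workaround mentioned above might look like the following agent configuration. This is a minimal sketch: the agent name `a1`, the component names, the local paths, and the HDFS URL are assumptions, not values from the thread.

```properties
# Hypothetical agent "a1": buffer events durably in a file channel so they
# survive an HDFS outage, then drain them through the HDFS sink.
# Source configuration (e.g. the AMQP source) is omitted here.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# File channel: events and checkpoints are persisted to local disk, so a
# downed HDFS (or a restarted agent) does not lose buffered data.
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /var/lib/flume/checkpoint
a1.channels.c1.dataDirs = /var/lib/flume/data
a1.channels.c1.capacity = 1000000

# HDFS sink draining the channel.
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events
a1.sinks.k1.hdfs.rollInterval = 300
```

Note that a file channel only buffers during an outage; it does not by itself fix the sink's failure to reconnect, which is why the trigger-script idea (restarting the agent once HDFS is back) was suggested alongside it.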
