Can you file the issues you found as a JIRA here: 
https://issues.apache.org/jira/browse/FLUME. If this is a real issue, we should 
fix it. Ideally the sink should reconnect to a broken HDFS, but probably only 
after the initial connection has succeeded. I am not sure what happens if the 
HDFS connection fails.
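
For what it's worth, the shape of such a reconnect could look roughly like the
sketch below. This is not Flume's actual code; `Writer`, `append()`, and the
`reopen` supplier are hypothetical stand-ins for the real HDFS writer calls:

```java
import java.io.IOException;
import java.util.function.Supplier;

// Sketch of a write loop that re-establishes a dead HDFS writer instead of
// failing permanently. Writer is a stand-in interface, not Flume's API.
class ReconnectingWriter {
    interface Writer {
        void append(byte[] event) throws IOException;
        void close() throws IOException;
    }

    private final int maxRetries;
    private Writer writer;

    ReconnectingWriter(Writer initial, int maxRetries) {
        this.writer = initial;
        this.maxRetries = maxRetries;
    }

    // On IOException, drop the dead writer and reopen before retrying,
    // up to maxRetries times, then rethrow so the channel keeps the event.
    void write(byte[] event, Supplier<Writer> reopen) throws IOException {
        for (int attempt = 0; ; attempt++) {
            try {
                writer.append(event);
                return;
            } catch (IOException e) {
                if (attempt >= maxRetries) throw e;
                writer = reopen.get(); // re-establish the connection
            }
        }
    }
}
```

A real implementation would also need backoff between attempts and a way to
distinguish fatal errors from transient ones, but the retry-then-reopen loop
is the core of it.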
 



Thanks,
Hari


On Friday, October 4, 2013 at 8:09 AM, DSuiter RDX wrote:

> I can see that being an issue - hopefully your HDFS never hiccups, but if it 
> does, or if you need to stop it, it seems like restarting the agent is the 
> only way to recover...
> 
> As a workaround, you may be able to set up a file channel, and then maybe 
> some kind of trigger script to restart the agent if the HDFS service bounces? 
> Just throwing spaghetti at the wall there... 
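
That trigger-script idea can be sketched as a small watchdog. The probe and
restart actions below are injected as parameters because the real commands
(e.g. an `hdfs dfsadmin` health check and an agent service restart) depend on
the installation; this is an illustration, not a tested workaround:

```java
import java.util.function.BooleanSupplier;

// Watchdog sketch: probe HDFS on a polling interval and bounce the Flume
// agent only on the down->up transition, i.e. when HDFS comes back.
// Both actions are injected; the real probe/restart commands are left
// to the operator and are assumptions here.
class AgentWatchdog {
    private final BooleanSupplier hdfsHealthy;
    private final Runnable restartAgent;
    private boolean wasDown = false;

    AgentWatchdog(BooleanSupplier hdfsHealthy, Runnable restartAgent) {
        this.hdfsHealthy = hdfsHealthy;
        this.restartAgent = restartAgent;
    }

    // Call once per polling interval.
    void poll() {
        if (!hdfsHealthy.getAsBoolean()) {
            wasDown = true;            // remember the outage
        } else if (wasDown) {
            restartAgent.run();        // HDFS bounced back: restart agent
            wasDown = false;
        }
    }
}
```

Paired with a file channel, a restart like this should let buffered events
drain once the agent reattaches to a healthy HDFS.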
> 
> Have you explored Kafka as an alternative? I haven't gone deeply into it, but 
> I know some people have found it to be better for their design than Flume.
> 
> Well, hopefully you get the answers you need. If you rewrite the HDFS sink 
> with this built-in, I'm sure the project will be interested! 
> 
> Devin Suiter
> Jr. Data Solutions Software Engineer
> 
> 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
> Google Voice: 412-256-8556 | www.rdx.com (http://www.rdx.com/) 
> 
> On Fri, Oct 4, 2013 at 10:42 AM, David Sinclair 
> <[email protected] (mailto:[email protected])> 
> wrote:
> > Thanks, Devin. I have looked at the source, and I can say for certain 
> > that the connection is never re-established, because there is no code that 
> > detects that type of error. 
> > 
> > What I was looking for from the devs was confirmation on my findings and 
> > any work arounds besides writing my own HDFS Sink. 
> > 
> > Not recovering gracefully from this is a pain and may prevent us from 
> > using Flume. 
> > 
> > 
> > On Fri, Oct 4, 2013 at 9:21 AM, DSuiter RDX <[email protected] 
> > (mailto:[email protected])> wrote:
> > > David,
> > > 
> > > In experimenting with the file_roll sink for local logging, I noticed 
> > > that the file it wrote to was created when the agent starts. If you start 
> > > the agent, then remove the file, and attempt to write, no new file is 
> > > created. Perhaps the HDFS sink is similar: when the sink starts, the 
> > > destination is established, and if that file chain is broken, Flume 
> > > cannot gracefully detect and correct it. It may have something to do 
> > > with how the sink is looking for the target? I'm not a developer for 
> > > Flume, but that is the behavior I observed with file_roll. I am working 
> > > through kinks in the HDFS sink with remote TCP logging from rsyslog right 
> > > now...maybe I will have some more insight for you in a few days... 
> > > 
> > > Devin Suiter
> > > Jr. Data Solutions Software Engineer
> > > 
> > > 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
> > > Google Voice: 412-256-8556 (tel:412-256-8556) | www.rdx.com 
> > > (http://www.rdx.com/) 
> > > 
> > > 
> > > On Fri, Oct 4, 2013 at 9:08 AM, David Sinclair 
> > > <[email protected] (mailto:[email protected])> 
> > > wrote:
> > > > Anyone?
> > > > 
> > > > This is what I am seeing for the scenarios I asked about, but I wanted 
> > > > confirmation from the devs on expected behavior. 
> > > > 
> > > > HDFS isn't available before ever trying to create/write to a file - 
> > > > the sink continually tries to create the file and finally succeeds when 
> > > > the cluster is available. 
> > > > 
> > > > HDFS becomes unavailable after already creating a file and starting to 
> > > > write to it - the writer loses the connection, but even after the 
> > > > cluster is available again it never re-establishes a connection. Data 
> > > > loss occurs since it never recovers. 
> > > > 
> > > > HDFS is unavailable when trying to close a file - suffers from the same 
> > > > problems as above.
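
The never-retried close in the second and third scenarios could in principle
be handled by rescheduling the close until it succeeds. A minimal sketch,
assuming a hypothetical `Closer` callback rather than Flume's real
`BucketWriter`:

```java
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: keep retrying a failed close() in the background until it
// succeeds, rather than abandoning the file when HDFS is down at roll time.
class CloseRetrier {
    interface Closer { void close() throws IOException; }

    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    void closeWithRetry(Closer c, long delayMs) {
        timer.schedule(() -> {
            try {
                c.close();
                timer.shutdown();   // done: file is closed
            } catch (IOException e) {
                closeWithRetry(c, delayMs); // HDFS still down; try again later
            }
        }, delayMs, TimeUnit.MILLISECONDS);
    }
}
```

A production version would want a retry cap and backoff, but the point is
that the close attempt is rescheduled instead of being made exactly once.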
> > > > 
> > > > 
> > > > 
> > > > 
> > > > On Tue, Oct 1, 2013 at 11:04 AM, David Sinclair 
> > > > <[email protected] 
> > > > (mailto:[email protected])> wrote:
> > > > > Hi all,
> > > > > 
> > > > > I have created an AMQP Source that is being used to feed an HDFS 
> > > > > Sink. Everything is working as expected, but I wanted to try out some 
> > > > > error scenarios.  
> > > > > 
> > > > > After creating a file in HDFS and starting to write to it, I shut 
> > > > > down HDFS. I saw the errors in the log as I would expect, and after 
> > > > > the configured roll time it tried to close the file. Since HDFS 
> > > > > wasn't running, it wasn't able to do so. I restarted HDFS in the hope 
> > > > > that it would retry the close, but it did not.  
> > > > > 
> > > > > Can someone tell me expected behavior under the following scenarios?
> > > > > 
> > > > > HDFS isn't available before ever trying to create/write to a file
> > > > > HDFS becomes unavailable after already creating a file and starting 
> > > > > to write to it
> > > > > HDFS is unavailable when trying to close a file
> > > > > 
> > > > > I'd also be happy to contribute the AMQP source. I wrote the old 
> > > > > version for the original Flume:
> > > > > 
> > > > > 
> > > > > https://github.com/stampy88/flume-amqp-plugin/
> > > > > 
> > > > > Let me know if you'd be interested and thanks for the answers.
> > > > > 
> > > > > dave 
> > > 
> > 
> 
