David,

It was a mix. Our test pipeline is what would euphemistically be called "low-velocity" when it comes to data. When we experimented with rollInterval, we found a lot of lingering .tmp files, but we did not have an idleTimeout set on that config IIRC, since we were testing parameters in isolation. I feel like we also accidentally tested the default roll parameters when we first started, because we didn't realize the roll criteria are all active by default.

However, I still have files that are something like 6 weeks old now; my test cluster VM has been rebooted many times in the interim, I have spun up dozens of different Flume agent configs in the weeks in between, and those files are still named .tmp and show 0 bytes. Like I said, I am sure I can run "hadoop fs -mv <name.avro.tmp> <name.avro>" and that will change the name, I am just not sure that, without all the other parts of the Flume pipeline, they would get properly closed in HDFS, especially because these are from tier 2 of an Avro tiered-ingest agent config. When I read about serialization/deserialization, it seems like the StreamWriter not closing the stream correctly or exiting properly will cause issues. I guess I'll just give it a shot, since it's just junk data anyway.
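For reference, this is roughly the shape of the HDFS sink settings we were testing - a minimal sketch only, where the agent/sink names and the values are illustrative, not our real config:

  a1.sinks.k1.type = hdfs
  a1.sinks.k1.channel = c1
  a1.sinks.k1.hdfs.path = /flume/avro
  a1.sinks.k1.hdfs.fileType = DataStream
  a1.sinks.k1.hdfs.fileSuffix = .avro
  # roll on time only: every 10 minutes
  a1.sinks.k1.hdfs.rollInterval = 600
  a1.sinks.k1.hdfs.rollSize = 0
  a1.sinks.k1.hdfs.rollCount = 0
  # close (and rename) a bucket that has seen no events for 2 minutes,
  # so a low-velocity source doesn't leave open .tmp files behind
  a1.sinks.k1.hdfs.idleTimeout = 120

The idleTimeout line is the piece we had left out while testing rollInterval in isolation.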
Thanks again,

*Devin Suiter*
Jr. Data Solutions Software Engineer
100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
Google Voice: 412-256-8556 | www.rdx.com


On Mon, Nov 11, 2013 at 11:03 AM, Hari Shreedharan <[email protected]> wrote:

> This is because, like you said, you have too many files open at the same
> time. HDFS stream classes keep a pretty large buffer (this is HDFS client
> code, not Flume) which will be cleaned up when the file is closed. Setting
> maxOpenFiles to a smaller number is a good way to handle this.
>
> On Monday, November 11, 2013, David Sinclair wrote:
>
>> I forgot to mention that map is contained in the HDFSEventSink class.
>>
>> Devin,
>>
>> Are you setting a roll interval? I use roll intervals, so the .tmp files
>> were getting closed, even if they were idle. They were just never being
>> removed from that hashmap.
>>
>>
>> On Mon, Nov 11, 2013 at 10:10 AM, DSuiter RDX <[email protected]> wrote:
>>
>>> David,
>>>
>>> This is insightful - I found the need to place an idleTimeout value in
>>> the Flume config, but we were not running out of memory; we just found out
>>> that lots of unclosed .tmp files got left lying around when the roll
>>> occurred. I believe these are registering as under-replicated blocks as
>>> well - in my pseudo-distributed testbed, I have 5 under-replicated
>>> blocks... when the replication factor for pseudo-mode is "1" - and so we
>>> don't like them in the actual cluster.
>>>
>>> Can you tell me, in your research, have you found a good way to close
>>> the .tmp files out so they are properly acknowledged by HDFS/BucketWriter?
>>> Or is simply renaming them sufficient? I've been concerned that the manual
>>> rename approach might leave some floating metadata around, which is not
>>> ideal.
>>>
>>> If you're not sure, don't sweat it, obviously. I was just wondering if
>>> you already knew and could save me some empirical research time...
>>>
>>> Thanks!
>>> *Devin Suiter*
>>> Jr. Data Solutions Software Engineer
>>> 100 Sandusky Street | 2nd Floor | Pittsburgh, PA 15212
>>> Google Voice: 412-256-8556 | www.rdx.com
>>>
>>>
>>> On Mon, Nov 11, 2013 at 10:01 AM, David Sinclair <[email protected]> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I have been investigating an OutOfMemory error when using the HDFS
>>>> event sink. I have determined the problem to be with the
>>>>
>>>> WriterLinkedHashMap sfWriters;
>>>>
>>>> Depending on how you generate your file name/directory path, you can
>>>> run out of memory pretty quickly. You need to either set the
>>>> *idleTimeout* to some non-zero value or set the number of
>>>> *maxOpenFiles*.
>>>>
>>>> The map keeps references to BucketWriters around longer than they are
>>>> needed. I was able to reproduce this consistently and took a heap dump to
>>>> verify that the objects were being kept around.
>>>>
>>>> I will update this Jira to reflect my findings:
>>>>
>>>> https://issues.apache.org/jira/browse/FLUME-1326?jql=project%20%3D%20FLUME%20AND%20text%20~%20%22memory%20leak%22
>>>>
>>>> dave
>>>
>>>
>>
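PS - for anyone landing on this thread with the OutOfMemory symptom rather than the lingering .tmp symptom: the knob David and Hari mention is the HDFS sink's maxOpenFiles, which caps how many BucketWriters the sfWriters map keeps open at once. A hedged example, using the same illustrative agent/sink names as above:

  a1.sinks.k1.hdfs.maxOpenFiles = 500

Combined with a non-zero idleTimeout, that should keep the map from growing without bound when the file name or directory path is generated from high-cardinality values.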
