Try turning on -XX:+HeapDumpOnOutOfMemoryError so we can peek at the heap dump.
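
For example, you can add the flag to the JAVA_OPTS you already pass. This is a sketch only: -XX:HeapDumpPath is optional, and the dump location below simply reuses the directory from your existing gc.log setting.

    JAVA_OPTS="-Xms64m -Xmx256m -Xss128k -XX:MaxDirectMemorySize=256m \
      -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/mnt/logs/flume-ng/ \
      -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails \
      -XX:+PrintGCTimeStamps -verbose:gc -Xloggc:/mnt/logs/flume-ng/gc.log"

When HeapDumpPath points at a directory, the JVM writes java_pid<pid>.hprof there on the next OOM; the file can be opened with VisualVM or jhat. One caveat: the OOM below is thrown from ZlibCompressor.init, a native allocation, so the heap dump may come back clean. That in itself would tell us the leak is in native memory rather than the Java heap.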
--
Brock Noland
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

On Friday, March 1, 2013 at 5:57 PM, Denis Lowe wrote:

> process failed - java.lang.OutOfMemoryError
>
> We observed the following error:
> 01 Mar 2013 21:37:24,807 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.HDFSEventSink.process:460) - process failed
> java.lang.OutOfMemoryError
>         at org.apache.hadoop.io.compress.zlib.ZlibCompressor.init(Native Method)
>         at org.apache.hadoop.io.compress.zlib.ZlibCompressor.<init>(ZlibCompressor.java:222)
>         at org.apache.hadoop.io.compress.GzipCodec$GzipZlibCompressor.<init>(GzipCodec.java:159)
>         at org.apache.hadoop.io.compress.GzipCodec.createCompressor(GzipCodec.java:109)
>         at org.apache.hadoop.io.compress.GzipCodec.createOutputStream(GzipCodec.java:92)
>         at org.apache.flume.sink.hdfs.HDFSCompressedDataStream.open(HDFSCompressedDataStream.java:70)
>         at org.apache.flume.sink.hdfs.BucketWriter.doOpen(BucketWriter.java:216)
>         at org.apache.flume.sink.hdfs.BucketWriter.access$000(BucketWriter.java:53)
>         at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:172)
>         at org.apache.flume.sink.hdfs.BucketWriter$1.run(BucketWriter.java:170)
>         at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:143)
>         at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:170)
>         at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:364)
>         at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:729)
>         at org.apache.flume.sink.hdfs.HDFSEventSink$2.call(HDFSEventSink.java:727)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:722)
>
> Unfortunately the error does not state whether it is caused by a lack of heap, perm, or direct memory.
>
> Looking at the system memory we could see that we were using 3GB of 7GB (i.e. less than half of the physical memory was used).
>
> Using the VisualVM profiler we could see that we had not maxed out the heap: 75MB of 131MB allocated. PermGen was fine too: 16MB of 27MB allocated.
>
> Buffer usage is as follows:
> Direct memory: < 50MB (this gets freed after each GC)
> Mapped memory: count 9, 144MB (always stays constant)
>
> I'm assuming -XX:MaxDirectMemorySize applies to direct buffer memory, NOT mapped buffer memory?
>
> The other thing we noticed was that after a restart the Flume process "RES" size starts at around 200MB and then, over a period of a week, grows to 3GB, after which we observed the above error.
> Unfortunately we cannot see where this 3GB of memory is being used when profiling with VisualVM and JConsole (max heap size is set to 256MB) - there definitely appears to be a slow memory leak.
>
> Flume is the only process running on this server:
> 64-bit CentOS
> java version "1.6.0_27" (64-bit)
>
> The Flume collector is configured with 8 file channels writing to S3 using the HDFS sink.
> (8 upstream servers are pushing events to 2 downstream collectors.)
>
> Each of the 8 channels/sinks is configured as follows:
>
> ## impression source
> agent.sources.impressions.type = avro
> agent.sources.impressions.bind = 0.0.0.0
> agent.sources.impressions.port = 5001
> agent.sources.impressions.channels = impressions-s3-channel
> ## impression channel
> agent.channels.impressions-s3-channel.type = file
> agent.channels.impressions-s3-channel.checkpointDir = /mnt/flume-ng/checkpoint/impressions-s3-channel
> agent.channels.impressions-s3-channel.dataDirs = /mnt/flume-ng/data1/impressions-s3-channel,/mnt/flume-ng/data2/impressions-s3-channel
> agent.channels.impressions-s3-channel.maxFileSize = 210000000
> agent.channels.impressions-s3-channel.capacity = 2000000
> agent.channels.impressions-s3-channel.checkpointInterval = 300000
> agent.channels.impressions-s3-channel.transactionCapacity = 10000
> ## impression s3 sink
> agent.sinks.impressions-s3-sink.type = hdfs
> agent.sinks.impressions-s3-sink.channel = impressions-s3-channel
> agent.sinks.impressions-s3-sink.hdfs.path = s3n://KEY:SECRET_KEY@S3-PATH
> agent.sinks.impressions-s3-sink.hdfs.filePrefix = impressions-%{collector-host}
> agent.sinks.impressions-s3-sink.hdfs.callTimeout = 0
> agent.sinks.impressions-s3-sink.hdfs.rollInterval = 3600
> agent.sinks.impressions-s3-sink.hdfs.rollSize = 450000000
> agent.sinks.impressions-s3-sink.hdfs.rollCount = 0
> agent.sinks.impressions-s3-sink.hdfs.codeC = gzip
> agent.sinks.impressions-s3-sink.hdfs.fileType = CompressedStream
> agent.sinks.impressions-s3-sink.hdfs.batchSize = 100
>
> I am using flume-ng 1.3.1 with the following parameters:
> JAVA_OPTS="-Xms64m -Xmx256m -Xss128k -XX:MaxDirectMemorySize=256m \
>   -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails \
>   -XX:+PrintGCTimeStamps -verbose:gc -Xloggc:/mnt/logs/flume-ng/gc.log"
>
> We have 2 collectors running and they both fail at pretty much the same time.
>
> So from what I can see there appears to be a slow memory leak in the HDFS sink, but I have no idea how to track this down or what alternative configuration I can use to prevent it from happening again.
>
> Any ideas would be greatly appreciated!
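
If the heap dump does come back clean, it is worth watching the native side of the process directly. Below is a minimal sketch, not a polished tool: it assumes Linux, that jps lists the agent under its main class name "Application" (adjust the match if yours differs), and that an hourly sample is frequent enough to see a week-long leak.

    # Log the Flume agent's resident set size once an hour so the leak's slope is visible.
    PID=$(jps | awk '/Application/ {print $1}')
    while sleep 3600; do
      echo "$(date) RSS(kB)=$(ps -o rss= -p "$PID")" >> /mnt/logs/flume-ng/rss.log
      pmap -x "$PID" | tail -n 1 >> /mnt/logs/flume-ng/rss.log   # total mapped/RSS summary line
    done

Diffing successive pmap runs should show whether the growth sits in anonymous native mappings (for example zlib deflate buffers allocated by the gzip codec, which is where your stack trace points) rather than in the Java heap.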
