2017-11-07 18:22 GMT+01:00 Stack <[email protected]>:
> On Mon, Nov 6, 2017 at 6:33 AM, Daniel Jeliński <[email protected]>
> wrote:
>
> > For others that run into a similar issue, it turned out that the
> > OutOfMemoryError was thrown (and subsequently hidden) on the client side.
> > The error was caused by excessive direct memory usage in Java NIO's
> > bytebuffer caching (described here:
> > http://www.evanjones.ca/java-bytebuffer-leak.html), and setting
> > -Djdk.nio.maxCachedBufferSize=262144
> > allowed the application to complete.
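> >
> > For reference, applying the workaround when launching the client JVM
> > looks roughly like this (the heap and direct-memory sizes and the class
> > name are placeholders, and the property is only honored on JDKs that
> > ship it, IIRC 8u102 and later):
> >
> > java -Xmx4g -XX:MaxDirectMemorySize=1g \
> >     -Djdk.nio.maxCachedBufferSize=262144 \
> >     -cp my-client.jar com.example.MyHBaseClient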
> >
> >
> Suggestions for how to expose the client-side OOME, Daniel? We should
> add a note to the thrown exception about "-Djdk.nio.maxCachedBufferSize"
> (and make sure the exception makes it out!)
>
Well, I found the problem by adding a printStackTrace call to the
AsyncProcess.createLog function, which was responsible for logging the
original OOME. This is not very elegant, and I wouldn't recommend adding
it to the official codebase, but the resulting stack trace offers some
hints:
java.io.IOException: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: Direct buffer memory
    at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:329)
    at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:130)
    at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:53)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
    at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncProcess.java:727)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: Direct buffer memory
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:240)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.multi(ClientProtos.java:34142)
    at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:128)
    ... 8 more
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
    at java.nio.Bits.reserveMemory(Unknown Source)
    at java.nio.DirectByteBuffer.<init>(Unknown Source)
    at java.nio.ByteBuffer.allocateDirect(Unknown Source)
    at sun.nio.ch.Util.getTemporaryDirectBuffer(Unknown Source)
    at sun.nio.ch.IOUtil.write(Unknown Source)
    at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
    at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
    at org.apache.hadoop.security.SaslOutputStream.write(SaslOutputStream.java:169)
    at java.io.BufferedOutputStream.write(Unknown Source)
    at java.io.DataOutputStream.write(Unknown Source)
    at org.apache.hadoop.hbase.ipc.IPCUtil.write(IPCUtil.java:277)
    at org.apache.hadoop.hbase.ipc.IPCUtil.write(IPCUtil.java:266)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:921)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:874)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1243)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
    ... 11 more
This stack trace comes from the cdh5.10.2 version, but the master branch
is sufficiently similar. So, depending on what we want to achieve, we
could:
- just replace the catch (Throwable e) in
AbstractRpcClient.callBlockingMethod with something more fine-grained and
fail the application,
- or forward the OOME in callBlockingMethod, but add information about
maxCachedBufferSize, still failing the application while suggesting
possible corrective action to the user (see the sketch below),
- or pass the error to the user, allowing the application to intercept
it. I'm not sure yet how to do that, and we would need to do something
about the connection becoming unusable after an OOME, in case the user
decides to keep going.
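For illustration, option 2 could look roughly like the sketch below. This
is not a tested patch: the real change would go into
AbstractRpcClient.callBlockingMethod, the class and method names here are
stand-ins, and it assumes the two-argument (message, cause) constructor
of com.google.protobuf.ServiceException:

import com.google.protobuf.ServiceException;

public class OomeHintSketch {
    // Stand-in for the RPC write path that can throw the OOME.
    static void doRpcCall() {
        throw new OutOfMemoryError("Direct buffer memory");
    }

    // Finer-grained handling: catch OutOfMemoryError before the generic
    // Throwable and attach a hint about possible corrective action.
    static void callBlockingMethod() throws ServiceException {
        try {
            doRpcCall();
        } catch (OutOfMemoryError e) {
            throw new ServiceException(
                "Direct buffer OOME during RPC; consider setting "
                    + "-Djdk.nio.maxCachedBufferSize=262144 or raising "
                    + "-XX:MaxDirectMemorySize",
                e);
        } catch (Throwable e) {
            throw new ServiceException(e);
        }
    }

    public static void main(String[] args) {
        try {
            callBlockingMethod();
        } catch (ServiceException e) {
            // Prints the hint plus the original OOME as the cause.
            e.printStackTrace();
        }
    }
}

The application still fails, but the exception that surfaces carries the
likely fix.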
What's your take?
> Thanks for updating the list,
> S
>
>
>
> > Yet another proof that correct handling of OOME is hard.
> > Thanks,
> > Daniel
> >
> > 2017-10-11 11:33 GMT+02:00 Daniel Jeliński <[email protected]>:
> >
> > > Thanks for the hints. I'll see if we can explicitly set
> > > MaxDirectMemorySize to a safe number.
> > > Thanks,
> > > Daniel
> > >
> > > 2017-10-10 21:10 GMT+02:00 Esteban Gutierrez <[email protected]>:
> > >
> > >> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/sun/misc/VM.java#l184
> > >>
> > >> // The initial value of this field is arbitrary; during JRE
> > >> // initialization it will be reset to the value specified on the
> > >> // command line, if any, otherwise to Runtime.getRuntime().maxMemory().
> > >>
> > >> which goes all the way down to memory/heap.cpp and to whatever was
> > >> left of the reserved memory, depending on the flags and the platform
> > >> used, as Vladimir says.
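> > >>
> > >> As a quick way to check the effective limit and the live direct-pool
> > >> usage at runtime, a small sketch (sun.misc.VM is the JDK-internal
> > >> class linked above, so this is JDK 8 specific; BufferPoolMXBean is
> > >> standard since Java 7):
> > >>
> > >> import java.lang.management.BufferPoolMXBean;
> > >> import java.lang.management.ManagementFactory;
> > >>
> > >> public class DirectPoolStats {
> > >>     public static void main(String[] args) {
> > >>         // Effective max direct memory: the command-line value if
> > >>         // set, otherwise Runtime.getRuntime().maxMemory().
> > >>         System.out.println("maxDirectMemory = "
> > >>             + sun.misc.VM.maxDirectMemory() + " bytes");
> > >>         // Live usage of the "direct" and "mapped" buffer pools.
> > >>         for (BufferPoolMXBean pool : ManagementFactory
> > >>                 .getPlatformMXBeans(BufferPoolMXBean.class)) {
> > >>             System.out.printf("%s: count=%d used=%d capacity=%d%n",
> > >>                 pool.getName(), pool.getCount(),
> > >>                 pool.getMemoryUsed(), pool.getTotalCapacity());
> > >>         }
> > >>     }
> > >> }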
> > >>
> > >> Also, depending on which distribution and features are used, there
> > >> are specific guidelines about setting that parameter, so mileage
> > >> might vary.
> > >>
> > >> thanks,
> > >> esteban.
> > >>
> > >>
> > >>
> > >> --
> > >> Cloudera, Inc.
> > >>
> > >>
> > >> On Tue, Oct 10, 2017 at 1:35 PM, Vladimir Rodionov <
> > >> [email protected]>
> > >> wrote:
> > >>
> > >> > >> The default value is zero, which means the maximum direct
> > >> > >> memory is unbounded.
> > >> >
> > >> > That is not correct. If you do not specify MaxDirectMemorySize, the
> > >> > default is platform specific.
> > >> >
> > >> > The link above is for the JRockit JVM, I presume?
> > >> >
> > >> > On Tue, Oct 10, 2017 at 11:19 AM, Esteban Gutierrez <
> > >> [email protected]>
> > >> > wrote:
> > >> >
> > >> > > I don't think it is truly unbounded; IIRC it's limited to the
> > >> > > maximum allocated heap.
> > >> > >
> > >> > > thanks,
> > >> > > esteban.
> > >> > >
> > >> > > --
> > >> > > Cloudera, Inc.
> > >> > >
> > >> > >
> > >> > > On Tue, Oct 10, 2017 at 1:11 PM, Ted Yu <[email protected]>
> > wrote:
> > >> > >
> > >> > > > From
> > >> > > > https://docs.oracle.com/cd/E15289_01/doc.40/e15062/optionxx.htm :
> > >> > > >
> > >> > > > java -XX:MaxDirectMemorySize=2g myApp
> > >> > > >
> > >> > > > Default Value
> > >> > > >
> > >> > > > The default value is zero, which means the maximum direct
> > >> > > > memory is unbounded.
> > >> > > >
> > >> > > > On Tue, Oct 10, 2017 at 11:04 AM, Vladimir Rodionov <
> > >> > > > [email protected]>
> > >> > > > wrote:
> > >> > > >
> > >> > > > > >> -XX:MaxDirectMemorySize is set to the default 0, which
> > >> > > > > >> means unlimited as far as I can tell.
> > >> > > > >
> > >> > > > > Not sure if this is true. The only confirmation of that I
> > >> > > > > found was for the JRockit JVM.
> > >> > > > >
> > >> > > > > On Mon, Oct 9, 2017 at 11:29 PM, Daniel Jeliński <
> > >> > [email protected]
> > >> > > >
> > >> > > > > wrote:
> > >> > > > >
> > >> > > > > > Vladimir,
> > >> > > > > > -XX:MaxDirectMemorySize is set to the default 0, which means
> > >> > > > > > unlimited as far as I can tell.
> > >> > > > > > Thanks,
> > >> > > > > > Daniel
> > >> > > > > >
> > >> > > > > > 2017-10-09 19:30 GMT+02:00 Vladimir Rodionov <
> > >> > [email protected]
> > >> > > >:
> > >> > > > > >
> > >> > > > > > > Have you tried increasing the direct memory size for the
> > >> > > > > > > server process? -XX:MaxDirectMemorySize=?
> > >> > > > > > >
> > >> > > > > > > On Mon, Oct 9, 2017 at 2:12 AM, Daniel Jeliński <
> > >> > > > [email protected]>
> > >> > > > > > > wrote:
> > >> > > > > > >
> > >> > > > > > > > Hello,
> > >> > > > > > > > I'm running an application doing a lot of Puts (sized
> > >> > > > > > > > anywhere between 0 and 10MB, one cell at a time);
> > >> > > > > > > > occasionally I'm getting an error like the one below:
> > >> > > > > > > > 2017-10-09 04:29:29,811 WARN [AsyncProcess] - #13368,
> > >> > > > > > > > table=researchplatform:repo_stripe, attempt=1/1
> > >> > > > > > > > failed=1ops, last exception: java.io.IOException:
> > >> > > > > > > > com.google.protobuf.ServiceException:
> > >> > > > > > > > java.lang.OutOfMemoryError: Direct buffer memory on
> > >> > > > > > > > c169dzv.int.westgroup.com,60020,1506476748534, tracking
> > >> > > > > > > > started Mon Oct 09 04:29:29 EDT 2017; not retrying 1 -
> > >> > > > > > > > final failure
> > >> > > > > > > >
> > >> > > > > > > > After that, the connection to the RegionServer becomes
> > >> > > > > > > > unusable. Every subsequent attempt to execute a Put on
> > >> > > > > > > > that connection results in a CallTimeoutException. I
> > >> > > > > > > > only found the OutOfMemoryError by reducing the number
> > >> > > > > > > > of tries to 1.
> > >> > > > > > > >
> > >> > > > > > > > The host running HBase appears to have at least a few
> > >> > > > > > > > GB of free memory available. Server logs do not mention
> > >> > > > > > > > anything about this error. The cluster is running HBase
> > >> > > > > > > > 1.2.0-cdh5.10.2.
> > >> > > > > > > >
> > >> > > > > > > > Is this a known problem? Are there workarounds
> > >> > > > > > > > available?
> > >> > > > > > > > Thanks,
> > >> > > > > > > > Daniel
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > >
> > >> > > > >
> > >> > > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>