On Wed, Nov 8, 2017 at 3:31 AM, Abhishek Singh Chouhan <abhishekchouhan...@gmail.com> wrote:

> I faced the same issue and have been debugging this for some time now (the
> logging is not very helpful, as Daniel mentions :)).
> Looking deeper into this, I realized that besides the call timeouts on the
> client side, the side effects also include large, incorrect byte buffer
> allocations on the server side.
> Have filed HBASE-19215 <https://issues.apache.org/jira/browse/HBASE-19215>
> for this.
>
>
Thank you lads for the info. Let's carry on over in HBASE-19215. Good one.
S



> On Wed, Nov 8, 2017 at 4:05 PM, Daniel Jeliński <djelins...@gmail.com>
> wrote:
>
> > 2017-11-07 18:22 GMT+01:00 Stack <st...@duboce.net>:
> >
> > > On Mon, Nov 6, 2017 at 6:33 AM, Daniel Jeliński <djelins...@gmail.com>
> > > wrote:
> > >
> > > > For others that run into a similar issue, it turned out that the
> > > > OutOfMemoryError was thrown (and subsequently hidden) on the client
> > > > side. The error was caused by excessive direct memory usage in Java
> > > > NIO's byte buffer caching (described here:
> > > > http://www.evanjones.ca/java-bytebuffer-leak.html), and setting
> > > > -Djdk.nio.maxCachedBufferSize=262144 allowed the application to
> > > > complete.
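> > > >
> > > > A minimal sketch of the caching behavior (assuming JDK 8, where
> > > > sun.nio.ch.Util keeps a per-thread cache of temporary direct buffers
> > > > sized to the largest heap buffer written through a channel):
> > > >
> > > > import java.lang.management.BufferPoolMXBean;
> > > > import java.lang.management.ManagementFactory;
> > > > import java.nio.ByteBuffer;
> > > > import java.nio.channels.FileChannel;
> > > > import java.nio.file.Files;
> > > > import java.nio.file.Path;
> > > > import java.nio.file.StandardOpenOption;
> > > >
> > > > public class DirectBufferCacheDemo {
> > > >   public static void main(String[] args) throws Exception {
> > > >     BufferPoolMXBean direct = ManagementFactory
> > > >         .getPlatformMXBeans(BufferPoolMXBean.class).stream()
> > > >         .filter(b -> "direct".equals(b.getName()))
> > > >         .findFirst().get();
> > > >     // Writing a 10MB heap buffer through NIO copies it into a
> > > >     // temporary direct buffer, which the writing thread then caches
> > > >     // at full size.
> > > >     ByteBuffer heap = ByteBuffer.allocate(10 * 1024 * 1024);
> > > >     Path tmp = Files.createTempFile("demo", ".bin");
> > > >     try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
> > > >       ch.write(heap);
> > > >     }
> > > >     // Reports roughly 10MB even though the application never
> > > >     // allocated a direct buffer itself; with
> > > >     // -Djdk.nio.maxCachedBufferSize=262144 the copy buffer is freed
> > > >     // after the write instead of being cached per thread.
> > > >     System.out.println("direct bytes used: " + direct.getMemoryUsed());
> > > >   }
> > > > }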
> > > >
> > > >
> > > Suggestions for how to expose the client-side OOME, Daniel? We should
> > > add a note to the thrown exception about "-Djdk.nio.maxCachedBufferSize"
> > > (and make sure the exception makes it out!)
> > >
> >
> > Well, I found the problem by adding printStackTrace to the
> > AsyncProcess.createLog function, which was responsible for logging the
> > original OOME. This is not very elegant, and I wouldn't recommend adding
> > it to the official codebase, but the stack trace offers some hints:
> >
> > java.io.IOException: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: Direct buffer memory
> >         at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:329)
> >         at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:130)
> >         at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:53)
> >         at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
> >         at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncProcess.java:727)
> >         at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> >         at java.util.concurrent.FutureTask.run(Unknown Source)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> >         at java.lang.Thread.run(Unknown Source)
> > Caused by: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: Direct buffer memory
> >         at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:240)
> >         at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
> >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.multi(ClientProtos.java:34142)
> >         at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:128)
> >         ... 8 more
> > Caused by: java.lang.OutOfMemoryError: Direct buffer memory
> >         at java.nio.Bits.reserveMemory(Unknown Source)
> >         at java.nio.DirectByteBuffer.<init>(Unknown Source)
> >         at java.nio.ByteBuffer.allocateDirect(Unknown Source)
> >         at sun.nio.ch.Util.getTemporaryDirectBuffer(Unknown Source)
> >         at sun.nio.ch.IOUtil.write(Unknown Source)
> >         at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
> >         at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
> >         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> >         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
> >         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
> >         at org.apache.hadoop.security.SaslOutputStream.write(SaslOutputStream.java:169)
> >         at java.io.BufferedOutputStream.write(Unknown Source)
> >         at java.io.DataOutputStream.write(Unknown Source)
> >         at org.apache.hadoop.hbase.ipc.IPCUtil.write(IPCUtil.java:277)
> >         at org.apache.hadoop.hbase.ipc.IPCUtil.write(IPCUtil.java:266)
> >         at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:921)
> >         at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:874)
> >         at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1243)
> >         at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
> >         ... 11 more
> > This stack trace comes from the cdh5.10.2 version, but the master branch
> > is sufficiently similar. So, depending on what we want to achieve, we
> > could:
> > - just replace catch(Throwable e) in AbstractRpcClient.callBlockingMethod
> > with something more fine-grained and fail the application
> > - or forward the OOME in callBlockingMethod, but add information about
> > maxCachedBufferSize, also failing the application but suggesting possible
> > corrective action to the user
> > - or pass the error to the user, allowing the application to intercept
> > it. Not sure yet how to do that, and we would need to do something about
> > the connection becoming unusable after an OOME, in case the user decides
> > to keep going.
> > What's your take?
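> >
> > For illustration, a rough sketch of the second option. This is a
> > hypothetical helper, not existing HBase code; the message text is just a
> > suggestion, and it would be invoked from the catch block in
> > AbstractRpcClient.callBlockingMethod:
> >
> > import com.google.protobuf.ServiceException;
> >
> > public final class RpcErrorHints {
> >   private RpcErrorHints() {}
> >
> >   // Wraps a failure from the RPC layer; when the cause is direct-memory
> >   // exhaustion, the exception message points the user at the JVM flags.
> >   public static ServiceException wrap(Throwable t) {
> >     if (t instanceof OutOfMemoryError
> >         && String.valueOf(t.getMessage()).contains("Direct buffer memory")) {
> >       return new ServiceException("Direct buffer memory exhausted; consider "
> >           + "setting -Djdk.nio.maxCachedBufferSize or raising "
> >           + "-XX:MaxDirectMemorySize", t);
> >     }
> >     return new ServiceException(t);
> >   }
> > }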
> >
> >
> >
> >
> > > Thanks for updating the list,
> > > S
> > >
> > >
> > >
> > > > Yet more proof that correct handling of OOME is hard.
> > > > Thanks,
> > > > Daniel
> > > >
> > > > 2017-10-11 11:33 GMT+02:00 Daniel Jeliński <djelins...@gmail.com>:
> > > >
> > > > > Thanks for the hints. I'll see if we can explicitly set
> > > > > MaxDirectMemorySize to a safe number.
> > > > > Thanks,
> > > > > Daniel
> > > > >
> > > > > 2017-10-10 21:10 GMT+02:00 Esteban Gutierrez <este...@cloudera.com>:
> > > > >
> > > > >> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/sun/misc/VM.java#l184
> > > > >>
> > > > >>     // The initial value of this field is arbitrary; during JRE
> > > > >>     // initialization it will be reset to the value specified on
> > > > >>     // the command line, if any, otherwise to
> > > > >>     // Runtime.getRuntime().maxMemory().
> > > > >>
> > > > >> which goes all the way down to memory/heap.cpp, to whatever was left
> > > > >> of the reserved memory, depending on the flags and the platform
> > > > >> used, as Vladimir says.
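> > > > >>
> > > > >> On JDK 8 the effective limit can be read back via an internal API;
> > > > >> a quick check (a sketch only; sun.misc.VM is not a supported
> > > > >> interface, so use it just for debugging):
> > > > >>
> > > > >> public class MaxDirectCheck {
> > > > >>   public static void main(String[] args) {
> > > > >>     // Without -XX:MaxDirectMemorySize this prints the same value
> > > > >>     // as Runtime.getRuntime().maxMemory().
> > > > >>     System.out.println(sun.misc.VM.maxDirectMemory());
> > > > >>   }
> > > > >> }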
> > > > >>
> > > > >> Also, depending on which distribution and features are used, there
> > > > >> are specific guidelines about setting that parameter, so mileage
> > > > >> may vary.
> > > > >>
> > > > >> thanks,
> > > > >> esteban.
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Cloudera, Inc.
> > > > >>
> > > > >>
> > > > >> On Tue, Oct 10, 2017 at 1:35 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
> > > > >>
> > > > >> > >> The default value is zero, which means the maximum direct
> > > > >> > >> memory is unbounded.
> > > > >> >
> > > > >> > That is not correct. If you do not specify MaxDirectMemorySize,
> > > > >> > the default is platform-specific.
> > > > >> >
> > > > >> > The link above is for the JRockit JVM, I presume?
> > > > >> >
> > > > >> > On Tue, Oct 10, 2017 at 11:19 AM, Esteban Gutierrez <este...@cloudera.com> wrote:
> > > > >> >
> > > > >> > > I don't think it's truly unbounded; IIRC it's limited to the
> > > > >> > > maximum allocated heap.
> > > > >> > >
> > > > >> > > thanks,
> > > > >> > > esteban.
> > > > >> > >
> > > > >> > > --
> > > > >> > > Cloudera, Inc.
> > > > >> > >
> > > > >> > >
> > > > >> > > On Tue, Oct 10, 2017 at 1:11 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > > >> > >
> > > > >> > > > From https://docs.oracle.com/cd/E15289_01/doc.40/e15062/optionxx.htm :
> > > > >> > > >
> > > > >> > > > java -XX:MaxDirectMemorySize=2g myApp
> > > > >> > > >
> > > > >> > > > Default Value
> > > > >> > > >
> > > > >> > > > The default value is zero, which means the maximum direct
> > > > >> > > > memory is unbounded.
> > > > >> > > >
> > > > >> > > > On Tue, Oct 10, 2017 at 11:04 AM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
> > > > >> > > >
> > > > >> > > > > >> -XX:MaxDirectMemorySize is set to the default 0, which
> > > > >> > > > > >> means unlimited as far as I can tell.
> > > > >> > > > >
> > > > >> > > > > Not sure if this is true. The only confirmation of that I
> > > > >> > > > > found was for the JRockit JVM.
> > > > >> > > > >
> > > > >> > > > > On Mon, Oct 9, 2017 at 11:29 PM, Daniel Jeliński <djelins...@gmail.com> wrote:
> > > > >> > > > >
> > > > >> > > > > > Vladimir,
> > > > >> > > > > > -XX:MaxDirectMemorySize is set to the default 0, which
> > > > >> > > > > > means unlimited as far as I can tell.
> > > > >> > > > > > Thanks,
> > > > >> > > > > > Daniel
> > > > >> > > > > >
> > > > >> > > > > > 2017-10-09 19:30 GMT+02:00 Vladimir Rodionov <vladrodio...@gmail.com>:
> > > > >> > > > > >
> > > > >> > > > > > > Have you tried increasing the direct memory size for
> > > > >> > > > > > > the server process? -XX:MaxDirectMemorySize=?
> > > > >> > > > > > >
> > > > >> > > > > > > On Mon, Oct 9, 2017 at 2:12 AM, Daniel Jeliński <djelins...@gmail.com> wrote:
> > > > >> > > > > > >
> > > > >> > > > > > > > Hello,
> > > > >> > > > > > > > I'm running an application doing a lot of Puts (sizes
> > > > >> > > > > > > > anywhere between 0 and 10MB, one cell at a time);
> > > > >> > > > > > > > occasionally I'm getting an error like the one below:
> > > > >> > > > > > > >
> > > > >> > > > > > > > 2017-10-09 04:29:29,811 WARN  [AsyncProcess] - #13368,
> > > > >> > > > > > > > table=researchplatform:repo_stripe, attempt=1/1
> > > > >> > > > > > > > failed=1ops, last exception: java.io.IOException:
> > > > >> > > > > > > > com.google.protobuf.ServiceException:
> > > > >> > > > > > > > java.lang.OutOfMemoryError: Direct buffer memory on
> > > > >> > > > > > > > c169dzv.int.westgroup.com,60020,1506476748534,
> > > > >> > > > > > > > tracking started Mon Oct 09 04:29:29 EDT 2017; not
> > > > >> > > > > > > > retrying 1 - final failure
> > > > >> > > > > > > >
> > > > >> > > > > > > > After that, the connection to the RegionServer becomes
> > > > >> > > > > > > > unusable. Every subsequent attempt to execute a Put on
> > > > >> > > > > > > > that connection results in a CallTimeoutException. I
> > > > >> > > > > > > > only found the OutOfMemoryError by reducing the number
> > > > >> > > > > > > > of tries to 1.
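> > > > >> > > > > > > >
> > > > >> > > > > > > > In case it helps with reproducing, a minimal sketch of
> > > > >> > > > > > > > how the retry count was capped
> > > > >> > > > > > > > (hbase.client.retries.number is the standard client
> > > > >> > > > > > > > setting; the surrounding code is illustrative only):
> > > > >> > > > > > > >
> > > > >> > > > > > > > public class FailFastPuts {
> > > > >> > > > > > > >   public static void main(String[] args) throws Exception {
> > > > >> > > > > > > >     org.apache.hadoop.conf.Configuration conf =
> > > > >> > > > > > > >         org.apache.hadoop.hbase.HBaseConfiguration.create();
> > > > >> > > > > > > >     // Fail fast instead of retrying, so the real
> > > > >> > > > > > > >     // error surfaces in the client log.
> > > > >> > > > > > > >     conf.setInt("hbase.client.retries.number", 1);
> > > > >> > > > > > > >     try (org.apache.hadoop.hbase.client.Connection conn =
> > > > >> > > > > > > >         org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(conf)) {
> > > > >> > > > > > > >       // issue the Puts here
> > > > >> > > > > > > >     }
> > > > >> > > > > > > >   }
> > > > >> > > > > > > > }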
> > > > >> > > > > > > >
> > > > >> > > > > > > > The host running HBase appears to have at least a few
> > > > >> > > > > > > > GB of free memory available. Server logs do not
> > > > >> > > > > > > > mention anything about this error. The cluster is
> > > > >> > > > > > > > running HBase 1.2.0-cdh5.10.2.
> > > > >> > > > > > > >
> > > > >> > > > > > > > Is this a known problem? Are there workarounds
> > > > >> > > > > > > > available?
> > > > >> > > > > > > > Thanks,
> > > > >> > > > > > > > Daniel
> > > > >> > > > > > > >
> > > > >> > > > > > >
> > > > >> > > > > >
> > > > >> > > > >
> > > > >> > > >
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>