On Wed, Nov 8, 2017 at 3:31 AM, Abhishek Singh Chouhan <abhishekchouhan...@gmail.com> wrote:
> I faced the same issue and have been debugging this for some time now (the
> logging is not very helpful, as Daniel mentions :)). Looking deeper into
> this, I realized that the side effects also include large, incorrect byte
> buffer allocations on the server side, apart from the call timeouts on the
> client side. Have filed HBASE-19215
> <https://issues.apache.org/jira/browse/HBASE-19215> for this.

Thank you lads for the info. Let's carry on over in HBASE-19215. Good one.
S

> On Wed, Nov 8, 2017 at 4:05 PM, Daniel Jeliński <djelins...@gmail.com> wrote:
>
> > 2017-11-07 18:22 GMT+01:00 Stack <st...@duboce.net>:
> >
> > > On Mon, Nov 6, 2017 at 6:33 AM, Daniel Jeliński <djelins...@gmail.com> wrote:
> > >
> > > > For others that run into a similar issue: it turned out that the
> > > > OutOfMemoryError was thrown (and subsequently hidden) on the client
> > > > side. The error was caused by excessive direct memory usage in Java
> > > > NIO's byte buffer caching (described here:
> > > > http://www.evanjones.ca/java-bytebuffer-leak.html), and setting
> > > > -Djdk.nio.maxCachedBufferSize=262144 allowed the application to
> > > > complete.
> > >
> > > Suggestions for how to expose the client-side OOME, Daniel? We should
> > > add a note to the thrown exception about
> > > "-Djdk.nio.maxCachedBufferSize" (and make sure the exception makes it
> > > out!)
> >
> > Well, I found the problem by adding printStackTrace to the
> > AsyncProcess.createLog function, which was responsible for logging the
> > original OOME. This is not very elegant, and I wouldn't recommend adding
> > it to the official codebase, but the stack trace offers some hints:
> >
> > java.io.IOException: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: Direct buffer memory
> >         at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:329)
> >         at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:130)
> >         at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:53)
> >         at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
> >         at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncProcess.java:727)
> >         at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> >         at java.util.concurrent.FutureTask.run(Unknown Source)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> >         at java.lang.Thread.run(Unknown Source)
> > Caused by: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: Direct buffer memory
> >         at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:240)
> >         at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
> >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.multi(ClientProtos.java:34142)
> >         at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:128)
> >         ... 8 more
> > Caused by: java.lang.OutOfMemoryError: Direct buffer memory
> >         at java.nio.Bits.reserveMemory(Unknown Source)
> >         at java.nio.DirectByteBuffer.<init>(Unknown Source)
> >         at java.nio.ByteBuffer.allocateDirect(Unknown Source)
> >         at sun.nio.ch.Util.getTemporaryDirectBuffer(Unknown Source)
> >         at sun.nio.ch.IOUtil.write(Unknown Source)
> >         at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
> >         at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
> >         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> >         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
> >         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
> >         at org.apache.hadoop.security.SaslOutputStream.write(SaslOutputStream.java:169)
> >         at java.io.BufferedOutputStream.write(Unknown Source)
> >         at java.io.DataOutputStream.write(Unknown Source)
> >         at org.apache.hadoop.hbase.ipc.IPCUtil.write(IPCUtil.java:277)
> >         at org.apache.hadoop.hbase.ipc.IPCUtil.write(IPCUtil.java:266)
> >         at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:921)
> >         at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:874)
> >         at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1243)
> >         at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
> >         ... 11 more
> >
> > This stack trace comes from the cdh5.10.2 version, but the master branch
> > is sufficiently similar. So, depending on what we want to achieve, we
> > could:
> > - just replace catch (Throwable e) in AbstractRpcClient.callBlockingMethod
> >   with something more fine-grained and fail the application;
> > - or forward the OOME in callBlockingMethod, but add information about
> >   maxCachedBufferSize, also failing the application but suggesting a
> >   possible corrective action to the user;
> > - or pass the error to the user, allowing the application to intercept
> >   it. Not sure yet how to do that, but we would need to do something
> >   about the connection becoming unusable after OOME, in case the user
> >   decides to keep going.
> > What's your take?
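[Editor's note: the NIO behavior Daniel describes can be reproduced outside HBase. A minimal sketch, assuming a HotSpot JVM; the class and method names are illustrative, not HBase APIs. Writing a *heap* ByteBuffer through a channel makes NIO copy it into a temporary direct buffer sized to the whole write, which is then cached per thread; the growth is visible through the "direct" BufferPoolMXBean.]

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class TempDirectBufferDemo {

    // JMX view of all direct ByteBuffers the JVM has allocated.
    static BufferPoolMXBean directPool() {
        return ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class).stream()
                .filter(p -> "direct".equals(p.getName()))
                .findFirst()
                .orElseThrow(IllegalStateException::new);
    }

    // Returns true if writing a heap buffer of the given size grew the
    // direct pool by at least that many bytes (the hidden temporary copy).
    static boolean heapWriteGrowsDirectPool(int size) throws Exception {
        BufferPoolMXBean direct = directPool();
        long before = direct.getMemoryUsed();

        ByteBuffer heap = ByteBuffer.allocate(size); // heap, not direct
        Path tmp = Files.createTempFile("nio-demo", ".bin");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
            // NIO copies the heap buffer into a temporary direct buffer
            // (sun.nio.ch.Util.getTemporaryDirectBuffer) before the write;
            // without -Djdk.nio.maxCachedBufferSize it stays cached per thread.
            ch.write(heap);
        } finally {
            Files.delete(tmp);
        }
        return direct.getMemoryUsed() - before >= size;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(heapWriteGrowsDirectPool(1 << 20));
    }
}
```

With many worker threads each pushing ~10 MB Puts, each thread can end up caching a ~10 MB direct buffer, which matches the failure mode reported in this thread.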
> > >
> > > Thanks for updating the list,
> > > S
> > >
> > > > Yet another proof that correct handling of OOME is hard.
> > > > Thanks,
> > > > Daniel
> > > >
> > > > 2017-10-11 11:33 GMT+02:00 Daniel Jeliński <djelins...@gmail.com>:
> > > >
> > > > > Thanks for the hints. I'll see if we can explicitly set
> > > > > MaxDirectMemorySize to a safe number.
> > > > > Thanks,
> > > > > Daniel
> > > > >
> > > > > 2017-10-10 21:10 GMT+02:00 Esteban Gutierrez <este...@cloudera.com>:
> > > > >
> > > > > > http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/sun/misc/VM.java#l184
> > > > > >
> > > > > >     // The initial value of this field is arbitrary; during JRE initialization
> > > > > >     // it will be reset to the value specified on the command line, if any,
> > > > > >     // otherwise to Runtime.getRuntime().maxMemory().
> > > > > >
> > > > > > which goes all the way down to memory/heap.cpp, to whatever was
> > > > > > left of the reserved memory depending on the flags and the
> > > > > > platform used, as Vladimir says.
> > > > > >
> > > > > > Also, depending on which distribution and features are used,
> > > > > > there are specific guidelines about setting that parameter, so
> > > > > > mileage might vary.
> > > > > >
> > > > > > thanks,
> > > > > > esteban.
> > > > > >
> > > > > > --
> > > > > > Cloudera, Inc.
> > > > > >
> > > > > > On Tue, Oct 10, 2017 at 1:35 PM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
> > > > > >
> > > > > > > >> The default value is zero, which means the maximum direct
> > > > > > > >> memory is unbounded.
> > > > > > >
> > > > > > > That is not correct. If you do not specify MaxDirectMemorySize,
> > > > > > > the default is platform specific.
> > > > > > >
> > > > > > > The link above is for the JRockit JVM, I presume?
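[Editor's note: the VM.java behavior Esteban quotes can be checked at runtime. A hedged sketch, HotSpot-specific (it relies on com.sun.management); the class name is illustrative. The -XX:MaxDirectMemorySize flag reads back as "0" when unset, and per the quoted comment the effective limit then becomes Runtime.getRuntime().maxMemory(), i.e. the max heap, not "unbounded".]

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class DirectLimitCheck {

    // Reads the raw -XX:MaxDirectMemorySize flag value ("0" when not set).
    static long configuredMaxDirect() {
        HotSpotDiagnosticMXBean hs =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return Long.parseLong(hs.getVMOption("MaxDirectMemorySize").getValue());
    }

    public static void main(String[] args) {
        long flag = configuredMaxDirect();
        // Per sun/misc/VM.java, a zero flag is replaced during JRE init
        // with Runtime.getRuntime().maxMemory() -- the max heap size.
        long effective = (flag == 0) ? Runtime.getRuntime().maxMemory() : flag;
        System.out.println("-XX:MaxDirectMemorySize flag: " + flag);
        System.out.println("effective direct memory limit: " + effective + " bytes");
    }
}
```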
> > > > > > >
> > > > > > > On Tue, Oct 10, 2017 at 11:19 AM, Esteban Gutierrez <este...@cloudera.com> wrote:
> > > > > > >
> > > > > > > > I don't think it is truly unbounded; IIRC it is limited to
> > > > > > > > the maximum allocated heap.
> > > > > > > >
> > > > > > > > thanks,
> > > > > > > > esteban.
> > > > > > > >
> > > > > > > > --
> > > > > > > > Cloudera, Inc.
> > > > > > > >
> > > > > > > > On Tue, Oct 10, 2017 at 1:11 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > From https://docs.oracle.com/cd/E15289_01/doc.40/e15062/optionxx.htm :
> > > > > > > > >
> > > > > > > > > java -XX:MaxDirectMemorySize=2g myApp
> > > > > > > > >
> > > > > > > > > Default Value
> > > > > > > > >
> > > > > > > > > The default value is zero, which means the maximum direct
> > > > > > > > > memory is unbounded.
> > > > > > > > >
> > > > > > > > > On Tue, Oct 10, 2017 at 11:04 AM, Vladimir Rodionov <vladrodio...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > >> -XX:MaxDirectMemorySize is set to the default 0,
> > > > > > > > > > >> which means unlimited as far as I can tell.
> > > > > > > > > >
> > > > > > > > > > Not sure if this is true. The only confirming link I
> > > > > > > > > > found was for the JRockit JVM.
> > > > > > > > > >
> > > > > > > > > > On Mon, Oct 9, 2017 at 11:29 PM, Daniel Jeliński <djelins...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Vladimir,
> > > > > > > > > > > -XX:MaxDirectMemorySize is set to the default 0, which
> > > > > > > > > > > means unlimited as far as I can tell.
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Daniel
> > > > > > > > > > >
> > > > > > > > > > > 2017-10-09 19:30 GMT+02:00 Vladimir Rodionov <vladrodio...@gmail.com>:
> > > > > > > > > > >
> > > > > > > > > > > > Have you tried to increase the direct memory size for
> > > > > > > > > > > > the server process? -XX:MaxDirectMemorySize=?
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Oct 9, 2017 at 2:12 AM, Daniel Jeliński <djelins...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hello,
> > > > > > > > > > > > > I'm running an application doing a lot of Puts (size
> > > > > > > > > > > > > anywhere between 0 and 10MB, one cell at a time);
> > > > > > > > > > > > > occasionally I'm getting an error like the below:
> > > > > > > > > > > > >
> > > > > > > > > > > > > 2017-10-09 04:29:29,811 WARN [AsyncProcess] - #13368, table=researchplatform:repo_stripe, attempt=1/1 failed=1ops, last exception: java.io.IOException: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: Direct buffer memory on c169dzv.int.westgroup.com,60020,1506476748534, tracking started Mon Oct 09 04:29:29 EDT 2017; not retrying 1 - final failure
> > > > > > > > > > > > >
> > > > > > > > > > > > > After that, the connection to the RegionServer becomes
> > > > > > > > > > > > > unusable. Every subsequent attempt to execute a Put on
> > > > > > > > > > > > > that connection results in CallTimeoutException. I
> > > > > > > > > > > > > only found the OutOfMemory by reducing the number of
> > > > > > > > > > > > > tries to 1.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The host running HBase appears to have at least a few
> > > > > > > > > > > > > GB of free memory available. Server logs do not
> > > > > > > > > > > > > mention anything about this error. The cluster is
> > > > > > > > > > > > > running HBase 1.2.0-cdh5.10.2.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Is this a known problem? Are there workarounds
> > > > > > > > > > > > > available?
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Daniel
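[Editor's note: besides capping the cache with -Djdk.nio.maxCachedBufferSize, the article Daniel links suggests another client-side mitigation: write large heap buffers through the channel in bounded slices, so any temporary direct buffer NIO allocates and caches stays small. A sketch under that assumption; ChunkedWriter and writeChunked are hypothetical helpers, not HBase APIs.]

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChunkedWriter {

    // Writes src in slices of at most chunkSize bytes, so the temporary
    // direct buffer NIO allocates per write never exceeds chunkSize.
    static void writeChunked(WritableByteChannel ch, ByteBuffer src, int chunkSize)
            throws IOException {
        while (src.hasRemaining()) {
            int fullLimit = src.limit();                                // remember full limit
            src.limit(Math.min(fullLimit, src.position() + chunkSize)); // expose one slice
            while (src.hasRemaining()) {
                ch.write(src);                                          // may write partially
            }
            src.limit(fullLimit);                                       // restore for next slice
        }
    }

    public static void main(String[] args) throws Exception {
        ByteBuffer big = ByteBuffer.allocate(1 << 20); // 1 MB heap buffer
        Path tmp = Files.createTempFile("chunked", ".bin");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
            writeChunked(ch, big, 64 * 1024); // 64 KB slices
        }
        System.out.println(Files.size(tmp) == (1 << 20)); // prints true
        Files.delete(tmp);
    }
}
```

The trade-off is one system call per slice instead of one per buffer; for the 10 MB Puts described above, 64 KB slices keep each thread's cached direct buffer at 64 KB instead of 10 MB.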