Thanks for the analysis. Do you mind opening a Jira?

On Jan 10, 2012, at 7:51 AM, Yves Langisch <[email protected]> wrote:

> Still happens with HBase 0.90.5/Hadoop 1.0.0. But I think I have some more
> insights on this topic. Following an up to date stack trace:
>
> java.lang.NullPointerException
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.convertThrowableToIOE(HRegionServer.java:986)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:2008)
>     at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
> Caused by: java.lang.NullPointerException
>     at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:881)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.addRowLock(HRegionServer.java:2018)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.lockRow(HRegionServer.java:2004)
>     ... 5 more
>
> After checking the source code I've noticed that the value which is going to
> be put into the HashMap can be null in the case where the waitForLock flag is
> true or the rowLockWaitDuration is expired (HRegion#internalObtainRowLock,
> line 2111ff). The latter I think happens in our case as we have heavy load
> hitting the server.
>
> IMHO this case should be handled somehow and must not lead to a NPE.
>
> -
> Yves
>
> On Dec 30, 2011, at 12:12 PM, Yves Langisch wrote:
>
>> Still happens but before I'm going to add some debugging information I'll
>> try to deploy the new version 0.90.5.
>>
>> -
>> Yves
>>
>> On Dec 18, 2011, at 12:08 AM, Stack wrote:
>>
>>> On Fri, Dec 16, 2011 at 8:20 AM, Yves Langisch <[email protected]> wrote:
>>>> I'm using the async hbase client (1.0) and there is no way to choose a
>>>> lockId on my own:
>>>>
>>>> <snippet>
>>>> return database.client().lockRow(
>>>>     new RowLockRequest(TableManager.ID_TABLE_NAME,
>>>>         MAXID_ROW)).join();
>>>>
>>>> </snippet>
>>>>
>>>> Any ideas what else could be wrong here?
>>>>
>>>
>>> Looking at the code on regionserver side,
>>> http://svn.apache.org/viewvc/hbase/tags/0.90.4/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java?view=markup,
>>> down around line 1994, its unlikely the region is null since we should
>>> throw NotServingRegionException if can't find region (and we check for
>>> null region name a few lines up) so maybe its something in the way we
>>> do obtainRowLock on line 1995?
>>>
>>> Any chance of your instrumenting the regionserver? Adding a bit of
>>> debugging and deploying the debugging regionserver?
>>>
>>> My guess is we haven't seen this before because not many use rowlocks
>>> (rowlocks if long-lived and lots of contending clients could freeze
>>> you out of the server; each client blocked waiting on rowlock to clear
>>> occupies a handler of which there are a bounded number).
>>>
>>> St.Ack
>>>
>>
>
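
For reference, the failure mode described in the analysis above can be reproduced outside HBase. The following is a minimal sketch, not the HBase source: the class RowLockGuardSketch, its method names, and the "lock-<id>" naming are all hypothetical. It shows that ConcurrentHashMap.put throws a NullPointerException when given a null value, and one possible way to turn a null lock id into an IOException before it reaches the map.

```java
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the region server's row lock bookkeeping.
public class RowLockGuardSketch {
    // ConcurrentHashMap permits neither null keys nor null values.
    private static final ConcurrentHashMap<String, Integer> rowLocks =
            new ConcurrentHashMap<String, Integer>();

    // Returns true if put(key, null) throws NullPointerException.
    static boolean putRejectsNull() {
        try {
            new ConcurrentHashMap<String, Integer>().put("lock-1", null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    // Assumed guard: a null lockId stands for "the lock was not obtained"
    // (for example, the wait timed out), so report it as an IOException
    // instead of passing it on to the map.
    static String addRowLock(Integer lockId) throws IOException {
        if (lockId == null) {
            throw new IOException("Timed out waiting for row lock");
        }
        String lockName = "lock-" + lockId;
        rowLocks.put(lockName, lockId);
        return lockName;
    }

    public static void main(String[] args) {
        System.out.println("put rejects null value: " + putRejectsNull());
        try {
            System.out.println("registered " + addRowLock(42));
            addRowLock(null);
        } catch (IOException e) {
            System.out.println("caught IOException: " + e.getMessage());
        }
    }
}
```

Whether an IOException is the right signal to the client, or a more specific exception type, is a design choice for the actual fix; the sketch only illustrates checking for null before the put.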
