Eran:
The error indicates a ZooKeeper-related issue.
Do you see a KeeperException after the error log?
I searched the 0.90 codebase but couldn't find the exact log phrase:
zhihyu$ find src/main -name '*.java' -exec grep "getting node's version in CLOSI" {} \; -print
zhihyu$ find src/main -name '*.java' -exec grep 'Error getting ' {} \; -print
Cheers
On Wed, Mar 28, 2012 at 9:45 AM, Eran Kutner <[email protected]> wrote:
> I don't see any prior HDFS issues in the 15 minutes before this exception.
> The logs on the datanode reported as problematic are clean as well.
> However, I now see the log is full of errors like this:
> 2012-03-28 00:15:05,358 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing close of gs_users,731481|Sn쒪㝨眳ԫ䂣⫰==,1331226388691.29929cb2200b3541ead85e17b836ade5.
> 2012-03-28 00:15:05,359 WARN org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Error getting node's version in CLOSING state, aborting close of gs_users,731481|Sn쒪㝨眳ԫ䂣⫰==,1331226388691.29929cb2200b3541ead85e17b836ade5.
>
> -eran
>
>
>
> On Wed, Mar 28, 2012 at 18:38, Jean-Daniel Cryans <[email protected]> wrote:
>
> > Any chance we can see what happened before that too? Usually you
> > should see a lot more HDFS log spam before getting the "All datanodes
> > are bad" error.
> >
> > J-D
> >
> > On Wed, Mar 28, 2012 at 4:28 AM, Eran Kutner <[email protected]> wrote:
> > > Hi,
> > >
> > > We have region servers sporadically stopping under load, supposedly due to
> > > errors writing to HDFS. Things like:
> > >
> > > 2012-03-28 00:37:11,210 WARN org.apache.hadoop.hdfs.DFSClient: Error while syncing
> > > java.io.IOException: All datanodes 10.1.104.10:50010 are bad. Aborting..
> > >
> > > It's happening with a different region server and data node every time, so
> > > it's not a problem with one specific server and there doesn't seem to be
> > > anything really wrong with either of them. I've already increased the file
> > > descriptor limit, datanode xceivers and data node handler count. Any idea
> > > what can be causing these errors?
> > >
> > >
> > > A more complete log is here: http://pastebin.com/wC90xU2x
> > >
> > > Thanks.
> > >
> > > -eran
> >
>