Can you look even further back? Like a day?

J-D

On Wed, Mar 28, 2012 at 9:45 AM, Eran Kutner <[email protected]> wrote:
> I don't see any prior HDFS issues in the 15 minutes before this exception.
> The logs on the datanode reported as problematic are clean as well.
> However, I now see the log is full of errors like this:
> 2012-03-28 00:15:05,358 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing close of gs_users,731481|Sn쒪㝨眳ԫ䂣⫰==,1331226388691.29929cb2200b3541ead85e17b836ade5.
> 2012-03-28 00:15:05,359 WARN org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Error getting node's version in CLOSING state, aborting close of gs_users,731481|Sn쒪㝨眳ԫ䂣⫰==,1331226388691.29929cb2200b3541ead85e17b836ade5.
>
> -eran
>
> On Wed, Mar 28, 2012 at 18:38, Jean-Daniel Cryans <[email protected]> wrote:
>
>> Any chance we can see what happened before that too? Usually you
>> should see a lot more HDFS spam before getting the "all datanodes
>> are bad" error.
>>
>> J-D
>>
>> On Wed, Mar 28, 2012 at 4:28 AM, Eran Kutner <[email protected]> wrote:
>> > Hi,
>> >
>> > We have region servers sporadically stopping under load, supposedly due to
>> > errors writing to HDFS. Things like:
>> >
>> > 2012-03-28 00:37:11,210 WARN org.apache.hadoop.hdfs.DFSClient: Error while syncing
>> > java.io.IOException: All datanodes 10.1.104.10:50010 are bad. Aborting..
>> >
>> > It's happening with a different region server and datanode every time, so
>> > it's not a problem with one specific server, and there doesn't seem to be
>> > anything really wrong with either of them. I've already increased the file
>> > descriptor limit, the datanode xceivers and the datanode handler count. Any
>> > idea what could be causing these errors?
>> >
>> > A more complete log is here: http://pastebin.com/wC90xU2x
>> >
>> > Thanks.
>> >
>> > -eran
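
To follow up on looking back a full day, a small filter can pull only the HDFS-related warnings and errors out of the region server log for the window before the abort. This is a hypothetical helper (`parse_entries` and `hdfs_trouble_before` are made-up names, not part of HBase or Hadoop), sketched under two assumptions: the log uses the same line layout as the excerpts in this thread (`yyyy-MM-dd HH:mm:ss,SSS LEVEL logger: message`), and any line without a leading timestamp (such as a stack trace or the `java.io.IOException` line) continues the previous entry.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple, Tuple

# Assumed line layout, matching the excerpts in this thread, e.g.
# "2012-03-28 00:37:11,210 WARN org.apache.hadoop.hdfs.DFSClient: Error while syncing"
ENTRY_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})\s+(\w+)\s+([\w.$]+):\s?(.*)$"
)


class LogEntry(NamedTuple):
    when: datetime
    level: str
    logger: str
    message: str  # first line plus any continuation lines (e.g. stack traces)


def parse_entries(lines: Iterable[str]) -> List[LogEntry]:
    """Group raw log lines into timestamped entries."""
    entries: List[LogEntry] = []
    for raw in lines:
        line = raw.rstrip("\n")
        m = ENTRY_RE.match(line)
        if m:
            when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            # The ",SSS" suffix is milliseconds.
            when = when.replace(microsecond=int(m.group(2)) * 1000)
            entries.append(LogEntry(when, m.group(3), m.group(4), m.group(5)))
        elif entries and line.strip():
            # No timestamp: assume it continues the previous entry.
            last = entries[-1]
            entries[-1] = last._replace(message=last.message + "\n" + line)
    return entries


def hdfs_trouble_before(
    lines: Iterable[str],
    abort_time: datetime,
    window: timedelta = timedelta(days=1),
    levels: Tuple[str, ...] = ("WARN", "ERROR", "FATAL"),
    logger_substring: str = "hdfs",
) -> List[LogEntry]:
    """Return entries at one of `levels` whose logger name contains
    `logger_substring` (case-insensitive), timestamped between
    abort_time - window and abort_time inclusive."""
    start = abort_time - window
    needle = logger_substring.lower()
    return [
        e
        for e in parse_entries(lines)
        if start <= e.when <= abort_time
        and e.level in levels
        and needle in e.logger.lower()
    ]
```

For example, passing an open log file (the filename is up to you) and the abort time `datetime(2012, 3, 28, 0, 37, 11, 210000)` returns the DFSClient warnings from the previous 24 hours. Matching on `"hdfs"` in the logger name is only a heuristic; passing `logger_substring=""` keeps every WARN-or-worse entry in the window, which also surfaces ZooKeeper or region-close errors like the CloseRegionHandler ones.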
