Hmmm... I couldn't find it either, so I looked at the history of that file, and sure enough, a few check-ins back it had that message. I have no idea how something like this could happen. I know I had some merge issues when I first got the latest version and built that project, but I've since reverted all local changes and rebuilt. The only thing I can imagine is that the previously compiled class file was not modified and was the one that got included in the JAR, although I don't really know how that could happen.
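In case anyone else hits this, a quick way to check whether a stale class made it into the JAR is to pull the class out and look for the old message directly (the JAR path below is just from my local build, adjust as needed; the class name comes from the log):

$ unzip -p target/hbase-0.90.jar org/apache/hadoop/hbase/regionserver/handler/CloseRegionHandler.class | strings | grep "version in CLOSING"

If that prints the old message while the current source no longer contains it, the JAR really does hold a stale .class.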
-eran

On Wed, Mar 28, 2012 at 18:53, Ted Yu <[email protected]> wrote:
> Eran:
> The error indicates some ZooKeeper-related issue.
> Do you see a KeeperException after the Error log?
>
> I searched the 0.90 codebase but couldn't find the exact log phrase:
>
> zhihyu$ find src/main -name '*.java' -exec grep "getting node's version in CLOSI" {} \; -print
> zhihyu$ find src/main -name '*.java' -exec grep 'Error getting ' {} \; -print
>
> Cheers
>
> On Wed, Mar 28, 2012 at 9:45 AM, Eran Kutner <[email protected]> wrote:
>
> > I don't see any prior HDFS issues in the 15 minutes before this
> > exception. The logs on the datanode reported as problematic are clean
> > as well. However, I now see the log is full of errors like this:
> >
> > 2012-03-28 00:15:05,358 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing close of gs_users,731481|Sn쒪㝨眳ԫ䂣⫰==,1331226388691.29929cb2200b3541ead85e17b836ade5.
> > 2012-03-28 00:15:05,359 WARN org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Error getting node's version in CLOSING state, aborting close of gs_users,731481|Sn쒪㝨眳ԫ䂣⫰==,1331226388691.29929cb2200b3541ead85e17b836ade5.
> >
> > -eran
> >
> >
> > On Wed, Mar 28, 2012 at 18:38, Jean-Daniel Cryans <[email protected]> wrote:
> >
> > > Any chance we can see what happened before that too? Usually you
> > > should see a lot more HDFS spam before getting that all the datanodes
> > > are bad.
> > >
> > > J-D
> > >
> > > On Wed, Mar 28, 2012 at 4:28 AM, Eran Kutner <[email protected]> wrote:
> > > > Hi,
> > > >
> > > > We have region servers sporadically stopping under load, supposedly
> > > > due to errors writing to HDFS. Things like:
> > > >
> > > > 2012-03-28 00:37:11,210 WARN org.apache.hadoop.hdfs.DFSClient: Error while syncing
> > > > java.io.IOException: All datanodes 10.1.104.10:50010 are bad. Aborting..
> > > >
> > > > It's happening with a different region server and datanode every
> > > > time, so it's not a problem with one specific server, and there
> > > > doesn't seem to be anything really wrong with either of them. I've
> > > > already increased the file descriptor limit, datanode xceivers and
> > > > datanode handler count. Any idea what could be causing these errors?
> > > >
> > > > A more complete log is here: http://pastebin.com/wC90xU2x
> > > >
> > > > Thanks.
> > > >
> > > > -eran
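P.S. For completeness, this is how I verified the limits I mentioned bumping (the user name and config path are from our setup, adjust to yours; note that the Hadoop property really is spelled "xcievers" in this version):

$ su - hdfs -c 'ulimit -n'    # open-file limit for the user running the datanode
$ grep -A1 dfs.datanode.max.xcievers /etc/hadoop/conf/hdfs-site.xml
$ grep -A1 dfs.datanode.handler.count /etc/hadoop/conf/hdfs-site.xml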
