I resolved the problem in trunk; a patch for 0.89 is also attached to the JIRA. It should apply cleanly. Let me know if it works better for you!

Thx,

J-D
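Applying the JIRA attachment to a 0.89 checkout is the usual patch workflow; a minimal sketch, assuming the attachment is saved as HBASE-3198-0.89.patch (the real attachment name on the JIRA may differ) and that it is run from the root of the source tree:

    # dry-run first to confirm it applies cleanly; -p0 assumes an svn-style
    # diff, use -p1 if the patch was generated with git
    cd /path/to/hbase-0.89
    patch -p0 --dry-run < HBASE-3198-0.89.patch
    patch -p0 < HBASE-3198-0.89.patch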
On Fri, Nov 5, 2010 at 11:45 AM, Erdem Agaoglu <[email protected]> wrote:
> Previous problems probably correlate. When the log split and replay fail
> [because of the empty log], the master thinks the cluster is in a clean state
> and assigns a new .META. on some regionserver. It does not know about the
> previous assignment of .META. and thus does not split any logs for it. When
> the regionserver opens the new .META., it looks at HDFS, finds some persisted
> files for it, and since there is no recovered.edits it starts to serve.
> Depending on the state of those files, two things can happen:
>
> 1. If the cluster is really new (like an hour new), all the .META. data will
> be in the previous RS memstore. So there will be no accessible regions = lost
> tables.
> 2. If the cluster has been running for a while or was gracefully restarted
> before, the .META. data from the previous run will have been persisted. So the
> regions from that previous run will be accessible, but not the newly created
> or re-assigned regions = scans that never finish, or corrupt data.
> *. No matter what, since the master thinks the .META. is new, it will neither
> know about nor unassign the originally assigned regions = double assignment.
>
> So if we are right about those assumptions, it seems it's the same problem.
>
> We've learned a lot about HBase this week, and we owe it to your pointers,
> thanks. I am grateful to have the chance to contribute to the community in
> any way... at last :).
>
> On Fri, Nov 5, 2010 at 7:43 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>
>> Very interesting finding Erdem! I tried it on a local empty instance and I
>> set the roll period to 6 secs; here's what I saw:
>>
>> org.apache.hadoop.hbase.regionserver.LogRoller: Hlog roll period 6000ms elapsed
>> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs -- HDFS-200
>> org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase-89-su/.logs/hbasedev,60020,1288977933643/10.10.1.177%3A60020.1288977933829, entries=1, filesize=295. New hlog /hbase-89-su/.logs/hbasedev,60020,1288977933643/10.10.1.177%3A60020.1288977943913
>> org.apache.hadoop.hbase.regionserver.wal.HLog: Found 1 hlogs to remove out of total 1; oldest outstanding sequenceid is 270055 from region -ROOT-,,0
>> org.apache.hadoop.hbase.regionserver.wal.HLog: moving old hlog file /hbase-89-su/.logs/hbasedev,60020,1288977933643/10.10.1.177%3A60020.1288977933829 whose highest sequenceid is 270054 to /hbase-89-su/.oldlogs/10.10.1.177%3A60020.1288977933829
>>
>> This looks like a very bad bug. I opened
>> https://issues.apache.org/jira/browse/HBASE-3198
>>
>> Can you correlate this with the previous tests you did? Hopefully it's the
>> same problem.
>>
>> Thanks a lot for your patience and investigative work!
>>
>> J-D
>>
>> On Fri, Nov 5, 2010 at 9:28 AM, Erdem Agaoglu <[email protected]> wrote:
>> >
>> > We have found the problem, again. But this time we dug deeper and
>> > extracted more info.
>> >
>> > First, the reason we were unable to reproduce it was that we tried to
>> > kill the RS that holds -ROOT- too soon. That way,
>> > 1. the log split goes fine,
>> > 2. -ROOT- gets assigned somewhere else,
>> > 3. log replay for that RS goes smoothly,
>> > 4. .META. does not change,
>> > 5. everything is normal.
>> >
>> > But if we try after the cluster has been running for a while,
>> > 1. the log split fails with java.io.EOFException for the only hlog file,
>> > 2. -ROOT- gets assigned somewhere else,
>> > 3. no log is replayed since none was split,
>> > 4. .META. is invalid, so a new .META. is designated,
>> > 5. data is corrupted.
>> >
>> > Before killing the RS that holds -ROOT-, we checked its only hlog file
>> > and saw that it contains the updates for the location of .META. in the
>> > first scenario. But in the second, failing scenario the file was empty.
>> > We found that the log rolling process, which happens once an hour by
>> > default, somehow thinks that the hlog is old and cleans it up.
>> >
>> > # LogRoller: Hlog roll period 3600000ms elapsed
>> > # SequenceFileLogWriter: Using syncFs -- HDFS-200
>> > # HLog: Roll /hbase/.logs/A/blahfile0316, entries=1, filesize=312. New hlog /hbase/.logs/A/blahfile0476
>> > # HLog: Found 1 hlogs to remove out of total 1; oldest outstanding sequenceid is 4 from region -ROOT-,,0
>> > # HLog: moving old hlog file /hbase/.logs/A/blahfile0316 whose highest sequenceid is 3 to /hbase/.oldlogs/blahfile0316
>> >
>> > In that instance, blahfile0316 contains some rows while blahfile0476 is
>> > empty. It very much seems like the problem is here, as it assumes the
>> > data in the memstore has been persisted when it has not. This seemed
>> > like a bug to us, but it might as well be a misconfiguration. Any idea
>> > is greatly appreciated.
>> >
>> > On Wed, Nov 3, 2010 at 4:36 PM, Erdem Agaoglu <[email protected]> wrote:
>> >>
>> >> We suspect some misconfiguration too but are unsure what that might be.
>> >> Almost all of the configuration parameters are at their defaults, and
>> >> dfs.support.append is true in the append branch. We checked the RS logs
>> >> and couldn't find any Exceptions; everything seems normal, so it is
>> >> unlikely to be a bug. The only WARN was about the last hlog file being
>> >> empty, but I guess this is normal.
>> >>
>> >> To check whether it is an append-configuration-related problem, we
>> >> killed the RS that holds -ROOT- right after starting the cluster and
>> >> checked /hbase/.oldlogs/. There was a file containing a line for -ROOT-,
>> >> so I think append works correctly. The file got removed after a while.
>> >>
>> >> BTW, we tried to reproduce it with smaller data (to check manually, like
>> >> 2 rows in 1 table, newly put) but didn't have any problems. I am
>> >> starting to think the problem just disappeared. I'll increase the rows
>> >> to hundreds and check again.
>> >>
>> >> Until I hit the problem again, thanks for your time.
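The hourly roll described above is controlled by hbase.regionserver.logroll.period (the 3600000 ms seen in the log); shrinking it, as J-D did with 6 seconds, makes the symptom show up almost immediately on a test instance. A minimal sketch of that setup and of watching for the premature cleanup; the conf path and the /hbase root directory are assumptions:

    # add this property inside <configuration> in conf/hbase-site.xml on the
    # region servers (the default is 3600000 ms, i.e. one hour):
    #   <property>
    #     <name>hbase.regionserver.logroll.period</name>
    #     <value>6000</value>
    #   </property>
    # then restart the region servers and watch whether the only hlog gets
    # moved to .oldlogs even though the edits it carries were never flushed
    bin/hadoop fs -ls /hbase/.logs
    bin/hadoop fs -ls /hbase/.oldlogs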
>> >> On Tue, Nov 2, 2010 at 7:28 PM, Jean-Daniel Cryans <[email protected]> wrote:
>> >>>
>> >>> This confirms my suspicion that -ROOT-'s latest data was lost and that
>> >>> .META. got double assigned, which easily explains all the weirdness
>> >>> you're seeing after the log replay.
>> >>>
>> >>> The most probable reason would be misconfiguration, since the append
>> >>> feature is very well tested and a simple kill -9 wouldn't cause all
>> >>> sorts of issues. Else, it could be caused by an HBase bug, in which
>> >>> case exceptions would probably come out either during log splitting or
>> >>> during log replay (which happens when the region is reassigned to
>> >>> another region server). In your case you could take a look at the RS
>> >>> log where -ROOT- ends up and see if there's anything weird.
>> >>>
>> >>> To test whether append works correctly, in order to rule that out, just
>> >>> bring up your cluster and kill -9 right away the RS that's holding
>> >>> ROOT. Once the master has done the log splitting, do a lookup directly
>> >>> in hadoop (bin/hadoop fs -cat) on that log file, which should be under
>> >>> /hbase/.oldlogs/serverstring/somelog, and look for that -ROOT- entry.
>> >>> If it's empty or missing, it's probably a misconfiguration.
>> >>>
>> >>> J-D
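A sketch of the check J-D describes, assuming the default /hbase root directory; the exact file name under .oldlogs will differ, so take it from the fs -ls output:

    # locate the split-and-archived log of the killed region server
    bin/hadoop fs -ls /hbase/.oldlogs
    # pick the file from the listing above, then look for the -ROOT- entry;
    # hlogs are SequenceFiles, so pipe through strings before grepping
    OLDLOG=/hbase/.oldlogs/somelog            # placeholder, take it from fs -ls
    bin/hadoop fs -cat "$OLDLOG" | strings | grep -c -- '-ROOT-'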
>> >>> On Tue, Nov 2, 2010 at 9:02 AM, Erdem Agaoglu <[email protected]> wrote:
>> >>> >
>> >>> > We reproduced the problem by kill -9ing the region server that hosts
>> >>> > -ROOT- again. The results are the same as what I explained before.
>> >>> > After log splitting, the metaScanner complained that .META. is not
>> >>> > valid and reassigned it somewhere else. As a result, the number of
>> >>> > rows in .META. dropped from 24 to 16, meaning we lost access to some
>> >>> > regions and some table definitions. Supporting what you said about
>> >>> > the data in the memstore, we seem to have lost only recent regions.
>> >>> >
>> >>> > status 'detailed' showed no regionsInTransition, and showed a total
>> >>> > of 39 regions assigned to region servers, but as I've said before, we
>> >>> > have access to only 16 of them. Needless to say, our queries fail.
>> >>> >
>> >>> > Any idea we could try is greatly appreciated. Thanks in advance.
>> >>> >
>> >>> > PS. Something probably irrelevant happened this time: the web
>> >>> > interface on 60010 lists all our tables while the 'list' command in
>> >>> > the shell lists only a few. We get TableNotFoundExceptions on tables
>> >>> > shown in the web interface but missing from the 'list' output. It
>> >>> > seems the web interface is able to list inaccessible tables.
>> >>> >
>> >>> > On Tue, Nov 2, 2010 at 9:56 AM, Erdem Agaoglu <[email protected]> wrote:
>> >>> >>
>> >>> >> By "lost tables" I mean no user table definitions were listed in the
>> >>> >> web interface on 60010 and my queries got me TableNotFoundExceptions.
>> >>> >> The routine BaseScanner logged lines like
>> >>> >>
>> >>> >> # RegionManager.metaScanner scan of 0 row(s) of meta region ... complete
>> >>> >>
>> >>> >> so I guess my .META. was empty. But unfortunately I don't know if any
>> >>> >> regions were stuck in transition; I'll be sure to check this while I
>> >>> >> try to reproduce.
>> >>> >>
>> >>> >> I rechecked the logs, specifically after the splitting completed, and
>> >>> >> they contain lines like "Current assignment is not valid...", so I
>> >>> >> guess this is something unexpected. But in hopes of some
>> >>> >> configuration error you may spot, the whole log goes like this
>> >>> >> (stripped to be more readable):
>> >>> >>
>> >>> >> # hlog file splitting completed in 80372 millis
>> >>> >> # Log split complete, meta reassignment and scanning:
>> >>> >> # ProcessServerShutdown reassigning ROOT region
>> >>> >> # -ROOT- region unset (but not set to be reassigned)
>> >>> >> # ROOT inserted into regionsInTransition
>> >>> >> # Assigning for serverName=C... total nregions to assign=1, regions to give other servers than this=0, isMetaAssign=true
>> >>> >> # Assigning serverName=C... 1 regions
>> >>> >> # Assigning region -ROOT-... to C...
>> >>> >> # Processing MSG_REPORT_OPEN: -ROOT-... from C...; 1 of 1
>> >>> >> # RegionManager.rootScanner scanning meta region {server: C ...
>> >>> >> # Current assignment of .META.... is not valid; serverAddress=A, startCode=blah unknown; checking once more!
>> >>> >> # Current assignment of .META.... is not valid; serverAddress=A, startCode=blah unknown.
>> >>> >> # RegionManager.rootScanner scan of 1 row(s) of meta region {server: C ... complete
>> >>> >> # RegionManager.rootScanner scanning meta region {server: C ...
>> >>> >> # RegionManager.rootScanner scan of 1 row(s) of meta region {server: C ... complete
>> >>> >> # Assigning for serverName=E... total nregions to assign=-1, regions to give other servers than this=2, isMetaAssign=true
>> >>> >> # Assigning serverName=E... -1 regions
>> >>> >> # Assigning region .META.... to E...
>> >>> >> # Processing MSG_REPORT_OPEN: .META.... from E...; 1 of 1
>> >>> >> # Processing todo: PendingOpenOperation from E...
>> >>> >> # .META.... open on E
>> >>> >> # Updated row .META.... in region -ROOT- ...
>> >>> >> # Adding to onlineMetaRegions: {server: E ...
>> >>> >> # RegionManager.metaScanner scanning meta region {server: E ...
>> >>> >> # RegionManager.metaScanner scan of 0 row(s) of meta region {server: E ... complete
>> >>> >> # All 1 .META. region(s) scanned
>> >>> >> (10 secs later)
>> >>> >> # Processing todo: ProcessServerShutdown of F
>> >>> >> # Process shutdown of server F...: logSplit: true, rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
>> >>> >> # Log split complete, meta reassignment and scanning
>> >>> >> # Process server shutdown scanning root region on C
>> >>> >> # Process server shutdown scanning root region on C finished master
>> >>> >> # process server shutdown scanning .META.,,1 on E
>> >>> >> # process server shutdown finished scanning .META.,,1 on E
>> >>> >> # Removed F... from deadservers Map
>> >>> >>
>> >>> >> Thanks again.
>> >>> >>
>> >>> >> On Mon, Nov 1, 2010 at 6:58 PM, Jean-Daniel Cryans <[email protected]> wrote:
>> >>> >>>
>> >>> >>> The fact that the tables are "revived" is a clue here IMO, but let's
>> >>> >>> go back to more basic questions...
>> >>> >>>
>> >>> >>> So when you say that during step 1 you lost tables, what do you mean
>> >>> >>> by "lost"? Were the rows of those tables still in .META.? Were the
>> >>> >>> regions stuck in transition (in the shell, do: status 'detailed')?
>> >>> >>> Or when you tried to query them did you just get a
>> >>> >>> TableNotFoundException?
>> >>> >>>
>> >>> >>> Also, the fact that only -ROOT- and not .META. was on this region
>> >>> >>> server means that if any data was lost, it would be .META.'s
>> >>> >>> location, and it would have been assigned somewhere else (E) while
>> >>> >>> still staying assigned on A. Since the data is in the memstore,
>> >>> >>> recent data couldn't be read by this second assignment of .META.,
>> >>> >>> but... it could also have been reassigned for a "normal" reason like
>> >>> >>> rebalancing. The best way to confirm is, when the -ROOT- region gets
>> >>> >>> reassigned at the end of step 1 (so after the message that goes like
>> >>> >>> "...file splitting completed in 80372..."), do you see something
>> >>> >>> like this in the master's log: "Current assignment of
>> >>> >>> .META.,,some_timestamp is not valid; serverAddress=blah,
>> >>> >>> startcode=blah unknown."? If so, then it seems that data was lost
>> >>> >>> and this is really unexpected.
>> >>> >>>
>> >>> >>> J-D
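Both checks J-D asks for can be run quickly from a shell; a sketch, with the master log location being an assumption (it depends on how HBASE_LOG_DIR is set):

    # the telltale line the master prints when the old .META. assignment is
    # no longer recognized after log splitting
    grep 'Current assignment of .META.' /path/to/hbase/logs/*master*.log

    # regions in transition and the per-regionserver assignments
    echo "status 'detailed'" | bin/hbase shell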
>> >>> >>> On Mon, Nov 1, 2010 at 1:36 AM, Erdem Agaoglu <[email protected]> wrote:
>> >>> >>> >
>> >>> >>> > Hi again,
>> >>> >>> >
>> >>> >>> > I have re-checked our configuration to confirm that we have
>> >>> >>> > dfs.support.append enabled, and I saw "Using syncFs -- HDFS-200"
>> >>> >>> > in the logs. I inspected the logs around log splits to find
>> >>> >>> > something, but I'm not sure it's what we are looking for. In the
>> >>> >>> > first step of the scenario I mentioned before (where we kill -9ed
>> >>> >>> > everything on the node that hosts the ROOT region), HLog says
>> >>> >>> > (stripping hdfs:// prefixes and hostnames for clarity)
>> >>> >>> >
>> >>> >>> > # Splitting 7 hlog(s) in .logs/F,60020,1287491528908
>> >>> >>> >
>> >>> >>> > Then it goes over every single one like
>> >>> >>> >
>> >>> >>> > # Splitting hlog 1 of 7
>> >>> >>> > # Splitting hlog 2 of 7
>> >>> >>> > # ...
>> >>> >>> > # Splitting hlog 7 of 7
>> >>> >>> >
>> >>> >>> > On the 7th hlog it WARNs with two lines:
>> >>> >>> >
>> >>> >>> > # File .logs/F,60020,1287491528908/10.1.10.229%3A60020.1288021443546 might be still open, length is 0
>> >>> >>> > # Could not open .logs/F,60020,1287491528908/10.1.10.229%3A60020.1288021443546 for reading. File is empty: java.io.EOFException
>> >>> >>> >
>> >>> >>> > And it completes with
>> >>> >>> >
>> >>> >>> > # log file splitting completed in 80372 millis for .logs/F,60020,1287491528908
>> >>> >>> >
>> >>> >>> > This might be it, but in the sixth step (where we kill -9ed the
>> >>> >>> > RegionServer that hosts the only META region), it splits 2 hlogs
>> >>> >>> > without any empty-file problems or any log above INFO, yet as I
>> >>> >>> > said before, our testtable still got lost.
>> >>> >>> >
>> >>> >>> > I'll try to reproduce the problem in a cleaner way, but in the
>> >>> >>> > meantime, any pointers to problems we might have are greatly
>> >>> >>> > appreciated.
>> >>> >>> >
>> >>> >>> > On Fri, Oct 29, 2010 at 9:25 AM, Erdem Agaoglu <[email protected]> wrote:
>> >>> >>> >>
>> >>> >>> >> Thanks for the answer.
>> >>> >>> >>
>> >>> >>> >> I am pretty sure we have dfs.support.append enabled. I remember
>> >>> >>> >> both the conf file and the logs, and I don't recall seeing any
>> >>> >>> >> errors on 60010. I crawled through the logs all day yesterday and
>> >>> >>> >> don't remember anything indicating a specific error either, but
>> >>> >>> >> I'm not sure about that. Let me check and get back here on
>> >>> >>> >> Monday.
>> >>> >>> >>
>> >>> >>> >> On Thu, Oct 28, 2010 at 7:30 PM, Jean-Daniel Cryans <[email protected]> wrote:
>> >>> >>> >>>
>> >>> >>> >>> First thing I'd check is whether your configuration has
>> >>> >>> >>> dfs.support.append; you can confirm this by looking at your
>> >>> >>> >>> region server logs. When a RS starts, it creates an HLog and
>> >>> >>> >>> will print out "Using syncFs -- HDFS-200" if it's configured;
>> >>> >>> >>> else you'll see "syncFs -- HDFS-200 -- not available,
>> >>> >>> >>> dfs.support.append=false". Also, the master web UI (on port
>> >>> >>> >>> 60010) will print an error message regarding that.
>> >>> >>> >>>
>> >>> >>> >>> If it's all ok, then you should take a look at the master log
>> >>> >>> >>> when it does the log splitting and see if it contains any
>> >>> >>> >>> obvious errors.
>> >>> >>> >>>
>> >>> >>> >>> J-D
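A sketch of both of those checks from the command line; the conf and log paths are assumptions:

    # the flag has to be set on the HDFS side and visible to HBase
    grep -A1 'dfs.support.append' /path/to/hadoop/conf/hdfs-site.xml /path/to/hbase/conf/hbase-site.xml

    # matches either the positive or the negative startup message quoted above
    grep 'syncFs -- HDFS-200' /path/to/hbase/logs/*regionserver*.log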
>> >>> >>> >>> On Thu, Oct 28, 2010 at 12:58 AM, Erdem Agaoglu <[email protected]> wrote:
>> >>> >>> >>> >
>> >>> >>> >>> > Hi all,
>> >>> >>> >>> >
>> >>> >>> >>> > We have a testing cluster of 6 nodes which we try to run an
>> >>> >>> >>> > HBase/MapReduce application on. In order to simulate a power
>> >>> >>> >>> > failure, we kill -9ed everything Hadoop-related on one of the
>> >>> >>> >>> > slave nodes (DataNode, RegionServer, TaskTracker, ZK quorum
>> >>> >>> >>> > peer, and I think the SecondaryNameNode was on this node too).
>> >>> >>> >>> > We were expecting a smooth transition for all services but
>> >>> >>> >>> > were unable to get one on the HBase end. While our regions
>> >>> >>> >>> > seemed intact (not confirmed), we lost table definitions,
>> >>> >>> >>> > which points to some kind of META region failure. So our
>> >>> >>> >>> > application failed with several TableNotFoundExceptions. The
>> >>> >>> >>> > simulation was conducted with no load and extremely small data
>> >>> >>> >>> > (like 10 rows in 3 tables).
>> >>> >>> >>> >
>> >>> >>> >>> > On our setup, HBase is 0.89.20100924, r1001068, while Hadoop
>> >>> >>> >>> > runs 0.20.3-append-r964955-1240, r960957. Most of the
>> >>> >>> >>> > configuration parameters are at their defaults.
>> >>> >>> >>> >
>> >>> >>> >>> > If we did something wrong up to this point, please ignore the
>> >>> >>> >>> > rest of the message, as I'll try to explain what we did to
>> >>> >>> >>> > reproduce it and it might be irrelevant.
>> >>> >>> >>> >
>> >>> >>> >>> > Say the machines are named A, B, C, D, E, F, where A is the
>> >>> >>> >>> > master-like node, the others are slaves, and the power failure
>> >>> >>> >>> > is on F. Since we have little data, we have one ROOT and only
>> >>> >>> >>> > one META region. I'll try to sum up the whole scenario.
>> >>> >>> >>> >
>> >>> >>> >>> > A: NN, DN, JT, TT, HM, RS
>> >>> >>> >>> > B: DN, TT, RS, ZK
>> >>> >>> >>> > C: DN, TT, RS, ZK
>> >>> >>> >>> > D: DN, TT, RS, ZK
>> >>> >>> >>> > E: DN, TT, RS, ZK
>> >>> >>> >>> > F: SNN, DN, TT, RS, ZK
>> >>> >>> >>> >
>> >>> >>> >>> > 0. Initial state -> ROOT: F, META: A
>> >>> >>> >>> > 1. Power fail on F -> ROOT: C, META: E -> lost tables (we
>> >>> >>> >>> > waited for about half an hour to no effect, BTW)
>> >>> >>> >>> > 2. Put F back online -> no effect
>> >>> >>> >>> > 3. Create a table 'testtable' to see if we lose it
>> >>> >>> >>> > 4. Kill -9ed DataNode on F -> no effect -> started it again
>> >>> >>> >>> > 5. Kill -9ed RegionServer on F -> no effect -> started it again
>> >>> >>> >>> > 6. Kill -9ed RegionServer on E -> ROOT: C, META: A -> we lost
>> >>> >>> >>> > 'testtable' but got our tables from before the simulation
>> >>> >>> >>> > back. It seems that because A had META before the simulation,
>> >>> >>> >>> > the table definitions were revived.
>> >>> >>> >>> > 7. Restarted the whole cluster -> ROOT: A, META: F -> we lost
>> >>> >>> >>> > 2 out of our original 6 tables, and 'testtable' revived. That
>> >>> >>> >>> > small data seems corrupted too, as our Scans don't finish.
>> >>> >>> >>> > 8. Run to the mailing list.
>> >>> >>> >>> >
>> >>> >>> >>> > First of all, thanks for reading up to this point. From where
>> >>> >>> >>> > we are now, we are not even sure whether this is expected
>> >>> >>> >>> > behavior (if the ROOT or META region dies, we lose data and
>> >>> >>> >>> > must do something like hbck), or whether we are missing a
>> >>> >>> >>> > configuration, or whether this is a bug. We are relatively new
>> >>> >>> >>> > to HBase, so the last possibility is that we simply didn't
>> >>> >>> >>> > understand it at all.
>> >>> >>> >>> >
>> >>> >>> >>> > Thanks in advance for any ideas.
>> >>> >>> >>> >
>> >>> >>> >>> > --
>> >>> >>> >>> > erdem agaoglu
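The "power failure" in step 1 of the scenario above amounts to killing every Hadoop/HBase JVM on the victim node at once. A sketch using the process names jps reports (the ZooKeeper peer shows up as HQuorumPeer when HBase manages it, or QuorumPeerMain when it runs standalone):

    # on node F: find and kill -9 all hadoop-related JVMs in one shot
    jps | egrep 'DataNode|SecondaryNameNode|TaskTracker|HRegionServer|QuorumPeer' \
        | awk '{print $1}' | xargs -r kill -9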
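Step 3 of the scenario (creating a throwaway table and checking whether it survives the kill) can be scripted against the HBase shell. A sketch; the column family name f1 is made up for the example:

    # create the marker table before the kill -9
    echo "create 'testtable', 'f1'" | bin/hbase shell
    # after the failure test, check what the cluster still knows about
    echo "list" | bin/hbase shell
    echo "scan '.META.'" | bin/hbase shell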
