Very interesting finding, Erdem! I tried it on a local empty instance with the roll period set to 6 seconds.
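In case anyone wants to reproduce: I shortened the roll period in hbase-site.xml (hbase.regionserver.logroll.period, if I recall the property name right; double-check it against your build), restarted the region server, and then watched the RS log and the WAL directories. Roughly, with the paths from my local test:

  # after setting the roll period to 6000 ms and restarting the RS:
  bin/hadoop fs -lsr /hbase-89-su/.logs     # the current (new, empty) hlog under the RS directory
  bin/hadoop fs -ls /hbase-89-su/.oldlogs   # the rolled hlog that still held the -ROOT- edit

Here's what I saw: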
org.apache.hadoop.hbase.regionserver.LogRoller: Hlog roll period 6000ms elapsed
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs -- HDFS-200
org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase-89-su/.logs/hbasedev,60020,1288977933643/10.10.1.177%3A60020.1288977933829, entries=1, filesize=295. New hlog /hbase-89-su/.logs/hbasedev,60020,1288977933643/10.10.1.177%3A60020.1288977943913
org.apache.hadoop.hbase.regionserver.wal.HLog: Found 1 hlogs to remove out of total 1; oldest outstanding sequenceid is 270055 from region -ROOT-,,0
org.apache.hadoop.hbase.regionserver.wal.HLog: moving old hlog file /hbase-89-su/.logs/hbasedev,60020,1288977933643/10.10.1.177%3A60020.1288977933829 whose highest sequenceid is 270054 to /hbase-89-su/.oldlogs/10.10.1.177%3A60020.1288977933829

This looks like a very bad bug; I opened https://issues.apache.org/jira/browse/HBASE-3198. Can you correlate this with the previous tests you did? Hopefully it's the same problem. Thanks a lot for your patience and investigative work!

J-D

On Fri, Nov 5, 2010 at 9:28 AM, Erdem Agaoglu <[email protected]> wrote:
> We have found the problem, again. But this time we dug deeper and extracted more info.
>
> First, the reason we were unable to reproduce it was that we tried to kill the RS that holds -ROOT- too soon. That way,
> 1. log split goes fine,
> 2. -ROOT- gets assigned somewhere else,
> 3. log replaying for that RS goes smoothly,
> 4. .META. does not change,
> 5. everything is normal.
>
> But if we try after the cluster has been running for a while,
> 1. log split fails with java.io.EOFException for the only hlog file,
> 2. -ROOT- gets assigned somewhere else,
> 3. no log is replayed since none was split,
> 4. .META. is invalid, a new .META. is designated,
> 5. data is corrupted.
>
> Before killing the RS that holds -ROOT-, we checked its only hlog file and saw that it contained the updates for the location of .META. in the first scenario. But in the second, failing scenario, the file was empty. We found that the log rolling process, which happens once an hour by default, somehow thinks that the hlog is old and cleans it up.
>
> # LogRoller: Hlog roll period 3600000ms elapsed
> # SequenceFileLogWriter: Using syncFs -- HDFS-200
> # HLog: Roll /hbase/.logs/A/blahfile0316, entries=1, filesize=312. New hlog /hbase/.logs/A/blahfile0476
> # HLog: Found 1 hlogs to remove out of total 1; oldest outstanding sequenceid is 4 from region -ROOT-,,0
> # HLog: moving old hlog file /hbase/.logs/A/blahfile0316 whose highest sequenceid is 3 to /hbase/.oldlogs/blahfile0316
>
> In that instance, blahfile0316 contains some rows while blahfile0476 is empty. It very much seems like the problem is here, as it thinks the data in the memstore has been persisted while it has not. This seemed like a bug to us, but it might just as well be a misconfiguration. Any idea is greatly appreciated.
>
>
> On Wed, Nov 3, 2010 at 4:36 PM, Erdem Agaoglu <[email protected]> wrote:
>
>> We suspect some misconfiguration too but are unsure what that might be. Almost all of the configuration parameters are at their defaults, along with dfs.support.append, which is true in the append branch. We checked the RS logs and couldn't find any Exceptions; everything seems normal, so it is unlikely to be a bug. The only WARN was about the last hlog file being empty, but I guess this is normal.
>>
>> To confirm it is an append-configuration-related problem, we killed the RS that holds -ROOT- right after bringing up the cluster and checked /hbase/.oldlogs/.
>> There was a file containing a line for -ROOT-, so I think append works correctly. The file got removed after a while.
>>
>> BTW, we tried to reproduce it with smaller data (to be able to check manually: 2 rows in 1 table, newly put) but didn't have any problems. I am starting to think the problem just disappeared. I'll increase the rows to hundreds and check again.
>>
>> Until I get the problem again, thanks for your time.
>>
>> On Tue, Nov 2, 2010 at 7:28 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>
>>> This confirms my suspicion that -ROOT-'s latest data was lost and that .META. got double assigned, which easily explains all the weirdness you're seeing after the log replay.
>>>
>>> The most probable reason would be misconfiguration, since the append feature is very well tested and a simple kill -9 wouldn't cause all sorts of issues. Else, it could be caused by an HBase bug, in which case exceptions would probably come out either during log splitting or during log replay (which happens when the region is reassigned to another region server). In your case you could take a look at the RS log where -ROOT- ends up and see if there's anything weird.
>>>
>>> To test whether append works correctly, in order to rule that out, just bring up your cluster and kill -9 the RS that's holding ROOT right away. Once the master has done the log splitting, do a lookup directly in hadoop (bin/hadoop fs -cat) on that log file, which should be under /hbase/.oldlogs/serverstring/somelog, and look for that -ROOT- entry. If it's empty or missing, it's probably a misconfiguration.
>>>
>>> J-D
>>>
>>> On Tue, Nov 2, 2010 at 9:02 AM, Erdem Agaoglu <[email protected]> wrote:
>>> > We reproduced the problem by kill -9ing the region server that hosts -ROOT- again. The results are the same as what I explained before. After log splitting, metaScanner complained that .META. is not valid and reassigned it somewhere else. As a result, the number of rows in .META. dropped from 24 to 16, meaning we lost access to some regions and some table definitions. Supporting what you said about the data in the memstore, we seem to have lost only recent regions.
>>> >
>>> > status 'detailed' showed no regionsInTransition, and showed a total of 39 regions assigned to region servers, but as I've said before, we have access to only 16 of them. No need to mention our queries fail.
>>> >
>>> > Any idea we could try is greatly appreciated. Thanks in advance.
>>> >
>>> > PS. Something probably irrelevant happened this time: the web interface on 60010 lists all our tables while the 'list' command in the shell lists only a few. We get TableNotFoundExceptions on tables shown in the web interface but missing from the 'list' command output. It seems the web interface is able to list inaccessible tables.
>>> >
>>> > On Tue, Nov 2, 2010 at 9:56 AM, Erdem Agaoglu <[email protected]> wrote:
>>> >
>>> >> By "lost tables" I mean no user table definitions were listed in the 60010 interface and my queries got me TableNotFoundExceptions. The routine BaseScanner logged something like
>>> >>
>>> >> # RegionManager.metaScanner scan of 0 row(s) of meta region ... complete
>>> >>
>>> >> so I guess my .META. was empty. But unfortunately I don't know if any regions were stuck in transition; I'll be sure to check this out while I try to reproduce.
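(For what it's worth, the regions-in-transition check is just this, run from the HBase install directory on a stock setup:

  echo "status 'detailed'" | bin/hbase shell

which prints regionsInTransition plus the per-region-server region lists.)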
>>> >>
>>> >> I rechecked the logs, specifically after the splitting completed, and they contain lines like "Current assignment is not valid...", so I guess this is something unexpected. But in hopes of some configuration error you may spot, the whole log goes like this (stripped to be more readable):
>>> >>
>>> >> # hlog file splitting completed in 80372 millis
>>> >> # Log split complete, meta reassignment and scanning:
>>> >> # ProcessServerShutdown reassigning ROOT region
>>> >> # -ROOT- region unset (but not set to be reassigned)
>>> >> # ROOT inserted into regionsInTransition
>>> >> # Assigning for serverName=C... total nregions to assign=1, regions to give other servers than this=0, isMetaAssign=true
>>> >> # Assigning serverName=C... 1 regions
>>> >> # Assigning region -ROOT-... to C...
>>> >> # Processing MSG_REPORT_OPEN: -ROOT-... from C...; 1 of 1
>>> >> # RegionManager.rootScanner scanning meta region {server: C ...
>>> >> # Current assignment of .META.... is not valid; serverAddress=A, startCode=blah unknown; checking once more!
>>> >> # Current assignment of .META.... is not valid; serverAddress=A, startCode=blah unknown.
>>> >> # RegionManager.rootScanner scan of 1 row(s) of meta region {server: C ... complete
>>> >> # RegionManager.rootScanner scanning meta region {server: C ...
>>> >> # RegionManager.rootScanner scan of 1 row(s) of meta region {server: C ... complete
>>> >> # Assigning for serverName=E... total nregions to assign=-1, regions to give other servers than this=2, isMetaAssign=true
>>> >> # Assigning serverName=E... -1 regions
>>> >> # Assigning region .META.... to E...
>>> >> # Processing MSG_REPORT_OPEN: .META.... from E...; 1 of 1
>>> >> # Processing todo: PendingOpenOperation from E...
>>> >> # .META.... open on E
>>> >> # Updated row .META.... in region -ROOT- ...
>>> >> # Adding to onlineMetaRegions: {server: E ...
>>> >> # RegionManager.metaScanner scanning meta region {server: E ...
>>> >> # RegionManager.metaScanner scan of 0 row(s) of meta region {server: E ... complete
>>> >> # All 1 .META. region(s) scanned
>>> >> (10 secs later)
>>> >> # Processing todo: ProcessServerShutdown of F
>>> >> # Process shutdown of server F...: logSplit: true, rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
>>> >> # Log split complete, meta reassignment and scanning
>>> >> # Process server shutdown scanning root region on C
>>> >> # Process server shutdown scanning root region on C finished master
>>> >> # process server shutdown scanning .META.,,1 on E
>>> >> # process server shutdown finished scanning .META.,,1 on E
>>> >> # Removed F... from deadservers Map
>>> >>
>>> >> Thanks again.
>>> >>
>>> >> On Mon, Nov 1, 2010 at 6:58 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>> >>
>>> >>> The fact that the tables are "revived" is a clue here IMO, but let's go back to more basic questions...
>>> >>>
>>> >>> So when you say that during step 1 you lost tables, what do you mean by "lost"? Were the rows of those tables still in .META.? Were the regions stuck in transition (in the shell, do: status 'detailed')? Or when you tried to query them did you just get a TableNotFoundException?
>>> >>>
>>> >>> Also, the fact that only -ROOT- and not .META. was on this region server means that if any data was lost, it would be .META.'s location, and .META. would have been assigned somewhere else (E) but would still appear assigned to A. Since the data is in the memstore, recent data couldn't be read by this second assignment of .META., but... it could also have been reassigned for a "normal" reason like rebalancing. The best way to confirm that is, when the -ROOT- region gets reassigned at the end of step 1 (so this is after the message that goes like "...file splitting completed in 80372..."), do you see something like this in the master's log: "Current assignment of .META.,,some_timestamp is not valid; serverAddress=blah, startcode=blah unknown."? If so, then it seems that data was lost and this is really unexpected.
>>> >>>
>>> >>> J-D
>>> >>>
>>> >>> On Mon, Nov 1, 2010 at 1:36 AM, Erdem Agaoglu <[email protected]> wrote:
>>> >>> > Hi again,
>>> >>> >
>>> >>> > I have re-checked our configuration to confirm that we have dfs.support.append enabled, and saw "Using syncFs -- HDFS-200" in the logs. I inspected the logs around the log splits to find something, but I'm not sure that's what we are looking for. In the first step of the scenario I mentioned before (where we kill -9ed everything on the node that hosts the ROOT region), HLog says (stripping hdfs:// prefixes and hostnames for clarity)
>>> >>> >
>>> >>> > # Splitting 7 hlog(s) in .logs/F,60020,1287491528908
>>> >>> >
>>> >>> > Then it goes over every single one like
>>> >>> >
>>> >>> > # Splitting hlog 1 of 7
>>> >>> > # Splitting hlog 2 of 7
>>> >>> > # ...
>>> >>> > # Splitting hlog 7 of 7
>>> >>> >
>>> >>> > On the 7th hlog it WARNs with two lines
>>> >>> >
>>> >>> > # File .logs/F,60020,1287491528908/10.1.10.229%3A60020.1288021443546 might be still open, length is 0
>>> >>> > # Could not open .logs/F,60020,1287491528908/10.1.10.229%3A60020.1288021443546 for reading. File is emptyjava.io.EOFException
>>> >>> >
>>> >>> > And completes with
>>> >>> >
>>> >>> > # log file splitting completed in 80372 millis for .logs/F,60020,1287491528908
>>> >>> >
>>> >>> > This might be it, but on the sixth step (where we kill -9ed the RegionServer that hosts the only META region), it splits 2 hlogs without any empty-file problems or any logs above INFO, but as I said before, our testtable still got lost.
>>> >>> >
>>> >>> > I'll try to reproduce the problem in a cleaner way, but in the meantime, any pointers to problems we might have are greatly appreciated.
>>> >>> >
>>> >>> >
>>> >>> > On Fri, Oct 29, 2010 at 9:25 AM, Erdem Agaoglu <[email protected]> wrote:
>>> >>> >
>>> >>> >> Thanks for the answer.
>>> >>> >>
>>> >>> >> I am pretty sure we have dfs.support.append enabled. I remember both the conf file and the logs, and don't recall seeing any errors on 60010. I crawled through the logs all yesterday but don't remember anything indicating a specific error either. But I'm not sure about that. Let me check that and get back here on Monday.
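(The quick way to check that, by the way, is to grep the region server log for the syncFs line mentioned in the message below; the exact log file name depends on your install, so this assumes the default log directory and naming, run from the HBase install directory:

  grep "syncFs -- HDFS-200" logs/hbase-*-regionserver-*.log

If append is configured you will see the "Using syncFs -- HDFS-200" line, otherwise the "not available, dfs.support.append=false" one.)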
>>> >>> >>
>>> >>> >> On Thu, Oct 28, 2010 at 7:30 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>> >>> >>
>>> >>> >>> The first thing I'd check is whether your configuration has dfs.support.append; you can confirm this by looking at your region server logs. When an RS starts, it creates an HLog and will print out "Using syncFs -- HDFS-200" if it's configured, else you'll see "syncFs -- HDFS-200 -- not available, dfs.support.append=false". Also, the master web UI (on port 60010) will print an error message regarding that.
>>> >>> >>>
>>> >>> >>> If it's all ok, then you should take a look at the master log when it does the log splitting and see if it contains any obvious errors.
>>> >>> >>>
>>> >>> >>> J-D
>>> >>> >>>
>>> >>> >>> On Thu, Oct 28, 2010 at 12:58 AM, Erdem Agaoglu <[email protected]> wrote:
>>> >>> >>> > Hi all,
>>> >>> >>> >
>>> >>> >>> > We have a testing cluster of 6 nodes which we try to run an HBase/MapReduce application on. In order to simulate a power failure, we kill -9ed everything Hadoop-related on one of the slave nodes (DataNode, RegionServer, TaskTracker, ZK quorum peer, and I think the SecondaryNameNode was on this node too). We were expecting a smooth transition for all services but didn't get one on the HBase end. While our regions seemed intact (not confirmed), we lost table definitions, which pointed to some kind of META region failure. So our application failed with several TableNotFoundExceptions. The simulation was conducted with no load and extremely small data (like 10 rows in 3 tables).
>>> >>> >>> >
>>> >>> >>> > On our setup, HBase is 0.89.20100924, r1001068, while Hadoop runs 0.20.3-append-r964955-1240, r960957. Most of the configuration parameters are at their defaults.
>>> >>> >>> >
>>> >>> >>> > If we did something wrong up to this point, please ignore the rest of the message, as I'll try to explain what we did to reproduce it and it might be irrelevant.
>>> >>> >>> >
>>> >>> >>> > Say the machines are named A, B, C, D, E, F; where A is the master-like node, the others are slaves, and the power failure is on F. Since we have little data, we have one ROOT and only one META region. I'll try to sum up the whole scenario.
>>> >>> >>> >
>>> >>> >>> > A: NN, DN, JT, TT, HM, RS
>>> >>> >>> > B: DN, TT, RS, ZK
>>> >>> >>> > C: DN, TT, RS, ZK
>>> >>> >>> > D: DN, TT, RS, ZK
>>> >>> >>> > E: DN, TT, RS, ZK
>>> >>> >>> > F: SNN, DN, TT, RS, ZK
>>> >>> >>> >
>>> >>> >>> > 0. Initial state -> ROOT: F, META: A
>>> >>> >>> > 1. Power fail on F -> ROOT: C, META: E -> lost tables (waited for about half an hour to no effect, BTW)
>>> >>> >>> > 2. Put F back online -> No effect
>>> >>> >>> > 3. Create a table 'testtable' to see if we lose it
>>> >>> >>> > 4. Kill -9ed DataNode on F -> No effect -> Start it again
>>> >>> >>> > 5. Kill -9ed RegionServer on F -> No effect -> Start it again
>>> >>> >>> > 6. Kill -9ed RegionServer on E -> ROOT: C, META: A -> We lost 'testtable' but got our tables from before the simulation back.
>>> >>> >>> > It seemed like, because A had META before the simulation, the table definitions were revived.
>>> >>> >>> > 7. Restarted the whole cluster -> ROOT: A, META: F -> We lost 2 out of our original 6 tables, 'testtable' revived. That small amount of data seems corrupted too, as our Scans don't finish.
>>> >>> >>> > 8. Run to the mailing list.
>>> >>> >>> >
>>> >>> >>> > First of all, thanks for reading up to this point. From where we are now, we are not even sure if this is the expected behavior (i.e. if the ROOT or META region dies, we lose data and must do something like hbck), or if we are missing a configuration, or if this is a bug. No need to mention that we are relatively new to HBase, so the last possibility is that we didn't understand it at all.
>>> >>> >>> >
>>> >>> >>> > Thanks in advance for any ideas.
>>> >>> >>> >
>>> >>> >>> > --
>>> >>> >>> > erdem agaoglu
