When I start HBase I usually just tail the master log; -ROOT- is normally assigned within a few seconds, .META. a few seconds after that, and then it starts assigning all the other regions.
Did you make sure your master log was clean of errors?

J-D

On Thu, Jul 1, 2010 at 5:40 PM, Jinsong Hu <[email protected]> wrote:
> Yes, it terminated correctly. There is no exception while running the
> add_table.
>
> Are you saying that after a restart, I need to wait for some time for
> the -ROOT- region to be assigned? Usually how long do I need to wait?
>
> Jimmy
>
> --------------------------------------------------
> From: "Jean-Daniel Cryans" <[email protected]>
> Sent: Thursday, July 01, 2010 5:27 PM
> To: <[email protected]>
> Subject: Re: dilemma of memory and CPU for hbase.
>
>> Did you see any exception when you ran add_table? Did it even
>> terminate correctly?
>>
>> After a restart, the regions aren't readily available. If something
>> blocked the master from assigning -ROOT-, it should be pretty evident
>> by looking at the master log.
>>
>> J-D
>>
>> On Thu, Jul 1, 2010 at 5:23 PM, Jinsong Hu <[email protected]> wrote:
>>>
>>> After I ran add_table.rb, I refreshed the master's UI page and then
>>> clicked on the table to show the regions. I expected that all the
>>> regions would be there. But I found that there are significantly
>>> fewer regions; lots of regions that were there before are gone.
>>>
>>> I then restarted the whole HBase master and region servers. And now
>>> it is even worse: the master UI page doesn't even load, saying the
>>> -ROOT- and .META. regions are not served by any regionserver. The
>>> whole cluster is not in a usable state.
>>>
>>> That forced me to rename /hbase to /hbase-0.20.4, restart all HBase
>>> masters and regionservers, recreate all tables, etc., essentially
>>> starting from scratch.
>>>
>>> Jimmy
>>>
>>> --------------------------------------------------
>>> From: "Jean-Daniel Cryans" <[email protected]>
>>> Sent: Thursday, July 01, 2010 5:10 PM
>>> To: <[email protected]>
>>> Subject: Re: dilemma of memory and CPU for hbase.
>>>
>>>> add_table.rb doesn't actually write much to the file system; all
>>>> your data is still there. It just wipes all the .META. entries and
>>>> replaces them with the .regioninfo files found in every region
>>>> directory.
>>>>
>>>> Can you define what you mean by "corrupted"? It's really an
>>>> overloaded term.
>>>>
>>>> J-D
>>>>
>>>> On Thu, Jul 1, 2010 at 5:01 PM, Jinsong Hu <[email protected]> wrote:
>>>>>
>>>>> Hi, Jean:
>>>>> Thanks! I will run add_table.rb and see if it fixes the problem.
>>>>> Our namenode is backed up with HA and DRBD, and the HBase master
>>>>> machine is colocated with the namenode and job tracker, so we are
>>>>> not wasting resources.
>>>>>
>>>>> The region hole probably comes from the previous 0.20.4 HBase
>>>>> operation. The 0.20.4 HBase was very unstable during its operation;
>>>>> lots of times the master said a region was not there while the
>>>>> region server said it was serving that region.
>>>>>
>>>>> I followed the instructions and ran commands like
>>>>>
>>>>> bin/hbase org.jruby.Main bin/add_table.rb /hbase/table_name
>>>>>
>>>>> After the execution, I found all my tables are corrupted and I
>>>>> can't use them any more. Restarting HBase doesn't help either. I
>>>>> have to wipe out the whole /hbase directory and start from scratch.
>>>>>
>>>>> It looks like add_table.rb can corrupt the whole HBase. Anyway, I
>>>>> am regenerating the data from scratch and we'll see if it works
>>>>> out.
>>>>>
>>>>> Jimmy.
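[For anyone landing here from the archives: after running add_table.rb, it is worth verifying that the rebuilt .META. entries actually chain together before restarting anything. Below is a rough sketch against the 0.20-era Java client API; the "info:regioninfo" column is the real .META. layout, but the hole test is simplified and does not separate tables, so in practice you would filter down to the regions of the table you care about.]

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HConstants;
  import org.apache.hadoop.hbase.HRegionInfo;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.hbase.util.Writables;

  public class MetaHoleCheck {
    public static void main(String[] args) throws Exception {
      HTable meta = new HTable(new HBaseConfiguration(), HConstants.META_TABLE_NAME);
      Scan scan = new Scan();
      scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("regioninfo"));
      ResultScanner scanner = meta.getScanner(scan);
      byte[] prevEndKey = null;
      for (Result r : scanner) {
        HRegionInfo region = Writables.getHRegionInfo(
            r.getValue(Bytes.toBytes("info"), Bytes.toBytes("regioninfo")));
        // The regions of a table should tile the key space: each region's
        // start key must equal the previous region's end key. An empty end
        // key marks the last region of a table, so skip that comparison.
        if (prevEndKey != null && prevEndKey.length > 0
            && !Bytes.equals(prevEndKey, region.getStartKey())) {
          System.out.println("Hole in .META. before " + region.getRegionNameAsString());
        }
        prevEndKey = region.getEndKey();
      }
      scanner.close();
    }
  }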
>>>>>
>>>>>
>>>>> --------------------------------------------------
>>>>> From: "Jean-Daniel Cryans" <[email protected]>
>>>>> Sent: Thursday, July 01, 2010 2:17 PM
>>>>> To: <[email protected]>
>>>>> Subject: Re: dilemma of memory and CPU for hbase.
>>>>>
>>>>>> (taking the conversation back to the list after receiving logs and
>>>>>> heap dump)
>>>>>>
>>>>>> The issue here is actually much nastier than it seems. But before
>>>>>> I describe the problem, you said:
>>>>>>
>>>>>>> I have 3 machines as hbase master (only 1 is active), 3
>>>>>>> zookeepers, 8 regionservers.
>>>>>>
>>>>>> If those are all distinct machines, you are wasting a lot of
>>>>>> hardware. Unless you have an HA Namenode (which I highly doubt),
>>>>>> you already have a SPOF there, so you might as well put every
>>>>>> service on that single node (1 master, 1 zookeeper). You might be
>>>>>> afraid of using only 1 ZK node, but unless you share the zookeeper
>>>>>> ensemble between clusters, losing the Namenode is as bad as losing
>>>>>> ZK, so you might as well put them together. At StumbleUpon we have
>>>>>> 2-3 clusters using the same ensembles, so there it makes more
>>>>>> sense to put them in an HA setup.
>>>>>>
>>>>>> That said, in your log I see:
>>>>>>
>>>>>> 2010-06-29 00:00:00,064 DEBUG
>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Batch puts
>>>>>> interrupted at index=0 because:Requested row out of range for HRegion
>>>>>> Spam_MsgEventTable,2010-06-28 11:34:02blah
>>>>>> ...
>>>>>> 2010-06-29 12:26:13,352 DEBUG
>>>>>> org.apache.hadoop.hbase.regionserver.HRegionServer: Batch puts
>>>>>> interrupted at index=0 because:Requested row out of range for HRegion
>>>>>> Spam_MsgEventTable,2010-06-28 11:34:02blah
>>>>>>
>>>>>> So for 12 hours (and probably more), the same row was requested
>>>>>> almost every 100ms, but it was always failing with a
>>>>>> WrongRegionException (that's the name of what we see here). You
>>>>>> probably use the write buffer since you want to import as fast as
>>>>>> possible, so all these buffers are left unused after the clients
>>>>>> terminate their RPC. That rate of failed insertion must have kept
>>>>>> your garbage collector _very_ busy, and at some point the JVM
>>>>>> OOMEd. This is the stack from your OOME:
>>>>>>
>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>>   at org.apache.hadoop.hbase.ipc.HBaseRPC$Invocation.readFields(HBaseRPC.java:175)
>>>>>>   at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.processData(HBaseServer.java:867)
>>>>>>   at org.apache.hadoop.hbase.ipc.HBaseServer$Connection.readAndProcess(HBaseServer.java:835)
>>>>>>   at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:419)
>>>>>>   at org.apache.hadoop.hbase.ipc.HBaseServer$Listener.run(HBaseServer.java:318)
>>>>>>
>>>>>> This is where we deserialize client data, so it correlates with
>>>>>> what I just described.
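[For context, the client-side write buffer J-D is referring to looks roughly like this in the 0.20 Java API. The table name is taken from the log above; the column family, qualifier, and row layout are made up for illustration:]

  import java.io.IOException;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class BufferedImport {
    public static void main(String[] args) throws IOException {
      HTable table = new HTable(new HBaseConfiguration(), "Spam_MsgEventTable");
      // Queue puts client-side instead of doing one RPC per row.
      table.setAutoFlush(false);
      table.setWriteBufferSize(2 * 1024 * 1024); // 2 MB; flushed when full
      for (int i = 0; i < 100000; i++) {
        Put put = new Put(Bytes.toBytes("2010-06-28 11:34:02-" + i));
        put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
        table.put(put); // buffered locally; shipped in batches
      }
      table.flushCommits(); // push whatever is still sitting in the buffer
    }
  }

[Each flush ships the whole batch in one RPC, which is why a single bad region boundary can make the server deserialize and then reject the same large batch over and over, churning the regionserver heap as described above.]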
>>>>>>
>>>>>> Now, this means that you probably have a hole (or more) in your
>>>>>> .META. table. It usually happens after a region server that was
>>>>>> carrying .META. fails (since data loss is possible with that
>>>>>> version of HDFS), or when a bug in the master messes up the .META.
>>>>>> region. Now, 2 things:
>>>>>>
>>>>>> - It would be nice to know why you have a hole. Look at your
>>>>>> .META. table around the row in your region server log; you should
>>>>>> see that the start/end keys don't match. Then you can look in the
>>>>>> master log from yesterday to search for what went wrong, maybe see
>>>>>> some exceptions, or maybe a region server failed for any reason
>>>>>> and it was hosting .META.
>>>>>>
>>>>>> - You probably want to fix your table. Use the bin/add_table.rb
>>>>>> script (other people on this list used it in the past; search the
>>>>>> archive for more info).
>>>>>>
>>>>>> Finally (whew!), if you are still developing your solution around
>>>>>> HBase, you might want to try out one of our dev releases that
>>>>>> works with a durable Hadoop release. See
>>>>>> http://hbase.apache.org/docs/r0.89.20100621/ for more info.
>>>>>> Cloudera's CDH3b2 also has everything you need.
>>>>>>
>>>>>> J-D
>>>>>>
>>>>>> On Thu, Jul 1, 2010 at 12:03 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>>>>>>
>>>>>>> 653 regions is very low; even if you had a total of 3 region
>>>>>>> servers I wouldn't expect any problem.
>>>>>>>
>>>>>>> So to me it seems to point towards either a configuration issue
>>>>>>> or a usage issue. Can you:
>>>>>>>
>>>>>>> - Put the log of one region server that OOMEd on a public server.
>>>>>>> - Tell us more about your setup: # of nodes, hardware,
>>>>>>> configuration files.
>>>>>>> - Tell us more about how you insert data into HBase.
>>>>>>>
>>>>>>> And BTW, are you trying to do an initial import of your data set?
>>>>>>> If so, have you considered using HFileOutputFormat?
>>>>>>>
>>>>>>> Thx,
>>>>>>>
>>>>>>> J-D
>>>>>>>
>>>>>>> On Thu, Jul 1, 2010 at 11:52 AM, Jinsong Hu <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hi, Sir:
>>>>>>>> I am using HBase 0.20.5, and this morning I found that 3 of my
>>>>>>>> region servers ran out of memory. Each regionserver is given 6G
>>>>>>>> of memory, and I have 653 regions in total. The max store size
>>>>>>>> is 256M. I analyzed the dump and it shows that there are too
>>>>>>>> many HRegions in memory.
>>>>>>>>
>>>>>>>> Previously I set the max store size to 2G, but then I found the
>>>>>>>> region server constantly does minor compactions and the CPU
>>>>>>>> usage is very high. It also blocks the heavy client record
>>>>>>>> insertion.
>>>>>>>>
>>>>>>>> So now I am limited on one side by memory and limited on the
>>>>>>>> other side by CPU. Is there any way to get out of this dilemma?
>>>>>>>>
>>>>>>>> Jimmy.
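[Following up on the HFileOutputFormat suggestion: for an initial import it sidesteps both sides of the memory/CPU dilemma, since the MapReduce job writes HFiles directly and the regionservers never see the individual puts. A rough sketch against the 0.89-era org.apache.hadoop.hbase.mapreduce API follows; the tab-separated input format, column family, and qualifier are assumptions, the table name is from the thread, and on the 0.20 line the setup differs (the output there is loaded with bin/loadtable.rb):]

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class BulkImport {
    static class ImportMapper
        extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
      @Override
      protected void map(LongWritable key, Text line, Context ctx)
          throws java.io.IOException, InterruptedException {
        // Hypothetical input: tab-separated "rowkey <TAB> value" lines.
        String[] fields = line.toString().split("\t", 2);
        byte[] row = Bytes.toBytes(fields[0]);
        Put put = new Put(row);
        put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes(fields[1]));
        ctx.write(new ImmutableBytesWritable(row), put);
      }
    }

    public static void main(String[] args) throws Exception {
      HBaseConfiguration conf = new HBaseConfiguration();
      Job job = new Job(conf, "bulk-import");
      job.setJarByClass(BulkImport.class);
      job.setMapperClass(ImportMapper.class);
      job.setMapOutputKeyClass(ImmutableBytesWritable.class);
      job.setMapOutputValueClass(Put.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));
      // Wires up total-order partitioning and a reducer that sorts the
      // Puts into HFiles laid out along the table's region boundaries.
      HFileOutputFormat.configureIncrementalLoad(job, new HTable(conf, "Spam_MsgEventTable"));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }

[After the job finishes, the generated HFiles still have to be handed to the table with the bulk-load tool that ships with the version in use.]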
