Thanks all for your help with this; everything seems much more stable for the time being. I have a backlog loading job to run over a great deal of data, so I might separate out my region servers from my task trackers for now.

Thanks again,

Jamie

On 8 July 2010 17:46, Jean-Daniel Cryans <[email protected]> wrote:
> OS cache is good, glad you figured out your memory problem.
>
> J-D
>
> On Thu, Jul 8, 2010 at 2:03 AM, Jamie Cockrill <[email protected]>
> wrote:
>> Morning all. Day 2 begins...
>>
>> I discussed this with someone else earlier and they pointed out that
>> we also have task trackers running on all of those nodes, which will
>> affect the amount of memory being used when jobs are being run. Each
>> tasktracker had a maximum of 8 maps and 8 reduces configured per node,
>> with a JVM Xmx of 512mb each. Clearly this implies a fully utilised
>> node will use 8*512mb + 8*512mb = 8GB of memory on tasks alone. That's
>> before the datanode does anything, or HBase for that matter.
>>
>> As such, I've dropped it to 4 maps, 4 reduces per node and reduced the
>> Xmx to 256mb, giving a potential maximum task overhead of 2GB per
>> node. Running 'vmstat 20' now, under load from mapreduce jobs,
>> suggests that the actual free memory is about the same, but the memory
>> cache is much much bigger, which presumably is healthier as, in
>> theory, that ought to relinquish memory to processes that request it.
>>
>> Let's see if that does the trick!
>>
>> ta
>>
>> Jamie
>>
>>
>> On 7 July 2010 19:30, Jean-Daniel Cryans <[email protected]> wrote:
>>> YouAreDead means that the region server's session was expired, GC
>>> seems like your major problem. (file problems can happen after a GC
>>> sleep because they were moved around while the process was sleeping,
>>> you also get the same kind of messages with xcievers issue... sorry
>>> for the confusion)
>>>
>>> By over committing the memory I meant trying to fit too much stuff in
>>> the amount of RAM that you have. I guess it's the map and reduce tasks
>>> that eat all the free space? Why not lower their number?
>>>
>>> J-D
>>>
>>> On Wed, Jul 7, 2010 at 11:22 AM, Jamie Cockrill
>>> <[email protected]> wrote:
>>>> PS, I've now reset my MAX_FILESIZE back to the default (from the 1GB
>>>> I raised it to). It caused me to run into a delightful
>>>> 'YouAreDeadException' which looks very related to the Garbage
>>>> collection issues on the Troubleshooting page, as my Zookeeper session
>>>> expired.
>>>>
>>>> Thanks
>>>>
>>>> Jamie
>>>>
>>>>
>>>>
>>>> On 7 July 2010 19:19, Jamie Cockrill <[email protected]> wrote:
>>>>> By overcommit, do you mean make my overcommit_ratio higher on each box
>>>>> (it's at the default 50 at the moment)? What I'm noticing at the moment
>>>>> is that hadoop is taking up the vast majority of the memory on the
>>>>> boxes.
>>>>>
>>>>> I found this article:
>>>>> http://blog.rapleaf.com/dev/2010/01/05/the-wrath-of-drwho-or-unpredictable-hadoop-memory-usage/
>>>>> which Todd, it looks like you replied to. Does this sound like a
>>>>> similar problem? No worries if you can't remember, it was back in
>>>>> January! This article suggests reducing the amount of memory allocated
>>>>> to Hadoop at startup, how would I go about doing this?
>>>>>
>>>>> Thank you everyone for your patience so far. Sorry if this is taking
>>>>> up a lot of your time.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Jamie
>>>>>
>>>>> On 7 July 2010 19:03, Jean-Daniel Cryans <[email protected]> wrote:
>>>>>> swappiness at 0 is good, but also don't overcommit your memory!
>>>>>>
>>>>>> J-D
>>>>>>
>>>>>> On Wed, Jul 7, 2010 at 10:53 AM, Jamie Cockrill
>>>>>> <[email protected]> wrote:
>>>>>>> I think you're right.
>>>>>>>
>>>>>>> Unfortunately the machines are on a separate network to this laptop,
>>>>>>> so I'm having to type everything across, apologies if it doesn't
>>>>>>> translate well...
>>>>>>>
>>>>>>> free -m gave:
>>>>>>>
>>>>>>>        Total   Used   Free
>>>>>>> Mem:    7992   7939     53
>>>>>>> b/c:           7877    114
>>>>>>> Swap:  23415    895  22519
>>>>>>>
>>>>>>> I did this on another node that isn't being smashed at the moment and
>>>>>>> the numbers came out similar, but the buffers/cache free was higher
>>>>>>>
>>>>>>> vmstat 20 is giving non-zero si and so's ranging between 3 and just
>>>>>>> short of 5000.
>>>>>>>
>>>>>>> That seems to be it I guess. Hadoop troubleshooting suggests setting
>>>>>>> swappiness to 0, is that just a case of changing the value in
>>>>>>> /proc/sys/vm/swappiness?
>>>>>>>
>>>>>>> thanks
>>>>>>>
>>>>>>> Jamie
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 7 July 2010 18:40, Todd Lipcon <[email protected]> wrote:
>>>>>>>> On Wed, Jul 7, 2010 at 10:32 AM, Jamie Cockrill
>>>>>>>> <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> On the subject of GC and heap, I've left those as defaults. I could
>>>>>>>>> look at those if that's the next logical step? Would there be anything
>>>>>>>>> in any of the logs that I should look at?
>>>>>>>>>
>>>>>>>>> One thing I have noticed is that it does take an absolute age to log
>>>>>>>>> in to the DN/RS to restart the RS once it's fallen over, in one
>>>>>>>>> instance it took about 10 minutes. These are 8GB, 4 core amd64 boxes
>>>>>>>>>
>>>>>>>>>
>>>>>>>> That indicates swapping. Can you run "free -m" on the node?
>>>>>>>>
>>>>>>>> Also let "vmstat 20" run while running your job and observe the "si"
>>>>>>>> and "so" columns.
>>>>>>>> If those are nonzero, it indicates you're swapping, and you've
>>>>>>>> oversubscribed your RAM (very easy on 8G machines)
>>>>>>>>
>>>>>>>> -Todd
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> ta
>>>>>>>>>
>>>>>>>>> Jamie
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 7 July 2010 18:30, Jamie Cockrill <[email protected]> wrote:
>>>>>>>>> > Bad news, it looks like my xcievers is set as it should be, it's in
>>>>>>>>> > the hdfs-site.xml and looking at the job.xml of one of my jobs in
>>>>>>>>> > the job-tracker, it's showing that property as set to 2047. I've
>>>>>>>>> > cat | grepped one of the datanode logs and although there were a
>>>>>>>>> > few in there, they were from a few months ago. I've upped my
>>>>>>>>> > MAX_FILESIZE on my table to 1GB to see if that helps (not sure if
>>>>>>>>> > it will!).
>>>>>>>>> >
>>>>>>>>> > Thanks,
>>>>>>>>> >
>>>>>>>>> > Jamie
>>>>>>>>> >
>>>>>>>>> > On 7 July 2010 18:12, Jean-Daniel Cryans <[email protected]>
>>>>>>>>> > wrote:
>>>>>>>>> >> xcievers exceptions will be in the datanodes' logs, and your
>>>>>>>>> >> problem totally looks like it. 0.20.5 will have the same issue
>>>>>>>>> >> (since it's on the HDFS side)
>>>>>>>>> >>
>>>>>>>>> >> J-D
>>>>>>>>> >>
>>>>>>>>> >> On Wed, Jul 7, 2010 at 10:08 AM, Jamie Cockrill
>>>>>>>>> >> <[email protected]> wrote:
>>>>>>>>> >>> Hi Todd & JD,
>>>>>>>>> >>>
>>>>>>>>> >>> Environment:
>>>>>>>>> >>> All (hadoop and HBase) installed as of karmic-cdh3, which means:
>>>>>>>>> >>> Hadoop 0.20.2+228
>>>>>>>>> >>> HBase 0.89.20100621+17
>>>>>>>>> >>> Zookeeper 3.3.1+7
>>>>>>>>> >>>
>>>>>>>>> >>> Unfortunately my whole cluster of regionservers have now crashed,
>>>>>>>>> >>> so I can't really say if it was swapping too much.
>>>>>>>>> >>> There is a DEBUG statement just before it crashes saying:
>>>>>>>>> >>>
>>>>>>>>> >>> org.apache.hadoop.hbase.regionserver.wal.HLog: closing hlog
>>>>>>>>> >>> writer in hdfs://<somewhere on my HDFS, in /hbase>
>>>>>>>>> >>>
>>>>>>>>> >>> What follows is:
>>>>>>>>> >>>
>>>>>>>>> >>> WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception:
>>>>>>>>> >>> org.apache.hadoop.ipc.RemoteException:
>>>>>>>>> >>> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No
>>>>>>>>> >>> lease on <file location as above> File does not exist. Holder
>>>>>>>>> >>> DFSClient_-11113603 does not have any open files
>>>>>>>>> >>>
>>>>>>>>> >>> It then seems to try and do some error recovery (Error Recovery
>>>>>>>>> >>> for block null bad datanode[0] nodes == null), fails (Could not
>>>>>>>>> >>> get block locations. Source file "<hbase file as before>" -
>>>>>>>>> >>> Aborting). There is then an ERROR org.apache...HRegionServer:
>>>>>>>>> >>> Close and delete failed. There is then a similar
>>>>>>>>> >>> LeaseExpiredException as above.
>>>>>>>>> >>>
>>>>>>>>> >>> There are then a couple of messages from HRegionServer saying that
>>>>>>>>> >>> it's notifying master of its shutdown and stopping itself. The
>>>>>>>>> >>> shutdown hook then fires and the RemoteException and
>>>>>>>>> >>> LeaseExpiredExceptions are printed again.
>>>>>>>>> >>>
>>>>>>>>> >>> ulimit is set to 65000 (it's in the regionserver log, printed as I
>>>>>>>>> >>> restarted the regionserver), however I haven't got the xcievers
>>>>>>>>> >>> set anywhere. I'll give that a go. It does seem very odd as I did
>>>>>>>>> >>> have a few of them fall over one at a time with a few early loads,
>>>>>>>>> >>> but that seemed to be because the regions weren't splitting
>>>>>>>>> >>> properly, so all the traffic was going to one node and it was
>>>>>>>>> >>> being overwhelmed.
>>>>>>>>> >>> Once I throttled it, after one load a region split seemed to get
>>>>>>>>> >>> triggered, which flung regions all over, which made subsequent
>>>>>>>>> >>> loads much more distributed. However, perhaps the time-bomb was
>>>>>>>>> >>> ticking... I'll have a go at specifying the xcievers property. I'm
>>>>>>>>> >>> pretty certain I've got everything else covered, except the
>>>>>>>>> >>> patches as referenced in the JIRA.
>>>>>>>>> >>>
>>>>>>>>> >>> I just grepped some of the log files and didn't get an explicit
>>>>>>>>> >>> exception with 'xciever' in it.
>>>>>>>>> >>>
>>>>>>>>> >>> I am considering downgrading(?) to 0.20.5, however because
>>>>>>>>> >>> everything is installed as per karmic-cdh3, I'm a bit reluctant to
>>>>>>>>> >>> do so as presumably Cloudera has tested each of these versions
>>>>>>>>> >>> against each other? And I don't really want to introduce further
>>>>>>>>> >>> versioning issues.
>>>>>>>>> >>>
>>>>>>>>> >>> Thanks,
>>>>>>>>> >>>
>>>>>>>>> >>> Jamie
>>>>>>>>> >>>
>>>>>>>>> >>>
>>>>>>>>> >>> On 7 July 2010 17:30, Jean-Daniel Cryans <[email protected]>
>>>>>>>>> >>> wrote:
>>>>>>>>> >>>> Jamie,
>>>>>>>>> >>>>
>>>>>>>>> >>>> Does your configuration meet the requirements?
>>>>>>>>> >>>>
>>>>>>>>> >>>> http://hbase.apache.org/docs/r0.20.5/api/overview-summary.html#requirements
>>>>>>>>> >>>>
>>>>>>>>> >>>> ulimit and xcievers, if not set, are usually time bombs that
>>>>>>>>> >>>> blow off when the cluster is under load.
>>>>>>>>> >>>>
>>>>>>>>> >>>> J-D
>>>>>>>>> >>>>
>>>>>>>>> >>>> On Wed, Jul 7, 2010 at 9:11 AM, Jamie Cockrill
>>>>>>>>> >>>> <[email protected]> wrote:
>>>>>>>>> >>>>
>>>>>>>>> >>>>> Dear all,
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> My current HBase/Hadoop architecture has HBase region servers
>>>>>>>>> >>>>> on the same physical boxes as the HDFS data-nodes.
>>>>>>>>> >>>>> I'm getting an awful lot of region server crashes. The last
>>>>>>>>> >>>>> thing that happens appears to be a DroppedSnapshot Exception,
>>>>>>>>> >>>>> caused by an IOException: could not complete write to file
>>>>>>>>> >>>>> <file on HDFS>. I am running it under load; I'm not sure how to
>>>>>>>>> >>>>> quantify how heavy that is, but I'm guessing it is a load issue.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> Is it common practice to put region servers on data-nodes? Is it
>>>>>>>>> >>>>> common to see region server crashes when either the HDFS or
>>>>>>>>> >>>>> region server (or both) is under heavy load? I'm guessing that is
>>>>>>>>> >>>>> the case as I've seen a few similar posts. I've not got a great
>>>>>>>>> >>>>> deal of capacity to be separating region servers from HDFS data
>>>>>>>>> >>>>> nodes, but it might be an argument I could make.
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> Thanks
>>>>>>>>> >>>>>
>>>>>>>>> >>>>> Jamie
>>>>>>>>> >>>>>
>>>>>>>>> >>>>
>>>>>>>>> >>>
>>>>>>>>> >>
>>>>>>>>> >
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Todd Lipcon
>>>>>>>> Software Engineer, Cloudera
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
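
For anyone landing on this thread later: the task-slot arithmetic Jamie describes (8 maps + 8 reduces at 512mb Xmx, then 4 + 4 at 256mb, on boxes where free -m reported 7992 MB total) can be checked with a quick sketch like the one below. The function names are hypothetical and purely illustrative, and the figures are the ones quoted in the thread. Note that Xmx only bounds the JVM heap, so real per-task memory will be somewhat higher than this estimate.

```python
def task_heap_mb(map_slots: int, reduce_slots: int, xmx_mb: int) -> int:
    """Worst-case task heap in MB when every map and reduce slot is busy."""
    return (map_slots + reduce_slots) * xmx_mb


def headroom_mb(node_ram_mb: int, map_slots: int, reduce_slots: int, xmx_mb: int) -> int:
    """RAM left for the datanode, region server and OS cache.

    A negative result means task heaps alone oversubscribe the node.
    """
    return node_ram_mb - task_heap_mb(map_slots, reduce_slots, xmx_mb)


NODE_RAM_MB = 7992  # "Total" column from the free -m output in the thread

# (label, map slots, reduce slots, Xmx in MB) as described in the thread
SETTINGS = [("before", 8, 8, 512), ("after", 4, 4, 256)]

for label, maps, reduces, xmx in SETTINGS:
    heap = task_heap_mb(maps, reduces, xmx)
    spare = headroom_mb(NODE_RAM_MB, maps, reduces, xmx)
    print(f"{label}: task heap = {heap} MB, headroom = {spare} MB")
```

With the original settings the task heaps alone (8192 MB) exceed the node's RAM, which is consistent with the swapping seen in vmstat; with the reduced settings (2048 MB) there is room left over for the datanode, the region server and the OS cache.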
