Separating them would come at the expense of network IO, since jobs that read from or write to HBase will lose locality. Also, I guess the datanodes are on the task tracker machines too, so HBase would lose locality with HDFS as well.

J-D

On Thu, Jul 8, 2010 at 10:07 AM, Jamie Cockrill <[email protected]> wrote:
> Thanks all for your help with this, everything seems much more stable
> for the meantime. I have a backlog loading job to run over a great
> deal of data, so I might separate out my region servers from my task
> trackers for the meantime.
>
> Thanks again,
>
> Jamie
>
> On 8 July 2010 17:46, Jean-Daniel Cryans <[email protected]> wrote:
>> OS cache is good, glad you figured out your memory problem.
>>
>> J-D
>>
>> On Thu, Jul 8, 2010 at 2:03 AM, Jamie Cockrill <[email protected]> wrote:
>>> Morning all. Day 2 begins...
>>>
>>> I discussed this with someone else earlier and they pointed out that
>>> we also have task trackers running on all of those nodes, which will
>>> affect the amount of memory being used when jobs are being run. Each
>>> tasktracker had a maximum of 8 maps and 8 reduces configured per node,
>>> with a JVM Xmx of 512mb each. Clearly this implies a fully utilised
>>> node will use 8*512mb + 8*512mb = 8GB of memory on tasks alone. That's
>>> before the datanode does anything, or HBase for that matter.
>>>
>>> As such, I've dropped it to 4 maps, 4 reduces per node and reduced the
>>> Xmx to 256mb, giving a potential maximum task overhead of 2GB per
>>> node. Running 'vmstat 20' now, under load from mapreduce jobs,
>>> suggests that the actual free memory is about the same, but the memory
>>> cache is much much bigger, which presumably is healthier as, in
>>> theory, that ought to relinquish memory to processes that request it.
>>>
>>> Let's see if that does the trick!
>>>
>>> ta
>>>
>>> Jamie
>>>
>>> On 7 July 2010 19:30, Jean-Daniel Cryans <[email protected]> wrote:
>>>> YouAreDead means that the region server's session was expired, GC
>>>> seems like your major problem. (file problems can happen after a GC
>>>> sleep because they were moved around while the process was sleeping,
>>>> you also get the same kind of messages with xcievers issue...
>>>> sorry for the confusion)
>>>>
>>>> By over committing the memory I meant trying to fit too much stuff in
>>>> the amount of RAM that you have. I guess it's the map and reduce tasks
>>>> that eat all the free space? Why not lower their number?
>>>>
>>>> J-D
>>>>
>>>> On Wed, Jul 7, 2010 at 11:22 AM, Jamie Cockrill
>>>> <[email protected]> wrote:
>>>>> PS, I've now reset my MAX_FILESIZE back to the default (from the 1GB
>>>>> I raised it to). It caused me to run into a delightful
>>>>> 'YouAreDeadException' which looks very related to the Garbage
>>>>> collection issues on the Troubleshooting page, as my Zookeeper session
>>>>> expired.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Jamie
>>>>>
>>>>> On 7 July 2010 19:19, Jamie Cockrill <[email protected]> wrote:
>>>>>> By overcommit, do you mean make my overcommit_ratio higher on each box
>>>>>> (it's at the default 50 at the moment)? What I'm noticing at the moment
>>>>>> is that hadoop is taking up the vast majority of the memory on the
>>>>>> boxes.
>>>>>>
>>>>>> I found this article:
>>>>>> http://blog.rapleaf.com/dev/2010/01/05/the-wrath-of-drwho-or-unpredictable-hadoop-memory-usage/
>>>>>> which Todd, it looks like you replied to. Does this sound like a
>>>>>> similar problem? No worries if you can't remember, it was back in
>>>>>> January! This article suggests reducing the amount of memory allocated
>>>>>> to Hadoop at startup, how would I go about doing this?
>>>>>>
>>>>>> Thank you everyone for your patience so far. Sorry if this is taking
>>>>>> up a lot of your time.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Jamie
>>>>>>
>>>>>> On 7 July 2010 19:03, Jean-Daniel Cryans <[email protected]> wrote:
>>>>>>> swappiness at 0 is good, but also don't overcommit your memory!
>>>>>>>
>>>>>>> J-D
>>>>>>>
>>>>>>> On Wed, Jul 7, 2010 at 10:53 AM, Jamie Cockrill
>>>>>>> <[email protected]> wrote:
>>>>>>>> I think you're right.
>>>>>>>>
>>>>>>>> Unfortunately the machines are on a separate network to this laptop,
>>>>>>>> so I'm having to type everything across, apologies if it doesn't
>>>>>>>> translate well...
>>>>>>>>
>>>>>>>> free -m gave:
>>>>>>>>
>>>>>>>>         Total   Used    Free
>>>>>>>> Mem     7992    7939    53
>>>>>>>> b/c             7877    114
>>>>>>>> Swap:   23415   895     22519
>>>>>>>>
>>>>>>>> I did this on another node that isn't being smashed at the moment and
>>>>>>>> the numbers came out similar, but the buffers/cache free was higher
>>>>>>>>
>>>>>>>> vmstat -20 is giving non-zero si and so's ranging between 3 and just
>>>>>>>> short of 5000.
>>>>>>>>
>>>>>>>> That seems to be it I guess. Hadoop troubleshooting suggests setting
>>>>>>>> swappiness to 0, is that just a case of changing the value in
>>>>>>>> /proc/sys/vm/swappiness?
>>>>>>>>
>>>>>>>> thanks
>>>>>>>>
>>>>>>>> Jamie
>>>>>>>>
>>>>>>>> On 7 July 2010 18:40, Todd Lipcon <[email protected]> wrote:
>>>>>>>>> On Wed, Jul 7, 2010 at 10:32 AM, Jamie Cockrill
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> On the subject of GC and heap, I've left those as defaults. I could
>>>>>>>>>> look at those if that's the next logical step? Would there be anything
>>>>>>>>>> in any of the logs that I should look at?
>>>>>>>>>>
>>>>>>>>>> One thing I have noticed is that it does take an absolute age to log
>>>>>>>>>> in to the DN/RS to restart the RS once it's fallen over, in one
>>>>>>>>>> instance it took about 10 minutes. These are 8GB, 4 core amd64 boxes
>>>>>>>>>>
>>>>>>>>> That indicates swapping. Can you run "free -m" on the node?
>>>>>>>>>
>>>>>>>>> Also let "vmstat 20" run while running your job and observe the "si" and
>>>>>>>>> "so" columns.
>>>>>>>>> If those are nonzero, it indicates you're swapping, and you've
>>>>>>>>> oversubscribed your RAM (very easy on 8G machines)
>>>>>>>>>
>>>>>>>>> -Todd
>>>>>>>>>
>>>>>>>>>> ta
>>>>>>>>>>
>>>>>>>>>> Jamie
>>>>>>>>>>
>>>>>>>>>> On 7 July 2010 18:30, Jamie Cockrill <[email protected]> wrote:
>>>>>>>>>> > Bad news, it looks like my xcievers is set as it should be, it's in
>>>>>>>>>> > the hdfs-site.xml and looking at the job.xml of one of my jobs in the
>>>>>>>>>> > job-tracker, it's showing that property as set to 2047. I've cat |
>>>>>>>>>> > grepped one of the datanode logs and although there were a few in
>>>>>>>>>> > there, they were from a few months ago. I've upped my MAX_FILESIZE on
>>>>>>>>>> > my table to 1GB to see if that helps (not sure if it will!).
>>>>>>>>>> >
>>>>>>>>>> > Thanks,
>>>>>>>>>> >
>>>>>>>>>> > Jamie
>>>>>>>>>> >
>>>>>>>>>> > On 7 July 2010 18:12, Jean-Daniel Cryans <[email protected]> wrote:
>>>>>>>>>> >> xcievers exceptions will be in the datanodes' logs, and your problem
>>>>>>>>>> >> totally looks like it. 0.20.5 will have the same issue (since it's on
>>>>>>>>>> >> the HDFS side)
>>>>>>>>>> >>
>>>>>>>>>> >> J-D
>>>>>>>>>> >>
>>>>>>>>>> >> On Wed, Jul 7, 2010 at 10:08 AM, Jamie Cockrill
>>>>>>>>>> >> <[email protected]> wrote:
>>>>>>>>>> >>> Hi Todd & JD,
>>>>>>>>>> >>>
>>>>>>>>>> >>> Environment:
>>>>>>>>>> >>> All (hadoop and HBase) installed as of karmic-cdh3, which means:
>>>>>>>>>> >>> Hadoop 0.20.2+228
>>>>>>>>>> >>> HBase 0.89.20100621+17
>>>>>>>>>> >>> Zookeeper 3.3.1+7
>>>>>>>>>> >>>
>>>>>>>>>> >>> Unfortunately my whole cluster of regionservers have now crashed, so I
>>>>>>>>>> >>> can't really say if it was swapping too much.
>>>>>>>>>> >>> There is a DEBUG statement just before it crashes saying:
>>>>>>>>>> >>>
>>>>>>>>>> >>> org.apache.hadoop.hbase.regionserver.wal.HLog: closing hlog writer in
>>>>>>>>>> >>> hdfs://<somewhere on my HDFS, in /hbase>
>>>>>>>>>> >>>
>>>>>>>>>> >>> What follows is:
>>>>>>>>>> >>>
>>>>>>>>>> >>> WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception:
>>>>>>>>>> >>> org.apache.hadoop.ipc.RemoteException:
>>>>>>>>>> >>> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease
>>>>>>>>>> >>> on <file location as above> File does not exist. Holder
>>>>>>>>>> >>> DFSClient_-11113603 does not have any open files
>>>>>>>>>> >>>
>>>>>>>>>> >>> It then seems to try and do some error recovery (Error Recovery for
>>>>>>>>>> >>> block null bad datanode[0] nodes == null), fails (Could not get block
>>>>>>>>>> >>> locations. Source file "<hbase file as before>" - Aborting). There is
>>>>>>>>>> >>> then an ERROR org.apache...HRegionServer: Close and delete failed.
>>>>>>>>>> >>> There is then a similar LeaseExpiredException as above.
>>>>>>>>>> >>>
>>>>>>>>>> >>> There are then a couple of messages from HRegionServer saying that
>>>>>>>>>> >>> it's notifying master of its shutdown and stopping itself. The
>>>>>>>>>> >>> shutdown hook then fires and the RemoteException and
>>>>>>>>>> >>> LeaseExpiredExceptions are printed again.
>>>>>>>>>> >>>
>>>>>>>>>> >>> ulimit is set to 65000 (it's in the regionserver log, printed as I
>>>>>>>>>> >>> restarted the regionserver), however I haven't got the xceivers set
>>>>>>>>>> >>> anywhere. I'll give that a go.
>>>>>>>>>> >>> It does seem very odd as I did have a
>>>>>>>>>> >>> few of them fall over one at a time with a few early loads, but that
>>>>>>>>>> >>> seemed to be because the regions weren't splitting properly, so all
>>>>>>>>>> >>> the traffic was going to one node and it was being overwhelmed. Once I
>>>>>>>>>> >>> throttled it, after one load a region split seemed to get
>>>>>>>>>> >>> triggered, which flung regions all over, which made subsequent loads
>>>>>>>>>> >>> much more distributed. However, perhaps the time-bomb was ticking...
>>>>>>>>>> >>> I'll have a go at specifying the xcievers property. I'm pretty
>>>>>>>>>> >>> certain I've got everything else covered, except the patches as
>>>>>>>>>> >>> referenced in the JIRA.
>>>>>>>>>> >>>
>>>>>>>>>> >>> I just grepped some of the log files and didn't get an explicit
>>>>>>>>>> >>> exception with 'xciever' in it.
>>>>>>>>>> >>>
>>>>>>>>>> >>> I am considering downgrading(?) to 0.20.5, however because everything
>>>>>>>>>> >>> is installed as per karmic-cdh3, I'm a bit reluctant to do so as
>>>>>>>>>> >>> presumably Cloudera has tested each of these versions against each
>>>>>>>>>> >>> other? And I don't really want to introduce further versioning issues.
>>>>>>>>>> >>>
>>>>>>>>>> >>> Thanks,
>>>>>>>>>> >>>
>>>>>>>>>> >>> Jamie
>>>>>>>>>> >>>
>>>>>>>>>> >>> On 7 July 2010 17:30, Jean-Daniel Cryans <[email protected]> wrote:
>>>>>>>>>> >>>> Jamie,
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> Does your configuration meet the requirements?
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> http://hbase.apache.org/docs/r0.20.5/api/overview-summary.html#requirements
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> ulimit and xcievers, if not set, are usually time bombs that blow off when
>>>>>>>>>> >>>> the cluster is under load.
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> J-D
>>>>>>>>>> >>>>
>>>>>>>>>> >>>> On Wed, Jul 7, 2010 at 9:11 AM, Jamie Cockrill <[email protected]> wrote:
>>>>>>>>>> >>>>
>>>>>>>>>> >>>>> Dear all,
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> My current HBase/Hadoop architecture has HBase region servers on the
>>>>>>>>>> >>>>> same physical boxes as the HDFS data-nodes. I'm getting an awful lot
>>>>>>>>>> >>>>> of region server crashes. The last thing that happens appears to be a
>>>>>>>>>> >>>>> DroppedSnapshot Exception, caused by an IOException: could not
>>>>>>>>>> >>>>> complete write to file <file on HDFS>. I am running it under load, how
>>>>>>>>>> >>>>> heavy that is I'm not sure how that is quantified, but I'm guessing it
>>>>>>>>>> >>>>> is a load issue.
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Is it common practice to put region servers on data-nodes? Is it
>>>>>>>>>> >>>>> common to see region server crashes when either the HDFS or region
>>>>>>>>>> >>>>> server (or both) is under heavy load? I'm guessing that is the case as
>>>>>>>>>> >>>>> I've seen a few similar posts. I've not got a great deal of capacity
>>>>>>>>>> >>>>> to be separating region servers from HDFS data nodes, but it might be
>>>>>>>>>> >>>>> an argument I could make.
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Thanks
>>>>>>>>>> >>>>>
>>>>>>>>>> >>>>> Jamie
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Todd Lipcon
>>>>>>>>> Software Engineer, Cloudera
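
For anyone following along, the task memory arithmetic quoted above (8 maps + 8 reduces at Xmx 512mb, then 4 + 4 at 256mb) can be sketched roughly as below. This is only a back-of-the-envelope helper: the function names are made up for illustration and are not Hadoop settings, and it counts JVM heap (Xmx) only, so real per-task memory use will be somewhat higher. The 7992 MB figure is the "Mem Total" from the free -m output in the thread.

```python
# Rough per-node task memory budget.
# Function and parameter names are hypothetical; they are not Hadoop configuration keys.


def worst_case_task_heap_mb(map_slots: int, reduce_slots: int, xmx_mb: int) -> int:
    """Heap the task JVMs could claim if every map and reduce slot is busy at once."""
    return (map_slots + reduce_slots) * xmx_mb


def headroom_mb(total_ram_mb: int, task_heap_mb: int) -> int:
    """RAM left for the datanode, region server and OS cache.

    A negative result means the task heaps alone exceed physical RAM.
    """
    return total_ram_mb - task_heap_mb


if __name__ == "__main__":
    total_ram_mb = 7992  # "Mem Total" reported by free -m in the thread
    # (map slots, reduce slots, Xmx in MB): the original and the reduced settings
    for maps, reduces, xmx in [(8, 8, 512), (4, 4, 256)]:
        heap = worst_case_task_heap_mb(maps, reduces, xmx)
        left = headroom_mb(total_ram_mb, heap)
        print(f"{maps} maps + {reduces} reduces @ {xmx}mb: "
              f"{heap} MB for tasks, {left} MB left for everything else")
```

With the original settings the task heaps alone already exceed the physical RAM on an 8GB box, which fits the swapping seen in vmstat; the reduced settings leave most of the RAM for the datanode, the region server and the OS cache.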
