OS cache is good, glad you figured out your memory problem.

J-D
On Thu, Jul 8, 2010 at 2:03 AM, Jamie Cockrill <jamie.cockr...@gmail.com> wrote:
> Morning all. Day 2 begins...
>
> I discussed this with someone else earlier and they pointed out that
> we also have tasktrackers running on all of those nodes, which will
> affect the amount of memory being used when jobs are running. Each
> tasktracker had a maximum of 8 maps and 8 reduces configured per node,
> with a JVM Xmx of 512MB each. Clearly this implies a fully utilised
> node will use 8*512MB + 8*512MB = 8GB of memory on tasks alone. That's
> before the datanode does anything, or HBase for that matter.
>
> As such, I've dropped it to 4 maps and 4 reduces per node and reduced the
> Xmx to 256MB, giving a potential maximum task overhead of 2GB per
> node. Running 'vmstat 20' now, under load from mapreduce jobs,
> suggests that the actual free memory is about the same, but the memory
> cache is much, much bigger, which is presumably healthier as, in
> theory, that ought to relinquish memory to processes that request it.
>
> Let's see if that does the trick!
>
> ta
>
> Jamie
>
> On 7 July 2010 19:30, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>> YouAreDead means that the region server's session expired; GC
>> seems like your major problem. (File problems can happen after a GC
>> sleep because blocks were moved around while the process was sleeping;
>> you also get the same kind of messages with the xcievers issue... sorry
>> for the confusion.)
>>
>> By overcommitting the memory I meant trying to fit too much stuff into
>> the amount of RAM that you have. I guess it's the map and reduce tasks
>> that eat all the free space? Why not lower their number?
>>
>> J-D
>>
>> On Wed, Jul 7, 2010 at 11:22 AM, Jamie Cockrill <jamie.cockr...@gmail.com> wrote:
>>> PS, I've now reset my MAX_FILESIZE back to the default (from the 1GB
>>> I raised it to).
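Jamie's slot arithmetic above is easy to sanity-check in the shell. The slot counts and Xmx values below are the ones from his mail; the mapred-site.xml property names in the comment are the usual 0.20-era ones, given as a pointer rather than verified against this exact CDH build:

```shell
# Worst-case heap committed to task JVMs = (map slots + reduce slots) * Xmx.
# Tuned via mapred.tasktracker.map.tasks.maximum,
# mapred.tasktracker.reduce.tasks.maximum and mapred.child.java.opts.
maps=8; reduces=8; xmx_mb=512
echo "old peak task heap: $(( (maps + reduces) * xmx_mb )) MB"   # 8192 MB
maps=4; reduces=4; xmx_mb=256
echo "new peak task heap: $(( (maps + reduces) * xmx_mb )) MB"   # 2048 MB
```

Note this is only the heap ceiling; each JVM also carries stack and native overhead on top of Xmx, so the real footprint is somewhat higher.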
>>> It caused me to run into a delightful
>>> 'YouAreDeadException', which looks very related to the garbage
>>> collection issues on the Troubleshooting page, as my ZooKeeper session
>>> expired.
>>>
>>> Thanks
>>>
>>> Jamie
>>>
>>> On 7 July 2010 19:19, Jamie Cockrill <jamie.cockr...@gmail.com> wrote:
>>>> By overcommit, do you mean make my overcommit_ratio higher on each box
>>>> (it's at the default 50 at the moment)? What I'm noticing at the moment
>>>> is that hadoop is taking up the vast majority of the memory on the
>>>> boxes.
>>>>
>>>> I found this article:
>>>> http://blog.rapleaf.com/dev/2010/01/05/the-wrath-of-drwho-or-unpredictable-hadoop-memory-usage/
>>>> which Todd, it looks like you replied to. Does this sound like a
>>>> similar problem? No worries if you can't remember, it was back in
>>>> January! The article suggests reducing the amount of memory allocated
>>>> to Hadoop at startup; how would I go about doing this?
>>>>
>>>> Thank you everyone for your patience so far. Sorry if this is taking
>>>> up a lot of your time.
>>>>
>>>> Thanks,
>>>>
>>>> Jamie
>>>>
>>>> On 7 July 2010 19:03, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>>>> Swappiness at 0 is good, but also don't overcommit your memory!
>>>>>
>>>>> J-D
>>>>>
>>>>> On Wed, Jul 7, 2010 at 10:53 AM, Jamie Cockrill <jamie.cockr...@gmail.com> wrote:
>>>>>> I think you're right.
>>>>>>
>>>>>> Unfortunately the machines are on a separate network to this laptop,
>>>>>> so I'm having to type everything across; apologies if it doesn't
>>>>>> translate well...
>>>>>>
>>>>>> free -m gave:
>>>>>>
>>>>>>                      total   used   free
>>>>>> Mem:                  7992   7939     53
>>>>>> -/+ buffers/cache:             114   7877
>>>>>> Swap:                23415    895  22519
>>>>>>
>>>>>> I did this on another node that isn't being smashed at the moment and
>>>>>> the numbers came out similar, but the buffers/cache free was higher.
>>>>>>
>>>>>> 'vmstat 20' is giving non-zero si and so values ranging between 3 and
>>>>>> just short of 5000.
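The second line of that `free -m` output is the one that matters: 7877MB is reclaimable page cache, so the box is not as full as the 53MB "free" figure suggests. A quick way to get the cache-adjusted figure directly, using standard /proc/meminfo fields on Linux (kernels of this era have no MemAvailable field, so it has to be summed by hand):

```shell
# MemFree alone understates what's available; buffers and page cache are
# reclaimable, so add them back in before deciding the node is out of RAM.
awk '/^(MemFree|Buffers|Cached):/ { sum += $2 }
     END { printf "%d MB effectively free\n", sum / 1024 }' /proc/meminfo
```

The non-zero si/so columns in vmstat are still the smoking gun, though: pages were going to and from swap regardless of how much cache was reclaimable at the time.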
>>>>>>
>>>>>> That seems to be it, I guess. The Hadoop troubleshooting docs suggest
>>>>>> setting swappiness to 0; is that just a case of changing the value in
>>>>>> /proc/sys/vm/swappiness?
>>>>>>
>>>>>> thanks
>>>>>>
>>>>>> Jamie
>>>>>>
>>>>>> On 7 July 2010 18:40, Todd Lipcon <t...@cloudera.com> wrote:
>>>>>>> On Wed, Jul 7, 2010 at 10:32 AM, Jamie Cockrill <jamie.cockr...@gmail.com> wrote:
>>>>>>>
>>>>>>>> On the subject of GC and heap, I've left those as defaults. I could
>>>>>>>> look at those if that's the next logical step? Would there be anything
>>>>>>>> in any of the logs that I should look at?
>>>>>>>>
>>>>>>>> One thing I have noticed is that it does take an absolute age to log
>>>>>>>> in to the DN/RS to restart the RS once it's fallen over; in one
>>>>>>>> instance it took about 10 minutes. These are 8GB, 4-core amd64 boxes.
>>>>>>>>
>>>>>>> That indicates swapping. Can you run "free -m" on the node?
>>>>>>>
>>>>>>> Also let "vmstat 20" run while running your job and observe the "si" and
>>>>>>> "so" columns. If those are nonzero, it indicates you're swapping, and you've
>>>>>>> oversubscribed your RAM (very easy on 8G machines).
>>>>>>>
>>>>>>> -Todd
>>>>>>>
>>>>>>>> ta
>>>>>>>>
>>>>>>>> Jamie
>>>>>>>>
>>>>>>>> On 7 July 2010 18:30, Jamie Cockrill <jamie.cockr...@gmail.com> wrote:
>>>>>>>> > Bad news: it looks like my xcievers setting is as it should be; it's in
>>>>>>>> > the hdfs-site.xml, and looking at the job.xml of one of my jobs in the
>>>>>>>> > jobtracker, it shows that property set to 2047. I've grepped one of
>>>>>>>> > the datanode logs and although there were a few in there, they were
>>>>>>>> > from a few months ago. I've upped MAX_FILESIZE on my table to 1GB to
>>>>>>>> > see if that helps (not sure if it will!).
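Changing swappiness is indeed just that file, though a durable change needs sysctl.conf as well. A minimal sketch of both forms, assuming a standard sysctl-based Linux box (both writes need root):

```shell
# Takes effect immediately, but is lost on reboot:
sudo sysctl -w vm.swappiness=0    # equivalent: echo 0 > /proc/sys/vm/swappiness
# Persist across reboots:
echo 'vm.swappiness = 0' | sudo tee -a /etc/sysctl.conf
# Verify the running value:
cat /proc/sys/vm/swappiness
```

swappiness=0 tells the kernel to reclaim page cache rather than swap out anonymous pages wherever possible; it does not disable swap outright.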
>>>>>>>> >
>>>>>>>> > Thanks,
>>>>>>>> >
>>>>>>>> > Jamie
>>>>>>>> >
>>>>>>>> > On 7 July 2010 18:12, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>>>>>>> >> xcievers exceptions will be in the datanodes' logs, and your problem
>>>>>>>> >> totally looks like it. 0.20.5 will have the same issue (since it's on
>>>>>>>> >> the HDFS side).
>>>>>>>> >>
>>>>>>>> >> J-D
>>>>>>>> >>
>>>>>>>> >> On Wed, Jul 7, 2010 at 10:08 AM, Jamie Cockrill <jamie.cockr...@gmail.com> wrote:
>>>>>>>> >>> Hi Todd & JD,
>>>>>>>> >>>
>>>>>>>> >>> Environment: all (Hadoop and HBase) installed as of karmic-cdh3, which means:
>>>>>>>> >>> Hadoop 0.20.2+228
>>>>>>>> >>> HBase 0.89.20100621+17
>>>>>>>> >>> ZooKeeper 3.3.1+7
>>>>>>>> >>>
>>>>>>>> >>> Unfortunately my whole cluster of regionservers has now crashed, so I
>>>>>>>> >>> can't really say if it was swapping too much. There is a DEBUG
>>>>>>>> >>> statement just before the crash saying:
>>>>>>>> >>>
>>>>>>>> >>> org.apache.hadoop.hbase.regionserver.wal.HLog: closing hlog writer in
>>>>>>>> >>> hdfs://<somewhere on my HDFS, in /hbase>
>>>>>>>> >>>
>>>>>>>> >>> What follows is:
>>>>>>>> >>>
>>>>>>>> >>> WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception:
>>>>>>>> >>> org.apache.hadoop.ipc.RemoteException:
>>>>>>>> >>> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease
>>>>>>>> >>> on <file location as above> File does not exist. Holder
>>>>>>>> >>> DFSClient_-11113603 does not have any open files
>>>>>>>> >>>
>>>>>>>> >>> It then seems to try some error recovery (Error Recovery for
>>>>>>>> >>> block null bad datanode[0] nodes == null), fails (Could not get block
>>>>>>>> >>> locations. Source file "<hbase file as before>" - Aborting). There is
>>>>>>>> >>> then an ERROR org.apache...HRegionServer: Close and delete failed,
>>>>>>>> >>> followed by a similar LeaseExpiredException as above.
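Since the spelling of the xceiver-limit message varies (the property name itself is misspelled upstream), a broad grep over the datanode logs is safer than searching for a single form. The log path below is a typical CDH default and may differ on this cluster; the sample message in the comment is the usual 0.20-era wording, quoted from memory:

```shell
# The limit-exceeded message reads roughly like:
#   xceiverCount 2049 exceeds the limit of concurrent xcievers 2047
# so match both spellings plus the limit phrase, case-insensitively:
grep -hiE 'xciever|xceiver|exceeds the limit' \
    /var/log/hadoop/*datanode*.log* 2>/dev/null | tail -n 20
```

No hits (as Jamie found) shifts suspicion back toward GC pauses and ZooKeeper session expiry rather than the xceiver cap.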
>>>>>>>> >>>
>>>>>>>> >>> There are then a couple of messages from HRegionServer saying that
>>>>>>>> >>> it's notifying the master of its shutdown and stopping itself. The
>>>>>>>> >>> shutdown hook then fires and the RemoteException and
>>>>>>>> >>> LeaseExpiredExceptions are printed again.
>>>>>>>> >>>
>>>>>>>> >>> ulimit is set to 65000 (it's in the regionserver log, printed as I
>>>>>>>> >>> restarted the regionserver), however I haven't got the xceivers set
>>>>>>>> >>> anywhere. I'll give that a go. It does seem very odd, as I did have a
>>>>>>>> >>> few of them fall over one at a time during a few early loads, but that
>>>>>>>> >>> seemed to be because the regions weren't splitting properly, so all
>>>>>>>> >>> the traffic was going to one node and it was being overwhelmed. Once I
>>>>>>>> >>> throttled it, after one load a region split seemed to get triggered,
>>>>>>>> >>> which flung regions all over and made subsequent loads much more
>>>>>>>> >>> distributed. However, perhaps the time bomb was ticking...
>>>>>>>> >>> I'll have a go at specifying the xcievers property. I'm pretty
>>>>>>>> >>> certain I've got everything else covered, except the patches
>>>>>>>> >>> referenced in the JIRA.
>>>>>>>> >>>
>>>>>>>> >>> I just grepped some of the log files and didn't get an explicit
>>>>>>>> >>> exception with 'xciever' in it.
>>>>>>>> >>>
>>>>>>>> >>> I am considering downgrading(?) to 0.20.5, however because everything
>>>>>>>> >>> is installed as per karmic-cdh3, I'm a bit reluctant to do so, as
>>>>>>>> >>> presumably Cloudera has tested each of these versions against the
>>>>>>>> >>> others? And I don't really want to introduce further versioning
>>>>>>>> >>> issues.
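The two settings J-D keeps pointing at can be checked together on each node. A sketch, assuming the CDH default config path of /etc/hadoop/conf (adjust if the cluster's configs live elsewhere):

```shell
# File-descriptor limit for the user running the datanode/regionserver;
# the HBase requirements page recommends far more than the stock 1024:
ulimit -n

# Confirm dfs.datanode.max.xcievers (misspelled upstream, so spell it the
# same way) is actually present in hdfs-site.xml on each datanode:
grep -B1 -A2 'dfs.datanode.max.xcievers' /etc/hadoop/conf/hdfs-site.xml
```

Both matter per-node: a value set on the client or jobtracker side does nothing for a datanode that never read it, which is why checking the live job.xml (as Jamie did) plus each datanode's own config is the thorough version.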
>>>>>>>> >>>
>>>>>>>> >>> Thanks,
>>>>>>>> >>>
>>>>>>>> >>> Jamie
>>>>>>>> >>>
>>>>>>>> >>> On 7 July 2010 17:30, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>>>>>>> >>>> Jamie,
>>>>>>>> >>>>
>>>>>>>> >>>> Does your configuration meet the requirements?
>>>>>>>> >>>> http://hbase.apache.org/docs/r0.20.5/api/overview-summary.html#requirements
>>>>>>>> >>>>
>>>>>>>> >>>> ulimit and xcievers, if not set, are usually time bombs that blow up
>>>>>>>> >>>> when the cluster is under load.
>>>>>>>> >>>>
>>>>>>>> >>>> J-D
>>>>>>>> >>>>
>>>>>>>> >>>> On Wed, Jul 7, 2010 at 9:11 AM, Jamie Cockrill <jamie.cockr...@gmail.com> wrote:
>>>>>>>> >>>>
>>>>>>>> >>>>> Dear all,
>>>>>>>> >>>>>
>>>>>>>> >>>>> My current HBase/Hadoop architecture has HBase region servers on the
>>>>>>>> >>>>> same physical boxes as the HDFS datanodes. I'm getting an awful lot
>>>>>>>> >>>>> of region server crashes. The last thing that happens appears to be a
>>>>>>>> >>>>> DroppedSnapshotException, caused by an IOException: could not
>>>>>>>> >>>>> complete write to file <file on HDFS>. I am running it under load;
>>>>>>>> >>>>> I'm not sure how to quantify how heavy, but I'm guessing it is a
>>>>>>>> >>>>> load issue.
>>>>>>>> >>>>>
>>>>>>>> >>>>> Is it common practice to put region servers on datanodes? Is it
>>>>>>>> >>>>> common to see region server crashes when either HDFS or the region
>>>>>>>> >>>>> server (or both) is under heavy load? I'm guessing that is the case,
>>>>>>>> >>>>> as I've seen a few similar posts. I've not got a great deal of
>>>>>>>> >>>>> capacity to separate region servers from HDFS datanodes, but it
>>>>>>>> >>>>> might be an argument I could make.
>>>>>>>> >>>>>
>>>>>>>> >>>>> Thanks
>>>>>>>> >>>>>
>>>>>>>> >>>>> Jamie
>>>>>>>
>>>>>>> --
>>>>>>> Todd Lipcon
>>>>>>> Software Engineer, Cloudera