OS cache is good, glad you figured out your memory problem.

J-D
On Thu, Jul 8, 2010 at 2:03 AM, Jamie Cockrill <jamie.cockr...@gmail.com> wrote:
> Morning all. Day 2 begins...
>
> I discussed this with someone else earlier and they pointed out that
> we also have tasktrackers running on all of those nodes, which will
> affect the amount of memory being used when jobs are running. Each
> tasktracker had a maximum of 8 maps and 8 reduces configured per node,
> with a JVM Xmx of 512MB each. Clearly this implies a fully utilised
> node will use 8*512MB + 8*512MB = 8GB of memory on tasks alone. That's
> before the datanode does anything, or HBase for that matter.
>
> As such, I've dropped it to 4 maps and 4 reduces per node and reduced the
> Xmx to 256MB, giving a potential maximum task overhead of 2GB per
> node. Running 'vmstat 20' now, under load from mapreduce jobs,
> suggests that the actual free memory is about the same, but the memory
> cache is much, much bigger, which is presumably healthier as, in
> theory, that ought to relinquish memory to processes that request it.
>
> Let's see if that does the trick!
>
> ta
>
> Jamie
>
> On 7 July 2010 19:30, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>> YouAreDead means that the region server's session expired; GC
>> seems like your major problem. (File problems can happen after a GC
>> sleep because blocks were moved around while the process was sleeping;
>> you also get the same kind of messages with the xcievers issue... sorry
>> for the confusion.)
>>
>> By overcommitting the memory I meant trying to fit too much stuff into
>> the amount of RAM that you have. I guess it's the map and reduce tasks
>> that eat all the free space? Why not lower their number?
>>
>> J-D
>>
>> On Wed, Jul 7, 2010 at 11:22 AM, Jamie Cockrill <jamie.cockr...@gmail.com> wrote:
>>> PS, I've now reset my MAX_FILESIZE back to the default (from the 1GB
>>> I raised it to).
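Jamie's slot arithmetic above is easy to sanity-check in the shell. The slot counts and Xmx values below are the ones from his mail; the mapred-site.xml property names in the comment are the usual 0.20-era ones, given as a pointer rather than verified against this exact CDH build:

```shell
# Worst-case heap committed to task JVMs = (map slots + reduce slots) * Xmx.
# Tuned via mapred.tasktracker.map.tasks.maximum,
# mapred.tasktracker.reduce.tasks.maximum and mapred.child.java.opts.
maps=8; reduces=8; xmx_mb=512
echo "old peak task heap: $(( (maps + reduces) * xmx_mb )) MB"   # 8192 MB
maps=4; reduces=4; xmx_mb=256
echo "new peak task heap: $(( (maps + reduces) * xmx_mb )) MB"   # 2048 MB
```

Note this is only the heap ceiling; each JVM also carries stack and native overhead on top of Xmx, so the real footprint is somewhat higher.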
>>> It caused me to run into a delightful
>>> 'YouAreDeadException', which looks very related to the garbage
>>> collection issues on the Troubleshooting page, as my ZooKeeper session
>>> expired.
>>>
>>> Thanks
>>>
>>> Jamie
>>>
>>> On 7 July 2010 19:19, Jamie Cockrill <jamie.cockr...@gmail.com> wrote:
>>>> By overcommit, do you mean make my overcommit_ratio higher on each box
>>>> (it's at the default 50 at the moment)? What I'm noticing at the moment
>>>> is that hadoop is taking up the vast majority of the memory on the
>>>> boxes.
>>>>
>>>> I found this article:
>>>> http://blog.rapleaf.com/dev/2010/01/05/the-wrath-of-drwho-or-unpredictable-hadoop-memory-usage/
>>>> which Todd, it looks like you replied to. Does this sound like a
>>>> similar problem? No worries if you can't remember, it was back in
>>>> January! The article suggests reducing the amount of memory allocated
>>>> to Hadoop at startup; how would I go about doing this?
>>>>
>>>> Thank you everyone for your patience so far. Sorry if this is taking
>>>> up a lot of your time.
>>>>
>>>> Thanks,
>>>>
>>>> Jamie
>>>>
>>>> On 7 July 2010 19:03, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>>>> Swappiness at 0 is good, but also don't overcommit your memory!
>>>>>
>>>>> J-D
>>>>>
>>>>> On Wed, Jul 7, 2010 at 10:53 AM, Jamie Cockrill <jamie.cockr...@gmail.com> wrote:
>>>>>> I think you're right.
>>>>>>
>>>>>> Unfortunately the machines are on a separate network to this laptop,
>>>>>> so I'm having to type everything across; apologies if it doesn't
>>>>>> translate well...
>>>>>>
>>>>>> free -m gave:
>>>>>>
>>>>>>                      total   used   free
>>>>>> Mem:                  7992   7939     53
>>>>>> -/+ buffers/cache:             114   7877
>>>>>> Swap:                23415    895  22519
>>>>>>
>>>>>> I did this on another node that isn't being smashed at the moment and
>>>>>> the numbers came out similar, but the buffers/cache free was higher.
>>>>>>
>>>>>> 'vmstat 20' is giving non-zero si and so values ranging between 3 and
>>>>>> just short of 5000.
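The second line of that `free -m` output is the one that matters: 7877MB is reclaimable page cache, so the box is not as full as the 53MB "free" figure suggests. A quick way to get the cache-adjusted figure directly, using standard /proc/meminfo fields on Linux (kernels of this era have no MemAvailable field, so it has to be summed by hand):

```shell
# MemFree alone understates what's available; buffers and page cache are
# reclaimable, so add them back in before deciding the node is out of RAM.
awk '/^(MemFree|Buffers|Cached):/ { sum += $2 }
     END { printf "%d MB effectively free\n", sum / 1024 }' /proc/meminfo
```

The non-zero si/so columns in vmstat are still the smoking gun, though: pages were going to and from swap regardless of how much cache was reclaimable at the time.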
>>>>>>
>>>>>> That seems to be it, I guess. The Hadoop troubleshooting docs suggest
>>>>>> setting swappiness to 0; is that just a case of changing the value in
>>>>>> /proc/sys/vm/swappiness?
>>>>>>
>>>>>> thanks
>>>>>>
>>>>>> Jamie
>>>>>>
>>>>>> On 7 July 2010 18:40, Todd Lipcon <t...@cloudera.com> wrote:
>>>>>>> On Wed, Jul 7, 2010 at 10:32 AM, Jamie Cockrill <jamie.cockr...@gmail.com> wrote:
>>>>>>>
>>>>>>>> On the subject of GC and heap, I've left those as defaults. I could
>>>>>>>> look at those if that's the next logical step? Would there be anything
>>>>>>>> in any of the logs that I should look at?
>>>>>>>>
>>>>>>>> One thing I have noticed is that it does take an absolute age to log
>>>>>>>> in to the DN/RS to restart the RS once it's fallen over; in one
>>>>>>>> instance it took about 10 minutes. These are 8GB, 4-core amd64 boxes.
>>>>>>>>
>>>>>>> That indicates swapping. Can you run "free -m" on the node?
>>>>>>>
>>>>>>> Also let "vmstat 20" run while running your job and observe the "si" and
>>>>>>> "so" columns. If those are nonzero, it indicates you're swapping, and you've
>>>>>>> oversubscribed your RAM (very easy on 8G machines).
>>>>>>>
>>>>>>> -Todd
>>>>>>>
>>>>>>>> ta
>>>>>>>>
>>>>>>>> Jamie
>>>>>>>>
>>>>>>>> On 7 July 2010 18:30, Jamie Cockrill <jamie.cockr...@gmail.com> wrote:
>>>>>>>> > Bad news: it looks like my xcievers setting is as it should be; it's in
>>>>>>>> > the hdfs-site.xml, and looking at the job.xml of one of my jobs in the
>>>>>>>> > jobtracker, it shows that property set to 2047. I've grepped one of
>>>>>>>> > the datanode logs and although there were a few in there, they were
>>>>>>>> > from a few months ago. I've upped MAX_FILESIZE on my table to 1GB to
>>>>>>>> > see if that helps (not sure if it will!).
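Changing swappiness is indeed just that file, though a durable change needs sysctl.conf as well. A minimal sketch of both forms, assuming a standard sysctl-based Linux box (both writes need root):

```shell
# Takes effect immediately, but is lost on reboot:
sudo sysctl -w vm.swappiness=0    # equivalent: echo 0 > /proc/sys/vm/swappiness
# Persist across reboots:
echo 'vm.swappiness = 0' | sudo tee -a /etc/sysctl.conf
# Verify the running value:
cat /proc/sys/vm/swappiness
```

swappiness=0 tells the kernel to reclaim page cache rather than swap out anonymous pages wherever possible; it does not disable swap outright.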
>>>>>>>> >
>>>>>>>> > Thanks,
>>>>>>>> >
>>>>>>>> > Jamie
>>>>>>>> >
>>>>>>>> > On 7 July 2010 18:12, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>>>>>>> >> xcievers exceptions will be in the datanodes' logs, and your problem
>>>>>>>> >> totally looks like it. 0.20.5 will have the same issue (since it's on
>>>>>>>> >> the HDFS side).
>>>>>>>> >>
>>>>>>>> >> J-D
>>>>>>>> >>
>>>>>>>> >> On Wed, Jul 7, 2010 at 10:08 AM, Jamie Cockrill <jamie.cockr...@gmail.com> wrote:
>>>>>>>> >>> Hi Todd & JD,
>>>>>>>> >>>
>>>>>>>> >>> Environment: all (Hadoop and HBase) installed as of karmic-cdh3, which means:
>>>>>>>> >>> Hadoop 0.20.2+228
>>>>>>>> >>> HBase 0.89.20100621+17
>>>>>>>> >>> ZooKeeper 3.3.1+7
>>>>>>>> >>>
>>>>>>>> >>> Unfortunately my whole cluster of regionservers has now crashed, so I
>>>>>>>> >>> can't really say if it was swapping too much. There is a DEBUG
>>>>>>>> >>> statement just before the crash saying:
>>>>>>>> >>>
>>>>>>>> >>> org.apache.hadoop.hbase.regionserver.wal.HLog: closing hlog writer in
>>>>>>>> >>> hdfs://<somewhere on my HDFS, in /hbase>
>>>>>>>> >>>
>>>>>>>> >>> What follows is:
>>>>>>>> >>>
>>>>>>>> >>> WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception:
>>>>>>>> >>> org.apache.hadoop.ipc.RemoteException:
>>>>>>>> >>> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease
>>>>>>>> >>> on <file location as above> File does not exist. Holder
>>>>>>>> >>> DFSClient_-11113603 does not have any open files
>>>>>>>> >>>
>>>>>>>> >>> It then seems to try some error recovery (Error Recovery for
>>>>>>>> >>> block null bad datanode[0] nodes == null), fails (Could not get block
>>>>>>>> >>> locations. Source file "<hbase file as before>" - Aborting). There is
>>>>>>>> >>> then an ERROR org.apache...HRegionServer: Close and delete failed,
>>>>>>>> >>> followed by a similar LeaseExpiredException as above.
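Since the spelling of the xceiver-limit message varies (the property name itself is misspelled upstream), a broad grep over the datanode logs is safer than searching for a single form. The log path below is a typical CDH default and may differ on this cluster; the sample message in the comment is the usual 0.20-era wording, quoted from memory:

```shell
# The limit-exceeded message reads roughly like:
#   xceiverCount 2049 exceeds the limit of concurrent xcievers 2047
# so match both spellings plus the limit phrase, case-insensitively:
grep -hiE 'xciever|xceiver|exceeds the limit' \
    /var/log/hadoop/*datanode*.log* 2>/dev/null | tail -n 20
```

No hits (as Jamie found) shifts suspicion back toward GC pauses and ZooKeeper session expiry rather than the xceiver cap.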
>>>>>>>> >>>
>>>>>>>> >>> There are then a couple of messages from HRegionServer saying that
>>>>>>>> >>> it's notifying the master of its shutdown and stopping itself. The
>>>>>>>> >>> shutdown hook then fires and the RemoteException and
>>>>>>>> >>> LeaseExpiredExceptions are printed again.
>>>>>>>> >>>
>>>>>>>> >>> ulimit is set to 65000 (it's in the regionserver log, printed as I
>>>>>>>> >>> restarted the regionserver), however I haven't got the xceivers set
>>>>>>>> >>> anywhere. I'll give that a go. It does seem very odd, as I did have a
>>>>>>>> >>> few of them fall over one at a time during a few early loads, but that
>>>>>>>> >>> seemed to be because the regions weren't splitting properly, so all
>>>>>>>> >>> the traffic was going to one node and it was being overwhelmed. Once I
>>>>>>>> >>> throttled it, after one load a region split seemed to get triggered,
>>>>>>>> >>> which flung regions all over and made subsequent loads much more
>>>>>>>> >>> distributed. However, perhaps the time bomb was ticking...
>>>>>>>> >>> I'll have a go at specifying the xcievers property. I'm pretty
>>>>>>>> >>> certain I've got everything else covered, except the patches
>>>>>>>> >>> referenced in the JIRA.
>>>>>>>> >>>
>>>>>>>> >>> I just grepped some of the log files and didn't get an explicit
>>>>>>>> >>> exception with 'xciever' in it.
>>>>>>>> >>>
>>>>>>>> >>> I am considering downgrading(?) to 0.20.5, however because everything
>>>>>>>> >>> is installed as per karmic-cdh3, I'm a bit reluctant to do so, as
>>>>>>>> >>> presumably Cloudera has tested each of these versions against the
>>>>>>>> >>> others? And I don't really want to introduce further versioning
>>>>>>>> >>> issues.
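The two settings J-D keeps pointing at can be checked together on each node. A sketch, assuming the CDH default config path of /etc/hadoop/conf (adjust if the cluster's configs live elsewhere):

```shell
# File-descriptor limit for the user running the datanode/regionserver;
# the HBase requirements page recommends far more than the stock 1024:
ulimit -n

# Confirm dfs.datanode.max.xcievers (misspelled upstream, so spell it the
# same way) is actually present in hdfs-site.xml on each datanode:
grep -B1 -A2 'dfs.datanode.max.xcievers' /etc/hadoop/conf/hdfs-site.xml
```

Both matter per-node: a value set on the client or jobtracker side does nothing for a datanode that never read it, which is why checking the live job.xml (as Jamie did) plus each datanode's own config is the thorough version.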
>>>>>>>> >>>
>>>>>>>> >>> Thanks,
>>>>>>>> >>>
>>>>>>>> >>> Jamie
>>>>>>>> >>>
>>>>>>>> >>> On 7 July 2010 17:30, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>>>>>>> >>>> Jamie,
>>>>>>>> >>>>
>>>>>>>> >>>> Does your configuration meet the requirements?
>>>>>>>> >>>> http://hbase.apache.org/docs/r0.20.5/api/overview-summary.html#requirements
>>>>>>>> >>>>
>>>>>>>> >>>> ulimit and xcievers, if not set, are usually time bombs that blow up
>>>>>>>> >>>> when the cluster is under load.
>>>>>>>> >>>>
>>>>>>>> >>>> J-D
>>>>>>>> >>>>
>>>>>>>> >>>> On Wed, Jul 7, 2010 at 9:11 AM, Jamie Cockrill <jamie.cockr...@gmail.com> wrote:
>>>>>>>> >>>>
>>>>>>>> >>>>> Dear all,
>>>>>>>> >>>>>
>>>>>>>> >>>>> My current HBase/Hadoop architecture has HBase region servers on the
>>>>>>>> >>>>> same physical boxes as the HDFS datanodes. I'm getting an awful lot
>>>>>>>> >>>>> of region server crashes. The last thing that happens appears to be a
>>>>>>>> >>>>> DroppedSnapshotException, caused by an IOException: could not
>>>>>>>> >>>>> complete write to file <file on HDFS>. I am running it under load;
>>>>>>>> >>>>> I'm not sure how to quantify how heavy, but I'm guessing it is a
>>>>>>>> >>>>> load issue.
>>>>>>>> >>>>>
>>>>>>>> >>>>> Is it common practice to put region servers on datanodes? Is it
>>>>>>>> >>>>> common to see region server crashes when either HDFS or the region
>>>>>>>> >>>>> server (or both) is under heavy load? I'm guessing that is the case,
>>>>>>>> >>>>> as I've seen a few similar posts. I've not got a great deal of
>>>>>>>> >>>>> capacity to separate region servers from HDFS datanodes, but it
>>>>>>>> >>>>> might be an argument I could make.
>>>>>>>> >>>>>
>>>>>>>> >>>>> Thanks
>>>>>>>> >>>>>
>>>>>>>> >>>>> Jamie
>>>>>>>
>>>>>>> --
>>>>>>> Todd Lipcon
>>>>>>> Software Engineer, Cloudera