Hi,
I also added the swap space and the algorithm ran a bit longer, but after
some time I faced the same problem again. I am on holiday and will
investigate the issue in the next few days. I will keep you guys updated.

Regards,
Behroz



On Fri, Sep 11, 2015 at 6:36 AM, Edward J. Yoon <[email protected]>
wrote:

> Hi, I think you have to add some swap space. Did you figure out
> what the problem is?
>
> On Fri, Sep 4, 2015 at 8:20 AM, Behroz Sikander <[email protected]>
> wrote:
> > More info on this:
> > I noticed that only 2 machines were failing with OutOfMemory. After
> > messing around, I found out that the swap space was 0 on these 2
> > machines, but the others had 1 GB of swap. I added swap to these
> > machines and it worked. But, as expected, in the next run of the
> > algorithm with more data it crashed again. This time the
> > GroomChildProcess crashed with the following log message:
> >
> >
> > OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000007fa100000, 42467328, 0) failed; error='Cannot allocate memory' (errno=12)
> > #
> > # There is insufficient memory for the Java Runtime Environment to continue.
> > # Native memory allocation (malloc) failed to allocate 42467328 bytes for committing reserved memory.
> > # An error report file with more information is saved as:
> > # /home/behroz/Documents/Packages/tmp_data/hama_tmp/bsp/local/groomServer/attempt_201509040050_0004_000006_0/work/hs_err_pid28850.log
> >
> > My slave machines have 8 GB of RAM, 4 CPUs, a 20 GB hard drive and 1 GB
> > of swap. I run 3 groom child processes, each taking 2 GB of RAM. Apart
> > from the GroomChildProcess, I have the GroomServer, DataNode and
> > TaskManager running on each slave. After assigning 2 GB of RAM to each
> > of the 3 groom child processes (6 GB in total), only 2 GB of RAM is
> > left for everything else. Do you think this is the problem?
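> >
> > One thing I could try (assuming the per-child heap really is set by the
> > bsp.child.java.opts property that hama-default.xml uses) is to lower
> > the child heap so the three tasks plus the daemons fit into 8 GB, e.g.
> > in hama-site.xml:
> >
> >   <property>
> >     <name>bsp.child.java.opts</name>
> >     <value>-Xmx1024m</value>
> >   </property>
> >
> > With 3 tasks x 1 GB, roughly 5 GB would be left for the GroomServer,
> > DataNode, TaskManager and the OS.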
> >
> > Regards,
> > Behroz
> >
> > On Thu, Sep 3, 2015 at 11:39 PM, Behroz Sikander <[email protected]>
> wrote:
> >
> >> OK, I found something strange: a new file named "hs_err_pid4919.log"
> >> inside the $HADOOP_HOME directory.
> >>
> >> The contents of the file are:
> >>
> >> #   Increase physical memory or swap space
> >> #   Check if swap backing store is full
> >> #   Use 64 bit Java on a 64 bit OS
> >> #   Decrease Java heap size (-Xmx/-Xms)
> >> #   Decrease number of Java threads
> >> #   Decrease Java thread stack sizes (-Xss)
> >> #   Set larger code cache with -XX:ReservedCodeCacheSize=
> >> # This output file may be truncated or incomplete.
> >> #
> >> #  Out of Memory Error (os_linux.cpp:2809), pid=4919, tid=140564483778304
> >> #
> >> # JRE version: OpenJDK Runtime Environment (7.0_79-b14) (build 1.7.0_79-b14)
> >> # Java VM: OpenJDK 64-Bit Server VM (24.79-b02 mixed mode linux-amd64 compressed oops)
> >> # Derivative: IcedTea 2.5.6
> >> # Distribution: Ubuntu 14.04 LTS, package 7u79-2.5.6-0ubuntu1.14.04.1
> >> # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
> >> #
> >>
> >> ---------------  T H R E A D  ---------------
> >>
> >> Current thread (0x00007fd7c0438800):  JavaThread "PacketResponder:
> >> BP-1786576942-141.40.254.14-1441293753577:blk_1074136820_396012,
> >> type=HAS_DOWNSTREAM_IN_PIPELINE" daemon [_thread_new, id=11943,
> >> stack(0x00007fd7b80fa000,0x00007fd7b81fb000)]
> >>
> >> Stack: [0x00007fd7b80fa000,0x00007fd7b81fb000],  sp=0x00007fd7b81f9be0,
> >> free space=1022k
> >> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> >>
> >> I think my DataNode process is crashing. I now know that it is an
> >> out-of-memory error, but I am not sure of the reason.
> >>
> >> On Thu, Sep 3, 2015 at 10:25 PM, Behroz Sikander <[email protected]>
> >> wrote:
> >>
> >>> OK. HA = High Availability?
> >>>
> >>> I am also trying to solve the following problem, but I do not
> >>> understand why I get the exception, because my algorithm does not send
> >>> a lot of data to the master:
> >>> 'BSP task process exit with nonzero status of 1'
> >>>
> >>> Each slave node processes some data and sends a double array of size
> >>> 96 back to the master machine. Recently, I was testing the algorithm
> >>> on 8000 files when it crashed. This means that 8000 double arrays of
> >>> size 96 are sent to the master for processing. Once the master
> >>> receives all the data, it comes out of the sync barrier and starts
> >>> the processing. Here is the calculation:
> >>>
> >>> 8000 * 96 * 8 (size of a double) = 6,144,000 bytes = ~6.14 MB
> >>>
> >>> I am not sure, but this does not seem like a lot of data, and I think
> >>> the message manager that you mentioned should be able to handle it.
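> >>>
> >>> As a quick sanity check on that arithmetic (a throwaway Java class;
> >>> the name is made up):
> >>>
> >>>   public class MessageVolume {
> >>>     public static void main(String[] args) {
> >>>       // 8000 messages x 96 doubles x 8 bytes per double
> >>>       long bytes = 8000L * 96 * 8;
> >>>       System.out.println(bytes + " bytes = " + bytes / 1e6 + " MB");
> >>>       // prints: 6144000 bytes = 6.144 MB
> >>>     }
> >>>   }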
> >>>
> >>> Regards,
> >>> Behroz
> >>>
> >>> On Tue, Sep 1, 2015 at 1:07 PM, Edward J. Yoon <[email protected]>
> >>> wrote:
> >>>
> >>>> I'm reading the GroomServer code and its taskMonitorService. It seems
> >>>> related to cluster HA.
> >>>>
> >>>> On Sat, Aug 29, 2015 at 1:16 PM, Edward J. Yoon <
> [email protected]>
> >>>> wrote:
> >>>> >> If my Groom Child Process fails for some reason, the processes
> >>>> >> are not killed automatically
> >>>> >
> >>>> > I also experienced this problem before. I guess that if one of the
> >>>> > processes crashes with OutOfMemory, the other processes wait for it
> >>>> > indefinitely. This is a bug.
> >>>> >
> >>>> > On Sat, Aug 29, 2015 at 1:02 AM, Behroz Sikander <
> [email protected]>
> >>>> wrote:
> >>>> >> Just another quick question: if my Groom Child Process fails for
> >>>> >> some reason, the processes are not killed automatically. If I run
> >>>> >> the jps command, I can still see something like "3791
> >>>> >> GroomServer$BSPPeerChild". Is this the expected behavior?
> >>>> >>
> >>>> >> I am using the latest Hama version (0.7.0).
> >>>> >> Regards,
> >>>> >> Behroz
> >>>> >>
> >>>> >> On Fri, Aug 28, 2015 at 4:12 PM, Behroz Sikander <
> [email protected]>
> >>>> wrote:
> >>>> >>
> >>>> >>> OK, I will try it out.
> >>>> >>>
> >>>> >>> No, actually I am learning a lot by facing these problems. It is
> >>>> >>> actually a good thing :D
> >>>> >>>
> >>>> >>> Regards,
> >>>> >>> Behroz
> >>>> >>>
> >>>> >>> On Fri, Aug 28, 2015 at 5:52 AM, Edward J. Yoon <
> >>>> [email protected]>
> >>>> >>> wrote:
> >>>> >>>
> >>>> >>>> > message managers. Hmmm, I will recheck my logic related to
> >>>> >>>> > messages. Btw
> >>>> >>>>
> >>>> >>>> Serialization (like GraphJobMessage) is a good idea. It stores
> >>>> >>>> multiple messages in serialized form in a single object to reduce
> >>>> >>>> memory usage and RPC overhead.
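> >>>> >>>>
> >>>> >>>> For illustration, a minimal sketch of the bundling idea (the
> >>>> >>>> DoubleArrayBundle class below is made up for this example; it is
> >>>> >>>> not Hama's actual GraphJobMessage):
> >>>> >>>>
> >>>> >>>>   import java.io.DataInput;
> >>>> >>>>   import java.io.DataOutput;
> >>>> >>>>   import java.io.IOException;
> >>>> >>>>   import java.util.ArrayList;
> >>>> >>>>   import java.util.List;
> >>>> >>>>   import org.apache.hadoop.io.Writable;
> >>>> >>>>
> >>>> >>>>   // Packs many double[] messages into a single Writable so they
> >>>> >>>>   // travel as one serialized object instead of thousands of
> >>>> >>>>   // small ones.
> >>>> >>>>   public class DoubleArrayBundle implements Writable {
> >>>> >>>>     private final List<double[]> arrays = new ArrayList<double[]>();
> >>>> >>>>
> >>>> >>>>     public void add(double[] a) { arrays.add(a); }
> >>>> >>>>
> >>>> >>>>     @Override
> >>>> >>>>     public void write(DataOutput out) throws IOException {
> >>>> >>>>       out.writeInt(arrays.size());
> >>>> >>>>       for (double[] a : arrays) {
> >>>> >>>>         out.writeInt(a.length);
> >>>> >>>>         for (double d : a) out.writeDouble(d);
> >>>> >>>>       }
> >>>> >>>>     }
> >>>> >>>>
> >>>> >>>>     @Override
> >>>> >>>>     public void readFields(DataInput in) throws IOException {
> >>>> >>>>       arrays.clear();
> >>>> >>>>       int n = in.readInt();
> >>>> >>>>       for (int i = 0; i < n; i++) {
> >>>> >>>>         double[] a = new double[in.readInt()];
> >>>> >>>>         for (int j = 0; j < a.length; j++) a[j] = in.readDouble();
> >>>> >>>>         arrays.add(a);
> >>>> >>>>       }
> >>>> >>>>     }
> >>>> >>>>   }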
> >>>> >>>>
> >>>> >>>> > what is the limit of these message managers? How much data
> >>>> >>>> > can they handle at a single time?
> >>>> >>>>
> >>>> >>>> It depends on memory.
> >>>> >>>>
> >>>> >>>> > P.S. Each day, as I am moving towards a bigger cluster, I am
> >>>> >>>> > running into more problems (a lot of them :D).
> >>>> >>>>
> >>>> >>>> Haha, sorry for the inconvenience, and thanks for your reports.
> >>>> >>>>
> >>>> >>>> On Fri, Aug 28, 2015 at 11:25 AM, Behroz Sikander <
> >>>> [email protected]>
> >>>> >>>> wrote:
> >>>> >>>> > OK. So, I do have a memory problem. I will try to scale out.
> >>>> >>>> >
> >>>> >>>> >> Each task processor has two message managers, one for outgoing
> >>>> >>>> >> and one for incoming. All of these are handled in memory, so
> >>>> >>>> >> they sometimes require a large memory space.
> >>>> >>>> >
> >>>> >>>> > So, you mean that before barrier synchronization I have a lot
> >>>> >>>> > of data sitting in the message managers. Hmmm, I will recheck
> >>>> >>>> > my logic related to messages. Btw, what is the limit of these
> >>>> >>>> > message managers? How much data can they handle at a single
> >>>> >>>> > time?
> >>>> >>>> >
> >>>> >>>> > P.S. Each day, as I am moving towards a bigger cluster, I am
> >>>> >>>> > running into more problems (a lot of them :D).
> >>>> >>>> >
> >>>> >>>> > Regards,
> >>>> >>>> > Behroz Sikander
> >>>> >>>> >
> >>>> >>>> > On Fri, Aug 28, 2015 at 4:04 AM, Edward J. Yoon <
> >>>> [email protected]>
> >>>> >>>> > wrote:
> >>>> >>>> >
> >>>> >>>> >> > for the 3 groom child processes + 2 GB for the Ubuntu OS).
> >>>> >>>> >> > Is this the correct understanding?
> >>>> >>>> >>
> >>>> >>>> >> and,
> >>>> >>>> >>
> >>>> >>>> >> > on a big dataset. I think these exceptions have something to
> >>>> >>>> >> > do with the Ubuntu OS killing the hama process due to lack
> >>>> >>>> >> > of memory. So, I was curious about
> >>>> >>>> >>
> >>>> >>>> >> Yes, you're right.
> >>>> >>>> >>
> >>>> >>>> >> Each task processor has two message managers, one for outgoing
> >>>> >>>> >> and one for incoming. All of these are handled in memory, so
> >>>> >>>> >> they sometimes require a large memory space. To solve the
> >>>> >>>> >> OutOfMemory issue, you should scale out your cluster by
> >>>> >>>> >> increasing the number of nodes and job tasks, or optimize your
> >>>> >>>> >> algorithm. Another option would be a disk-spillable message
> >>>> >>>> >> manager, but that is not supported yet.
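> >>>> >>>> >>
> >>>> >>>> >> To illustrate where the two managers sit, a minimal sketch of
> >>>> >>>> >> the usual send/sync/receive pattern (the class name and the
> >>>> >>>> >> value sent are made up for the example):
> >>>> >>>> >>
> >>>> >>>> >>   import java.io.IOException;
> >>>> >>>> >>   import org.apache.hadoop.io.DoubleWritable;
> >>>> >>>> >>   import org.apache.hadoop.io.NullWritable;
> >>>> >>>> >>   import org.apache.hama.bsp.BSP;
> >>>> >>>> >>   import org.apache.hama.bsp.BSPPeer;
> >>>> >>>> >>   import org.apache.hama.bsp.sync.SyncException;
> >>>> >>>> >>
> >>>> >>>> >>   public class SendSyncSketch extends BSP<NullWritable,
> >>>> >>>> >>       NullWritable, NullWritable, NullWritable, DoubleWritable> {
> >>>> >>>> >>     @Override
> >>>> >>>> >>     public void bsp(BSPPeer<NullWritable, NullWritable,
> >>>> >>>> >>         NullWritable, NullWritable, DoubleWritable> peer)
> >>>> >>>> >>         throws IOException, SyncException, InterruptedException {
> >>>> >>>> >>       String master = peer.getPeerName(0);
> >>>> >>>> >>       // Queued in the outgoing message manager until sync().
> >>>> >>>> >>       peer.send(master, new DoubleWritable(42.0));
> >>>> >>>> >>       peer.sync(); // barrier; messages are delivered here
> >>>> >>>> >>       if (peer.getPeerName().equals(master)) {
> >>>> >>>> >>         DoubleWritable msg;
> >>>> >>>> >>         // Drain the incoming message manager.
> >>>> >>>> >>         while ((msg = peer.getCurrentMessage()) != null) {
> >>>> >>>> >>           // process msg ...
> >>>> >>>> >>         }
> >>>> >>>> >>       }
> >>>> >>>> >>     }
> >>>> >>>> >>   }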
> >>>> >>>> >>
> >>>> >>>> >> On Fri, Aug 28, 2015 at 10:45 AM, Behroz Sikander <
> >>>> [email protected]>
> >>>> >>>> >> wrote:
> >>>> >>>> >> > Hi,
> >>>> >>>> >> > Yes. According to hama-default.xml, each machine will open 3
> >>>> >>>> >> > processes with 2 GB of memory each. This means that my VMs
> >>>> >>>> >> > need at least 8 GB of memory (2 GB each for the 3 groom
> >>>> >>>> >> > child processes + 2 GB for the Ubuntu OS). Is this the
> >>>> >>>> >> > correct understanding?
> >>>> >>>> >> >
> >>>> >>>> >> > I recently ran into the following exceptions when I was
> >>>> >>>> >> > trying to run hama on a big dataset. I think these
> >>>> >>>> >> > exceptions have something to do with the Ubuntu OS killing
> >>>> >>>> >> > the hama process due to lack of memory; exit status 137 is
> >>>> >>>> >> > 128 + 9, i.e. the process was killed by SIGKILL, which is
> >>>> >>>> >> > what the kernel OOM killer sends. So, I was curious about my
> >>>> >>>> >> > configurations.
> >>>> >>>> >> >
> >>>> >>>> >> > 'BSP task process exit with nonzero status of 137.'
> >>>> >>>> >> > 'BSP task process exit with nonzero status of 1'
> >>>> >>>> >> >
> >>>> >>>> >> >
> >>>> >>>> >> >
> >>>> >>>> >> > Regards,
> >>>> >>>> >> > Behroz
> >>>> >>>> >> >
> >>>> >>>> >> > On Fri, Aug 28, 2015 at 3:04 AM, Edward J. Yoon <
> >>>> >>>> [email protected]>
> >>>> >>>> >> > wrote:
> >>>> >>>> >> >
> >>>> >>>> >> >> Hi,
> >>>> >>>> >> >>
> >>>> >>>> >> >> You can change the max tasks per node by setting the
> >>>> >>>> >> >> property below in hama-site.xml. :-)
> >>>> >>>> >> >>
> >>>> >>>> >> >>   <property>
> >>>> >>>> >> >>     <name>bsp.tasks.maximum</name>
> >>>> >>>> >> >>     <value>3</value>
> >>>> >>>> >> >>     <description>The maximum number of BSP tasks that will
> >>>> >>>> >> >>     be run simultaneously by a groom server.</description>
> >>>> >>>> >> >>   </property>
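> >>>> >>>> >> >>
> >>>> >>>> >> >> As a rough rule of thumb (my reasoning, not an official
> >>>> >>>> >> >> guideline): keep bsp.tasks.maximum at or below the number
> >>>> >>>> >> >> of CPU cores, and make sure (tasks x per-child heap) plus
> >>>> >>>> >> >> the GroomServer, DataNode and OS still fits in physical RAM.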
> >>>> >>>> >> >>
> >>>> >>>> >> >>
> >>>> >>>> >> >> On Fri, Aug 28, 2015 at 5:18 AM, Behroz Sikander <
> >>>> >>>> [email protected]>
> >>>> >>>> >> >> wrote:
> >>>> >>>> >> >> > Hi,
> >>>> >>>> >> >> > Recently, I noticed that my hama deployment is only
> >>>> >>>> >> >> > opening 3 processes per machine. This is because of the
> >>>> >>>> >> >> > configuration settings in the default hama file.
> >>>> >>>> >> >> >
> >>>> >>>> >> >> > My question is: why 3, and not 5 or 7? What criteria
> >>>> >>>> >> >> > should be considered if I want to increase the value?
> >>>> >>>> >> >> >
> >>>> >>>> >> >> > Regards,
> >>>> >>>> >> >> > Behroz
> >>>> >>>> >> >>
> >>>> >>>> >> >>
> >>>> >>>> >> >>
> >>>> >>>> >> >> --
> >>>> >>>> >> >> Best Regards, Edward J. Yoon
> >>>> >>>> >> >>
> >>>> >>>> >>
> >>>> >>>> >>
> >>>> >>>> >>
> >>>> >>>> >> --
> >>>> >>>> >> Best Regards, Edward J. Yoon
> >>>> >>>> >>
> >>>> >>>>
> >>>> >>>>
> >>>> >>>>
> >>>> >>>> --
> >>>> >>>> Best Regards, Edward J. Yoon
> >>>> >>>>
> >>>> >>>
> >>>> >>>
> >>>> >
> >>>> >
> >>>> >
> >>>> > --
> >>>> > Best Regards, Edward J. Yoon
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Best Regards, Edward J. Yoon
> >>>>
> >>>
> >>>
> >>
>
>
>
> --
> Best Regards, Edward J. Yoon
>
