Hey, Thomas: thanks for your reply. I checked the namenode logs; there is nothing like "org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException" in there, and I am using the 0.5.0 release version, not the trunk version, so I don't think Hama itself causes this kind of failure. Maybe ZooKeeper? I have also used my Hadoop cluster for several large data-analysis jobs before, so I don't think HDFS is causing this failure.
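Since ZooKeeper is the main suspect here, one quick sanity check is to confirm that every node can actually reach the ZooKeeper server before digging further into Hama. Below is a minimal sketch using only the Python standard library and ZooKeeper's real four-letter "ruok" command (a healthy server answers "imok"); the hosts and port in the example are placeholders for your own cluster, not values from this thread:

```python
import socket

def zk_is_ok(host, port, timeout=3.0):
    """Send ZooKeeper's four-letter 'ruok' command over a raw socket.

    Returns True if the server answers b'imok', False if it is
    unreachable or gives any other reply.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            sock.shutdown(socket.SHUT_WR)  # signal end of request
            return sock.recv(4) == b"imok"
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False

# Placeholder hosts/port -- substitute your own ZooKeeper quorum:
# for h in ["192.168.0.1", "192.168.0.2", "192.168.0.3"]:
#     print(h, zk_is_ok(h, 2181))
```

If any node returns False here, the ConnectionLoss in the bspmaster log is a plain connectivity problem (firewall, wrong `hama.zookeeper.quorum` setting, or a ZooKeeper server that is down) rather than anything graph-related.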
I also tried lowering the value of "hama.graph.multi.step.partitioning.interval", but the result is the same. I think it is ZooKeeper. The bspmaster log has an error like this:

2010-08-18 10:20:25,309 ERROR org.apache.hama.bsp.BSPMaster: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
        at org.apache.hama.bsp.BSPMaster.initZK(BSPMaster.java:485)
        at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:457)
        at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:449)
        at org.apache.hama.BSPMasterRunner.run(BSPMasterRunner.java:46)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hama.BSPMasterRunner.main(BSPMasterRunner.java:56)

In the zookeeper log, the last lines contain two warnings:

2012-09-17 21:04:27,866 WARN org.apache.zookeeper.server.NIOServerCnxn: EndOfStreamException: Unable to read additional data from client sessionid 0x239d433755a0014, likely client has closed socket
2012-09-17 21:04:32,666 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /192.168.0.2:57977 which had sessionid 0x239d433755a0014
2012-09-17 21:04:36,551 WARN org.apache.zookeeper.server.NIOServerCnxn: EndOfStreamException: Unable to read additional data from client sessionid 0x239d433755a0013, likely client has closed socket
2012-09-17 21:04:36,989 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /192.168.0.3:44924 which had sessionid 0x239d433755a0013

The groomserver log also has an exception:

2012-09-18 10:22:55,028 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 41060: readAndProcess threw exception
java.io.IOException: Connection reset by peer. Count of bytes read: 0
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:218)
        at sun.nio.ch.IOUtil.read(IOUtil.java:191)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:359)
        at org.apache.hadoop.ipc.Server.channelRead(Server.java:1720)
        at org.apache.hadoop.ipc.Server.access$2700(Server.java:94)
        at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1094)
        at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:537)
        at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:344)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)

2012/9/18 Edward J. Yoon <[email protected]>

> Then, please lower the value of
> "hama.graph.multi.step.partitioning.interval".
>
> On Fri, Sep 14, 2012 at 3:45 PM, 庄克琛 <[email protected]> wrote:
> > em... I have tried your configuration advice and restarted Hama.
> > I used the Google web graph
> > (http://wiki.apache.org/hama/WriteHamaGraphFile),
> > Nodes: 875713, Edges: 5105039, which is about 73 MB, uploaded it to a
> > small HDFS cluster (block size is 64 MB), and tested the PageRank from
> > (http://wiki.apache.org/hama/WriteHamaGraphFile). I got this result:
> >
> > ################
> > function@624-PC:~/hadoop-1.0.3/hama-0.6.0$ hama jar hama-6-P* input-google ouput-google
> > 12/09/14 14:27:50 INFO bsp.FileInputFormat: Total input paths to process : 1
> > 12/09/14 14:27:50 INFO bsp.FileInputFormat: Total # of splits: 3
> > 12/09/14 14:27:50 INFO bsp.BSPJobClient: Running job: job_201008141420_0004
> > 12/09/14 14:27:53 INFO bsp.BSPJobClient: Current supersteps number: 0
> > Java HotSpot(TM) Server VM warning: Attempt to allocate stack guard pages failed.
> > ###################
> >
> > Last time the superstep count could reach 1 or 2 before the same result.
> > The task attempt****.err files are empty.
> > Is the graph too large?
> > On a small graph I get the correct rank results.
> >
> >
> > 2012/9/14 Edward J. Yoon <[email protected]>
> >
> >> I've added a multi-step partitioning method to save memory[1].
> >>
> >> Please try to configure the property below in hama-site.xml:
> >>
> >> <property>
> >>   <name>hama.graph.multi.step.partitioning.interval</name>
> >>   <value>10000000</value>
> >> </property>
> >>
> >> 1. https://issues.apache.org/jira/browse/HAMA-599
> >>
> >> On Fri, Sep 14, 2012 at 3:13 PM, 庄克琛 <[email protected]> wrote:
> >> > Hi, actually I used this
> >> > (https://builds.apache.org/job/Hama-Nightly/672/artifact/.repository/org/apache/hama/hama-dist/0.6.0-SNAPSHOT/)
> >> > to test again; I mean I replaced everything with this 0.6.0-SNAPSHOT
> >> > version and got the same out-of-memory results. I just don't know
> >> > what causes the out-of-memory failures; only some small graph
> >> > computations can be finished.
> >> > Does this version include
> >> > [HAMA-596 <https://issues.apache.org/jira/browse/HAMA-596>]: "Optimize
> >> > memory usage of graph job"?
> >> > Thanks
> >> >
> >> > 2012/9/14 Thomas Jungblut <[email protected]>
> >> >
> >> >> Hey, which jar did you replace, exactly?
> >> >> On 14.09.2012 07:49, "庄克琛" <[email protected]> wrote:
> >> >>
> >> >> > Hi, everyone:
> >> >> > I am using hama-0.5.0 with hadoop-1.0.3 to do some large graph
> >> >> > analysis.
> >> >> > When I test the PageRank example as shown in
> >> >> > (http://wiki.apache.org/hama/WriteHamaGraphFile), I download the
> >> >> > graph data and run the PageRank job on a small distributed cluster,
> >> >> > and I only get an out-of-memory failure: supersteps 0, 1, and 2 work
> >> >> > well, then the job fails with out of memory. (Each computer has 2 GB
> >> >> > of memory.) But when I test some small graphs, everything goes well.
> >> >> > I also tried the trunk version
> >> >> > (https://builds.apache.org/job/Hama-Nightly/672/changes#detail3),
> >> >> > replacing my hama-0.5.0 with the hama-0.6.0-snapshot, and only got
> >> >> > the same results.
> >> >> > Has anyone got better ideas?
> >> >> >
> >> >> > Thanks!
> >> >> >
> >> >> > --
> >> >> >
> >> >> > *Zhuang Kechen*
> >> >> >
> >> >>
> >> >
> >> >
> >> > --
> >> >
> >> > *Zhuang Kechen*
> >> >
> >> > School of Computer Science & Technology
> >> >
> >> > Nanjing University of Science & Technology
> >> >
> >> > Lab.623, School of Computer Sci. & Tech.
> >> >
> >> > No.200, Xiaolingwei Street
> >> >
> >> > Nanjing, Jiangsu, 210094
> >> >
> >> > P.R. China
> >> >
> >> > Tel: 025-84315982
> >> >
> >> > Email: [email protected]
> >>
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >> @eddieyoon
> >
> >
> > --
> >
> > *Zhuang Kechen*
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon

--
*Zhuang Kechen*
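As a rough back-of-the-envelope check on whether a 2 GB heap is plausible for this job: the input is only ~73 MB on disk, but once deserialized into vertex and edge objects on the JVM heap, each entry carries object headers, references, and boxed values, so the in-memory footprint is typically an order of magnitude larger than the serialized form. The per-object byte costs below are assumed illustrative JVM overheads, not measured sizes of Hama's actual vertex/edge classes:

```python
# Back-of-the-envelope heap estimate for the Google web graph
# (875,713 nodes, 5,105,039 edges, as reported in this thread).
# BYTES_PER_VERTEX / BYTES_PER_EDGE are ASSUMED illustrative values,
# not measurements of Hama's vertex/edge classes.

NODES = 875_713
EDGES = 5_105_039

BYTES_PER_VERTEX = 200  # assumed: object header, ID, value, adjacency-list ref
BYTES_PER_EDGE = 80     # assumed: target ID, edge value, list-node overhead

heap_bytes = NODES * BYTES_PER_VERTEX + EDGES * BYTES_PER_EDGE
heap_mb = heap_bytes / (1024 * 1024)

print(f"Estimated in-memory graph size: {heap_mb:.0f} MB")
```

Under these assumptions the graph alone occupies several hundred megabytes of heap, before any per-superstep message queues or partitioning buffers are counted, which would be consistent with a 2 GB machine running out of memory and with Edward's suggestion to use (and tune) the multi-step partitioning interval.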
