Hey, Thomas: thanks for your reply. I checked the namenode logs; there is nothing like "org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException" in there, and I am using the 0.5.0 release version, not the trunk version, so I don't think Hama itself causes this kind of failure. Maybe ZooKeeper? I have also used my Hadoop cluster for several large data-analysis jobs before, so I don't think HDFS is causing this failure.
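Since ZooKeeper is the main suspect here, one quick sanity check is to confirm that every node can actually reach the ZooKeeper server before digging further into Hama. Below is a minimal sketch using only the Python standard library and ZooKeeper's real four-letter "ruok" command (a healthy server answers "imok"); the hosts and port in the example are placeholders for your own cluster, not values from this thread:

```python
import socket

def zk_is_ok(host, port, timeout=3.0):
    """Send ZooKeeper's four-letter 'ruok' command over a raw socket.

    Returns True if the server answers b'imok', False if it is
    unreachable or gives any other reply.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            sock.shutdown(socket.SHUT_WR)  # signal end of request
            return sock.recv(4) == b"imok"
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False

# Placeholder hosts/port -- substitute your own ZooKeeper quorum:
# for h in ["192.168.0.1", "192.168.0.2", "192.168.0.3"]:
#     print(h, zk_is_ok(h, 2181))
```

If any node returns False here, the ConnectionLoss in the bspmaster log is a plain connectivity problem (firewall, wrong `hama.zookeeper.quorum` setting, or a ZooKeeper server that is down) rather than anything graph-related.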
I also tried lowering the value of "hama.graph.multi.step.partitioning.interval", but the result is the same. I think it is ZooKeeper. The bspmaster log has an error like this:

2010-08-18 10:20:25,309 ERROR org.apache.hama.bsp.BSPMaster: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
        at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
        at org.apache.hama.bsp.BSPMaster.initZK(BSPMaster.java:485)
        at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:457)
        at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:449)
        at org.apache.hama.BSPMasterRunner.run(BSPMasterRunner.java:46)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hama.BSPMasterRunner.main(BSPMasterRunner.java:56)

In the zookeeper log, the last lines contain two warnings:

2012-09-17 21:04:27,866 WARN org.apache.zookeeper.server.NIOServerCnxn: EndOfStreamException: Unable to read additional data from client sessionid 0x239d433755a0014, likely client has closed socket
2012-09-17 21:04:32,666 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /192.168.0.2:57977 which had sessionid 0x239d433755a0014
2012-09-17 21:04:36,551 WARN org.apache.zookeeper.server.NIOServerCnxn: EndOfStreamException: Unable to read additional data from client sessionid 0x239d433755a0013, likely client has closed socket
2012-09-17 21:04:36,989 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /192.168.0.3:44924 which had sessionid 0x239d433755a0013

The groomserver log also has an exception:

2012-09-18 10:22:55,028 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 41060: readAndProcess threw exception
java.io.IOException: Connection reset by peer. Count of bytes read: 0
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:218)
        at sun.nio.ch.IOUtil.read(IOUtil.java:191)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:359)
        at org.apache.hadoop.ipc.Server.channelRead(Server.java:1720)
        at org.apache.hadoop.ipc.Server.access$2700(Server.java:94)
        at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1094)
        at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:537)
        at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:344)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)

2012/9/18 Edward J. Yoon <[email protected]>

> Then, please lower the value of
> "hama.graph.multi.step.partitioning.interval".
>
> On Fri, Sep 14, 2012 at 3:45 PM, 庄克琛 <[email protected]> wrote:
> > em... I have tried your configuration advice and restarted Hama.
> > I used the Google web graph
> > (http://wiki.apache.org/hama/WriteHamaGraphFile),
> > Nodes: 875713, Edges: 5105039, which is about 73 MB, uploaded it to a
> > small HDFS cluster (block size is 64 MB), and tested the PageRank from
> > (http://wiki.apache.org/hama/WriteHamaGraphFile). I got this result:
> >
> > ################
> > function@624-PC:~/hadoop-1.0.3/hama-0.6.0$ hama jar hama-6-P* input-google ouput-google
> > 12/09/14 14:27:50 INFO bsp.FileInputFormat: Total input paths to process : 1
> > 12/09/14 14:27:50 INFO bsp.FileInputFormat: Total # of splits: 3
> > 12/09/14 14:27:50 INFO bsp.BSPJobClient: Running job: job_201008141420_0004
> > 12/09/14 14:27:53 INFO bsp.BSPJobClient: Current supersteps number: 0
> > Java HotSpot(TM) Server VM warning: Attempt to allocate stack guard pages failed.
> > ###################
> >
> > Last time the superstep count could reach 1 or 2 before the same result.
> > The task attempt****.err files are empty.
> > Is the graph too large?
> > On a small graph I get the correct rank results.
> >
> >
> > 2012/9/14 Edward J. Yoon <[email protected]>
> >
> >> I've added a multi-step partitioning method to save memory[1].
> >>
> >> Please try to configure the property below in hama-site.xml:
> >>
> >> <property>
> >>   <name>hama.graph.multi.step.partitioning.interval</name>
> >>   <value>10000000</value>
> >> </property>
> >>
> >> 1. https://issues.apache.org/jira/browse/HAMA-599
> >>
> >> On Fri, Sep 14, 2012 at 3:13 PM, 庄克琛 <[email protected]> wrote:
> >> > Hi, actually I used this
> >> > (https://builds.apache.org/job/Hama-Nightly/672/artifact/.repository/org/apache/hama/hama-dist/0.6.0-SNAPSHOT/)
> >> > to test again; I mean I replaced everything with this 0.6.0-SNAPSHOT
> >> > version and got the same out-of-memory results. I just don't know
> >> > what causes the out-of-memory failures; only some small graph
> >> > computations can be finished.
> >> > Does this version include
> >> > [HAMA-596 <https://issues.apache.org/jira/browse/HAMA-596>]: "Optimize
> >> > memory usage of graph job"?
> >> > Thanks
> >> >
> >> > 2012/9/14 Thomas Jungblut <[email protected]>
> >> >
> >> >> Hey, which jar did you replace, exactly?
> >> >> On 14.09.2012 07:49, "庄克琛" <[email protected]> wrote:
> >> >>
> >> >> > Hi, everyone:
> >> >> > I am using hama-0.5.0 with hadoop-1.0.3 to do some large graph
> >> >> > analysis.
> >> >> > When I test the PageRank example as shown in
> >> >> > (http://wiki.apache.org/hama/WriteHamaGraphFile), I download the
> >> >> > graph data and run the PageRank job on a small distributed cluster,
> >> >> > and I only get an out-of-memory failure: supersteps 0, 1, and 2 work
> >> >> > well, then the job fails with out of memory. (Each computer has 2 GB
> >> >> > of memory.) But when I test some small graphs, everything goes well.
> >> >> > I also tried the trunk version
> >> >> > (https://builds.apache.org/job/Hama-Nightly/672/changes#detail3),
> >> >> > replacing my hama-0.5.0 with the hama-0.6.0-snapshot, and only got
> >> >> > the same results.
> >> >> > Has anyone got better ideas?
> >> >> >
> >> >> > Thanks!
> >> >> >
> >> >> > --
> >> >> >
> >> >> > *Zhuang Kechen*
> >> >> >
> >> >>
> >> >
> >> >
> >> > --
> >> >
> >> > *Zhuang Kechen*
> >> >
> >> > School of Computer Science & Technology
> >> >
> >> > Nanjing University of Science & Technology
> >> >
> >> > Lab.623, School of Computer Sci. & Tech.
> >> >
> >> > No.200, Xiaolingwei Street
> >> >
> >> > Nanjing, Jiangsu, 210094
> >> >
> >> > P.R. China
> >> >
> >> > Tel: 025-84315982
> >> >
> >> > Email: [email protected]
> >>
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >> @eddieyoon
> >
> >
> > --
> >
> > *Zhuang Kechen*
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon

--
*Zhuang Kechen*
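As a rough back-of-the-envelope check on whether a 2 GB heap is plausible for this job: the input is only ~73 MB on disk, but once deserialized into vertex and edge objects on the JVM heap, each entry carries object headers, references, and boxed values, so the in-memory footprint is typically an order of magnitude larger than the serialized form. The per-object byte costs below are assumed illustrative JVM overheads, not measured sizes of Hama's actual vertex/edge classes:

```python
# Back-of-the-envelope heap estimate for the Google web graph
# (875,713 nodes, 5,105,039 edges, as reported in this thread).
# BYTES_PER_VERTEX / BYTES_PER_EDGE are ASSUMED illustrative values,
# not measurements of Hama's vertex/edge classes.

NODES = 875_713
EDGES = 5_105_039

BYTES_PER_VERTEX = 200  # assumed: object header, ID, value, adjacency-list ref
BYTES_PER_EDGE = 80     # assumed: target ID, edge value, list-node overhead

heap_bytes = NODES * BYTES_PER_VERTEX + EDGES * BYTES_PER_EDGE
heap_mb = heap_bytes / (1024 * 1024)

print(f"Estimated in-memory graph size: {heap_mb:.0f} MB")
```

Under these assumptions the graph alone occupies several hundred megabytes of heap, before any per-superstep message queues or partitioning buffers are counted, which would be consistent with a 2 GB machine running out of memory and with Edward's suggestion to use (and tune) the multi-step partitioning interval.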
