If you are, please have a look at your NameNode logs. Do you see something like "org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:"?
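If it helps, here is a minimal sketch for scanning a NameNode log for that exception name. It is only an illustration: the log path depends on your installation and is passed in by you (nothing in this thread says where it is), and `find_exception_lines` and `scan_log_file` are hypothetical helper names, not part of Hadoop or Hama.

```python
import sys
from typing import Iterable, Iterator, List, Tuple

# Exception class name quoted in the message above.
EXCEPTION_NAME = "org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException"


def find_exception_lines(
    lines: Iterable[str], needle: str = EXCEPTION_NAME
) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for every line that mentions `needle`."""
    for number, line in enumerate(lines, start=1):
        if needle in line:
            yield number, line.rstrip("\n")


def scan_log_file(path: str) -> List[Tuple[int, str]]:
    """Scan one log file; `path` is wherever your NameNode log lives (assumption)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return list(find_exception_lines(handle))


if __name__ == "__main__":
    if len(sys.argv) == 2:
        for number, line in scan_log_file(sys.argv[1]):
            print(f"{number}: {line}")
    else:
        print("usage: python scan_namenode_log.py <path-to-namenode-log>")
```

If the script prints nothing for your log, that exception was not logged in that file; it does not rule out other HDFS problems.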
2012/9/17 Thomas Jungblut <[email protected]> > Are you running the current trunk version? > > > 2012/9/17 Thomas Jungblut <[email protected]> > >> No idea. The log doesn't show anything. >> Anyone else have an idea? >> >> >> 2012/9/17 Zhuang Kechen <[email protected]> >> >>> *the logs of task attempt :* >>> >>> >>> >>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client >>> environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 >>> GMT >>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client environment:host.name >>> =625-PC >>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client >>> environment:java.version=1.7.0 >>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client >>> environment:java.vendor=Oracle Corporation >>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client >>> environment:java.home=/usr/lib/jvm/java-7-oracle/jre >>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client >>> >>> environment:java.class.path=/home/function/hadoop-1.0.3/hama-0.5.0/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../hama-core-0.5.0.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../hama-examples-0.5.0.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../hama-graph-0.5.0.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/ant-1.7.1.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/ant-launcher-1.7.1.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/avro-1.6.0.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/avro-ipc-1.6.0.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/commons-cli-1.2.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/commons-configuration-1.7.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/commons-lang-2.6.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/commons-logging-1.1.1.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/commons-math3-3.0.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/guava-10.0.1.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/hadoop-c
ore-1.0.0.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/hadoop-test-1.0.0.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/jackson-core-asl-1.9.2.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/jackson-mapper-asl-1.9.2.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/jetty-6.1.14.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/jetty-annotations-6.1.14.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/jetty-util-6.1.14.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/jsp-2.1-6.1.14.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/jsp-api-2.1-6.1.14.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/junit-4.8.1.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/log4j-1.2.16.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/netty-3.2.6.Final.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/servlet-api-6.0.32.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/slf4j-api-1.5.8.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/slf4j-log4j12-1.5.8.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/snappy-java-1.0.4.1.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/zookeeper-3.3.3.jar::/tmp/hama-hduser/bsp/local/groomServer/attempt_201008172027_0007_000000_0/work/classes:/tmp/hama-hduser/bsp/local/groomServer/attempt_201008172027_0007_000000_0/work >>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client >>> environment:java.library.path=/usr/java/packages/lib/i386:/lib:/usr/lib >>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client >>> environment:java.io.tmpdir=/tmp >>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client >>> environment:java.compiler=<NA> >>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client environment:os.name >>> =Linux >>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client >>> environment:os.arch=i386 >>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client >>> environment:os.version=3.2.0-23-generic-pae >>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client environment:user.name 
>>> =function >>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client >>> environment:user.home=/home/function >>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client >>> >>> environment:user.dir=/tmp/hama-hduser/bsp/local/groomServer/attempt_201008172027_0007_000000_0/work >>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Initiating client connection, >>> connectString=627-PC:21810,625-PC:21810,623-PC:21810,624-PC:21810 >>> sessionTimeout=1200000 >>> watcher=org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl@7e6024 >>> 12/09/17 21:03:40 INFO zookeeper.ClientCnxn: Opening socket connection to >>> server 624-PC/192.168.1.2:21810 >>> 12/09/17 21:03:40 INFO sync.ZooKeeperSyncClientImpl: Start connecting to >>> Zookeeper! At 625-PC/192.168.0.3:61002 >>> 12/09/17 21:03:40 INFO zookeeper.ClientCnxn: Socket connection >>> established >>> to 624-PC/192.168.1.2:21810, initiating session >>> 12/09/17 21:03:40 INFO zookeeper.ClientCnxn: Session establishment >>> complete >>> on server 624-PC/192.168.1.2:21810, sessionid = 0x2a8004a6830016, >>> negotiated timeout = 1200000 >>> 12/09/17 21:03:48 INFO ipc.NettyTransceiver: Connecting to 624-PC/ >>> 192.168.1.2:61001 >>> 12/09/17 21:03:48 INFO ipc.NettyTransceiver: [id: 0x00e5e138] OPEN >>> 12/09/17 21:03:48 INFO ipc.NettyTransceiver: [id: 0x00e5e138, / >>> 192.168.0.3:34094 => 624-PC/192.168.1.2:61001] BOUND: /192.168.0.3:34094 >>> 12/09/17 21:03:48 INFO ipc.NettyTransceiver: [id: 0x00e5e138, / >>> 192.168.0.3:34094 => 624-PC/192.168.1.2:61001] CONNECTED: 624-PC/ >>> 192.168.1.2:61001 >>> 12/09/17 21:03:48 INFO ipc.NettyTransceiver: [id: 0x00e5e138, / >>> 192.168.0.3:34094 => 624-PC/192.168.1.2:61001] INTEREST_CHANGED >>> 12/09/17 21:03:48 INFO ipc.NettyTransceiver: [id: 0x00e5e138, / >>> 192.168.0.3:34094 => 624-PC/192.168.1.2:61001] INTEREST_CHANGED >>> 12/09/17 21:03:49 INFO ipc.NettyTransceiver: Connecting to 623-PC/ >>> 192.168.0.2:61001 >>> 12/09/17 21:03:49 INFO ipc.NettyTransceiver: [id: 0x01384669] OPEN >>> 12/09/17 21:03:49 
INFO ipc.NettyTransceiver: [id: 0x01384669, / >>> 192.168.0.3:45977 => 623-PC/192.168.0.2:61001] INTEREST_CHANGED >>> 12/09/17 21:03:49 INFO ipc.NettyTransceiver: [id: 0x01384669, / >>> 192.168.0.3:45977 => 623-PC/192.168.0.2:61001] BOUND: /192.168.0.3:45977 >>> 12/09/17 21:03:49 INFO ipc.NettyTransceiver: [id: 0x01384669, / >>> 192.168.0.3:45977 => 623-PC/192.168.0.2:61001] CONNECTED: 623-PC/ >>> 192.168.0.2:61001 >>> 12/09/17 21:03:49 INFO ipc.NettyTransceiver: [id: 0x01384669, / >>> 192.168.0.3:45977 => 623-PC/192.168.0.2:61001] INTEREST_CHANGED >>> 12/09/17 21:03:49 INFO ipc.NettyServer: [id: 0x00cb68d8, >>> /192.168.0.5:57665=> / >>> 192.168.0.3:61002] OPEN >>> 12/09/17 21:03:49 INFO ipc.NettyServer: [id: 0x008bf924, >>> /192.168.0.3:39122=> / >>> 192.168.0.3:61002] OPEN >>> 12/09/17 21:03:49 INFO ipc.NettyServer: [id: 0x00cb68d8, >>> /192.168.0.5:57665=> / >>> 192.168.0.3:61002] BOUND: /192.168.0.3:61002 >>> 12/09/17 21:03:49 INFO ipc.NettyServer: [id: 0x00cb68d8, >>> /192.168.0.5:57665=> / >>> 192.168.0.3:61002] CONNECTED: /192.168.0.5:57665 >>> 12/09/17 21:03:49 INFO ipc.NettyTransceiver: Connecting to 625-PC/ >>> 192.168.0.3:61002 >>> 12/09/17 21:03:49 INFO ipc.NettyTransceiver: [id: 0x005043cf] OPEN >>> 12/09/17 21:03:49 INFO ipc.NettyServer: [id: 0x0193e3d0, >>> /192.168.0.3:39123=> / >>> 192.168.0.3:61002] OPEN >>> 12/09/17 21:03:49 INFO ipc.NettyServer: [id: 0x008bf924, >>> /192.168.0.3:39122=> / >>> 192.168.0.3:61002] BOUND: /192.168.0.3:61002 >>> 12/09/17 21:03:49 INFO ipc.NettyServer: [id: 0x0193e3d0, >>> /192.168.0.3:39123=> / >>> 192.168.0.3:61002] BOUND: /192.168.0.3:61002 >>> 12/09/17 21:03:49 INFO ipc.NettyServer: [id: 0x008bf924, >>> /192.168.0.3:39122=> / >>> 192.168.0.3:61002] CONNECTED: /192.168.0.3:39122 >>> 12/09/17 21:03:49 INFO ipc.NettyServer: [id: 0x0193e3d0, >>> /192.168.0.3:39123=> / >>> 192.168.0.3:61002] CONNECTED: /192.168.0.3:39123 >>> 12/09/17 21:03:49 INFO ipc.NettyServer: [id: 0x00af4653, >>> /192.168.0.5:57666=> / >>> 
192.168.0.3:61002] OPEN >>> 12/09/17 21:03:49 INFO ipc.NettyServer: [id: 0x00af4653, >>> /192.168.0.5:57666=> / >>> 192.168.0.3:61002] BOUND: /192.168.0.3:61002 >>> 12/09/17 21:03:49 INFO ipc.NettyServer: [id: 0x00af4653, >>> /192.168.0.5:57666=> / >>> 192.168.0.3:61002] CONNECTED: /192.168.0.5:57666 >>> 12/09/17 21:03:49 INFO ipc.NettyTransceiver: [id: 0x005043cf, / >>> 192.168.0.3:39123 => 625-PC/192.168.0.3:61002] BOUND: /192.168.0.3:39123 >>> 12/09/17 21:03:49 INFO ipc.NettyTransceiver: [id: 0x005043cf, / >>> 192.168.0.3:39123 => 625-PC/192.168.0.3:61002] CONNECTED: 625-PC/ >>> 192.168.0.3:61002 >>> 12/09/17 21:03:49 INFO ipc.NettyTransceiver: [id: 0x005043cf, / >>> 192.168.0.3:39123 => 625-PC/192.168.0.3:61002] INTEREST_CHANGED >>> 12/09/17 21:03:49 INFO ipc.NettyTransceiver: [id: 0x005043cf, / >>> 192.168.0.3:39123 => 625-PC/192.168.0.3:61002] INTEREST_CHANGED >>> 12/09/17 21:03:50 INFO ipc.NettyTransceiver: Connecting to 623-PC/ >>> 192.168.0.2:61003 >>> 12/09/17 21:03:50 INFO ipc.NettyTransceiver: [id: 0x01a49bfa] OPEN >>> 12/09/17 21:03:50 INFO ipc.NettyTransceiver: [id: 0x01a49bfa, / >>> 192.168.0.3:56679 => 623-PC/192.168.0.2:61003] BOUND: /192.168.0.3:56679 >>> 12/09/17 21:03:50 INFO ipc.NettyTransceiver: [id: 0x01a49bfa, / >>> 192.168.0.3:56679 => 623-PC/192.168.0.2:61003] CONNECTED: 623-PC/ >>> 192.168.0.2:61003 >>> 12/09/17 21:03:50 INFO ipc.NettyTransceiver: [id: 0x01a49bfa, / >>> 192.168.0.3:56679 => 623-PC/192.168.0.2:61003] INTEREST_CHANGED >>> 12/09/17 21:03:50 INFO ipc.NettyTransceiver: [id: 0x01a49bfa, / >>> 192.168.0.3:56679 => 623-PC/192.168.0.2:61003] INTEREST_CHANGED >>> 12/09/17 21:03:50 INFO ipc.NettyServer: [id: 0x0050039b, >>> /192.168.0.3:39126=> / >>> 192.168.0.3:61002] OPEN >>> 12/09/17 21:03:50 INFO ipc.NettyServer: [id: 0x0050039b, >>> /192.168.0.3:39126=> / >>> 192.168.0.3:61002] BOUND: /192.168.0.3:61002 >>> 12/09/17 21:03:50 INFO ipc.NettyServer: [id: 0x0050039b, >>> /192.168.0.3:39126=> / >>> 192.168.0.3:61002] CONNECTED: 
/192.168.0.3:39126 >>> 12/09/17 21:03:50 INFO ipc.NettyTransceiver: Connecting to 623-PC/ >>> 192.168.0.2:61002 >>> 12/09/17 21:03:50 INFO ipc.NettyTransceiver: [id: 0x00f167bb] OPEN >>> 12/09/17 21:03:50 INFO ipc.NettyTransceiver: [id: 0x00f167bb, / >>> 192.168.0.3:49159 => 623-PC/192.168.0.2:61002] BOUND: /192.168.0.3:49159 >>> 12/09/17 21:03:50 INFO ipc.NettyTransceiver: [id: 0x00f167bb, / >>> 192.168.0.3:49159 => 623-PC/192.168.0.2:61002] CONNECTED: 623-PC/ >>> 192.168.0.2:61002 >>> 12/09/17 21:03:50 INFO ipc.NettyTransceiver: [id: 0x00f167bb, / >>> 192.168.0.3:49159 => 623-PC/192.168.0.2:61002] INTEREST_CHANGED >>> 12/09/17 21:03:50 INFO ipc.NettyTransceiver: [id: 0x00f167bb, / >>> 192.168.0.3:49159 => 623-PC/192.168.0.2:61002] INTEREST_CHANGED >>> 12/09/17 21:03:51 INFO ipc.NettyServer: [id: 0x00b9b6b6, >>> /192.168.0.5:57672=> / >>> 192.168.0.3:61002] OPEN >>> 12/09/17 21:03:51 INFO ipc.NettyServer: [id: 0x00b9b6b6, >>> /192.168.0.5:57672=> / >>> 192.168.0.3:61002] BOUND: /192.168.0.3:61002 >>> 12/09/17 21:03:51 INFO ipc.NettyServer: [id: 0x00b9b6b6, >>> /192.168.0.5:57672=> / >>> 192.168.0.3:61002] CONNECTED: /192.168.0.5:57672 >>> 12/09/17 21:03:51 INFO ipc.NettyServer: [id: 0x00adc675, >>> /192.168.1.2:59923=> / >>> 192.168.0.3:61002] OPEN >>> 12/09/17 21:03:51 INFO ipc.NettyServer: [id: 0x00adc675, >>> /192.168.1.2:59923=> / >>> 192.168.0.3:61002] BOUND: /192.168.0.3:61002 >>> 12/09/17 21:03:51 INFO ipc.NettyServer: [id: 0x00b68ab7, >>> /192.168.0.2:45938=> / >>> 192.168.0.3:61002] OPEN >>> 12/09/17 21:03:51 INFO ipc.NettyServer: [id: 0x00adc675, >>> /192.168.1.2:59923=> / >>> 192.168.0.3:61002] CONNECTED: /192.168.1.2:59923 >>> 12/09/17 21:03:51 INFO ipc.NettyServer: [id: 0x00b68ab7, >>> /192.168.0.2:45938=> / >>> 192.168.0.3:61002] BOUND: /192.168.0.3:61002 >>> 12/09/17 21:03:51 INFO ipc.NettyServer: [id: 0x00b68ab7, >>> /192.168.0.2:45938=> / >>> 192.168.0.3:61002] CONNECTED: /192.168.0.2:45938 >>> 12/09/17 21:03:51 INFO ipc.NettyServer: [id: 
0x0016d58b, >>> /192.168.0.2:45939=> / >>> 192.168.0.3:61002] OPEN >>> 12/09/17 21:03:51 INFO ipc.NettyServer: [id: 0x0016d58b, >>> /192.168.0.2:45939=> / >>> 192.168.0.3:61002] BOUND: /192.168.0.3:61002 >>> 12/09/17 21:03:51 INFO ipc.NettyServer: [id: 0x0016d58b, >>> /192.168.0.2:45939=> / >>> 192.168.0.3:61002] CONNECTED: /192.168.0.2:45939 >>> 12/09/17 21:03:52 INFO ipc.NettyTransceiver: Connecting to 625-PC/ >>> 192.168.0.3:61001 >>> 12/09/17 21:03:52 INFO ipc.NettyTransceiver: [id: 0x01d1f61c] OPEN >>> 12/09/17 21:03:52 INFO ipc.NettyTransceiver: [id: 0x01d1f61c, / >>> 192.168.0.3:51322 => 625-PC/192.168.0.3:61001] INTEREST_CHANGED >>> 12/09/17 21:03:52 INFO ipc.NettyTransceiver: [id: 0x01d1f61c, / >>> 192.168.0.3:51322 => 625-PC/192.168.0.3:61001] BOUND: /192.168.0.3:51322 >>> 12/09/17 21:03:52 INFO ipc.NettyTransceiver: [id: 0x01d1f61c, / >>> 192.168.0.3:51322 => 625-PC/192.168.0.3:61001] CONNECTED: 625-PC/ >>> 192.168.0.3:61001 >>> 12/09/17 21:03:52 INFO ipc.NettyTransceiver: [id: 0x01d1f61c, / >>> 192.168.0.3:51322 => 625-PC/192.168.0.3:61001] INTEREST_CHANGED >>> 12/09/17 21:03:52 INFO ipc.NettyServer: [id: 0x00417ee9, >>> /192.168.1.2:59927=> / >>> 192.168.0.3:61002] OPEN >>> 12/09/17 21:03:52 INFO ipc.NettyServer: [id: 0x00417ee9, >>> /192.168.1.2:59927=> / >>> 192.168.0.3:61002] BOUND: /192.168.0.3:61002 >>> 12/09/17 21:03:52 INFO ipc.NettyServer: [id: 0x00417ee9, >>> /192.168.1.2:59927=> / >>> 192.168.0.3:61002] CONNECTED: /192.168.1.2:59927 >>> 12/09/17 21:03:52 INFO ipc.NettyServer: [id: 0x01a2fe4d, >>> /192.168.1.2:59928=> / >>> 192.168.0.3:61002] OPEN >>> 12/09/17 21:03:52 INFO ipc.NettyServer: [id: 0x01a2fe4d, >>> /192.168.1.2:59928=> / >>> 192.168.0.3:61002] BOUND: /192.168.0.3:61002 >>> 12/09/17 21:03:52 INFO ipc.NettyServer: [id: 0x01a2fe4d, >>> /192.168.1.2:59928=> / >>> 192.168.0.3:61002] CONNECTED: /192.168.1.2:59928 >>> 12/09/17 21:03:52 INFO ipc.NettyServer: [id: 0x00c554b0, >>> /192.168.0.2:45944=> / >>> 192.168.0.3:61002] OPEN >>> 
12/09/17 21:03:52 INFO ipc.NettyServer: [id: 0x00c554b0, >>> /192.168.0.2:45944=> / >>> 192.168.0.3:61002] BOUND: /192.168.0.3:61002 >>> 12/09/17 21:03:52 INFO ipc.NettyServer: [id: 0x00c554b0, >>> /192.168.0.2:45944=> / >>> 192.168.0.3:61002] CONNECTED: /192.168.0.2:45944 >>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: Connecting to 624-PC/ >>> 192.168.1.2:61003 >>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x00093909] OPEN >>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x00093909, / >>> 192.168.0.3:58014 => 624-PC/192.168.1.2:61003] INTEREST_CHANGED >>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x00093909, / >>> 192.168.0.3:58014 => 624-PC/192.168.1.2:61003] BOUND: /192.168.0.3:58014 >>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x00093909, / >>> 192.168.0.3:58014 => 624-PC/192.168.1.2:61003] CONNECTED: 624-PC/ >>> 192.168.1.2:61003 >>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x00093909, / >>> 192.168.0.3:58014 => 624-PC/192.168.1.2:61003] INTEREST_CHANGED >>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: Connecting to 627-PC/ >>> 192.168.0.5:61002 >>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x002bba21] OPEN >>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x002bba21, / >>> 192.168.0.3:60492 => 627-PC/192.168.0.5:61002] INTEREST_CHANGED >>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x002bba21, / >>> 192.168.0.3:60492 => 627-PC/192.168.0.5:61002] BOUND: /192.168.0.3:60492 >>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x002bba21, / >>> 192.168.0.3:60492 => 627-PC/192.168.0.5:61002] CONNECTED: 627-PC/ >>> 192.168.0.5:61002 >>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x002bba21, / >>> 192.168.0.3:60492 => 627-PC/192.168.0.5:61002] INTEREST_CHANGED >>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: Connecting to 624-PC/ >>> 192.168.1.2:61002 >>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x0057bd52] OPEN >>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x0057bd52, / >>> 
192.168.0.3:53962 => 624-PC/192.168.1.2:61002] INTEREST_CHANGED >>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x0057bd52, / >>> 192.168.0.3:53962 => 624-PC/192.168.1.2:61002] BOUND: /192.168.0.3:53962 >>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x0057bd52, / >>> 192.168.0.3:53962 => 624-PC/192.168.1.2:61002] CONNECTED: 624-PC/ >>> 192.168.1.2:61002 >>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x0057bd52, / >>> 192.168.0.3:53962 => 624-PC/192.168.1.2:61002] INTEREST_CHANGED >>> 12/09/17 21:03:54 INFO ipc.NettyTransceiver: Connecting to 627-PC/ >>> 192.168.0.5:61001 >>> 12/09/17 21:03:54 INFO ipc.NettyTransceiver: [id: 0x00a3ef26] OPEN >>> 12/09/17 21:03:54 INFO ipc.NettyTransceiver: [id: 0x00a3ef26, / >>> 192.168.0.3:34203 => 627-PC/192.168.0.5:61001] BOUND: /192.168.0.3:34203 >>> 12/09/17 21:03:54 INFO ipc.NettyTransceiver: [id: 0x00a3ef26, / >>> 192.168.0.3:34203 => 627-PC/192.168.0.5:61001] CONNECTED: 627-PC/ >>> 192.168.0.5:61001 >>> 12/09/17 21:03:54 INFO ipc.NettyTransceiver: [id: 0x00a3ef26, / >>> 192.168.0.3:34203 => 627-PC/192.168.0.5:61001] INTEREST_CHANGED >>> 12/09/17 21:03:54 INFO ipc.NettyTransceiver: [id: 0x00a3ef26, / >>> 192.168.0.3:34203 => 627-PC/192.168.0.5:61001] INTEREST_CHANGED >>> 12/09/17 21:03:54 INFO ipc.NettyTransceiver: Connecting to 625-PC/ >>> 192.168.0.3:61003 >>> 12/09/17 21:03:54 INFO ipc.NettyTransceiver: [id: 0x0104ae5e] OPEN >>> 12/09/17 21:03:54 INFO ipc.NettyTransceiver: [id: 0x0104ae5e, / >>> 192.168.0.3:47749 => 625-PC/192.168.0.3:61003] INTEREST_CHANGED >>> 12/09/17 21:03:54 INFO ipc.NettyTransceiver: [id: 0x0104ae5e, / >>> 192.168.0.3:47749 => 625-PC/192.168.0.3:61003] BOUND: /192.168.0.3:47749 >>> 12/09/17 21:03:54 INFO ipc.NettyTransceiver: [id: 0x0104ae5e, / >>> 192.168.0.3:47749 => 625-PC/192.168.0.3:61003] CONNECTED: 625-PC/ >>> 192.168.0.3:61003 >>> 12/09/17 21:03:54 INFO ipc.NettyTransceiver: [id: 0x0104ae5e, / >>> 192.168.0.3:47749 => 625-PC/192.168.0.3:61003] INTEREST_CHANGED >>> 12/09/17 
21:03:55 INFO ipc.NettyTransceiver: Connecting to 627-PC/ >>> 192.168.0.5:61003 >>> 12/09/17 21:03:55 INFO ipc.NettyTransceiver: [id: 0x00c0499d] OPEN >>> 12/09/17 21:03:55 INFO ipc.NettyTransceiver: [id: 0x00c0499d, / >>> 192.168.0.3:36006 => 627-PC/192.168.0.5:61003] INTEREST_CHANGED >>> 12/09/17 21:03:55 INFO ipc.NettyTransceiver: [id: 0x00c0499d, / >>> 192.168.0.3:36006 => 627-PC/192.168.0.5:61003] BOUND: /192.168.0.3:36006 >>> 12/09/17 21:03:55 INFO ipc.NettyTransceiver: [id: 0x00c0499d, / >>> 192.168.0.3:36006 => 627-PC/192.168.0.5:61003] CONNECTED: 627-PC/ >>> 192.168.0.5:61003 >>> 12/09/17 21:03:55 INFO ipc.NettyTransceiver: [id: 0x00c0499d, / >>> 192.168.0.3:36006 => 627-PC/192.168.0.5:61003] INTEREST_CHANGED >>> 12/09/17 21:03:58 INFO ipc.NettyTransceiver: [id: 0x00e5e138, / >>> 192.168.0.3:34094 => 624-PC/192.168.1.2:61001] INTEREST_CHANGED >>> 12/09/17 21:03:58 INFO ipc.NettyTransceiver: [id: 0x00e5e138, / >>> 192.168.0.3:34094 => 624-PC/192.168.1.2:61001] INTEREST_CHANGED >>> 12/09/17 21:03:59 INFO ipc.NettyTransceiver: [id: 0x01384669, / >>> 192.168.0.3:45977 => 623-PC/192.168.0.2:61001] INTEREST_CHANGED >>> 12/09/17 21:03:59 INFO ipc.NettyTransceiver: [id: 0x01384669, / >>> 192.168.0.3:45977 => 623-PC/192.168.0.2:61001] INTEREST_CHANGED >>> 12/09/17 21:03:59 INFO ipc.NettyTransceiver: [id: 0x01a49bfa, / >>> 192.168.0.3:56679 => 623-PC/192.168.0.2:61003] INTEREST_CHANGED >>> 12/09/17 21:03:59 INFO ipc.NettyTransceiver: [id: 0x01a49bfa, / >>> 192.168.0.3:56679 => 623-PC/192.168.0.2:61003] INTEREST_CHANGED >>> 12/09/17 21:04:00 INFO ipc.NettyTransceiver: [id: 0x005043cf, / >>> 192.168.0.3:39123 => 625-PC/192.168.0.3:61002] INTEREST_CHANGED >>> 12/09/17 21:04:00 INFO ipc.NettyTransceiver: [id: 0x005043cf, / >>> 192.168.0.3:39123 => 625-PC/192.168.0.3:61002] INTEREST_CHANGED >>> 12/09/17 21:04:00 INFO ipc.NettyTransceiver: [id: 0x00f167bb, / >>> 192.168.0.3:49159 => 623-PC/192.168.0.2:61002] INTEREST_CHANGED >>> 12/09/17 21:04:00 INFO 
ipc.NettyTransceiver: [id: 0x00f167bb, / >>> 192.168.0.3:49159 => 623-PC/192.168.0.2:61002] INTEREST_CHANGED >>> 12/09/17 21:04:00 INFO ipc.NettyTransceiver: [id: 0x01d1f61c, / >>> 192.168.0.3:51322 => 625-PC/192.168.0.3:61001] INTEREST_CHANGED >>> 12/09/17 21:04:00 INFO ipc.NettyTransceiver: [id: 0x01d1f61c, / >>> 192.168.0.3:51322 => 625-PC/192.168.0.3:61001] INTEREST_CHANGED >>> 12/09/17 21:04:01 INFO ipc.NettyTransceiver: [id: 0x00093909, / >>> 192.168.0.3:58014 => 624-PC/192.168.1.2:61003] INTEREST_CHANGED >>> 12/09/17 21:04:01 INFO ipc.NettyTransceiver: [id: 0x00093909, / >>> 192.168.0.3:58014 => 624-PC/192.168.1.2:61003] INTEREST_CHANGED >>> 12/09/17 21:04:01 INFO ipc.NettyTransceiver: [id: 0x002bba21, / >>> 192.168.0.3:60492 => 627-PC/192.168.0.5:61002] INTEREST_CHANGED >>> 12/09/17 21:04:01 INFO ipc.NettyTransceiver: [id: 0x002bba21, / >>> 192.168.0.3:60492 => 627-PC/192.168.0.5:61002] INTEREST_CHANGED >>> 12/09/17 21:04:01 INFO ipc.NettyTransceiver: [id: 0x00a3ef26, / >>> 192.168.0.3:34203 => 627-PC/192.168.0.5:61001] INTEREST_CHANGED >>> 12/09/17 21:04:01 INFO ipc.NettyTransceiver: [id: 0x00a3ef26, / >>> 192.168.0.3:34203 => 627-PC/192.168.0.5:61001] INTEREST_CHANGED >>> 12/09/17 21:04:01 INFO ipc.NettyTransceiver: [id: 0x0057bd52, / >>> 192.168.0.3:53962 => 624-PC/192.168.1.2:61002] INTEREST_CHANGED >>> 12/09/17 21:04:01 INFO ipc.NettyTransceiver: [id: 0x0057bd52, / >>> 192.168.0.3:53962 => 624-PC/192.168.1.2:61002] INTEREST_CHANGED >>> 12/09/17 21:04:01 INFO ipc.NettyTransceiver: [id: 0x0104ae5e, / >>> 192.168.0.3:47749 => 625-PC/192.168.0.3:61003] INTEREST_CHANGED >>> 12/09/17 21:04:01 INFO ipc.NettyTransceiver: [id: 0x0104ae5e, / >>> 192.168.0.3:47749 => 625-PC/192.168.0.3:61003] INTEREST_CHANGED >>> 12/09/17 21:04:02 INFO ipc.NettyTransceiver: [id: 0x00c0499d, / >>> 192.168.0.3:36006 => 627-PC/192.168.0.5:61003] INTEREST_CHANGED >>> 12/09/17 21:04:02 INFO ipc.NettyTransceiver: [id: 0x00c0499d, / >>> 192.168.0.3:36006 => 627-PC/192.168.0.5:61003] 
INTEREST_CHANGED >>> 12/09/17 21:04:02 INFO graph.GraphJobRunner: Loading finished at 2 steps. >>> 12/09/17 21:04:03 INFO ipc.NettyTransceiver: [id: 0x00e5e138, / >>> 192.168.0.3:34094 => 624-PC/192.168.1.2:61001] INTEREST_CHANGED >>> 12/09/17 21:04:03 INFO ipc.NettyTransceiver: [id: 0x00e5e138, / >>> 192.168.0.3:34094 => 624-PC/192.168.1.2:61001] INTEREST_CHANGED >>> 12/09/17 21:04:03 INFO ipc.NettyTransceiver: [id: 0x01384669, / >>> 192.168.0.3:45977 => 623-PC/192.168.0.2:61001] INTEREST_CHANGED >>> 12/09/17 21:04:03 INFO ipc.NettyTransceiver: [id: 0x01384669, / >>> 192.168.0.3:45977 => 623-PC/192.168.0.2:61001] INTEREST_CHANGED >>> 12/09/17 21:04:08 INFO ipc.NettyTransceiver: [id: 0x005043cf, / >>> 192.168.0.3:39123 => 625-PC/192.168.0.3:61002] INTEREST_CHANGED >>> 12/09/17 21:04:08 INFO ipc.NettyTransceiver: [id: 0x005043cf, / >>> 192.168.0.3:39123 => 625-PC/192.168.0.3:61002] INTEREST_CHANGED >>> 12/09/17 21:04:09 INFO ipc.NettyTransceiver: [id: 0x01a49bfa, / >>> 192.168.0.3:56679 => 623-PC/192.168.0.2:61003] INTEREST_CHANGED >>> 12/09/17 21:04:09 INFO ipc.NettyTransceiver: [id: 0x01a49bfa, / >>> 192.168.0.3:56679 => 623-PC/192.168.0.2:61003] INTEREST_CHANGED >>> 12/09/17 21:04:09 INFO ipc.NettyTransceiver: [id: 0x01d1f61c, / >>> 192.168.0.3:51322 => 625-PC/192.168.0.3:61001] INTEREST_CHANGED >>> 12/09/17 21:04:09 INFO ipc.NettyTransceiver: [id: 0x01d1f61c, / >>> 192.168.0.3:51322 => 625-PC/192.168.0.3:61001] INTEREST_CHANGED >>> 12/09/17 21:04:10 INFO ipc.NettyTransceiver: [id: 0x00f167bb, / >>> 192.168.0.3:49159 => 623-PC/192.168.0.2:61002] INTEREST_CHANGED >>> 12/09/17 21:04:10 INFO ipc.NettyTransceiver: [id: 0x00f167bb, / >>> 192.168.0.3:49159 => 623-PC/192.168.0.2:61002] INTEREST_CHANGED >>> 12/09/17 21:04:10 INFO ipc.NettyTransceiver: [id: 0x00093909, / >>> 192.168.0.3:58014 => 624-PC/192.168.1.2:61003] INTEREST_CHANGED >>> 12/09/17 21:04:10 INFO ipc.NettyTransceiver: [id: 0x00093909, / >>> 192.168.0.3:58014 => 624-PC/192.168.1.2:61003] INTEREST_CHANGED 
>>> 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x002bba21, / >>> 192.168.0.3:60492 => 627-PC/192.168.0.5:61002] INTEREST_CHANGED >>> 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x002bba21, / >>> 192.168.0.3:60492 => 627-PC/192.168.0.5:61002] INTEREST_CHANGED >>> 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x00a3ef26, / >>> 192.168.0.3:34203 => 627-PC/192.168.0.5:61001] INTEREST_CHANGED >>> 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x00a3ef26, / >>> 192.168.0.3:34203 => 627-PC/192.168.0.5:61001] INTEREST_CHANGED >>> 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x0057bd52, / >>> 192.168.0.3:53962 => 624-PC/192.168.1.2:61002] INTEREST_CHANGED >>> 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x0057bd52, / >>> 192.168.0.3:53962 => 624-PC/192.168.1.2:61002] INTEREST_CHANGED >>> 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x0104ae5e, / >>> 192.168.0.3:47749 => 625-PC/192.168.0.3:61003] INTEREST_CHANGED >>> 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x0104ae5e, / >>> 192.168.0.3:47749 => 625-PC/192.168.0.3:61003] INTEREST_CHANGED >>> 12/09/17 21:04:12 INFO ipc.NettyTransceiver: [id: 0x00c0499d, / >>> 192.168.0.3:36006 => 627-PC/192.168.0.5:61003] INTEREST_CHANGED >>> 12/09/17 21:04:12 INFO ipc.NettyTransceiver: [id: 0x00c0499d, / >>> 192.168.0.3:36006 => 627-PC/192.168.0.5:61003] INTEREST_CHANGED >>> >>> >>> >>> >>> >>> 2012/9/17 Thomas Jungblut <[email protected]> >>> >>> > Can you post the logs of task attempt_201008172027_0007_000000_0 ? >>> > >>> > 2012/9/17 Zhuang Kechen <[email protected]> >>> > >>> > > HI, Thomas: >>> > > Sorry to bother you. When I run some small graph test on my cluster, >>> a >>> > 25Mb >>> > > graph data job can be succeed, I can get the right output file on >>> HDFS. >>> > But >>> > > the 50Mb can not. 
When the job fails, the *ZooKeeper logs end like this:*
>>> > >
>>> > > 2012-09-17 21:04:27,866 WARN org.apache.zookeeper.server.NIOServerCnxn: EndOfStreamException: Unable to read additional data from client sessionid 0x239d433755a0014, likely client has closed socket
>>> > > 2012-09-17 21:04:32,666 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /192.168.0.2:57977 which had sessionid 0x239d433755a0014
>>> > > 2012-09-17 21:04:36,551 WARN org.apache.zookeeper.server.NIOServerCnxn: EndOfStreamException: Unable to read additional data from client sessionid 0x239d433755a0013, likely client has closed socket
>>> > > 2012-09-17 21:04:36,989 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /192.168.0.3:44924 which had sessionid 0x239d433755a0013
>>> > >
>>> > > *GroomServer logs look like:*
>>> > > 2012-09-17 21:03:37,679 INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>>> > > 2012-09-17 21:03:37,982 INFO org.apache.hama.bsp.GroomServer: Task 'attempt_201008172027_0007_000002_0' has started.
>>> > > 2012-09-17 21:03:37,983 INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>>> > > 2012-09-17 21:03:38,073 INFO org.apache.hama.bsp.GroomServer: Task 'attempt_201008172027_0007_000000_0' has started.
>>> > > 2012-09-17 21:03:38,074 INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>>> > > 2012-09-17 21:03:38,325 INFO org.apache.hama.bsp.GroomServer: Task 'attempt_201008172027_0007_000001_0' has started.
>>> > > 2012-09-17 21:04:23,161 INFO org.apache.hama.bsp.GroomServer: adding >>> > purge >>> > > task: attempt_201008172027_0007_000000_0 >>> > > 2012-09-17 21:04:23,513 INFO org.apache.hama.bsp.GroomServer: adding >>> > purge >>> > > task: attempt_201008172027_0007_000002_0 >>> > > 2012-09-17 21:04:23,513 INFO org.apache.hama.bsp.GroomServer: About >>> to >>> > > purge task: attempt_201008172027_0007_000000_0 >>> > > 2012-09-17 21:04:25,918 INFO org.apache.hama.bsp.GroomServer: About >>> to >>> > > purge task: attempt_201008172027_0007_000002_0 >>> > > 2012-09-17 21:04:30,707 INFO org.apache.hama.bsp.GroomServer: Kill 1 >>> > tasks. >>> > > 2012-09-17 21:04:30,929 INFO org.apache.hama.bsp.GroomServer: Kill 1 >>> > tasks. >>> > > 2012-09-17 21:04:30,929 INFO org.apache.hama.bsp.GroomServer: Kill 1 >>> > tasks. >>> > > 2012-09-17 21:04:33,965 INFO org.apache.hama.bsp.GroomServer: Kill 1 >>> > tasks. >>> > > >>> > > *Task logs end up likes:* >>> > > 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x00a3ef26, / >>> > > 192.168.0.3:34203 => 627-PC/192.168.0.5:61001] INTEREST_CHANGED >>> > > 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x00a3ef26, / >>> > > 192.168.0.3:34203 => 627-PC/192.168.0.5:61001] INTEREST_CHANGED >>> > > 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x0057bd52, / >>> > > 192.168.0.3:53962 => 624-PC/192.168.1.2:61002] INTEREST_CHANGED >>> > > 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x0057bd52, / >>> > > 192.168.0.3:53962 => 624-PC/192.168.1.2:61002] INTEREST_CHANGED >>> > > 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x0104ae5e, / >>> > > 192.168.0.3:47749 => 625-PC/192.168.0.3:61003] INTEREST_CHANGED >>> > > 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x0104ae5e, / >>> > > 192.168.0.3:47749 => 625-PC/192.168.0.3:61003] INTEREST_CHANGED >>> > > 12/09/17 21:04:12 INFO ipc.NettyTransceiver: [id: 0x00c0499d, / >>> > > 192.168.0.3:36006 => 627-PC/192.168.0.5:61003] INTEREST_CHANGED >>> > > 12/09/17 21:04:12 INFO 
ipc.NettyTransceiver: [id: 0x00c0499d, /192.168.0.3:36006 => 627-PC/192.168.0.5:61003] INTEREST_CHANGED
>>> > > ..........
>>> > > Do you have any idea what may cause this kind of failure? Thanks a lot!
>>> > >
>>> > >
>>> > > 2012/9/15 Thomas Jungblut <[email protected]>
>>> > >
>>> > > > Okay, I have observed this problem as well with a 10 GB adjacency text file.
>>> > > > I was running on a 75 GB instance on EC2 with a 70 GB heap, which should be
>>> > > > no problem, but it fails after several steps.
>>> > > > I'm profiling it now in more detail.
>>> > > >
>>> > > > It can't be that 10 GB of text uses more than 20 GB of heap as a graph with messages.
>>> > > >
>>> > > > 2012/9/14 Thomas Jungblut <[email protected]>
>>> > > >
>>> > > > > I would trim the spaces in the key and value.
>>> > > > > If it still crashes afterwards, I have no idea anymore and would recommend
>>> > > > > that you take a heap dump with hprof and look at what is consuming all that memory.
>>> > > > >
>>> > > > > 2012/9/14 庄克琛 <[email protected]>
>>> > > > >
>>> > > > >> Hi, I set the property in hama-site.xml.
>>> > > > >> <property>
>>> > > > >> <name> hama.messenger.queue.class </name>
>>> > > > >> <value> org.apache.hama.bsp.message.DiskQueue </value>
>>> > > > >> </property>
>>> > > > >> Did I set it right?
>>> > > > >> I restarted Hama (stop-bspd.sh and start-bspd.sh) and tried the test job
>>> > > > >> again, and watched the memory slowly climb to 70%, 80%, 90%, then crash... >_<
>>> > > > >>
>>> > > > >>
>>> > > > >> 2012/9/14 Thomas Jungblut <[email protected]>
>>> > > > >>
>>> > > > >> > Yes, I wanted to have direct memory in Hama months ago, but hadn't managed
>>> > > > >> > to find enough time.
>>> > > > >> > That is a very good idea.
>>> > > > >> > >>> > > > >> > 2012/9/14 Tommaso Teofili <[email protected]> >>> > > > >> > >>> > > > >> > > I think we may also create an Apache DirectMemory based >>> > DiskQueue >>> > > > >> which >>> > > > >> > > cache things on disk but hides most of the complexity. >>> > > > >> > > My 2 cents, >>> > > > >> > > Tommaso >>> > > > >> > > >>> > > > >> > > 2012/9/14 Thomas Jungblut <[email protected]> >>> > > > >> > > >>> > > > >> > > > I have created an issue for that: >>> > > > >> > > > HAMA-642<https://issues.apache.org/jira/browse/HAMA-642> >>> > > > >> > > > >>> > > > >> > > > 2012/9/14 Thomas Jungblut <[email protected]> >>> > > > >> > > > >>> > > > >> > > > > Basically I think that the graph should fit into memory >>> of >>> > > your >>> > > > >> task. >>> > > > >> > > > > So the messages could cause the overflow. >>> > > > >> > > > > >>> > > > >> > > > > You can try out the DiskQueue, this can be configured >>> with >>> > > > setting >>> > > > >> > the >>> > > > >> > > > > property "hama.messenger.queue.class" to >>> > > > >> > > > > "org.apache.hama.bsp.message.DiskQueue". >>> > > > >> > > > > >>> > > > >> > > > > This will immediately flush the messages to disk. >>> However >>> > this >>> > > > is >>> > > > >> > > > > experimental currently, so if you try it out please >>> tell us >>> > if >>> > > > it >>> > > > >> > > helped. >>> > > > >> > > > > >>> > > > >> > > > > Thanks. >>> > > > >> > > > > >>> > > > >> > > > > To further scale this, we should write vertices that >>> don't >>> > fit >>> > > > in >>> > > > >> > > memory >>> > > > >> > > > > on the disk. I will add another jira for that soon. >>> > > > >> > > > > >>> > > > >> > > > > 2012/9/14 庄克琛 <[email protected]> >>> > > > >> > > > > >>> > > > >> > > > >> oh, the HDFS block size is 128Mb, not 64Mb, so the 73Mb >>> > graph >>> > > > >> will >>> > > > >> > not >>> > > > >> > > > >> be split-ed on the HDFS. 
>>> > > > >> > > > >>
>>> > > > >> > > > >> 2012/9/14 庄克琛 <[email protected]>
>>> > > > >> > > > >>
>>> > > > >> > > > >> > Hmm... I have tried your configuration advice and restarted Hama.
>>> > > > >> > > > >> > I use the Google web graph (
>>> > > > >> > > > >> > http://wiki.apache.org/hama/WriteHamaGraphFile ),
>>> > > > >> > > > >> > Nodes: 875713 Edges: 5105039, which is about 73 MB, uploaded to a small HDFS
>>> > > > >> > > > >> > cluster (block size is 64 MB), and tested the PageRank in (
>>> > > > >> > > > >> > http://wiki.apache.org/hama/WriteHamaGraphFile ), got the result as:
>>> > > > >> > > > >> > ################
>>> > > > >> > > > >> > function@624-PC:~/hadoop-1.0.3/hama-0.6.0$ hama jar hama-6-P* input-google ouput-google
>>> > > > >> > > > >> > 12/09/14 14:27:50 INFO bsp.FileInputFormat: Total input paths to process : 1
>>> > > > >> > > > >> > 12/09/14 14:27:50 INFO bsp.FileInputFormat: Total # of splits: 3
>>> > > > >> > > > >> > 12/09/14 14:27:50 INFO bsp.BSPJobClient: Running job: job_201008141420_0004
>>> > > > >> > > > >> > 12/09/14 14:27:53 INFO bsp.BSPJobClient: Current supersteps number: 0
>>> > > > >> > > > >> > Java HotSpot(TM) Server VM warning: Attempt to allocate stack guard pages failed.
>>> > > > >> > > > >> > ###################
>>> > > > >> > > > >> >
>>> > > > >> > > > >> > Last time the supersteps could reach 1 or 2, then the same result.
>>> > > > >> > > > >> > The task attempt****.err files are empty.
>>> > > > >> > > > >> > Is the graph too large?
>>> > > > >> > > > >> > I tested on a small graph and got the right rank results.
>>> > > > >> > > > >> >
>>> > > > >> > > > >> >
>>> > > > >> > > > >> > 2012/9/14 Edward J. Yoon <[email protected]>
>>> > > > >> > > > >> >
>>> > > > >> > > > >> >> I've added a multi-step partitioning method to save memory [1].
>>> > > > >> > > > >> >>
>>> > > > >> > > > >> >> Please try adding the property below to hama-site.xml.
>>> > > > >> > > > >> >>
>>> > > > >> > > > >> >> <property>
>>> > > > >> > > > >> >> <name>hama.graph.multi.step.partitioning.interval</name>
>>> > > > >> > > > >> >> <value>10000000</value>
>>> > > > >> > > > >> >> </property>
>>> > > > >> > > > >> >>
>>> > > > >> > > > >> >> 1. https://issues.apache.org/jira/browse/HAMA-599
>>> > > > >> > > > >> >>
>>> > > > >> > > > >> >> On Fri, Sep 14, 2012 at 3:13 PM, 庄克琛 <[email protected]> wrote:
>>> > > > >> > > > >> >> > Hi, actually I used this (
>>> > > > >> > > > >> >> > https://builds.apache.org/job/Hama-Nightly/672/artifact/.repository/org/apache/hama/hama-dist/0.6.0-SNAPSHOT/
>>> > > > >> > > > >> >> > )
>>> > > > >> > > > >> >> > to test again; I mean I used this 0.6.0-SNAPSHOT version to replace everything
>>> > > > >> > > > >> >> > and got the same out-of-memory results. I just don't know what causes the
>>> > > > >> > > > >> >> > out-of-memory failures; only some small graph computations can finish. Does
>>> > > > >> > > > >> >> > this version include
>>> > > > >> > > > >> >> > "[HAMA-596 <https://issues.apache.org/jira/browse/HAMA-596>]: Optimize
>>> > > > >> > > > >> >> > memory usage of graph job"?
>>> > > > >> > > > >> >> > Thanks
>>> > > > >> > > > >> >> >
>>> > > > >> > > > >> >> > 2012/9/14 Thomas Jungblut <[email protected]>
>>> > > > >> > > > >> >> >
>>> > > > >> > > > >> >> >> Hey, which jar exactly did you replace?
>>> > > > >> > > > >> >> >> On 14.09.2012 at 07:49, "庄克琛" <[email protected]> wrote:
>>> > > > >> > > > >> >> >>
>>> > > > >> > > > >> >> >> > Hi, everyone:
>>> > > > >> > > > >> >> >> > I use hama-0.5.0 with hadoop-1.0.3 to try to do some large graph
>>> > > > >> > > > >> >> >> > analysis.
>>> > > > >> > > > >> >> >> > When I test the PageRank example, as (
>>> > > > >> > > > >> >> >> > http://wiki.apache.org/hama/WriteHamaGraphFile ) shows, I download the
>>> > > > >> > > > >> >> >> > graph data and run the PageRank job on a small distributed cluster. I
>>> > > > >> > > > >> >> >> > only get an out-of-memory failure: supersteps 0, 1, 2 work well, then
>>> > > > >> > > > >> >> >> > it runs out of memory. (Each computer has 2 GB of memory.) But when I
>>> > > > >> > > > >> >> >> > test some small graph, everything goes well.
>>> > > > >> > > > >> >> >> > I also tried the trunk version (
>>> > > > >> > > > >> >> >> > https://builds.apache.org/job/Hama-Nightly/672/changes#detail3 ),
>>> > > > >> > > > >> >> >> > replacing my hama-0.5.0 with hama-0.6.0-snapshot, and only got the same results.
>>> > > > >> > > > >> >> >> > Does anyone have better ideas?
>>> > > > >> > > > >> >> >> >
>>> > > > >> > > > >> >> >> > Thanks!
>>> > > > >> > > > >> >> >> >
>>> > > > >> > > > >> >> >> > --
>>> > > > >> > > > >> >> >> >
>>> > > > >> > > > >> >> >> > *Zhuang Kechen*
>>> > > > >> > > > >> >> >> >
>>> > > > >> > > > >> >> >>
>>> > > > >> > > > >> >> >
>>> > > > >> > > > >> >> > --
>>> > > > >> > > > >> >> >
>>> > > > >> > > > >> >> > *Zhuang Kechen*
>>> > > > >> > > > >> >> >
>>> > > > >> > > > >> >> > School of Computer Science & Technology
>>> > > > >> > > > >> >> > Nanjing University of Science & Technology
>>> > > > >> > > > >> >> > Lab.623, School of Computer Sci. & Tech.
>>> > > > >> > > > >> >> > No.200, Xiaolingwei Street
>>> > > > >> > > > >> >> > Nanjing, Jiangsu, 210094
>>> > > > >> > > > >> >> > P.R. China
>>> > > > >> > > > >> >> > Tel: 025-84315982
>>> > > > >> > > > >> >> > Email: [email protected]
>>> > > > >> > > > >> >>
>>> > > > >> > > > >> >> --
>>> > > > >> > > > >> >> Best Regards, Edward J. Yoon
>>> > > > >> > > > >> >> @eddieyoon
>>> > > > >> > > > >> >>
>>> > > > >> > > > >> >
>>> > > > >> > > > >> > --
>>> > > > >> > > > >> >
>>> > > > >> > > > >> > *Zhuang Kechen*
>>> > > > >> > > > >> >
>>> > > > >> > > > >>
>>> > > > >> > > > >> --
>>> > > > >> > > > >>
>>> > > > >> > > > >> *Zhuang Kechen*
>>> > > > >> > > > >>
>>> > > > >> > > > >
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > > >> --
>>> > > > >>
>>> > > > >> *Zhuang Kechen*
>>> > > > >>
>>> > > > >
>>> > > >
>>> > >
>>> > > --
>>> > >
>>> > > *Zhuang Kechen*
>>> > >
>>> >
>>>
>>> --
>>>
>>> *Zhuang Kechen*
>>>
>>
>
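
For reference, here is a minimal sketch of the DiskQueue property quoted in the thread, written without the leading and trailing spaces inside <name> and <value>, following the suggestion to trim the key and value. The property name and class name are the ones given in the thread. Whether the whitespace contributed to the crash is not established here, and DiskQueue was described as experimental at the time.

```xml
<!-- hama-site.xml fragment (sketch): same property as in the thread, -->
<!-- with no surrounding whitespace in the name or the value -->
<property>
  <name>hama.messenger.queue.class</name>
  <value>org.apache.hama.bsp.message.DiskQueue</value>
</property>
```

As in the thread, the daemons would need a restart (stop-bspd.sh, then start-bspd.sh) for the change to take effect.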
