If you are, please have a look at your NameNode logs. Do you see
something like
"org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:"?

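A minimal sketch of one way to check for this, assuming Python is available on the NameNode host. The helper name find_exception_lines and the sample lines are made up for illustration; for a real check, pass it your open NameNode log file, whose location depends on your Hadoop setup.

```python
from typing import Iterable, List, Tuple

# Exception class name mentioned above.
NEEDLE = "org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException"


def find_exception_lines(lines: Iterable[str], needle: str = NEEDLE) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line that contains `needle`."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line
    ]


# Hypothetical sample lines, only to show the call; not real NameNode output.
sample = [
    "INFO example: first line\n",
    "ERROR example: " + NEEDLE + ": made-up message\n",
    "INFO example: last line\n",
]
hits = find_exception_lines(sample)
for number, text in hits:
    print(number, text)
```

Since an open text file iterates line by line, the same function can be called as find_exception_lines(open(path, errors="replace")) on the real log.
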
2012/9/17 Thomas Jungblut <[email protected]>

> Are you running the current trunk version?
>
>
> 2012/9/17 Thomas Jungblut <[email protected]>
>
>> No idea. The log doesn't show anything.
>> Anyone else have an idea?
>>
>>
>> 2012/9/17 Zhuang Kechen <[email protected]>
>>
>>> *The logs of the task attempt:*
>>>
>>>
>>>
>>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client
>>> environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27
>>> GMT
>>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client
>>> environment:host.name=625-PC
>>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client
>>> environment:java.version=1.7.0
>>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client
>>> environment:java.vendor=Oracle Corporation
>>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client
>>> environment:java.home=/usr/lib/jvm/java-7-oracle/jre
>>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client
>>> environment:java.class.path=/home/function/hadoop-1.0.3/hama-0.5.0/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../hama-core-0.5.0.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../hama-examples-0.5.0.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../hama-graph-0.5.0.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/ant-1.7.1.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/ant-launcher-1.7.1.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/avro-1.6.0.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/avro-ipc-1.6.0.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/commons-cli-1.2.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/commons-configuration-1.7.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/commons-lang-2.6.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/commons-logging-1.1.1.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/commons-math3-3.0.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/guava-10.0.1.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/hadoop-core-1.0.0.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/hadoop-test-1.0.0.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/jackson-core-asl-1.9.2.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/jackson-mapper-asl-1.9.2.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/jetty-6.1.14.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/jetty-annotations-6.1.14.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/jetty-util-6.1.14.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/jsp-2.1-6.1.14.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/jsp-api-2.1-6.1.14.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/junit-4.8.1.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/log4j-1.2.16.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/netty-3.2.6.Final.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/servlet-api-6.0.32.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/slf4j-api-1.5.8.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/slf4j-log4j12-1.5.8.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/snappy-java-1.0.4.1.jar:/home/function/hadoop-1.0.3/hama-0.5.0/bin/../lib/zookeeper-3.3.3.jar::/tmp/hama-hduser/bsp/local/groomServer/attempt_201008172027_0007_000000_0/work/classes:/tmp/hama-hduser/bsp/local/groomServer/attempt_201008172027_0007_000000_0/work
>>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client
>>> environment:java.library.path=/usr/java/packages/lib/i386:/lib:/usr/lib
>>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client
>>> environment:java.io.tmpdir=/tmp
>>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client
>>> environment:java.compiler=<NA>
>>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client
>>> environment:os.name=Linux
>>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client
>>> environment:os.arch=i386
>>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client
>>> environment:os.version=3.2.0-23-generic-pae
>>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client
>>> environment:user.name=function
>>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client
>>> environment:user.home=/home/function
>>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Client
>>> environment:user.dir=/tmp/hama-hduser/bsp/local/groomServer/attempt_201008172027_0007_000000_0/work
>>> 12/09/17 21:03:40 INFO zookeeper.ZooKeeper: Initiating client connection,
>>> connectString=627-PC:21810,625-PC:21810,623-PC:21810,624-PC:21810
>>> sessionTimeout=1200000
>>> watcher=org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl@7e6024
>>> 12/09/17 21:03:40 INFO zookeeper.ClientCnxn: Opening socket connection to
>>> server 624-PC/192.168.1.2:21810
>>> 12/09/17 21:03:40 INFO sync.ZooKeeperSyncClientImpl: Start connecting to
>>> Zookeeper! At 625-PC/192.168.0.3:61002
>>> 12/09/17 21:03:40 INFO zookeeper.ClientCnxn: Socket connection established
>>> to 624-PC/192.168.1.2:21810, initiating session
>>> 12/09/17 21:03:40 INFO zookeeper.ClientCnxn: Session establishment complete
>>> on server 624-PC/192.168.1.2:21810, sessionid = 0x2a8004a6830016,
>>> negotiated timeout = 1200000
>>> 12/09/17 21:03:48 INFO ipc.NettyTransceiver: Connecting to 624-PC/
>>> 192.168.1.2:61001
>>> 12/09/17 21:03:48 INFO ipc.NettyTransceiver: [id: 0x00e5e138] OPEN
>>> 12/09/17 21:03:48 INFO ipc.NettyTransceiver: [id: 0x00e5e138, /
>>> 192.168.0.3:34094 => 624-PC/192.168.1.2:61001] BOUND: /192.168.0.3:34094
>>> 12/09/17 21:03:48 INFO ipc.NettyTransceiver: [id: 0x00e5e138, /
>>> 192.168.0.3:34094 => 624-PC/192.168.1.2:61001] CONNECTED: 624-PC/
>>> 192.168.1.2:61001
>>> 12/09/17 21:03:48 INFO ipc.NettyTransceiver: [id: 0x00e5e138, /
>>> 192.168.0.3:34094 => 624-PC/192.168.1.2:61001] INTEREST_CHANGED
>>> 12/09/17 21:03:48 INFO ipc.NettyTransceiver: [id: 0x00e5e138, /
>>> 192.168.0.3:34094 => 624-PC/192.168.1.2:61001] INTEREST_CHANGED
>>> 12/09/17 21:03:49 INFO ipc.NettyTransceiver: Connecting to 623-PC/
>>> 192.168.0.2:61001
>>> 12/09/17 21:03:49 INFO ipc.NettyTransceiver: [id: 0x01384669] OPEN
>>> 12/09/17 21:03:49 INFO ipc.NettyTransceiver: [id: 0x01384669, /
>>> 192.168.0.3:45977 => 623-PC/192.168.0.2:61001] INTEREST_CHANGED
>>> 12/09/17 21:03:49 INFO ipc.NettyTransceiver: [id: 0x01384669, /
>>> 192.168.0.3:45977 => 623-PC/192.168.0.2:61001] BOUND: /192.168.0.3:45977
>>> 12/09/17 21:03:49 INFO ipc.NettyTransceiver: [id: 0x01384669, /
>>> 192.168.0.3:45977 => 623-PC/192.168.0.2:61001] CONNECTED: 623-PC/
>>> 192.168.0.2:61001
>>> 12/09/17 21:03:49 INFO ipc.NettyTransceiver: [id: 0x01384669, /
>>> 192.168.0.3:45977 => 623-PC/192.168.0.2:61001] INTEREST_CHANGED
>>> 12/09/17 21:03:49 INFO ipc.NettyServer: [id: 0x00cb68d8,
>>> /192.168.0.5:57665=> /
>>> 192.168.0.3:61002] OPEN
>>> 12/09/17 21:03:49 INFO ipc.NettyServer: [id: 0x008bf924,
>>> /192.168.0.3:39122=> /
>>> 192.168.0.3:61002] OPEN
>>> 12/09/17 21:03:49 INFO ipc.NettyServer: [id: 0x00cb68d8,
>>> /192.168.0.5:57665=> /
>>> 192.168.0.3:61002] BOUND: /192.168.0.3:61002
>>> 12/09/17 21:03:49 INFO ipc.NettyServer: [id: 0x00cb68d8,
>>> /192.168.0.5:57665=> /
>>> 192.168.0.3:61002] CONNECTED: /192.168.0.5:57665
>>> 12/09/17 21:03:49 INFO ipc.NettyTransceiver: Connecting to 625-PC/
>>> 192.168.0.3:61002
>>> 12/09/17 21:03:49 INFO ipc.NettyTransceiver: [id: 0x005043cf] OPEN
>>> 12/09/17 21:03:49 INFO ipc.NettyServer: [id: 0x0193e3d0,
>>> /192.168.0.3:39123=> /
>>> 192.168.0.3:61002] OPEN
>>> 12/09/17 21:03:49 INFO ipc.NettyServer: [id: 0x008bf924,
>>> /192.168.0.3:39122=> /
>>> 192.168.0.3:61002] BOUND: /192.168.0.3:61002
>>> 12/09/17 21:03:49 INFO ipc.NettyServer: [id: 0x0193e3d0,
>>> /192.168.0.3:39123=> /
>>> 192.168.0.3:61002] BOUND: /192.168.0.3:61002
>>> 12/09/17 21:03:49 INFO ipc.NettyServer: [id: 0x008bf924,
>>> /192.168.0.3:39122=> /
>>> 192.168.0.3:61002] CONNECTED: /192.168.0.3:39122
>>> 12/09/17 21:03:49 INFO ipc.NettyServer: [id: 0x0193e3d0,
>>> /192.168.0.3:39123=> /
>>> 192.168.0.3:61002] CONNECTED: /192.168.0.3:39123
>>> 12/09/17 21:03:49 INFO ipc.NettyServer: [id: 0x00af4653,
>>> /192.168.0.5:57666=> /
>>> 192.168.0.3:61002] OPEN
>>> 12/09/17 21:03:49 INFO ipc.NettyServer: [id: 0x00af4653,
>>> /192.168.0.5:57666=> /
>>> 192.168.0.3:61002] BOUND: /192.168.0.3:61002
>>> 12/09/17 21:03:49 INFO ipc.NettyServer: [id: 0x00af4653,
>>> /192.168.0.5:57666=> /
>>> 192.168.0.3:61002] CONNECTED: /192.168.0.5:57666
>>> 12/09/17 21:03:49 INFO ipc.NettyTransceiver: [id: 0x005043cf, /
>>> 192.168.0.3:39123 => 625-PC/192.168.0.3:61002] BOUND: /192.168.0.3:39123
>>> 12/09/17 21:03:49 INFO ipc.NettyTransceiver: [id: 0x005043cf, /
>>> 192.168.0.3:39123 => 625-PC/192.168.0.3:61002] CONNECTED: 625-PC/
>>> 192.168.0.3:61002
>>> 12/09/17 21:03:49 INFO ipc.NettyTransceiver: [id: 0x005043cf, /
>>> 192.168.0.3:39123 => 625-PC/192.168.0.3:61002] INTEREST_CHANGED
>>> 12/09/17 21:03:49 INFO ipc.NettyTransceiver: [id: 0x005043cf, /
>>> 192.168.0.3:39123 => 625-PC/192.168.0.3:61002] INTEREST_CHANGED
>>> 12/09/17 21:03:50 INFO ipc.NettyTransceiver: Connecting to 623-PC/
>>> 192.168.0.2:61003
>>> 12/09/17 21:03:50 INFO ipc.NettyTransceiver: [id: 0x01a49bfa] OPEN
>>> 12/09/17 21:03:50 INFO ipc.NettyTransceiver: [id: 0x01a49bfa, /
>>> 192.168.0.3:56679 => 623-PC/192.168.0.2:61003] BOUND: /192.168.0.3:56679
>>> 12/09/17 21:03:50 INFO ipc.NettyTransceiver: [id: 0x01a49bfa, /
>>> 192.168.0.3:56679 => 623-PC/192.168.0.2:61003] CONNECTED: 623-PC/
>>> 192.168.0.2:61003
>>> 12/09/17 21:03:50 INFO ipc.NettyTransceiver: [id: 0x01a49bfa, /
>>> 192.168.0.3:56679 => 623-PC/192.168.0.2:61003] INTEREST_CHANGED
>>> 12/09/17 21:03:50 INFO ipc.NettyTransceiver: [id: 0x01a49bfa, /
>>> 192.168.0.3:56679 => 623-PC/192.168.0.2:61003] INTEREST_CHANGED
>>> 12/09/17 21:03:50 INFO ipc.NettyServer: [id: 0x0050039b,
>>> /192.168.0.3:39126=> /
>>> 192.168.0.3:61002] OPEN
>>> 12/09/17 21:03:50 INFO ipc.NettyServer: [id: 0x0050039b,
>>> /192.168.0.3:39126=> /
>>> 192.168.0.3:61002] BOUND: /192.168.0.3:61002
>>> 12/09/17 21:03:50 INFO ipc.NettyServer: [id: 0x0050039b,
>>> /192.168.0.3:39126=> /
>>> 192.168.0.3:61002] CONNECTED: /192.168.0.3:39126
>>> 12/09/17 21:03:50 INFO ipc.NettyTransceiver: Connecting to 623-PC/
>>> 192.168.0.2:61002
>>> 12/09/17 21:03:50 INFO ipc.NettyTransceiver: [id: 0x00f167bb] OPEN
>>> 12/09/17 21:03:50 INFO ipc.NettyTransceiver: [id: 0x00f167bb, /
>>> 192.168.0.3:49159 => 623-PC/192.168.0.2:61002] BOUND: /192.168.0.3:49159
>>> 12/09/17 21:03:50 INFO ipc.NettyTransceiver: [id: 0x00f167bb, /
>>> 192.168.0.3:49159 => 623-PC/192.168.0.2:61002] CONNECTED: 623-PC/
>>> 192.168.0.2:61002
>>> 12/09/17 21:03:50 INFO ipc.NettyTransceiver: [id: 0x00f167bb, /
>>> 192.168.0.3:49159 => 623-PC/192.168.0.2:61002] INTEREST_CHANGED
>>> 12/09/17 21:03:50 INFO ipc.NettyTransceiver: [id: 0x00f167bb, /
>>> 192.168.0.3:49159 => 623-PC/192.168.0.2:61002] INTEREST_CHANGED
>>> 12/09/17 21:03:51 INFO ipc.NettyServer: [id: 0x00b9b6b6,
>>> /192.168.0.5:57672=> /
>>> 192.168.0.3:61002] OPEN
>>> 12/09/17 21:03:51 INFO ipc.NettyServer: [id: 0x00b9b6b6,
>>> /192.168.0.5:57672=> /
>>> 192.168.0.3:61002] BOUND: /192.168.0.3:61002
>>> 12/09/17 21:03:51 INFO ipc.NettyServer: [id: 0x00b9b6b6,
>>> /192.168.0.5:57672=> /
>>> 192.168.0.3:61002] CONNECTED: /192.168.0.5:57672
>>> 12/09/17 21:03:51 INFO ipc.NettyServer: [id: 0x00adc675,
>>> /192.168.1.2:59923=> /
>>> 192.168.0.3:61002] OPEN
>>> 12/09/17 21:03:51 INFO ipc.NettyServer: [id: 0x00adc675,
>>> /192.168.1.2:59923=> /
>>> 192.168.0.3:61002] BOUND: /192.168.0.3:61002
>>> 12/09/17 21:03:51 INFO ipc.NettyServer: [id: 0x00b68ab7,
>>> /192.168.0.2:45938=> /
>>> 192.168.0.3:61002] OPEN
>>> 12/09/17 21:03:51 INFO ipc.NettyServer: [id: 0x00adc675,
>>> /192.168.1.2:59923=> /
>>> 192.168.0.3:61002] CONNECTED: /192.168.1.2:59923
>>> 12/09/17 21:03:51 INFO ipc.NettyServer: [id: 0x00b68ab7,
>>> /192.168.0.2:45938=> /
>>> 192.168.0.3:61002] BOUND: /192.168.0.3:61002
>>> 12/09/17 21:03:51 INFO ipc.NettyServer: [id: 0x00b68ab7,
>>> /192.168.0.2:45938=> /
>>> 192.168.0.3:61002] CONNECTED: /192.168.0.2:45938
>>> 12/09/17 21:03:51 INFO ipc.NettyServer: [id: 0x0016d58b,
>>> /192.168.0.2:45939=> /
>>> 192.168.0.3:61002] OPEN
>>> 12/09/17 21:03:51 INFO ipc.NettyServer: [id: 0x0016d58b,
>>> /192.168.0.2:45939=> /
>>> 192.168.0.3:61002] BOUND: /192.168.0.3:61002
>>> 12/09/17 21:03:51 INFO ipc.NettyServer: [id: 0x0016d58b,
>>> /192.168.0.2:45939=> /
>>> 192.168.0.3:61002] CONNECTED: /192.168.0.2:45939
>>> 12/09/17 21:03:52 INFO ipc.NettyTransceiver: Connecting to 625-PC/
>>> 192.168.0.3:61001
>>> 12/09/17 21:03:52 INFO ipc.NettyTransceiver: [id: 0x01d1f61c] OPEN
>>> 12/09/17 21:03:52 INFO ipc.NettyTransceiver: [id: 0x01d1f61c, /
>>> 192.168.0.3:51322 => 625-PC/192.168.0.3:61001] INTEREST_CHANGED
>>> 12/09/17 21:03:52 INFO ipc.NettyTransceiver: [id: 0x01d1f61c, /
>>> 192.168.0.3:51322 => 625-PC/192.168.0.3:61001] BOUND: /192.168.0.3:51322
>>> 12/09/17 21:03:52 INFO ipc.NettyTransceiver: [id: 0x01d1f61c, /
>>> 192.168.0.3:51322 => 625-PC/192.168.0.3:61001] CONNECTED: 625-PC/
>>> 192.168.0.3:61001
>>> 12/09/17 21:03:52 INFO ipc.NettyTransceiver: [id: 0x01d1f61c, /
>>> 192.168.0.3:51322 => 625-PC/192.168.0.3:61001] INTEREST_CHANGED
>>> 12/09/17 21:03:52 INFO ipc.NettyServer: [id: 0x00417ee9,
>>> /192.168.1.2:59927=> /
>>> 192.168.0.3:61002] OPEN
>>> 12/09/17 21:03:52 INFO ipc.NettyServer: [id: 0x00417ee9,
>>> /192.168.1.2:59927=> /
>>> 192.168.0.3:61002] BOUND: /192.168.0.3:61002
>>> 12/09/17 21:03:52 INFO ipc.NettyServer: [id: 0x00417ee9,
>>> /192.168.1.2:59927=> /
>>> 192.168.0.3:61002] CONNECTED: /192.168.1.2:59927
>>> 12/09/17 21:03:52 INFO ipc.NettyServer: [id: 0x01a2fe4d,
>>> /192.168.1.2:59928=> /
>>> 192.168.0.3:61002] OPEN
>>> 12/09/17 21:03:52 INFO ipc.NettyServer: [id: 0x01a2fe4d,
>>> /192.168.1.2:59928=> /
>>> 192.168.0.3:61002] BOUND: /192.168.0.3:61002
>>> 12/09/17 21:03:52 INFO ipc.NettyServer: [id: 0x01a2fe4d,
>>> /192.168.1.2:59928=> /
>>> 192.168.0.3:61002] CONNECTED: /192.168.1.2:59928
>>> 12/09/17 21:03:52 INFO ipc.NettyServer: [id: 0x00c554b0,
>>> /192.168.0.2:45944=> /
>>> 192.168.0.3:61002] OPEN
>>> 12/09/17 21:03:52 INFO ipc.NettyServer: [id: 0x00c554b0,
>>> /192.168.0.2:45944=> /
>>> 192.168.0.3:61002] BOUND: /192.168.0.3:61002
>>> 12/09/17 21:03:52 INFO ipc.NettyServer: [id: 0x00c554b0,
>>> /192.168.0.2:45944=> /
>>> 192.168.0.3:61002] CONNECTED: /192.168.0.2:45944
>>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: Connecting to 624-PC/
>>> 192.168.1.2:61003
>>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x00093909] OPEN
>>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x00093909, /
>>> 192.168.0.3:58014 => 624-PC/192.168.1.2:61003] INTEREST_CHANGED
>>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x00093909, /
>>> 192.168.0.3:58014 => 624-PC/192.168.1.2:61003] BOUND: /192.168.0.3:58014
>>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x00093909, /
>>> 192.168.0.3:58014 => 624-PC/192.168.1.2:61003] CONNECTED: 624-PC/
>>> 192.168.1.2:61003
>>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x00093909, /
>>> 192.168.0.3:58014 => 624-PC/192.168.1.2:61003] INTEREST_CHANGED
>>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: Connecting to 627-PC/
>>> 192.168.0.5:61002
>>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x002bba21] OPEN
>>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x002bba21, /
>>> 192.168.0.3:60492 => 627-PC/192.168.0.5:61002] INTEREST_CHANGED
>>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x002bba21, /
>>> 192.168.0.3:60492 => 627-PC/192.168.0.5:61002] BOUND: /192.168.0.3:60492
>>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x002bba21, /
>>> 192.168.0.3:60492 => 627-PC/192.168.0.5:61002] CONNECTED: 627-PC/
>>> 192.168.0.5:61002
>>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x002bba21, /
>>> 192.168.0.3:60492 => 627-PC/192.168.0.5:61002] INTEREST_CHANGED
>>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: Connecting to 624-PC/
>>> 192.168.1.2:61002
>>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x0057bd52] OPEN
>>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x0057bd52, /
>>> 192.168.0.3:53962 => 624-PC/192.168.1.2:61002] INTEREST_CHANGED
>>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x0057bd52, /
>>> 192.168.0.3:53962 => 624-PC/192.168.1.2:61002] BOUND: /192.168.0.3:53962
>>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x0057bd52, /
>>> 192.168.0.3:53962 => 624-PC/192.168.1.2:61002] CONNECTED: 624-PC/
>>> 192.168.1.2:61002
>>> 12/09/17 21:03:53 INFO ipc.NettyTransceiver: [id: 0x0057bd52, /
>>> 192.168.0.3:53962 => 624-PC/192.168.1.2:61002] INTEREST_CHANGED
>>> 12/09/17 21:03:54 INFO ipc.NettyTransceiver: Connecting to 627-PC/
>>> 192.168.0.5:61001
>>> 12/09/17 21:03:54 INFO ipc.NettyTransceiver: [id: 0x00a3ef26] OPEN
>>> 12/09/17 21:03:54 INFO ipc.NettyTransceiver: [id: 0x00a3ef26, /
>>> 192.168.0.3:34203 => 627-PC/192.168.0.5:61001] BOUND: /192.168.0.3:34203
>>> 12/09/17 21:03:54 INFO ipc.NettyTransceiver: [id: 0x00a3ef26, /
>>> 192.168.0.3:34203 => 627-PC/192.168.0.5:61001] CONNECTED: 627-PC/
>>> 192.168.0.5:61001
>>> 12/09/17 21:03:54 INFO ipc.NettyTransceiver: [id: 0x00a3ef26, /
>>> 192.168.0.3:34203 => 627-PC/192.168.0.5:61001] INTEREST_CHANGED
>>> 12/09/17 21:03:54 INFO ipc.NettyTransceiver: [id: 0x00a3ef26, /
>>> 192.168.0.3:34203 => 627-PC/192.168.0.5:61001] INTEREST_CHANGED
>>> 12/09/17 21:03:54 INFO ipc.NettyTransceiver: Connecting to 625-PC/
>>> 192.168.0.3:61003
>>> 12/09/17 21:03:54 INFO ipc.NettyTransceiver: [id: 0x0104ae5e] OPEN
>>> 12/09/17 21:03:54 INFO ipc.NettyTransceiver: [id: 0x0104ae5e, /
>>> 192.168.0.3:47749 => 625-PC/192.168.0.3:61003] INTEREST_CHANGED
>>> 12/09/17 21:03:54 INFO ipc.NettyTransceiver: [id: 0x0104ae5e, /
>>> 192.168.0.3:47749 => 625-PC/192.168.0.3:61003] BOUND: /192.168.0.3:47749
>>> 12/09/17 21:03:54 INFO ipc.NettyTransceiver: [id: 0x0104ae5e, /
>>> 192.168.0.3:47749 => 625-PC/192.168.0.3:61003] CONNECTED: 625-PC/
>>> 192.168.0.3:61003
>>> 12/09/17 21:03:54 INFO ipc.NettyTransceiver: [id: 0x0104ae5e, /
>>> 192.168.0.3:47749 => 625-PC/192.168.0.3:61003] INTEREST_CHANGED
>>> 12/09/17 21:03:55 INFO ipc.NettyTransceiver: Connecting to 627-PC/
>>> 192.168.0.5:61003
>>> 12/09/17 21:03:55 INFO ipc.NettyTransceiver: [id: 0x00c0499d] OPEN
>>> 12/09/17 21:03:55 INFO ipc.NettyTransceiver: [id: 0x00c0499d, /
>>> 192.168.0.3:36006 => 627-PC/192.168.0.5:61003] INTEREST_CHANGED
>>> 12/09/17 21:03:55 INFO ipc.NettyTransceiver: [id: 0x00c0499d, /
>>> 192.168.0.3:36006 => 627-PC/192.168.0.5:61003] BOUND: /192.168.0.3:36006
>>> 12/09/17 21:03:55 INFO ipc.NettyTransceiver: [id: 0x00c0499d, /
>>> 192.168.0.3:36006 => 627-PC/192.168.0.5:61003] CONNECTED: 627-PC/
>>> 192.168.0.5:61003
>>> 12/09/17 21:03:55 INFO ipc.NettyTransceiver: [id: 0x00c0499d, /
>>> 192.168.0.3:36006 => 627-PC/192.168.0.5:61003] INTEREST_CHANGED
>>> 12/09/17 21:03:58 INFO ipc.NettyTransceiver: [id: 0x00e5e138, /
>>> 192.168.0.3:34094 => 624-PC/192.168.1.2:61001] INTEREST_CHANGED
>>> 12/09/17 21:03:58 INFO ipc.NettyTransceiver: [id: 0x00e5e138, /
>>> 192.168.0.3:34094 => 624-PC/192.168.1.2:61001] INTEREST_CHANGED
>>> 12/09/17 21:03:59 INFO ipc.NettyTransceiver: [id: 0x01384669, /
>>> 192.168.0.3:45977 => 623-PC/192.168.0.2:61001] INTEREST_CHANGED
>>> 12/09/17 21:03:59 INFO ipc.NettyTransceiver: [id: 0x01384669, /
>>> 192.168.0.3:45977 => 623-PC/192.168.0.2:61001] INTEREST_CHANGED
>>> 12/09/17 21:03:59 INFO ipc.NettyTransceiver: [id: 0x01a49bfa, /
>>> 192.168.0.3:56679 => 623-PC/192.168.0.2:61003] INTEREST_CHANGED
>>> 12/09/17 21:03:59 INFO ipc.NettyTransceiver: [id: 0x01a49bfa, /
>>> 192.168.0.3:56679 => 623-PC/192.168.0.2:61003] INTEREST_CHANGED
>>> 12/09/17 21:04:00 INFO ipc.NettyTransceiver: [id: 0x005043cf, /
>>> 192.168.0.3:39123 => 625-PC/192.168.0.3:61002] INTEREST_CHANGED
>>> 12/09/17 21:04:00 INFO ipc.NettyTransceiver: [id: 0x005043cf, /
>>> 192.168.0.3:39123 => 625-PC/192.168.0.3:61002] INTEREST_CHANGED
>>> 12/09/17 21:04:00 INFO ipc.NettyTransceiver: [id: 0x00f167bb, /
>>> 192.168.0.3:49159 => 623-PC/192.168.0.2:61002] INTEREST_CHANGED
>>> 12/09/17 21:04:00 INFO ipc.NettyTransceiver: [id: 0x00f167bb, /
>>> 192.168.0.3:49159 => 623-PC/192.168.0.2:61002] INTEREST_CHANGED
>>> 12/09/17 21:04:00 INFO ipc.NettyTransceiver: [id: 0x01d1f61c, /
>>> 192.168.0.3:51322 => 625-PC/192.168.0.3:61001] INTEREST_CHANGED
>>> 12/09/17 21:04:00 INFO ipc.NettyTransceiver: [id: 0x01d1f61c, /
>>> 192.168.0.3:51322 => 625-PC/192.168.0.3:61001] INTEREST_CHANGED
>>> 12/09/17 21:04:01 INFO ipc.NettyTransceiver: [id: 0x00093909, /
>>> 192.168.0.3:58014 => 624-PC/192.168.1.2:61003] INTEREST_CHANGED
>>> 12/09/17 21:04:01 INFO ipc.NettyTransceiver: [id: 0x00093909, /
>>> 192.168.0.3:58014 => 624-PC/192.168.1.2:61003] INTEREST_CHANGED
>>> 12/09/17 21:04:01 INFO ipc.NettyTransceiver: [id: 0x002bba21, /
>>> 192.168.0.3:60492 => 627-PC/192.168.0.5:61002] INTEREST_CHANGED
>>> 12/09/17 21:04:01 INFO ipc.NettyTransceiver: [id: 0x002bba21, /
>>> 192.168.0.3:60492 => 627-PC/192.168.0.5:61002] INTEREST_CHANGED
>>> 12/09/17 21:04:01 INFO ipc.NettyTransceiver: [id: 0x00a3ef26, /
>>> 192.168.0.3:34203 => 627-PC/192.168.0.5:61001] INTEREST_CHANGED
>>> 12/09/17 21:04:01 INFO ipc.NettyTransceiver: [id: 0x00a3ef26, /
>>> 192.168.0.3:34203 => 627-PC/192.168.0.5:61001] INTEREST_CHANGED
>>> 12/09/17 21:04:01 INFO ipc.NettyTransceiver: [id: 0x0057bd52, /
>>> 192.168.0.3:53962 => 624-PC/192.168.1.2:61002] INTEREST_CHANGED
>>> 12/09/17 21:04:01 INFO ipc.NettyTransceiver: [id: 0x0057bd52, /
>>> 192.168.0.3:53962 => 624-PC/192.168.1.2:61002] INTEREST_CHANGED
>>> 12/09/17 21:04:01 INFO ipc.NettyTransceiver: [id: 0x0104ae5e, /
>>> 192.168.0.3:47749 => 625-PC/192.168.0.3:61003] INTEREST_CHANGED
>>> 12/09/17 21:04:01 INFO ipc.NettyTransceiver: [id: 0x0104ae5e, /
>>> 192.168.0.3:47749 => 625-PC/192.168.0.3:61003] INTEREST_CHANGED
>>> 12/09/17 21:04:02 INFO ipc.NettyTransceiver: [id: 0x00c0499d, /
>>> 192.168.0.3:36006 => 627-PC/192.168.0.5:61003] INTEREST_CHANGED
>>> 12/09/17 21:04:02 INFO ipc.NettyTransceiver: [id: 0x00c0499d, /
>>> 192.168.0.3:36006 => 627-PC/192.168.0.5:61003] INTEREST_CHANGED
>>> 12/09/17 21:04:02 INFO graph.GraphJobRunner: Loading finished at 2 steps.
>>> 12/09/17 21:04:03 INFO ipc.NettyTransceiver: [id: 0x00e5e138, /
>>> 192.168.0.3:34094 => 624-PC/192.168.1.2:61001] INTEREST_CHANGED
>>> 12/09/17 21:04:03 INFO ipc.NettyTransceiver: [id: 0x00e5e138, /
>>> 192.168.0.3:34094 => 624-PC/192.168.1.2:61001] INTEREST_CHANGED
>>> 12/09/17 21:04:03 INFO ipc.NettyTransceiver: [id: 0x01384669, /
>>> 192.168.0.3:45977 => 623-PC/192.168.0.2:61001] INTEREST_CHANGED
>>> 12/09/17 21:04:03 INFO ipc.NettyTransceiver: [id: 0x01384669, /
>>> 192.168.0.3:45977 => 623-PC/192.168.0.2:61001] INTEREST_CHANGED
>>> 12/09/17 21:04:08 INFO ipc.NettyTransceiver: [id: 0x005043cf, /
>>> 192.168.0.3:39123 => 625-PC/192.168.0.3:61002] INTEREST_CHANGED
>>> 12/09/17 21:04:08 INFO ipc.NettyTransceiver: [id: 0x005043cf, /
>>> 192.168.0.3:39123 => 625-PC/192.168.0.3:61002] INTEREST_CHANGED
>>> 12/09/17 21:04:09 INFO ipc.NettyTransceiver: [id: 0x01a49bfa, /
>>> 192.168.0.3:56679 => 623-PC/192.168.0.2:61003] INTEREST_CHANGED
>>> 12/09/17 21:04:09 INFO ipc.NettyTransceiver: [id: 0x01a49bfa, /
>>> 192.168.0.3:56679 => 623-PC/192.168.0.2:61003] INTEREST_CHANGED
>>> 12/09/17 21:04:09 INFO ipc.NettyTransceiver: [id: 0x01d1f61c, /
>>> 192.168.0.3:51322 => 625-PC/192.168.0.3:61001] INTEREST_CHANGED
>>> 12/09/17 21:04:09 INFO ipc.NettyTransceiver: [id: 0x01d1f61c, /
>>> 192.168.0.3:51322 => 625-PC/192.168.0.3:61001] INTEREST_CHANGED
>>> 12/09/17 21:04:10 INFO ipc.NettyTransceiver: [id: 0x00f167bb, /
>>> 192.168.0.3:49159 => 623-PC/192.168.0.2:61002] INTEREST_CHANGED
>>> 12/09/17 21:04:10 INFO ipc.NettyTransceiver: [id: 0x00f167bb, /
>>> 192.168.0.3:49159 => 623-PC/192.168.0.2:61002] INTEREST_CHANGED
>>> 12/09/17 21:04:10 INFO ipc.NettyTransceiver: [id: 0x00093909, /
>>> 192.168.0.3:58014 => 624-PC/192.168.1.2:61003] INTEREST_CHANGED
>>> 12/09/17 21:04:10 INFO ipc.NettyTransceiver: [id: 0x00093909, /
>>> 192.168.0.3:58014 => 624-PC/192.168.1.2:61003] INTEREST_CHANGED
>>> 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x002bba21, /
>>> 192.168.0.3:60492 => 627-PC/192.168.0.5:61002] INTEREST_CHANGED
>>> 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x002bba21, /
>>> 192.168.0.3:60492 => 627-PC/192.168.0.5:61002] INTEREST_CHANGED
>>> 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x00a3ef26, /
>>> 192.168.0.3:34203 => 627-PC/192.168.0.5:61001] INTEREST_CHANGED
>>> 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x00a3ef26, /
>>> 192.168.0.3:34203 => 627-PC/192.168.0.5:61001] INTEREST_CHANGED
>>> 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x0057bd52, /
>>> 192.168.0.3:53962 => 624-PC/192.168.1.2:61002] INTEREST_CHANGED
>>> 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x0057bd52, /
>>> 192.168.0.3:53962 => 624-PC/192.168.1.2:61002] INTEREST_CHANGED
>>> 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x0104ae5e, /
>>> 192.168.0.3:47749 => 625-PC/192.168.0.3:61003] INTEREST_CHANGED
>>> 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x0104ae5e, /
>>> 192.168.0.3:47749 => 625-PC/192.168.0.3:61003] INTEREST_CHANGED
>>> 12/09/17 21:04:12 INFO ipc.NettyTransceiver: [id: 0x00c0499d, /
>>> 192.168.0.3:36006 => 627-PC/192.168.0.5:61003] INTEREST_CHANGED
>>> 12/09/17 21:04:12 INFO ipc.NettyTransceiver: [id: 0x00c0499d, /
>>> 192.168.0.3:36006 => 627-PC/192.168.0.5:61003] INTEREST_CHANGED
>>>
>>>
>>>
>>>
>>>
>>> 2012/9/17 Thomas Jungblut <[email protected]>
>>>
>>> > Can you post the logs of task attempt_201008172027_0007_000000_0?
>>> >
>>> > 2012/9/17 Zhuang Kechen <[email protected]>
>>> >
>>> > > Hi Thomas,
>>> > > Sorry to bother you. When I run small graph tests on my cluster, a job
>>> > > on 25 MB of graph data succeeds and I get the right output file on
>>> > > HDFS, but a job on 50 MB does not. When the job fails,
>>> > > *the ZooKeeper logs end like this:*
>>> > >
>>> > > 2012-09-17 21:04:27,866 WARN org.apache.zookeeper.server.NIOServerCnxn:
>>> > > EndOfStreamException: Unable to read additional data from client sessionid
>>> > > 0x239d433755a0014, likely client has closed socket
>>> > > 2012-09-17 21:04:32,666 INFO org.apache.zookeeper.server.NIOServerCnxn:
>>> > > Closed socket connection for client /192.168.0.2:57977 which had sessionid
>>> > > 0x239d433755a0014
>>> > > 2012-09-17 21:04:36,551 WARN org.apache.zookeeper.server.NIOServerCnxn:
>>> > > EndOfStreamException: Unable to read additional data from client sessionid
>>> > > 0x239d433755a0013, likely client has closed socket
>>> > > 2012-09-17 21:04:36,989 INFO org.apache.zookeeper.server.NIOServerCnxn:
>>> > > Closed socket connection for client /192.168.0.3:44924 which had sessionid
>>> > > 0x239d433755a0013
>>> > >
>>> > > *The GroomServer logs look like this:*
>>> > > 2012-09-17 21:03:37,679 INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>>> > > 2012-09-17 21:03:37,982 INFO org.apache.hama.bsp.GroomServer: Task
>>> > > 'attempt_201008172027_0007_000002_0' has started.
>>> > > 2012-09-17 21:03:37,983 INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>>> > > 2012-09-17 21:03:38,073 INFO org.apache.hama.bsp.GroomServer: Task
>>> > > 'attempt_201008172027_0007_000000_0' has started.
>>> > > 2012-09-17 21:03:38,074 INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>>> > > 2012-09-17 21:03:38,325 INFO org.apache.hama.bsp.GroomServer: Task
>>> > > 'attempt_201008172027_0007_000001_0' has started.
>>> > > 2012-09-17 21:04:23,161 INFO org.apache.hama.bsp.GroomServer: adding purge
>>> > > task: attempt_201008172027_0007_000000_0
>>> > > 2012-09-17 21:04:23,513 INFO org.apache.hama.bsp.GroomServer: adding purge
>>> > > task: attempt_201008172027_0007_000002_0
>>> > > 2012-09-17 21:04:23,513 INFO org.apache.hama.bsp.GroomServer: About to
>>> > > purge task: attempt_201008172027_0007_000000_0
>>> > > 2012-09-17 21:04:25,918 INFO org.apache.hama.bsp.GroomServer: About to
>>> > > purge task: attempt_201008172027_0007_000002_0
>>> > > 2012-09-17 21:04:30,707 INFO org.apache.hama.bsp.GroomServer: Kill 1 tasks.
>>> > > 2012-09-17 21:04:30,929 INFO org.apache.hama.bsp.GroomServer: Kill 1 tasks.
>>> > > 2012-09-17 21:04:30,929 INFO org.apache.hama.bsp.GroomServer: Kill 1 tasks.
>>> > > 2012-09-17 21:04:33,965 INFO org.apache.hama.bsp.GroomServer: Kill 1 tasks.
>>> > >
>>> > > *The task logs end like this:*
>>> > > 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x00a3ef26, /
>>> > > 192.168.0.3:34203 => 627-PC/192.168.0.5:61001] INTEREST_CHANGED
>>> > > 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x00a3ef26, /
>>> > > 192.168.0.3:34203 => 627-PC/192.168.0.5:61001] INTEREST_CHANGED
>>> > > 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x0057bd52, /
>>> > > 192.168.0.3:53962 => 624-PC/192.168.1.2:61002] INTEREST_CHANGED
>>> > > 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x0057bd52, /
>>> > > 192.168.0.3:53962 => 624-PC/192.168.1.2:61002] INTEREST_CHANGED
>>> > > 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x0104ae5e, /
>>> > > 192.168.0.3:47749 => 625-PC/192.168.0.3:61003] INTEREST_CHANGED
>>> > > 12/09/17 21:04:11 INFO ipc.NettyTransceiver: [id: 0x0104ae5e, /
>>> > > 192.168.0.3:47749 => 625-PC/192.168.0.3:61003] INTEREST_CHANGED
>>> > > 12/09/17 21:04:12 INFO ipc.NettyTransceiver: [id: 0x00c0499d, /
>>> > > 192.168.0.3:36006 => 627-PC/192.168.0.5:61003] INTEREST_CHANGED
>>> > > 12/09/17 21:04:12 INFO ipc.NettyTransceiver: [id: 0x00c0499d, /
>>> > > 192.168.0.3:36006 => 627-PC/192.168.0.5:61003] INTEREST_CHANGED
>>> > > ..........
>>> > > Do you have any idea what may cause this kind of failure? Thanks a lot!
>>> > >
>>> > >
>>> > > 2012/9/15 Thomas Jungblut <[email protected]>
>>> > >
>>> > > > Okay, I have observed this problem as well with a 10 GB adjacency text
>>> > > > file. I was running on a 75 GB instance on EC2 with a 70 GB heap, which
>>> > > > should be no problem, but it fails after several steps.
>>> > > > I'm profiling it now in more detail.
>>> > > >
>>> > > > It can't be that 10 GB of text uses more than 20 GB of heap as a graph
>>> > > > with messages.
>>> > > >
>>> > > > 2012/9/14 Thomas Jungblut <[email protected]>
>>> > > >
>>> > > > > I would trim the spaces in the key and value.
>>> > > > > If it still crashes afterwards, I am out of ideas and would recommend
>>> > > > > taking a heap dump with hprof to see what is consuming all that memory.
>>> > > > >
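A small sketch of how one might spot padded keys and values in a Hadoop-style configuration file, modelled on the hama-site.xml snippet quoted further down in this thread. The helper find_padded_properties is hypothetical and not part of Hama or Hadoop; it only uses Python's standard xml.etree.ElementTree, and the second property in the sample is invented for contrast.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def find_padded_properties(xml_text: str) -> List[Tuple[str, str]]:
    """Return (name, value) pairs whose text has leading or trailing whitespace."""
    root = ET.fromstring(xml_text)
    padded = []
    for prop in root.iter("property"):
        name = prop.findtext("name", default="")
        value = prop.findtext("value", default="")
        if name != name.strip() or value != value.strip():
            padded.append((name, value))
    return padded


# Illustrative input: the first property mirrors the one quoted in this thread,
# the second ("example.trimmed.key") is a made-up, already trimmed entry.
SAMPLE = """<configuration>
  <property>
    <name> hama.messenger.queue.class </name>
    <value> org.apache.hama.bsp.message.DiskQueue </value>
  </property>
  <property>
    <name>example.trimmed.key</name>
    <value>example-value</value>
  </property>
</configuration>"""

for name, value in find_padded_properties(SAMPLE):
    print("padded:", repr(name), "=", repr(value))
    print("suggested:", name.strip(), "=", value.strip())
```

Whether the padding is actually what breaks the lookup depends on how the configuration loader treats whitespace, so this only flags candidates to clean up.
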
>>> > > > > 2012/9/14 庄克琛 <[email protected]>
>>> > > > >
>>> > > > >> Hi, I set the property in hama-site.xml:
>>> > > > >>   <property>
>>> > > > >>     <name> hama.messenger.queue.class </name>
>>> > > > >>     <value> org.apache.hama.bsp.message.DiskQueue </value>
>>> > > > >>   </property>
>>> > > > >> Did I set it right?
>>> > > > >> I then restarted Hama (stop-bspd.sh and start-bspd.sh), ran the test
>>> > > > >> job again, and watched the memory climb slowly to 70%, 80%, 90%, and
>>> > > > >> then it crashed... >_<
>>> > > > >>
>>> > > > >>
>>> > > > >> 2012/9/14 Thomas Jungblut <[email protected]>
>>> > > > >>
>>> > > > >> > Yes, I wanted to have direct memory in Hama months ago, but hadn't
>>> > > > >> > managed to find enough time.
>>> > > > >> > That is a very good idea.
>>> > > > >> >
>>> > > > >> > 2012/9/14 Tommaso Teofili <[email protected]>
>>> > > > >> >
>>> > > > >> > > I think we may also create an Apache DirectMemory based DiskQueue
>>> > > > >> > > which caches things on disk but hides most of the complexity.
>>> > > > >> > > My 2 cents,
>>> > > > >> > > Tommaso
>>> > > > >> > >
>>> > > > >> > > 2012/9/14 Thomas Jungblut <[email protected]>
>>> > > > >> > >
>>> > > > >> > > > I have created an issue for that:
>>> > > > >> > > > HAMA-642<https://issues.apache.org/jira/browse/HAMA-642>
>>> > > > >> > > >
>>> > > > >> > > > 2012/9/14 Thomas Jungblut <[email protected]>
>>> > > > >> > > >
>>> > > > >> > > > > Basically, I think the graph should fit into the memory of your
>>> > > > >> > > > > task, so the messages could be causing the overflow.
>>> > > > >> > > > >
>>> > > > >> > > > > You can try out the DiskQueue. It can be configured by setting the
>>> > > > >> > > > > property "hama.messenger.queue.class" to
>>> > > > >> > > > > "org.apache.hama.bsp.message.DiskQueue".
>>> > > > >> > > > >
>>> > > > >> > > > > This will immediately flush the messages to disk. However, this is
>>> > > > >> > > > > currently experimental, so if you try it out, please tell us whether
>>> > > > >> > > > > it helped.
>>> > > > >> > > > >
>>> > > > >> > > > > Thanks.
>>> > > > >> > > > >
>>> > > > >> > > > > To scale this further, we should write vertices that don't fit in
>>> > > > >> > > > > memory to disk. I will add another JIRA issue for that soon.
>>> > > > >> > > > >
>>> > > > >> > > > > 2012/9/14 庄克琛 <[email protected]>
>>> > > > >> > > > >
>>> > > > >> > > > >> Oh, the HDFS block size is 128 MB, not 64 MB, so the 73 MB graph
>>> > > > >> > > > >> will not be split on HDFS.
>>> > > > >> > > > >>
>>> > > > >> > > > >> 2012/9/14 庄克琛 <[email protected]>
>>> > > > >> > > > >>
>>> > > > >> > > > >> > Hmm... I tried your configuration advice and restarted Hama.
>>> > > > >> > > > >> > I used the Google web graph
>>> > > > >> > > > >> > ( http://wiki.apache.org/hama/WriteHamaGraphFile ),
>>> > > > >> > > > >> > Nodes: 875713 Edges: 5105039, which is about 73 MB, uploaded it to a
>>> > > > >> > > > >> > small HDFS cluster (block size is 64 MB), and ran the PageRank test from
>>> > > > >> > > > >> > ( http://wiki.apache.org/hama/WriteHamaGraphFile ). I got this result:
>>> > > > >> > > > >> > ################
>>> > > > >> > > > >> > function@624-PC:~/hadoop-1.0.3/hama-0.6.0$ hama jar hama-6-P* input-google ouput-google
>>> > > > >> > > > >> > 12/09/14 14:27:50 INFO bsp.FileInputFormat: Total input paths to process : 1
>>> > > > >> > > > >> > 12/09/14 14:27:50 INFO bsp.FileInputFormat: Total # of splits: 3
>>> > > > >> > > > >> > 12/09/14 14:27:50 INFO bsp.BSPJobClient: Running job: job_201008141420_0004
>>> > > > >> > > > >> > 12/09/14 14:27:53 INFO bsp.BSPJobClient: Current supersteps number: 0
>>> > > > >> > > > >> > Java HotSpot(TM) Server VM warning: Attempt to allocate stack guard pages failed.
>>> > > > >> > > > >> > ###################
>>> > > > >> > > > >> >
>>> > > > >> > > > >> > Last time it reached superstep 1 or 2 before ending with the same result.
>>> > > > >> > > > >> > The task attempt****.err files are empty.
>>> > > > >> > > > >> > Is the graph too large?
>>> > > > >> > > > >> > I tested on a small graph and got the right rank results.
>>> > > > >> > > > >> >
>>> > > > >> > > > >> >
>>> > > > >> > > > >> > 2012/9/14 Edward J. Yoon <[email protected]>
>>> > > > >> > > > >> >
>>> > > > >> > > > >> >> I've added a multi-step partitioning method to save memory[1].
>>> > > > >> > > > >> >>
>>> > > > >> > > > >> >> Please try adding the property below to hama-site.xml.
>>> > > > >> > > > >> >>
>>> > > > >> > > > >> >>   <property>
>>> > > > >> > > > >> >>     <name>hama.graph.multi.step.partitioning.interval</name>
>>> > > > >> > > > >> >>     <value>10000000</value>
>>> > > > >> > > > >> >>   </property>
>>> > > > >> > > > >> >>
>>> > > > >> > > > >> >> 1. https://issues.apache.org/jira/browse/HAMA-599
>>> > > > >> > > > >> >>
>>> > > > >> > > > >> >> On Fri, Sep 14, 2012 at 3:13 PM, 庄克琛 <[email protected]> wrote:
>>> > > > >> > > > >> >> > Hi, actually I used this build
>>> > > > >> > > > >> >> > ( https://builds.apache.org/job/Hama-Nightly/672/artifact/.repository/org/apache/hama/hama-dist/0.6.0-SNAPSHOT/ )
>>> > > > >> > > > >> >> > to test again. I replaced everything with this 0.6.0-SNAPSHOT version
>>> > > > >> > > > >> >> > and got the same out-of-memory results. I just don't know what causes
>>> > > > >> > > > >> >> > the out-of-memory failures; only computations on small graphs finish.
>>> > > > >> > > > >> >> > Does this version include
>>> > > > >> > > > >> >> > "[HAMA-596<https://issues.apache.org/jira/browse/HAMA-596>]: Optimize
>>> > > > >> > > > >> >> > memory usage of graph job"?
>>> > > > >> > > > >> >> > Thanks
>>> > > > >> > > > >> >> >
>>> > > > >> > > > >> >> > 2012/9/14 Thomas Jungblut <[email protected]>
>>> > > > >> > > > >> >> >
>>> > > > >> > > > >> >> >> Hey, which jar exactly did you replace?
>>> > > > >> > > > >> >> >> On 14.09.2012 07:49, "庄克琛" <[email protected]> wrote:
>>> > > > >> > > > >> >> >>
>>> > > > >> > > > >> >> >> > Hi everyone,
>>> > > > >> > > > >> >> >> > I use hama-0.5.0 with hadoop-1.0.3 and am trying to do some large graph
>>> > > > >> > > > >> >> >> > analysis.
>>> > > > >> > > > >> >> >> > When I test the PageRank example as shown at
>>> > > > >> > > > >> >> >> > ( http://wiki.apache.org/hama/WriteHamaGraphFile ), I download the graph
>>> > > > >> > > > >> >> >> > data and run the PageRank job on a small distributed cluster. The job
>>> > > > >> > > > >> >> >> > always fails with an out-of-memory error: supersteps 0, 1, and 2 work
>>> > > > >> > > > >> >> >> > well, then it runs out of memory. (Each computer has 2 GB of memory.)
>>> > > > >> > > > >> >> >> > But when I test a small graph, everything goes well.
>>> > > > >> > > > >> >> >> > I also tried the trunk version
>>> > > > >> > > > >> >> >> > ( https://builds.apache.org/job/Hama-Nightly/672/changes#detail3 ),
>>> > > > >> > > > >> >> >> > replacing my hama-0.5.0 with the hama-0.6.0-snapshot, but got the same
>>> > > > >> > > > >> >> >> > results.
>>> > > > >> > > > >> >> >> > Does anyone have better ideas?
>>> > > > >> > > > >> >> >> >
>>> > > > >> > > > >> >> >> > Thanks!
>>> > > > >> > > > >> >> >> >
>>> > > > >> > > > >> >> >> > --
>>> > > > >> > > > >> >> >> >
>>> > > > >> > > > >> >> >> > *Zhuang Kechen
>>> > > > >> > > > >> >> >> > *
>>> > > > >> > > > >> >> >> >
>>> > > > >> > > > >> >> >>
>>> > > > >> > > > >> >> >
>>> > > > >> > > > >> >> >
>>> > > > >> > > > >> >> >
>>> > > > >> > > > >> >> > --
>>> > > > >> > > > >> >> >
>>> > > > >> > > > >> >> > *Zhuang Kechen*
>>> > > > >> > > > >> >> >
>>> > > > >> > > > >> >> > School of Computer Science & Technology
>>> > > > >> > > > >> >> >
>>> > > > >> > > > >> >> > **
>>> > > > >> > > > >> >> > Nanjing University of Science & Technology
>>> > > > >> > > > >> >> >
>>> > > > >> > > > >> >> > Lab.623, School of Computer Sci. & Tech.
>>> > > > >> > > > >> >> >
>>> > > > >> > > > >> >> > No.200, Xiaolingwei Street
>>> > > > >> > > > >> >> >
>>> > > > >> > > > >> >> > Nanjing, Jiangsu, 210094
>>> > > > >> > > > >> >> >
>>> > > > >> > > > >> >> > P.R. China
>>> > > > >> > > > >> >> >
>>> > > > >> > > > >> >> > Tel: 025-84315982**
>>> > > > >> > > > >> >> >
>>> > > > >> > > > >> >> > Email: [email protected]
>>> > > > >> > > > >> >>
>>> > > > >> > > > >> >>
>>> > > > >> > > > >> >>
>>> > > > >> > > > >> >> --
>>> > > > >> > > > >> >> Best Regards, Edward J. Yoon
>>> > > > >> > > > >> >> @eddieyoon
>>> > > > >> > > > >> >>
>>> > > > >> > > > >> >
>>> > > > >> > > > >> >
>>> > > > >> > > > >> >
>>> > > > >> > > > >> > --
>>> > > > >> > > > >> >
>>> > > > >> > > > >> > *Zhuang Kechen
>>> > > > >> > > > >> > *
>>> > > > >> > > > >> >
>>> > > > >> > > > >> >
>>> > > > >> > > > >> >
>>> > > > >> > > > >>
>>> > > > >> > > > >>
>>> > > > >> > > > >> --
>>> > > > >> > > > >>
>>> > > > >> > > > >> *Zhuang Kechen*
>>> > > > >> > > > >>
>>> > > > >> > > > >
>>> > > > >> > > > >
>>> > > > >> > > >
>>> > > > >> > >
>>> > > > >> >
>>> > > > >>
>>> > > > >>
>>> > > > >>
>>> > > > >> --
>>> > > > >>
>>> > > > >> *Zhuang Kechen*
>>> > > > >>
>>> > > > >
>>> > > > >
>>> > > >
>>> > >
>>> > >
>>> > >
>>> > > --
>>> > >
>>> > > *Zhuang Kechen*
>>> > >
>>> >
>>>
>>>
>>>
>>> --
>>>
>>> *Zhuang Kechen*
>>>
>>
>>
>
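
For reference, the DiskQueue setting discussed in this thread, with the
whitespace trimmed from the key and value as suggested, might look like the
following hama-site.xml fragment. This is only a sketch: the property name and
class are taken verbatim from the thread, where DiskQueue was described as
experimental, and whether it resolves the out-of-memory failure is not
confirmed here.

```xml
<!-- hama-site.xml: route BSP messages through the disk-backed queue. -->
<!-- No spaces inside <name> or <value>, per the advice in the thread. -->
<property>
  <name>hama.messenger.queue.class</name>
  <value>org.apache.hama.bsp.message.DiskQueue</value>
</property>
```

As in the thread, Hama would need to be restarted (stop-bspd.sh, then
start-bspd.sh) after editing the file.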
