Ok, perfect. I do not have write access to /etc/hosts, which is why I was using IP addresses. I will talk to the administrator.
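For reference, the fix the administrator would apply is a hostname-to-IP mapping on every node. The entries below are only a sketch: the hostnames (b178b33b16cc, 8d4b512cf448) and IPs (172.17.0.3, 172.17.0.7) are the ones that appear in the logs further down this thread, and the actual pairing on a given cluster must be verified.

```
# /etc/hosts on every node (requires root)
# Hostnames/IPs below are the ones seen in this thread's logs;
# verify the real mapping on your machines before adding them.
172.17.0.3   b178b33b16cc
172.17.0.7   8d4b512cf448
```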
Btw, I am wondering how the Pi example was able to communicate with the other servers. The Pi example runs fine even when I have more than 3 tasks (works on both machines). On Mon, Jun 29, 2015 at 5:47 AM, Edward J. Yoon <[email protected]> wrote: > Okay, almost done. I guess you need to add host names to your > /etc/hosts file. :-) Please see also > > http://stackoverflow.com/questions/4730148/unknownhostexception-on-tasktracker-in-hadoop-cluster > > On Mon, Jun 29, 2015 at 12:41 PM, Behroz Sikander <[email protected]> > wrote: > > Server 2 was showing the exception that I posted in the previous email. > > Server1 is showing the following exception > > > > 15/06/29 03:27:42 INFO ipc.Server: IPC Server handler 0 on 40000: > starting > > 15/06/29 03:28:53 INFO bsp.BSPMaster: groomd_b178b33b16cc_50000 is added. > > 15/06/29 03:29:20 ERROR bsp.BSPMaster: Fail to register GroomServer > > groomd_8d4b512cf448_50000 > > java.net.UnknownHostException: unknown host: 8d4b512cf448 > > at org.apache.hama.ipc.Client$Connection.<init>(Client.java:225) > > at org.apache.hama.ipc.Client.getConnection(Client.java:1039) > > at org.apache.hama.ipc.Client.call(Client.java:888) > > at org.apache.hama.ipc.RPC$Invoker.invoke(RPC.java:239) > > at com.sun.proxy.$Proxy11.getProtocolVersion(Unknown Source) > > > > I am looking into this issue. > > > > On Mon, Jun 29, 2015 at 5:31 AM, Behroz Sikander <[email protected]> > wrote: > > > >> Ok great. I was able to run the zk, groom and bspmaster on server 1. But > >> when I ran the groom on server2 I got the following exception > >> > >> 15/06/29 03:29:20 ERROR bsp.GroomServer: There is a problem in > >> establishing communication link with BSPMaster > >> 15/06/29 03:29:20 ERROR bsp.GroomServer: Got fatal exception while > >> reinitializing GroomServer: java.io.IOException: There is a problem in > >> establishing communication link with BSPMaster. 
> >> at org.apache.hama.bsp.GroomServer.initialize(GroomServer.java:426) > >> at org.apache.hama.bsp.GroomServer.run(GroomServer.java:860) > >> at java.lang.Thread.run(Thread.java:745) > >> > >> On Mon, Jun 29, 2015 at 5:21 AM, Edward J. Yoon <[email protected]> > >> wrote: > >> > >>> Here's my configurations: > >>> > >>> hama-site.xml: > >>> > >>> <property> > >>> <name>bsp.master.address</name> > >>> <value>cluster-0:40000</value> > >>> </property> > >>> > >>> <property> > >>> <name>fs.default.name</name> > >>> <value>hdfs://cluster-0:9000/</value> > >>> </property> > >>> > >>> <property> > >>> <name>hama.zookeeper.quorum</name> > >>> <value>cluster-0</value> > >>> </property> > >>> > >>> > >>> % bin/hama zookeeper > >>> 15/06/29 12:17:17 ERROR quorum.QuorumPeerConfig: Invalid > >>> configuration, only one server specified (ignoring) > >>> > >>> Then, open new terminal and run master with following command: > >>> > >>> % bin/hama bspmaster > >>> ... > >>> 15/06/29 12:17:40 INFO sync.ZKSyncBSPMasterClient: Initialized ZK false > >>> 15/06/29 12:17:40 INFO sync.ZKSyncClient: Initializing ZK Sync Client > >>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server Responder: starting > >>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server listener on 40000: > starting > >>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server handler 0 on 40000: > starting > >>> 15/06/29 12:17:40 INFO bsp.BSPMaster: Starting RUNNING > >>> > >>> > >>> > >>> On Mon, Jun 29, 2015 at 12:17 PM, Edward J. Yoon < > [email protected]> > >>> wrote: > >>> > Hi, > >>> > > >>> > If you run zk server too, BSPmaster will be connected to zk and won't > >>> > throw exceptions. > >>> > > >>> > On Mon, Jun 29, 2015 at 12:13 PM, Behroz Sikander < > [email protected]> > >>> wrote: > >>> >> Hi, > >>> >> Thank you the information. I moved to hama 0.7.0 and I still have > the > >>> same > >>> >> problem. 
> >>> >> When I run % bin/hama bspmaster, I am getting the following > exception > >>> >> > >>> >> INFO http.HttpServer: Port returned by > >>> >> webServer.getConnectors()[0].getLocalPort() before open() is -1. > >>> Opening > >>> >> the listener on 40013 > >>> >> INFO http.HttpServer: listener.getLocalPort() returned 40013 > >>> >> webServer.getConnectors()[0].getLocalPort() returned 40013 > >>> >> INFO http.HttpServer: Jetty bound to port 40013 > >>> >> INFO mortbay.log: jetty-6.1.14 > >>> >> INFO mortbay.log: Extract > >>> >> > >>> > jar:file:/home/behroz/Documents/Packages/hama-0.7.0/hama-core-0.7.0.jar!/webapp/bspmaster/ > >>> >> to /tmp/Jetty_b178b33b16cc_40013_bspmaster____.cof30w/webapp > >>> >> INFO mortbay.log: Started SelectChannelConnector@b178b33b16cc > :40013 > >>> >> INFO bsp.BSPMaster: Cleaning up the system directory > >>> >> INFO bsp.BSPMaster: hdfs:// > >>> 172.17.0.3:54310/tmp/hama-behroz/bsp/system > >>> >> INFO sync.ZKSyncBSPMasterClient: Initialized ZK false > >>> >> INFO sync.ZKSyncClient: Initializing ZK Sync Client > >>> >> ERROR sync.ZKSyncBSPMasterClient: > >>> >> org.apache.zookeeper.KeeperException$ConnectionLossException: > >>> >> KeeperErrorCode = ConnectionLoss for /bsp > >>> >> at > org.apache.zookeeper.KeeperException.create(KeeperException.java:99) > >>> >> at > org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > >>> >> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041) > >>> >> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069) > >>> >> at > >>> >> > >>> > org.apache.hama.bsp.sync.ZKSyncBSPMasterClient.init(ZKSyncBSPMasterClient.java:62) > >>> >> at org.apache.hama.bsp.BSPMaster.initZK(BSPMaster.java:534) > >>> >> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:517) > >>> >> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:500) > >>> >> at org.apache.hama.BSPMasterRunner.run(BSPMasterRunner.java:46) > >>> >> at 
org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > >>> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > >>> >> at org.apache.hama.BSPMasterRunner.main(BSPMasterRunner.java:56) > >>> >> ERROR sync.ZKSyncBSPMasterClient: > >>> >> org.apache.zookeeper.KeeperException$ConnectionLossException: > >>> >> KeeperErrorCode = ConnectionLoss for /bsp > >>> >> > >>> >> *Why zookeeper settings in hama-site.xml are (right now, I am using > >>> just > >>> >> two servers 172.17.0.3 and 172.17.0.7)* > >>> >> <property> > >>> >> <name>hama.zookeeper.quorum</name> > >>> >> <value>172.17.0.3,172.17.0.7</value> > >>> >> <description>Comma separated list of servers in the > >>> >> ZooKeeper quorum. > >>> >> For example, "host1.mydomain.com, > host2.mydomain.com, > >>> >> host3.mydomain.com". > >>> >> By default this is set to localhost for local and > >>> >> pseudo-distributed modes > >>> >> of operation. For a fully-distributed setup, this > >>> should > >>> >> be set to a full > >>> >> list of ZooKeeper quorum servers. If > HAMA_MANAGES_ZK > >>> is > >>> >> set in hama-env.sh > >>> >> this is the list of servers which we will > start/stop > >>> >> ZooKeeper on. > >>> >> </description> > >>> >> </property> > >>> >> ...... > >>> >> <property> > >>> >> <name>hama.zookeeper.property.clientPort</name> > >>> >> <value>2181</value> > >>> >> </property> > >>> >> > >>> >> Is something wrong with my settings ? > >>> >> > >>> >> Regards, > >>> >> Behroz Sikander > >>> >> > >>> >> On Mon, Jun 29, 2015 at 1:44 AM, Edward J. Yoon < > >>> [email protected]> > >>> >> wrote: > >>> >> > >>> >>> > (0.7.0) because I do not understand YARN yet. It adds extra > >>> >>> configurations > >>> >>> > >>> >>> Hama classic mode works on both Hadoop 1.x and Hadoop 2.x HDFS. > Yarn > >>> >>> configuration is only needed when you want to submit a BSP job to > Yarn > >>> >>> cluster > >>> >>> without Hama cluster. So you don't need to worry about it. 
:-) > >>> >>> > >>> >>> > distributed mode ? and is there any way to manage the server ? I > >>> mean > >>> >>> right > >>> >>> > now, I have 3 machines with alot of configurations files and log > >>> files. > >>> >>> It > >>> >>> > >>> >>> You can use web UI at > http://masterserver_address:40013/bspmaster.jsp > >>> >>> > >>> >>> To debug your program, please try like below: > >>> >>> > >>> >>> 1) Run a BSPMaster and Zookeeper at server1. > >>> >>> % bin/hama bspmaster > >>> >>> % bin/hama zookeeper > >>> >>> > >>> >>> 2) Run a Groom at server1 and server2. > >>> >>> > >>> >>> % bin/hama groom > >>> >>> > >>> >>> 3) Check whether deamons are running well. Then, run your program > >>> using jar > >>> >>> command at server1. > >>> >>> > >>> >>> % bin/hama jar ..... > >>> >>> > >>> >>> > In hama_[user]_bspmaster_.....log file I get the following > >>> exception. But > >>> >>> > this occurs in both cases when I run my job with 3 tasks or with > 4 > >>> tasks > >>> >>> > >>> >>> In fact, you should not see above initZK error log. > >>> >>> > >>> >>> -- > >>> >>> Best Regards, Edward J. Yoon > >>> >>> > >>> >>> > >>> >>> -----Original Message----- > >>> >>> From: Behroz Sikander [mailto:[email protected]] > >>> >>> Sent: Monday, June 29, 2015 8:18 AM > >>> >>> To: [email protected] > >>> >>> Subject: Re: Groomserer BSPPeerChild limit > >>> >>> > >>> >>> I will try the things that you mentioned. I am not using the latest > >>> version > >>> >>> (0.7.0) because I do not understand YARN yet. It adds extra > >>> configurations > >>> >>> which makes it more harder for me to understand when things go > wrong. > >>> Any > >>> >>> suggestions ? > >>> >>> > >>> >>> Further, are there any tools that you use for debugging while in > >>> >>> distributed mode ? and is there any way to manage the server ? I > mean > >>> right > >>> >>> now, I have 3 machines with alot of configurations files and log > >>> files. It > >>> >>> takes alot of time. 
This makes me wonder how people who have 100s > of > >>> >>> machines debug and manage the cluster. > >>> >>> > >>> >>> Regards, > >>> >>> Behroz > >>> >>> > >>> >>> On Mon, Jun 29, 2015 at 12:53 AM, Edward J. Yoon < > >>> [email protected]> > >>> >>> wrote: > >>> >>> > >>> >>> > Hi, > >>> >>> > > >>> >>> > It looks like a zookeeper connection problem. Please check > whether > >>> >>> > zookeeper > >>> >>> > is running and every tasks can connect to zookeeper. > >>> >>> > > >>> >>> > I would recommend you to stop the firewall during debugging, and > >>> please > >>> >>> use > >>> >>> > the 0.7.0 latest release. > >>> >>> > > >>> >>> > > >>> >>> > -- > >>> >>> > Best Regards, Edward J. Yoon > >>> >>> > > >>> >>> > -----Original Message----- > >>> >>> > From: Behroz Sikander [mailto:[email protected]] > >>> >>> > Sent: Monday, June 29, 2015 7:34 AM > >>> >>> > To: [email protected] > >>> >>> > Subject: Re: Groomserer BSPPeerChild limit > >>> >>> > > >>> >>> > To figure out the issue, I was trying something else and found > out > >>> >>> another > >>> >>> > wiered issue. Might be a bug of Hama but I am not sure. Both > >>> following > >>> >>> > lines give an exception. 
> >>> >>> > > >>> >>> > System.out.println( peer.getPeerName(0)); //Exception > >>> >>> > > >>> >>> > System.out.println( peer.getNumPeers()); //Exception > >>> >>> > > >>> >>> > > >>> >>> > [time] ERROR bsp.BSPTask: *Error running bsp setup and bsp > >>> function.* > >>> >>> > > >>> >>> > [time]java.lang.*RuntimeException: All peer names could not be > >>> >>> retrieved!* > >>> >>> > > >>> >>> > at > >>> >>> > > >>> >>> > > >>> >>> > >>> > org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.getAllPeerNames(ZooKeeperSyncClientImpl.java:305) > >>> >>> > > >>> >>> > at > >>> org.apache.hama.bsp.BSPPeerImpl.initPeerNames(BSPPeerImpl.java:544) > >>> >>> > > >>> >>> > at > org.apache.hama.bsp.BSPPeerImpl.getNumPeers(BSPPeerImpl.java:538) > >>> >>> > > >>> >>> > at testHDFS.EVADMMBsp.setup*(EVADMMBsp.java:58)* > >>> >>> > > >>> >>> > at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170) > >>> >>> > > >>> >>> > at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144) > >>> >>> > > >>> >>> > at > >>> >>> > >>> > org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243) > >>> >>> > > >>> >>> > On Sun, Jun 28, 2015 at 6:45 PM, Behroz Sikander < > >>> [email protected]> > >>> >>> > wrote: > >>> >>> > > >>> >>> > > I think I have more information on the issue. I did some > >>> debugging and > >>> >>> > > found something quite strange. > >>> >>> > > > >>> >>> > > If I open my job with 6 tasks ( 3 tasks will run on MACHINE1 > and > >>> 3 task > >>> >>> > > will be opened on other MACHINE2), > >>> >>> > > > >>> >>> > > - 3 tasks on Machine1 are frozen and the strange thing is > that > >>> the > >>> >>> > > processes do not even enter the SETUP function of BSP class. I > >>> have > >>> >>> print > >>> >>> > > statements in the setup function of BSP class and it doesn't > print > >>> >>> > > anything. I get empty files with zero size. > >>> >>> > > > >>> >>> > > drwxrwxr-x 2 behroz behroz 4096 Jun 28 16:29 . > >>> >>> > > drwxrwxr-x 99 behroz behroz 4096 Jun 28 16:28 .. 
> >>> >>> > > -rw-rw-r-- 1 behroz behroz 0 Jun 28 16:24 > >>> >>> > > attempt_201506281624_0001_000000_0.err > >>> >>> > > -rw-rw-r-- 1 behroz behroz 0 Jun 28 16:24 > >>> >>> > > attempt_201506281624_0001_000000_0.log > >>> >>> > > -rw-rw-r-- 1 behroz behroz 0 Jun 28 16:24 > >>> >>> > > attempt_201506281624_0001_000001_0.err > >>> >>> > > -rw-rw-r-- 1 behroz behroz 0 Jun 28 16:24 > >>> >>> > > attempt_201506281624_0001_000001_0.log > >>> >>> > > -rw-rw-r-- 1 behroz behroz 0 Jun 28 16:24 > >>> >>> > > attempt_201506281624_0001_000002_0.err > >>> >>> > > -rw-rw-r-- 1 behroz behroz 0 Jun 28 16:24 > >>> >>> > > attempt_201506281624_0001_000002_0.log > >>> >>> > > > >>> >>> > > - On MACHINE2, the code enters the SETUP function of BSP class > and > >>> >>> prints > >>> >>> > > stuff. See the size of files generated on output. How is it > >>> possible > >>> >>> that > >>> >>> > > in 3 tasks the code can enter BSP and in others it cannot ? > >>> >>> > > > >>> >>> > > drwxrwxr-x 2 behroz behroz 4096 Jun 28 16:39 . > >>> >>> > > drwxrwxr-x 82 behroz behroz 4096 Jun 28 16:39 .. > >>> >>> > > -rw-rw-r-- 1 behroz behroz 659 Jun 28 16:39 > >>> >>> > > attempt_201506281639_0001_000003_0.err > >>> >>> > > -rw-rw-r-- 1 behroz behroz 1441 Jun 28 16:39 > >>> >>> > > attempt_201506281639_0001_000003_0.log > >>> >>> > > -rw-rw-r-- 1 behroz behroz 659 Jun 28 16:39 > >>> >>> > > attempt_201506281639_0001_000004_0.err > >>> >>> > > -rw-rw-r-- 1 behroz behroz 1368 Jun 28 16:39 > >>> >>> > > attempt_201506281639_0001_000004_0.log > >>> >>> > > -rw-rw-r-- 1 behroz behroz 659 Jun 28 16:39 > >>> >>> > > attempt_201506281639_0001_000005_0.err > >>> >>> > > -rw-rw-r-- 1 behroz behroz 1441 Jun 28 16:39 > >>> >>> > > attempt_201506281639_0001_000005_0.log > >>> >>> > > > >>> >>> > > - Hama Groom log file on MACHINE2 (which is frozen) shows. > >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task > >>> >>> > > 'attempt_201506281639_0001_000001_0' has started. 
> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks. > >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task > >>> >>> > > 'attempt_201506281639_0001_000002_0' has started. > >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks. > >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task > >>> >>> > > 'attempt_201506281639_0001_000000_0' has started. > >>> >>> > > > >>> >>> > > - Hama Groom log file on MACHINE2 shows > >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task > >>> >>> > > 'attempt_201506281639_0001_000003_0' has started. > >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks. > >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task > >>> >>> > > 'attempt_201506281639_0001_000004_0' has started. > >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks. > >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task > >>> >>> > > 'attempt_201506281639_0001_000005_0' has started. > >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task > >>> >>> > > attempt_201506281639_0001_000004_0 is *done*. > >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task > >>> >>> > > attempt_201506281639_0001_000003_0 is *done*. > >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task > >>> >>> > > attempt_201506281639_0001_000005_0 is *done*. > >>> >>> > > > >>> >>> > > Any clue what might be going wrong ? 
> >>> >>> > > > >>> >>> > > Regards, > >>> >>> > > Behroz > >>> >>> > > > >>> >>> > > > >>> >>> > > > >>> >>> > > On Sat, Jun 27, 2015 at 1:13 PM, Behroz Sikander < > >>> [email protected]> > >>> >>> > > wrote: > >>> >>> > > > >>> >>> > >> Here is the log file from that folder > >>> >>> > >> > >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: Starting Socket Reader #1 > for > >>> port > >>> >>> > >> 61001 > >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server Responder: > starting > >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server listener on > 61001: > >>> >>> > starting > >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 0 on > 61001: > >>> >>> > starting > >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 1 on > 61001: > >>> >>> > starting > >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 2 on > 61001: > >>> >>> > starting > >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 3 on > 61001: > >>> >>> > starting > >>> >>> > >> 15/06/27 11:10:34 INFO message.HamaMessageManagerImpl: BSPPeer > >>> >>> > >> address:b178b33b16cc port:61001 > >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 4 on > 61001: > >>> >>> > starting > >>> >>> > >> 15/06/27 11:10:34 INFO sync.ZKSyncClient: Initializing ZK Sync > >>> Client > >>> >>> > >> 15/06/27 11:10:34 INFO sync.ZooKeeperSyncClientImpl: Start > >>> connecting > >>> >>> to > >>> >>> > >> Zookeeper! 
At b178b33b16cc/172.17.0.7:61001 > >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping server on 61001 > >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 0 on > 61001: > >>> >>> > exiting > >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server > listener > >>> on > >>> >>> 61001 > >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 1 on > 61001: > >>> >>> > exiting > >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 2 on > 61001: > >>> >>> > exiting > >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server > Responder > >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 3 on > 61001: > >>> >>> > exiting > >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 4 on > 61001: > >>> >>> > exiting > >>> >>> > >> > >>> >>> > >> > >>> >>> > >> And my console shows the following ouptut. Hama is frozen > right > >>> now. > >>> >>> > >> 15/06/27 11:10:32 INFO bsp.BSPJobClient: Running job: > >>> >>> > >> job_201506262331_0003 > >>> >>> > >> 15/06/27 11:10:35 INFO bsp.BSPJobClient: Current supersteps > >>> number: 0 > >>> >>> > >> 15/06/27 11:10:38 INFO bsp.BSPJobClient: Current supersteps > >>> number: 2 > >>> >>> > >> > >>> >>> > >> On Sat, Jun 27, 2015 at 1:07 PM, Edward J. Yoon < > >>> >>> [email protected]> > >>> >>> > >> wrote: > >>> >>> > >> > >>> >>> > >>> Please check the task logs in $HAMA_HOME/logs/tasklogs > folder. > >>> >>> > >>> > >>> >>> > >>> On Sat, Jun 27, 2015 at 8:03 PM, Behroz Sikander < > >>> [email protected] > >>> >>> > > >>> >>> > >>> wrote: > >>> >>> > >>> > Yea. I also thought that. I ran the program through eclipse > >>> with 20 > >>> >>> > >>> tasks > >>> >>> > >>> > and it works fine. > >>> >>> > >>> > > >>> >>> > >>> > On Sat, Jun 27, 2015 at 1:00 PM, Edward J. Yoon < > >>> >>> > [email protected] > >>> >>> > >>> > > >>> >>> > >>> > wrote: > >>> >>> > >>> > > >>> >>> > >>> >> > When I run the PI example, it uses 9 tasks and runs > fine. 
> >>> When I > >>> >>> > >>> run my > >>> >>> > >>> >> > program with 3 tasks, everything runs fine. But when I > >>> increase > >>> >>> > the > >>> >>> > >>> tasks > >>> >>> > >>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not > >>> >>> understand > >>> >>> > >>> what > >>> >>> > >>> >> can > >>> >>> > >>> >> > go wrong. > >>> >>> > >>> >> > >>> >>> > >>> >> It looks like a program bug. Have you ran your program in > >>> local > >>> >>> > mode? > >>> >>> > >>> >> > >>> >>> > >>> >> On Sat, Jun 27, 2015 at 8:03 AM, Behroz Sikander < > >>> >>> > [email protected]> > >>> >>> > >>> >> wrote: > >>> >>> > >>> >> > Hi, > >>> >>> > >>> >> > In the current thread, I mentioned 3 issues. Issue 1 > and 3 > >>> are > >>> >>> > >>> resolved > >>> >>> > >>> >> but > >>> >>> > >>> >> > issue number 2 is still giving me headaches. > >>> >>> > >>> >> > > >>> >>> > >>> >> > My problem: > >>> >>> > >>> >> > My cluster now consists of 3 machines. Each one of them > >>> properly > >>> >>> > >>> >> configured > >>> >>> > >>> >> > (Apparently). From my master machine when I start Hadoop > >>> and > >>> >>> Hama, > >>> >>> > >>> I can > >>> >>> > >>> >> > see the processes started on other 2 machines. If I > check > >>> the > >>> >>> > >>> maximum > >>> >>> > >>> >> tasks > >>> >>> > >>> >> > that my cluster can support then I get 9 (3 tasks on > each > >>> >>> > machine). > >>> >>> > >>> >> > > >>> >>> > >>> >> > When I run the PI example, it uses 9 tasks and runs > fine. > >>> When I > >>> >>> > >>> run my > >>> >>> > >>> >> > program with 3 tasks, everything runs fine. But when I > >>> increase > >>> >>> > the > >>> >>> > >>> tasks > >>> >>> > >>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not > >>> >>> understand > >>> >>> > >>> what > >>> >>> > >>> >> can > >>> >>> > >>> >> > go wrong. > >>> >>> > >>> >> > > >>> >>> > >>> >> > I checked the logs files and things look fine. 
I just > >>> sometimes > >>> >>> > get > >>> >>> > >>> an > >>> >>> > >>> >> > exception that hama was not able to delete the sytem > >>> directory > >>> >>> > >>> >> > (bsp.system.dir) defined in the hama-site.xml. > >>> >>> > >>> >> > > >>> >>> > >>> >> > Any help or clue would be great. > >>> >>> > >>> >> > > >>> >>> > >>> >> > Regards, > >>> >>> > >>> >> > Behroz Sikander > >>> >>> > >>> >> > > >>> >>> > >>> >> > On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander < > >>> >>> > >>> [email protected]> > >>> >>> > >>> >> wrote: > >>> >>> > >>> >> > > >>> >>> > >>> >> >> Thank you :) > >>> >>> > >>> >> >> > >>> >>> > >>> >> >> On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon < > >>> >>> > >>> [email protected] > >>> >>> > >>> >> > > >>> >>> > >>> >> >> wrote: > >>> >>> > >>> >> >> > >>> >>> > >>> >> >>> Hi, > >>> >>> > >>> >> >>> > >>> >>> > >>> >> >>> You can get the maximum number of available tasks like > >>> >>> following > >>> >>> > >>> code: > >>> >>> > >>> >> >>> > >>> >>> > >>> >> >>> BSPJobClient jobClient = new BSPJobClient(conf); > >>> >>> > >>> >> >>> ClusterStatus cluster = > >>> jobClient.getClusterStatus(true); > >>> >>> > >>> >> >>> > >>> >>> > >>> >> >>> // Set to maximum > >>> >>> > >>> >> >>> bsp.setNumBspTask(cluster.getMaxTasks()); > >>> >>> > >>> >> >>> > >>> >>> > >>> >> >>> > >>> >>> > >>> >> >>> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander < > >>> >>> > >>> [email protected]> > >>> >>> > >>> >> >>> wrote: > >>> >>> > >>> >> >>> > Hi, > >>> >>> > >>> >> >>> > 1) Thank you for this. > >>> >>> > >>> >> >>> > 2) Here are the images. 
I will look into the log > files > >>> of PI > >>> >>> > >>> example > >>> >>> > >>> >> >>> > > >>> >>> > >>> >> >>> > *Result of JPS command on slave* > >>> >>> > >>> >> >>> > > >>> >>> > >>> >> >>> > >>> >>> > >>> >> > >>> >>> > >>> > >>> >>> > > >>> >>> > >>> > http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png > >>> >>> > >>> >> >>> > > >>> >>> > >>> >> >>> > *Result of JPS command on Master* > >>> >>> > >>> >> >>> > > >>> >>> > >>> >> >>> > >>> >>> > >>> >> > >>> >>> > >>> > >>> >>> > > >>> >>> > >>> > http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png > >>> >>> > >>> >> >>> > > >>> >>> > >>> >> >>> > 3) In my current case, I do not have any input > >>> submitted to > >>> >>> > the > >>> >>> > >>> job. > >>> >>> > >>> >> >>> During > >>> >>> > >>> >> >>> > run time, I directly fetch data from HDFS. So, I am > >>> looking > >>> >>> > for > >>> >>> > >>> >> >>> something > >>> >>> > >>> >> >>> > like BSPJob.set*Max*NumBspTask(). > >>> >>> > >>> >> >>> > > >>> >>> > >>> >> >>> > Regards, > >>> >>> > >>> >> >>> > Behroz > >>> >>> > >>> >> >>> > > >>> >>> > >>> >> >>> > > >>> >>> > >>> >> >>> > > >>> >>> > >>> >> >>> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon < > >>> >>> > >>> >> [email protected] > >>> >>> > >>> >> >>> > > >>> >>> > >>> >> >>> > wrote: > >>> >>> > >>> >> >>> > > >>> >>> > >>> >> >>> >> Hello, > >>> >>> > >>> >> >>> >> > >>> >>> > >>> >> >>> >> 1) You can get the filesystem URI from a > configuration > >>> >>> using > >>> >>> > >>> >> >>> >> "FileSystem fs = FileSystem.get(conf);". Of course, > >>> the > >>> >>> > >>> fs.defaultFS > >>> >>> > >>> >> >>> >> property should be in hama-site.xml > >>> >>> > >>> >> >>> >> > >>> >>> > >>> >> >>> >> <property> > >>> >>> > >>> >> >>> >> <name>fs.defaultFS</name> > >>> >>> > >>> >> >>> >> <value>hdfs://host1.mydomain.com:9000/</value> > >>> >>> > >>> >> >>> >> <description> > >>> >>> > >>> >> >>> >> The name of the default file system. 
Either > the > >>> >>> literal > >>> >>> > >>> string > >>> >>> > >>> >> >>> >> "local" or a host:port for HDFS. > >>> >>> > >>> >> >>> >> </description> > >>> >>> > >>> >> >>> >> </property> > >>> >>> > >>> >> >>> >> > >>> >>> > >>> >> >>> >> 2) The 'bsp.tasks.maximum' is the number of tasks > per > >>> node. > >>> >>> > It > >>> >>> > >>> looks > >>> >>> > >>> >> >>> >> cluster configuration issue. Please run Pi example > >>> and look > >>> >>> > at > >>> >>> > >>> the > >>> >>> > >>> >> >>> >> logs for more details. NOTE: you can not attach the > >>> images > >>> >>> to > >>> >>> > >>> >> mailing > >>> >>> > >>> >> >>> >> list so I can't see it. > >>> >>> > >>> >> >>> >> > >>> >>> > >>> >> >>> >> 3) You can use the BSPJob.setNumBspTask(int) > method. > >>> If > >>> >>> input > >>> >>> > >>> is > >>> >>> > >>> >> >>> >> provided, the number of BSP tasks is basically > driven > >>> by > >>> >>> the > >>> >>> > >>> number > >>> >>> > >>> >> of > >>> >>> > >>> >> >>> >> DFS blocks. I'll fix it to be more flexible on > >>> HAMA-956. > >>> >>> > >>> >> >>> >> > >>> >>> > >>> >> >>> >> Thanks! > >>> >>> > >>> >> >>> >> > >>> >>> > >>> >> >>> >> > >>> >>> > >>> >> >>> >> On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander < > >>> >>> > >>> >> [email protected]> > >>> >>> > >>> >> >>> >> wrote: > >>> >>> > >>> >> >>> >> > Hi, > >>> >>> > >>> >> >>> >> > Recently, I moved from a single machine setup to > a 2 > >>> >>> > machine > >>> >>> > >>> >> setup. > >>> >>> > >>> >> >>> I was > >>> >>> > >>> >> >>> >> > successfully able to run my job that uses the > HDFS > >>> to get > >>> >>> > >>> data. I > >>> >>> > >>> >> >>> have 3 > >>> >>> > >>> >> >>> >> > trivial questions > >>> >>> > >>> >> >>> >> > > >>> >>> > >>> >> >>> >> > 1- To access HDFS, I have to manually give the IP > >>> address > >>> >>> > of > >>> >>> > >>> >> server > >>> >>> > >>> >> >>> >> running > >>> >>> > >>> >> >>> >> > HDFS. 
I thought that Hama will automatically pick > >>> from > >>> >>> the > >>> >>> > >>> >> >>> configurations > >>> >>> > >>> >> >>> >> > but it does not. I am probably doing something > >>> wrong. > >>> >>> Right > >>> >>> > >>> now my > >>> >>> > >>> >> >>> code > >>> >>> > >>> >> >>> >> work > >>> >>> > >>> >> >>> >> > by using the following. > >>> >>> > >>> >> >>> >> > > >>> >>> > >>> >> >>> >> > FileSystem fs = FileSystem.get(new > >>> >>> > >>> URI("hdfs://server_ip:port/"), > >>> >>> > >>> >> >>> conf); > >>> >>> > >>> >> >>> >> > > >>> >>> > >>> >> >>> >> > 2- On my master server, when I start hama it > >>> >>> automatically > >>> >>> > >>> starts > >>> >>> > >>> >> >>> hama in > >>> >>> > >>> >> >>> >> > the slave machine (all good). Both master and > slave > >>> are > >>> >>> set > >>> >>> > >>> as > >>> >>> > >>> >> >>> >> groomservers. > >>> >>> > >>> >> >>> >> > This means that I have 2 servers to run my job > which > >>> >>> means > >>> >>> > >>> that I > >>> >>> > >>> >> can > >>> >>> > >>> >> >>> >> open > >>> >>> > >>> >> >>> >> > more BSPPeerChild processes. And if I submit my > jar > >>> with > >>> >>> 3 > >>> >>> > >>> bsp > >>> >>> > >>> >> tasks > >>> >>> > >>> >> >>> then > >>> >>> > >>> >> >>> >> > everything works fine. But when I move to 4 > tasks, > >>> Hama > >>> >>> > >>> freezes. > >>> >>> > >>> >> >>> Here is > >>> >>> > >>> >> >>> >> the > >>> >>> > >>> >> >>> >> > result of JPS command on slave. > >>> >>> > >>> >> >>> >> > > >>> >>> > >>> >> >>> >> > > >>> >>> > >>> >> >>> >> > Result of JPS command on Master > >>> >>> > >>> >> >>> >> > > >>> >>> > >>> >> >>> >> > > >>> >>> > >>> >> >>> >> > > >>> >>> > >>> >> >>> >> > You can see that it is only opening tasks on > slaves > >>> but > >>> >>> not > >>> >>> > >>> on > >>> >>> > >>> >> >>> master. 
> >>> >>> > >>> >> >>> >> > > >>> >>> > >>> >> >>> >> > Note: I tried to change the bsp.tasks.maximum > >>> property in > >>> >>> > >>> >> >>> >> hama-default.xml > >>> >>> > >>> >> >>> >> > to 4 but still same result. > >>> >>> > >>> >> >>> >> > > >>> >>> > >>> >> >>> >> > 3- I want my cluster to open as many BSPPeerChild > >>> >>> processes > >>> >>> > >>> as > >>> >>> > >>> >> >>> possible. > >>> >>> > >>> >> >>> >> Is > >>> >>> > >>> >> >>> >> > there any setting that can I do to achieve that ? > >>> Or hama > >>> >>> > >>> picks up > >>> >>> > >>> >> >>> the > >>> >>> > >>> >> >>> >> > values from hama-default.xml to open tasks ? > >>> >>> > >>> >> >>> >> > > >>> >>> > >>> >> >>> >> > > >>> >>> > >>> >> >>> >> > Regards, > >>> >>> > >>> >> >>> >> > > >>> >>> > >>> >> >>> >> > Behroz Sikander > >>> >>> > >>> >> >>> >> > >>> >>> > >>> >> >>> >> > >>> >>> > >>> >> >>> >> > >>> >>> > >>> >> >>> >> -- > >>> >>> > >>> >> >>> >> Best Regards, Edward J. Yoon > >>> >>> > >>> >> >>> >> > >>> >>> > >>> >> >>> > >>> >>> > >>> >> >>> > >>> >>> > >>> >> >>> > >>> >>> > >>> >> >>> -- > >>> >>> > >>> >> >>> Best Regards, Edward J. Yoon > >>> >>> > >>> >> >>> > >>> >>> > >>> >> >> > >>> >>> > >>> >> >> > >>> >>> > >>> >> > >>> >>> > >>> >> > >>> >>> > >>> >> > >>> >>> > >>> >> -- > >>> >>> > >>> >> Best Regards, Edward J. Yoon > >>> >>> > >>> >> > >>> >>> > >>> > >>> >>> > >>> > >>> >>> > >>> > >>> >>> > >>> -- > >>> >>> > >>> Best Regards, Edward J. Yoon > >>> >>> > >>> > >>> >>> > >> > >>> >>> > >> > >>> >>> > > > >>> >>> > > >>> >>> > > >>> >>> > > >>> >>> > >>> >>> > >>> >>> > >>> > > >>> > > >>> > > >>> > -- > >>> > Best Regards, Edward J. Yoon > >>> > >>> > >>> > >>> -- > >>> Best Regards, Edward J. Yoon > >>> > >> > >> > > > > -- > Best Regards, Edward J. Yoon >
