Server 2 was showing the exception that I posted in the previous email. Server 1 is showing the following exception:
15/06/29 03:27:42 INFO ipc.Server: IPC Server handler 0 on 40000: starting
15/06/29 03:28:53 INFO bsp.BSPMaster: groomd_b178b33b16cc_50000 is added.
15/06/29 03:29:20 ERROR bsp.BSPMaster: Fail to register GroomServer groomd_8d4b512cf448_50000
java.net.UnknownHostException: unknown host: 8d4b512cf448
        at org.apache.hama.ipc.Client$Connection.<init>(Client.java:225)
        at org.apache.hama.ipc.Client.getConnection(Client.java:1039)
        at org.apache.hama.ipc.Client.call(Client.java:888)
        at org.apache.hama.ipc.RPC$Invoker.invoke(RPC.java:239)
        at com.sun.proxy.$Proxy11.getProtocolVersion(Unknown Source)

I am looking into this issue.

On Mon, Jun 29, 2015 at 5:31 AM, Behroz Sikander <[email protected]> wrote:
> Ok great. I was able to run the zk, groom and bspmaster on server 1. But
> when I ran the groom on server 2 I got the following exception:
>
> 15/06/29 03:29:20 ERROR bsp.GroomServer: There is a problem in
> establishing communication link with BSPMaster
> 15/06/29 03:29:20 ERROR bsp.GroomServer: Got fatal exception while
> reinitializing GroomServer: java.io.IOException: There is a problem in
> establishing communication link with BSPMaster.
>         at org.apache.hama.bsp.GroomServer.initialize(GroomServer.java:426)
>         at org.apache.hama.bsp.GroomServer.run(GroomServer.java:860)
>         at java.lang.Thread.run(Thread.java:745)
>
> On Mon, Jun 29, 2015 at 5:21 AM, Edward J.
> Yoon <[email protected]> wrote:
>
>> Here's my configuration:
>>
>> hama-site.xml:
>>
>> <property>
>>   <name>bsp.master.address</name>
>>   <value>cluster-0:40000</value>
>> </property>
>>
>> <property>
>>   <name>fs.default.name</name>
>>   <value>hdfs://cluster-0:9000/</value>
>> </property>
>>
>> <property>
>>   <name>hama.zookeeper.quorum</name>
>>   <value>cluster-0</value>
>> </property>
>>
>> % bin/hama zookeeper
>> 15/06/29 12:17:17 ERROR quorum.QuorumPeerConfig: Invalid
>> configuration, only one server specified (ignoring)
>>
>> Then, open a new terminal and run the master with the following command:
>>
>> % bin/hama bspmaster
>> ...
>> 15/06/29 12:17:40 INFO sync.ZKSyncBSPMasterClient: Initialized ZK false
>> 15/06/29 12:17:40 INFO sync.ZKSyncClient: Initializing ZK Sync Client
>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server Responder: starting
>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server listener on 40000: starting
>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server handler 0 on 40000: starting
>> 15/06/29 12:17:40 INFO bsp.BSPMaster: Starting RUNNING
>>
>> On Mon, Jun 29, 2015 at 12:17 PM, Edward J. Yoon <[email protected]> wrote:
>> > Hi,
>> >
>> > If you run the zk server too, BSPMaster will connect to zk and won't
>> > throw exceptions.
>> >
>> > On Mon, Jun 29, 2015 at 12:13 PM, Behroz Sikander <[email protected]> wrote:
>> >> Hi,
>> >> Thank you for the information. I moved to Hama 0.7.0 and I still have the
>> >> same problem. When I run % bin/hama bspmaster, I get the following exception:
>> >>
>> >> INFO http.HttpServer: Port returned by
>> >> webServer.getConnectors()[0].getLocalPort() before open() is -1.
>> >> Opening the listener on 40013
>> >> INFO http.HttpServer: listener.getLocalPort() returned 40013
>> >> webServer.getConnectors()[0].getLocalPort() returned 40013
>> >> INFO http.HttpServer: Jetty bound to port 40013
>> >> INFO mortbay.log: jetty-6.1.14
>> >> INFO mortbay.log: Extract
>> >> jar:file:/home/behroz/Documents/Packages/hama-0.7.0/hama-core-0.7.0.jar!/webapp/bspmaster/
>> >> to /tmp/Jetty_b178b33b16cc_40013_bspmaster____.cof30w/webapp
>> >> INFO mortbay.log: Started SelectChannelConnector@b178b33b16cc:40013
>> >> INFO bsp.BSPMaster: Cleaning up the system directory
>> >> INFO bsp.BSPMaster: hdfs://172.17.0.3:54310/tmp/hama-behroz/bsp/system
>> >> INFO sync.ZKSyncBSPMasterClient: Initialized ZK false
>> >> INFO sync.ZKSyncClient: Initializing ZK Sync Client
>> >> ERROR sync.ZKSyncBSPMasterClient:
>> >> org.apache.zookeeper.KeeperException$ConnectionLossException:
>> >> KeeperErrorCode = ConnectionLoss for /bsp
>> >>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>> >>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>> >>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
>> >>         at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069)
>> >>         at org.apache.hama.bsp.sync.ZKSyncBSPMasterClient.init(ZKSyncBSPMasterClient.java:62)
>> >>         at org.apache.hama.bsp.BSPMaster.initZK(BSPMaster.java:534)
>> >>         at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:517)
>> >>         at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:500)
>> >>         at org.apache.hama.BSPMasterRunner.run(BSPMasterRunner.java:46)
>> >>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> >>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>> >>         at org.apache.hama.BSPMasterRunner.main(BSPMasterRunner.java:56)
>> >> ERROR sync.ZKSyncBSPMasterClient:
>> >> org.apache.zookeeper.KeeperException$ConnectionLossException:
>> >> KeeperErrorCode = ConnectionLoss for /bsp
>> >>
>> >> *My
>> >> zookeeper settings in hama-site.xml are (right now, I am using just
>> >> two servers, 172.17.0.3 and 172.17.0.7):*
>> >>
>> >> <property>
>> >>   <name>hama.zookeeper.quorum</name>
>> >>   <value>172.17.0.3,172.17.0.7</value>
>> >>   <description>Comma separated list of servers in the ZooKeeper quorum.
>> >>     For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
>> >>     By default this is set to localhost for local and pseudo-distributed modes
>> >>     of operation. For a fully-distributed setup, this should be set to a full
>> >>     list of ZooKeeper quorum servers. If HAMA_MANAGES_ZK is set in hama-env.sh
>> >>     this is the list of servers which we will start/stop ZooKeeper on.
>> >>   </description>
>> >> </property>
>> >> ......
>> >> <property>
>> >>   <name>hama.zookeeper.property.clientPort</name>
>> >>   <value>2181</value>
>> >> </property>
>> >>
>> >> Is something wrong with my settings?
>> >>
>> >> Regards,
>> >> Behroz Sikander
>> >>
>> >> On Mon, Jun 29, 2015 at 1:44 AM, Edward J. Yoon <[email protected]> wrote:
>> >>
>> >>> > (0.7.0) because I do not understand YARN yet. It adds extra
>> >>> configurations
>> >>>
>> >>> Hama classic mode works on both Hadoop 1.x and Hadoop 2.x HDFS. Yarn
>> >>> configuration is only needed when you want to submit a BSP job to a Yarn
>> >>> cluster without a Hama cluster. So you don't need to worry about it. :-)
>> >>>
>> >>> > distributed mode? And is there any way to manage the servers? I mean right
>> >>> > now, I have 3 machines with a lot of configuration files and log files. It
>> >>>
>> >>> You can use the web UI at http://masterserver_address:40013/bspmaster.jsp
>> >>>
>> >>> To debug your program, please try the following:
>> >>>
>> >>> 1) Run a BSPMaster and Zookeeper at server1.
>> >>> % bin/hama bspmaster
>> >>> % bin/hama zookeeper
>> >>>
>> >>> 2) Run a Groom at server1 and server2.
>> >>> % bin/hama groom
>> >>>
>> >>> 3) Check whether the daemons are running well. Then, run your program
>> >>> using the jar command at server1.
>> >>>
>> >>> % bin/hama jar .....
>> >>>
>> >>> > In hama_[user]_bspmaster_.....log file I get the following exception. But
>> >>> > this occurs in both cases when I run my job with 3 tasks or with 4 tasks
>> >>>
>> >>> In fact, you should not see the above initZK error log.
>> >>>
>> >>> --
>> >>> Best Regards, Edward J. Yoon
>> >>>
>> >>> -----Original Message-----
>> >>> From: Behroz Sikander [mailto:[email protected]]
>> >>> Sent: Monday, June 29, 2015 8:18 AM
>> >>> To: [email protected]
>> >>> Subject: Re: Groomserver BSPPeerChild limit
>> >>>
>> >>> I will try the things that you mentioned. I am not using the latest version
>> >>> (0.7.0) because I do not understand YARN yet. It adds extra configurations
>> >>> which make it even harder for me to understand when things go wrong. Any
>> >>> suggestions?
>> >>>
>> >>> Further, are there any tools that you use for debugging while in
>> >>> distributed mode? And is there any way to manage the servers? I mean right
>> >>> now, I have 3 machines with a lot of configuration files and log files. It
>> >>> takes a lot of time. This makes me wonder how people who have 100s of
>> >>> machines debug and manage the cluster.
>> >>>
>> >>> Regards,
>> >>> Behroz
>> >>>
>> >>> On Mon, Jun 29, 2015 at 12:53 AM, Edward J. Yoon <[email protected]> wrote:
>> >>>
>> >>> > Hi,
>> >>> >
>> >>> > It looks like a zookeeper connection problem. Please check whether
>> >>> > zookeeper is running and every task can connect to zookeeper.
>> >>> >
>> >>> > I would recommend you to stop the firewall during debugging, and please
>> >>> > use the 0.7.0 latest release.
>> >>> >
>> >>> > --
>> >>> > Best Regards, Edward J.
>> >>> > Yoon
>> >>> >
>> >>> > -----Original Message-----
>> >>> > From: Behroz Sikander [mailto:[email protected]]
>> >>> > Sent: Monday, June 29, 2015 7:34 AM
>> >>> > To: [email protected]
>> >>> > Subject: Re: Groomserver BSPPeerChild limit
>> >>> >
>> >>> > To figure out the issue, I was trying something else and found another
>> >>> > weird issue. Might be a bug in Hama, but I am not sure. Both of the
>> >>> > following lines give an exception:
>> >>> >
>> >>> > System.out.println(peer.getPeerName(0)); // Exception
>> >>> > System.out.println(peer.getNumPeers()); // Exception
>> >>> >
>> >>> > [time] ERROR bsp.BSPTask: *Error running bsp setup and bsp function.*
>> >>> > [time] java.lang.*RuntimeException: All peer names could not be retrieved!*
>> >>> >         at org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.getAllPeerNames(ZooKeeperSyncClientImpl.java:305)
>> >>> >         at org.apache.hama.bsp.BSPPeerImpl.initPeerNames(BSPPeerImpl.java:544)
>> >>> >         at org.apache.hama.bsp.BSPPeerImpl.getNumPeers(BSPPeerImpl.java:538)
>> >>> >         at testHDFS.EVADMMBsp.setup*(EVADMMBsp.java:58)*
>> >>> >         at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170)
>> >>> >         at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144)
>> >>> >         at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243)
>> >>> >
>> >>> > On Sun, Jun 28, 2015 at 6:45 PM, Behroz Sikander <[email protected]> wrote:
>> >>> >
>> >>> > > I think I have more information on the issue. I did some debugging and
>> >>> > > found something quite strange.
>> >>> > >
>> >>> > > If I run my job with 6 tasks (3 tasks will run on MACHINE1 and 3 tasks
>> >>> > > will run on MACHINE2),
>> >>> > >
>> >>> > > - The 3 tasks on MACHINE1 are frozen, and the strange thing is that the
>> >>> > > processes do not even enter the SETUP function of the BSP class.
>> >>> > > I have print statements in the setup function of the BSP class and it
>> >>> > > doesn't print anything. I get empty files with zero size.
>> >>> > >
>> >>> > > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:29 .
>> >>> > > drwxrwxr-x 99 behroz behroz 4096 Jun 28 16:28 ..
>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24 attempt_201506281624_0001_000000_0.err
>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24 attempt_201506281624_0001_000000_0.log
>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24 attempt_201506281624_0001_000001_0.err
>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24 attempt_201506281624_0001_000001_0.log
>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24 attempt_201506281624_0001_000002_0.err
>> >>> > > -rw-rw-r--  1 behroz behroz    0 Jun 28 16:24 attempt_201506281624_0001_000002_0.log
>> >>> > >
>> >>> > > - On MACHINE2, the code enters the SETUP function of the BSP class and
>> >>> > > prints stuff. See the size of the files generated on output. How is it
>> >>> > > possible that in 3 tasks the code can enter BSP and in the others it cannot?
>> >>> > >
>> >>> > > drwxrwxr-x  2 behroz behroz 4096 Jun 28 16:39 .
>> >>> > > drwxrwxr-x 82 behroz behroz 4096 Jun 28 16:39 ..
>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39 attempt_201506281639_0001_000003_0.err
>> >>> > > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39 attempt_201506281639_0001_000003_0.log
>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39 attempt_201506281639_0001_000004_0.err
>> >>> > > -rw-rw-r--  1 behroz behroz 1368 Jun 28 16:39 attempt_201506281639_0001_000004_0.log
>> >>> > > -rw-rw-r--  1 behroz behroz  659 Jun 28 16:39 attempt_201506281639_0001_000005_0.err
>> >>> > > -rw-rw-r--  1 behroz behroz 1441 Jun 28 16:39 attempt_201506281639_0001_000005_0.log
>> >>> > >
>> >>> > > - The Hama Groom log file on MACHINE1 (which is frozen) shows:
>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task 'attempt_201506281639_0001_000001_0' has started.
>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task 'attempt_201506281639_0001_000002_0' has started.
>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task 'attempt_201506281639_0001_000000_0' has started.
>> >>> > >
>> >>> > > - The Hama Groom log file on MACHINE2 shows:
>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task 'attempt_201506281639_0001_000003_0' has started.
>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task 'attempt_201506281639_0001_000004_0' has started.
>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks.
>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task 'attempt_201506281639_0001_000005_0' has started.
>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task attempt_201506281639_0001_000004_0 is *done*.
>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task attempt_201506281639_0001_000003_0 is *done*.
>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task attempt_201506281639_0001_000005_0 is *done*.
>> >>> > >
>> >>> > > Any clue what might be going wrong?
>> >>> > >
>> >>> > > Regards,
>> >>> > > Behroz
>> >>> > >
>> >>> > > On Sat, Jun 27, 2015 at 1:13 PM, Behroz Sikander <[email protected]> wrote:
>> >>> > >
>> >>> > >> Here is the log file from that folder:
>> >>> > >>
>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: Starting Socket Reader #1 for port 61001
>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server Responder: starting
>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server listener on 61001: starting
>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 0 on 61001: starting
>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 1 on 61001: starting
>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 2 on 61001: starting
>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 3 on 61001: starting
>> >>> > >> 15/06/27 11:10:34 INFO message.HamaMessageManagerImpl: BSPPeer address:b178b33b16cc port:61001
>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 4 on 61001: starting
>> >>> > >> 15/06/27 11:10:34 INFO sync.ZKSyncClient: Initializing ZK Sync Client
>> >>> > >> 15/06/27 11:10:34 INFO sync.ZooKeeperSyncClientImpl: Start connecting to Zookeeper!
>> >>> > >> At b178b33b16cc/172.17.0.7:61001
>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping server on 61001
>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 0 on 61001: exiting
>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server listener on 61001
>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 1 on 61001: exiting
>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 2 on 61001: exiting
>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server Responder
>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 3 on 61001: exiting
>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 4 on 61001: exiting
>> >>> > >>
>> >>> > >> And my console shows the following output. Hama is frozen right now.
>> >>> > >> 15/06/27 11:10:32 INFO bsp.BSPJobClient: Running job: job_201506262331_0003
>> >>> > >> 15/06/27 11:10:35 INFO bsp.BSPJobClient: Current supersteps number: 0
>> >>> > >> 15/06/27 11:10:38 INFO bsp.BSPJobClient: Current supersteps number: 2
>> >>> > >>
>> >>> > >> On Sat, Jun 27, 2015 at 1:07 PM, Edward J. Yoon <[email protected]> wrote:
>> >>> > >>
>> >>> > >>> Please check the task logs in the $HAMA_HOME/logs/tasklogs folder.
>> >>> > >>>
>> >>> > >>> On Sat, Jun 27, 2015 at 8:03 PM, Behroz Sikander <[email protected]> wrote:
>> >>> > >>> > Yea, I also thought that. I ran the program through Eclipse with 20
>> >>> > >>> > tasks and it works fine.
>> >>> > >>> >
>> >>> > >>> > On Sat, Jun 27, 2015 at 1:00 PM, Edward J. Yoon <[email protected]> wrote:
>> >>> > >>> >
>> >>> > >>> >> > When I run the PI example, it uses 9 tasks and runs fine. When I run my
>> >>> > >>> >> > program with 3 tasks, everything runs fine.
>> >>> > >>> >> > But when I increase the tasks
>> >>> > >>> >> > (to 4) by using "setNumBspTask", Hama freezes. I do not understand
>> >>> > >>> >> > what can go wrong.
>> >>> > >>> >>
>> >>> > >>> >> It looks like a program bug. Have you run your program in local mode?
>> >>> > >>> >>
>> >>> > >>> >> On Sat, Jun 27, 2015 at 8:03 AM, Behroz Sikander <[email protected]> wrote:
>> >>> > >>> >> > Hi,
>> >>> > >>> >> > In the current thread, I mentioned 3 issues. Issues 1 and 3 are
>> >>> > >>> >> > resolved, but issue number 2 is still giving me headaches.
>> >>> > >>> >> >
>> >>> > >>> >> > My problem:
>> >>> > >>> >> > My cluster now consists of 3 machines, each of them properly
>> >>> > >>> >> > configured (apparently). From my master machine, when I start Hadoop
>> >>> > >>> >> > and Hama, I can see the processes started on the other 2 machines.
>> >>> > >>> >> > If I check the maximum tasks that my cluster can support, I get 9
>> >>> > >>> >> > (3 tasks on each machine).
>> >>> > >>> >> >
>> >>> > >>> >> > When I run the PI example, it uses 9 tasks and runs fine. When I run
>> >>> > >>> >> > my program with 3 tasks, everything runs fine. But when I increase
>> >>> > >>> >> > the tasks (to 4) by using "setNumBspTask", Hama freezes. I do not
>> >>> > >>> >> > understand what can go wrong.
>> >>> > >>> >> >
>> >>> > >>> >> > I checked the log files and things look fine. I just sometimes get an
>> >>> > >>> >> > exception that Hama was not able to delete the system directory
>> >>> > >>> >> > (bsp.system.dir) defined in hama-site.xml.
>> >>> > >>> >> >
>> >>> > >>> >> > Any help or clue would be great.
>> >>> > >>> >> > >> >>> > >>> >> > Regards, >> >>> > >>> >> > Behroz Sikander >> >>> > >>> >> > >> >>> > >>> >> > On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander < >> >>> > >>> [email protected]> >> >>> > >>> >> wrote: >> >>> > >>> >> > >> >>> > >>> >> >> Thank you :) >> >>> > >>> >> >> >> >>> > >>> >> >> On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon < >> >>> > >>> [email protected] >> >>> > >>> >> > >> >>> > >>> >> >> wrote: >> >>> > >>> >> >> >> >>> > >>> >> >>> Hi, >> >>> > >>> >> >>> >> >>> > >>> >> >>> You can get the maximum number of available tasks like >> >>> following >> >>> > >>> code: >> >>> > >>> >> >>> >> >>> > >>> >> >>> BSPJobClient jobClient = new BSPJobClient(conf); >> >>> > >>> >> >>> ClusterStatus cluster = >> jobClient.getClusterStatus(true); >> >>> > >>> >> >>> >> >>> > >>> >> >>> // Set to maximum >> >>> > >>> >> >>> bsp.setNumBspTask(cluster.getMaxTasks()); >> >>> > >>> >> >>> >> >>> > >>> >> >>> >> >>> > >>> >> >>> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander < >> >>> > >>> [email protected]> >> >>> > >>> >> >>> wrote: >> >>> > >>> >> >>> > Hi, >> >>> > >>> >> >>> > 1) Thank you for this. >> >>> > >>> >> >>> > 2) Here are the images. I will look into the log files >> of PI >> >>> > >>> example >> >>> > >>> >> >>> > >> >>> > >>> >> >>> > *Result of JPS command on slave* >> >>> > >>> >> >>> > >> >>> > >>> >> >>> >> >>> > >>> >> >> >>> > >>> >> >>> > >> >>> >> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png >> >>> > >>> >> >>> > >> >>> > >>> >> >>> > *Result of JPS command on Master* >> >>> > >>> >> >>> > >> >>> > >>> >> >>> >> >>> > >>> >> >> >>> > >>> >> >>> > >> >>> >> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png >> >>> > >>> >> >>> > >> >>> > >>> >> >>> > 3) In my current case, I do not have any input >> submitted to >> >>> > the >> >>> > >>> job. >> >>> > >>> >> >>> During >> >>> > >>> >> >>> > run time, I directly fetch data from HDFS. 
>> >>> > >>> >> >>> > So, I am looking for something like BSPJob.set*Max*NumBspTask().
>> >>> > >>> >> >>> >
>> >>> > >>> >> >>> > Regards,
>> >>> > >>> >> >>> > Behroz
>> >>> > >>> >> >>> >
>> >>> > >>> >> >>> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon <[email protected]> wrote:
>> >>> > >>> >> >>> >
>> >>> > >>> >> >>> >> Hello,
>> >>> > >>> >> >>> >>
>> >>> > >>> >> >>> >> 1) You can get the filesystem URI from a configuration using
>> >>> > >>> >> >>> >> "FileSystem fs = FileSystem.get(conf);". Of course, the
>> >>> > >>> >> >>> >> fs.defaultFS property should be in hama-site.xml:
>> >>> > >>> >> >>> >>
>> >>> > >>> >> >>> >> <property>
>> >>> > >>> >> >>> >>   <name>fs.defaultFS</name>
>> >>> > >>> >> >>> >>   <value>hdfs://host1.mydomain.com:9000/</value>
>> >>> > >>> >> >>> >>   <description>
>> >>> > >>> >> >>> >>     The name of the default file system. Either the literal string
>> >>> > >>> >> >>> >>     "local" or a host:port for HDFS.
>> >>> > >>> >> >>> >>   </description>
>> >>> > >>> >> >>> >> </property>
>> >>> > >>> >> >>> >>
>> >>> > >>> >> >>> >> 2) The 'bsp.tasks.maximum' is the number of tasks per node. It
>> >>> > >>> >> >>> >> looks like a cluster configuration issue. Please run the Pi
>> >>> > >>> >> >>> >> example and look at the logs for more details. NOTE: you cannot
>> >>> > >>> >> >>> >> attach images to the mailing list, so I can't see them.
>> >>> > >>> >> >>> >>
>> >>> > >>> >> >>> >> 3) You can use the BSPJob.setNumBspTask(int) method. If input
>> >>> > >>> >> >>> >> is provided, the number of BSP tasks is basically driven by the
>> >>> > >>> >> >>> >> number of DFS blocks. I'll fix it to be more flexible on HAMA-956.
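[For reference: pulling together the properties discussed in this thread, a minimal hama-site.xml for a small cluster might look like the sketch below. The host name cluster-0 and the ports follow Edward's earlier example and are assumptions; they should be replaced with a name that every node, master and grooms alike, can resolve.]

```xml
<!-- Sketch of a minimal hama-site.xml; cluster-0 is an illustrative
     host name, not a value taken from the reporter's cluster. -->
<configuration>
  <property>
    <name>bsp.master.address</name>
    <value>cluster-0:40000</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cluster-0:9000/</value>
  </property>
  <property>
    <name>hama.zookeeper.quorum</name>
    <value>cluster-0</value>
  </property>
</configuration>
```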
>> >>> > >>> >> >>> >> >> >>> > >>> >> >>> >> Thanks! >> >>> > >>> >> >>> >> >> >>> > >>> >> >>> >> >> >>> > >>> >> >>> >> On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander < >> >>> > >>> >> [email protected]> >> >>> > >>> >> >>> >> wrote: >> >>> > >>> >> >>> >> > Hi, >> >>> > >>> >> >>> >> > Recently, I moved from a single machine setup to a 2 >> >>> > machine >> >>> > >>> >> setup. >> >>> > >>> >> >>> I was >> >>> > >>> >> >>> >> > successfully able to run my job that uses the HDFS >> to get >> >>> > >>> data. I >> >>> > >>> >> >>> have 3 >> >>> > >>> >> >>> >> > trivial questions >> >>> > >>> >> >>> >> > >> >>> > >>> >> >>> >> > 1- To access HDFS, I have to manually give the IP >> address >> >>> > of >> >>> > >>> >> server >> >>> > >>> >> >>> >> running >> >>> > >>> >> >>> >> > HDFS. I thought that Hama will automatically pick >> from >> >>> the >> >>> > >>> >> >>> configurations >> >>> > >>> >> >>> >> > but it does not. I am probably doing something >> wrong. >> >>> Right >> >>> > >>> now my >> >>> > >>> >> >>> code >> >>> > >>> >> >>> >> work >> >>> > >>> >> >>> >> > by using the following. >> >>> > >>> >> >>> >> > >> >>> > >>> >> >>> >> > FileSystem fs = FileSystem.get(new >> >>> > >>> URI("hdfs://server_ip:port/"), >> >>> > >>> >> >>> conf); >> >>> > >>> >> >>> >> > >> >>> > >>> >> >>> >> > 2- On my master server, when I start hama it >> >>> automatically >> >>> > >>> starts >> >>> > >>> >> >>> hama in >> >>> > >>> >> >>> >> > the slave machine (all good). Both master and slave >> are >> >>> set >> >>> > >>> as >> >>> > >>> >> >>> >> groomservers. >> >>> > >>> >> >>> >> > This means that I have 2 servers to run my job which >> >>> means >> >>> > >>> that I >> >>> > >>> >> can >> >>> > >>> >> >>> >> open >> >>> > >>> >> >>> >> > more BSPPeerChild processes. And if I submit my jar >> with >> >>> 3 >> >>> > >>> bsp >> >>> > >>> >> tasks >> >>> > >>> >> >>> then >> >>> > >>> >> >>> >> > everything works fine. 
>> >>> > >>> >> >>> >> > But when I move to 4 tasks, Hama freezes. Here is the result of
>> >>> > >>> >> >>> >> > the JPS command on the slave.
>> >>> > >>> >> >>> >> >
>> >>> > >>> >> >>> >> > Result of JPS command on Master
>> >>> > >>> >> >>> >> >
>> >>> > >>> >> >>> >> > You can see that it is only opening tasks on the slave but not on
>> >>> > >>> >> >>> >> > the master.
>> >>> > >>> >> >>> >> >
>> >>> > >>> >> >>> >> > Note: I tried to change the bsp.tasks.maximum property in
>> >>> > >>> >> >>> >> > hama-default.xml to 4 but still got the same result.
>> >>> > >>> >> >>> >> >
>> >>> > >>> >> >>> >> > 3- I want my cluster to open as many BSPPeerChild processes as
>> >>> > >>> >> >>> >> > possible. Is there any setting I can use to achieve that? Or does
>> >>> > >>> >> >>> >> > Hama pick up the values from hama-default.xml to open tasks?
>> >>> > >>> >> >>> >> >
>> >>> > >>> >> >>> >> > Regards,
>> >>> > >>> >> >>> >> > Behroz Sikander
>> >>> > >>> >> >>> >>
>> >>> > >>> >> >>> >> --
>> >>> > >>> >> >>> >> Best Regards, Edward J. Yoon
>> >>> > >>> >> >>>
>> >>> > >>> >> >>> --
>> >>> > >>> >> >>> Best Regards, Edward J. Yoon
>> >>> > >>> >>
>> >>> > >>> >> --
>> >>> > >>> >> Best Regards, Edward J. Yoon
>> >>> > >>>
>> >>> > >>> --
>> >>> > >>> Best Regards, Edward J. Yoon
>> >
>> > --
>> > Best Regards, Edward J.
>> > Yoon
>>
>> --
>> Best Regards, Edward J. Yoon
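[On the still-open UnknownHostException at the top of the thread: the master fails to register the groom because it cannot turn the groom's reported hostname, the Docker-style container id 8d4b512cf448, back into an IP address. The lookup Hama performs can be reproduced outside Hama with a few lines of plain Java; the class name HostCheck is just for illustration.]

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostCheck {
    // Performs the same forward name lookup that Hama's IPC client does
    // when it dials a groom; false means Hama would throw
    // java.net.UnknownHostException for this name.
    static boolean resolvable(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "localhost" should always resolve; the container id from the
        // log above will only resolve if /etc/hosts (or DNS) maps it.
        System.out.println("localhost: " + resolvable("localhost"));
        System.out.println("8d4b512cf448: " + resolvable("8d4b512cf448"));
    }
}
```

[If the second check prints false on the master, a mapping such as "172.17.0.7 8d4b512cf448" in /etc/hosts on every node (which container owns which IP is an assumption here) should let registration proceed; making the containers use resolvable hostnames works as well.]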
