Does it work if you have < 30 mappers running, but if you go > 30 some start to fail? Perhaps it's this config: hbase.zookeeper.property.maxClientCnxns. Its default is 30 max connections per host.
St.Ack
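For illustration, a minimal sketch of what raising that limit looks like: the property goes inside <configuration> in hbase-site.xml on the nodes that run the HBase-managed ZooKeeper quorum, and the quorum has to be restarted for it to take effect. The value 300 is only an example, not a recommendation; if ZooKeeper is managed outside HBase, the equivalent knob is maxClientCnxns in zoo.cfg.

  <!-- inside <configuration> in hbase-site.xml on the ZooKeeper quorum nodes;
       300 is an illustrative value, pick something comfortably above the
       number of concurrent HBase clients per host -->
  <property>
    <name>hbase.zookeeper.property.maxClientCnxns</name>
    <value>300</value>
  </property>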
On Sat, Jan 22, 2011 at 10:42 AM, Wojciech Langiewicz <[email protected]> wrote:
> I have changed it to 120000, and there's no change.
> What is worth noticing is that some mappers finish the task without any
> error, and during the same job the same server can fail some tasks and
> finish others.
>
> On 22.01.2011 18:56, Wojciech Langiewicz wrote:
>>
>> Hi,
>> sessionTimeout=60000,
>> I didn't change it, so it's the default.
>> What value do you recommend?
>>
>> --
>> Wojciech Langiewicz
>>
>> On 22.01.2011 16:49, Ted Yu wrote:
>>>
>>> What's the value of 'zookeeper.session.timeout'?
>>>
>>> Maybe you can tune it higher.
>>>
>>> On Sat, Jan 22, 2011 at 3:13 AM, Wojciech Langiewicz <[email protected]> wrote:
>>>
>>>> Hi,
>>>> I have re-run the test with 2 or more mappers and looked into the logs
>>>> more closely: the mapreduce job finished correctly, some map attempts are
>>>> killed, but eventually all of the mappers finish. And there's no rule as
>>>> to which mappers on which servers fail (I re-ran the tests multiple times).
>>>>
>>>> So I suspect that this is not a classpath problem; maybe there are
>>>> settings that limit the number of connections, because before
>>>> MasterNotRunning I get this in the logs:
>>>>
>>>> 2011-01-22 12:06:43,411 INFO org.apache.zookeeper.ZooKeeper: Initiating
>>>> client connection, connectString=hd-master:2181 sessionTimeout=60000
>>>> watcher=org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper@3b5e234c
>>>> 2011-01-22 12:06:43,483 INFO org.apache.zookeeper.ClientCnxn: Opening
>>>> socket connection to server hd-master/10.6.75.212:2181
>>>> 2011-01-22 12:06:43,484 INFO org.apache.zookeeper.ClientCnxn: Socket
>>>> connection established to hd-master/10.6.75.212:2181, initiating session
>>>> 2011-01-22 12:06:43,488 INFO org.apache.zookeeper.ClientCnxn: Unable to
>>>> read additional data from server sessionid 0x0, likely server has closed
>>>> socket, closing socket connection and attempting reconnect
>>>> 2011-01-22 12:06:43,605 INFO
>>>> org.apache.hadoop.hbase.client.HConnectionManager$TableServers: getMaster
>>>> attempt 0 of 10 failed; retrying after sleep of 1000
>>>> java.io.IOException:
>>>> org.apache.zookeeper.KeeperException$ConnectionLossException:
>>>> KeeperErrorCode = ConnectionLoss for /hbase/master
>>>>   at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:481)
>>>>   at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readMasterAddressOrThrow(ZooKeeperWrapper.java:377)
>>>>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getMaster(HConnectionManager.java:381)
>>>>   at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:78)
>>>>   at org.apache.hadoop.hbase.PerformanceEvaluation$Test.testSetup(PerformanceEvaluation.java:745)
>>>>   at org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:764)
>>>>   at org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:1097)
>>>>   at org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:446)
>>>>   at org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:399)
>>>>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>>>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:639)
>>>>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:315)
>>>>   at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
>>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>>   at javax.security.auth.Subject.doAs(Subject.java:396)
>>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
>>>>   at org.apache.hadoop.mapred.Child.main(Child.java:211)
>>>> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException:
>>>> KeeperErrorCode = ConnectionLoss for /hbase/master
>>>>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>>>>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>>   at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:921)
>>>>   at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.readAddressOrThrow(ZooKeeperWrapper.java:477)
>>>>   ... 16 more
>>>>
>>>> On 22.01.2011 01:36, Stack wrote:
>>>>
>>>>> Then it's odd that PE fails. Can you figure out the difference between
>>>>> the two environments? Perhaps your MR jobs are fat jars that include the
>>>>> conf and all dependencies, whereas PE is dumb and expects the
>>>>> dependencies and conf on the CLASSPATH?
>>>>>
>>>>> St.Ack
>>>>>
>>>>> On Fri, Jan 21, 2011 at 11:56 AM, Wojciech Langiewicz <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I have other mapreduce tasks running on this cluster and using HBase,
>>>>>> and they are working correctly. All my servers have the same
>>>>>> configuration.
>>>>>>
>>>>>> --
>>>>>> Wojciech Langiewicz
>>>>>>
>>>>>> On 21.01.2011 19:20, Stack wrote:
>>>>>>>
>>>>>>> When clients are > 1, then PE tries to run a mapreduce job to host the
>>>>>>> loading clients.
>>>>>>>
>>>>>>> Is it possible that the client out in the MR task is trying to connect
>>>>>>> to the wrong location? Perhaps the HBase conf dir is not available to
>>>>>>> the running task? Have you seen
>>>>>>>
>>>>>>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description
>>>>>>>
>>>>>>> ? Perhaps this will help?
>>>>>>>
>>>>>>> St.Ack
>>>>>>>
>>>>>>> On Fri, Jan 21, 2011 at 8:12 AM, Wojciech Langiewicz <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hello
>>>>>>>> I have a problem running the HBase performance tests from the
>>>>>>>> org.apache.hadoop.hbase.PerformanceEvaluation package. I'm using the
>>>>>>>> version from CDH3.
>>>>>>>> Tests are OK when the nclients argument is 1, but with a greater
>>>>>>>> number, after the mappers reach 100% I get this exception (I didn't
>>>>>>>> run all the tests, but all of the ones I tested failed; 'scan' and
>>>>>>>> 'randomRead' fail for sure):
>>>>>>>>
>>>>>>>> 11/01/21 17:05:19 INFO mapred.JobClient: Task Id :
>>>>>>>> attempt_201101211442_0005_m_000000_0, Status : FAILED
>>>>>>>> org.apache.hadoop.hbase.MasterNotRunningException
>>>>>>>>   at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getMaster(HConnectionManager.java:416)
>>>>>>>>   at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:78)
>>>>>>>>   at org.apache.hadoop.hbase.PerformanceEvaluation$Test.testSetup(PerformanceEvaluation.java:745)
>>>>>>>>   at org.apache.hadoop.hbase.PerformanceEvaluation$Test.test(PerformanceEvaluation.java:764)
>>>>>>>>   at org.apache.hadoop.hbase.PerformanceEvaluation.runOneClient(PerformanceEvaluation.java:1097)
>>>>>>>>   at org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:446)
>>>>>>>>   at org.apache.hadoop.hbase.PerformanceEvaluation$EvaluationMapTask.map(PerformanceEvaluation.java:399)
>>>>>>>>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>>>>>>>>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:639)
>>>>>>>>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:315)
>>>>>>>>   at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
>>>>>>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>>   at javax.security.auth.Subject.doAs(Subject.java:396)
>>>>>>>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
>>>>>>>>   at org.apache.hadoop.mapred.Child.main(Child.java:211)
>>>>>>>>
>>>>>>>> Do you have any ideas on how to solve this?
>>>>>>>>
>>>>>>>> --
>>>>>>>> Wojciech Langiewicz
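On the classpath question raised earlier in the thread (fat jars vs. conf and dependencies on the CLASSPATH, and the package_description link above), here is a rough, illustrative sketch of how a client mapreduce job usually ships hbase-site.xml and its HBase/ZooKeeper jars to the tasks. The class name is made up, and the exact API depends on the HBase version shipped with your CDH3; it is only meant to show where the conf and dependency jars normally come from, not to claim this is what PE does when nclients > 1.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

public class SubmitWithHBaseConf {
  public static void main(String[] args) throws Exception {
    // HBaseConfiguration.create() picks up hbase-site.xml from the submitter's
    // CLASSPATH; the resulting config (ZK quorum, ports, ...) is serialized
    // into the job that the map tasks run with.
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "hbase-client-job");
    job.setJarByClass(SubmitWithHBaseConf.class);

    // Ships the HBase/ZooKeeper jars found on the submitter's classpath via
    // the distributed cache so the tasks can load the client classes.
    TableMapReduceUtil.addDependencyJars(job);

    // ... set mapper, input/output formats, etc., then submit:
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}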
