Re: 0.90.1 HMaster malfunction in pseudo-distributed mode

Hari Sreekumar Wed, 01 Jun 2011 22:50:50 -0700

sry.. it is
Changed
127.0.0.1 localhost localhost.localdomain
127.0.1.1 hsreekumar-lt.Clickablecorp.com hsreekumar-lt

to
127.0.0.1 localhost localhost.localdomain hsreekumar-lt.Clickablecorp.com hsreekumar-lt
#127.0.1.1 hsreekumar-lt.Clickablecorp.com hsreekumar-lt
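
To check a box for this before editing anything, here is a minimal Python sketch that scans hosts-file text and reports whether a hostname is mapped to a loopback address other than 127.0.0.1 (the Ubuntu-style 127.0.1.1 line). It is not part of HBase; the function names `parse_hosts` and `loopback_aliases` are made up for illustration, and the sample text is Sean's /etc/hosts from the quoted mail below.

```python
import ipaddress


def parse_hosts(text):
    """Return (ip, [names]) tuples from hosts-file text, ignoring comments."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing or whole-line comments
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        if names:
            entries.append((ip, names))
    return entries


def loopback_aliases(text, hostname):
    """Return IPv4 loopback addresses other than 127.0.0.1 that map to hostname."""
    hits = []
    for ip, names in parse_hosts(text):
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            continue  # not an IP address, skip the malformed line
        if addr.version == 4 and addr.is_loopback and ip != "127.0.0.1":
            if hostname in names:
                hits.append(ip)
    return hits


# Sample taken from the /etc/hosts quoted later in this thread.
SAMPLE = """127.0.0.1    localhost
127.0.1.1    sean-PowerEdge
"""

if __name__ == "__main__":
    print(loopback_aliases(SAMPLE, "sean-PowerEdge"))
```

A non-empty result only means the hostname sits on a second loopback address; whether that is what breaks the regionserver on a given setup still has to be confirmed from the logs.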

On Thu, Jun 2, 2011 at 11:18 AM, Hari Sreekumar <[email protected]>wrote:

> Hey,
>
> I had the same problem. It seems to be caused by the 127.0.1.1 entry in
> /etc/hosts (which is the default on Ubuntu, I think, but I haven't seen it
> on CentOS systems).
>
> Changed
> 127.0.0.1 localhost localhost.localdomain
> 127.0.1.1 hsreekumar-lt.corp1.com hsreekumar-lt
>
> to
> 127.0.0.1 localhost localhost.localdomain hsreekumar-lt.Clickablecorp.com hsreekumar-lt
> #127.0.1.1 hsreekumar-lt.corp1.com hsreekumar-lt
>
> See if it fixes your problem, though I am not sure what the side effects
> of this will be, or whether some other programs will break.
>
> Thanks,
> Hari
>
> On Wed, Jun 1, 2011 at 11:29 PM, Stack <[email protected]> wrote:
>
>> On Tue, May 31, 2011 at 11:45 PM, Sean Bigdatafun
>> <[email protected]> wrote:
>> > Sure. Thanks, St.Ack. Here are the attached HBase logs, plus the screenshot
>> > of the region server. The /etc/hosts should be OK, I think, because my Hadoop
>> > (pseudo-distributed) cluster runs well and healthy.
>>
>> FYI, what works for hadoop may not work for hbase.
>>
>> > But I post it here in
>> > case I missed something :-0
>> >
>> > 127.0.0.1    localhost
>> > 127.0.1.1    sean-PowerEdge
>> >
>> > # The following lines are desirable for IPv6 capable hosts
>> > ::1     ip6-localhost ip6-loopback localhost6
>> > fe00::0 ip6-localnet
>> > ff00::0 ip6-mcastprefix
>> > ff02::1 ip6-allnodes
>> > ff02::2 ip6-allrouters
>> >
>>
>> Try turning off ipv6.  In the past it's been fingered as problem-causing.
>>
>> Looking in your logs:
>>
>> + Make sure you fix this before you put any significant data into
>> hbase: 'ulimit -n 1024'
>>
>> So, yeah, it looks like your /etc/hosts needs fixing.  When the
>> regionserver does its lookup, it's finding its hostname to be localhost:
>>
>> 2011-05-31 23:32:44,742 INFO
>> org.apache.hadoop.hbase.master.ServerManager: Registering
>> server=localhost,60020,1306909960650, regionCount=0, userLoad=false
>>
>> But then when the master tries to send it a region, it's trying to send it to
>>
>> 2011-05-31 23:32:47,671 INFO org.apache.hadoop.ipc.HbaseRPC: Server at
>> /127.0.0.1:60020 could not be reached after 1 tries, giving up.
>>
>> .... notice the 127.0.0.1 above.
>>
>> Fix this discrepancy.
>>
>> St.Ack
>>
>>
>>
>> > Thanks,
>> > Sean
>> >
>> >
>> >
>> >
>> >
>> > On Mon, May 30, 2011 at 7:34 PM, Stack <[email protected]> wrote:
>> >>
>> >> Odd.  I don't see the regionserver checking into the master (maybe
>> >> that's the way it is in pseudo-distributed and I just forgot).  Can you
>> >> paste more master log?   I don't see the regionserver coming in in the
>> >> snippet you've pasted so not sure how it's registering itself (I see
>> >> the timeout when we try to assign it -ROOT-).
>> >>
>> >> What's in your /etc/hosts?  I see lots of localhost and 127.0.0.1.
>> >> Maybe the two are not equated in your resolver setup?
>> >>
>> >> St.Ack
>> >>
>> >> On Sat, May 28, 2011 at 11:28 PM, Sean Bigdatafun
>> >> <[email protected]> wrote:
>> >> > I am trying 0.90.1 (hbase-0.90.1-CDH3B4) in pseudo-distributed mode, and
>> >> > ran into the problem of HMaster crashing. Here is what I did.
>> >> >
>> >> > I. First I installed a Hadoop pseudo-distributed cluster
>> >> > (hadoop-0.20.2-CDH3B4) with the following conf edited.
>> >> >
>> >> > 1) core-site.xml ==>
>> >> > <property>
>> >> >  <name>fs.default.name</name>
>> >> >  <value>hdfs://localhost:9000</value>
>> >> > </property>
>> >> >
>> >> > 2) hdfs-site.xml ==>
>> >> >  <property>
>> >> >    <name>dfs.replication</name>
>> >> >    <value>1</value>
>> >> >  </property>
>> >> >
>> >> > (with the above confs, start-all.sh was run, and the Hadoop pseudo-distributed
>> >> > cluster started to run happily)
>> >> >
>> >> >
>> >> > Secondly, I installed hbase-0.90.1-CDH3B4 with the following conf
>> >> > edited.
>> >> >
>> >> > hbase-site.xml ==>
>> >> >  <property>
>> >> >    <name>hbase.rootdir</name>
>> >> >    <value>hdfs://localhost:9000/hbase</value>
>> >> >  </property>
>> >> >
>> >> >  <property>
>> >> >    <name>hbase.cluster.distributed</name>
>> >> >    <value>true</value>
>> >> >  </property>
>> >> >
>> >> >  <property>
>> >> >    <name>hbase.zookeeper.quorum</name>
>> >> >    <value>localhost</value>
>> >> >  </property>
>> >> >
>> >> >  <property>
>> >> >    <name>dfs.replication</name>
>> >> >    <value>1</value>
>> >> >    <description>The replication count for HLog and HFile storage. Should
>> >> >    not be greater than HDFS datanode count.
>> >> >    </description>
>> >> >  </property>
>> >> >
>> >> > (with the above conf, I ran start-hbase.sh, and I realised that
>> >> > HMaster did not function well -- I can't access localhost:60010)
>> >> >
>> >> >
>> >> > II. Here is the HMaster error log:
>> >> >
>> >> > 2011-05-28 23:22:55,292 WARN org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable location to assign region -ROOT-,,0.70236052
>> >> > 2011-05-28 23:23:35,291 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  -ROOT-,,0.70236052 state=OFFLINE, ts=1306650175292
>> >> > 2011-05-28 23:23:35,291 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OFFLINE for too long, reassigning -ROOT-,,0.70236052 to a random server
>> >> > 2011-05-28 23:23:35,291 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; was=-ROOT-,,0.70236052 state=OFFLINE, ts=1306650175292
>> >> > 2011-05-28 23:23:35,291 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Using pre-existing plan for region -ROOT-,,0.70236052; plan=hri=-ROOT-,,0.70236052, src=, dest=localhost,60020,1306648534687
>> >> > 2011-05-28 23:23:35,291 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region -ROOT-,,0.70236052 to localhost,60020,1306648534687
>> >> > 2011-05-28 23:23:35,291 DEBUG org.apache.hadoop.hbase.master.ServerManager: New connection to localhost,60020,1306648534687
>> >> > 2011-05-28 23:23:35,292 INFO org.apache.hadoop.ipc.HbaseRPC: Server at /127.0.0.1:60020 could not be reached after 1 tries, giving up.
>> >> > 2011-05-28 23:23:35,292 WARN org.apache.hadoop.hbase.master.AssignmentManager: Failed assignment of -ROOT-,,0.70236052 to serverName=localhost,60020,1306648534687, load=(requests=0, regions=0, usedHeap=22, maxHeap=996), trying to assign elsewhere instead; retry=0
>> >> > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed setting up proxy interface org.apache.hadoop.hbase.ipc.HRegionInterface to /127.0.0.1:60020 after attempts=1
>> >> >        at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:355)
>> >> >        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:954)
>> >> >        at org.apache.hadoop.hbase.master.ServerManager.getServerConnection(ServerManager.java:606)
>> >> >        at org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:541)
>> >> >        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:901)
>> >> >        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:730)
>> >> >        at org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:710)
>> >> >        at org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor.chore(AssignmentManager.java:1605)
>> >> >        at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
>> >> > Caused by: java.net.ConnectException: Connection refused
>> >> >        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>> >> >        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>> >> >        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>> >> >        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
>> >> >        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
>> >> >        at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:883)
>> >> >        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
>> >> >        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
>> >> >        at $Proxy6.getProtocolVersion(Unknown Source)
>> >> >        at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
>> >> >        at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
>> >> >        at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
>> >> >        at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
>> >> >        ... 8 more
>> >> > 2011-05-28 23:23:35,292 WARN org.apache.hadoop.hbase.master.AssignmentManager: Unable to find a viable location to assign region -ROOT-,,0.70236052
>> >> >
>> >> >
>> >> >
>> >> > III. Here is the zk status from http://localhost:60010/zk.jsp
>> >> >
>> >> > HBase is rooted at /hbase
>> >> > Master address: sean-PowerEdge:60000
>> >> > Region server holding ROOT: null
>> >> > Region servers:
>> >> >  sean-PowerEdge:60020
>> >> > Quorum Server Statistics:
>> >> >  localhost:2181
>> >> >  Zookeeper version: 3.3.2-CDH3B4--1, built on 02/21/2011 20:16 GMT
>> >> >  Clients:
>> >> >   /127.0.0.1:42221[0](queued=0,recved=1,sent=0)
>> >> >   /127.0.0.1:44071[1](queued=0,recved=39,sent=44)
>> >> >   /127.0.0.1:44078[1](queued=0,recved=23,sent=24)
>> >> >   /127.0.0.1:44085[1](queued=0,recved=23,sent=23)
>> >> >   /127.0.0.1:44077[1](queued=0,recved=19,sent=19)
>> >> >
>> >> >  Latency min/avg/max: 0/6/164
>> >> >  Received: 105
>> >> >  Sent: 110
>> >> >  Outstanding: 0
>> >> >  Zxid: 0x148
>> >> >  Mode: standalone
>> >> >  Node count: 12
>> >> >
>> >> >
>> >> > What's the problem causing the above symptom?
>> >> >
>> >> > Thanks,
>> >> > --
>> >> > --Sean
>> >> >
>> >
>> >
>> >
>> > --
>> > --Sean
>> >
>> >
>>
>
>
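For anyone hitting the same symptom, the check St.Ack did by eye on the master log in the quoted thread (the name the regionserver registers under versus the address the master then fails to reach) can be roughly automated. This is only a sketch under assumptions: the two regular expressions are written against the exact log lines quoted above and may not match other HBase versions, each log entry is assumed to be on a single unwrapped line, and `find_mismatches` is a made-up name.

```python
import re

# Patterns modelled on the two master-log lines quoted in this thread.
REGISTERED = re.compile(r"Registering server=([^,\s]+),(\d+),\d+")
UNREACHABLE = re.compile(r"Server at\s*/\s*([\d.]+):(\d+) could not be reached")


def find_mismatches(log_text):
    """Return (registered_name, unreachable_ip, port) triples sharing a port."""
    registered = {(host, int(port)) for host, port in REGISTERED.findall(log_text)}
    unreachable = {(ip, int(port)) for ip, port in UNREACHABLE.findall(log_text)}
    return sorted(
        (name, ip, port)
        for ip, port in unreachable
        for name, reg_port in registered
        if reg_port == port and name != ip
    )


# The two lines St.Ack pointed at, joined back onto single lines.
SAMPLE_LOG = (
    "2011-05-31 23:32:44,742 INFO org.apache.hadoop.hbase.master.ServerManager: "
    "Registering server=localhost,60020,1306909960650, regionCount=0, userLoad=false\n"
    "2011-05-31 23:32:47,671 INFO org.apache.hadoop.ipc.HbaseRPC: "
    "Server at /127.0.0.1:60020 could not be reached after 1 tries, giving up.\n"
)

if __name__ == "__main__":
    for name, ip, port in find_mismatches(SAMPLE_LOG):
        print("registered as %s but master could not reach %s:%d" % (name, ip, port))
```

A hit only flags a pair of lines worth looking at; it does not prove that /etc/hosts is the cause.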
