Hi,

 

  Yes, Ming, that's an excellent point.  Though I didn't mention it, my first 
attempt at recovery centered on verifying that the hbase:meta table was okay, 
using the HBase OfflineMetaRepair utility.  Even after that tool said the table 
was fine, I still tried another restart, because the obvious symptom leads you 
to believe the table's own files are causing the problem.  It is very unusual 
to get into this situation, but when you do, you tend to overreact, because 
HBase was working fine and after the restart no regions can be accessed.  So 
it's important to examine all of the log files for the root cause of the 
problem.  The Master log file gave one view, but the Region Server's log file 
made it very obvious what had to be resolved.
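
A minimal sketch of a first pass over a Region Server log, assuming its lines 
look like the OpenRegionHandler ERROR message quoted in the original report in 
this thread.  The regular expression and the helper name failed_regions are 
hypothetical, and other HBase versions may word the message differently:

```python
import re

# Hypothetical pattern modeled on the OpenRegionHandler ERROR line from this
# thread: "Failed open of region=<region name>, starting to roll back ...".
# Region names themselves contain commas, so stop at the first ", " instead.
FAILED_OPEN = re.compile(r"Failed open of region=(\S+?),\s")


def failed_regions(log_lines):
    """Return region names reported as failing to open, first-seen order, no duplicates."""
    seen = []
    for line in log_lines:
        match = FAILED_OPEN.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


sample = [
    "2016-03-07 16:45:27,243 INFO org.apache.hadoop.hbase.regionserver.RSRpcServices: "
    "Open hbase:meta,,1.1588230740",
    "2016-03-07 16:45:27,249 ERROR org.apache.hadoop.hbase.regionserver.handler."
    "OpenRegionHandler: Failed open of region=hbase:meta,,1.1588230740, "
    "starting to roll back the global memstore size.",
]
print(failed_regions(sample))  # ['hbase:meta,,1.1588230740']
```

If hbase:meta shows up here, no other region will be reachable until it opens, 
which is why this log is the one to read first.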

 

Thanks,

Dennis

 

From: Amanda Moran [mailto:amanda.mo...@esgyn.com] 
Sent: Wednesday, March 09, 2016 12:17 PM
To: user@trafodion.incubator.apache.org
Subject: Re: RE: A failed Trafodion installation can lead to the hbase:meta 
table staying in the FAILED_OPEN state.

 

Hi there, all.

 

I have opened a JIRA for the installer based on this issue:

 

https://issues.apache.org/jira/browse/TRAFODION-1884

 

Thanks! 

 

On Wed, Mar 9, 2016 at 8:41 AM, Liu, Ming (Ming) <ming....@esgyn.cn> wrote:

Thanks, Dennis, for sharing this. We saw this issue during an expansion of 
Trafodion from 4 nodes to 5 nodes. Since the newly added node was empty, the 
META region would not be hosted there, so it did no harm. But the problem is 
similar: the newly added RS could not work until we installed Trafodion on 
that RS node.

There are two related JIRAs: TRAFODION-1729 and TRAFODION-1730.
We are working on them to solve the issue. Because Trafodion currently modifies 
the HBase servers' hbase-site.xml to add its coprocessors, the change affects 
*ALL* regions in HBase, including the META region. This is unnecessary and 
undesirable. The META region has no need to load Trafodion coprocessors: it is 
a system region that Trafodion never accesses directly, and if it fails to 
open, the whole HBase system cannot work.
Once these JIRAs are fully addressed, we can remove the hbase-site.xml 
modification from the Trafodion installer, and HBase will no longer need to be 
restarted. For a proper installation, Trafodion should still be installed on 
all RS nodes, so the coprocessor jar files must be copied to every RS node. If 
Trafodion is not installed on all RS nodes there may still be issues, so I 
assume the installer still needs to handle this. A better approach may be to 
store the coprocessor jar file on HDFS, but that is just an idea that needs 
further study.
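
The HDFS idea is untested in this thread.  As a hedged sketch of what 
table-scoped loading could look like, assuming the HBase shell's documented 
table-attribute form (alter ... METHOD => 'table_att', 'coprocessor' => 
'jar path|class|priority|args'), the helper below only builds that shell 
command string.  The function name, table name, jar path, and class name are 
hypothetical placeholders, not Trafodion's real ones.  Attaching a coprocessor 
per table this way, instead of through hbase.coprocessor.region.classes, would 
leave hbase:meta untouched:

```python
# Hypothetical helper: builds an HBase shell command; it does not talk to HBase.
PRIORITY_USER = 1073741823  # Integer.MAX_VALUE / 2, HBase's Coprocessor.PRIORITY_USER


def table_coprocessor_alter(table, jar_path, class_name, priority=PRIORITY_USER, args=None):
    """Return an 'alter' command attaching one coprocessor to a single table.

    The attribute value uses the 'jar path|class|priority|args' layout; the
    trailing args field is omitted entirely when no arguments are given."""
    parts = [jar_path, class_name, str(priority)]
    if args:
        parts.append(",".join(f"{key}={value}" for key, value in args.items()))
    spec = "|".join(parts)
    return f"alter '{table}', METHOD => 'table_att', 'coprocessor' => '{spec}'"


print(table_coprocessor_alter(
    "my_table",                                        # hypothetical user table
    "hdfs:///user/hbase/hypothetical-coprocessor.jar",  # hypothetical HDFS path
    "org.example.HypotheticalObserver",                 # hypothetical class
))
```

Because the jar is referenced by an HDFS path, each Region Server would fetch it 
when the table's regions open, which is the property that might remove the need 
to copy jars to every RS node.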

Thanks,
Ming

-----Original Message-----
From: D. Markt [mailto:dmarkt7...@gmail.com]
Sent: March 9, 2016 15:23
To: user@trafodion.incubator.apache.org
Subject: A failed Trafodion installation can lead to the hbase:meta table staying in 
the FAILED_OPEN state.


Hi,

  I ran into this situation during a recent installation and thought it might 
be useful in case others hit a similar situation in the future.
This isn't the only way to recover from the situation, but it is one option, 
and it was proven to work as expected.

Regards,
Dennis

  During a recent Trafodion cluster install, the daily build was broken in such 
a way that much of the installation proceeded, but the Trafodion files were not 
copied to each node.  This system was using CDH, but I assume the same would 
happen with HDP.  After HBase was restarted as part of the installation, I 
noticed the HBase icon was red.  This will likely not look its best in plain 
text, but the hbase:meta region showed (in a red box):

Region      State                                                RIT time (ms)
1588230740  hbase:meta,,1.1588230740 state=FAILED_OPEN,          1289706
            ts=Mon Mar 07 07:19:00 UTC 2016 (1289s ago),
            server=perf-sles-2.novalocal,60020,1457335120507

  The log file of the Region Server that was assigned the hbase:meta table 
contained this output:

2016-03-07 16:45:27,243 INFO org.apache.hadoop.hbase.regionserver.RSRpcServices: Open hbase:meta,,1.1588230740
2016-03-07 16:45:27,249 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=hbase:meta,,1.1588230740, starting to roll back the global memstore size.
java.lang.IllegalStateException: Could not instantiate a region instance.
        at org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:5486)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5793)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5765)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5721)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5672)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:356)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:126)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.regionserver.transactional.TransactionalRegion not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2112)
        at org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:5475)
        ... 10 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.regionserver.transactional.TransactionalRegion not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2018)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2110)
        ... 11 more
2016-03-07 16:45:27,250 INFO org.apache.hadoop.hbase.coordination.ZkOpenRegionCoordination: Opening of region {ENCODED => 1588230740, NAME => 'hbase:meta,,1', STARTKEY => '', ENDKEY => ''} failed, transitioning from OPENING to FAILED_OPEN in ZK, expecting version 115
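
With chained exceptions like these, the useful line is the innermost cause, 
because Java prints causes outermost first.  A minimal sketch for pulling that 
line, and the missing class name, out of a trace; the helper names and the 
regular expression are hypothetical and only assume the "Caused by:" layout 
and the Hadoop "Class <name> not found" wording shown above:

```python
import re

CAUSED_BY = "Caused by:"
# Matches both "ClassNotFoundException: Class a.b.C not found" (Hadoop's wording)
# and the plain JVM form "ClassNotFoundException: a.b.C".
MISSING_CLASS = re.compile(r"ClassNotFoundException: (?:Class )?([\w.$]+)")


def root_cause(trace_lines):
    """Return the text of the last 'Caused by:' line, or None if there is none."""
    cause = None
    for line in trace_lines:
        stripped = line.strip()
        if stripped.startswith(CAUSED_BY):
            cause = stripped[len(CAUSED_BY):].strip()
    return cause


def missing_class(cause):
    """Return the class name from a ClassNotFoundException message, or None."""
    match = MISSING_CLASS.search(cause or "")
    return match.group(1) if match else None


trace = """\
java.lang.IllegalStateException: Could not instantiate a region instance.
        at org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:5486)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.regionserver.transactional.TransactionalRegion not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2112)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.regionserver.transactional.TransactionalRegion not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2018)
"""
cause = root_cause(trace.splitlines())
print(missing_class(cause))
```

Here the missing class is the one HRegion.newHRegion tried to load through its 
configuration, which points straight at the hbase.hregion.impl setting.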

After consulting with our installer expert, we confirmed that the needed files 
had not been copied to each node.  At that point, one option would have been to 
re-install the previous build, or at least undo the changes that pointed to the 
new build.  I did not try that, and I'll leave that fallback option as a 
separate topic.

  Instead, I tried to see whether I could get HBase to come up successfully 
without completing the new Trafodion installation.  To do that, two HBase 
properties have to be reset:

- hbase.coprocessor.region.classes
- hbase.hregion.impl
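
Before changing anything, it can help to see what those two properties are 
currently set to.  A minimal read-only sketch using Python's standard 
xml.etree.ElementTree on hbase-site.xml contents; the helper name is 
hypothetical, the hbase.hregion.impl value is the class from the Region Server 
log above, and the coprocessor class is a made-up placeholder:

```python
import xml.etree.ElementTree as ET

# The two properties this thread says must be reset for HBase to start cleanly.
SUSPECT_PROPERTIES = ("hbase.coprocessor.region.classes", "hbase.hregion.impl")


def suspect_settings(site_xml):
    """Return {name: value} for each suspect property set in an hbase-site.xml string."""
    root = ET.fromstring(site_xml)
    found = {}
    for prop in root.findall("property"):  # direct <property> children of <configuration>
        name = (prop.findtext("name") or "").strip()
        if name in SUSPECT_PROPERTIES:
            found[name] = (prop.findtext("value") or "").strip()
    return found


# Hypothetical file contents; only the TransactionalRegion class name comes from the log.
site = """<configuration>
  <property>
    <name>hbase.coprocessor.region.classes</name>
    <value>org.example.HypotheticalObserver</value>
  </property>
  <property>
    <name>hbase.hregion.impl</name>
    <value>org.apache.hadoop.hbase.regionserver.transactional.TransactionalRegion</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com</value>
  </property>
</configuration>"""

for name, value in suspect_settings(site).items():
    print(f"{name} = {value}")
```

If either class named here is not actually present on the Region Server's 
classpath, regions (starting with hbase:meta) will fail to open as shown above.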

I actually deleted all of the hbase-site.xml properties that Cloudera Manager 
showed as non-default values, but I assume only the hbase.hregion.impl property 
had to be removed.  Remember to save the configuration and to remove both sets 
of properties.  I forgot each of those steps at one point, and each time the 
restart hit the same basic error.
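
On CDH the change belongs in Cloudera Manager as described above, since it 
regenerates hbase-site.xml.  For a hand-maintained hbase-site.xml, a minimal 
sketch of the removal step is below; it assumes the standard 
<configuration>/<property> layout, the helper name is hypothetical, and the 
sample contents are made up apart from the class name taken from the log:

```python
import xml.etree.ElementTree as ET


def strip_properties(site_xml, names):
    """Return (new_xml, removed): the XML without any <property> whose <name> is in names."""
    root = ET.fromstring(site_xml)
    removed = []
    # Iterate over a copy so removing children does not disturb the loop.
    for prop in list(root.findall("property")):
        name = (prop.findtext("name") or "").strip()
        if name in names:
            root.remove(prop)
            removed.append(name)
    return ET.tostring(root, encoding="unicode"), removed


site = """<configuration>
  <property>
    <name>hbase.hregion.impl</name>
    <value>org.apache.hadoop.hbase.regionserver.transactional.TransactionalRegion</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com</value>
  </property>
</configuration>"""

new_xml, removed = strip_properties(
    site, {"hbase.coprocessor.region.classes", "hbase.hregion.impl"}
)
print(removed)  # ['hbase.hregion.impl']
```

ElementTree drops XML comments and the <?xml ...?> declaration when it 
re-serializes, so keep a backup of the original file before writing new_xml 
back, and push the same file to every Region Server before restarting.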

  Once the configuration is properly updated, the restart will succeed, and 
once the Region Server can open the hbase:meta table, all the other regions can 
be opened as well.  However, without Trafodion running, I would assume none of 
the Trafodion tables should be acted upon.
This exercise was to prove that HBase could be restarted and kept running, so 
that when the Trafodion installation was started again it would have a viable 
Cloudera/HBase/HDFS environment to act on.





 

-- 

Thanks, 

 

Amanda Moran
