Hi all,

I have filed a JIRA for the installer, based on this issue:

https://issues.apache.org/jira/browse/TRAFODION-1884

Thanks!

On Wed, Mar 9, 2016 at 8:41 AM, Liu, Ming (Ming) <[email protected]> wrote:

> Thanks, Dennis, for sharing this. We saw this issue while expanding
> Trafodion from 4 nodes to 5 nodes. Since the newly added node was empty, the
> META region should not be there, so it did no harm. But the problem is
> similar: the newly added RS cannot work until we install Trafodion on that
> RS node.
>
> There are two related JIRAs: TRAFODION-1729 and TRAFODION-1730. We are
> working on them to solve the issue. Trafodion currently modifies the HBase
> server's hbase-site.xml to add its coprocessors, which affects *ALL* regions
> in HBase, including the META region. This is unnecessary and undesirable.
> The META region definitely does not need to load Trafodion coprocessors: it
> is a system region that Trafodion never needs to access directly, and if it
> fails to open, the whole HBase system cannot work.
>
> With those JIRAs fully addressed, we can remove the hbase-site.xml
> modification from the Trafodion installer, and there will be no need to
> restart HBase. In a proper installation, Trafodion should be installed on
> all RS nodes, so the coprocessor jar files should be copied to all RS nodes.
> If Trafodion is not installed on all RS nodes, there may still be issues, so
> I assume the installer still needs to consider this. A better approach may
> be to store the coprocessor jar file on HDFS, but that is just a theory and
> needs further study.
>
> Thanks,
> Ming
>
> -----Original Message-----
> From: D. Markt [mailto:[email protected]]
> Sent: March 9, 2016 15:23
> To: [email protected]
> Subject: A failed Trafodion installation can lead to the hbase:meta table
> staying in the FAILED_OPEN state.
>
> Hi,
>
>   I ran into this situation during a recent installation and thought it
> might be useful if others were to hit a similar situation in the future.
> This isn't the only way to recover from the situation but it is one option
> and was proven to work as expected.
>
> Regards,
> Dennis
>
>   During a recent Trafodion cluster install the daily build was broken in
> such a way that much of the installation proceeded, but the Trafodion files
> were not copied to each node.  This system was using CDH but I assume the
> following would happen for HDP as well.  After HBase was restarted as part
> of the installation I noticed the HBase icon was red.  I know this will
> likely not look the best in plain text, but the hbase:meta showed (in a red
> box):
>
> Region:        1588230740
> State:         hbase:meta,,1.1588230740 state=FAILED_OPEN, ts=Mon Mar 07
>                07:19:00 UTC 2016 (1289s ago),
>                server=perf-sles-2.novalocal,60020,1457335120507
> RIT time (ms): 1289706
>
>   Looking at the log file of the Region Server that was assigned the
> hbase:meta table, there was this output:
>
> 2016-03-07 16:45:27,243 INFO org.apache.hadoop.hbase.regionserver.RSRpcServices: Open hbase:meta,,1.1588230740
> 2016-03-07 16:45:27,249 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=hbase:meta,,1.1588230740, starting to roll back the global memstore size.
> java.lang.IllegalStateException: Could not instantiate a region instance.
>         at org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:5486)
>         at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5793)
>         at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5765)
>         at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5721)
>         at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:5672)
>         at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:356)
>         at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:126)
>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.regionserver.transactional.TransactionalRegion not found
>         at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2112)
>         at org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:5475)
>         ... 10 more
> Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.regionserver.transactional.TransactionalRegion not found
>         at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2018)
>         at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2110)
>         ... 11 more
> 2016-03-07 16:45:27,250 INFO org.apache.hadoop.hbase.coordination.ZkOpenRegionCoordination: Opening of region {ENCODED => 1588230740, NAME => 'hbase:meta,,1', STARTKEY => '', ENDKEY => ''} failed, transitioning from OPENING to FAILED_OPEN in ZK, expecting version 115
>
> After consulting with our installer expert, the issue was in fact that the
> needed files had not been copied to each node.  At that point one option
> would be to re-install the previous build or at least undo the changes made
> to point to the new build.  I did not try that and I'll leave that fallback
> option as a separate topic.
>
>   Instead, I tried to get HBase to come up successfully without completing
> the new Trafodion installation. To do that, two HBase properties have to be
> reset:
>
> - hbase.coprocessor.region.classes
> - hbase.hregion.impl
>
> I actually deleted all of the hbase-site.xml properties that Cloudera
> Manager showed with non-default values, but I assume only the
> hbase.hregion.impl property had to be removed.  Remember to remove both
> sets of properties and to save the configuration.  I initially missed each
> of those steps, and each time the restart hit the same basic error.
>
>   Once the configuration is properly updated, the restart will succeed.
> After the Region Server can open the hbase:meta table, all the other
> regions can be opened as well.  However, without Trafodion running, I would
> assume none of the Trafodion tables should be acted upon.
> This exercise was to prove that HBase could be restarted and kept running,
> so that when the Trafodion installation was started it would have a viable
> Cloudera/HBase/HDFS environment to act on.
>
>
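
For anyone who wants to confirm the state of a node before restarting HBase, the check described in the quoted message (are hbase.hregion.impl and hbase.coprocessor.region.classes still set?) could be sketched roughly as below. This is only an illustration, not part of the Trafodion installer: the function name find_overrides and the SAMPLE document are made up for the example, the coprocessor class name in SAMPLE is a placeholder, and on a Cloudera Manager cluster the effective configuration is managed through the CM UI rather than by editing the file by hand.

```python
import xml.etree.ElementTree as ET

# The two properties named in the quoted message.
SUSPECT_PROPERTIES = ("hbase.coprocessor.region.classes", "hbase.hregion.impl")

# Hypothetical hbase-site.xml content, for illustration only. The
# hbase.hregion.impl value is the class from the stack trace; the
# coprocessor class name is a placeholder, not a real Trafodion class.
SAMPLE = """
<configuration>
  <property>
    <name>hbase.hregion.impl</name>
    <value>org.apache.hadoop.hbase.regionserver.transactional.TransactionalRegion</value>
  </property>
  <property>
    <name>hbase.coprocessor.region.classes</name>
    <value>com.example.PlaceholderCoprocessor</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://example/hbase</value>
  </property>
</configuration>
""".strip()


def find_overrides(root, names=SUSPECT_PROPERTIES):
    """Return {name: value} for each listed property found under root."""
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in names:
            found[name] = (prop.findtext("value") or "").strip()
    return found


# Report which of the two properties are still set in the sample document.
for name, value in sorted(find_overrides(ET.fromstring(SAMPLE)).items()):
    print(f"{name} is still set to {value}")
```

To run the same check against a real file, pass ET.parse(path).getroot() to find_overrides, where path is wherever hbase-site.xml lives on that node. An empty result means neither property is set in that file.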


-- 
Thanks,

Amanda Moran
