Can this be a [DISCUSS] rather than a [NOTICE]? The implications for
downstream projects (both Phoenix and many internal projects) are large,
and it seems like something that needs broader discussion before being set
in stone. The HBaseTestingUtility is used extensively in Phoenix, as well
as in many internal projects at my day job (some directly and some through
Phoenix's BaseTest wrapping of HBTU) -- it's quite useful.

The idea of a better-encapsulated, easier-to-use HBase testing utility is a
good one, and the TestingHBaseCluster interface looks like a definite
improvement. However, I notice at least one large gap right away: there
doesn't appear to be a way to inject a custom Configuration object into the
test cluster, which is a very common pattern. (Example: run a test suite
twice with a new minicluster each time, once with a flag off and then on.)
This seems like a simple fix.
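
To make that pattern concrete, here is a rough sketch. ClusterConf and
FakeMiniCluster are hypothetical stand-ins invented for illustration (they
are not HBase or Phoenix classes), and the key example.feature.enabled is
made up as well. The only point is that each run builds its own
configuration and hands it to a fresh cluster.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a per-cluster configuration; NOT an HBase class.
final class ClusterConf {
    private final Map<String, String> values = new HashMap<>();

    ClusterConf set(String key, boolean value) {
        values.put(key, Boolean.toString(value));
        return this;
    }

    boolean getBoolean(String key, boolean defaultValue) {
        String v = values.get(key);
        return v == null ? defaultValue : Boolean.parseBoolean(v);
    }
}

// Hypothetical stand-in for a mini cluster that accepts an injected configuration.
final class FakeMiniCluster implements AutoCloseable {
    private final ClusterConf conf;
    private boolean running;

    FakeMiniCluster(ClusterConf conf) {
        this.conf = conf;
    }

    void start() {
        running = true;
    }

    boolean isRunning() {
        return running;
    }

    ClusterConf getConf() {
        return conf;
    }

    @Override
    public void close() {
        running = false;
    }
}

final class FlagToggleSuite {
    // Made-up configuration key, used only for this illustration.
    static final String FLAG = "example.feature.enabled";

    // Runs the same suite twice, each time against a new cluster:
    // first with the flag off, then with it on.
    static List<String> runWithBothFlagValues() {
        List<String> results = new ArrayList<>();
        for (boolean flag : new boolean[] {false, true}) {
            ClusterConf conf = new ClusterConf().set(FLAG, flag);
            try (FakeMiniCluster cluster = new FakeMiniCluster(conf)) {
                cluster.start();
                results.add(runSuite(cluster));
            }
        }
        return results;
    }

    // Placeholder for the real test suite; it only reports what it observed.
    static String runSuite(FakeMiniCluster cluster) {
        boolean enabled = cluster.getConf().getBoolean(FLAG, false);
        return "flag=" + enabled + " running=" + cluster.isRunning();
    }

    public static void main(String[] args) {
        runWithBothFlagValues().forEach(System.out::println);
    }
}
```

With the old HBTU the configuration is passed in when the utility is
constructed; the new interface would need some equivalent hook for a loop
like this to work.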

More concerning is the underlying assumption of the change, that only the
HBase project, and perhaps Phoenix, will ever need to write a test of
server-side components. That's simply not the case, because HBase has many
integration points that allow downstream-developed code to run in
server-side processes.

These include:
- Coprocessor Observers and Endpoints
- Replication Endpoints
- MapReduce integration (which acts as a client from HBase's perspective but
  runs within YARN services)

In addition, Phoenix supports user-defined functions (UDFs), which I believe
can run server-side within a coprocessor in certain query plans.

The change assumes that no one will ever need direct access to the testing
utility's internal ZooKeeper, MR, or DFS services, but such access seems
relevant to failure-scenario tests of both Replication Endpoints and
MapReduce jobs.
The Admin API may be able to replace quite a lot of existing logic going
forward, and many existing tests already use it rather than the test
utility directly. But there are literally thousands of downstream tests to
analyze across many different organizations and institutions to verify that
nothing important is being lost, and that will take time. Just leaving a
reference to the old, lower-level HBTU as a public property of the new
interface seems lower-risk to me. What are the gains from hiding the
existing HBTU?

Geoffrey



On Sun, Jul 18, 2021 at 9:44 PM 张铎(Duo Zhang) <palomino...@gmail.com> wrote:

> Please see the discussion in
> https://issues.apache.org/jira/browse/HBASE-13126
>
> And final work is done in
> https://issues.apache.org/jira/browse/HBASE-26081
> https://github.com/apache/hbase/pull/3478
>
> The original HBaseTestingUtility has been renamed to HBaseTestingUtil, and
> MiniHBaseCluster has been renamed to SingleProcessHBaseCluster. They are no
> longer expected to be used by end users. We marked them as
> IA.LimitedPrivate("Phoenix"), since the Phoenix project may still need
> to test something internal to HBase.
>
> Anyway, we encourage every downstream project (including Phoenix) to try to
> make use of the new TestingHBaseCluster introduced in
> https://issues.apache.org/jira/browse/HBASE-26080
>
> We can keep improving it if the current API set is not enough.
>
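> ==== Brief notice, translated from the Chinese version (not a literal translation of the above) ====
>
> HBaseTestingUtility has been marked as Deprecated in 3.0.0. All users are
> encouraged to try the TestingHBaseCluster introduced in HBASE-26080. If you
> have any requirements, please send feedback at any time; we will keep
> improving it.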
>
