Re: [openstack-dev] [trove] Adding support for HBase in Trove

Fox, Kevin M Wed, 06 Jan 2016 16:35:44 -0800

Just my 2 cents... I think you can do both. The great thing about Trove is that 
it's providing an abstract API, so users just deal with provisioning DBs, 
scaling DBs, etc.


Having a simple plugin that doesn't depend on all of Sahara, for the case where 
a user only wants a single-node HBase, does make sense. It's much easier for an 
Op to support that case if that's all their users ever want. But that's probably 
as far as that plugin should ever go. If you need scale up/down, etc., then 
you're starting to reimplement large swaths of Sahara. Instead, like the Cinder 
plugin for Nova, there could be a second plugin that works identically to the 
standalone one but converts the same API over to a Sahara-compatible one. You 
then farm the work out to Sahara.
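
A rough sketch of that two-plugin idea, in Python. Every name here 
(HBaseStrategy, StandaloneHBase, SaharaBackedHBase, FakeSaharaClient, 
load_strategy) is made up for illustration. None of it is taken from the Trove 
or Sahara code, and the fake client only stands in for whatever Sahara call 
would really be made.

```python
from abc import ABC, abstractmethod


class HBaseStrategy(ABC):
    """Hypothetical common interface; both plugins expose the same call."""

    @abstractmethod
    def create(self, name: str, nodes: int) -> dict:
        """Provision HBase and return a description of what was created."""


class StandaloneHBase(HBaseStrategy):
    """Single-node plugin with no Sahara dependency (illustrative only)."""

    def create(self, name: str, nodes: int) -> dict:
        if nodes != 1:
            # Scaling is deliberately out of scope for this plugin.
            raise ValueError("standalone plugin supports exactly one node")
        return {"name": name, "nodes": 1, "backend": "trove"}


class SaharaBackedHBase(HBaseStrategy):
    """Same interface, but the real work is farmed out to Sahara."""

    def __init__(self, sahara_client):
        self._sahara = sahara_client

    def create(self, name: str, nodes: int) -> dict:
        cluster_id = self._sahara.create_cluster(name, nodes)
        return {"name": name, "nodes": nodes, "backend": "sahara",
                "cluster_id": cluster_id}


class FakeSaharaClient:
    """Stand-in for a Sahara client; create_cluster is an assumed method."""

    def create_cluster(self, name: str, nodes: int) -> str:
        return f"fake-{name}-{nodes}"


def load_strategy(use_sahara: bool) -> HBaseStrategy:
    """The operator's choice: a single setting picks the plugin."""
    if use_sahara:
        return SaharaBackedHBase(FakeSaharaClient())
    return StandaloneHBase()
```

The caller never needs to know which plugin is loaded: 
load_strategy(False).create("dev", 1) and load_strategy(True).create("prod", 5) 
go through the same method, and only the second one touches Sahara.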

Then it's up to the ops to choose between the extra features, with the overhead 
of supporting Sahara, or not, and you don't have to implement and support a 
whole cluster management system for Trove when one already exists.

Thanks,
Kevin
________________________________________
From: Amrith Kumar [[email protected]]
Sent: Wednesday, January 06, 2016 3:15 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: [openstack-dev] [trove] Adding support for HBase in Trove

TL;DR: Should Trove treat HBase as a special database because one use case is as 
part of a large multi-node Hadoop cluster, and therefore either not support it 
at all, or necessarily use Sahara to provision and manage a cluster? There are 
pros and cons. It is argued here that the cons outweigh the pros, and a 
blueprint/specification and an implementation of basic Trove support for 
HBase independent of Sahara have been submitted for review. See [3], [4] and 
[5]. The benefits include the ability to provide the standalone mode of 
operation that is commonly used in development, and the elimination of a 
dependency on an additional OpenStack project, which simplifies deployment. 
Comments and feedback are welcome on the implementation, as well as the 
specification and the approach.

The long version follows below.

The OpenStack Trove mission is to provide scalable and reliable Cloud Database 
as a Service provisioning functionality for both relational and non-relational 
database engines, and to continue to improve its fully-featured and extensible 
open source framework [1].

An important aspect of the Trove value proposition is that it provides a common 
control plane, a common API, and a common set of abstractions for managing a 
number of different relational and non-relational database 
technologies. The common API contains primitives to create database instances 
and clusters of a number of databases including MySQL (MariaDB, Percona too), 
PostgreSQL, MongoDB, Cassandra, CouchDB, Couchbase, IBM DB2, Vertica, and Redis.

Cluster support is also available for a number of databases including MongoDB, 
Percona XtraDB cluster and Vertica, with more to come imminently.

In effect, Trove is a framework for provisioning and managing the lifecycle of 
a number of different database technologies; it provides only the control 
plane. Users can provision instances and clusters, resize them, take backups 
and create new instances and clusters from previous backups, and establish 
and manage complex topologies including replication and clustering.

Trove does not interfere with the data plane; applications interact directly 
with the database using the native APIs for each database technology.

Users of OpenStack look to Trove to provide a consistent set of interfaces for 
managing their database resources in a variety of use-cases ranging from 
small-scale prototyping, development, testing, and all the way through 
production. Apache HBase is an open-source, distributed, versioned, 
non-relational database [2] and users of HBase face many of the challenges that 
Trove addresses for other databases. Therefore adding support for HBase in 
Trove seems not only reasonable, but also consistent with the goal of the 
(Trove) project.

A spec proposing the addition of HBase support for Trove was submitted [3] and 
a first phase of code implementing this HBase support has also been submitted 
for review [4], [5]. The process followed is consistent with that used for 
other Trove datastores: add basic support and then progressively augment it in 
subsequent releases. The code submitted allows you to provision an HBase 
instance (which will launch on a Nova instance), build an HBase guest image 
using the elements provided, resize the storage and the instance, take a 
"backup" of the instance and store that backup on Swift, and at a later time 
you can launch a new instance from that "backup".

One can operate HBase with or without HDFS; in fact HBase documents the 
standalone mode of operation [6] where HBase is completely operational on a 
single node and data is stored on the local file system. This standalone mode 
provides a very useful construct for development and testing, and at a later 
stage an application can be seamlessly migrated to work with an HBase 
installation of some other "run mode" like "Fully Distributed".

The code submitted in [4] and [5], as described in [3], implements support for 
two modes of operation, namely "Standalone" and "Pseudo-Distributed". At a later 
stage, support will be added for "Fully Distributed" consistent with the way in 
which clustering support was delivered for other datastores like MySQL and 
MongoDB.
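
As a rough illustration of how the two modes differ at the HBase level, the 
sketch below renders a minimal hbase-site.xml for each. The property names 
hbase.cluster.distributed and hbase.rootdir are the ones used in the HBase 
documentation [6]; the MODES table, the render_hbase_site helper, and the 
directory and HDFS address values are assumptions made for this example and 
are not taken from the patches in [4] and [5].

```python
import xml.etree.ElementTree as ET

# Assumed example values; real deployments choose their own paths and ports.
MODES = {
    "standalone": {
        "hbase.cluster.distributed": "false",
        "hbase.rootdir": "file:///var/lib/hbase",
    },
    "pseudo-distributed": {
        "hbase.cluster.distributed": "true",
        # Pseudo-distributed mode may also keep data on the local
        # file system; a local HDFS is shown here as one option.
        "hbase.rootdir": "hdfs://localhost:8020/hbase",
    },
}


def render_hbase_site(mode: str) -> str:
    """Return hbase-site.xml content for the given run mode (hypothetical helper)."""
    root = ET.Element("configuration")
    for name, value in MODES[mode].items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

Calling render_hbase_site("standalone") yields a configuration that keeps all 
data on the local file system of a single node, which is the development and 
testing case described above.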

Some have opined that Trove should not directly get into the business of 
orchestrating Hadoop Clusters or anything to do with HBase, arguing that this 
is something that Sahara already does, and should remain the sole domain of 
Sahara.

Since HBase is perfectly operable without HDFS, I believe it is 
inappropriate to tightly couple HBase with Sahara, whose primary motivation is 
to provision 'data-intensive application clusters' [7]. Furthermore, as we have 
found with other datastores, it is my belief that having a common 
implementation model across multiple deployment topologies is a benefit for 
Trove. Other considerations, such as the similarity of HBase to other databases 
supported by Trove, motivated the choice described in the specification [3]. An 
architecture where Trove can function entirely independently of Sahara is also 
a benefit for end users, and a model where Trove has dependencies only on other 
core OpenStack services considerably simplifies deployment.

Comments and feedback are welcome on the code, as well as the specification and 
the approach.

References:

[1] https://wiki.openstack.org/wiki/Trove#Mission_Statement
[2] https://hbase.apache.org/
[3] https://review.openstack.org/#/c/256079
[4] https://review.openstack.org/#/c/262048/
[5] https://review.openstack.org/#/c/262815/
[6] http://hbase.apache.org/0.94/book/standalone_dist.html
[7] https://wiki.openstack.org/wiki/Sahara

Thanks,

-amrith

--
Amrith Kumar, CTO                   | [email protected]
Tesora, Inc                         | @amrithkumar
125 CambridgePark Drive, Suite 400  | http://www.tesora.com
Cambridge, MA. 02140                |







__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
