Re: [openstack-dev] [trove] Adding support for HBase in Trove

2016-01-07 Thread Fox, Kevin M
While I applaud raising the issue on the mailing list to get more folks to 
weigh in, I think part of the problem may be the lack of a [sahara] tag on the 
subject. The thread is still tagged as a Trove-centric conversation. All 
respondents, please consider adding [sahara] to the subject.

Thanks,
Kevin


Re: [openstack-dev] [trove] Adding support for HBase in Trove

2016-01-07 Thread Peter MacKinnon

On 1/6/16 8:20 PM, Amrith Kumar wrote:

Kevin Fox writes:


as far as that plugin ever should go. If you need scale up/down, etc., then
you're starting to reimplement large swaths of Sahara, and like the Cinder
plugin for Nova, there could be a plugin that works identically to the
standalone one that converts the same API over to a Sahara-compatible one. You
then farm the work over to Sahara.

I believe that this is not the case. The entire framework for integration with 
Cinder, Nova etc., already exists in Trove.

Recall that Trove already deals with about a dozen databases, several of which 
have support for clusters.

The code to add HBase support to Trove doesn't have to reimplement this 
framework, which already exists.

All that is being implemented is (literally) a Trove 'plugin' for HBase and a 
mechanism to build an HBase guest image.

-amrith


Right, I think that's the concern. A plugin for integration with a 
standalone/pseudo-distributed HBase deployment is arguably of a reasonable 
scale to be managed by a Trove guest agent. That agent would also fire up 
the client RPC services necessary for an end user to interact with HBase 
remotely. But even the HBase project views standalone mode as a 
devel/test capability only. The fully distributed model gets orders of 
magnitude more complex. Is the agent plugin just wiring into an existing 
multi-node HBase deployment somewhere? Is it spawning/growing/shrinking 
HDFS endpoints itself?


The "we already have cluster support in Trove" argument doesn't really 
track in a production Hadoop space, IMHO. That's why Sahara was developed.


My $0.02,
\Pete





Re: [openstack-dev] [trove] Adding support for HBase in Trove

2016-01-07 Thread Amrith Kumar
Michael, Pete, please see comments interspersed below.

From the things that you and Pete (Peter MacKinnon) are saying, I don't 
understand why there is an objection to accepting the currently proposed 
implementation, which is clearly for single node deployments. Both Standalone 
and Pseudo-Distributed are by definition, explicitly, necessarily, absolutely, 
positively, definitely single node. I can't be more explicit about that. 
That's all that is being proposed at this time. See more comments below.

Further, the current proposal also chooses an implementation strategy that 
makes it much easier to handle fully-distributed in a different way in the 
future. Consider this: Trove could equally well have dealt with HBase using a 
single datastore for all operating modes. In the current implementation, one 
would create an HBase standalone instance using a command that included:

--datastore hbase-standalone 

And a pseudo-distributed instance by including

--datastore hbase-pseudo-distributed

Trove could equally well function by having a single datastore (hbase) but this 
would make hbase-fully-distributed harder to do in a different way in the 
future. I consciously eschewed that path, for this very specific reason; it 
would limit choice in the future.

Now, the implementation behind hbase-fully-distributed could be a custom Trove 
guest agent that could (if we decided to go that route) interact with Sahara. 
However, an alternative implementation of hbase-fully-distributed could 
orchestrate everything natively in Trove. There is much flexibility in the 
current proposal, and I submit to you that this is being lost in your reading 
of the specification and the current implementation as proposed.
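To make the datastore-naming argument concrete, here is a minimal Python sketch of name-based dispatch. All class and function names (`HBaseManager`, `DATASTORE_MANAGERS`, `register_fully_distributed`, `manager_for`, and so on) are hypothetical and do not reflect Trove's actual guest-agent code; the sketch only assumes that each `--datastore` value is looked up independently, so `hbase-fully-distributed` can later be bound to a Sahara-backed or a native implementation without touching the two single-node entries.

```python
from typing import Callable, Dict


class HBaseManager:
    """Hypothetical base class: one implementation per HBase operating mode."""

    mode = "abstract"

    def describe(self) -> str:
        return f"{type(self).__name__} handles {self.mode}"


class StandaloneManager(HBaseManager):
    mode = "standalone"


class PseudoDistributedManager(HBaseManager):
    mode = "pseudo-distributed"


class SaharaBackedManager(HBaseManager):
    mode = "fully-distributed via Sahara"


class NativeClusterManager(HBaseManager):
    mode = "fully-distributed orchestrated by Trove"


# Each --datastore value is an independent key, so choosing how to implement
# hbase-fully-distributed later never touches the two single-node entries.
DATASTORE_MANAGERS: Dict[str, Callable[[], HBaseManager]] = {
    "hbase-standalone": StandaloneManager,
    "hbase-pseudo-distributed": PseudoDistributedManager,
}


def register_fully_distributed(use_sahara: bool) -> None:
    """Bind hbase-fully-distributed to whichever strategy is chosen later."""
    DATASTORE_MANAGERS["hbase-fully-distributed"] = (
        SaharaBackedManager if use_sahara else NativeClusterManager
    )


def manager_for(datastore: str) -> HBaseManager:
    """Look up and instantiate the manager for a --datastore value."""
    try:
        factory = DATASTORE_MANAGERS[datastore]
    except KeyError:
        raise ValueError(f"unsupported datastore: {datastore}") from None
    return factory()
```

For example, after `register_fully_distributed(use_sahara=True)`, `manager_for("hbase-fully-distributed").describe()` returns `"SaharaBackedManager handles fully-distributed via Sahara"`. With a single `hbase` datastore, the mode would instead have to be a parameter inside one manager, which is the coupling the message argues against.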

-amrith

> -Original Message-
> From: michael mccune [mailto:m...@redhat.com]
> Sent: Thursday, January 07, 2016 11:18 AM
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [trove] Adding support for HBase in Trove
> 
> thanks for bringing this up Amrith,
> 
> On 01/06/2016 07:31 PM, Fox, Kevin M wrote:
> > Having a simple plugin that doesn't depend on all of Sahara, for the case a
> user only wants a single node HBase does make sense. It's much easier for an
> Op to support that case if that's all their users ever want. But, that's
> probably as far as that plugin ever should go. If you need scale up/down, etc.,
> then you're starting to reimplement large swaths of Sahara, and like the Cinder
> plugin for Nova, there could be a plugin that works identically to the
> standalone one that converts the same API over to a Sahara-compatible one. You
> then farm the work over to Sahara.
> 
> i think this sounds reasonable, as long as we are limiting it to standalone
> mode. if the deployments start to take on a larger scope i agree it would be
> useful to leverage sahara for provisioning and scaling.

Why only standalone? The current proposal explicitly covers only standalone and 
pseudo-distributed which are both valid strictly (add other adjectives here to 
taste) single node topologies and the currently submitted specification 
specifically carves out fully-distributed operation as requiring further 
thought and contemplation. 

> 
> as the hbase installation grows beyond the standalone mode there will
> necessarily need to be hdfs and zookeeper support to allow for a proper
> production deployment. this also brings up questions of allowing the end-
> users to supply configurations for the hdfs and zookeeper processes, not to
> mention enabling support for high availability hdfs.

These are things that Trove already addresses, albeit in a different way than 
Sahara. Users can, as it turns out, specify configuration groups which can then 
be used to launch new instances, and can also be associated with groups of 
instances.
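As a rough sketch of the configuration-group idea applied to HBase, the Python below models a named set of overrides that is validated against per-datastore defaults and can be attached to several instances. `ConfigurationGroup`, `DEFAULTS`, and `effective_config` are hypothetical; the property names are real HBase settings, but the default values shown and the choice of which parameters Trove would expose are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical defaults for an HBase datastore version. The property names are
# real HBase settings; which ones Trove would actually expose is an assumption.
DEFAULTS: Dict[str, str] = {
    "hbase.regionserver.handler.count": "30",
    "hfile.block.cache.size": "0.4",
    "hbase.zookeeper.property.maxClientCnxns": "300",
}


@dataclass
class ConfigurationGroup:
    """Hypothetical model of a named set of overrides shared by many instances."""

    name: str
    values: Dict[str, str]
    instances: List[str] = field(default_factory=list)

    def attach(self, instance_id: str) -> None:
        """Associate the group with an instance (idempotent)."""
        if instance_id not in self.instances:
            self.instances.append(instance_id)


def effective_config(group: ConfigurationGroup) -> Dict[str, str]:
    """Overlay the group's values on the defaults, rejecting unknown keys."""
    unknown = sorted(set(group.values) - set(DEFAULTS))
    if unknown:
        raise ValueError(f"unsupported parameters: {unknown}")
    return {**DEFAULTS, **group.values}
```

Validating against a known parameter list is one way to keep user-supplied settings (including the ZooKeeper-facing ones) within what the guest agent knows how to apply.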
 
> 
> i can envision a scenario where trove could use sahara to provision and
> manage the clusters for hbase/hdfs/zk. this does pose some questions as
> we'd have to determine how the trove guest agent would be installed on the
> nodes, if there will need to be custom configurations used by trove, and if
> sahara will need to provide a plugin for bare (meaning no data processing
> framework) hbase/hdfs/zk clusters. but, i think these could be solved by
> either using custom images or a plugin in sahara that would install the
> necessary agents/configurations.

Let us not underestimate the effort for an end user to now deploy one more 
project. To a user already using Trove for a myriad of databases, requiring 
Sahara to support HBase Standalone sounds (to put it bluntly) like a burden. 
Requiring it for Fully-Distributed mode may have some development benefits but 
it remains to be seen whether those benefits are really worth the contortions 
that Trove would have to go through. And in the Trove architecture,

Re: [openstack-dev] [trove] Adding support for HBase in Trove

2016-01-07 Thread Greg Hill
I don't work on Sahara, but I do work on a similar closed-source project.
FWIW, I agree with Kevin here. Standalone and pseudo-distributed HBase
are only intended for HBase developers to test code without having to spin
up a cluster; they're not meant for operators or users to actually use as a
database. HBase is designed to run on HDFS and relies on ZooKeeper for
coordination as well. Unless Trove is going to re-implement half of
Sahara, having it there makes no sense, and it will ultimately only lead to
confusion among users who see HBase and think they're getting something
useful when they are in fact not.

My $0.02

Greg


Re: [openstack-dev] [trove] Adding support for HBase in Trove

2016-01-07 Thread Fox, Kevin M
Oh. And I'd suggest having this conversation with the Sahara team. They may 
have some interesting insight into the issue.

Thanks,
Kevin

From: Fox, Kevin M
Sent: Thursday, January 07, 2016 9:44 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [trove] Adding support for HBase in Trove

The whole Hadoop-ish stack is unusual, though. I suspect users often want to 
slice and dice all the components that run together on the cluster, where HBase 
is just one component of the shared cluster. I can totally envision users 
walking up to my door saying, "I provisioned this HBase system with Trove, and 
now I want to run such and such job on the cluster..." Building on top of Sahara 
enables that kind of thing. If Trove wants to do the clustering all itself, 
then that's either out of the picture, or you end up having to add lots of 
Sahara-like functionality in the end to get its functionality back up to where 
users will want it.

Thanks,
Kevin

From: michael mccune [m...@redhat.com]
Sent: Thursday, January 07, 2016 8:17 AM
To: openstack-dev@lists.openstack.org
Subject: Re: [openstack-dev] [trove] Adding support for HBase in Trove

thanks for bringing this up Amrith,

On 01/06/2016 07:31 PM, Fox, Kevin M wrote:
> Having a simple plugin that doesn't depend on all of Sahara, for the case a 
> user only wants a single node HBase does make sense. It's much easier for an 
> Op to support that case if that's all their users ever want. But, that's 
> probably as far as that plugin ever should go. If you need scale up/down, 
> etc., then you're starting to reimplement large swaths of Sahara, and like the 
> Cinder plugin for Nova, there could be a plugin that works identically to the 
> standalone one that converts the same API over to a Sahara-compatible one. 
> You then farm the work over to Sahara.

i think this sounds reasonable, as long as we are limiting it to
standalone mode. if the deployments start to take on a larger scope i
agree it would be useful to leverage sahara for provisioning and scaling.

as the hbase installation grows beyond the standalone mode there will
necessarily need to be hdfs and zookeeper support to allow for a proper
production deployment. this also brings up questions of allowing the
end-users to supply configurations for the hdfs and zookeeper processes,
not to mention enabling support for high availability hdfs.

i can envision a scenario where trove could use sahara to provision and
manage the clusters for hbase/hdfs/zk. this does pose some questions as
we'd have to determine how the trove guest agent would be installed on
the nodes, if there will need to be custom configurations used by trove,
and if sahara will need to provide a plugin for bare (meaning no data
processing framework) hbase/hdfs/zk clusters. but, i think these could
be solved by either using custom images or a plugin in sahara that would
install the necessary agents/configurations.

of course, this does add a layer of complexity as operators who wish
this type of deployment will need to have both trove and sahara, but imo
this would be easier than replicating the work that sahara has done with
these technologies.

regards,
mike

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



Re: [openstack-dev] [trove] Adding support for HBase in Trove

2016-01-07 Thread Amrith Kumar
> -Original Message-
> From: michael mccune [mailto:m...@redhat.com]
> Sent: Thursday, January 07, 2016 3:12 PM
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [trove] Adding support for HBase in Trove
> 
> On 01/07/2016 11:59 AM, Amrith Kumar wrote:
> >  From the things that you and Pete (Peter MacKinnon) are saying, I don't
> understand why there is an objection to accepting the currently proposed
> implementation which is clearly for single node deployments? Both
> Standalone and Pseudo-Distributed are by definition, explicitly, necessarily,
> absolutely, positively, definitely single node. I can't be more explicit about
> that. That's all that is being proposed at this time. See more comments
> below.
> 
> i didn't think i explicitly objected to the spec, if it seems that way then i
> apologize. after reading the spec and the comments, it seemed that there
> was some question about engagement with the sahara team. i wanted to
> help bring some light to the issues surrounding deploying hbase and thought
> it would be good to participate in the discussion.

You are correct Michael. There was a suggestion that we should engage with the 
Sahara team (in the Trove team meeting yesterday) and that is what prompted 
this email thread. So I appreciate your participation as one who is a member of 
the Sahara team.

> 
> > Further, the current proposal also chooses an implementation strategy that
> makes it much easier to handle fully-distributed in a different way in the
> future. Consider this: Trove could equally well have dealt with HBase using a
> single datastore for all operating modes. In the current implementation, one
> would create an HBase standalone instance using a command that included:
> >
> > --datastore hbase-standalone
> >
> > And a pseudo-distributed instance by including
> >
> > --datastore hbase-pseudo-distributed
> >
> 
> and this delineation sounds reasonable to me
> 
> > Trove could equally well function by having a single datastore (hbase) but
> this would make hbase-fully-distributed harder to do in a different way in the
> future. I consciously eschewed that path, for this very specific reason; it
> would limit choice in the future.
> 
> agreed
> 
> > Now, the implementation behind hbase-fully-distributed could be a
> custom Trove guest agent that could (if we decided to go that route) interact
> with Sahara. However, an alternative implementation of
> hbase-fully-distributed could orchestrate everything natively in Trove. There
> is much flexibility in the current proposal, and I submit to you that this is
> being lost in your reading of the specification and the current
> implementation as proposed.
> 
> i don't think your characterization of my reading comprehension is fair.
> as i stated earlier, i wanted to participate in the discussion surrounding
> deploying a technology that sahara currently deploys. fwiw, i agree with what
> you are saying here, but i also think it is axiomatic that the trove team can
> choose whichever path it would like for implementation.
> 
> >> i think this sounds reasonable, as long as we are limiting it to
> >> standalone mode. if the deployments start to take on a larger scope i
> >> agree it would be useful to leverage sahara for provisioning and scaling.
> >
> > Why only standalone? The current proposal explicitly covers only
> standalone and pseudo-distributed which are both valid strictly (add other
> adjectives here to taste) single node topologies and the currently submitted
> specification specifically carves out fully-distributed operation as requiring
> further thought and contemplation.
> 
> i think starting with standalone mode (and not pseudo-distributed) is a more
> conservative approach to this. my reason for suggesting limiting this to
> standalone is that even in pseudo-distributed mode the need for managing
> hdfs and zookeeper is present; i wanted to highlight some of the overlap
> and the issues that will start to creep in surrounding this deployment.
> 

The current code (submitted for review) provides both standalone and 
pseudo-distributed support. You will observe that the standalone and 
pseudo-distributed implementations do install zookeeper. As you are no doubt 
aware, one of the recommended ways to force the HBase Master server to always 
bind to a well-known port rather than an ephemeral one is to set 
hbase.cluster.distributed to true (see 
https://review.openstack.org/#/c/262048/5/scripts/files/elements/ubuntu-hbase-standalone/install.d/20-install-hbase
line 121). So, as it turns out, the code to deploy hdfs and zookeeper is 
already part of the proposed implementation.


> >> as

[openstack-dev] [trove] Adding support for HBase in Trove

2016-01-06 Thread Amrith Kumar
TL;DR Should Trove treat HBase as a special database because one use case is as 
part of a large multi-node Hadoop cluster, and therefore either not support it 
at all, or necessarily use Sahara to provision and manage a cluster? There are 
pros and cons, and it is argued that the cons of doing so outweigh the pros; a 
blueprint/specification and an implementation for basic Trove support for 
HBase independent of Sahara have been submitted for review. See [3], [4] and 
[5]. The benefits include the ability to provide the commonly used (in 
development) standalone mode of operation, and the elimination of the 
dependency on an additional OpenStack project, thereby simplifying deployment. 
Comments and feedback are welcome on the implementation, as well as the 
specification and the approach.

The long version follows below.

The OpenStack Trove mission is to provide scalable and reliable Cloud Database 
as a Service provisioning functionality for both relational and non-relational 
database engines, and to continue to improve its fully-featured and extensible 
open source framework [1].

An important aspect of the Trove value proposition is that it provides a common 
control plane, a common API, and a common set of abstractions that are used to 
manage a number of different relational and non-relational database 
technologies. The common API contains primitives to create database instances 
and clusters of a number of databases including MySQL (MariaDB, Percona too), 
PostgreSQL, MongoDB, Cassandra, CouchDB, Couchbase, IBM DB2, Vertica, and 
Redis. 

Cluster support is also available for a number of databases including MongoDB, 
Percona XtraDB cluster and Vertica, with more to come imminently. 

In effect, Trove is a framework for provisioning and managing the lifecycle of 
a number of different database technologies; it provides only the control 
plane. Users can provision instances and clusters, resize them, take backups, 
create new instances and clusters from previous backups, and establish and 
manage complex topologies including replication and clustering. 

Trove does not interfere with the data plane; applications interact directly 
with the database using the native APIs for each database technology.

Users of OpenStack look to Trove to provide a consistent set of interfaces for 
managing their database resources in a variety of use-cases ranging from 
small-scale prototyping, development, and testing all the way through 
production. Apache HBase is an open-source, distributed, versioned, 
non-relational database [2], and users of HBase face many of the challenges that 
Trove addresses for other databases. Therefore, adding support for HBase in 
Trove seems not only reasonable, but also consistent with the goal of the 
(Trove) project.

A spec proposing the addition of HBase support for Trove was submitted [3] and 
a first phase of code implementing this HBase support has also been submitted 
for review [4], [5]. The process that has been followed is consistent with 
other Trove datastores; add basic support and then progressively augment it in 
subsequent releases. The code submitted allows you to provision an HBase 
instance (which will launch on a Nova instance), build an HBase guest image 
using the elements provided, resize the storage and the instance, take a 
"backup" of the instance and store that backup on Swift, and at a later time 
you can launch a new instance from that "backup".
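For the standalone mode, where data lives on the local file system, the "backup" and "launch from backup" steps can be approximated by archiving and unpacking the data directory. This Python sketch is hypothetical and is not the submitted implementation's backup strategy: the returned bytes stand in for a Swift object, and it assumes HBase has been stopped or flushed before the archive is taken.

```python
import io
import os
import tarfile


def backup_data_dir(data_dir: str) -> bytes:
    """Archive an HBase data directory into gzipped tar bytes.

    Hypothetical sketch: the returned bytes stand in for the object that would
    be uploaded to Swift. HBase should be stopped (or its memstores flushed)
    first, because copying a live data directory may not be consistent.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        tar.add(data_dir, arcname=".")
    return buf.getvalue()


def restore_data_dir(archive: bytes, target_dir: str) -> None:
    """Unpack a backup into the data directory of a newly launched instance."""
    os.makedirs(target_dir, exist_ok=True)
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Safer extraction on Python versions that support filters.
            tar.extractall(target_dir, filter="data")
        else:
            tar.extractall(target_dir)
```

Because a file-level copy says nothing about HBase's internal consistency, quoting "backup" (as the message does) is apt; a production strategy would more likely rely on HBase's own snapshot or export tooling.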

One can operate HBase with or without HDFS; in fact HBase documents the 
standalone mode of operation [6] where HBase is completely operational on a 
single node and data is stored on the local file system. This standalone mode 
provides a very useful construct for development and testing, and at a later 
stage an application can be seamlessly migrated to work with an HBase 
installation of some other "run mode" like "Fully Distributed".

Code submitted in [4] and [5] as described in [3] implements support for two 
modes of operation, namely "Standalone" and "Pseudo-Distributed". At a later 
stage, support will be added for "Fully Distributed" consistent with the way in 
which clustering support was delivered for other datastores like MySQL and 
MongoDB.
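One way a guest agent could map the two supported single-node run modes onto HBase configuration, while refusing "Fully Distributed" until cluster support arrives, is sketched below. The property values follow the HBase quickstart (standalone runs all daemons in one JVM; pseudo-distributed runs HMaster, RegionServer, and ZooKeeper as separate processes on one host, here backed by a local HDFS). The function name and mode strings are hypothetical and are not taken from the patches.

```python
SINGLE_NODE_MODES = {
    # All HBase daemons in a single JVM, data on the local file system.
    "standalone": {
        "hbase.cluster.distributed": "false",
        "hbase.rootdir": "file:///var/lib/hbase/data",
    },
    # Each daemon in its own process, still on one host; backed by a
    # local HDFS as in the HBase quickstart example.
    "pseudo-distributed": {
        "hbase.cluster.distributed": "true",
        "hbase.rootdir": "hdfs://localhost:8020/hbase",
    },
}


def properties_for_mode(mode):
    """Return hbase-site.xml properties for a supported single-node mode."""
    key = mode.strip().lower()
    if key == "fully-distributed":
        # Multi-node HBase is deferred until cluster support is added.
        raise NotImplementedError("fully distributed HBase is not supported yet")
    try:
        # Return a copy so callers cannot mutate the shared defaults.
        return dict(SINGLE_NODE_MODES[key])
    except KeyError:
        raise ValueError(f"unknown HBase run mode: {mode!r}") from None
```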

Some have opined that Trove should not directly get into the business of 
orchestrating Hadoop Clusters or anything to do with HBase, arguing that this 
is something that Sahara already does, and should remain the sole domain of 
Sahara.

Since HBase is perfectly operable without HDFS, I believe it would be
inappropriate to tightly couple HBase with Sahara, whose primary motivation is
to provision 'data-intensive application clusters' [7]. Furthermore, as we have
found with other datastores, having a common implementation model across
multiple deployment topologies is a benefit for Trove. Other considerations
such as similarity to other databases supported by
Trove motivated a choice as 

Re: [openstack-dev] [trove] Adding support for HBase in Trove

2016-01-06 Thread Fox, Kevin M
just my 2 cents... I think you can do both. The great thing about Trove is that
it's providing an abstract API, so users just deal with provisioning DBs,
scaling DBs, etc.

Having a simple plugin that doesn't depend on all of Sahara, for the case where
a user only wants a single-node HBase, does make sense. It's much easier for an
op to support that case if that's all their users ever want. But that's
probably as far as that plugin should ever go. If you need scale up/down, etc.,
then you're starting to reimplement large swaths of Sahara. Like the Cinder
plugin for Nova, there could be a plugin that works identically to the
standalone one but converts the same API over to a Sahara-compatible one. You
then farm the work over to Sahara.

Then it's up to the ops to choose features and the overhead of supporting
Sahara, or not, and you don't have to implement a whole cluster management
system for Trove when one already exists.
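The suggestion above is essentially a driver pattern: one user-facing interface, with the operator choosing whether a single-node implementation or a Sahara-backed one does the work. A minimal sketch follows; the class names, the backend choice strings, and the submit_cluster hook are all hypothetical, and no real Sahara client call is shown.

```python
from abc import ABC, abstractmethod


class HBaseBackend(ABC):
    """Same user-facing operation, different provisioning strategy."""

    @abstractmethod
    def create(self, name, node_count):
        ...


class StandaloneBackend(HBaseBackend):
    def create(self, name, node_count):
        # The simple plugin only ever handles the single-node case.
        if node_count != 1:
            raise ValueError("standalone backend only provisions a single node")
        return {"backend": "standalone", "name": name, "nodes": 1}


class SaharaBackend(HBaseBackend):
    def __init__(self, submit_cluster):
        # Hypothetical callable wrapping whatever Sahara call the operator
        # wires in; it is not a real Sahara API.
        self._submit_cluster = submit_cluster

    def create(self, name, node_count):
        cluster_id = self._submit_cluster(name, node_count)
        return {"backend": "sahara", "name": name,
                "nodes": node_count, "cluster_id": cluster_id}


def load_backend(choice, submit_cluster=None):
    """Pick a backend from an operator setting (hypothetical option)."""
    if choice == "standalone":
        return StandaloneBackend()
    if choice == "sahara":
        if submit_cluster is None:
            raise ValueError("sahara backend needs a cluster submission hook")
        return SaharaBackend(submit_cluster)
    raise ValueError(f"unknown backend: {choice!r}")
```

Because callers only see HBaseBackend.create, an operator can switch from the standalone backend to the Sahara one without changing the user-facing API.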

Thanks,
Kevin

From: Amrith Kumar [amr...@tesora.com]
Sent: Wednesday, January 06, 2016 3:15 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: [openstack-dev] [trove] Adding support for HBase in Trove

TL;DR Should Trove treat HBase as a special database because one use case is as
part of a large multi-node Hadoop cluster, and therefore either not support it
at all, or necessarily use Sahara to provision and manage a cluster? There are
pros and cons, and it is argued that the cons outweigh the pros; a
blueprint/specification and an implementation of basic Trove support for HBase,
independent of Sahara, have been submitted for review. See [3], [4] and [5].
The benefits include the ability to provide the standalone mode of operation
commonly used in development, and eliminating the dependency on an additional
OpenStack project, thereby simplifying deployment. Comments and feedback are
welcome on the implementation, as well as on the specification and the
approach.

The long version follows below.

The OpenStack Trove mission is to provide scalable and reliable Cloud Database 
as a Service provisioning functionality for both relational and non-relational 
database engines, and to continue to improve its fully-featured and extensible 
open source framework [1].

An important aspect of the Trove value proposition is that it provides a common
control plane, a common API, and a common set of abstractions that are used to
manage a number of different relational and non-relational database
technologies. The common API contains primitives to create database instances
and clusters of a number of databases including MySQL (MariaDB, Percona too),
PostgreSQL, MongoDB, Cassandra, CouchDB, Couchbase, IBM DB2, Vertica, and Redis.

Cluster support is also available for a number of databases including MongoDB, 
Percona XtraDB cluster and Vertica, with more to come imminently.
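For readers unfamiliar with the cluster API mentioned here, the sketch below builds the kind of body Trove's clusters endpoint accepts: a datastore plus one entry per member instance. The MongoDB version string and flavor ID are placeholders, and the helper function is hypothetical.

```python
import json


def create_cluster_body(name, datastore_type, version, flavor_ref, volume_gb, size):
    """Body for POST /v1.0/{project_id}/clusters: one entry per member instance."""
    if size < 1:
        raise ValueError("a cluster needs at least one instance")
    member = {"flavorRef": flavor_ref, "volume": {"size": volume_gb}}
    return {
        "cluster": {
            "name": name,
            "datastore": {"type": datastore_type, "version": version},
            # Copy the member spec so each list entry is an independent dict.
            "instances": [dict(member, volume=dict(member["volume"]))
                          for _ in range(size)],
        }
    }


if __name__ == "__main__":
    print(json.dumps(create_cluster_body("mongo1", "mongodb", "3.0", "7", 2, 3), indent=2))
```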

In effect, Trove is a framework for provisioning and managing the lifecycle of
a number of different database technologies; it provides only the control
plane. Users can do things like provisioning instances and clusters, resizing
them, taking backups and creating new instances and clusters from previous
backups, establishing and managing complex topologies including replication and
clustering, and resizing instances and clusters.


Re: [openstack-dev] [trove] Adding support for HBase in Trove

2016-01-06 Thread Amrith Kumar
Kevin Fox writes:

> as far as that plugin ever should go. If you need scale up/down, etc, then
> you're starting to reimplement large swaths of Sahara, and like the Cinder
> plugin for Nova, there could be a plugin that works identically to the
> standalone one that converts the same API over to a Sahara compatible one. You
> then farm the work over to Sahara.

I believe that this is not the case. The entire framework for integration with
Cinder, Nova, etc. already exists in Trove.

Recall that Trove already deals with about a dozen databases, several of which
have support for clusters.

The code to add HBase support to Trove doesn't have to reimplement this
framework, which already exists.

All that is being implemented is (literally) a Trove 'plugin' for HBase and a
mechanism to build an HBase guest image.
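To illustrate what "a plugin" means here: as I understand it, Trove's guest agent looks up a manager class per datastore type, so a new datastore mostly adds one registry entry plus its manager class. The toy registry below imitates that idea only; the class names, the register decorator, and the prepare method are hypothetical and do not mirror Trove's actual code.

```python
REGISTRY = {}


def register(manager_cls):
    """Record a manager class under its datastore name."""
    REGISTRY[manager_cls.name] = manager_cls
    return manager_cls


class DatastoreManager:
    """Toy stand-in for a guest-agent manager: one per datastore."""
    name = "base"

    def prepare(self, config):
        return f"{self.name}: prepared with {sorted(config)}"


@register
class MySQLManager(DatastoreManager):
    name = "mysql"


@register
class HBaseManager(DatastoreManager):
    # Adding a datastore is one new class plus its registry entry.
    name = "hbase"


def manager_for(datastore_type):
    """Instantiate the manager registered for a datastore type."""
    try:
        return REGISTRY[datastore_type]()
    except KeyError:
        raise ValueError(f"no manager registered for {datastore_type!r}") from None
```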

-amrith

> -Original Message-
> From: Fox, Kevin M [mailto:kevin@pnnl.gov]
> Sent: Wednesday, January 06, 2016 7:32 PM
> To: OpenStack Development Mailing List (not for usage questions)
> <openstack-dev@lists.openstack.org>
> Subject: Re: [openstack-dev] [trove] Adding support for HBase in Trove