Re: [openstack-dev] [trove] Adding support for HBase in Trove

Amrith Kumar Thu, 07 Jan 2016 09:03:54 -0800

Michael, Pete, please see comments interspersed below.

From the things that you and Pete (Peter MacKinnon) are saying, I don't 
understand why there is an objection to accepting the currently proposed 
implementation, which is clearly for single node deployments. Both Standalone 
and Pseudo-Distributed are by definition, explicitly, necessarily, absolutely, 
positively, definitely single node. I can't be more explicit about that. 
That's all that is being proposed at this time. See more comments below.

Further, the current proposal also chooses an implementation strategy that 
makes it much easier to handle fully-distributed mode in a different way in the 
future. Consider this: Trove could equally well have dealt with HBase using a 
single datastore for all operating modes. In the current implementation, one 
would create an HBase standalone instance using a command that included:

        --datastore hbase-standalone 

And a pseudo-distributed instance by including:

        --datastore hbase-pseudo-distributed

Trove could equally well function with a single datastore (hbase), but that 
would make hbase-fully-distributed harder to do in a different way in the 
future. I consciously eschewed that path for this very specific reason: it 
would limit choice in the future.

Now, the implementation behind hbase-fully-distributed could be a custom Trove 
guest agent that could (if we decided to go that route) interact with Sahara. 
However, an alternative implementation of hbase-fully-distributed could 
orchestrate everything natively in Trove. There is much flexibility in the 
current proposal, and I submit to you that this is being lost in your reading 
of the specification and the current implementation as proposed.
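To make the design point concrete, here is a rough Python sketch of why one 
datastore name per operating mode keeps the options open. It is purely 
hypothetical: the registry, class names and functions below are illustrative 
and are not Trove's actual guest agent code. The two single-node modes are 
bound now, and hbase-fully-distributed can later be bound to either a native 
Trove implementation or a Sahara-backed one without touching the other two.

```python
# Hypothetical sketch only: none of these names exist in Trove. It illustrates
# binding one manager implementation per HBase datastore name.
from abc import ABC, abstractmethod


class HBaseManager(ABC):
    """Hypothetical interface every HBase datastore manager implements."""

    @abstractmethod
    def provision(self, instance_name: str) -> str:
        ...


class StandaloneManager(HBaseManager):
    def provision(self, instance_name: str) -> str:
        return f"{instance_name}: single node, HBase standalone"


class PseudoDistributedManager(HBaseManager):
    def provision(self, instance_name: str) -> str:
        return f"{instance_name}: single node, HBase pseudo-distributed"


class NativeFullyDistributedManager(HBaseManager):
    def provision(self, instance_name: str) -> str:
        return f"{instance_name}: cluster orchestrated natively by Trove"


class SaharaFullyDistributedManager(HBaseManager):
    def provision(self, instance_name: str) -> str:
        return f"{instance_name}: cluster provisioned through Sahara"


# Hypothetical registry keyed by the value a user passes to --datastore.
# Only the two single-node modes are bound in this sketch.
DATASTORE_REGISTRY = {
    "hbase-standalone": StandaloneManager,
    "hbase-pseudo-distributed": PseudoDistributedManager,
}


def register_fully_distributed(use_sahara: bool) -> None:
    """Bind hbase-fully-distributed later, leaving the other entries alone."""
    DATASTORE_REGISTRY["hbase-fully-distributed"] = (
        SaharaFullyDistributedManager if use_sahara else NativeFullyDistributedManager
    )


def create_instance(datastore: str, instance_name: str) -> str:
    """Look up the manager for a datastore name and provision an instance."""
    manager = DATASTORE_REGISTRY[datastore]()
    return manager.provision(instance_name)
```

For example, create_instance("hbase-standalone", "db1") goes through 
StandaloneManager, while the choice made in register_fully_distributed() only 
ever affects the fully-distributed name. With a single "hbase" datastore, by 
contrast, all three modes would share one binding.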

-amrith

> -----Original Message-----
> From: michael mccune [mailto:[email protected]]
> Sent: Thursday, January 07, 2016 11:18 AM
> To: [email protected]
> Subject: Re: [openstack-dev] [trove] Adding support for HBase in Trove
> 
> thanks for bringing this up Amrith,
> 
> On 01/06/2016 07:31 PM, Fox, Kevin M wrote:
> > Having a simple plugin that doesn't depend on all of Sahara, for the case a
> > user only wants a single node HBase, does make sense. It's much easier for an
> > Op to support that case if that's all their users ever want. But that's
> > probably as far as that plugin ever should go. If you need scale up/down,
> > etc., then you're starting to reimplement large swaths of Sahara, and, like
> > the Cinder plugin for Nova, there could be a plugin that works identically to
> > the standalone one and converts the same API over to a Sahara-compatible one.
> > You then farm the work over to Sahara.
> 
> i think this sounds reasonable, as long as we are limiting it to standalone
> mode. if the deployments start to take on a larger scope i agree it would be
> useful to leverage sahara for provisioning and scaling.

Why only standalone? The current proposal explicitly covers only standalone and 
pseudo-distributed, which are both valid, strictly (add other adjectives here to 
taste) single node topologies, and the currently submitted specification 
specifically carves out fully-distributed operation as requiring further 
thought and contemplation. 

> 
> as the hbase installation grows beyond the standalone mode there will
> necessarily need to be hdfs and zookeeper support to allow for a proper
> production deployment. this also brings up questions of allowing the end-
> users to supply configurations for the hdfs and zookeeper processes, not to
> mention enabling support for high availability hdfs.

These are things that Trove already addresses, albeit in a different way than 
Sahara. Users can, as it turns out, specify configuration groups, which can then 
be used to launch new instances and can also be associated with groups of 
instances.

> 
> i can envision a scenario where trove could use sahara to provision and
> manage the clusters for hbase/hdfs/zk. this does pose some questions as
> we'd have to determine how the trove guest agent would be installed on the
> nodes, if there will need to be custom configurations used by trove, and if
> sahara will need to provide a plugin for bare (meaning no data processing
> framework) hbase/hdfs/zk clusters. but, i think these could be solved by
> either using custom images or a plugin in sahara that would install the
> necessary agents/configurations.

Let us not underestimate the effort for an end user to deploy yet one more 
project. To a user already using Trove for a myriad of databases, requiring 
Sahara to support HBase Standalone sounds (to put it bluntly) like a burden. 
Requiring it for Fully-Distributed mode may have some development benefits but 
it remains to be seen whether those benefits are really worth the contortions 
that Trove would have to go through. And in the Trove architecture, there is 
flexibility as described above to have multiple possible implementations for 
fully-distributed, one that would interface with Sahara and another that didn't 
have to. 

Let's be clear that for a person who wants a fully configurable Hadoop-based 
deployment with more control, Sahara may be the best option. And for one who 
wants even more control, maybe doing it themselves with Nova and custom 
Glance images is the way to go. Similarly, a Database-as-a-Service comes with 
the understood boundaries imposed by the "as-a-Service" deployment. Not all 
configuration options may be tweakable with a DBaaS; that's well known and 
understood, not just in Trove but also, for example, in Amazon RDS, Redshift or 
any of the other database-as-a-service implementations. The same would be true 
of fully-distributed mode as well, in the proposal that is currently under 
review. I submit to you that this nuance is being lost in your reading.

> 
> of course, this does add a layer of complexity as operators who wish this type
> of deployment will need to have both trove and sahara, but imo this would
> be easier than replicating the work that sahara has done with these
> technologies.

I think this is where our opinions differ, as the 'replication' isn't all that 
much given the fact that Trove already provides capabilities to cluster 
databases. But, with that said, nothing in the current specification locks us 
into a specific deployment strategy in the future, nor does it preclude 
multiple implementations of fully-distributed, one which could leverage Sahara 
and one which didn't.

> 
> regards,
> mike
> 
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-[email protected]?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
