Re: Advice on making a Riak middleware easy to configure

Marc Savy Thu, 19 Mar 2015 05:38:46 -0700

Hi Dmitri,

Many thanks for your impressive response, it's very helpful indeed!

 > In order to answer the setup questions specifically, we'd need to
 > know more about what the project is intending to do. Will users be
 > typically installing their own Riak clusters and then setting up
 > apiman to help manage APIs?


This is one of the areas I'm not sure about. Ideally I'd like the user
to bring their own Riak cluster and manage it themselves, but I wasn't
sure if there was a large user base that might *expect* us to set
everything up for them.

In an ideal world, users come along and say "I want to use Riak for the
distributed components rather than Infinispan or ElasticSearch, and
here are the endpoint(s) to connect to".

If I can make that assumption for a dev/test environment too, that's great!

 > Or is this more of a multi-tenant kind of situation, where apiman
 > would be spinning up nodes or clusters for users? To put it another
 > way, how does apiman handle ElasticSearch?


We would leave the spinning up of additional nodes to some other element
of the system (e.g. Kubernetes). It's outside of our domain, really.

 > Ah, ok, if I understand your question correctly -- if you're not
 > spinning up VMs or setting up the nodes yourself via ssh (using
 > something like our Ansible playbook), then you can expect an already
 > set up cluster. (FWIW, the various configuration management tools
 > such as Ansible that install Riak clusters do provide idempotency). I
 > can't really picture a situation where users would set up nodes but
 > not join them and leave that up to apiman.


You understood correctly. Thanks!

Essentially, I can use a simple RiakClient with a single address (or
list of addresses) and I don't need to worry about a more complex
RiakCluster set-up routine as below; correct?

List<String> addresses = <list of addrs>;
RiakNode.Builder builder = new RiakNode.Builder();
List<RiakNode> nodes = RiakNode.Builder.buildNodes(builder, addresses);
RiakCluster cluster = new RiakCluster.Builder(nodes).build();
cluster.start();
RiakClient client = new RiakClient(cluster);

 > A load balancer is crucial (we recommend either a hardware based one,
 > or something like HAProxy or Nginx). I know some users connect to a
 > Riak cluster using the round-robin load balancing built into a Riak
 > client, but that should be a last resort measure (if, for example,
 > you're not allowed to spin up another machine for HAProxy). A
 > dedicated load balancer (with a least-connection load balancing
 > algorithm) is significantly faster. (Not to mention, provides logging
 > and a rich ecosystem of tools and dashboards).

 > > Given the introduction of Riak Data Types on buckets, whom should I
 > > expect to set up the data types?


 > There isn't currently an API to create bucket types remotely. So
 > unless apiman has daemons that will be running on the individual Riak
 > nodes and can make commandline calls, you will have to leave bucket
 > type creation to the users.


This is all excellent information, and exactly what I wanted to know.

 > For example, Strongly Consistent buckets are useful for atomic
 > operations like user password management, security group management
 > and so on. So, you could require that users would create a bucket
 > type named 'sc' and enable Strong Consistency on it. (Any buckets
 > under that bucket type would then also be strongly consistent, and
 > usable by apiman or by the users' client code).
 >
 > Similarly, given that metering is a goal, you would also need
 > bucket types for the various server-side Data Types. That is,
 > require users to create a Maps bucket type named 'maps', a Counters
 > type named 'counters', and a Sets type named 'sets', for example.
 >
 > Other things to keep on your radar, as far as bucket types:
 >
 > * You can attach a Solr Search index to a bucket type. However,
 > given that you can only associate a single search index with a
 > bucket type, this isn't as generic/reusable as Data Types. I could
 > see setting up a Search index for something like API logging,
 > though.
 >
 > * You probably want provisions for Riak Authentication &
 > Authorization (http://docs.basho.com/riak/2.0.4/ops/running/authz/
 > ). (Specifically, for supporting user-created users & passwords,
 > since at the moment we don't have a remote API to manage these).


I could provide a script to set everything up as an example, and also
document the process in the community. That being said, it would be nice
to be able to do this kind of preparatory and meta-data set-up using a
simple schema (e.g. json-schema). For instance, things like setting up
buckets, data types, name-spaces, etc.
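
As a rough illustration of such a script, here is a minimal Java sketch
that only prints the riak-admin commands a user would run on a Riak node
to create and activate the bucket types suggested above. The type names
('sc', 'maps', 'counters', 'sets') come from this thread; the class and
method names are hypothetical, and the JSON properties reflect my
understanding of the Riak 2.0 bucket-type syntax, so please check them
against the docs for your version.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds the command lines, it does not run them.
final class BucketTypeCommands {

    // Bucket type name -> properties JSON (assumed Riak 2.0 syntax).
    static Map<String, String> requiredTypes() {
        Map<String, String> types = new LinkedHashMap<>();
        types.put("sc", "{\"props\":{\"consistent\":true}}");
        types.put("maps", "{\"props\":{\"datatype\":\"map\"}}");
        types.put("counters", "{\"props\":{\"datatype\":\"counter\"}}");
        types.put("sets", "{\"props\":{\"datatype\":\"set\"}}");
        return types;
    }

    // Each type needs a "create" followed by an "activate".
    static List<String> commands(Map<String, String> types) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : types.entrySet()) {
            out.add("riak-admin bucket-type create " + e.getKey()
                    + " '" + e.getValue() + "'");
            out.add("riak-admin bucket-type activate " + e.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        commands(requiredTypes()).forEach(System.out::println);
    }
}
```

As far as I know, strong consistency also has to be enabled on the
cluster itself before the 'sc' type is usable, which this sketch does
not cover.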

This is invaluable information; thank you very much. We'll definitely
consider Riak implementations for those elements, too.

 > In terms of options, do you mean like best-practice/recommended riak
 > config files that you'd point your users to?


I was thinking more of what config they would expect to be available in
Apiman's config to facilitate using their Riak cluster with our
components. I think you've answered this point already.

Regards,
Marc

On 17/03/2015 13:14, Dmitri Zagidulin wrote:

Hi Marc,

This sounds like a very cool project! I'd be very interested in hearing
more about this, and answering any data modeling or setup questions.

In order to answer the setup questions specifically, we'd need to know
more about what the project is intending to do. Will users be typically
installing their own Riak clusters and then setting up apiman to help
manage APIs? Or is this more of a multi-tenant kind of situation, where
apiman would be spinning up nodes or clusters for users? To put it
another way, how does apiman handle ElasticSearch?

Couple of thoughts, from your questions.

 > To be more concrete, should I, for example, expect the user to have
 > already set up and joined together their Riak cluster a priori, with
 > everything behind a load-balancer (just give me a single URI to connect
 > to)? [Or attempt to join them into a cluster].

Ah, ok, if I understand your question correctly -- if you're not
spinning up VMs or setting up the nodes yourself via ssh (using
something like our Ansible playbook), then you can expect an already set
up cluster. (FWIW, the various configuration management tools such as
Ansible that install Riak clusters do provide idempotency). I can't
really picture a situation where users would set up nodes but not join
them and leave that up to apiman.

 > As far as I can tell, there is no node discovery/sharing
 > implementation

If you know the IP of one node, you can definitely discover the other
nodes via an HTTP call to /stats (see the 'ring_members' field, described
at http://docs.basho.com/riak/latest/ops/running/nodes/inspecting/).
But, unless apiman provides some sort of monitoring or
keepalive-checking capability, I don't think there's any reason to do that.
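
For illustration, a minimal Java sketch of pulling 'ring_members' out of
a /stats response body. The sample payload and node names are made up,
the class and method names are hypothetical, and a regex stands in for a
proper JSON parser only to keep the example dependency-free.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts node names from a /stats JSON body.
final class RingMembers {

    // Matches the "ring_members" array and captures its raw contents.
    private static final Pattern ARRAY =
            Pattern.compile("\"ring_members\"\\s*:\\s*\\[([^\\]]*)\\]");
    // Matches each quoted entry inside the captured array contents.
    private static final Pattern ITEM = Pattern.compile("\"([^\"]*)\"");

    static List<String> parseRingMembers(String statsJson) {
        List<String> members = new ArrayList<>();
        Matcher array = ARRAY.matcher(statsJson);
        if (array.find()) {
            Matcher item = ITEM.matcher(array.group(1));
            while (item.find()) {
                members.add(item.group(1));
            }
        }
        return members; // empty if the field is absent
    }

    public static void main(String[] args) {
        // Made-up sample; a real body has many more fields.
        String sample = "{\"nodename\":\"riak@10.0.0.1\","
                + "\"ring_members\":[\"riak@10.0.0.1\",\"riak@10.0.0.2\"]}";
        System.out.println(parseRingMembers(sample));
    }
}
```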

A load balancer is crucial (we recommend either a hardware based one, or
something like HAProxy or Nginx). I know some users connect to a Riak
cluster using the round-robin load balancing built into a Riak client,
but that should be a last resort measure (if, for example, you're not
allowed to spin up another machine for HAProxy). A dedicated load
balancer (with a least-connection load balancing algorithm) is
significantly faster. (Not to mention, provides logging and a rich
ecosystem of tools and dashboards).

 > Given the introduction of Riak
 > Data Types on buckets, whom should I expect to set up the data types?

There isn't currently an API to create bucket types remotely. So unless
apiman has daemons that will be running on the individual Riak nodes and
can make commandline calls, you will have to leave bucket type creation
to the users.

That said, I could easily see you requiring a certain set of bucket
types of your users.

For example, Strongly Consistent buckets are useful for atomic
operations like user password management, security group management and
so on. So, you could require that users would create a bucket type named
'sc' and enable Strong Consistency on it. (Any buckets under that bucket
type would then also be strongly consistent, and usable by apiman or by
the users' client code).

Similarly, given that metering is a goal, you would also need bucket
types for the various server-side Data Types. That is, require users to
create a Maps bucket type named 'maps', a Counters type named
'counters', and a Sets type named 'sets', for example.

Other things to keep on your radar, as far as bucket types:

* You can attach a Solr Search index to a bucket type. However, given
that you can only associate a single search index with a bucket type,
this isn't as generic/reusable as Data Types. I could see setting up a
Search index for something like API logging, though.

* You probably want provisions for Riak Authentication & Authorization
(http://docs.basho.com/riak/2.0.4/ops/running/authz/ ). (Specifically,
for supporting user-created users & passwords, since at the moment we
don't have a remote API to manage these).

 > I'm very interested to know how to present a convenient set of options
 > that will allow a typical development and deployment environment to be
 > supported.

In terms of options, do you mean like best-practice/recommended riak
config files that you'd point your users to?

Let me know if you have further questions.

Dmitri

On Sat, Mar 7, 2015 at 10:35 AM, Marc Savy wrote:

    Hi All,

    I'm involved in a FOSS API management project (apiman), and I've been
    thinking about providing a Riak implementation of its gateway components
    in the community (where we already have ElasticSearch and Infinispan).
    These components provide the distributed storage for tasks like
    rate-limiting counters, IP white-listing, black-listing, etc and are
    applied by a horizontally scalable, async gateway (to vastly
    oversimplify!).

    I'm in need of advice principally in regards to configuration and
    set-up. Namely, what assumptions can I safely make about a Riak user's
    set-up, and which settings should I expose in the component's
    configuration? Note that many gateways can exist, and hence any set-up
    ideally needs to already be done in advance, or be idempotent in case
    multiple nodes attempt to do it at once (or otherwise for it to be
    lockable/exclusionary).

    To be more concrete, should I, for example, expect the user to have
    already set up and joined together their Riak cluster a priori, with
    everything behind a load-balancer (just give me a single URI to connect
    to)? Or, should I expect a list of FQDNs/IPs and attempt to join them
    together into a cluster on the user's behalf - or will there be
    idempotence issues if I do that multiple times?

    As far as I can tell, there is no node discovery/sharing
    implementation[1], so I take it there's no way, for instance, to hit a
    single node (which has already been joined with other nodes), and
    thereby automatically gain knowledge of all cluster members?

    A couple of other configuration issues: Given the introduction of Riak
    Data Types on buckets, whom should I expect to set up the data types[2]?
    Should I create them automatically if they don't exist? Same for the
    bucket itself.

    I'm very interested to know how to present a convenient set of options
    that will allow a typical development and deployment environment to be
    supported.

    Regards,
    Marc

    [0] With the usual consistency limitations
    [1] https://github.com/basho/riak/issues/356
    [2] http://docs.basho.com/riak/latest/dev/using/data-types/#Setting-Up-Buckets-to-Use-Riak-Data-Types

    _______________________________________________
    riak-users mailing list
    http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



