Hi Alex,
I get what you mean, I would say; if you think your project will succeed
don't think small, RabbitMQ and Riak can be pretty easy to understand
and manage once you play with them, don't be scare to try, high demand
systems usually have several subsystems to get their tasks done quickly
like:
1. Distributed data.
2. Distributed queues.
3. Text indexing/search.
With Riak+Yokozuna you cover 1 and 3 which is a pretty good combination,
as a side note, maybe, just maybe, if you "divide and conquer" you may
be able to remove the "Set" concept and just work with CRDT or siblings,
let me explain briefly:
CRDT would warranty your distributed CAS operations per your designed
key, instead of treating the keys as a Set, treat them independently and
each CRDT key to manage its own states, that way you get the atomicity
you want without worrying about locking or distributed concurrency in a
collection of items since each key is unique and each would control its
own state, or let each key have siblings and conflict resolution, as
long as you don't create too many siblings per key in a very small
amount of time.
The the only show stopper for CRDT is if you need to do operations in
more than one atomically, then you will have to use siblings, even for
siblings, your key approach would be to try to operate atomically in the
minimum amount of states as possible so that you can avoid using Sets or
Queues.
I hope I have expressed my thoughts well since explaining all this can
get complicated really quick, or become a really long talk.
Hope that helps,
Guido.
On 12/09/13 17:45, Alex Rice wrote:
Hey all, thanks for the feedback, this is interesting!
re: Those CRDT white papers look pretty complicated, I will definitely
will leave that to you hardcore mathematicians and computer scientists
:)
re: Riak 2.0, I see there are git branches for 2.0 related stuff in
Riak and in corrugatediron, I am guessing it's still too rough around
the edges for a newbie like me to be messing around with?
re: RabbitMQ I had heard of it but am hesitant to add another
daemon/system dependency. But I am definitely considering your
advice, Guido!
Also for this particular project (a multiplayer game) I may be able to
get away with Sets, don't actually really need proper Queues. Actually
I'm not sure. Just thinking out loud here.
For a Set implementation in Riak, I am thinking of modeling it after
this blog post, I cannot find which at the moment, which described how
to implement a lock-free Set in Redis. Redis is not distributed, but I
think the data structure is still relevant:
* To get around the eventual-consistency issue, a prerequisite would
be to always use R=1 and W=all. This will optimize it for reads
because there will be a process polling Riak to remove messages out of
the Set.
* In the Redis implementation, they used the atomic APPEND operation
to append entries into the set, as a list but with notation to add or
remove, prefixed with + or -. e.g.
+itemA
+itemB
-itemA
+itemC
=> yields { itemB, itemC }
* Obviously Riak doesn't have atomic appends, however setting a bucket
to allow_mult = true, and always Put-ing with null vector clock, then
the Siblings feature should be equivalent. Instead of a value being
appended to, the Siblings would be populated with +itemA, +itemB,
-itemA, +itemC, etc.
* Any process reading the Set is also be responsible for collating and
Put-ing the condensed set back, after it reaches some size limit or I
think they called it 'fragmentation'. This condense and re-Put
operation would be the tricky part, and would need to use vector clock
and ensure there are 0 siblings when finished. But it should be
possible? It seems like this is an uber-simplified form of a CRDT data
structure?
Thanks,
Alex
On Thu, Sep 12, 2013 at 2:12 AM, Guido Medina <[email protected]> wrote:
Alex,
RabbitMQ which is a good high performer, developed in Erlang and scales just
as Riak.
The old saying, the right tool for the right job, I like how fast Riak is
fetching/storing key values on a distributed environment, I don't like Riak
for queues, is it because it wasn't designed for that? CRDT like CAS
operations on structures (to mention some for Java) like LinkedBlockingQueue
and ConcurrentLinkedQueue should be painful enough to develop in a CRDT
environment, cause now you would be talking of guarding states of few within
the same structure; say AtomicReference of several variables to warranty a
non-blocking CAS operation, in this case heap and tail of the queue (Not so
true for LinkedBlockingQueue)
And performance of several CRDT for high performance queue I don't actually
think is going to be good.
If I were to use Riak for my big data environment and would like to match
Riak on a distributed queueing system then I would use like I said,
RabbitMQ.
Guido.
On 12/09/13 00:29, Alex Rice wrote:
Hi I'm very new to Riak. The allow_mult = true / Siblings feature is
very interesting. Could it be used to implement a high performance
collection like a Queue or Set, in a lock free manner? The Riak docs
make it sound like allow_mult is mainly for confict resolution and
degenerate cases, rather than a feature which a data structure could
be designed around. Does anyone have any links to share, or thoughts
about this broader issue?
Thanks much!
Alex
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com