Re: speeding up riaksearch precommit indexing

Les Mikesell Tue, 21 Jun 2011 14:26:49 -0700

Where can I find the redis hacks that get close to clustering? Wouldmembase work with syncronous replication on a pair of nodes for areliable atomic 'check and set' operation to dedup redundant data beforewriting to riak? Conceptually I like the 'smart client' faulttolerance of memcache/membase and restricting it to a pair of machineswould keep the client configuration reasonable.


 -Les



On 6/18/2011 6:54 PM, John D. Rowell wrote:

The "real" queues like HornetQ and others can take care of this without
a single point of failure but it's a pain (in my opinion) to set them up
that way, and usually with all the cluster and failover features active
they get quite slow for writes.We use Redis for this because it's
simpler and lightweight. The problem is that there is no real clustering
option for Redis today, even thought there are some hacks that get
close. When we cannot afford a single point of failure or any downtime,
we tend to use MongoDB for simple queues. It has full cluster support
and the performance is pretty close to what you get with Redis in this
use case.

OTOH you could keep it all Riak and setup a separate small cluster with
a RAM backend and use that as a queue, probably with similar
performance. The idea here is that you can scale these clusters (the
"queue" and the indexed production data) independently in response to
your load patterns, and have optimum hardware and I/O specs for the
different cluster nodes.

-jd

2011/6/18 Les Mikesell <[email protected]
<mailto:[email protected]>>

    Is there a good way to handle something like this with redundancy
    all the way through?  On simple key/value items you could have two
    readers write the same things to riak and let bitcask cleanup
    eventually discard one, but with indexing you probably need to use
    some sort of failover approach up front.  Do any of those queue
    managers handle that without adding their own single point of
    failure?  Assuming there are unique identifiers in the items being
    written, you might use the CAS feature of redis to arbitrate writes
    into its queue, but what happens when the redis node fails?

      -Les



    On 6/17/11 11:48 PM, John D. Rowell wrote:

        Why not decouple the twitter stream processing from the
        indexing? More than
        likely you have a single process consuming the spritzer stream,
        so you can put
        the fetched results in a queue (hornetq, beanstalk, or even a
        simple Redis
        queue) and then have workers pull from the queue and insert into
        Riak. You could
        run one worker per node and thus insert in parallel into all
        nodes. If you need
        free CPU (e.g. for searches), just throttle the workers to some
        sane level. If
        you see the queue getting bigger, add another Riak node (and
        thus another local
        worker).

        -jd

        2011/6/13 Steve Webb <[email protected] <mailto:[email protected]>
        <mailto:[email protected] <mailto:[email protected]>>>


            Ok, I've changed my two VMs to each have:

            3 CPUs, 1GB ram, 120GB disk

            I'm ingesting the twitter spritzer stream (about 10-20
        tweets per second,
            approx 2k of data per tweet).  One bucket is storing the
        non-indexed tweets
            in full.  Another bucket is storing the indexed tweet
        string, id, date and
            username.  A maximum of 20 clients can be hitting the
        'cluster' at any one time.

            I'm using n_val=2 so there is replication going on behind
        the scenes.

            I'm using a hardware load-balancer to distribute the work
        amongst the two
            nodes and now I'm seeing about 75% CPU usage as opposed to
        100% on one node
            and 50% on the replicating-only node.

            I've monitored the VM over the last few days and it seems to
        be mostly
            CPU-bound.  The disk I/O is low.  The Network I/O is low.

            Q: Can I change the pre-commit to a post-commit trigger or
        something perhaps
            or will that make any difference at all?  I'm ok if the
        tweet stuff doesn't
            get indexed immediately and there's a slight lag in indexing
        if it saves on CPU.




        _________________________________________________
        riak-users mailing list
        [email protected] <mailto:[email protected]>
        http://lists.basho.com/__mailman/listinfo/riak-users___lists.basho.com
        <http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com>



    _________________________________________________
    riak-users mailing list
    [email protected] <mailto:[email protected]>
    http://lists.basho.com/__mailman/listinfo/riak-users___lists.basho.com
    <http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com>



_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Re: speeding up riaksearch precommit indexing

Reply via email to