Why not write to a queue bucket with a timestamp and have a queue processor move writes to the "final" bucket once they're over a certain age? It can dedup/validate at that point too.
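For illustration, a minimal sketch of that queue-processor idea in Python. Plain dicts stand in for the queue and final buckets, and the age threshold and key layout are hypothetical — in a real Riak setup these would be actual buckets and the processor would list-keys or walk an index:

```python
import time

AGE_THRESHOLD = 60  # seconds an entry must age in the queue bucket (hypothetical)

def process_queue(queue_bucket, final_bucket, now=None, min_age=AGE_THRESHOLD):
    """Move entries older than min_age from the queue bucket to the final
    bucket, deduplicating on the item id carried in the value."""
    now = now if now is not None else time.time()
    for key in list(queue_bucket):
        ts, item_id, payload = queue_bucket[key]
        if now - ts < min_age:
            continue  # too young; a redundant writer may still be racing
        if item_id not in final_bucket:  # dedup: first copy wins
            final_bucket[item_id] = payload  # validation would also go here
        del queue_bucket[key]  # later duplicates are simply discarded
```

Because every writer appends under its own key and only the aged-out processor touches the final bucket, two redundant readers never conflict on the same final key.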
On Tue, Jun 21, 2011 at 2:26 PM, Les Mikesell <[email protected]> wrote:

> Where can I find the redis hacks that get close to clustering? Would
> membase work with synchronous replication on a pair of nodes for a
> reliable atomic 'check and set' operation to dedup redundant data before
> writing to riak? Conceptually I like the 'smart client' fault tolerance
> of memcache/membase, and restricting it to a pair of machines would keep
> the client configuration reasonable.
>
> -Les
>
> On 6/18/2011 6:54 PM, John D. Rowell wrote:
>>
>> The "real" queues like HornetQ and others can take care of this without
>> a single point of failure, but it's a pain (in my opinion) to set them
>> up that way, and usually with all the cluster and failover features
>> active they get quite slow for writes. We use Redis for this because
>> it's simpler and lightweight. The problem is that there is no real
>> clustering option for Redis today, even though there are some hacks
>> that get close. When we cannot afford a single point of failure or any
>> downtime, we tend to use MongoDB for simple queues. It has full cluster
>> support and the performance is pretty close to what you get with Redis
>> in this use case.
>>
>> OTOH you could keep it all Riak and set up a separate small cluster
>> with a RAM backend and use that as a queue, probably with similar
>> performance. The idea here is that you can scale these clusters (the
>> "queue" and the indexed production data) independently in response to
>> your load patterns, and have optimum hardware and I/O specs for the
>> different cluster nodes.
>>
>> -jd
>>
>> 2011/6/18 Les Mikesell <[email protected]>
>>
>> Is there a good way to handle something like this with redundancy
>> all the way through?
>> On simple key/value items you could have two readers write the same
>> things to riak and let bitcask cleanup eventually discard one, but with
>> indexing you probably need to use some sort of failover approach up
>> front. Do any of those queue managers handle that without adding their
>> own single point of failure? Assuming there are unique identifiers in
>> the items being written, you might use the CAS feature of redis to
>> arbitrate writes into its queue, but what happens when the redis node
>> fails?
>>
>> -Les
>>
>> On 6/17/11 11:48 PM, John D. Rowell wrote:
>>
>> Why not decouple the twitter stream processing from the indexing? More
>> than likely you have a single process consuming the spritzer stream, so
>> you can put the fetched results in a queue (hornetq, beanstalk, or even
>> a simple Redis queue) and then have workers pull from the queue and
>> insert into Riak. You could run one worker per node and thus insert in
>> parallel into all nodes. If you need free CPU (e.g. for searches), just
>> throttle the workers to some sane level. If you see the queue getting
>> bigger, add another Riak node (and thus another local worker).
>>
>> -jd
>>
>> 2011/6/13 Steve Webb <[email protected]>
>>
>> Ok, I've changed my two VMs to each have:
>>
>> 3 CPUs, 1GB ram, 120GB disk
>>
>> I'm ingesting the twitter spritzer stream (about 10-20 tweets per
>> second, approx 2k of data per tweet). One bucket is storing the
>> non-indexed tweets in full. Another bucket is storing the indexed tweet
>> string, id, date and username. A maximum of 20 clients can be hitting
>> the 'cluster' at any one time.
>>
>> I'm using n_val=2 so there is replication going on behind the scenes.
>>
>> I'm using a hardware load-balancer to distribute the work amongst the
>> two nodes, and now I'm seeing about 75% CPU usage as opposed to 100% on
>> one node and 50% on the replicating-only node.
>>
>> I've monitored the VM over the last few days and it seems to be mostly
>> CPU-bound. The disk I/O is low. The network I/O is low.
>>
>> Q: Can I change the pre-commit to a post-commit trigger or something,
>> or will that make any difference at all? I'm ok if the tweet stuff
>> doesn't get indexed immediately and there's a slight lag in indexing,
>> if it saves on CPU.

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
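To make the check-and-set dedup idea from the thread concrete, here is a sketch in Python. A real deployment would rely on Redis's atomic SETNX (or WATCH/MULTI/EXEC); here an in-memory dict stands in for Redis, and the store dict is a hypothetical stand-in for the Riak write, so the names are illustrative only:

```python
seen = {}  # stands in for Redis; SETNX is atomic there, a plain dict is not

def setnx(key, value):
    """Mimic Redis SETNX semantics: set only if absent, return True on win."""
    if key in seen:
        return False
    seen[key] = value
    return True

def write_if_first(item_id, payload, store):
    """Two redundant readers can both call this; only the SETNX winner
    performs the (stand-in) Riak write, so duplicates never reach Riak."""
    if setnx("dedup:" + item_id, 1):
        store[item_id] = payload  # stand-in for the Riak PUT
        return True
    return False
```

This is exactly the arbitration Les describes, and it also shows why the Redis node's failure matters: if `seen` is lost, the dedup guarantee is lost with it.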

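The decoupled queue/worker pattern John describes (one stream consumer feeding a queue, one throttled worker per Riak node draining it) can be sketched with Python's standard library. A `queue.Queue` stands in for the Redis/beanstalk/HornetQ queue and a dict for Riak; `throttle_delay` is the "sane level" throttle, and all names here are hypothetical:

```python
import queue
import threading
import time

def worker(q, riak, throttle_delay=0.0):
    """Pull items from the shared queue and insert them into the store,
    optionally sleeping between writes to leave CPU free for searches."""
    while True:
        item = q.get()
        if item is None:  # sentinel: shut this worker down
            q.task_done()
            return
        key, value = item
        riak[key] = value  # stand-in for a Riak PUT via this node
        q.task_done()
        if throttle_delay:
            time.sleep(throttle_delay)

def run_workers(items, n_workers=2, throttle_delay=0.0):
    """Fan items out to n_workers parallel writers (one per Riak node)."""
    q = queue.Queue()
    riak = {}
    threads = [threading.Thread(target=worker, args=(q, riak, throttle_delay))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for item in items:
        q.put(item)
    for _ in threads:
        q.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return riak
```

Scaling is then the knob John mentions: if the queue grows, add a Riak node and start one more worker against it.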