inline.

---
Jeremiah Peschka - Founder, Brent Ozar Unlimited
MCITP: SQL Server 2008, MVP
Cloudera Certified Developer for Apache Hadoop

On Wed, Sep 25, 2013 at 2:47 PM, Brady Wetherington <[email protected]> wrote:

> I've built it a solid proof-of-concept system on leveldb, and use some 2i
> indexes in order to search for certain things - usually just for counts of
> things.
>
> I have two questions so far:
>
> First off, why is Bitcask the default? Is it just because it is faster? Or
> is it considered more 'stable' or something?
>

Long ago, when Bitcask was elected as the default, LevelDB was not a thing.
Databases strive for stability and the principle of least surprise. Changing
anything can potentially introduce performance regressions, stability
problems, and a host of other undesirable, reputation-destroying things.
Changing the storage back end is high up on the list of things I'd never
want to do in a database. Why do you think MySQL kept MyISAM as its default
storage engine until 5.5?

>
> Next, I've learned about the allow_mult feature you can set on buckets. I
> wonder if I should use this for my most heavily-used primary-purpose
> queries? Is there a limit to how many 'siblings' you can have for an entry?
> Is it inadvisable to do what I'm talking about? Would fetching all of the
> siblings end up being a disastrous nightmare or something?
>

The upper limit will depend on the size of your objects. You don't want to
have object sizes (including siblings) much beyond 6MB, or you'll see a lot
of network congestion. You certainly *could* have bigger object + sibling
collections, but you'd want to beef up the network backend to something like
10GbE, 40GbE, or InfiniBand to deal with the increased network traffic.

Fetching all of your siblings is bad if you never resolve them, since you'll
be pulling back a lot of data on every read.

allow_mult is typically turned on for production clusters. It is off by
default to help new users get a handle on Riak quickly without having to
worry about siblings. Once you get the hang of how Riak behaves, turning on
siblings is usually a good thing.

Depending on how you resolve siblings, it's probably best to read your data,
resolve siblings, and send the resolved, garbage-collected object back to
Riak - even if you're performing a "read only" query.

The new Riak DT features eliminate some of the worry about siblings by
pushing the responsibility back down to Riak. Those features are only
available if you're building from source, but hopefully Riak 2.0 will be out
soon.

> I *assume* - and I could be wrong - that a 2i query would be slower than a
> fetch-of-siblings for a particular key - is that wrong?
>
> If I switch from using 2i indexes to using allow_mult and siblings, we'd
> be talking a few hundred thousand to low millions for a sibling-count.
>

I do not think 'siblings' means what you think it means.

A sibling occurs if two clients, A and B, both read v1 of an object and then
each issue a write:

Client A updates the object and sets preferences to ['cat pictures', 'ham sandwiches']
Client B updates the object and sets preferences to ['knitting with bacon']

With allow_mult enabled you'd have two versions of the object. These are
siblings.

If you're thinking of some kind of index created by your application, you
could look at 2i vs using siblings to build a secondary index:
http://basho.com/index-for-fun-and-for-profit/

Even when you're creating your own secondary index, you still want to
perform garbage collection on the data you're storing in Riak.

> Thanks for making an excellent product! Can't wait to get this bad boy
> into production and really see what it can do!
>
> -B.
>
> _______________________________________________
> riak-users mailing list
> [email protected]
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
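
To make the sibling discussion concrete, here is a rough Python sketch of the client A / client B scenario and of the read, resolve, write-back pattern. It is a toy in-memory model, not the Riak client API: ToyBucket, resolve_union, and read_resolve_write are hypothetical names, a plain integer stands in for Riak's causal context, and merging preferences by set union is just one assumed resolution policy. Pick whatever policy fits your data.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBucket:
    """In-memory stand-in for a bucket with allow_mult on (hypothetical, not Riak's API)."""

    # key -> (version counter, list of sibling values)
    data: dict = field(default_factory=dict)

    def get(self, key):
        """Return (version, siblings); version 0 means the key was never written."""
        version, siblings = self.data.get(key, (0, []))
        return version, list(siblings)

    def put(self, key, value, seen_version):
        """Store a value; a write based on a stale version is kept as a sibling."""
        version, siblings = self.data.get(key, (0, []))
        if seen_version == version:
            # The writer saw everything currently stored: replace it.
            self.data[key] = (version + 1, [value])
        else:
            # Concurrent write: keep both values side by side.
            self.data[key] = (version + 1, siblings + [value])


def resolve_union(siblings):
    """Assumed policy: merge preference lists into one sorted, de-duplicated list."""
    merged = set()
    for prefs in siblings:
        merged.update(prefs)
    return sorted(merged)


def read_resolve_write(bucket, key, resolve):
    """Read a key; if it has siblings, resolve them and write the result back."""
    version, siblings = bucket.get(key)
    if len(siblings) > 1:
        value = resolve(siblings)
        bucket.put(key, value, seen_version=version)
        return value
    return siblings[0] if siblings else None


bucket = ToyBucket()
bucket.put("user", [], seen_version=0)  # v1 of the object
v1, _ = bucket.get("user")  # clients A and B both read v1

bucket.put("user", ["cat pictures", "ham sandwiches"], seen_version=v1)  # client A
bucket.put("user", ["knitting with bacon"], seen_version=v1)  # client B, now stale

_, siblings_before = bucket.get("user")
resolved = read_resolve_write(bucket, "user", resolve_union)
_, siblings_after = bucket.get("user")

print(len(siblings_before), "siblings before resolution")
print(len(siblings_after), "value after resolution:", resolved)
```

Note that the "read" at the end performs a write: that is the point of resolving on every read, so the sibling list does not keep growing.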
