Wanted to add some additional information to augment Sean's comments. > Does AAE also apply to search? Or is it only KV that benefits from this? The > reason I'm asking is of cause that this list has seen some cases of bit rot > in merge index files.
Easy to miss, but the very end of the release notes section on AAE mentions it's for K/V only. As Sean mentioned, partition repair still works for Search data. In theory, we could support AAE for Search using the same approach the repair command uses: repairing specific merge_index data from replicas of the merge_index data on adjacent partitions in the ring. Basically, just automating partition repair. The main issue is that Search data doesn't include anything equivalent to vector clocks. If two replicas have divergent data, who wins? With the repair command, the user implicitly makes that determination. Also, repairing merge_index data only addresses merge_index durability issues. It doesn't help with K/V vs Search consistency issues. Most users of Riak Search are using it to index Riak K/V data. In that case, a much more useful guarantee would be that the Search index is both properly replicated and consistent with the K/V data. In other words, a system that performs AAE between the K/V data and the Search index. This is very hard to do for Riak Search due to the use of term-based partitioning. K/V data written to any node in the cluster can trigger Search index updates on arbitrary nodes in the cluster. Thus, correct K/V + Search AAE would require exchanges between all nodes in the cluster. I've played around with the idea of breaking the cluster into a tree-like structure and performing a top-down fan-out style exchange to make this type of AAE tractable, but it's still a lot of additional complexity. For all these reasons, we decided to go with K/V only. Of course, the next-generation search solution for Riak (Yokozuna) already includes K/V + Yokozuna AAE. This is much more straightforward because Yokozuna use document-based partitioning. It also works really well. K/V AAE between nodes repairs K/V replicas as necessary. Then, local AAE between the local K/V data and the local Solr index ensures that the index is consistent with the K/V data. Since the K/V data is replicated + repaired, the Yokozuna indices are also implicitly replicated and repaired despite using local-only exchange. Ryan Zezeski did a great job designing Yokozuna from the beginning to exploit AAE, both for data integrity as well as a general solution to the K/V vs Index consistency problem. > > Given that AAE necessarily add extra I/O, should we expect that I/O to > noticeably effect throughput of K/V? As always, it depends on the workload and situation. If a system is running very close to its maximum IOPs rate, then AAE could have a noticeable effect. However, a lot of work was done to ensure AAE was as low-impact as possible. A few points. First, the most expensive operation is going to be AAE tree building. Either when initially moving to 1.3, or when trees are expired and rebuilt (which happens once a week by default). Each tree build is a fold over a given partition's data. Of course, handoff and the fullsync replication feature in Riak Enterprise both do similar folds. If you have insight into how your cluster handles either of these operations, than you should have a reasonable idea of how tree building will impact cluster performance. Second, the task of keeping trees up-to-date triggers an additional AAE write per Riak K/V write. However, these writes are very small (key + hash + some overhead). To improve performance, these writes are buffered in memory until there are 200 hash updates, and then the updates are sent to the respective hash tree LevelDB instance in a single batch write. Finally, actual AAE exchange between replicas are very inexpensive. Likewise, AAE repairs data by performing a single read at a time in the same manner as a normal client. As such, if there are lots of client requests, than AAE repair is implicitly throttled. If there are no client requests, than AAE repairs data as fast a Riak can respond to read requests. Hope this provides some additional insight. BTW, if anyone wants to dive more into AAE's design, here's a link to a video I made months back to describe the AAE design to other engineers at Basho. The content is slightly outdated and not 100% current with the code shipping in 1.3, but it's 90% the same and the ideas and concepts remain unchanged. And again, this was meant to be a informal presentation between co-workers so don't expect anything polished: http://coffee.jtuple.com/video/AAE.html Regards, Joe --- Joseph Blomstedt Senior Software Engineer Basho Technologies @jtuple On Wed, Jan 30, 2013 at 5:24 PM, Sean Cribbs <[email protected]> wrote: > On Wed, Jan 30, 2013 at 6:04 PM, Kresten Krab Thorup <[email protected]> wrote: >> Looks great! I'm excited! >> >> Does AAE also apply to search? Or is it only KV that benefits from this? >> The reason I'm asking is of cause that this list has seen some cases of bit >> rot in merge index files. > > It does not, however, the 1.2 "partition repair" feature is still > functional and works with Riak Search. > >> >> Given that AAE necessarily add extra I/O, should we expect that I/O to >> noticeably effect throughput of K/V? > > The throughput should be minimally affected. In early testing we found > a significant overhead, but that has been greatly reduced thanks to > improvements to eleveldb and AAE. However, of course we could not test > all scenarios and would appreciate early adopters to put it through > its paces. > > -- > Sean Cribbs <[email protected]> > Software Engineer > Basho Technologies, Inc. > http://basho.com/ > > _______________________________________________ > riak-users mailing list > [email protected] > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com _______________________________________________ riak-users mailing list [email protected] http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
