Jason,

For key listing requests the bitcask in-memory key directory is scanned for each vnode participating in the covering set. The data structure is a hash table which is ideal for direct accesses, but there is no way to guarantee ordering when the entires are hashed into the table so bitcask iterates over the all the
entries in the table and the keys matching the query are returned.

Then for on-disk data retrieval, each entry in the key directory contains information about how to locate the on-disk data for the key such as the cask file identifier and the offset into the file where the entry resides. It sounds like you have already done some reading on bitcask, but in case you have not seen it before, this bitcask intro document has some good illustrations of how things work: https://github.com/basho/bitcask/blob/develop/doc/bitcask-intro.pdf.

Hope that helps answer your questions.


Kelly

On 08/19/2014 04:51 PM, Jason Campbell wrote:
Hello Kelly,

Thanks for the detailed response, but I'm still a bit confused.  I understand 
the overhead over the covering set, and the key listing will only complete as 
fast as the slowest node.  The data transfer shouldn't be much for a small key 
set though.  So assuming we aren't doing something crazy like trying to stream 
billions of keys, it should be roughly the same speed as getting those keys 
from 1/3 of the nodes.

As far as bitcask being unordered, I'm not sure how that is possible.  I 
understand it's on-disk format is log structured, which although ordered by 
last-modified time, is fairly useless for most purposes.  But in-memory it has 
to be able to satisfy requests for a random key name in fairly fixed time to 
meet the latency guarantees that are expected of it.  I would expect it to be 
some kind of search tree, in which case, it could be used to filter partial 
ranges (like a bucket lookup) just as easily as a full key lookup.  It could 
also be a hash of the key, at which point it would be useless for partial 
ranges, but then so would the bitcask memory calculator, as it would be a fixed 
size per record, not variable with the length of the bucket and key.

I don't know Erlang well enough to jump into the bitcask source, and the papers 
I have found explain the keydir and data structures well enough, but not how a 
request is able to look up a key in memory.  Is it using a b-tree?

Thanks,
Jason

----- Original Message -----
From: "Kelly McLaughlin" <[email protected]>
To: "Jason Campbell" <[email protected]>, "riak-users" 
<[email protected]>
Sent: Wednesday, 20 August, 2014 1:26:36 AM
Subject: Re: Bitcask Key Listing

Jason,

There are two aspects to to a key listing operation that make it
expensive relative to normal gets or puts.

The first part is that, due to the way data is distributed in Riak, key
listing requires a covering set of vnodes to participate in
order to determine the list of keys for a bucket. A minimal covering set
of vnodes works out to 1/N nodes in the cluster where N
is the n_val of the bucket. By default this is 3 so in the default case
a key listing request must send a request to and receive
responses from 1/3 of the nodes in the cluster. This incurs network
traversal overhead as the keys from each vnode are returned
and the speed to completion is limited by the slowest vnode in the
covering set. This is true regardless of the backend in use.

The second part is specific to bitcask. Bitcask is an unordered backend
and the consequence of this when doing a key listing is
that all of the keys stored by a vnode that participates in a key
listing request must be scanned. It doesn't matter if there are
2 keys or 2000 keys for the bucket being queried, they all must be
scanned. This is a case where all the keys being stored in memory
is beneficial to performance, but as the amount of data stored increases
so does the expense to scan over it. The leveldb backend is
ordered and we are able to take advantage of that fact to only scan over
data for the bucket in question, but for bitcask that is
not an option.

At this time there is nothing in the works to specifically improve key
listing performance. It is certainly something we are aware of,
but at this time there are other things with higher priority.

Hope that helps answer your question.


Kelly


On 08/19/2014 05:17 AM, Jaston Campbell wrote:
I currently maintain my own indexes for some things, and use natural keys where 
I can, but a question has been nagging me lately.

Why is key listing slow?  Specifically, why is bitcask key listing slow?

One of the biggest issues with bitcask is all keys (including the bucket name 
and some overhead) must fit into RAM.  For large amounts of keys, I understand 
the coordination data transfer will hurt, but shouldn't things like list 
buckets (or listing keys from small buckets) be fast?

Is there a reason this is slow, and is there a plan to fix it?

Thanks,
Jason

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

Reply via email to