Hi, I’ve been thinking the past few days about how a composite blob store might go about prioritizing the delegate blob stores for reading and writing, considering concepts like storage filters on a blob store, read-only blob stores, and archive or “cold” blob stores (which we don’t currently have, but could in the future).
Storage filters basically restrict what can be stored in a delegate - for example, only blobs with a certain JCR property. (I realize there are implications with this too - I’ll worry about that in a separate thread someday.)

I’d like feedback on the following idea:

- Create a new public interface in Oak that can be injected into the composite blob store and used to handle the delegate prioritization for reads and writes.
- Create a default implementation of this interface that can be used in most cases (see below).

This would allow extensibility in this area to implement new or more custom algorithms for any future use cases, as needed, without tying it to configuration.

The default implementation would be basically this:

- For reads:
  - Delegates with storage filters first
  - Delegates without storage filters next
  - Read-only delegates next (with filters first, then without)
  - Retry reads on delegates with filters that were previously skipped (this is a special case)
  - Cold storage delegates last
- For writes:
  - Search for an existing blob first using the “read” algorithm - always update an existing blob, if one is found (except in cold storage)
  - If not found:
    - Try delegates with storage filters first
    - Delegates without storage filters next

The special case of retrying reads on delegates with filters that were previously skipped is there to handle configuration changes. Essentially, if a blob is stored in a delegate blob store, and then the configuration for that delegate changes so that the blob wouldn’t be stored there if it were being written now, we want to be able to locate it during the time between when the configuration change happens and when some background curator moves the blob to the correct location.

So in short, I’d do the default implementation as described, but a different implementation could be injected instead, if someone wanted a more custom one.

WDYT?

-MR
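
P.S. To make the default ordering a bit more concrete, here is a rough Java sketch of what I have in mind. Everything in it is hypothetical: `Delegate`, `DelegatePrioritizer` and `DefaultDelegatePrioritizer` are made-up names, not existing Oak types, and a storage filter is modeled as a plain predicate on a blob ID only to keep the example short. It covers the ordering itself, not the "search for an existing blob first" step of a write.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical delegate descriptor; not an existing Oak type. */
final class Delegate {
    final String name;
    final Predicate<String> filter; // null means "no storage filter" (assumption)
    final boolean readOnly;
    final boolean cold;

    Delegate(String name, Predicate<String> filter, boolean readOnly, boolean cold) {
        this.name = name;
        this.filter = filter;
        this.readOnly = readOnly;
        this.cold = cold;
    }

    boolean hasFilter() { return filter != null; }

    /** True if this delegate would accept the blob if it were written now. */
    boolean accepts(String blobId) { return filter == null || filter.test(blobId); }
}

/** Hypothetical strategy interface that would be injected into the composite blob store. */
interface DelegatePrioritizer {
    List<Delegate> readOrder(List<Delegate> delegates, String blobId);
    List<Delegate> writeOrder(List<Delegate> delegates, String blobId);
}

final class DefaultDelegatePrioritizer implements DelegatePrioritizer {

    // Lower rank is tried earlier. Rank 4 is the "retry" pass: delegates whose
    // filter does not match the blob now, but which may still hold it from
    // before a configuration change.
    private static int readRank(Delegate d, String blobId) {
        if (d.cold) return 5;
        if (!d.accepts(blobId)) return 4;
        if (d.readOnly) return d.hasFilter() ? 2 : 3;
        return d.hasFilter() ? 0 : 1;
    }

    @Override
    public List<Delegate> readOrder(List<Delegate> delegates, String blobId) {
        List<Delegate> ordered = new ArrayList<>(delegates);
        // List.sort is stable, so configured order is kept within each rank.
        ordered.sort(Comparator.comparingInt((Delegate d) -> readRank(d, blobId)));
        return ordered;
    }

    @Override
    public List<Delegate> writeOrder(List<Delegate> delegates, String blobId) {
        // Candidates for a new blob: writable, not cold, and accepting the blob.
        List<Delegate> ordered = new ArrayList<>();
        for (Delegate d : delegates) {
            if (!d.readOnly && !d.cold && d.accepts(blobId)) ordered.add(d);
        }
        // Filtered delegates first, then unfiltered ones.
        ordered.sort(Comparator.comparingInt((Delegate d) -> d.hasFilter() ? 0 : 1));
        return ordered;
    }
}
```

A custom implementation would just be another class implementing the same interface, so the composite blob store itself would not need to know which ordering is in use.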