Re: [openstack-dev] [swift] Optimizing storage for small objects in Swift

Alexandre Lécuyer Mon, 19 Jun 2017 02:38:41 -0700

Hello Clint,

Thanks for your feedback. I am replying inline below.


On 06/16/2017 10:54 PM, Clint Byrum wrote:

Excerpts from John Dickinson's message of 2017-06-16 11:35:39 -0700:

On 16 Jun 2017, at 10:51, Clint Byrum wrote:

This is great work.

I'm sure you've already thought of this, but could you explain why
you've chosen not to put the small objects in the k/v store as part of
the value rather than in secondary large files?

I don't want to co-opt an answer from Alex, but I do want to point to some of 
the other background on this LOSF work.

https://wiki.openstack.org/wiki/Swift/ideas/small_files
https://wiki.openstack.org/wiki/Swift/ideas/small_files/experimentations
https://wiki.openstack.org/wiki/Swift/ideas/small_files/implementation

These are great. Thanks for sharing them, I understand a lot more now.

Look at the second link for some context to your answer, but the summary is "that 
means writing a file system, and writing a file system is really hard".

I'm not sure we were thinking the same thing.

I was more asking, why not put the content of the object into the k/v
instead of the big_file_id:offset? My thinking was that for smaller
objects, you would just return the data immediately upon reading the k/v,
rather than then needing to go find the big file and read the offset.
However, I'm painfully aware that those directly involved with the problem
have likely thought of this. However, the experiments don't seem to show
that this was attempted. Perhaps I'm zooming too far out to see the real
problem space. You can all tell me to take my spray paint can and stop
staring at the bike shed if this is just too annoying. Seriously.

Of course, one important thing is, what does one consider "small"? Seems
like there's a size where the memory footprint of storing it in the
k/v would be justifiable if reads just returned immediately from k/v
vs. needing to also go get data from a big file on disk. Perhaps that
size is too low to really matter. I was hoping that this had been
considered and there was documentation, but I don't really see it.

Right, we had considered this when we started the project: storing
small objects directly in the KV. It would not be too difficult to do,
but we see a few problems:


1) consistency

In the current design, we append data at the end of a "big file". When
the data upload is finished, Swift writes the metadata and commits the
file. This triggers an fsync(). Only then do we return. We can rely on
the data being stable on disk, even if there is a power loss. Because
we fallocate() space for the "big files" beforehand, we can also hope to
have mostly sequential disk IO.

(Important as most swift clusters use SATA disks).
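
A minimal sketch of the write path described above, assuming a
hypothetical record layout (a 4-byte length header followed by the
payload) and hypothetical function names. It is not the actual LOSF
code; it only illustrates preallocating a "big file", appending at a
tracked offset, and returning only after fsync().

```python
import os
import struct

# Hypothetical record header: payload length only (little-endian uint32).
HEADER = struct.Struct("<I")


def preallocate(fd, size):
    """Reserve space up front so later appends are mostly sequential."""
    if hasattr(os, "posix_fallocate"):
        try:
            os.posix_fallocate(fd, 0, size)
        except OSError:
            pass  # not supported here; carry on without preallocation


def append_object(fd, offset, payload):
    """Write one record at `offset`, fsync, and return the next free offset.

    The offset is tracked by the caller because preallocation may already
    have extended the file size, so "end of file" is not the append point.
    """
    record = HEADER.pack(len(payload)) + payload
    done = 0
    while done < len(record):
        done += os.pwrite(fd, record[done:], offset + done)
    os.fsync(fd)  # only after this returns is the object considered stable
    return offset + len(record)
```

The caller would acknowledge the upload only once append_object()
returns, which mirrors the "only then do we return" step.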

Once the object has been committed, we create an entry for it in the KV.
This is done asynchronously, because synchronous writes on the KV kill
performance. If we lose power, we lose the latest KV entries. After the
server is rebooted, we have to scan the end of volumes to create missing
entries in the KV. (I will not discuss this in detail in this email to
keep this short, but we can discuss it in another thread, or I can post
some information on the wiki).

If we put small objects in the KV, we would need to do synchronous
writes to make sure we don't lose data. Also, currently we can
completely reconstruct the KV from the "big files". It would not be
possible anymore.
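
To illustrate why the KV can be rebuilt from the "big files", here is a
rough sketch of a volume scan. The record layout (name length, data
length, name, data) and all function names are assumptions made for the
example; the real on-disk format is not described in this email.

```python
import struct

# Hypothetical record header: name length (uint16), data length (uint32).
HEADER = struct.Struct("<HI")


def pack_record(name, data):
    """Serialize one object as header + name + data."""
    raw = name.encode("utf-8")
    return HEADER.pack(len(raw), len(data)) + raw + data


def scan_volume(buf, start=0):
    """Yield (name, data_offset, data_length) for each complete record."""
    pos = start
    while pos + HEADER.size <= len(buf):
        name_len, data_len = HEADER.unpack_from(buf, pos)
        if name_len == 0:
            break  # unwritten (preallocated) space
        data_off = pos + HEADER.size + name_len
        end = data_off + data_len
        if end > len(buf):
            break  # truncated tail record, e.g. after a power loss
        name = buf[pos + HEADER.size:data_off].decode("utf-8")
        yield name, data_off, data_len
        pos = end


def rebuild_index(buf, index=None, start=0):
    """Recreate KV entries (name -> (offset, length)) from a volume."""
    index = {} if index is None else index
    for name, off, length in scan_volume(buf, start):
        index[name] = (off, length)
    return index
```

Passing start=0 corresponds to a full rebuild; passing the last offset
known to the KV corresponds to scanning only the end of a volume after
a reboot. If object data lived only in the KV, there would be nothing
left to scan.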



2) performance

On our clusters we see about 40% of physical disk IO being caused by
readdir(). We want to serve directory listing requests from memory. So
"small" means "the KV can fit in the page cache". We estimate that we
need the size per object to be below 50 bytes, which doesn't leave much
room for data.
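
A back-of-the-envelope sketch of that constraint follows. Only the
50 bytes per object figure comes from the estimate above; the object
count, the page cache size and the 1 KiB inline payload are hypothetical
numbers chosen for illustration.

```python
def kv_footprint_bytes(n_objects, entry_bytes, inline_data_bytes=0):
    """Approximate KV size: one entry (plus optional inline data) per object."""
    return n_objects * (entry_bytes + inline_data_bytes)


def fits_in_page_cache(n_objects, entry_bytes, cache_bytes, inline_data_bytes=0):
    """True if the whole KV could stay resident in the page cache."""
    footprint = kv_footprint_bytes(n_objects, entry_bytes, inline_data_bytes)
    return footprint <= cache_bytes


N_OBJECTS = 100_000_000      # hypothetical number of objects on one disk
CACHE_BYTES = 8 * 2**30      # hypothetical page cache available for that disk

index_only = kv_footprint_bytes(N_OBJECTS, 50)         # 5_000_000_000 bytes
with_inline = kv_footprint_bytes(N_OBJECTS, 50, 1024)  # 107_400_000_000 bytes
```

With these assumed numbers, an index-only KV fits in the cache, while
inlining even 1 KiB of data per object does not.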

LevelDB causes write amplification, as it will regularly copy data to
different files (levels) to keep keys compressed and in sorted order. If
we store object data within the KV, it will be copied around multiple
times as well.

Finally, it is also simpler to have only one path to handle. Beyond
these issues, it would not be difficult to store data in the KV. This is
something we can revisit after more tests and maybe some production
experience.


Also the "writing your own filesystem" option in experiments seemed
more like a thing to do if you left the k/v stores out entirely.






__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
