Re: queue_transaction interface + unique_ptr + performance
Hi,

To those writing benchmarks, please note that the standard library provides std::make_shared() as an optimization for shared_ptr. It creates the object and its shared storage in a single allocation, instead of the two allocations required for "std::shared_ptr<Foo> fs(new Foo())".

Casey

- Original Message -
> 1- I agree we should avoid shared_ptr whenever possible.
>
> 2- unique_ptr should not have any more overhead than a raw pointer--the
> compiler is enforcing the single-owner semantics. See for example
>
> https://msdn.microsoft.com/en-us/library/hh279676.aspx
>
> "It is exactly as efficient as a raw pointer and can be used in STL
> containers."
>
> Unless the implementation is broken somehow? That seems unlikely...
>
> sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
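A minimal sketch of the allocation difference (Foo here is a hypothetical stand-in type for the example, not a Ceph type):

```cpp
#include <memory>

// Hypothetical payload type standing in for the Foo in the message.
struct Foo {
  int value = 0;
};

// Two allocations: one for the Foo, one for shared_ptr's control block.
std::shared_ptr<Foo> make_with_new() {
  return std::shared_ptr<Foo>(new Foo());
}

// One allocation: make_shared places the object and control block together.
std::shared_ptr<Foo> make_combined() {
  return std::make_shared<Foo>();
}
```

Both return an identical-looking shared_ptr; the difference is only in how many times the allocator is hit, which is what benchmarks would see.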
Re: queue_transaction interface + unique_ptr + performance
- Original Message -
> Eh, Sage had a point that Transaction has a bunch of little fields
> which would have to be filled in -- its move constructor would be less
> trivial than unique_ptr's.
> -Sam

It's true that the move ctor has to do work. I counted 18 fields, half of which are integers, and the rest have move ctors themselves. But the cpu is good at integers. The win here is that you're not hitting the allocator in the fast path.

Casey

> On Thu, Dec 3, 2015 at 11:12 AM, Adam C. Emerson <aemer...@redhat.com> wrote:
> > On 03/12/2015, Casey Bodley wrote:
> > [snip]
> >> The queue_transactions() interface could take a container of Transactions,
> >> rather than pointers to Transactions, and the ObjectStore would move them
> >> out of the container into whatever representation it prefers.
> > [snip]
> >
> > Or a pointer and count (or we could steal array_view from GSL). That way we
> > could pass in any contiguous range (std::vector or even a std::array or
> > regular C-style array allocated on the stack.)
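For illustration, a minimal sketch of the tradeoff under discussion: a movable Transaction whose move ctor just shuffles integers and already-movable members, handed over by container. The field names and the shape of queue_transactions() here are hypothetical, not the real ObjectStore::Transaction.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Sketch of a movable Transaction: a handful of integer fields plus
// members that already have move ctors (names are hypothetical).
struct Transaction {
  uint64_t op_count = 0;
  uint64_t byte_count = 0;
  std::string op_buffer;  // stands in for the encoded op/data buffers

  Transaction() = default;
  Transaction(Transaction&&) = default;
  Transaction& operator=(Transaction&&) = default;
  Transaction(const Transaction&) = delete;
};

// The ObjectStore can move elements out of the caller's container into
// whatever representation it prefers; the Transaction objects themselves
// never touch the allocator on this path.
std::vector<Transaction> queue_transactions(std::vector<Transaction>&& txns) {
  return std::move(txns);  // stands in for moving into internal queues
}
```

The cost of the move ctor is a fixed number of integer copies and pointer swaps, versus a heap allocation per Transaction for a unique_ptr-based interface.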
Re: queue_transaction interface + unique_ptr + performance
After more discussion with Adam, it seems that making the Transaction object itself a movable type could alleviate all of these concerns about heap allocations, ownership, and lifetime. The queue_transactions() interface could take a container of Transactions, rather than pointers to Transactions, and the ObjectStore would move them out of the container into whatever representation it prefers.

Casey

- Original Message -
> Yes, if we can do that, that will be far easier. I will double check
> whether apply_transaction is being used from any of the performance
> sensitive paths and do the changes accordingly. Thanks.
>
> -Original Message-
> From: Samuel Just [mailto:sj...@redhat.com]
> Sent: Thursday, December 03, 2015 9:51 AM
> To: Somnath Roy
> Cc: Adam C. Emerson; Sage Weil; Samuel Just (sam.j...@inktank.com);
> ceph-devel@vger.kernel.org
> Subject: Re: queue_transaction interface + unique_ptr + performance
>
> As far as I know, there are no current users which want to use the
> Transaction later. You could also change apply_transaction to copy the
> Transaction into a unique_ptr since I don't think it's used in any
> performance sensitive code paths.
> -Sam
>
> On Thu, Dec 3, 2015 at 9:48 AM, Somnath Roy wrote:
> > Yes, that we can do. But in that case, aren't we restricting users if they
> > want to do something with this Transaction object later? I didn't go
> > through each and every part of the code yet (which is huge) that is using
> > these interfaces to understand if it uses the Transaction object
> > afterwards.
> >
> > Thanks & Regards
> > Somnath
> > -Original Message-
> > From: Adam C. Emerson [mailto:aemer...@redhat.com]
> > Sent: Thursday, December 03, 2015 9:25 AM
> > To: Somnath Roy
> > Cc: Sage Weil; Samuel Just (sam.j...@inktank.com);
> > ceph-devel@vger.kernel.org
> > Subject: Re: queue_transaction interface + unique_ptr + performance
> >
> > On 03/12/2015, Somnath Roy wrote:
> >> Yes, I posted the new result after adding -O2 to the compiler flags, and
> >> it shows almost no overhead with unique_ptr.
> >> I will add the test of the adding-to-list overhead and start implementing
> >> the new interface.
> >> But regarding my other point of changing all the objectstore interfaces
> >> (my first mail on this mail chain in case you missed it) taking
> >> Transaction, any thoughts on that?
> >> Should we reconsider having two queue_transaction interfaces?
> >
> > As I understand it, the concern with switching to unique_ptr was that the
> > callee would move from the reference without this being known to the
> > caller.
> >
> > Would it make sense to pass an RValue reference (i.e. TransactionRef&&)?
> > That way the compiler should demand that callers explicitly use
> > std::move on the reference they're holding, documenting at the site of the
> > call that they're willing to give up ownership.
> >
> >
> > --
> > Senior Software Engineer Red Hat Storage, Ann Arbor, MI, US
> > IRC: Aemerson@{RedHat, OFTC, Freenode}
> > 0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C 7C12 80F7 544B 90ED BFB9
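A small sketch of Adam's rvalue-reference suggestion, assuming TransactionRef is a unique_ptr<Transaction> (names illustrative, not the actual Ceph signatures):

```cpp
#include <memory>
#include <utility>

struct Transaction {};

// Assuming TransactionRef is a unique_ptr, as in the discussion.
using TransactionRef = std::unique_ptr<Transaction>;

// Taking an rvalue reference forces callers to write std::move() at the
// call site, documenting that they give up ownership there.
void queue_transaction(TransactionRef&& t) {
  TransactionRef owned = std::move(t);  // callee actually takes ownership
  // ... queue `owned` for commit ...
}
```

At the call site, `queue_transaction(t)` fails to compile for a named TransactionRef, while `queue_transaction(std::move(t))` compiles and leaves the caller's pointer empty, which is the documentation-at-the-call-site effect Adam describes.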
Re: queue_transaction interface + unique_ptr + performance
- Original Message -
> Well, yeah we are, it's just the actual Transaction structure which
> wouldn't be dynamic -- the buffers and many other fields would still
> hit the allocator.
> -Sam

Sure. I was looking specifically at the tradeoffs between allocating and moving the Transaction object itself. As it currently stands, the caller of ObjectStore can choose whether to allocate its Transactions on the heap, embed them in other objects, or put them on the stack for use with apply_transactions(). Switching to an interface built around unique_ptr forces all callers to use the heap. I'm advocating for an interface that doesn't.

Casey

> On Thu, Dec 3, 2015 at 11:29 AM, Casey Bodley <cbod...@redhat.com> wrote:
> >
> > - Original Message -
> >> Eh, Sage had a point that Transaction has a bunch of little fields
> >> which would have to be filled in -- its move constructor would be less
> >> trivial than unique_ptr's.
> >> -Sam
> >
> > It's true that the move ctor has to do work. I counted 18 fields, half of
> > which are integers, and the rest have move ctors themselves. But the cpu
> > is good at integers. The win here is that you're not hitting the allocator
> > in the fast path.
> >
> > Casey
> >
> >>
> >> On Thu, Dec 3, 2015 at 11:12 AM, Adam C. Emerson <aemer...@redhat.com>
> >> wrote:
> >> > On 03/12/2015, Casey Bodley wrote:
> >> > [snip]
> >> >> The queue_transactions() interface could take a container of
> >> >> Transactions,
> >> >> rather than pointers to Transactions, and the ObjectStore would move
> >> >> them
> >> >> out of the container into whatever representation it prefers.
> >> > [snip]
> >> >
> >> > Or a pointer and count (or we could steal array_view from GSL). That way
> >> > we
> >> > could pass in any contiguous range (std::vector or even a std::array or
> >> > regular C-style array allocated on the stack.)
Re: ObjectStore interface and std::list
- Original Message -
> Template functions can be virtual in c++11 and can avoid the VTL. I'm
> using them in https://github.com/ceph/ceph/pull/6781 .

Hi Robert,

Virtual functions are allowed in a templated class, but templated member functions are a separate case. My suggestion to use iterators would require the function itself to be templated on the iterator type. This isn't allowed, because the virtual function table would have to contain an entry for every template instantiation of that function.

Casey

> -
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
>
>
> On Thu, Dec 3, 2015 at 10:46 AM, Casey Bodley wrote:
> > Hi,
> >
> > I missed out on the context leading up to the discussion about
> > queue_transactions and the use of raw vs. smart pointers, but I'm curious
> > whether the use of std::list in the ObjectStore interface has come up as
> > well. That's another source of allocations that could be avoided.
> >
> > A naive approach could just replace the list with vector, which could at
> > least reduce the number of allocations to one if you reserve() space for
> > all of your elements up front.
> >
> > An interface based on iterators could also help, by allowing the caller to
> > put an array on the stack and pass that in. Iterators are tricky because
> > the generic form requires templates, and template functions can't be
> > virtual. But if we require the elements to be sequential in memory (i.e.
> > raw array, std::array or std::vector), the
> > interface could take (Transaction** begin, Transaction** end) without
> > requiring templates. This interface can also be made compatible with
> > unique_ptrs.
> >
> > Casey
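A sketch of the pointer-range interface described in the quoted message (types simplified for the example; not the real ObjectStore declaration):

```cpp
#include <cstddef>

struct Transaction {};

// A virtual function can't be a template, but a (begin, end) pair of
// Transaction** accepts any contiguous storage of pointers: a C array,
// std::array, or std::vector (via data()).
struct ObjectStore {
  virtual ~ObjectStore() {}
  virtual int queue_transactions(Transaction** begin, Transaction** end) = 0;
};

// Minimal store that just counts what it was handed, for illustration.
struct CountingStore : ObjectStore {
  std::size_t queued = 0;
  int queue_transactions(Transaction** begin, Transaction** end) override {
    queued += static_cast<std::size_t>(end - begin);
    return 0;
  }
};
```

A caller can then pass a stack array with no container allocation at all:

```cpp
Transaction a, b;
Transaction* txns[] = {&a, &b};  // stack array, no heap allocation
CountingStore store;
store.queue_transactions(txns, txns + 2);
```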
Re: queue_transaction interface + unique_ptr + performance
- Original Message -
> On Thu, 3 Dec 2015, Casey Bodley wrote:
> > > Well, yeah we are, it's just the actual Transaction structure which
> > > wouldn't be dynamic -- the buffers and many other fields would still
> > > hit the allocator.
> > > -Sam
> >
> > Sure. I was looking specifically at the tradeoffs between allocating
> > and moving the Transaction object itself.
> >
> > As it currently stands, the caller of ObjectStore can choose whether to
> > allocate its Transactions on the heap, embed them in other objects, or
> > put them on the stack for use with apply_transactions(). Switching to an
> > interface built around unique_ptr forces all callers to use the heap. I'm
> > advocating for an interface that doesn't.
>
> That leaves us with either std::move or... the raw Transaction* we have
> now. Right?

Right. The thing I really like about the unique_ptr approach is that the caller no longer has to care about the Transaction's lifetime, so doesn't have to allocate the extra ObjectStore::C_DeleteTransaction for cleanup. Movable Transactions accomplish this as well.

Casey

> > > > It's true that the move ctor has to do work. I counted 18 fields, half
> > > > of which are integers, and the rest have move ctors themselves. But the
> > > > cpu is good at integers. The win here is that you're not hitting the
> > > > allocator in the fast path.
>
> To be fair, many of these are also legacy that we can remove... possibly
> even now. IIRC the only exposure to legacy encoded transactions (that use
> the tbl hackery) are journal items from an upgraded pre-hammer OSD that
> aren't flushed on upgrade. We should have made the osd flush the journal
> before recording the 0_94_4 ondisk feature. We could add another one to
> enforce that and rip all that code out now instead of waiting until
> after jewel...
> that would be satisfying (and I think an ondisk ceph-osd
> feature is enough here, then document that users should upgrade to
> hammer 0.94.6 or infernalis 9.2.1 before moving to jewel).
>
> sage
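A toy example of the lifetime point, assuming a unique_ptr-taking queue_transaction (names illustrative; the instrumented Transaction exists only so the example can show who frees the object, it is not the Ceph type):

```cpp
#include <memory>

// Track live objects so the example can show who frees the Transaction.
struct Transaction {
  static int live;
  Transaction()  { ++live; }
  ~Transaction() { --live; }
};
int Transaction::live = 0;

// With a raw Transaction* the caller must schedule cleanup itself (the
// C_DeleteTransaction completion mentioned above). With unique_ptr the
// callee owns the object, and it is freed automatically when done.
void queue_transaction(std::unique_ptr<Transaction> t) {
  // ... commit the transaction; `t` destroys it on return ...
}
```

After `queue_transaction(std::move(t))` returns, the Transaction has already been destroyed by the callee; the caller never schedules a cleanup callback.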
ObjectStore interface and std::list
Hi,

I missed out on the context leading up to the discussion about queue_transactions and the use of raw vs. smart pointers, but I'm curious whether the use of std::list in the ObjectStore interface has come up as well. That's another source of allocations that could be avoided.

A naive approach could just replace the list with vector, which could at least reduce the number of allocations to one if you reserve() space for all of your elements up front.

An interface based on iterators could also help, by allowing the caller to put an array on the stack and pass that in. Iterators are tricky because the generic form requires templates, and template functions can't be virtual. But if we require the elements to be sequential in memory (i.e. raw array, std::array or std::vector), the interface could take (Transaction** begin, Transaction** end) without requiring templates. This interface can also be made compatible with unique_ptrs.

Casey
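To illustrate the reserve() point, a small sketch (build_batch is a hypothetical helper for the example, not ObjectStore API):

```cpp
#include <cstddef>
#include <vector>

struct Transaction {};

// std::list allocates one node per element; a vector with reserve()
// makes a single allocation up front for the whole batch.
std::vector<Transaction*> build_batch(Transaction* const* txns, std::size_t n) {
  std::vector<Transaction*> batch;
  batch.reserve(n);             // one allocation for all n elements
  for (std::size_t i = 0; i < n; ++i)
    batch.push_back(txns[i]);   // no further allocations
  return batch;
}
```

With a std::list of n Transactions, the same batch would cost n node allocations on every submission.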
updates to the fio objectstore engine
Hi folks,

I made another pass at the fio objectstore engine over the long weekend. Some of the major changes:

* added support for multiple jobs at a time by splitting global/ObjectStore initialization into a new Engine class
* when run with nr_files > 1, the objects are spread over multiple collections to model the concurrency we get from placement groups
* replaced other options with a 'conf' option to read in a ceph.conf
* moved into src/test/fio and added a README and example job/config files

You can find the pull request at https://github.com/ceph/ceph/pull/5943. Testing and feedback is appreciated!

Casey
Re: Backend ObjectStore engine performance bench with FIO
Hi Xiaoxi,

I pushed a new branch wip-fio-objectstore to ceph's github. I look forward to seeing James' work!

Thanks,
Casey

- Original Message -
> Hi Casey,
> Would it be better if we create an integration branch on
> ceph/ceph/wip-fio-objstore to allow more people to try and improve it?
> Seems James has some patches.
>
> -Xiaoxi
>
> > -Original Message-
> > From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> > ow...@vger.kernel.org] On Behalf Of Casey Bodley
> > Sent: Wednesday, September 30, 2015 4:06 AM
> > To: James (Fei) Liu-SSI
> > Cc: ceph-devel@vger.kernel.org
> > Subject: Re: Backend ObjectStore engine performance bench with FIO
> >
> > Hi James/Haomai/Xiaoxi,
> >
> > I spent some more time on the fio-objectstore branch, and pushed an
> > update.
> >
> > In testing, I realized that it was using the io_unit's start time to name
> > the objects, which meant that every write operation was creating a
> > separate object.
> > In addition to fixing this to use fio's filenames for object names, I also
> > added support for the open_file() and close_file() functions. It now
> > creates objects of the proper size on startup, so read-only jobs will work
> > normally.
> > It also removes its objects on exit.
> >
> > On startup, it no longer calls create_collection() if it already exists,
> > so I was able to re-run fio jobs over and over again without having to
> > clear the data directory (tested with FileStore and KeyValueStore).
> >
> > Casey
> >
> > - Original Message -
> > > Great work James!
> > > > > > - Original Message - > > > > From: "James (Fei) Liu-SSI" <james@ssi.samsung.com> > > > > To: "Xiaoxi Chen" <xiaoxi.c...@intel.com>, "Casey Bodley" > > > > <cbod...@redhat.com> > > > > Cc: "Sage Weil" <s...@newdream.net>, ceph-devel@vger.kernel.org > > > > Sent: Friday, September 25, 2015 1:55:29 PM > > > > Subject: Backend ObjectStore engine performance bench with FIO > > > > > > > > Hi Xiaoxi, > > > > > > > >With changing the IO mode from aio to sync, we make fio against > > newstore > > > >works. Even with sync engine(I am still debugging the aio engine in > > > >newstore with Xiaoxi) in newstore, Newstore still performing the > > > >best > > > >among all of backstore engine with our initial setup(Thoroughly test > > > >will > > > >be run soon). Attachment is the initial data we collected for your > > > >reference. Thanks for great help from Xiaoxi from regarding to > > Newstore > > > >development to support FIO. > > > > > > > > Hi Casey, > > > > Let me know if you need any help to put fio-ceph-objectstore into > > > > upstream. > > > > After then , I can commit all of mine into upstream. > > > > > > My pull request at https://github.com/ceph/ceph/pull/5943 is still > > > pending. > > > If you have patches that you'd like included, I would be happy to pull > > > them in; just point me to a branch. 
> > > > Thanks,
> > > > James
> > >
> > > Thanks,
> > > Casey
Re: Backend ObjectStore engine performance bench with FIO
Hi James/Haomai/Xiaoxi, I spent some more time on the fio-objectstore branch, and pushed an update. In testing, I realized that it was using the io_unit's start time to name the objects, which meant that every write operation was creating a separate object. In addition to fixing this to use fio's filenames for object names, I also added support for the open_file() and close_file() functions. It now creates objects of the proper size on startup, so read-only jobs will work normally. It also removes its objects on exit. On startup, it no longer calls create_collection() if it already exists, so I was able to re-run fio jobs over and over again without having to clear the data directory (tested with FileStore and KeyValueStore). Casey - Original Message - > Great work James! > > - Original Message - > > From: "James (Fei) Liu-SSI" <james@ssi.samsung.com> > > To: "Xiaoxi Chen" <xiaoxi.c...@intel.com>, "Casey Bodley" > > <cbod...@redhat.com> > > Cc: "Sage Weil" <s...@newdream.net>, ceph-devel@vger.kernel.org > > Sent: Friday, September 25, 2015 1:55:29 PM > > Subject: Backend ObjectStore engine performance bench with FIO > > > > Hi Xiaoxi, > > > >With changing the IO mode from aio to sync, we make fio against newstore > >works. Even with sync engine(I am still debugging the aio engine in > >newstore with Xiaoxi) in newstore, Newstore still performing the best > >among all of backstore engine with our initial setup(Thoroughly test > >will > >be run soon). Attachment is the initial data we collected for your > >reference. Thanks for great help from Xiaoxi from regarding to Newstore > >development to support FIO. > > > > Hi Casey, > > Let me know if you need any help to put fio-ceph-objectstore into > > upstream. > > After then , I can commit all of mine into upstream. > > My pull request at https://github.com/ceph/ceph/pull/5943 is still pending. > If you have patches that you'd like included, I would be happy to pull them > in; just point me to a branch. 
> > Thanks,
> > James
>
> Thanks,
> Casey
Re: Backend ObjectStore engine performance bench with FIO
Great work James!

- Original Message -
> From: "James (Fei) Liu-SSI" <james@ssi.samsung.com>
> To: "Xiaoxi Chen" <xiaoxi.c...@intel.com>, "Casey Bodley" <cbod...@redhat.com>
> Cc: "Sage Weil" <s...@newdream.net>, ceph-devel@vger.kernel.org
> Sent: Friday, September 25, 2015 1:55:29 PM
> Subject: Backend ObjectStore engine performance bench with FIO
>
> Hi Xiaoxi,
>
> With changing the IO mode from aio to sync, we made fio against newstore
> work. Even with the sync engine (I am still debugging the aio engine in
> newstore with Xiaoxi), Newstore is still performing the best among all of
> the backend store engines with our initial setup (thorough tests will be
> run soon). The attachment is the initial data we collected for your
> reference. Thanks for the great help from Xiaoxi regarding Newstore
> development to support FIO.
>
> Hi Casey,
> Let me know if you need any help to put fio-ceph-objectstore into upstream.
> After that, I can commit all of mine into upstream.

My pull request at https://github.com/ceph/ceph/pull/5943 is still pending. If you have patches that you'd like included, I would be happy to pull them in; just point me to a branch.

> Thanks,
> James

Thanks,
Casey
Re: About Fio backend with ObjectStore API
Hi James,

I just looked back at the results you posted, and saw that you were using iodepth=1. Setting this higher should help keep the FileStore busy.

Casey

- Original Message -
> From: "James (Fei) Liu-SSI" <james@ssi.samsung.com>
> To: "Casey Bodley" <cbod...@redhat.com>
> Cc: "Haomai Wang" <haomaiw...@gmail.com>, ceph-devel@vger.kernel.org
> Sent: Friday, September 11, 2015 1:18:31 PM
> Subject: RE: About Fio backend with ObjectStore API
>
> Hi Casey,
> You are right. I think the bottleneck is on the fio side rather than the
> filestore side in this case. The fio did not issue the io commands fast
> enough to saturate the filestore.
> Here is one possible solution for it: create an async engine, which is
> normally much faster than a sync engine in fio.
>
> Here is a possible framework. This new ObjectStore-AIO engine for FIO in
> theory will be much faster than a sync engine. Once we have a FIO that can
> saturate newstore, memstore and filestore, we can investigate in detail
> where the bottlenecks in their designs are.
>
> struct objectstore_aio_data {
>     struct aio_ctx *q_aio_ctx;
>     struct aio_completion_data *a_data;
>     aio_ses_ctx_t *p_ses_ctx;
>     unsigned int entries;
> };
> ...
> /*
>  * Note that the structure is exported, so that fio can get it via
>  * dlsym(..., "ioengine");
>  */
> struct ioengine_ops us_aio_ioengine = {
>     .name        = "objectstore-aio",
>     .version     = FIO_IOOPS_VERSION,
>     .init        = fio_objectstore_aio_init,
>     .prep        = fio_objectstore_aio_prep,
>     .queue       = fio_objectstore_aio_queue,
>     .cancel      = fio_objectstore_aio_cancel,
>     .getevents   = fio_objectstore_aio_getevents,
>     .event       = fio_objectstore_aio_event,
>     .cleanup     = fio_objectstore_aio_cleanup,
>     .open_file   = fio_objectstore_aio_open,
>     .close_file  = fio_objectstore_aio_close,
> };
>
> Let me know what you think.
> > Regards, > James > > -Original Message- > From: Casey Bodley [mailto:cbod...@redhat.com] > Sent: Friday, September 11, 2015 7:28 AM > To: James (Fei) Liu-SSI > Cc: Haomai Wang; ceph-devel@vger.kernel.org > Subject: Re: About Fio backend with ObjectStore API > > Hi James, > > That's great that you were able to get fio-objectstore running! Thanks to you > and Haomai for all the help with testing. > > In terms of performance, it's possible that we're not handling the > completions optimally. When profiling with MemStore I remember seeing a > significant amount of cpu time spent in polling with > fio_ceph_os_getevents(). > > The issue with reads is more of a design issue than a bug. Because the test > starts with a mkfs(), there are no objects to read from initially. You would > just have to add a write job to run before the read job, to make sure that > the objects are initialized. Or perhaps the mkfs() step could be an optional > part of the configuration. > > Casey > > - Original Message - > From: "James (Fei) Liu-SSI" <james@ssi.samsung.com> > To: "Haomai Wang" <haomaiw...@gmail.com>, "Casey Bodley" <cbod...@redhat.com> > Cc: ceph-devel@vger.kernel.org > Sent: Thursday, September 10, 2015 8:08:04 PM > Subject: RE: About Fio backend with ObjectStore API > > Hi Casey and Haomai, > > We finally made the fio-objectstore works in our end . Here is fio data > against filestore with Samsung 850 Pro. It is sequential write and the > performance is very poor which is expected though. > > Run status group 0 (all jobs): > WRITE: io=524288KB, aggrb=9467KB/s, minb=9467KB/s, maxb=9467KB/s, > mint=55378msec, maxt=55378msec > > But anyway, it works even though still some bugs to fix like read and > filesytem issues. thanks a lot for your great work. 
> > Regards, > James > > jamesliu@jamesliu-OptiPlex-7010:~/WorkSpace/ceph_casey/src$ sudo ./fio/fio > ./test/objectstore.fio > filestore: (g=0): rw=write, bs=128K-128K/128K-128K/128K-128K, > ioengine=cephobjectstore, iodepth=1 fio-2.2.9-56-g736a Starting 1 process > test1 > filestore: Laying out IO file(s) (1 file(s) / 512MB) > 2015-09-10 16:55:40.614494 7f19d34d1840 1 filestore(/home/jamesliu/fio_ceph) > mkfs in /home/jamesliu/fio_ceph > 2015-09-10 16:55:40.614924 7f19d34d1840 1 filestore(/home/jamesliu/fio_ceph) > mkfs generated fsid 5508d58e-dbfc-48a5-9f9c-c639af4fe73a > 2015-09-10 16:
Re: About Fio backend with ObjectStore API
Hi James, That's great that you were able to get fio-objectstore running! Thanks to you and Haomai for all the help with testing. In terms of performance, it's possible that we're not handling the completions optimally. When profiling with MemStore I remember seeing a significant amount of cpu time spent in polling with fio_ceph_os_getevents(). The issue with reads is more of a design issue than a bug. Because the test starts with a mkfs(), there are no objects to read from initially. You would just have to add a write job to run before the read job, to make sure that the objects are initialized. Or perhaps the mkfs() step could be an optional part of the configuration. Casey - Original Message - From: "James (Fei) Liu-SSI" <james@ssi.samsung.com> To: "Haomai Wang" <haomaiw...@gmail.com>, "Casey Bodley" <cbod...@redhat.com> Cc: ceph-devel@vger.kernel.org Sent: Thursday, September 10, 2015 8:08:04 PM Subject: RE: About Fio backend with ObjectStore API Hi Casey and Haomai, We finally made the fio-objectstore works in our end . Here is fio data against filestore with Samsung 850 Pro. It is sequential write and the performance is very poor which is expected though. Run status group 0 (all jobs): WRITE: io=524288KB, aggrb=9467KB/s, minb=9467KB/s, maxb=9467KB/s, mint=55378msec, maxt=55378msec But anyway, it works even though still some bugs to fix like read and filesytem issues. thanks a lot for your great work. 
Re: About Fio backend with ObjectStore API
I forgot to mention for the list, you can find the latest version of the fio-objectstore branch at https://github.com/cbodley/ceph/commits/fio-objectstore. Casey - Original Message - From: "Casey Bodley" <cbod...@redhat.com> To: "James (Fei) Liu-SSI" <james@ssi.samsung.com> Cc: "Haomai Wang" <haomaiw...@gmail.com>, ceph-devel@vger.kernel.org Sent: Friday, September 11, 2015 10:28:14 AM Subject: Re: About Fio backend with ObjectStore API Hi James, That's great that you were able to get fio-objectstore running! Thanks to you and Haomai for all the help with testing. In terms of performance, it's possible that we're not handling the completions optimally. When profiling with MemStore, I remember seeing a significant amount of cpu time spent polling in fio_ceph_os_getevents(). The issue with reads is more of a design limitation than a bug. Because the test starts with a mkfs(), there are no objects to read from initially. You would just have to add a write job to run before the read job, to make sure that the objects are initialized. Or perhaps the mkfs() step could be an optional part of the configuration. Casey - Original Message - From: "James (Fei) Liu-SSI" <james@ssi.samsung.com> To: "Haomai Wang" <haomaiw...@gmail.com>, "Casey Bodley" <cbod...@redhat.com> Cc: ceph-devel@vger.kernel.org Sent: Thursday, September 10, 2015 8:08:04 PM Subject: RE: About Fio backend with ObjectStore API Hi Casey and Haomai, We finally made the fio-objectstore engine work on our end. Here is the fio data against filestore with a Samsung 850 Pro. It is a sequential write, and the performance is very poor, which is expected though. Run status group 0 (all jobs): WRITE: io=524288KB, aggrb=9467KB/s, minb=9467KB/s, maxb=9467KB/s, mint=55378msec, maxt=55378msec But anyway, it works, even though there are still some bugs to fix, like the read and filesystem issues. Thanks a lot for your great work.
Regards, James jamesliu@jamesliu-OptiPlex-7010:~/WorkSpace/ceph_casey/src$ sudo ./fio/fio ./test/objectstore.fio filestore: (g=0): rw=write, bs=128K-128K/128K-128K/128K-128K, ioengine=cephobjectstore, iodepth=1 fio-2.2.9-56-g736a Starting 1 process test1 filestore: Laying out IO file(s) (1 file(s) / 512MB) 2015-09-10 16:55:40.614494 7f19d34d1840 1 filestore(/home/jamesliu/fio_ceph) mkfs in /home/jamesliu/fio_ceph 2015-09-10 16:55:40.614924 7f19d34d1840 1 filestore(/home/jamesliu/fio_ceph) mkfs generated fsid 5508d58e-dbfc-48a5-9f9c-c639af4fe73a 2015-09-10 16:55:40.630326 7f19d34d1840 1 filestore(/home/jamesliu/fio_ceph) write_version_stamp 4 2015-09-10 16:55:40.673417 7f19d34d1840 0 filestore(/home/jamesliu/fio_ceph) backend xfs (magic 0x58465342) 2015-09-10 16:55:40.724097 7f19d34d1840 1 filestore(/home/jamesliu/fio_ceph) leveldb db exists/created 2015-09-10 16:55:40.724218 7f19d34d1840 -1 journal FileJournal::_open: disabling aio for non-block journal. Use journal_force_aio to force use of aio anyway 2015-09-10 16:55:40.724226 7f19d34d1840 1 journal _open /tmp/fio_ceph_filestore1 fd 5: 5368709120 bytes, block size 4096 bytes, directio = 1, aio = 0 2015-09-10 16:55:40.724468 7f19d34d1840 -1 journal check: ondisk fsid 7580401a-6863-4863-9873-3adda08c9150 doesn't match expected 5508d58e-dbfc-48a5-9f9c-c639af4fe73a, invalid (someone else's?) 
journal 2015-09-10 16:55:40.724481 7f19d34d1840 1 journal close /tmp/fio_ceph_filestore1 2015-09-10 16:55:40.724506 7f19d34d1840 1 journal _open /tmp/fio_ceph_filestore1 fd 5: 5368709120 bytes, block size 4096 bytes, directio = 1, aio = 0 2015-09-10 16:55:40.730417 7f19d34d1840 0 filestore(/home/jamesliu/fio_ceph) mkjournal created journal on /tmp/fio_ceph_filestore1 2015-09-10 16:55:40.730446 7f19d34d1840 1 filestore(/home/jamesliu/fio_ceph) mkfs done in /home/jamesliu/fio_ceph 2015-09-10 16:55:40.730527 7f19d34d1840 0 filestore(/home/jamesliu/fio_ceph) backend xfs (magic 0x58465342) 2015-09-10 16:55:40.730773 7f19d34d1840 0 genericfilestorebackend(/home/jamesliu/fio_ceph) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option 2015-09-10 16:55:40.730779 7f19d34d1840 0 genericfilestorebackend(/home/jamesliu/fio_ceph) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option 2015-09-10 16:55:40.730793 7f19d34d1840 0 genericfilestorebackend(/home/jamesliu/fio_ceph) detect_features: splice is supported 2015-09-10 16:55:40.751951 7f19d34d1840 0 genericfilestorebackend(/home/jamesliu/fio_ceph) detect_features: syncfs(2) syscall fully supported (by glibc and kernel) 2015-09-10 16:55:40.752102 7f19d34d1840 0 xfsfilestorebackend(/home/jamesliu/fio_ceph) detect_features: extsize is supported and your kernel >= 3.5 2015-09-10 16:55:40.794731 7f19d34d1840 0 filestore(/home/jamesliu/fio_ceph) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled 2015-09-10 16:
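Casey's write-before-read workaround from the message above can be expressed directly in a fio job file. This is an untested sketch loosely based on the job file posted elsewhere in this thread; the engine path and engine-specific option names are assumptions, while `stonewall` is stock fio syntax for serializing jobs:

```ini
# Sketch: seed the objects with a write pass, then read them back.
[global]
ioengine=./src/.libs/libfio_ceph_objectstore.so  ; assumed path to the engine
invalidate=0                                     ; mandatory for this engine
objectstore=filestore                            ; any ObjectStore::create() type
directory=./osd/                                 ; empty xfs directory

[seed-write]
rw=write
iodepth=1

[read-back]
stonewall    ; wait for seed-write to finish, so the objects exist
rw=read
iodepth=1
```

With `stonewall`, the read job does not start until the write job completes, so the mkfs()-fresh store already contains the objects it will read.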
Re: About Fio backend with ObjectStore API
Hi James, I'm sorry for not following up on that segfault, but I wasn't ever able to reproduce it. I used it recently for memstore testing without any problems. I wonder if there's a problem with the autotools build? I've only tested it with cmake. When I find some time, I'll rebase it on master and do another round of testing. Casey - Original Message - > From: "James (Fei) Liu-SSI" <james@ssi.samsung.com> > To: "Haomai Wang" <haomaiw...@gmail.com>, "Casey Bodley" <cbod...@redhat.com> > Cc: "Casey Bodley" <cbod...@gmail.com>, "Matt W. Benjamin" > <m...@cohortfs.com>, ceph-devel@vger.kernel.org > Sent: Wednesday, September 2, 2015 8:06:14 PM > Subject: RE: About Fio backend with ObjectStore API > > Hi Haomai and Case, > Do you have any fixes for that segfault? > > Thanks, > James > > -----Original Message- > From: Haomai Wang [mailto:haomaiw...@gmail.com] > Sent: Wednesday, July 22, 2015 6:07 PM > To: Casey Bodley > Cc: Casey Bodley; Matt W. Benjamin; James (Fei) Liu-SSI; > ceph-devel@vger.kernel.org > Subject: Re: About Fio backend with ObjectStore API > > no special > > [global] > #logging > #write_iops_log=write_iops_log > #write_bw_log=write_bw_log > #write_lat_log=write_lat_log > ioengine=./ceph-int/src/.libs/libfio_ceph_objectstore.so > invalidate=0 # mandatory > rw=write > #bs=4k > > [filestore] > iodepth=1 > # create a journaled filestore > objectstore=filestore > directory=./osd/ > filestore_journal=./osd/journal > > On Thu, Jul 23, 2015 at 4:56 AM, Casey Bodley <cbod...@redhat.com> wrote: > > Hi Haomai, > > > > Sorry for the late response, I was out of the office. I'm afraid I haven't > > run into that segfault. The io_ops should be set at the very beginning > > when it calls get_ioengine(). All I can suggest is that you verify that > > your job file is pointing to the correct fio_ceph_objectstore.so. If > > you've made any other interesting changes to the job file, could you share > > it here? 
> > > > Casey > > > > - Original Message - > > From: "Haomai Wang" <haomaiw...@gmail.com> > > To: "Casey Bodley" <cbod...@gmail.com> > > Cc: "Matt W. Benjamin" <m...@cohortfs.com>, "James (Fei) Liu-SSI" > > <james@ssi.samsung.com>, ceph-devel@vger.kernel.org > > Sent: Tuesday, July 21, 2015 7:50:32 AM > > Subject: Re: About Fio backend with ObjectStore API > > > > Hi Casey, > > > > I check your commits and know what you fixed. I cherry-picked your new > > commits but I still met the same problem. > > > > """ > > It's strange that it alwasys hit segment fault when entering > > "_fio_setup_ceph_filestore_data", gdb tells "td->io_ops" is NULL but > > when I up the stack, the "td->io_ops" is not null. Maybe it's related > > to dlopen? > > """ > > > > Do you have any hint about this? > > > > On Thu, Jul 16, 2015 at 5:23 AM, Casey Bodley <cbod...@gmail.com> wrote: > >> Hi Haomai, > >> > >> I was able to run this after a couple changes to the filestore.fio > >> job file. Two of the config options were using the wrong names. I > >> pushed a fix for the job file, as well as a patch that renames > >> everything from filestore to objectstore (thanks James), to > >> https://github.com/linuxbox2/linuxbox-ceph/commits/fio-objectstore. > >> > >> I found that the read support doesn't appear to work anymore, so give > >> "rw=write" a try. And because it does a mkfs(), make sure you're > >> pointing it to an empty xfs directory with the "directory=" option. > >> > >> Casey > >> > >> On Tue, Jul 14, 2015 at 2:45 AM, Haomai Wang <haomaiw...@gmail.com> wrote: > >>> Anyone who have successfully ran the fio with this external io > >>> engine ceph_objectstore? > >>> > >>> It's strange that it alwasys hit segment fault when entering > >>> "_fio_setup_ceph_filestore_data", gdb tells "td->io_ops" is NULL but > >>> when I up the stack, the "td->io_ops" is not null. Maybe it's > >>> related to dlopen? 
> >>> > >>> On Fri, Jul 10, 2015 at 3:51 PM, Haomai Wang <haomaiw...@gmail.com> > >>> wrote: > >>>> I have rebased the branch with master, and push it to ceph upstream > >>>> repo. https://git
multi-site rgw and the period push/pull api
Hi Orit and Yehuda, A couple questions came up today while I was fleshing out the /admin/realm handler: I found a RGWOp_Period_Get() and _Post() in rgw_rest_config.cc that do pretty much what we want for push and pull. Is this /admin/config handler temporary, or should we share these ops between the /admin/config and /admin/realm handlers? I also asked Yehuda for clarification on the push op (POST /admin/realm/period): (04:09:59 PM) cbodley: i'm confused, because the title says "Request children to fetch period", but under Input: it includes a json representation of the period (04:10:29 PM) cbodley: so should the POST include the data, or should the handler send a GET request to fetch it? (04:11:57 PM) yehudasa_: cbodley, great question.. let me take a look (04:14:21 PM) cbodley: i guess it depends on how we do authentication? we don't want to accept a POST from any random endpoint, and overwrite our map. so sending a GET to a known party sounds safer in that respect (04:14:30 PM) yehudasa_: cbodley, I think it should include the period data (04:15:20 PM) yehudasa_: we should only allow authenticated system users to be able to send it to us So my followup question is, should we get rid of the period_id and epoch parameters for POST, since the json-encoded period will contain those already? Thanks, Casey -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
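To make the followup question concrete: if the push op's POST body carries the full json-encoded period, the period_id and epoch parameters would duplicate fields already inside it. A hypothetical request is sketched below (the field names are illustrative only, not the actual RGWPeriod encoding):

```
POST /admin/realm/period?period_id=<id>&epoch=3 HTTP/1.1
Host: <gateway>

{
  "id": "<id>",    <-- same value as the period_id parameter
  "epoch": 3,      <-- same value as the epoch parameter
  ...
}
```

Dropping the query parameters would leave the json body as the single source of truth, avoiding the question of which copy wins when they disagree.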
Branch for C++11
Hi, I've pushed a wip-cxx11 branch to github that adds the -std=c++11 flag for both cmake and automake builds, and fixes the resulting compilation errors. The switch to c++11 has two main implications:

* platform support: as expected, gitbuilder is reporting failures on precise and centos6 due to their older compilers
* ABI change: Sam started a discussion on the list yesterday about how to deal with librados clients

I would also like to track the state of compiler support for c++14 (and beyond), so that we can adopt new features as soon as it's practical. Casey
Re: About Fio backend with ObjectStore API
Hi Haomai, Sorry for the late response, I was out of the office. I'm afraid I haven't run into that segfault. The io_ops should be set at the very beginning when it calls get_ioengine(). All I can suggest is that you verify that your job file is pointing to the correct fio_ceph_objectstore.so. If you've made any other interesting changes to the job file, could you share it here? Casey - Original Message - From: Haomai Wang haomaiw...@gmail.com To: Casey Bodley cbod...@gmail.com Cc: Matt W. Benjamin m...@cohortfs.com, James (Fei) Liu-SSI james@ssi.samsung.com, ceph-devel@vger.kernel.org Sent: Tuesday, July 21, 2015 7:50:32 AM Subject: Re: About Fio backend with ObjectStore API Hi Casey, I checked your commits and know what you fixed. I cherry-picked your new commits, but I still met the same problem. It's strange that it always hits a segmentation fault when entering _fio_setup_ceph_filestore_data; gdb tells me td->io_ops is NULL, but when I go up the stack, td->io_ops is not null. Maybe it's related to dlopen? Do you have any hint about this? On Thu, Jul 16, 2015 at 5:23 AM, Casey Bodley cbod...@gmail.com wrote: Hi Haomai, I was able to run this after a couple changes to the filestore.fio job file. Two of the config options were using the wrong names. I pushed a fix for the job file, as well as a patch that renames everything from filestore to objectstore (thanks James), to https://github.com/linuxbox2/linuxbox-ceph/commits/fio-objectstore. I found that the read support doesn't appear to work anymore, so give rw=write a try. And because it does a mkfs(), make sure you're pointing it to an empty xfs directory with the directory= option. Casey On Tue, Jul 14, 2015 at 2:45 AM, Haomai Wang haomaiw...@gmail.com wrote: Has anyone successfully run fio with this external io engine, ceph_objectstore? It's strange that it always hits a segmentation fault when entering _fio_setup_ceph_filestore_data; gdb tells me td->io_ops is NULL, but when I go up the stack, td->io_ops is not null.
Maybe it's related to dlopen? On Fri, Jul 10, 2015 at 3:51 PM, Haomai Wang haomaiw...@gmail.com wrote: I have rebased the branch with master, and pushed it to the ceph upstream repo. https://github.com/ceph/ceph/compare/fio-objectstore?expand=1 Please let me know who is working on this. Otherwise, I would like to improve this to be merge ready. On Fri, Jul 10, 2015 at 4:26 AM, Matt W. Benjamin m...@cohortfs.com wrote: That makes sense. Matt - James (Fei) Liu-SSI james@ssi.samsung.com wrote: Hi Casey, Got it. I was directed to the old code base. By the way, since the test case is meant to exercise all of the object stores, I strongly recommend changing the name from fio_ceph_filestore.cc to fio_ceph_objectstore.cc. And the code in fio_ceph_filestore.cc should be refactored to reflect that the whole objectstore will be supported by fio_ceph_objectstore.cc. What do you think? Let me know if you need any help from my side. Regards, James -Original Message- From: Casey Bodley [mailto:cbod...@gmail.com] Sent: Thursday, July 09, 2015 12:32 PM To: James (Fei) Liu-SSI Cc: Haomai Wang; ceph-devel@vger.kernel.org Subject: Re: About Fio backend with ObjectStore API Hi James, Are you looking at the code from https://github.com/linuxbox2/linuxbox-ceph/tree/fio-objectstore? It uses ObjectStore::create() instead of new FileStore(). This allows us to exercise all of the object stores with the same code. Casey On Thu, Jul 9, 2015 at 2:01 PM, James (Fei) Liu-SSI james@ssi.samsung.com wrote: Hi Casey, Here is the code in fio_ceph_filestore.cc. Basically, it creates a filestore as the backend engine for IO exercises. If we need to send IO commands to KeyValue Store or Newstore, we need to change the code accordingly, right? I did not see any other files like fio_ceph_keyvaluestore.cc or fio_ceph_newstore.cc. In my humble opinion, we might need to create two other fio engines for keyvaluestore and newstore if we want to exercise these two, right?
Regards, James

static int fio_ceph_filestore_init(struct thread_data *td)
{
  vector<const char*> args;
  struct ceph_filestore_data *ceph_filestore_data = (struct ceph_filestore_data *) td->io_ops->data;
  ObjectStore::Transaction ft;

  global_init(NULL, args, CEPH_ENTITY_TYPE_OSD, CODE_ENVIRONMENT_UTILITY, 0);
  //g_conf->journal_dio = false;
  common_init_finish(g_ceph_context);
  //g_ceph_context->_conf->set_val("debug_filestore", "20");
  //g_ceph_context->_conf->set_val("debug_throttle", "20");
  g_ceph_context->_conf->apply_changes(NULL);

  ceph_filestore_data->osd_path = strdup("/mnt/fio_ceph_filestore.XXX");
  ceph_filestore_data->journal_path = strdup("/var/lib/ceph/osd/journal-ram/fio_ceph_filestore.XXX");

  if (!mkdtemp(ceph_filestore_data->osd_path)) {
    cout
Re: About Fio backend with ObjectStore API
Hi Haomai, I was able to run this after a couple changes to the filestore.fio job file. Two of the config options were using the wrong names. I pushed a fix for the job file, as well as a patch that renames everything from filestore to objectstore (thanks James), to https://github.com/linuxbox2/linuxbox-ceph/commits/fio-objectstore. I found that the read support doesn't appear to work anymore, so give rw=write a try. And because it does a mkfs(), make sure you're pointing it to an empty xfs directory with the directory= option. Casey On Tue, Jul 14, 2015 at 2:45 AM, Haomai Wang haomaiw...@gmail.com wrote: Has anyone successfully run fio with this external io engine, ceph_objectstore? It's strange that it always hits a segmentation fault when entering _fio_setup_ceph_filestore_data; gdb tells me td->io_ops is NULL, but when I go up the stack, td->io_ops is not null. Maybe it's related to dlopen? On Fri, Jul 10, 2015 at 3:51 PM, Haomai Wang haomaiw...@gmail.com wrote: I have rebased the branch with master, and pushed it to the ceph upstream repo. https://github.com/ceph/ceph/compare/fio-objectstore?expand=1 Please let me know who is working on this. Otherwise, I would like to improve this to be merge ready. On Fri, Jul 10, 2015 at 4:26 AM, Matt W. Benjamin m...@cohortfs.com wrote: That makes sense. Matt - James (Fei) Liu-SSI james@ssi.samsung.com wrote: Hi Casey, Got it. I was directed to the old code base. By the way, since the test case is meant to exercise all of the object stores, I strongly recommend changing the name from fio_ceph_filestore.cc to fio_ceph_objectstore.cc. And the code in fio_ceph_filestore.cc should be refactored to reflect that the whole objectstore will be supported by fio_ceph_objectstore.cc. What do you think? Let me know if you need any help from my side.
Regards, James -Original Message- From: Casey Bodley [mailto:cbod...@gmail.com] Sent: Thursday, July 09, 2015 12:32 PM To: James (Fei) Liu-SSI Cc: Haomai Wang; ceph-devel@vger.kernel.org Subject: Re: About Fio backend with ObjectStore API Hi James, Are you looking at the code from https://github.com/linuxbox2/linuxbox-ceph/tree/fio-objectstore? It uses ObjectStore::create() instead of new FileStore(). This allows us to exercise all of the object stores with the same code. Casey On Thu, Jul 9, 2015 at 2:01 PM, James (Fei) Liu-SSI james@ssi.samsung.com wrote: Hi Casey, Here is the code in fio_ceph_filestore.cc. Basically, it creates a filestore as the backend engine for IO exercises. If we need to send IO commands to KeyValue Store or Newstore, we need to change the code accordingly, right? I did not see any other files like fio_ceph_keyvaluestore.cc or fio_ceph_newstore.cc. In my humble opinion, we might need to create two other fio engines for keyvaluestore and newstore if we want to exercise these two, right?
Regards, James

static int fio_ceph_filestore_init(struct thread_data *td)
{
  vector<const char*> args;
  struct ceph_filestore_data *ceph_filestore_data = (struct ceph_filestore_data *) td->io_ops->data;
  ObjectStore::Transaction ft;

  global_init(NULL, args, CEPH_ENTITY_TYPE_OSD, CODE_ENVIRONMENT_UTILITY, 0);
  //g_conf->journal_dio = false;
  common_init_finish(g_ceph_context);
  //g_ceph_context->_conf->set_val("debug_filestore", "20");
  //g_ceph_context->_conf->set_val("debug_throttle", "20");
  g_ceph_context->_conf->apply_changes(NULL);

  ceph_filestore_data->osd_path = strdup("/mnt/fio_ceph_filestore.XXX");
  ceph_filestore_data->journal_path = strdup("/var/lib/ceph/osd/journal-ram/fio_ceph_filestore.XXX");

  if (!mkdtemp(ceph_filestore_data->osd_path)) {
    cout << "mkdtemp failed: " << strerror(errno) << std::endl;
    return 1;
  }
  //mktemp(ceph_filestore_data->journal_path); // NOSPC issue

  ObjectStore *fs = new FileStore(ceph_filestore_data->osd_path, ceph_filestore_data->journal_path);
  ceph_filestore_data->fs = fs;

  if (fs->mkfs() < 0) {
    cout << "mkfs failed" << std::endl;
    goto failed;
  }

  if (fs->mount() < 0) {
    cout << "mount failed" << std::endl;
    goto failed;
  }

  ft.create_collection(coll_t());
  fs->apply_transaction(ft);

  return 0;

 failed:
  return 1;
}

-Original Message- From: Casey Bodley [mailto:cbod...@gmail.com] Sent: Thursday, July 09, 2015 9:19 AM To: James (Fei) Liu-SSI Cc: Haomai Wang; ceph-devel@vger.kernel.org Subject: Re: About Fio backend with ObjectStore API Hi James, In the job file src/test/filestore.fio, you can modify the line objectstore=filestore to use any objectstore type supported by the ObjectStore
Re: About Fio backend with ObjectStore API
Hi James, Are you looking at the code from https://github.com/linuxbox2/linuxbox-ceph/tree/fio-objectstore? It uses ObjectStore::create() instead of new FileStore(). This allows us to exercise all of the object stores with the same code. Casey On Thu, Jul 9, 2015 at 2:01 PM, James (Fei) Liu-SSI james@ssi.samsung.com wrote: Hi Casey, Here is the code in fio_ceph_filestore.cc. Basically, it creates a filestore as the backend engine for IO exercises. If we need to send IO commands to KeyValue Store or Newstore, we need to change the code accordingly, right? I did not see any other files like fio_ceph_keyvaluestore.cc or fio_ceph_newstore.cc. In my humble opinion, we might need to create two other fio engines for keyvaluestore and newstore if we want to exercise these two, right?

Regards, James

static int fio_ceph_filestore_init(struct thread_data *td)
{
  vector<const char*> args;
  struct ceph_filestore_data *ceph_filestore_data = (struct ceph_filestore_data *) td->io_ops->data;
  ObjectStore::Transaction ft;

  global_init(NULL, args, CEPH_ENTITY_TYPE_OSD, CODE_ENVIRONMENT_UTILITY, 0);
  //g_conf->journal_dio = false;
  common_init_finish(g_ceph_context);
  //g_ceph_context->_conf->set_val("debug_filestore", "20");
  //g_ceph_context->_conf->set_val("debug_throttle", "20");
  g_ceph_context->_conf->apply_changes(NULL);

  ceph_filestore_data->osd_path = strdup("/mnt/fio_ceph_filestore.XXX");
  ceph_filestore_data->journal_path = strdup("/var/lib/ceph/osd/journal-ram/fio_ceph_filestore.XXX");

  if (!mkdtemp(ceph_filestore_data->osd_path)) {
    cout << "mkdtemp failed: " << strerror(errno) << std::endl;
    return 1;
  }
  //mktemp(ceph_filestore_data->journal_path); // NOSPC issue

  ObjectStore *fs = new FileStore(ceph_filestore_data->osd_path, ceph_filestore_data->journal_path);
  ceph_filestore_data->fs = fs;

  if (fs->mkfs() < 0) {
    cout << "mkfs failed" << std::endl;
    goto failed;
  }

  if (fs->mount() < 0) {
    cout << "mount failed" << std::endl;
    goto failed;
  }

  ft.create_collection(coll_t());
  fs->apply_transaction(ft);

  return 0;

 failed:
  return 1;
}

-Original Message- From: Casey Bodley [mailto:cbod...@gmail.com] Sent: Thursday, July 09, 2015 9:19 AM To: James (Fei) Liu-SSI Cc: Haomai Wang; ceph-devel@vger.kernel.org Subject: Re: About Fio backend with ObjectStore API Hi James, In the job file src/test/filestore.fio, you can modify the line objectstore=filestore to use any objectstore type supported by the ObjectStore::create() factory. Casey On Wed, Jul 8, 2015 at 8:02 PM, James (Fei) Liu-SSI james@ssi.samsung.com wrote: Hi Casey, Quick question: the code in the trunk only covers the test for filestore. I was wondering, do you have any plan to cover the test for kvstore and newstore? Thanks, James -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of James (Fei) Liu-SSI Sent: Tuesday, June 30, 2015 2:19 PM To: Casey Bodley Cc: Haomai Wang; ceph-devel@vger.kernel.org Subject: RE: About Fio backend with ObjectStore API Hi Casey, Thanks a lot. Regards, James -Original Message- From: Casey Bodley [mailto:cbod...@gmail.com] Sent: Tuesday, June 30, 2015 2:16 PM To: James (Fei) Liu-SSI Cc: Haomai Wang; ceph-devel@vger.kernel.org Subject: Re: About Fio backend with ObjectStore API Hi, When Danny Al-Gaaf & Daniel Gollub published "Ceph Performance Analysis: fio and RBD" at https://telekomcloud.github.io/ceph/2014/02/26/ceph-performance-analysis_fio_rbd.html, they also mentioned a fio engine that linked directly into ceph's FileStore. I was able to find Daniel's branch on github at https://github.com/gollub/ceph/tree/fio_filestore_v2, and did some more work on it at the time. I just rebased that work onto the latest ceph master branch, and pushed to our github at https://github.com/linuxbox2/linuxbox-ceph/tree/fio-objectstore.
You can find the source in src/test/fio_ceph_filestore.cc, and run fio with the provided example fio job file in src/test/filestore.fio. I didn't have a chance to confirm that it builds with automake, but the cmake version built for me. I'm happy to help if you run into problems, Casey On Tue, Jun 30, 2015 at 2:31 PM, James (Fei) Liu-SSI james@ssi.samsung.com wrote: Hi Haomai, What you are trying to ask is to benchmark a local objectstore (like kvstore/filestore/newstore) locally with FIO (ObjectStore engine)? You want to purely compare the performance locally for these objectstores, right? Regards, James -Original Message- From: ceph-devel-ow
Re: About Fio backend with ObjectStore API
Hi James, In the job file src/test/filestore.fio, you can modify the line objectstore=filestore to use any objectstore type supported by the ObjectStore::create() factory. Casey On Wed, Jul 8, 2015 at 8:02 PM, James (Fei) Liu-SSI james@ssi.samsung.com wrote: Hi Casey, Quick question: the code in the trunk only covers the test for filestore. I was wondering, do you have any plan to cover the test for kvstore and newstore? Thanks, James -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of James (Fei) Liu-SSI Sent: Tuesday, June 30, 2015 2:19 PM To: Casey Bodley Cc: Haomai Wang; ceph-devel@vger.kernel.org Subject: RE: About Fio backend with ObjectStore API Hi Casey, Thanks a lot. Regards, James -Original Message- From: Casey Bodley [mailto:cbod...@gmail.com] Sent: Tuesday, June 30, 2015 2:16 PM To: James (Fei) Liu-SSI Cc: Haomai Wang; ceph-devel@vger.kernel.org Subject: Re: About Fio backend with ObjectStore API Hi, When Danny Al-Gaaf & Daniel Gollub published "Ceph Performance Analysis: fio and RBD" at https://telekomcloud.github.io/ceph/2014/02/26/ceph-performance-analysis_fio_rbd.html, they also mentioned a fio engine that linked directly into ceph's FileStore. I was able to find Daniel's branch on github at https://github.com/gollub/ceph/tree/fio_filestore_v2, and did some more work on it at the time. I just rebased that work onto the latest ceph master branch, and pushed to our github at https://github.com/linuxbox2/linuxbox-ceph/tree/fio-objectstore. You can find the source in src/test/fio_ceph_filestore.cc, and run fio with the provided example fio job file in src/test/filestore.fio. I didn't have a chance to confirm that it builds with automake, but the cmake version built for me.
I'm happy to help if you run into problems, Casey On Tue, Jun 30, 2015 at 2:31 PM, James (Fei) Liu-SSI james@ssi.samsung.com wrote: Hi Haomai, What you are trying to ask is to benchmark a local objectstore (like kvstore/filestore/newstore) locally with FIO (ObjectStore engine)? You want to purely compare the performance locally for these objectstores, right? Regards, James -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Haomai Wang Sent: Tuesday, June 30, 2015 9:06 AM To: ceph-devel@vger.kernel.org Subject: About Fio backend with ObjectStore API Hi all, A while back, didn't someone mention a fio backend with the Ceph ObjectStore API? That way we could use the existing mature fio facility to benchmark a ceph objectstore. -- Best Regards, Wheat
Re: About Fio backend with ObjectStore API
Hi, When Danny Al-Gaaf & Daniel Gollub published "Ceph Performance Analysis: fio and RBD" at https://telekomcloud.github.io/ceph/2014/02/26/ceph-performance-analysis_fio_rbd.html, they also mentioned a fio engine that linked directly into ceph's FileStore. I was able to find Daniel's branch on github at https://github.com/gollub/ceph/tree/fio_filestore_v2, and did some more work on it at the time. I just rebased that work onto the latest ceph master branch, and pushed to our github at https://github.com/linuxbox2/linuxbox-ceph/tree/fio-objectstore. You can find the source in src/test/fio_ceph_filestore.cc, and run fio with the provided example fio job file in src/test/filestore.fio. I didn't have a chance to confirm that it builds with automake, but the cmake version built for me. I'm happy to help if you run into problems, Casey On Tue, Jun 30, 2015 at 2:31 PM, James (Fei) Liu-SSI james@ssi.samsung.com wrote: Hi Haomai, What you are trying to ask is to benchmark a local objectstore (like kvstore/filestore/newstore) locally with FIO (ObjectStore engine)? You want to purely compare the performance locally for these objectstores, right? Regards, James -Original Message- From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Haomai Wang Sent: Tuesday, June 30, 2015 9:06 AM To: ceph-devel@vger.kernel.org Subject: About Fio backend with ObjectStore API Hi all, A while back, didn't someone mention a fio backend with the Ceph ObjectStore API? That way we could use the existing mature fio facility to benchmark a ceph objectstore. -- Best Regards, Wheat
CMake blueprint
Hi Ilya, Regarding the CMake blueprint at http://wiki.ceph.com/Planning/Blueprints/Giant/CMake, we at The Linux Box are excited to see more interest! I know that we've made several improvements to the CMakeLists on our local branches that haven't made it to our github repository. We'll get everything consolidated, and should have another push ready sometime next week. Thanks, Casey
Re: parent xattrs on file objects
To expand on what Matt said, we're also trying to address this issue of lookups by inode number for use with NFS. The design we've been exploring is to create a single system inode, designated the 'inode container' directory, which stores the primary links to all inodes in the filesystem. These links are named by their inode number to satisfy lookups and obviate the need for an anchor table. This design allows the inode container to make use of existing directory fragmentation and load balancing to distribute the inodes over the MDS cluster. When a new file is created, it then adds two links: a primary link into the inode container, and a remote link into the filesystem namespace. In the case where the parent directory fragment's authority is different than the corresponding inode container fragment's, it is created in the parent directory then exported to the inode container via an asynchronous slave request. We welcome additional discussion, both on this design specifically and on the general topic of scalable ino lookups. Casey - Original Message - From: Matt W. Benjamin m...@linuxbox.com To: Sage Weil s...@inktank.com Cc: ceph-devel@vger.kernel.org, aemerson aemer...@linuxbox.com, casey ca...@linuxbox.com, peter honeyman peter.honey...@gmail.com Sent: Tuesday, October 16, 2012 5:35:12 PM Subject: Re: parent xattrs on file objects Hi Sage, We've been exploring (experimentally implementing) a different solution to this problem, basically refactoring dirents and inodes, extending fragmentation logic, and adding new metadata location operations. We also remove the anchor table. We were planning to ask for some feedback once we had some initial results, but since you're floating a related idea, we'd like to share what we have so far. CC'ing people. Regards, Matt - Sage Weil s...@inktank.com wrote: Hey- One of the design goals of the ceph fs was to keep metadata separate from data. 
This means, among other things, that when a client is creating a bunch of files, it creates the inode via the mds and writes the file data to the OSD, but no mds-osd interaction is necessary.

One of the challenges we currently have is that it is difficult to look up an inode by ino. Normally clients traverse the hierarchy to get there, so things are fine for native ceph clients, but when reexporting via NFS we can get ESTALE because an ancient NFS file handle can be presented and the ceph MDS won't know where to find it. We have a similar problem with the fsck design in that it is not always possible to discover orphaned children of a directory that was somehow lost.

One option is to put an ancestor xattr on the first object for each file, similar to what we do for directories. This basically means that each file creation will be followed (eventually) by a setxattr osd operation. This used to scare me, but now it's seeming like a pretty small price to pay for robust NFS reexport and additional information for fsck to utilize.

It's also nice because it means we could get rid of the anchor table (used for locating files with multiple hard links) entirely and use the ancestor xattrs instead. That means one less thing to fsck, and avoids having to invest any time in making the anchor table scale effectively (it currently doesn't).

Anyone feel like we shouldn't go ahead and do this?

sage

--
Matt Benjamin
The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI 48104
http://linuxbox.com
tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309
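As an aside on Sage's proposal above: here is a minimal sketch of how an ancestor xattr on a file's first object could stand in for the anchor table during lookup-by-ino. All names and structures here are hypothetical illustrations, not actual Ceph code; real xattrs live on RADOS objects, not in an in-memory map.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for a RADOS object: just its xattrs.
struct Object {
  std::map<std::string, std::string> xattrs;
};

// Toy object store keyed by ino (the file's first object).
std::map<uint64_t, Object> store;

// On create (and on rename), the MDS would eventually issue a setxattr
// recording the file's ancestor chain on its first object.
void set_ancestry(uint64_t ino, const std::vector<std::string>& ancestors) {
  std::string path;
  for (const auto& name : ancestors) {
    path += "/" + name;
  }
  store[ino].xattrs["parent"] = path;  // one small setxattr per create
}

// An NFS handle yields an ino; the xattr recovers the path with no
// anchor table involved. Empty string means the ino is unknown.
std::string path_for_ino(uint64_t ino) {
  auto it = store.find(ino);
  if (it == store.end()) {
    return std::string();
  }
  return it->second.xattrs.at("parent");
}
```

The cost model in the thread falls out directly: one extra setxattr per create, paid asynchronously, in exchange for stale-handle resolution and an extra breadcrumb for fsck.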
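The inode-container design Casey describes earlier in the thread can be sketched the same way. Again, these are hypothetical types for illustration, not MDS code: on create, a primary link named by ino goes into the container directory, and a remote link goes into the namespace directory.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Toy inode; the real CInode carries far more state.
struct Inode {
  uint64_t ino;
};

// Primary links, keyed by stringified ino (the 'inode container').
std::map<std::string, Inode> inode_container;

// Remote links in the visible namespace: name -> ino.
std::map<std::string, uint64_t> namespace_dir;

// Creating a file adds both links.
void create_file(const std::string& name, uint64_t ino) {
  inode_container[std::to_string(ino)] = Inode{ino};  // primary link
  namespace_dir[name] = ino;                          // remote link
}

// Lookup-by-ino is a plain directory lookup in the container,
// so no separate anchor table is needed.
bool container_has_ino(uint64_t ino) {
  return inode_container.count(std::to_string(ino)) > 0;
}
```

Because the container is an ordinary directory, it inherits the existing fragmentation and load-balancing machinery for free, which is the main argument for the design.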
Re: parent xattrs on file objects
Hi Greg,

In this case where an inode is created on mds.a and exported to mds.b, there is a potential race on mds.b between a subsequent lookup-by-ino and the primary link actually making it into the inode container.

Our tentative solution was to rely on the way InoTable breaks up the range of inode numbers based on mds nodeid. So when a lookup on the inode container fails, we can determine which mds would have allocated that inode number and attempt to find the inode there. The originating mds.a should always find the inode in its cache while it's pinned for export. Depending on whether the inode is found on mds.a, the lookup-by-ino on mds.b either returns failure or waits for the import to finish.

Casey

- Original Message -
From: Gregory Farnum g...@inktank.com
To: Casey Bodley ca...@linuxbox.com
Cc: Matt W. Benjamin m...@linuxbox.com, ceph-devel@vger.kernel.org, aemerson aemer...@linuxbox.com, peter honeyman peter.honey...@gmail.com, Sage Weil s...@inktank.com
Sent: Wednesday, October 17, 2012 4:18:04 PM
Subject: Re: parent xattrs on file objects

On Wed, Oct 17, 2012 at 12:40 PM, Casey Bodley ca...@linuxbox.com wrote:

To expand on what Matt said, we're also trying to address this issue of lookups by inode number for use with NFS. The design we've been exploring is to create a single system inode, designated the 'inode container' directory, which stores the primary links to all inodes in the filesystem. These links are named by their inode number to satisfy lookups and obviate the need for an anchor table. This design allows the inode container to make use of existing directory fragmentation and load balancing to distribute the inodes over the MDS cluster. When a new file is created, it then adds two links: a primary link into the inode container, and a remote link into the filesystem namespace.
In the case where the parent directory fragment's authority is different than the corresponding inode container fragment's, it is created in the parent directory then exported to the inode container via an asynchronous slave request. We welcome additional discussion, both on this design specifically and on the general topic of scalable ino lookups.

So if the primary link isn't always in the inode container, you must be preserving the anchor table for this setup. Am I understanding that correctly? Or is there some other mechanism for linking them that's less expensive?
-Greg
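The fallback Casey describes, computing the allocating mds from the ino alone, can be illustrated with a toy model. The contiguous per-rank range layout below is an assumption made purely for illustration (the real InoTable partitioning is more involved); the point is only that a failed container lookup on mds.b can be redirected to the rank whose table handed out the number.

```cpp
#include <cstdint>

// Assumed layout: inos below INO_BASE are reserved system inodes, and
// each mds rank allocates from a fixed contiguous range above it.
// Both constants are invented for this sketch.
constexpr uint64_t INO_BASE  = 0x10000000;  // first table-allocated ino
constexpr uint64_t RANGE_LEN = 0x04000000;  // assumed per-rank range size

// Which rank's InoTable would have allocated this ino?
// Returns -1 for system inodes, which no table handed out.
int allocating_rank(uint64_t ino) {
  if (ino < INO_BASE) {
    return -1;
  }
  return static_cast<int>((ino - INO_BASE) / RANGE_LEN);
}
```

Under this scheme, when mds.b misses in the inode container it asks allocating_rank(ino), i.e. mds.a, which either still holds the inode pinned in cache (so the caller waits for the import to land) or genuinely doesn't know it (so the lookup fails).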