Re: queue_transaction interface + unique_ptr + performance

2015-12-03 Thread Casey Bodley
Hi,

To those writing benchmarks, please note that the standard library
provides std::make_shared() as an optimization for shared_ptr. It
creates the object and its shared storage in a single allocation,
instead of the two allocations required for
"std::shared_ptr fs (new Foo())".

Casey

- Original Message -
> 1- I agree we should avoid shared_ptr whenever possible.
> 
> 2- unique_ptr should not have any more overhead than a raw pointer--the
> compiler is enforcing the single-owner semantics.  See for example
> 
>   https://msdn.microsoft.com/en-us/library/hh279676.aspx
> 
> "It is exactly is efficient as a raw pointer and can be used in STL
> containers."
> 
> Unless the implementation is broken somehow?  That seems unlikely...
> 
> sage
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
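The zero-overhead claim quoted above can be spot-checked mechanically. This is a sketch; the size equality is not strictly mandated by the standard, but it holds on mainstream implementations whenever the default deleter is used:

```cpp
#include <cassert>
#include <memory>

struct Foo { int x = 0; };

// With the default (stateless) deleter, unique_ptr stores nothing but the
// raw pointer itself, so it is pointer-sized on common implementations.
static_assert(sizeof(std::unique_ptr<Foo>) == sizeof(Foo*),
              "unique_ptr carries no per-object space overhead");
```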
> 


Re: queue_transaction interface + unique_ptr + performance

2015-12-03 Thread Casey Bodley

- Original Message -
> Eh, Sage had a point that Transaction has a bunch of little fields
> which would have to be filled in -- its move constructor would be less
> trivial than unique_ptr's.
> -Sam

It's true that the move ctor has to do work. I counted 18 fields, half of
which are integers, and the rest have move ctors themselves. But the cpu
is good at integers. The win here is that you're not hitting the allocator
in the fast path.
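As a hedged sketch of why such a move stays cheap (the type and field names below are invented for illustration; this is not Ceph's actual Transaction layout):

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative stand-in for a Transaction-like type: the integer fields
// move by plain copy, and the container member has its own cheap move ctor
// that just steals the internal pointers -- no allocator calls either way.
struct MiniTransaction {
    uint64_t ops = 0;
    uint32_t largest_data_len = 0;
    std::vector<char> data;

    MiniTransaction() = default;
    MiniTransaction(MiniTransaction&&) = default;
    MiniTransaction& operator=(MiniTransaction&&) = default;
};
```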

Casey

> 
> On Thu, Dec 3, 2015 at 11:12 AM, Adam C. Emerson <aemer...@redhat.com> wrote:
> > On 03/12/2015, Casey Bodley wrote:
> > [snip]
> >> The queue_transactions() interface could take a container of Transactions,
> >> rather than pointers to Transactions, and the ObjectStore would move them
> >> out of the container into whatever representation it prefers.
> > [snip]
> >
> > Or a pointer and count (or we could steal array_view from GSL). That way we
> > could pass in any contiguous range (std::vector, a std::array, or even a
> > regular C-style array allocated on the stack).


Re: queue_transaction interface + unique_ptr + performance

2015-12-03 Thread Casey Bodley
After more discussion with Adam, it seems that making the Transaction
object itself a movable type could alleviate all of these concerns about
heap allocations, ownership, and lifetime.

The queue_transactions() interface could take a container of Transactions,
rather than pointers to Transactions, and the ObjectStore would move them
out of the container into whatever representation it prefers.
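One possible shape for that interface, sketched with invented names (`Transaction` and `queue_transactions` below are illustrative stand-ins, not Ceph's actual declarations):

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Transaction {
    std::vector<char> ops;  // stand-in for the real op buffer
};

class ObjectStore {
    std::vector<Transaction> queued;
public:
    // Take the container by value; callers std::move() theirs in, and the
    // store moves each element into its preferred internal representation.
    void queue_transactions(std::vector<Transaction> tls) {
        for (auto& t : tls)
            queued.push_back(std::move(t));
    }
    std::size_t pending() const { return queued.size(); }
};
```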

Casey

- Original Message -
> Yes, if we can do that, that will be far easier. I will double-check
> whether apply_transaction is being used in any performance-sensitive path
> and make the changes accordingly. Thanks.
> 
> -Original Message-
> From: Samuel Just [mailto:sj...@redhat.com]
> Sent: Thursday, December 03, 2015 9:51 AM
> To: Somnath Roy
> Cc: Adam C. Emerson; Sage Weil; Samuel Just (sam.j...@inktank.com);
> ceph-devel@vger.kernel.org
> Subject: Re: queue_transaction interface + unique_ptr + performance
> 
> As far as I know, there are no current users which want to use the
> Transaction later.  You could also change apply_transaction to copy the
> Transaction into a unique_ptr since I don't think it's used in any
> performance sensitive code paths.
> -Sam
> 
> On Thu, Dec 3, 2015 at 9:48 AM, Somnath Roy  wrote:
> > Yes, that we can do. But in that case, aren't we restricting users who
> > want to do something with the Transaction object later? I haven't yet gone
> > through every part of the code (which is huge) that uses these interfaces
> > to check whether the Transaction object is used afterwards.
> >
> > Thanks & Regards
> > Somnath
> > -Original Message-
> > From: Adam C. Emerson [mailto:aemer...@redhat.com]
> > Sent: Thursday, December 03, 2015 9:25 AM
> > To: Somnath Roy
> > Cc: Sage Weil; Samuel Just (sam.j...@inktank.com);
> > ceph-devel@vger.kernel.org
> > Subject: Re: queue_transaction interface + unique_ptr + performance
> >
> > On 03/12/2015, Somnath Roy wrote:
> >> Yes, I posted the new result after adding -O2 to the compiler flags, and
> >> it shows almost no overhead with unique_ptr.
> >> I will add a test for the list-append overhead and start implementing the
> >> new interface.
> >> But regarding my other point about changing all the ObjectStore interfaces
> >> to take a Transaction (see my first mail in this thread in case you missed
> >> it), any thoughts on that?
> >> Should we reconsider having two queue_transaction interfaces?
> >
> > As I understand it, the concern with switching to unique_ptr was that the
> > callee would move from the reference without this being known to the
> > caller.
> >
> > Would it make sense to pass an rvalue reference (i.e. TransactionRef&&)?
> > That way the compiler should demand that the callers explicitly use
> > std::move on the reference they're holding, documenting at the site of the
> > call that they're willing to give up ownership.
> >
> >
> > --
> > Senior Software Engineer   Red Hat Storage, Ann Arbor, MI, US
> > IRC: Aemerson@{RedHat, OFTC, Freenode}
> > 0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9


Re: queue_transaction interface + unique_ptr + performance

2015-12-03 Thread Casey Bodley

- Original Message -
> Well, yeah we are, it's just the actual Transaction structure which
> wouldn't be dynamic -- the buffers and many other fields would still
> hit the allocator.
> -Sam

Sure. I was looking specifically at the tradeoffs between allocating
and moving the Transaction object itself.

As it currently stands, the caller of ObjectStore can choose whether to
allocate its Transactions on the heap, embed them in other objects, or
put them on the stack for use with apply_transactions(). Switching to an
interface built around unique_ptr forces all callers to use the heap. I'm
advocating for an interface that doesn't.
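A sketch of a move-based entry point that preserves those caller-side choices (names are illustrative, not Ceph's actual API):

```cpp
#include <cassert>
#include <utility>

struct Transaction { int ops = 0; };

// The store takes ownership by moving from the argument, so the caller may
// have built the Transaction on the stack, embedded it in another object,
// or heap-allocated it -- its choice; no heap is forced by the interface.
int apply_transaction(Transaction&& t) {
    Transaction mine = std::move(t);  // the store's own copy, no allocation
    return mine.ops;
}
```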

Casey

> 
> On Thu, Dec 3, 2015 at 11:29 AM, Casey Bodley <cbod...@redhat.com> wrote:
> >
> > - Original Message -
> >> Eh, Sage had a point that Transaction has a bunch of little fields
> >> which would have to be filled in -- its move constructor would be less
> >> trivial than unique_ptr's.
> >> -Sam
> >
> > It's true that the move ctor has to do work. I counted 18 fields, half of
> > which are integers, and the rest have move ctors themselves. But the cpu
> > is good at integers. The win here is that you're not hitting the allocator
> > in the fast path.
> >
> > Casey
> >
> >>
> >> On Thu, Dec 3, 2015 at 11:12 AM, Adam C. Emerson <aemer...@redhat.com>
> >> wrote:
> >> > On 03/12/2015, Casey Bodley wrote:
> >> > [snip]
> >> >> The queue_transactions() interface could take a container of
> >> >> Transactions,
> >> >> rather than pointers to Transactions, and the ObjectStore would move
> >> >> them
> >> >> out of the container into whatever representation it prefers.
> >> > [snip]
> >> >
> >> > Or a pointer and count (or we could steal array_view from GSL). That way
> >> > we could pass in any contiguous range (std::vector, a std::array, or even
> >> > a regular C-style array allocated on the stack).


Re: ObjectStore interface and std::list

2015-12-03 Thread Casey Bodley

- Original Message -
> 
> Template functions can be virtual in c++11 and can avoid the VTL. I'm
> using them in https://github.com/ceph/ceph/pull/6781 .

Hi Robert,

Virtual functions are allowed in a templated class, but templated member
functions are a separate case. My suggestion to use iterators would require
the function itself to be templated on the iterator type. This isn't allowed,
because the virtual function table would have to contain an entry for every
template instantiation of that function.
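The distinction can be shown compactly (all names here are hypothetical):

```cpp
#include <cassert>
#include <vector>

template <typename T>
struct Store {                       // templated *class*: virtual members are fine,
    virtual void queue(const T&) = 0;  // each instantiation gets its own vtable
    virtual ~Store() = default;
};

struct VecStore : Store<int> {
    std::vector<int> items;
    void queue(const int& t) override { items.push_back(t); }

    // By contrast, a templated *member function* cannot be virtual -- the
    // vtable would need one slot per instantiation, which the compiler
    // cannot enumerate:
    //   template <typename Iter>
    //   virtual void queue_range(Iter b, Iter e);   // ill-formed
};
```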

Casey

> - 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> 
> 
> On Thu, Dec 3, 2015 at 10:46 AM, Casey Bodley  wrote:
> > Hi,
> >
> > I missed out on the context leading up to the discussion about
> > queue_transactions and the use of raw vs. smart pointers, but I'm curious
> > whether the use of std::list in the ObjectStore interface has come up as
> > well. That's another source of allocations that could be avoided.
> >
> > A naive approach could just replace the list with vector, which could at
> > least reduce the number of allocations to one if you reserve() space for
> > all of your elements up front.
> >
> > An interface based on iterators could also help, by allowing the caller to
> > put an array on the stack and pass that in. Iterators are tricky because
> > the generic form requires templates, and template functions can't be
> > virtual. But if we require the elements to be sequential in memory (i.e.
> > raw array, std::array or std::vector), the
> > interface could take (Transaction** begin, Transaction** end) without
> > requiring templates. This interface can also be made compatible with
> > unique_ptrs.
> >
> > Casey
> 


Re: queue_transaction interface + unique_ptr + performance

2015-12-03 Thread Casey Bodley

- Original Message -
> On Thu, 3 Dec 2015, Casey Bodley wrote:
> > > Well, yeah we are, it's just the actual Transaction structure which
> > > wouldn't be dynamic -- the buffers and many other fields would still
> > > hit the allocator.
> > > -Sam
> > 
> > Sure. I was looking specifically at the tradeoffs between allocating
> > and moving the Transaction object itself.
> > 
> > As it currently stands, the caller of ObjectStore can choose whether to
> > allocate its Transactions on the heap, embed them in other objects, or
> > put them on the stack for use with apply_transactions(). Switching to an
> > interface built around unique_ptr forces all callers to use the heap. I'm
> > advocating for an interface that doesn't.
> 
> That leaves us with either std::move or.. the raw Transaction* we have
> now.  Right?

Right. The thing I really like about the unique_ptr approach is that the
caller no longer has to care about the Transaction's lifetime, so doesn't
have to allocate the extra ObjectStore::C_DeleteTransaction for cleanup.
Movable Transactions accomplish this as well.
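A sketch of the ownership transfer described here (types are illustrative stand-ins, not Ceph's actual API):

```cpp
#include <cassert>
#include <memory>

struct Transaction { int ops = 0; };

// Ownership moves into the callee; the Transaction is destroyed when the
// unique_ptr goes out of scope, so the caller needs no separate cleanup
// callback for the Transaction's lifetime.
int consume(std::unique_ptr<Transaction> t) {
    return t->ops;   // t is destroyed automatically on return
}
```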

Casey

> 
> > > > It's true that the move ctor has to do work. I counted 18 fields, half
> > > > of
> > > > which are integers, and the rest have move ctors themselves. But the
> > > > cpu
> > > > is good at integers. The win here is that you're not hitting the
> > > > allocator
> > > > in the fast path.
> 
> To be fair, many of these are also legacy that we can remove... possibly
> even now.  IIRC the only exposure to legacy encoded transactions (that use
> the tbl hackery) is journal items from a pre-hammer OSD that
> aren't flushed on upgrade.  We should have made the osd flush the journal
> before recording the 0_94_4 ondisk feature.  We could add another one to
> enforce that and rip all that code out now instead of waiting until
> after jewel... that would be satisfying (and I think an ondisk ceph-osd
> feature is enough here, then document that users should upgrade to
> hammer 0.94.6 or infernalis 9.2.1 before moving to jewel).
> 
> sage


ObjectStore interface and std::list

2015-12-03 Thread Casey Bodley
Hi,

I missed out on the context leading up to the discussion about
queue_transactions and the use of raw vs. smart pointers, but I'm curious
whether the use of std::list in the ObjectStore interface has come up as
well. That's another source of allocations that could be avoided.

A naive approach could just replace the list with vector, which could at
least reduce the number of allocations to one if you reserve() space for
all of your elements up front.

An interface based on iterators could also help, by allowing the caller to
put an array on the stack and pass that in. Iterators are tricky because
the generic form requires templates, and template functions can't be
virtual. But if we require the elements to be sequential in memory (i.e.
raw array, std::array or std::vector), the
interface could take (Transaction** begin, Transaction** end) without
requiring templates. This interface can also be made compatible with
unique_ptrs.
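The pointer-range form described above might look like this (a sketch with invented names):

```cpp
#include <cassert>

struct Transaction { int ops = 0; };

// Non-template signature that can still be virtual: any contiguous array of
// Transaction* works (stack array, std::array, or std::vector::data()).
int queue_transactions(Transaction** begin, Transaction** end) {
    int total = 0;
    for (Transaction** p = begin; p != end; ++p)
        total += (*p)->ops;
    return total;
}
```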

Casey


updates to the fio objectstore engine

2015-11-30 Thread Casey Bodley
Hi folks,

I made another pass at the fio objectstore engine over the long weekend. Some 
of the major changes:

* added support for multiple jobs at a time by splitting global/ObjectStore 
initialization into a new Engine class

* when run with nr_files > 1, the objects are spread over multiple collections 
to model the concurrency we get from placement groups

* replaced other options with a 'conf' option to read in a ceph.conf

* moved into src/test/fio and added a README and example job/config files

You can find the pull request at https://github.com/ceph/ceph/pull/5943. 
Testing and feedback are appreciated!

Casey


Re: Backend ObjectStore engine performance bench with FIO

2015-09-30 Thread Casey Bodley
Hi Xiaoxi,

I pushed a new branch wip-fio-objectstore to ceph's github. I look forward to 
seeing James' work!

Thanks,
Casey

- Original Message -
> Hi Casey,
>   Would it be better if we created an integration branch on
>   ceph/ceph/wip-fio-objstore to allow more people to try and improve it?
>   Seems James has some patches.
> 
> -Xiaoxi
> 
> > -Original Message-
> > From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> > ow...@vger.kernel.org] On Behalf Of Casey Bodley
> > Sent: Wednesday, September 30, 2015 4:06 AM
> > To: James (Fei) Liu-SSI
> > Cc: ceph-devel@vger.kernel.org
> > Subject: Re: Backend ObjectStore engine performance bench with FIO
> > 
> > Hi James/Haomai/Xiaoxi,
> > 
> > I spent some more time on the fio-objectstore branch, and pushed an
> > update.
> > 
> > In testing, I realized that it was using the io_unit's start time to name
> > the
> > objects, which meant that every write operation was creating a separate
> > object.
> > In addition to fixing this to use fio's filenames for object names, I also
> > added
> > support for the open_file() and close_file() functions. It now creates
> > objects
> > of the proper size on startup, so read-only jobs will work normally.
> > It also removes its objects on exit.
> > 
> > On startup, it no longer calls create_collection() if it already exists, so
> > I was
> > able to re-run fio jobs over and over again without having to clear the
> > data
> > directory (tested with FileStore and KeyValueStore).
> > 
> > Casey
> > 
> > ----- Original Message -
> > > Great work James!
> > >
> > > - Original Message -
> > > > From: "James (Fei) Liu-SSI" <james@ssi.samsung.com>
> > > > To: "Xiaoxi Chen" <xiaoxi.c...@intel.com>, "Casey Bodley"
> > > > <cbod...@redhat.com>
> > > > Cc: "Sage Weil" <s...@newdream.net>, ceph-devel@vger.kernel.org
> > > > Sent: Friday, September 25, 2015 1:55:29 PM
> > > > Subject: Backend ObjectStore engine performance bench with FIO
> > > >
> > > > Hi Xiaoxi,
> > > >
> > > >With the IO mode changed from aio to sync, we made fio work against
> > > >newstore. Even with the sync engine (I am still debugging the aio
> > > >engine in newstore with Xiaoxi), newstore is still performing the best
> > > >among all of the backing store engines in our initial setup (thorough
> > > >tests will be run soon). Attached is the initial data we collected for
> > > >your reference. Thanks to Xiaoxi for the great help with newstore
> > > >development to support FIO.
> > > >
> > > > Hi Casey,
> > > >   Let me know if you need any help to put fio-ceph-objectstore into
> > > >   upstream.
> > > >   After that, I can commit all of mine upstream.
> > >
> > > My pull request at https://github.com/ceph/ceph/pull/5943 is still
> > > pending.
> > > If you have patches that you'd like included, I would be happy to pull
> > > them in; just point me to a branch.
> > >
> > > >
> > > >   Thanks,
> > > >   James
> > > >
> > >
> > > Thanks,
> > > Casey


Re: Backend ObjectStore engine performance bench with FIO

2015-09-29 Thread Casey Bodley
Hi James/Haomai/Xiaoxi,

I spent some more time on the fio-objectstore branch, and pushed an update.

In testing, I realized that it was using the io_unit's start time to name the
objects, which meant that every write operation was creating a separate object.
In addition to fixing this to use fio's filenames for object names, I also
added support for the open_file() and close_file() functions. It now creates
objects of the proper size on startup, so read-only jobs will work normally.
It also removes its objects on exit.

On startup, it no longer calls create_collection() if it already exists, so I
was able to re-run fio jobs over and over again without having to clear the
data directory (tested with FileStore and KeyValueStore).

Casey

- Original Message -
> Great work James!
> 
> - Original Message -
> > From: "James (Fei) Liu-SSI" <james@ssi.samsung.com>
> > To: "Xiaoxi Chen" <xiaoxi.c...@intel.com>, "Casey Bodley"
> > <cbod...@redhat.com>
> > Cc: "Sage Weil" <s...@newdream.net>, ceph-devel@vger.kernel.org
> > Sent: Friday, September 25, 2015 1:55:29 PM
> > Subject: Backend ObjectStore engine performance bench with FIO
> > 
> > Hi Xiaoxi,
> > 
> >    With the IO mode changed from aio to sync, we made fio work against
> >    newstore. Even with the sync engine (I am still debugging the aio engine
> >    in newstore with Xiaoxi), newstore is still performing the best among
> >    all of the backing store engines in our initial setup (thorough tests
> >    will be run soon). Attached is the initial data we collected for your
> >    reference. Thanks to Xiaoxi for the great help with newstore development
> >    to support FIO.
> > 
> > Hi Casey,
> >   Let me know if you need any help to put fio-ceph-objectstore into
> >   upstream.
> >   After that, I can commit all of mine upstream.
> 
> My pull request at https://github.com/ceph/ceph/pull/5943 is still pending.
> If you have patches that you'd like included, I would be happy to pull them
> in; just point me to a branch.
> 
> > 
> >   Thanks,
> >   James
> > 
> 
> Thanks,
> Casey


Re: Backend ObjectStore engine performance bench with FIO

2015-09-25 Thread Casey Bodley
Great work James!

- Original Message -
> From: "James (Fei) Liu-SSI" <james@ssi.samsung.com>
> To: "Xiaoxi Chen" <xiaoxi.c...@intel.com>, "Casey Bodley" <cbod...@redhat.com>
> Cc: "Sage Weil" <s...@newdream.net>, ceph-devel@vger.kernel.org
> Sent: Friday, September 25, 2015 1:55:29 PM
> Subject: Backend ObjectStore engine performance bench with FIO
> 
> Hi Xiaoxi,
> 
>    With the IO mode changed from aio to sync, we made fio work against
>    newstore. Even with the sync engine (I am still debugging the aio engine
>    in newstore with Xiaoxi), newstore is still performing the best among all
>    of the backing store engines in our initial setup (thorough tests will be
>    run soon). Attached is the initial data we collected for your reference.
>    Thanks to Xiaoxi for the great help with newstore development to support
>    FIO.
> 
> Hi Casey,
>   Let me know if you need any help to put fio-ceph-objectstore into upstream.
>   After that, I can commit all of mine upstream.

My pull request at https://github.com/ceph/ceph/pull/5943 is still pending. If 
you have patches that you'd like included, I would be happy to pull them in; 
just point me to a branch.

> 
>   Thanks,
>   James
> 

Thanks,
Casey


Re: About Fio backend with ObjectStore API

2015-09-11 Thread Casey Bodley
Hi James,

I just looked back at the results you posted, and saw that you were using 
iodepth=1. Setting this higher should help keep the FileStore busy.

Casey

- Original Message -
> From: "James (Fei) Liu-SSI" <james@ssi.samsung.com>
> To: "Casey Bodley" <cbod...@redhat.com>
> Cc: "Haomai Wang" <haomaiw...@gmail.com>, ceph-devel@vger.kernel.org
> Sent: Friday, September 11, 2015 1:18:31 PM
> Subject: RE: About Fio backend with ObjectStore API
> 
> Hi Casey,
>   You are right. I think the bottleneck is on the fio side rather than the
>   filestore side in this case. fio did not issue the IO commands fast
>   enough to saturate the filestore.
>   Here is one possible solution: create an async engine, which is normally
>   much faster than a sync engine in fio.
>
>Here is a possible framework. This new objectstore AIO engine in fio
>should in theory be much faster than the sync engine. Once we have a fio
>engine that can saturate newstore, memstore, and filestore, we can
>investigate in detail where the bottlenecks in their designs lie.
> 
> .
> struct objectstore_aio_data {
>   struct aio_ctx *q_aio_ctx;
>   struct aio_completion_data *a_data;
>   aio_ses_ctx_t *p_ses_ctx;
>   unsigned int entries;
> };
> ...
> /*
>  * Note that the structure is exported, so that fio can get it via
>  * dlsym(..., "ioengine");
>  */
> struct ioengine_ops us_aio_ioengine = {
>   .name   = "objectstore-aio",
>   .version= FIO_IOOPS_VERSION,
>   .init   = fio_objectstore_aio_init,
>   .prep   = fio_objectstore_aio_prep,
>   .queue  = fio_objectstore_aio_queue,
>   .cancel = fio_objectstore_aio_cancel,
>   .getevents  = fio_objectstore_aio_getevents,
>   .event  = fio_objectstore_aio_event,
>   .cleanup= fio_objectstore_aio_cleanup,
>   .open_file  = fio_objectstore_aio_open,
>   .close_file = fio_objectstore_aio_close,
> };
> 
> 
> Let me know what you think.
> 
> Regards,
> James
> 
> -Original Message-
> From: Casey Bodley [mailto:cbod...@redhat.com]
> Sent: Friday, September 11, 2015 7:28 AM
> To: James (Fei) Liu-SSI
> Cc: Haomai Wang; ceph-devel@vger.kernel.org
> Subject: Re: About Fio backend with ObjectStore API
> 
> Hi James,
> 
> That's great that you were able to get fio-objectstore running! Thanks to you
> and Haomai for all the help with testing.
> 
> In terms of performance, it's possible that we're not handling the
> completions optimally. When profiling with MemStore I remember seeing a
> significant amount of cpu time spent in polling with
> fio_ceph_os_getevents().
> 
> The issue with reads is more of a design issue than a bug. Because the test
> starts with a mkfs(), there are no objects to read from initially. You would
> just have to add a write job to run before the read job, to make sure that
> the objects are initialized. Or perhaps the mkfs() step could be an optional
> part of the configuration.
> 
> Casey
> 
> - Original Message -
> From: "James (Fei) Liu-SSI" <james@ssi.samsung.com>
> To: "Haomai Wang" <haomaiw...@gmail.com>, "Casey Bodley" <cbod...@redhat.com>
> Cc: ceph-devel@vger.kernel.org
> Sent: Thursday, September 10, 2015 8:08:04 PM
> Subject: RE: About Fio backend with ObjectStore API
> 
> Hi Casey and Haomai,
> 
>   We finally made the fio-objectstore work on our end. Here is fio data
>   against filestore with a Samsung 850 Pro. It is a sequential write, and
>   the performance is very poor, which is expected.
> 
> Run status group 0 (all jobs):
>   WRITE: io=524288KB, aggrb=9467KB/s, minb=9467KB/s, maxb=9467KB/s,
>   mint=55378msec, maxt=55378msec
> 
>   But anyway, it works, even though there are still some bugs to fix, like
>   the read and filesystem issues. Thanks a lot for your great work.
> 
>   Regards,
>   James
> 
>   jamesliu@jamesliu-OptiPlex-7010:~/WorkSpace/ceph_casey/src$ sudo ./fio/fio
>   ./test/objectstore.fio
> filestore: (g=0): rw=write, bs=128K-128K/128K-128K/128K-128K,
> ioengine=cephobjectstore, iodepth=1 fio-2.2.9-56-g736a Starting 1 process
> test1
> filestore: Laying out IO file(s) (1 file(s) / 512MB)
> 2015-09-10 16:55:40.614494 7f19d34d1840  1 filestore(/home/jamesliu/fio_ceph)
> mkfs in /home/jamesliu/fio_ceph
> 2015-09-10 16:55:40.614924 7f19d34d1840  1 filestore(/home/jamesliu/fio_ceph)
> mkfs generated fsid 5508d58e-dbfc-48a5-9f9c-c639af4fe73a
> 2015-09-10 16:

Re: About Fio backend with ObjectStore API

2015-09-11 Thread Casey Bodley
Hi James,

That's great that you were able to get fio-objectstore running! Thanks to you 
and Haomai for all the help with testing.

In terms of performance, it's possible that we're not handling the completions 
optimally. When profiling with MemStore I remember seeing a significant amount 
of cpu time spent in polling with fio_ceph_os_getevents().

The issue with reads is more of a design issue than a bug. Because the test 
starts with a mkfs(), there are no objects to read from initially. You would 
just have to add a write job to run before the read job, to make sure that the 
objects are initialized. Or perhaps the mkfs() step could be an optional part 
of the configuration.

Casey

- Original Message -
From: "James (Fei) Liu-SSI" <james@ssi.samsung.com>
To: "Haomai Wang" <haomaiw...@gmail.com>, "Casey Bodley" <cbod...@redhat.com>
Cc: ceph-devel@vger.kernel.org
Sent: Thursday, September 10, 2015 8:08:04 PM
Subject: RE: About Fio backend with ObjectStore API

Hi Casey and Haomai,

  We finally made the fio-objectstore work on our end. Here is fio data
against filestore with a Samsung 850 Pro. It is a sequential write, and the
performance is very poor, which is expected.

Run status group 0 (all jobs):
  WRITE: io=524288KB, aggrb=9467KB/s, minb=9467KB/s, maxb=9467KB/s, 
mint=55378msec, maxt=55378msec

  But anyway, it works, even though there are still some bugs to fix, like the
read and filesystem issues. Thanks a lot for your great work.

  Regards,
  James

  jamesliu@jamesliu-OptiPlex-7010:~/WorkSpace/ceph_casey/src$ sudo ./fio/fio 
./test/objectstore.fio 
filestore: (g=0): rw=write, bs=128K-128K/128K-128K/128K-128K, 
ioengine=cephobjectstore, iodepth=1
fio-2.2.9-56-g736a
Starting 1 process
test1
filestore: Laying out IO file(s) (1 file(s) / 512MB)
2015-09-10 16:55:40.614494 7f19d34d1840  1 filestore(/home/jamesliu/fio_ceph) 
mkfs in /home/jamesliu/fio_ceph
2015-09-10 16:55:40.614924 7f19d34d1840  1 filestore(/home/jamesliu/fio_ceph) 
mkfs generated fsid 5508d58e-dbfc-48a5-9f9c-c639af4fe73a
2015-09-10 16:55:40.630326 7f19d34d1840  1 filestore(/home/jamesliu/fio_ceph) 
write_version_stamp 4
2015-09-10 16:55:40.673417 7f19d34d1840  0 filestore(/home/jamesliu/fio_ceph) 
backend xfs (magic 0x58465342)
2015-09-10 16:55:40.724097 7f19d34d1840  1 filestore(/home/jamesliu/fio_ceph) 
leveldb db exists/created
2015-09-10 16:55:40.724218 7f19d34d1840 -1 journal FileJournal::_open: 
disabling aio for non-block journal.  Use journal_force_aio to force use of aio 
anyway
2015-09-10 16:55:40.724226 7f19d34d1840  1 journal _open 
/tmp/fio_ceph_filestore1 fd 5: 5368709120 bytes, block size 4096 bytes, 
directio = 1, aio = 0
2015-09-10 16:55:40.724468 7f19d34d1840 -1 journal check: ondisk fsid 
7580401a-6863-4863-9873-3adda08c9150 doesn't match expected 
5508d58e-dbfc-48a5-9f9c-c639af4fe73a, invalid (someone else's?) journal
2015-09-10 16:55:40.724481 7f19d34d1840  1 journal close 
/tmp/fio_ceph_filestore1
2015-09-10 16:55:40.724506 7f19d34d1840  1 journal _open 
/tmp/fio_ceph_filestore1 fd 5: 5368709120 bytes, block size 4096 bytes, 
directio = 1, aio = 0
2015-09-10 16:55:40.730417 7f19d34d1840  0 filestore(/home/jamesliu/fio_ceph) 
mkjournal created journal on /tmp/fio_ceph_filestore1
2015-09-10 16:55:40.730446 7f19d34d1840  1 filestore(/home/jamesliu/fio_ceph) 
mkfs done in /home/jamesliu/fio_ceph
2015-09-10 16:55:40.730527 7f19d34d1840  0 filestore(/home/jamesliu/fio_ceph) 
backend xfs (magic 0x58465342)
2015-09-10 16:55:40.730773 7f19d34d1840  0 
genericfilestorebackend(/home/jamesliu/fio_ceph) detect_features: FIEMAP ioctl 
is disabled via 'filestore fiemap' config option
2015-09-10 16:55:40.730779 7f19d34d1840  0 
genericfilestorebackend(/home/jamesliu/fio_ceph) detect_features: 
SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2015-09-10 16:55:40.730793 7f19d34d1840  0 
genericfilestorebackend(/home/jamesliu/fio_ceph) detect_features: splice is 
supported
2015-09-10 16:55:40.751951 7f19d34d1840  0 
genericfilestorebackend(/home/jamesliu/fio_ceph) detect_features: syncfs(2) 
syscall fully supported (by glibc and kernel)
2015-09-10 16:55:40.752102 7f19d34d1840  0 
xfsfilestorebackend(/home/jamesliu/fio_ceph) detect_features: extsize is 
supported and your kernel >= 3.5
2015-09-10 16:55:40.794731 7f19d34d1840  0 filestore(/home/jamesliu/fio_ceph) 
mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-09-10 16:55:40.794906 7f19d34d1840 -1 journal FileJournal::_open: 
disabling aio for non-block journal.  Use journal_force_aio to force use of aio 
anyway
2015-09-10 16:55:40.794917 7f19d34d1840  1 journal _open 
/tmp/fio_ceph_filestore1 fd 11: 5368709120 bytes, block size 4096 bytes, 
directio = 1, aio = 0
2015-09-10 16:55:40.795219 7f19d34d1840  1 journal _open 
/tmp/fio_ceph_filestore1 fd 11: 5368709120 bytes, block size 4096 bytes, 
directio = 1, aio = 0
2015-09-10 16:55:40.795533 7f19d34d1840  1 filestore

Re: About Fio backend with ObjectStore API

2015-09-11 Thread Casey Bodley
I forgot to mention for the list, you can find the latest version of the 
fio-objectstore branch at 
https://github.com/cbodley/ceph/commits/fio-objectstore.

Casey

- Original Message -
From: "Casey Bodley" <cbod...@redhat.com>
To: "James (Fei) Liu-SSI" <james@ssi.samsung.com>
Cc: "Haomai Wang" <haomaiw...@gmail.com>, ceph-devel@vger.kernel.org
Sent: Friday, September 11, 2015 10:28:14 AM
Subject: Re: About Fio backend with ObjectStore API

Hi James,

That's great that you were able to get fio-objectstore running! Thanks to you 
and Haomai for all the help with testing.

In terms of performance, it's possible that we're not handling the completions 
optimally. When profiling with MemStore, I remember seeing a significant amount 
of CPU time spent polling in fio_ceph_os_getevents().

The issue with reads is more of a design issue than a bug. Because the test 
starts with a mkfs(), there are no objects to read from initially. You would 
just have to add a write job to run before the read job, to make sure that the 
objects are initialized. Or perhaps the mkfs() step could be an optional part 
of the configuration.
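To illustrate the workaround, here is a minimal job-file sketch (the section names [prefill] and [readback] are hypothetical; the ioengine path and objectstore options follow the job files discussed in this thread, and fio's "stonewall" option serializes the jobs so the write finishes before the read starts):

```ini
[global]
ioengine=./ceph-int/src/.libs/libfio_ceph_objectstore.so
invalidate=0 # mandatory
iodepth=1
objectstore=filestore
directory=./osd/

# The write job runs first and creates the objects.
[prefill]
rw=write

# stonewall makes fio wait for the previous job to finish,
# so the reads see the objects written by [prefill].
[readback]
stonewall
rw=read
```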

Casey

- Original Message -
From: "James (Fei) Liu-SSI" <james@ssi.samsung.com>
To: "Haomai Wang" <haomaiw...@gmail.com>, "Casey Bodley" <cbod...@redhat.com>
Cc: ceph-devel@vger.kernel.org
Sent: Thursday, September 10, 2015 8:08:04 PM
Subject: RE: About Fio backend with ObjectStore API

Hi Casey and Haomai,

  We finally got the fio-objectstore working on our end. Here is fio data 
against filestore with a Samsung 850 Pro. It is a sequential write, and the 
performance is very poor, which is expected though. 

Run status group 0 (all jobs):
  WRITE: io=524288KB, aggrb=9467KB/s, minb=9467KB/s, maxb=9467KB/s, 
mint=55378msec, maxt=55378msec

  But anyway, it works, even though there are still some bugs to fix, like the 
read and filesystem issues. Thanks a lot for your great work.

  Regards,
  James


Re: About Fio backend with ObjectStore API

2015-09-03 Thread Casey Bodley
Hi James,

I'm sorry for not following up on that segfault, but I wasn't ever able to 
reproduce it. I used it recently for memstore testing without any problems. I 
wonder if there's a problem with the autotools build? I've only tested it with 
cmake. When I find some time, I'll rebase it on master and do another round of 
testing.

Casey

- Original Message -
> From: "James (Fei) Liu-SSI" <james@ssi.samsung.com>
> To: "Haomai Wang" <haomaiw...@gmail.com>, "Casey Bodley" <cbod...@redhat.com>
> Cc: "Casey Bodley" <cbod...@gmail.com>, "Matt W. Benjamin" 
> <m...@cohortfs.com>, ceph-devel@vger.kernel.org
> Sent: Wednesday, September 2, 2015 8:06:14 PM
> Subject: RE: About Fio backend with ObjectStore API
> 
> Hi Haomai and Case,
>  Do you have any fixes for that segfault?
> 
> Thanks,
> James
> 
> -----Original Message-----
> From: Haomai Wang [mailto:haomaiw...@gmail.com]
> Sent: Wednesday, July 22, 2015 6:07 PM
> To: Casey Bodley
> Cc: Casey Bodley; Matt W. Benjamin; James (Fei) Liu-SSI;
> ceph-devel@vger.kernel.org
> Subject: Re: About Fio backend with ObjectStore API
> 
> no special
> 
> [global]
> #logging
> #write_iops_log=write_iops_log
> #write_bw_log=write_bw_log
> #write_lat_log=write_lat_log
> ioengine=./ceph-int/src/.libs/libfio_ceph_objectstore.so
> invalidate=0 # mandatory
> rw=write
> #bs=4k
> 
> [filestore]
> iodepth=1
> # create a journaled filestore
> objectstore=filestore
> directory=./osd/
> filestore_journal=./osd/journal
> 
> On Thu, Jul 23, 2015 at 4:56 AM, Casey Bodley <cbod...@redhat.com> wrote:
> > Hi Haomai,
> >
> > Sorry for the late response, I was out of the office. I'm afraid I haven't
> > run into that segfault. The io_ops should be set at the very beginning
> > when it calls get_ioengine(). All I can suggest is that you verify that
> > your job file is pointing to the correct fio_ceph_objectstore.so. If
> > you've made any other interesting changes to the job file, could you share
> > it here?
> >
> > Casey
> >
> > - Original Message -
> > From: "Haomai Wang" <haomaiw...@gmail.com>
> > To: "Casey Bodley" <cbod...@gmail.com>
> > Cc: "Matt W. Benjamin" <m...@cohortfs.com>, "James (Fei) Liu-SSI"
> > <james@ssi.samsung.com>, ceph-devel@vger.kernel.org
> > Sent: Tuesday, July 21, 2015 7:50:32 AM
> > Subject: Re: About Fio backend with ObjectStore API
> >
> > Hi Casey,
> >
> > I check your commits and know what you fixed. I cherry-picked your new
> > commits but I still met the same problem.
> >
> > """
> > It's strange that it always hits a segmentation fault when entering
> > "_fio_setup_ceph_filestore_data"; gdb says "td->io_ops" is NULL, but
> > when I go up the stack, "td->io_ops" is not null. Maybe it's related
> > to dlopen?
> > """
> >
> > Do you have any hint about this?
> >
> > On Thu, Jul 16, 2015 at 5:23 AM, Casey Bodley <cbod...@gmail.com> wrote:
> >> Hi Haomai,
> >>
> >> I was able to run this after a couple changes to the filestore.fio
> >> job file. Two of the config options were using the wrong names. I
> >> pushed a fix for the job file, as well as a patch that renames
> >> everything from filestore to objectstore (thanks James), to
> >> https://github.com/linuxbox2/linuxbox-ceph/commits/fio-objectstore.
> >>
> >> I found that the read support doesn't appear to work anymore, so give
> >> "rw=write" a try. And because it does a mkfs(), make sure you're
> >> pointing it to an empty xfs directory with the "directory=" option.
> >>
> >> Casey
> >>
> >> On Tue, Jul 14, 2015 at 2:45 AM, Haomai Wang <haomaiw...@gmail.com> wrote:
> >>> Anyone who have successfully ran the fio with this external io
> >>> engine ceph_objectstore?
> >>>
> >>> It's strange that it always hits a segmentation fault when entering
> >>> "_fio_setup_ceph_filestore_data"; gdb says "td->io_ops" is NULL, but
> >>> when I go up the stack, "td->io_ops" is not null. Maybe it's
> >>> related to dlopen?
> >>>
> >>> On Fri, Jul 10, 2015 at 3:51 PM, Haomai Wang <haomaiw...@gmail.com>
> >>> wrote:
> >>>> I have rebased the branch with master, and push it to ceph upstream
> >>>> repo. https://git

multi-site rgw and the period push/pull api

2015-09-03 Thread Casey Bodley
Hi Orit and Yehuda,

A couple questions came up today while I was fleshing out the /admin/realm 
handler:

I found a RGWOp_Period_Get() and _Post() in rgw_rest_config.cc that do pretty 
much what we want for push and pull.  Is this /admin/config handler temporary, 
or should we share these ops between the /admin/config and /admin/realm 
handlers?

I also asked Yehuda for clarification on the push op (POST /admin/realm/period):

(04:09:59 PM) cbodley: i'm confused, because the title says "Request children 
to fetch period", but under Input: it includes a json representation of the 
period
(04:10:29 PM) cbodley: so should the POST include the data, or should the 
handler send a GET request to fetch it?
(04:11:57 PM) yehudasa_: cbodley, great question.. let me take a look
(04:14:21 PM) cbodley: i guess it depends on how we do authentication? we don't 
want to accept a POST from any random endpoint, and overwrite our map. so 
sending a GET to a known party sounds safer in that respect
(04:14:30 PM) yehudasa_: cbodley, I think it should include the period data
(04:15:20 PM) yehudasa_: we should only allow authenticated system users to be 
able to send it to us

So my follow-up question is: should we get rid of the period_id and epoch 
parameters for POST, since the json-encoded period will contain those already?

Thanks,
Casey
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Branch for C++11

2015-08-04 Thread Casey Bodley
Hi,

I've pushed a wip-cxx11 branch to github that adds the -std=c++11 flag for both 
cmake and automake builds, and fixes the resulting compilation errors.

The switch to c++11 has two main implications:
* platform support: as expected, gitbuilder is reporting failures on precise 
and centos6 due to their older compilers
* ABI change: Sam started a discussion on the list yesterday about how to deal 
with librados clients

I would also like to track the state of compiler support for c++14 (and 
beyond), so that we can adopt them as soon as it's practical.
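For reference, the flag change itself is a one-liner in each build system; a sketch (the exact variable names the branch touches may differ):

```cmake
# CMake (pre-3.1 style, before CMAKE_CXX_STANDARD existed):
# append -std=c++11 to the global C++ compiler flags.
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")

# With automake, the equivalent is appending -std=c++11 to
# AM_CXXFLAGS in configure.ac / Makefile.am.
```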

Casey


Re: About Fio backend with ObjectStore API

2015-07-22 Thread Casey Bodley
Hi Haomai,

Sorry for the late response, I was out of the office. I'm afraid I haven't run 
into that segfault. The io_ops should be set at the very beginning when it 
calls get_ioengine(). All I can suggest is that you verify that your job file 
is pointing to the correct fio_ceph_objectstore.so. If you've made any other 
interesting changes to the job file, could you share it here?

Casey

- Original Message -
From: "Haomai Wang" <haomaiw...@gmail.com>
To: "Casey Bodley" <cbod...@gmail.com>
Cc: "Matt W. Benjamin" <m...@cohortfs.com>, "James (Fei) Liu-SSI" 
<james@ssi.samsung.com>, ceph-devel@vger.kernel.org
Sent: Tuesday, July 21, 2015 7:50:32 AM
Subject: Re: About Fio backend with ObjectStore API

Hi Casey,

I check your commits and know what you fixed. I cherry-picked your new
commits but I still met the same problem.


It's strange that it always hits a segmentation fault when entering
"_fio_setup_ceph_filestore_data"; gdb says "td->io_ops" is NULL, but
when I go up the stack, "td->io_ops" is not null. Maybe it's related
to dlopen?


Do you have any hint about this?

On Thu, Jul 16, 2015 at 5:23 AM, Casey Bodley cbod...@gmail.com wrote:
 Hi Haomai,

 I was able to run this after a couple changes to the filestore.fio job
 file. Two of the config options were using the wrong names. I pushed a
 fix for the job file, as well as a patch that renames everything from
 filestore to objectstore (thanks James), to
 https://github.com/linuxbox2/linuxbox-ceph/commits/fio-objectstore.

 I found that the read support doesn't appear to work anymore, so give
 "rw=write" a try. And because it does a mkfs(), make sure you're
 pointing it to an empty xfs directory with the "directory=" option.

 Casey

 On Tue, Jul 14, 2015 at 2:45 AM, Haomai Wang haomaiw...@gmail.com wrote:
 Anyone who have successfully ran the fio with this external io engine
 ceph_objectstore?

 It's strange that it always hits a segmentation fault when entering
 "_fio_setup_ceph_filestore_data"; gdb says "td->io_ops" is NULL, but
 when I go up the stack, "td->io_ops" is not null. Maybe it's related
 to dlopen?

 On Fri, Jul 10, 2015 at 3:51 PM, Haomai Wang haomaiw...@gmail.com wrote:
 I have rebased the branch with master, and push it to ceph upstream
 repo. https://github.com/ceph/ceph/compare/fio-objectstore?expand=1

 Plz let me know if who is working on this. Otherwise, I would like to
 improve this to be merge ready.

 On Fri, Jul 10, 2015 at 4:26 AM, Matt W. Benjamin m...@cohortfs.com wrote:
 That makes sense.

 Matt

 - James (Fei) Liu-SSI james@ssi.samsung.com wrote:

 Hi Casey,
   Got it. I was directed to the old code base. By the way, since the
 test case is used to exercise all of the object stores, I strongly
 recommend changing the name from fio_ceph_filestore.cc to
 fio_ceph_objectstore.cc. And the code in fio_ceph_filestore.cc should
 be refactored to reflect that all object stores will be supported
 by fio_ceph_objectstore.cc. What do you think?

 Let me know if you need any help from my side.


 Regards,
 James



 -----Original Message-----
 From: Casey Bodley [mailto:cbod...@gmail.com]
 Sent: Thursday, July 09, 2015 12:32 PM
 To: James (Fei) Liu-SSI
 Cc: Haomai Wang; ceph-devel@vger.kernel.org
 Subject: Re: About Fio backend with ObjectStore API

 Hi James,

 Are you looking at the code from
 https://github.com/linuxbox2/linuxbox-ceph/tree/fio-objectstore? It
 uses ObjectStore::create() instead of new FileStore(). This allows us
 to exercise all of the object stores with the same code.

 Casey

 On Thu, Jul 9, 2015 at 2:01 PM, James (Fei) Liu-SSI
 james@ssi.samsung.com wrote:
  Hi Casey,
Here is the code in fio_ceph_filestore.cc. Basically, it
 creates a filestore as the backend engine for IO exercises. If we want to
 send IO commands to KeyValueStore or NewStore, we have to change the
 code accordingly, right?  I did not see any other files like
 fio_ceph_keyvaluestore.cc or fio_ceph_newstore.cc. In my humble
 opinion, we might need to create two other fio engines for
 keyvaluestore and newstore if we want to exercise those two, right?
 
  Regards,
  James
 
  static int fio_ceph_filestore_init(struct thread_data *td)
  {
    vector<const char*> args;
    struct ceph_filestore_data *ceph_filestore_data = (struct ceph_filestore_data *) td->io_ops->data;
    ObjectStore::Transaction ft;

    global_init(NULL, args, CEPH_ENTITY_TYPE_OSD, CODE_ENVIRONMENT_UTILITY, 0);
    //g_conf->journal_dio = false;
    common_init_finish(g_ceph_context);
    //g_ceph_context->_conf->set_val("debug_filestore", "20");
    //g_ceph_context->_conf->set_val("debug_throttle", "20");
    g_ceph_context->_conf->apply_changes(NULL);

    ceph_filestore_data->osd_path = strdup("/mnt/fio_ceph_filestore.XXX");
    ceph_filestore_data->journal_path = strdup("/var/lib/ceph/osd/journal-ram/fio_ceph_filestore.XXX");

    if (!mkdtemp(ceph_filestore_data->osd_path)) {
      cout

Re: About Fio backend with ObjectStore API

2015-07-15 Thread Casey Bodley
Hi Haomai,

I was able to run this after a couple changes to the filestore.fio job
file. Two of the config options were using the wrong names. I pushed a
fix for the job file, as well as a patch that renames everything from
filestore to objectstore (thanks James), to
https://github.com/linuxbox2/linuxbox-ceph/commits/fio-objectstore.

I found that the read support doesn't appear to work anymore, so give
"rw=write" a try. And because it does a mkfs(), make sure you're
pointing it to an empty xfs directory with the "directory=" option.

Casey

On Tue, Jul 14, 2015 at 2:45 AM, Haomai Wang haomaiw...@gmail.com wrote:
 Anyone who have successfully ran the fio with this external io engine
 ceph_objectstore?

 It's strange that it always hits a segmentation fault when entering
 "_fio_setup_ceph_filestore_data"; gdb says "td->io_ops" is NULL, but
 when I go up the stack, "td->io_ops" is not null. Maybe it's related
 to dlopen?

 On Fri, Jul 10, 2015 at 3:51 PM, Haomai Wang haomaiw...@gmail.com wrote:
 I have rebased the branch with master, and push it to ceph upstream
 repo. https://github.com/ceph/ceph/compare/fio-objectstore?expand=1

 Plz let me know if who is working on this. Otherwise, I would like to
 improve this to be merge ready.

 On Fri, Jul 10, 2015 at 4:26 AM, Matt W. Benjamin m...@cohortfs.com wrote:
 That makes sense.

 Matt

 - James (Fei) Liu-SSI james@ssi.samsung.com wrote:

 Hi Casey,
   Got it. I was directed to the old code base. By the way, since the
 test case is used to exercise all of the object stores, I strongly
 recommend changing the name from fio_ceph_filestore.cc to
 fio_ceph_objectstore.cc. And the code in fio_ceph_filestore.cc should
 be refactored to reflect that all object stores will be supported
 by fio_ceph_objectstore.cc. What do you think?

 Let me know if you need any help from my side.


 Regards,
 James



 -----Original Message-----
 From: Casey Bodley [mailto:cbod...@gmail.com]
 Sent: Thursday, July 09, 2015 12:32 PM
 To: James (Fei) Liu-SSI
 Cc: Haomai Wang; ceph-devel@vger.kernel.org
 Subject: Re: About Fio backend with ObjectStore API

 Hi James,

 Are you looking at the code from
 https://github.com/linuxbox2/linuxbox-ceph/tree/fio-objectstore? It
 uses ObjectStore::create() instead of new FileStore(). This allows us
 to exercise all of the object stores with the same code.

 Casey

 On Thu, Jul 9, 2015 at 2:01 PM, James (Fei) Liu-SSI
 james@ssi.samsung.com wrote:
  Hi Casey,
Here is the code in fio_ceph_filestore.cc. Basically, it
 creates a filestore as the backend engine for IO exercises. If we want to
 send IO commands to KeyValueStore or NewStore, we have to change the
 code accordingly, right?  I did not see any other files like
 fio_ceph_keyvaluestore.cc or fio_ceph_newstore.cc. In my humble
 opinion, we might need to create two other fio engines for
 keyvaluestore and newstore if we want to exercise those two, right?
 
  Regards,
  James
 
  static int fio_ceph_filestore_init(struct thread_data *td)
  {
    vector<const char*> args;
    struct ceph_filestore_data *ceph_filestore_data = (struct ceph_filestore_data *) td->io_ops->data;
    ObjectStore::Transaction ft;

    global_init(NULL, args, CEPH_ENTITY_TYPE_OSD, CODE_ENVIRONMENT_UTILITY, 0);
    //g_conf->journal_dio = false;
    common_init_finish(g_ceph_context);
    //g_ceph_context->_conf->set_val("debug_filestore", "20");
    //g_ceph_context->_conf->set_val("debug_throttle", "20");
    g_ceph_context->_conf->apply_changes(NULL);

    ceph_filestore_data->osd_path = strdup("/mnt/fio_ceph_filestore.XXX");
    ceph_filestore_data->journal_path = strdup("/var/lib/ceph/osd/journal-ram/fio_ceph_filestore.XXX");

    if (!mkdtemp(ceph_filestore_data->osd_path)) {
      cout << "mkdtemp failed: " << strerror(errno) << std::endl;
      return 1;
    }
    //mktemp(ceph_filestore_data->journal_path); // NOSPC issue

    ObjectStore *fs = new FileStore(ceph_filestore_data->osd_path, ceph_filestore_data->journal_path);
    ceph_filestore_data->fs = fs;

    if (fs->mkfs() < 0) {
      cout << "mkfs failed" << std::endl;
      goto failed;
    }

    if (fs->mount() < 0) {
      cout << "mount failed" << std::endl;
      goto failed;
    }

    ft.create_collection(coll_t());
    fs->apply_transaction(ft);

    return 0;

  failed:
    return 1;
  }
  -----Original Message-----
  From: Casey Bodley [mailto:cbod...@gmail.com]
  Sent: Thursday, July 09, 2015 9:19 AM
  To: James (Fei) Liu-SSI
  Cc: Haomai Wang; ceph-devel@vger.kernel.org
  Subject: Re: About Fio backend with ObjectStore API
 
  Hi James,
 
  In the job file src/test/filestore.fio, you can modify the line
  objectstore=filestore to use any objectstore type supported by
 the
  ObjectStore

Re: About Fio backend with ObjectStore API

2015-07-09 Thread Casey Bodley
Hi James,

Are you looking at the code from
https://github.com/linuxbox2/linuxbox-ceph/tree/fio-objectstore? It
uses ObjectStore::create() instead of new FileStore(). This allows us
to exercise all of the object stores with the same code.

Casey

On Thu, Jul 9, 2015 at 2:01 PM, James (Fei) Liu-SSI
james@ssi.samsung.com wrote:
 Hi Casey,
   Here is the code in fio_ceph_filestore.cc. Basically, it creates a 
 filestore as the backend engine for IO exercises. If we want to send IO commands 
 to KeyValueStore or NewStore, we have to change the code accordingly, right?  
 I did not see any other files like fio_ceph_keyvaluestore.cc or 
 fio_ceph_newstore.cc. In my humble opinion, we might need to create two other 
 fio engines for keyvaluestore and newstore if we want to exercise those two, 
 right?

 Regards,
 James

 static int fio_ceph_filestore_init(struct thread_data *td)
 {
   vector<const char*> args;
   struct ceph_filestore_data *ceph_filestore_data = (struct ceph_filestore_data *) td->io_ops->data;
   ObjectStore::Transaction ft;

   global_init(NULL, args, CEPH_ENTITY_TYPE_OSD, CODE_ENVIRONMENT_UTILITY, 0);
   //g_conf->journal_dio = false;
   common_init_finish(g_ceph_context);
   //g_ceph_context->_conf->set_val("debug_filestore", "20");
   //g_ceph_context->_conf->set_val("debug_throttle", "20");
   g_ceph_context->_conf->apply_changes(NULL);

   ceph_filestore_data->osd_path = strdup("/mnt/fio_ceph_filestore.XXX");
   ceph_filestore_data->journal_path = strdup("/var/lib/ceph/osd/journal-ram/fio_ceph_filestore.XXX");

   if (!mkdtemp(ceph_filestore_data->osd_path)) {
     cout << "mkdtemp failed: " << strerror(errno) << std::endl;
     return 1;
   }
   //mktemp(ceph_filestore_data->journal_path); // NOSPC issue

   ObjectStore *fs = new FileStore(ceph_filestore_data->osd_path, ceph_filestore_data->journal_path);
   ceph_filestore_data->fs = fs;

   if (fs->mkfs() < 0) {
     cout << "mkfs failed" << std::endl;
     goto failed;
   }

   if (fs->mount() < 0) {
     cout << "mount failed" << std::endl;
     goto failed;
   }

   ft.create_collection(coll_t());
   fs->apply_transaction(ft);

   return 0;

 failed:
   return 1;
 }
 -----Original Message-----
 From: Casey Bodley [mailto:cbod...@gmail.com]
 Sent: Thursday, July 09, 2015 9:19 AM
 To: James (Fei) Liu-SSI
 Cc: Haomai Wang; ceph-devel@vger.kernel.org
 Subject: Re: About Fio backend with ObjectStore API

 Hi James,

 In the job file src/test/filestore.fio, you can modify the line 
 "objectstore=filestore" to use any objectstore type supported by the
 ObjectStore::create() factory.

 Casey

 On Wed, Jul 8, 2015 at 8:02 PM, James (Fei) Liu-SSI 
 james@ssi.samsung.com wrote:
 Hi Casey,
   Quick question: the code in the trunk only covers the test for filestore. 
 I was wondering, do you have any plan to cover the tests for kvstore and 
 newstore?

   Thanks,
   James

 -----Original Message-----
 From: ceph-devel-ow...@vger.kernel.org
 [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of James (Fei)
 Liu-SSI
 Sent: Tuesday, June 30, 2015 2:19 PM
 To: Casey Bodley
 Cc: Haomai Wang; ceph-devel@vger.kernel.org
 Subject: RE: About Fio backend with ObjectStore API

 Hi Casey,

   Thanks a lot.

   Regards,
   James

 -----Original Message-----
 From: Casey Bodley [mailto:cbod...@gmail.com]
 Sent: Tuesday, June 30, 2015 2:16 PM
 To: James (Fei) Liu-SSI
 Cc: Haomai Wang; ceph-devel@vger.kernel.org
 Subject: Re: About Fio backend with ObjectStore API

 Hi,

 When Danny Al-Gaaf & Daniel Gollub published "Ceph Performance
 Analysis: fio and RBD" at
 https://telekomcloud.github.io/ceph/2014/02/26/ceph-performance-analysis_fio_rbd.html,
 they also mentioned a fio engine that linked directly
 into ceph's FileStore. I was able to find Daniel's branch on github at 
 https://github.com/gollub/ceph/tree/fio_filestore_v2, and did some more work 
 on it at the time.

 I just rebased that work onto the latest ceph master branch, and pushed to 
 our github at 
 https://github.com/linuxbox2/linuxbox-ceph/tree/fio-objectstore. You can 
 find the source in src/test/fio_ceph_filestore.cc, and run fio with the 
 provided example fio job file in src/test/filestore.fio.

 I didn't have a chance to confirm that it builds with automake, but
 the cmake version built for me. I'm happy to help if you run into
 problems, Casey

 On Tue, Jun 30, 2015 at 2:31 PM, James (Fei) Liu-SSI 
 james@ssi.samsung.com wrote:
 Hi Haomai,
   So what you are asking is to benchmark a local objectstore (like 
 kvstore/filestore/newstore) locally with FIO (ObjectStore engine)? That is, you 
 want to purely compare the performance of these objectstores locally, right?

   Regards,
   James

 -Original Message-
 From: ceph-devel-ow

Re: About Fio backend with ObjectStore API

2015-07-09 Thread Casey Bodley
Hi James,

In the job file src/test/filestore.fio, you can modify the line
"objectstore=filestore" to use any objectstore type supported by the
ObjectStore::create() factory.
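For example, switching the same job to another backend is just a matter of editing that line; a sketch (the backend names are assumptions based on the types mentioned in this thread, e.g. keyvaluestore or newstore, and must match whatever ObjectStore::create() recognizes in your tree):

```ini
[keyvaluestore]
iodepth=1
# was: objectstore=filestore
objectstore=keyvaluestore
directory=./osd/
```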

Casey

On Wed, Jul 8, 2015 at 8:02 PM, James (Fei) Liu-SSI
james@ssi.samsung.com wrote:
 Hi Casey,
   Quick question: the code in the trunk only covers the test for filestore. I 
 was wondering, do you have any plan to cover the tests for kvstore and newstore?

   Thanks,
   James

 -----Original Message-----
 From: ceph-devel-ow...@vger.kernel.org 
 [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of James (Fei) Liu-SSI
 Sent: Tuesday, June 30, 2015 2:19 PM
 To: Casey Bodley
 Cc: Haomai Wang; ceph-devel@vger.kernel.org
 Subject: RE: About Fio backend with ObjectStore API

 Hi Casey,

   Thanks a lot.

   Regards,
   James

 -----Original Message-----
 From: Casey Bodley [mailto:cbod...@gmail.com]
 Sent: Tuesday, June 30, 2015 2:16 PM
 To: James (Fei) Liu-SSI
 Cc: Haomai Wang; ceph-devel@vger.kernel.org
 Subject: Re: About Fio backend with ObjectStore API

 Hi,

 When Danny Al-Gaaf & Daniel Gollub published "Ceph Performance
 Analysis: fio and RBD" at
 https://telekomcloud.github.io/ceph/2014/02/26/ceph-performance-analysis_fio_rbd.html,
 they also mentioned a fio engine that linked directly into ceph's FileStore. 
 I was able to find Daniel's branch on github at 
 https://github.com/gollub/ceph/tree/fio_filestore_v2, and did some more work 
 on it at the time.

 I just rebased that work onto the latest ceph master branch, and pushed to 
 our github at 
 https://github.com/linuxbox2/linuxbox-ceph/tree/fio-objectstore. You can find 
 the source in src/test/fio_ceph_filestore.cc, and run fio with the provided 
 example fio job file in src/test/filestore.fio.

 I didn't have a chance to confirm that it builds with automake, but the cmake 
 version built for me. I'm happy to help if you run into problems, Casey

 On Tue, Jun 30, 2015 at 2:31 PM, James (Fei) Liu-SSI 
 james@ssi.samsung.com wrote:
 Hi Haomai,
   So what you are asking is to benchmark a local objectstore (like 
 kvstore/filestore/newstore) locally with FIO (ObjectStore engine)? That is, you 
 want to purely compare the performance of these objectstores locally, right?

   Regards,
   James

 -----Original Message-----
 From: ceph-devel-ow...@vger.kernel.org
 [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Haomai Wang
 Sent: Tuesday, June 30, 2015 9:06 AM
 To: ceph-devel@vger.kernel.org
 Subject: About Fio backend with ObjectStore API

 Hi all,

 Long ago, didn't someone mention a fio backend for the Ceph ObjectStore 
 API? With it, we could use the existing, mature fio facility to benchmark a 
 ceph objectstore.

 --
 Best Regards,

 Wheat


Re: About Fio backend with ObjectStore API

2015-06-30 Thread Casey Bodley
Hi,

When Danny Al-Gaaf & Daniel Gollub published "Ceph Performance
Analysis: fio and RBD" at
https://telekomcloud.github.io/ceph/2014/02/26/ceph-performance-analysis_fio_rbd.html,
they also mentioned a fio engine that linked directly into ceph's
FileStore. I was able to find Daniel's branch on github at
https://github.com/gollub/ceph/tree/fio_filestore_v2, and did some
more work on it at the time.

I just rebased that work onto the latest ceph master branch, and
pushed to our github at
https://github.com/linuxbox2/linuxbox-ceph/tree/fio-objectstore. You
can find the source in src/test/fio_ceph_filestore.cc, and run fio
with the provided example fio job file in src/test/filestore.fio.
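For a rough sense of what such a job file looks like, here is a sketch; the engine path and values below are made-up placeholders, not the contents of the actual src/test/filestore.fio:

```ini
; hypothetical job file for an external ObjectStore fio engine
; (placeholder paths/values; see src/test/filestore.fio for the real example)
[global]
; fio's standard syntax for loading an out-of-tree ioengine
ioengine=external:./libfio_ceph_filestore.so
size=256m
bs=4k
iodepth=1

[write-test]
rw=randwrite
```

You would run it with `fio filestore.fio`; any engine-specific options (e.g. where the ObjectStore keeps its data directory) are defined by the engine itself.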

I didn't have a chance to confirm that it builds with automake, but
the cmake version built for me. I'm happy to help if you run into
problems,
Casey

On Tue, Jun 30, 2015 at 2:31 PM, James (Fei) Liu-SSI
<james@ssi.samsung.com> wrote:
 Hi Haomai,
   What you are asking is whether to benchmark a local objectstore (like 
 kvstore/filestore/newstore) locally with FIO (ObjectStore engine)? You want to 
 purely compare the local performance of these objectstores, right?

   Regards,
   James

 -Original Message-
 From: ceph-devel-ow...@vger.kernel.org 
 [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Haomai Wang
 Sent: Tuesday, June 30, 2015 9:06 AM
 To: ceph-devel@vger.kernel.org
 Subject: About Fio backend with ObjectStore API

 Hi all,

 A long time ago, didn't someone mention a fio backend for the Ceph ObjectStore 
 API? Then we could use the existing, mature fio facility to benchmark a ceph 
 objectstore.

 --
 Best Regards,

 Wheat


CMake blueprint

2014-03-05 Thread Casey Bodley
Hi Ilya,

Regarding the CMake blueprint at 
http://wiki.ceph.com/Planning/Blueprints/Giant/CMake, we at The Linux Box are 
excited to see more interest!  I know that we've made several improvements to 
the CMakeLists on our local branches that haven't made it to our github 
repository.  We'll get everything consolidated, and should have another push 
ready sometime next week.

Thanks,
Casey


Re: parent xattrs on file objects

2012-10-17 Thread Casey Bodley
To expand on what Matt said, we're also trying to address this issue of lookups 
by inode number for use with NFS.

The design we've been exploring is to create a single system inode, designated 
the 'inode container' directory, which stores the primary links to all inodes 
in the filesystem. These links are named by their inode number to satisfy 
lookups and obviate the need for an anchor table. This design allows the inode 
container to make use of existing directory fragmentation and load balancing to 
distribute the inodes over the MDS cluster.

When a new file is created, it then adds two links: a primary link into the 
inode container, and a remote link into the filesystem namespace. In the case 
where the parent directory fragment's authority is different from the 
corresponding inode container fragment's, the inode is created in the parent 
directory and then exported to the inode container via an asynchronous slave 
request.

We welcome additional discussion, both on this design specifically and on the 
general topic of scalable ino lookups.
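As a toy model of that layout (an illustration only, not MDS code; the class name and the hex naming of links are invented for this sketch), file creation adds both links, and lookup-by-ino becomes a plain lookup in the container directory:

```python
# Toy model of the 'inode container' design described above.
# Primary links live in one container directory, named by inode number,
# so lookup-by-ino needs no path traversal and no anchor table.

class InodeContainerFS:
    def __init__(self):
        self.container = {}  # link name (hex ino) -> inode (primary links)
        self.tree = {}       # path -> ino (remote links in the namespace)

    def create(self, path, ino, data=None):
        inode = {'ino': ino, 'data': data}
        self.container[format(ino, 'x')] = inode  # primary link into the container
        self.tree[path] = ino                     # remote link into the namespace
        return inode

    def lookup_by_ino(self, ino):
        # e.g. resolving an old NFS file handle that carries only an ino
        return self.container.get(format(ino, 'x'))

fs = InodeContainerFS()
fs.create('/home/casey/notes', 0x1000, data=b'hi')
print(fs.lookup_by_ino(0x1000)['data'])  # b'hi'
```

In the real design the container is an ordinary directory, so the existing fragmentation and load-balancing machinery distributes these primary links across the MDS cluster for free.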

Casey

- Original Message -
From: Matt W. Benjamin m...@linuxbox.com
To: Sage Weil s...@inktank.com
Cc: ceph-devel@vger.kernel.org, aemerson aemer...@linuxbox.com, casey 
ca...@linuxbox.com, peter honeyman peter.honey...@gmail.com
Sent: Tuesday, October 16, 2012 5:35:12 PM
Subject: Re: parent xattrs on file objects

Hi Sage,

We've been exploring (experimentally implementing) a different solution to this 
problem, basically refactoring dirents and inodes, extending fragmentation 
logic, and adding new metadata location operations.  We also remove the anchor 
table.  We were planning to ask for some feedback once we had some initial 
results, but since you're floating a related idea, we'd like to share what we 
have so far.  CC'ing people.

Regards,

Matt

- Sage Weil s...@inktank.com wrote:

 Hey-
 
 One of the design goals of the ceph fs was to keep metadata separate
 from 
 data.  This means, among other things, that when a client is creating
 a 
 bunch of files, it creates the inode via the mds and writes the file
 data 
 to the OSD, but no mds-osd interaction is necessary.
 
 One of the challenges we currently have is that it is difficult to
 look up an inode by ino.  Normally clients traverse the hierarchy to
 get there, so things are fine for native ceph clients, but when
 reexporting via NFS we can get ESTALE because an ancient nfs file
 handle can be presented and the ceph MDS won't know where to find it.
 We have a similar problem with the fsck design in that it is not
 always possible to discover orphaned children of a directory that was
 somehow lost.
 
 One option is to put an ancestor xattr on the first object for each
 file, 
 similar to what we do for directories.  This basically means that each
 
 file creation will be followed (eventually) by a setxattr osd
 operation.  
 This used to scare me, but now it's seeming like a pretty small price
 to 
 pay for robust NFS reexport and additional information for fsck to 
 utilize.
 
 It's also nice because it means we could get rid of the anchor table
 (used for locating files with multiple hard links) entirely and use
 the ancestor xattrs instead.  That means one less thing to fsck, and
 avoids having to invest any time in making the anchor table
 effectively scale (it currently doesn't).
 
 Anyone feel like we shouldn't go ahead and do this?
 
 sage

-- 
Matt Benjamin
The Linux Box
206 South Fifth Ave. Suite 150
Ann Arbor, MI  48104

http://linuxbox.com

tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309


Re: parent xattrs on file objects

2012-10-17 Thread Casey Bodley
Hi Greg,

In this case where an inode is created on mds.a and exported to mds.b, there is 
a potential race on mds.b between a subsequent lookup-by-ino and the primary 
link actually making it into the inode container.

Our tentative solution was to rely on the way InoTable breaks up the range of 
inode numbers based on mds nodeid. So when a lookup on the inode container 
fails, we can determine which mds would have allocated that inode number and 
attempt to find the inode there. The originating mds.a should always find the 
inode in its cache while it's pinned for export. Depending on whether the inode 
is found on mds.a, the lookup-by-ino on mds.b either returns failure or waits 
for the import to finish.
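That fallback can be sketched as simple arithmetic, assuming the InoTable hands each MDS rank a fixed contiguous range of inode numbers (the base and range size below are invented placeholders, not Ceph's real constants):

```python
# Map an inode number back to the MDS rank whose InoTable range contains it.
# INO_BASE and RANGE_PER_MDS are placeholder values for illustration.

INO_BASE = 0x10000000      # first client-allocatable ino (assumed)
RANGE_PER_MDS = 0x1000000  # size of each rank's allocation range (assumed)

def mds_for_ino(ino):
    """Return the rank that would have allocated this ino."""
    if ino < INO_BASE:
        raise ValueError('not a client-allocated ino')
    return (ino - INO_BASE) // RANGE_PER_MDS

# mds.b retries a failed container lookup at this rank; if the inode is
# still pinned there for export, the lookup waits for the import to finish.
print(mds_for_ino(INO_BASE + 5))                  # 0
print(mds_for_ino(INO_BASE + 2 * RANGE_PER_MDS))  # 2
```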

Casey

- Original Message -
From: Gregory Farnum g...@inktank.com
To: Casey Bodley ca...@linuxbox.com
Cc: Matt W. Benjamin m...@linuxbox.com, ceph-devel@vger.kernel.org, 
aemerson aemer...@linuxbox.com, peter honeyman 
peter.honey...@gmail.com, Sage Weil s...@inktank.com
Sent: Wednesday, October 17, 2012 4:18:04 PM
Subject: Re: parent xattrs on file objects

On Wed, Oct 17, 2012 at 12:40 PM, Casey Bodley ca...@linuxbox.com wrote:
 To expand on what Matt said, we're also trying to address this issue of 
 lookups by inode number for use with NFS.

 The design we've been exploring is to create a single system inode, 
 designated the 'inode container' directory, which stores the primary links to 
 all inodes in the filesystem. These links are named by their inode number to 
 satisfy lookups and obviate the need for an anchor table. This design allows 
 the inode container to make use of existing directory fragmentation and load 
 balancing to distribute the inodes over the MDS cluster.

 When a new file is created, it then adds two links: a primary link into the 
 inode container, and a remote link into the filesystem namespace. In the case 
 where the parent directory fragment's authority is different from the 
 corresponding inode container fragment's, the inode is created in the 
 parent directory and then exported to the inode container via an 
 asynchronous slave request.

 We welcome additional discussion, both on this design specifically and on the 
 general topic of scalable ino lookups.

So if the primary link isn't always in the inode container, you must
be preserving the anchor table for this setup. Am I understanding that
correctly? Or is there some other mechanism for linking them that's
less expensive?
-Greg