Re: [ceph-users] RADOS as a simple object storage
Wido den Hollander wrote:
:
: > On 27 February 2017 at 15:59, Jan Kasprzak wrote:
: > : > : > Here are some statistics from our biggest instance of the object storage:
: > : > : >
: > : > : > objects stored: 100_000_000
: > : > : > < 1024 bytes: 10_000_000
: > : > : > 1k-64k bytes: 80_000_000
: > : > : > 64k-4M bytes: 10_000_000
: > : > : > 4M-256M bytes: 1_000_000
: > : > : > > 256M bytes: 10_000
: > : > : > biggest object: 15 GBytes
: > : > : >
: > : > : > Would it be feasible to put 100M to 1G objects as native RADOS objects
: > : > : > into a single pool?
[...]
: > https://github.com/ceph/ceph/blob/master/src/libradosstriper/RadosStriperImpl.cc#L33
: >
: > If I understand it correctly, it looks like libradosstriper only splits
: > large stored objects into smaller pieces (RADOS objects), but does not
: > consolidate multiple small stored objects into larger RADOS objects.
:
: Why would you want to do that? Yes, very small objects can be a problem
: if you have millions of them since it takes a bit more to replicate them
: and recover them.

Yes. This is what I was afraid of. The immutability of my objects would
allow me to consolidate smaller objects into larger bundles, but if you say
it is not necessary for a problem of my size, I'll store them as individual
RADOS objects.

: But overall I wouldn't bother about it too much.

OK, thanks!

: > So do you think I am ok with >10M tiny objects (smaller than 1KB)
: > and ~100,000,000 to 1,000,000,000 total objects, provided that I split
: > huge objects using libradosstriper?

-Yenya

--
| Jan "Yenya" Kasprzak |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
Assuming that OpenSSL is written as carefully as Wietse's own code,
every 1000 lines introduce one additional bug into Postfix." --TLS_README
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RADOS as a simple object storage
> On 27 February 2017 at 15:59, Jan Kasprzak wrote:
>
> Hello,
>
> Gregory Farnum wrote:
> : On Mon, Feb 20, 2017 at 11:57 AM, Jan Kasprzak wrote:
> : > Gregory Farnum wrote:
> : > : On Mon, Feb 20, 2017 at 6:46 AM, Jan Kasprzak wrote:
> : > : >
> : > : > I have been using CEPH RBD for a year or so as a virtual machine storage
> : > : > backend, and I am thinking about moving another of our subsystems to CEPH:
> [...]
> : > : > Here are some statistics from our biggest instance of the object storage:
> : > : >
> : > : > objects stored: 100_000_000
> : > : > < 1024 bytes: 10_000_000
> : > : > 1k-64k bytes: 80_000_000
> : > : > 64k-4M bytes: 10_000_000
> : > : > 4M-256M bytes: 1_000_000
> : > : > > 256M bytes: 10_000
> : > : > biggest object: 15 GBytes
> : > : >
> : > : > Would it be feasible to put 100M to 1G objects as native RADOS objects
> : > : > into a single pool?
> : > :
> : > : This is well outside the object size RADOS is targeted or tested with;
> : > : I'd expect issues. You might want to look at libradosstriper from the
> : > : requirements you've mentioned.
> : >
> : > OK, thanks! Is there any documentation for libradosstriper?
> : > I am looking for something similar to the librados documentation:
> : > http://docs.ceph.com/docs/master/rados/api/librados/
> :
> : Not that I see, and I haven't used it myself, but the header file (see
> : ceph/src/libradosstriper) seems to have reasonable function docs. It's
> : a fairly thin wrapper around librados AFAIK.
>
> OK, I have read the docs in the header file and the comment
> near the top of RadosStriperImpl.cc:
>
> https://github.com/ceph/ceph/blob/master/src/libradosstriper/RadosStriperImpl.cc#L33
>
> If I understand it correctly, it looks like libradosstriper only splits
> large stored objects into smaller pieces (RADOS objects), but does not
> consolidate multiple small stored objects into larger RADOS objects.

Why would you want to do that?

Yes, very small objects can be a problem if you have millions of them since
it takes a bit more to replicate them and recover them.

But overall I wouldn't bother about it too much.

Wido

> So do you think I am ok with >10M tiny objects (smaller than 1KB)
> and ~100,000,000 to 1,000,000,000 total objects, provided that I split
> huge objects using libradosstriper?
>
> Thanks,
>
> -Yenya
Re: [ceph-users] RADOS as a simple object storage
Hello,

Gregory Farnum wrote:
: On Mon, Feb 20, 2017 at 11:57 AM, Jan Kasprzak wrote:
: > Gregory Farnum wrote:
: > : On Mon, Feb 20, 2017 at 6:46 AM, Jan Kasprzak wrote:
: > : >
: > : > I have been using CEPH RBD for a year or so as a virtual machine storage
: > : > backend, and I am thinking about moving another of our subsystems to CEPH:
[...]
: > : > Here are some statistics from our biggest instance of the object storage:
: > : >
: > : > objects stored: 100_000_000
: > : > < 1024 bytes: 10_000_000
: > : > 1k-64k bytes: 80_000_000
: > : > 64k-4M bytes: 10_000_000
: > : > 4M-256M bytes: 1_000_000
: > : > > 256M bytes: 10_000
: > : > biggest object: 15 GBytes
: > : >
: > : > Would it be feasible to put 100M to 1G objects as native RADOS objects
: > : > into a single pool?
: > :
: > : This is well outside the object size RADOS is targeted or tested with;
: > : I'd expect issues. You might want to look at libradosstriper from the
: > : requirements you've mentioned.
: >
: > OK, thanks! Is there any documentation for libradosstriper?
: > I am looking for something similar to the librados documentation:
: > http://docs.ceph.com/docs/master/rados/api/librados/
:
: Not that I see, and I haven't used it myself, but the header file (see
: ceph/src/libradosstriper) seems to have reasonable function docs. It's
: a fairly thin wrapper around librados AFAIK.

OK, I have read the docs in the header file and the comment
near the top of RadosStriperImpl.cc:

https://github.com/ceph/ceph/blob/master/src/libradosstriper/RadosStriperImpl.cc#L33

If I understand it correctly, it looks like libradosstriper only splits
large stored objects into smaller pieces (RADOS objects), but does not
consolidate multiple small stored objects into larger RADOS objects.

So do you think I am ok with >10M tiny objects (smaller than 1KB)
and ~100,000,000 to 1,000,000,000 total objects, provided that I split
huge objects using libradosstriper?

Thanks,

-Yenya

--
| Jan "Yenya" Kasprzak |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
Assuming that OpenSSL is written as carefully as Wietse's own code,
every 1000 lines introduce one additional bug into Postfix." --TLS_README
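For scale, the question above can be turned into a rough upper bound on the number of RADOS objects after striping. The sketch below is only an illustration: the 4 MiB piece size is an assumption (it is not stated anywhere in this thread), the names `pieces_for` and `worst_case_pieces` are made up, each bucket is counted at its upper size limit, and every object above 256M is treated as if it were the 15 GB maximum.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

# Assumed piece size (4 MiB); not taken from the thread.
PIECE_SIZE = 4 * MIB


def pieces_for(size_bytes: int, piece_size: int = PIECE_SIZE) -> int:
    """Number of RADOS objects needed to hold one stored object of this size."""
    # Integer ceiling division; even an empty object occupies one piece.
    return max(1, -(-size_bytes // piece_size))


# (upper size bound in bytes, object count) for each bucket of the statistics.
BUCKETS = [
    (1024, 10_000_000),        # < 1024 bytes
    (64 * 1024, 80_000_000),   # 1k-64k bytes
    (4 * MIB, 10_000_000),     # 64k-4M bytes
    (256 * MIB, 1_000_000),    # 4M-256M bytes
    (15 * GIB, 10_000),        # > 256M bytes, bounded by the biggest object
]


def worst_case_pieces(buckets=BUCKETS, piece_size: int = PIECE_SIZE) -> int:
    """Upper bound on the total piece count if every object hits its bucket limit."""
    return sum(count * pieces_for(upper, piece_size) for upper, count in buckets)


if __name__ == "__main__":
    print(worst_case_pieces())
```

Under these assumptions `worst_case_pieces()` comes to 202,400,000, which is still inside the 100M to 1G range asked about. Any extra metadata the striper keeps per object is ignored here.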
Re: [ceph-users] RADOS as a simple object storage
On Mon, Feb 20, 2017 at 11:57 AM, Jan Kasprzak wrote:
> Gregory Farnum wrote:
> : On Mon, Feb 20, 2017 at 6:46 AM, Jan Kasprzak wrote:
> : > Hello, world!\n
> : >
> : > I have been using CEPH RBD for a year or so as a virtual machine storage
> : > backend, and I am thinking about moving another of our subsystems to CEPH:
> : >
> : > The subsystem in question is a simple replicated object storage,
> : > currently implemented in custom C code by yours truly. My question
> : > is whether implementing such a thing on top of a CEPH RADOS pool and librados
> : > is feasible, and what layout and optimizations you would suggest.
> : >
> : > Our object storage indexes objects by a numeric ID. The access methods
> : > involve creating, reading and deleting objects. Objects are never modified
> : > in place; they are instead deleted and an object with a new ID is created.
> : > We also keep a hash of each object's contents and use it to guard against
> : > bit rot: the objects are scrubbed periodically, and if a checksum mismatch
> : > is discovered, the object is restored from another replica.
> : >
> : > Here are some statistics from our biggest instance of the object storage:
> : >
> : > objects stored: 100_000_000
> : > < 1024 bytes: 10_000_000
> : > 1k-64k bytes: 80_000_000
> : > 64k-4M bytes: 10_000_000
> : > 4M-256M bytes: 1_000_000
> : > > 256M bytes: 10_000
> : > biggest object: 15 GBytes
> : >
> : > Would it be feasible to put 100M to 1G objects as native RADOS objects
> : > into a single pool?
> :
> : This is well outside the object size RADOS is targeted or tested with;
> : I'd expect issues. You might want to look at libradosstriper from the
> : requirements you've mentioned.
>
> OK, thanks! Is there any documentation for libradosstriper?
> I am looking for something similar to the librados documentation:
> http://docs.ceph.com/docs/master/rados/api/librados/

Not that I see, and I haven't used it myself, but the header file (see
ceph/src/libradosstriper) seems to have reasonable function docs. It's
a fairly thin wrapper around librados AFAIK.
-Greg
Re: [ceph-users] RADOS as a simple object storage
Gregory Farnum wrote:
: On Mon, Feb 20, 2017 at 6:46 AM, Jan Kasprzak wrote:
: > Hello, world!\n
: >
: > I have been using CEPH RBD for a year or so as a virtual machine storage
: > backend, and I am thinking about moving another of our subsystems to CEPH:
: >
: > The subsystem in question is a simple replicated object storage,
: > currently implemented in custom C code by yours truly. My question
: > is whether implementing such a thing on top of a CEPH RADOS pool and librados
: > is feasible, and what layout and optimizations you would suggest.
: >
: > Our object storage indexes objects by a numeric ID. The access methods
: > involve creating, reading and deleting objects. Objects are never modified
: > in place; they are instead deleted and an object with a new ID is created.
: > We also keep a hash of each object's contents and use it to guard against
: > bit rot: the objects are scrubbed periodically, and if a checksum mismatch
: > is discovered, the object is restored from another replica.
: >
: > Here are some statistics from our biggest instance of the object storage:
: >
: > objects stored: 100_000_000
: > < 1024 bytes: 10_000_000
: > 1k-64k bytes: 80_000_000
: > 64k-4M bytes: 10_000_000
: > 4M-256M bytes: 1_000_000
: > > 256M bytes: 10_000
: > biggest object: 15 GBytes
: >
: > Would it be feasible to put 100M to 1G objects as native RADOS objects
: > into a single pool?
:
: This is well outside the object size RADOS is targeted or tested with;
: I'd expect issues. You might want to look at libradosstriper from the
: requirements you've mentioned.

OK, thanks! Is there any documentation for libradosstriper?
I am looking for something similar to the librados documentation:
http://docs.ceph.com/docs/master/rados/api/librados/

Thanks!

-Yenya

--
| Jan "Yenya" Kasprzak |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
Assuming that OpenSSL is written as carefully as Wietse's own code,
every 1000 lines introduce one additional bug into Postfix." --TLS_README
Re: [ceph-users] RADOS as a simple object storage
On Mon, Feb 20, 2017 at 6:46 AM, Jan Kasprzak wrote:
> Hello, world!\n
>
> I have been using CEPH RBD for a year or so as a virtual machine storage
> backend, and I am thinking about moving another of our subsystems to CEPH:
>
> The subsystem in question is a simple replicated object storage,
> currently implemented in custom C code by yours truly. My question
> is whether implementing such a thing on top of a CEPH RADOS pool and librados
> is feasible, and what layout and optimizations you would suggest.
>
> Our object storage indexes objects by a numeric ID. The access methods
> involve creating, reading and deleting objects. Objects are never modified
> in place; they are instead deleted and an object with a new ID is created.
> We also keep a hash of each object's contents and use it to guard against
> bit rot: the objects are scrubbed periodically, and if a checksum mismatch
> is discovered, the object is restored from another replica.
>
> Here are some statistics from our biggest instance of the object storage:
>
> objects stored: 100_000_000
> < 1024 bytes: 10_000_000
> 1k-64k bytes: 80_000_000
> 64k-4M bytes: 10_000_000
> 4M-256M bytes: 1_000_000
> > 256M bytes: 10_000
> biggest object: 15 GBytes
>
> Would it be feasible to put 100M to 1G objects as native RADOS objects
> into a single pool?

This is well outside the object size RADOS is targeted or tested with;
I'd expect issues. You might want to look at libradosstriper from the
requirements you've mentioned.

> Or should I consider their read-only nature and pack them
> into bigger objects/packs with metadata stored in a tmap object, and repack
> those packed objects periodically as older objects get deleted?

Definitely don't do that, see above. ;)
-Greg
[ceph-users] RADOS as a simple object storage
Hello, world!\n

I have been using CEPH RBD for a year or so as a virtual machine storage
backend, and I am thinking about moving another of our subsystems to CEPH:

The subsystem in question is a simple replicated object storage,
currently implemented in custom C code by yours truly. My question
is whether implementing such a thing on top of a CEPH RADOS pool and librados
is feasible, and what layout and optimizations you would suggest.

Our object storage indexes objects by a numeric ID. The access methods
involve creating, reading and deleting objects. Objects are never modified
in place; they are instead deleted and an object with a new ID is created.
We also keep a hash of each object's contents and use it to guard against
bit rot: the objects are scrubbed periodically, and if a checksum mismatch
is discovered, the object is restored from another replica.

Here are some statistics from our biggest instance of the object storage:

objects stored: 100_000_000
< 1024 bytes: 10_000_000
1k-64k bytes: 80_000_000
64k-4M bytes: 10_000_000
4M-256M bytes: 1_000_000
> 256M bytes: 10_000
biggest object: 15 GBytes

Would it be feasible to put 100M to 1G objects as native RADOS objects
into a single pool? Or should I consider their read-only nature and pack them
into bigger objects/packs with metadata stored in a tmap object, and repack
those packed objects periodically as older objects get deleted?

I have also considered rados-gw, but it looks like too big a hammer
for my nail :-)

Thanks for your suggestions,

-Yenya

--
| Jan "Yenya" Kasprzak |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
Assuming that OpenSSL is written as carefully as Wietse's own code,
every 1000 lines introduce one additional bug into Postfix." --TLS_README
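The access pattern described in this message (numeric IDs, create/read/delete only, a stored content hash checked by a periodic scrub) can be sketched as follows. This is a toy model and not the author's C implementation: the in-memory dict stands in for the storage backend, SHA-256 is an assumed choice of hash, and the class and method names are hypothetical.

```python
import hashlib


class ChecksummedStore:
    """Toy model: immutable objects keyed by numeric ID, each with a content hash.

    The dict backend is a stand-in chosen for illustration only.
    """

    def __init__(self):
        self._data = {}     # object ID -> bytes
        self._digest = {}   # object ID -> hex SHA-256 of the contents
        self._next_id = 1

    def create(self, payload: bytes) -> int:
        """Store a new object and return its freshly allocated numeric ID."""
        oid = self._next_id
        self._next_id += 1
        self._data[oid] = bytes(payload)
        self._digest[oid] = hashlib.sha256(payload).hexdigest()
        return oid

    def read(self, oid: int) -> bytes:
        """Return the contents, refusing to hand out data that fails its checksum."""
        payload = self._data[oid]
        if hashlib.sha256(payload).hexdigest() != self._digest[oid]:
            raise IOError(f"checksum mismatch for object {oid}")
        return payload

    def delete(self, oid: int) -> None:
        """Remove an object; there is no in-place update, only delete and re-create."""
        del self._data[oid]
        del self._digest[oid]

    def scrub(self) -> list:
        """Return the IDs whose contents no longer match their stored hash."""
        return [
            oid
            for oid, payload in self._data.items()
            if hashlib.sha256(payload).hexdigest() != self._digest[oid]
        ]
```

With librados, the dict operations would correspond to whole-object write, read and remove calls on a pool, with the numeric ID rendered as the object name. Restoring a damaged object from another replica is left out of this sketch.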