Re: [ceph-users] RADOS as a simple object storage

2017-03-01 Thread Jan Kasprzak
Wido den Hollander wrote:
: 
: > On 27 February 2017 at 15:59, Jan Kasprzak wrote:
: > : > : > Here are some statistics from our biggest instance of the object storage:
: > : > : >
: > : > : > objects stored: 100_000_000
: > : > : > < 1024 bytes:10_000_000
: > : > : > 1k-64k bytes:80_000_000
: > : > : > 64k-4M bytes:10_000_000
: > : > : > 4M-256M bytes:1_000_000
: > : > : > > 256M bytes:10_000
: > : > : > biggest object:   15 GBytes
: > : > : >
: > : > : > Would it be feasible to put 100M to 1G objects as native RADOS objects
: > : > : > into a single pool?
[...]
: > 
: > https://github.com/ceph/ceph/blob/master/src/libradosstriper/RadosStriperImpl.cc#L33
: > 
: > If I understand it correctly, it looks like libradosstriper only splits
: > large stored objects into smaller pieces (RADOS objects), but does not
: > consolidate multiple small stored objects into larger RADOS objects.
: 
: Why would you want to do that? Yes, very small objects can be a problem if
: you have millions of them, since it takes a bit more effort to replicate and
: recover them.

Yes. This is what I was afraid of. The immutability of my objects
would allow me to consolidate smaller objects into larger bundles, but
if you say it is not necessary for a problem of my size, I'll store them as
individual RADOS objects.
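To put rough numbers on this plan, here is a minimal sketch that applies a size cut-off to the histogram from the original post: objects at or below the cut-off stay individual RADOS objects, larger ones go through libradosstriper. The 4 MiB cut-off, the `histogram` table and the helper names are hypothetical, chosen only because 4M is a bucket boundary in the statistics; the bucket bounds assume k = 1024 and M = 1024 * 1024.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the size histogram from the original post.
 * `upper` is an approximate inclusive upper bound in bytes. */
struct size_class {
    uint64_t upper;
    uint64_t count;
};

static const struct size_class histogram[] = {
    { 1023,         10000000 },  /* < 1024 bytes  */
    { 64ULL << 10,  80000000 },  /* 1k-64k bytes  */
    { 4ULL << 20,   10000000 },  /* 64k-4M bytes  */
    { 256ULL << 20,  1000000 },  /* 4M-256M bytes */
    { UINT64_MAX,      10000 },  /* > 256M bytes  */
};

/* Hypothetical cut-off: anything larger goes through libradosstriper. */
#define STRIPE_THRESHOLD (4ULL << 20)

/* Per-object decision: striped (1) or a single native RADOS object (0). */
static int use_striper(uint64_t size)
{
    return size > STRIPE_THRESHOLD;
}

/* Sum the histogram into native vs. striped object counts,
 * classifying each size class by its upper bound. */
static void split_counts(uint64_t *native, uint64_t *striped)
{
    assert(native != NULL && striped != NULL);
    *native = 0;
    *striped = 0;
    for (size_t i = 0; i < sizeof histogram / sizeof histogram[0]; i++) {
        if (use_striper(histogram[i].upper))
            *striped += histogram[i].count;
        else
            *native += histogram[i].count;
    }
}
```

Under these assumptions `split_counts()` yields 100,000,000 native objects and 1,010,000 striped ones (the buckets sum to 101,010,000, slightly above the rounded total in the statistics). Each striped object is then split further into several RADOS objects, so the pool's real object count ends up higher than the first figure alone.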
: 
: But overall I wouldn't bother about it too much.

OK, thanks!

: > So do you think I am ok with >10M tiny objects (smaller than 1KB)
: > and ~100,000,000 to 1,000,000,000 total objects, provided that I split
: > huge objects using libradosstriper?

-Yenya

-- 
| Jan "Yenya" Kasprzak  |
| http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 |
"Assuming that OpenSSL is written as carefully as Wietse's own code,
every 1000 lines introduce one additional bug into Postfix."   --TLS_README
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RADOS as a simple object storage

2017-02-28 Thread Wido den Hollander

> On 27 February 2017 at 15:59, Jan Kasprzak wrote:
> 
> 
>   Hello,
> 
> Gregory Farnum wrote:
> : On Mon, Feb 20, 2017 at 11:57 AM, Jan Kasprzak  wrote:
> : > Gregory Farnum wrote:
> : > : On Mon, Feb 20, 2017 at 6:46 AM, Jan Kasprzak  wrote:
> : > : >
> : > : > I have been using Ceph RBD for a year or so as a virtual machine storage
> : > : > backend, and I am thinking about moving another of our subsystems to Ceph:
> [...]
> : > : > Here are some statistics from our biggest instance of the object storage:
> : > : >
> : > : > objects stored: 100_000_000
> : > : > < 1024 bytes:10_000_000
> : > : > 1k-64k bytes:80_000_000
> : > : > 64k-4M bytes:10_000_000
> : > : > 4M-256M bytes:1_000_000
> : > : > > 256M bytes:10_000
> : > : > biggest object:   15 GBytes
> : > : >
> : > : > Would it be feasible to put 100M to 1G objects as native RADOS objects
> : > : > into a single pool?
> : > :
> : > : This is well outside the object size RADOS is targeted at or tested with;
> : > : I'd expect issues. You might want to look at libradosstriper given the
> : > : requirements you've mentioned.
> : >
> : > OK, thanks! Is there any documentation for libradosstriper?
> : > I am looking for something similar to librados documentation:
> : > http://docs.ceph.com/docs/master/rados/api/librados/
> : 
> : Not that I see, and I haven't used it myself, but the header file (see
> : ceph/src/libradosstriper) seems to have reasonable function docs. It's
> : a fairly thin wrapper around librados AFAIK.
> 
>   OK, I have read the docs in the header file and the comment
> near the top of RadosStriperImpl.cc:
> 
> https://github.com/ceph/ceph/blob/master/src/libradosstriper/RadosStriperImpl.cc#L33
> 
> If I understand it correctly, it looks like libradosstriper only splits
> large stored objects into smaller pieces (RADOS objects), but does not
> consolidate multiple small stored objects into larger RADOS objects.

Why would you want to do that? Yes, very small objects can be a problem if you
have millions of them, since it takes a bit more effort to replicate and recover
them.

But overall I wouldn't bother about it too much.

Wido

> 
>   So do you think I am ok with >10M tiny objects (smaller than 1KB)
> and ~100,000,000 to 1,000,000,000 total objects, provided that I split
> huge objects using libradosstriper?
> 
>   Thanks,
> 
> -Yenya


Re: [ceph-users] RADOS as a simple object storage

2017-02-27 Thread Jan Kasprzak
Hello,

Gregory Farnum wrote:
: On Mon, Feb 20, 2017 at 11:57 AM, Jan Kasprzak  wrote:
: > Gregory Farnum wrote:
: > : On Mon, Feb 20, 2017 at 6:46 AM, Jan Kasprzak  wrote:
: > : >
: > : > I have been using Ceph RBD for a year or so as a virtual machine storage
: > : > backend, and I am thinking about moving another of our subsystems to Ceph:
[...]
: > : > Here are some statistics from our biggest instance of the object storage:
: > : >
: > : > objects stored: 100_000_000
: > : > < 1024 bytes:10_000_000
: > : > 1k-64k bytes:80_000_000
: > : > 64k-4M bytes:10_000_000
: > : > 4M-256M bytes:1_000_000
: > : > > 256M bytes:10_000
: > : > biggest object:   15 GBytes
: > : >
: > : > Would it be feasible to put 100M to 1G objects as native RADOS objects
: > : > into a single pool?
: > :
: > : This is well outside the object size RADOS is targeted at or tested with;
: > : I'd expect issues. You might want to look at libradosstriper given the
: > : requirements you've mentioned.
: >
: > OK, thanks! Is there any documentation for libradosstriper?
: > I am looking for something similar to librados documentation:
: > http://docs.ceph.com/docs/master/rados/api/librados/
: 
: Not that I see, and I haven't used it myself, but the header file (see
: ceph/src/libradosstriper) seems to have reasonable function docs. It's
: a fairly thin wrapper around librados AFAIK.

OK, I have read the docs in the header file and the comment
near the top of RadosStriperImpl.cc:

https://github.com/ceph/ceph/blob/master/src/libradosstriper/RadosStriperImpl.cc#L33

If I understand it correctly, it looks like libradosstriper only splits
large stored objects into smaller pieces (RADOS objects), but does not
consolidate multiple small stored objects into larger RADOS objects.
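To make "splits large stored objects into smaller pieces" concrete, here is a minimal sketch of the usual Ceph striping arithmetic (stripe unit, stripe count, object size) that, as far as we understand it, libradosstriper also follows. The `struct layout` type and the function names are hypothetical, the layout values in the tests are illustrative rather than libradosstriper's defaults, and the arithmetic should be checked against RadosStriperImpl.cc before relying on it.

```c
#include <assert.h>
#include <stdint.h>

/* Striping layout, using Ceph's usual terminology. */
struct layout {
    uint64_t stripe_unit;   /* bytes per chunk written to one object */
    uint64_t stripe_count;  /* number of objects a stripe is spread across */
    uint64_t object_size;   /* maximum bytes per RADOS object */
};

static void check_layout(const struct layout *l)
{
    assert(l->stripe_unit > 0 && l->stripe_count > 0);
    assert(l->object_size >= l->stripe_unit);
    assert(l->object_size % l->stripe_unit == 0);
}

/* Map a logical byte offset to (piece index, offset inside that piece). */
static void map_offset(const struct layout *l, uint64_t off,
                       uint64_t *objectno, uint64_t *obj_off)
{
    check_layout(l);
    uint64_t units_per_object = l->object_size / l->stripe_unit;
    uint64_t blockno = off / l->stripe_unit;         /* stripe unit index */
    uint64_t stripeno = blockno / l->stripe_count;   /* row of the stripe */
    uint64_t stripepos = blockno % l->stripe_count;  /* column in the row */
    uint64_t objectsetno = stripeno / units_per_object;

    *objectno = objectsetno * l->stripe_count + stripepos;
    *obj_off = (stripeno % units_per_object) * l->stripe_unit
             + off % l->stripe_unit;
}

/* Number of RADOS pieces an object of `size` bytes (> 0) occupies. */
static uint64_t piece_count(const struct layout *l, uint64_t size)
{
    check_layout(l);
    assert(size > 0);
    uint64_t units_per_object = l->object_size / l->stripe_unit;
    uint64_t last_block = (size - 1) / l->stripe_unit;
    uint64_t last_stripe = last_block / l->stripe_count;
    uint64_t last_set = last_stripe / units_per_object;

    /* Every earlier object set is fully used. */
    uint64_t pieces = last_set * l->stripe_count;
    /* In the last set, a second row means every column was touched;
     * otherwise only the columns of its first row up to the last block. */
    if (last_stripe > last_set * units_per_object)
        pieces += l->stripe_count;
    else
        pieces += last_block % l->stripe_count + 1;
    return pieces;
}
```

With a 4 MiB stripe unit, a stripe count of 1 and a 4 MiB object size, `piece_count()` returns 3840 for a 15 GiB object, so the biggest object in the statistics (treating 15 GBytes as 15 GiB) would become 3840 RADOS objects under that assumed layout. A stripe count above 1 spreads consecutive stripe units over several objects, which can let a large read or write touch several OSDs in parallel.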

So do you think I am ok with >10M tiny objects (smaller than 1KB)
and ~100,000,000 to 1,000,000,000 total objects, provided that I split
huge objects using libradosstriper?

Thanks,

-Yenya



Re: [ceph-users] RADOS as a simple object storage

2017-02-20 Thread Gregory Farnum
On Mon, Feb 20, 2017 at 11:57 AM, Jan Kasprzak  wrote:
> Gregory Farnum wrote:
> : On Mon, Feb 20, 2017 at 6:46 AM, Jan Kasprzak  wrote:
> : > Hello, world!\n
> : >
> : > I have been using Ceph RBD for a year or so as a virtual machine storage
> : > backend, and I am thinking about moving another of our subsystems to Ceph:
> : >
> : > The subsystem in question is a simple replicated object storage,
> : > currently implemented in custom C code by yours truly. My question
> : > is whether implementing such a thing on top of a Ceph RADOS pool and librados
> : > is feasible, and what layout and optimizations you would suggest.
> : >
> : > Our object storage indexes objects by a numeric ID. The access methods
> : > involve creating, reading and deleting objects. Objects are never modified
> : > in place; instead, they are deleted and an object with a new ID is created.
> : > We also keep a hash of each object's contents and use it to protect against
> : > bit rot: the objects are scrubbed periodically, and if a checksum mismatch
> : > is discovered, the object is restored from another replica.
> : >
> : > Here are some statistics from our biggest instance of the object storage:
> : >
> : > objects stored: 100_000_000
> : > < 1024 bytes:10_000_000
> : > 1k-64k bytes:80_000_000
> : > 64k-4M bytes:10_000_000
> : > 4M-256M bytes:1_000_000
> : > > 256M bytes:10_000
> : > biggest object:   15 GBytes
> : >
> : > Would it be feasible to put 100M to 1G objects as native RADOS objects
> : > into a single pool?
> :
> : This is well outside the object size RADOS is targeted at or tested with;
> : I'd expect issues. You might want to look at libradosstriper given the
> : requirements you've mentioned.
>
> OK, thanks! Is there any documentation for libradosstriper?
> I am looking for something similar to librados documentation:
> http://docs.ceph.com/docs/master/rados/api/librados/

Not that I see, and I haven't used it myself, but the header file (see
ceph/src/libradosstriper) seems to have reasonable function docs. It's
a fairly thin wrapper around librados AFAIK.
-Greg


Re: [ceph-users] RADOS as a simple object storage

2017-02-20 Thread Jan Kasprzak
Gregory Farnum wrote:
: On Mon, Feb 20, 2017 at 6:46 AM, Jan Kasprzak  wrote:
: > Hello, world!\n
: >
: > I have been using Ceph RBD for a year or so as a virtual machine storage
: > backend, and I am thinking about moving another of our subsystems to Ceph:
: >
: > The subsystem in question is a simple replicated object storage,
: > currently implemented in custom C code by yours truly. My question
: > is whether implementing such a thing on top of a Ceph RADOS pool and librados
: > is feasible, and what layout and optimizations you would suggest.
: >
: > Our object storage indexes objects by a numeric ID. The access methods
: > involve creating, reading and deleting objects. Objects are never modified
: > in place; instead, they are deleted and an object with a new ID is created.
: > We also keep a hash of each object's contents and use it to protect against
: > bit rot: the objects are scrubbed periodically, and if a checksum mismatch
: > is discovered, the object is restored from another replica.
: >
: > Here are some statistics from our biggest instance of the object storage:
: >
: > objects stored: 100_000_000
: > < 1024 bytes:10_000_000
: > 1k-64k bytes:80_000_000
: > 64k-4M bytes:10_000_000
: > 4M-256M bytes:1_000_000
: > > 256M bytes:10_000
: > biggest object:   15 GBytes
: >
: > Would it be feasible to put 100M to 1G objects as native RADOS objects
: > into a single pool?
: 
: This is well outside the object size RADOS is targeted at or tested with;
: I'd expect issues. You might want to look at libradosstriper given the
: requirements you've mentioned.

OK, thanks! Is there any documentation for libradosstriper?
I am looking for something similar to librados documentation:
http://docs.ceph.com/docs/master/rados/api/librados/

Thanks!

-Yenya




Re: [ceph-users] RADOS as a simple object storage

2017-02-20 Thread Gregory Farnum
On Mon, Feb 20, 2017 at 6:46 AM, Jan Kasprzak  wrote:
> Hello, world!\n
>
> I have been using Ceph RBD for a year or so as a virtual machine storage
> backend, and I am thinking about moving another of our subsystems to Ceph:
>
> The subsystem in question is a simple replicated object storage,
> currently implemented in custom C code by yours truly. My question
> is whether implementing such a thing on top of a Ceph RADOS pool and librados
> is feasible, and what layout and optimizations you would suggest.
>
> Our object storage indexes objects by a numeric ID. The access methods
> involve creating, reading and deleting objects. Objects are never modified
> in place; instead, they are deleted and an object with a new ID is created.
> We also keep a hash of each object's contents and use it to protect against
> bit rot: the objects are scrubbed periodically, and if a checksum mismatch
> is discovered, the object is restored from another replica.
>
> Here are some statistics from our biggest instance of the object storage:
>
> objects stored: 100_000_000
> < 1024 bytes:10_000_000
> 1k-64k bytes:80_000_000
> 64k-4M bytes:10_000_000
> 4M-256M bytes:1_000_000
> > 256M bytes:10_000
> biggest object:   15 GBytes
>
> Would it be feasible to put 100M to 1G objects as native RADOS objects
> into a single pool?

This is well outside the object size RADOS is targeted at or tested with;
I'd expect issues. You might want to look at libradosstriper given the
requirements you've mentioned.


> Or should I consider their read-only nature and pack them
> into bigger objects (packs) with metadata stored in a tmap object, and repack
> those packed objects periodically as older objects get deleted?

Definitely don't do that, see above. ;)
-Greg


[ceph-users] RADOS as a simple object storage

2017-02-20 Thread Jan Kasprzak
Hello, world!\n

I have been using Ceph RBD for a year or so as a virtual machine storage
backend, and I am thinking about moving another of our subsystems to Ceph:

The subsystem in question is a simple replicated object storage,
currently implemented in custom C code by yours truly. My question
is whether implementing such a thing on top of a Ceph RADOS pool and librados
is feasible, and what layout and optimizations you would suggest.

Our object storage indexes objects by a numeric ID. The access methods
involve creating, reading and deleting objects. Objects are never modified
in place; instead, they are deleted and an object with a new ID is created.
We also keep a hash of each object's contents and use it to protect against
bit rot: the objects are scrubbed periodically, and if a checksum mismatch
is discovered, the object is restored from another replica.
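A minimal sketch of how this ID-based scheme and the content hash might look on top of RADOS, assuming one RADOS object per stored object. The `obj.` prefix and zero-padded hex naming are hypothetical, and 64-bit FNV-1a is only a stand-in for whatever hash the existing system keeps (it is not cryptographic). In a real implementation the name would be passed to librados calls such as `rados_write_full()`, `rados_read()` and `rados_remove()`, which are not shown here.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical naming scheme: "obj." plus the numeric ID as 16
 * zero-padded hex digits. Returns the length of the name. */
static int object_name(uint64_t id, char *buf, size_t len)
{
    int n = snprintf(buf, len, "obj.%016" PRIx64, id);
    assert(n > 0 && (size_t)n < len);  /* name must fit in the buffer */
    return n;
}

/* 64-bit FNV-1a: a stand-in for the real content hash, not cryptographic. */
static uint64_t fnv1a64(const unsigned char *data, size_t len)
{
    uint64_t h = 14695981039346656037ULL;  /* FNV-1a offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 1099511628211ULL;             /* FNV prime */
    }
    return h;
}

/* Scrub check: 1 if the content still matches the hash recorded at
 * creation time, 0 if the object should be restored from a replica. */
static int scrub_ok(const unsigned char *data, size_t len, uint64_t stored)
{
    return fnv1a64(data, len) == stored;
}
```

Because objects are never modified in place, the hash only has to be computed once, at creation time, and stored alongside the object (for example in an extended attribute via `rados_setxattr()`); a scrub then just re-hashes and compares, as `scrub_ok()` does.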

Here are some statistics from our biggest instance of the object storage:

objects stored:   100_000_000
< 1024 bytes:      10_000_000
1k-64k bytes:      80_000_000
64k-4M bytes:      10_000_000
4M-256M bytes:      1_000_000
> 256M bytes:          10_000
biggest object:     15 GBytes

Would it be feasible to put 100M to 1G objects as native RADOS objects
into a single pool? Or should I consider their read-only nature and pack them
into bigger objects (packs) with metadata stored in a tmap object, and repack
those packed objects periodically as older objects get deleted?

I have also considered rados-gw, but it looks like too big a hammer
for my nail :-)

Thanks for your suggestions,

-Yenya
