Re: [ceph-users] Looking for experience
> On 10.01.2020 at 07:10, Mainor Daly wrote:
>
> Hi Stefan,
>
> before I give some suggestions, can you first describe the use case for
> which you want to use that setup? Also, which aspects are important for
> you?

It's just the backup target of another Ceph cluster, used to sync snapshots
once a day. Important are pricing, enough performance to do this task, and
the ability to expand. We would like to start with something around just
50 TB of storage.

Greets,
Stefan

> Stefan Priebe - Profihost AG <s.pri...@profihost.ag> wrote on 9 January
> 2020 at 22:52:
>
> [...]
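For the mechanics of such a once-a-day sync, here is a minimal sketch using
rbd export-diff / import-diff, which ships only the delta between two
snapshots to the second cluster. Pool, image and snapshot names and the
"backup-site" host are placeholders, not taken from the thread:

    # source cluster: create today's snapshot of the image to be backed up
    rbd snap create rbd/vm-disk-1@backup-2020-01-10

    # send only the changes since yesterday's snapshot to the backup cluster;
    # the target image must already contain the 2020-01-09 snapshot
    rbd export-diff --from-snap backup-2020-01-09 \
        rbd/vm-disk-1@backup-2020-01-10 - \
      | ssh backup-site rbd import-diff - backup-pool/vm-disk-1

rbd-mirror would be the built-in alternative if continuous replication rather
than a daily batch were wanted.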
Re: [ceph-users] Looking for experience
Hi Stefan,

before I give some suggestions, can you first describe the use case for which
you want to use that setup? Also, which aspects are important for you?

> Stefan Priebe - Profihost AG <s.pri...@profihost.ag> wrote on 9 January
> 2020 at 22:52:
>
> As a starting point the current idea is to use something like:
>
> 4-6 nodes with 12x 12 TB disks each
> AMD EPYC 7302P 3 GHz, 16C/32T
> 128 GB RAM
>
> Something to discuss is:
>
> - EC, or go with 3 replicas. We'll use BlueStore with compression.
> - Do we need something like Intel Optane for WAL/DB or not?
>
> Since we started using Ceph we're mostly subscribed to SSDs - so no
> knowledge about HDDs in place.
>
> Greets,
> Stefan
>
> On 09.01.20 at 16:49, Stefan Priebe - Profihost AG wrote:
> [...]
Re: [ceph-users] Looking for experience
It sounds like an I/O bottleneck (either max IOPS or max throughput) in the
making. If you are looking at cold-storage archival data only, it may be OK
(if it doesn't matter how long it takes to write the data). If this is
production data with any sort of IOPS load or data change rate, I'd be
concerned.

Too-big spinning disks will get killed on seek times, and too many or too-big
spinners will likely bottleneck the I/O controller. It would be better to use
more, cheaper nodes to yield far more, smaller disks (2 TB max): more disks,
more I/O controllers, more motherboards = more performance. Think "scale out"
in the number of nodes, not "scale up" of the individual nodes.

-Ed
Software Defined Storage Engineer

On 1/9/2020 3:52 PM, Stefan Priebe - Profihost AG wrote:
> As a starting point the current idea is to use something like:
>
> 4-6 nodes with 12x 12 TB disks each
> AMD EPYC 7302P 3 GHz, 16C/32T
> 128 GB RAM
>
> [...]
Re: [ceph-users] Looking for experience
As a starting point the current idea is to use something like:

4-6 nodes with 12x 12 TB disks each
AMD EPYC 7302P 3 GHz, 16C/32T
128 GB RAM

Something to discuss is:

- EC, or go with 3 replicas. We'll use BlueStore with compression.
- Do we need something like Intel Optane for WAL/DB or not?

Since we started using Ceph we're mostly subscribed to SSDs - so no knowledge
about HDDs in place.

Greets,
Stefan

On 09.01.20 at 16:49, Stefan Priebe - Profihost AG wrote:
> [...]
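On the WAL/DB question: if an NVMe device (Optane or otherwise) is added, the
usual layout is one DB partition or LV per HDD OSD, with the WAL co-located
on the DB device by default. A rough sketch of creating one such OSD,
assuming /dev/sdb is one of the 12 TB HDDs and /dev/nvme0n1p1 a slice of the
NVMe (device names are examples only):

    # one HDD as data device, a slice of the shared NVMe as RocksDB + WAL
    ceph-volume lvm create --bluestore \
        --data /dev/sdb \
        --block.db /dev/nvme0n1p1

The DB slices need to be sized generously enough that RocksDB does not spill
over onto the HDD; the commonly quoted rule of thumb is a low single-digit
percentage of the data device.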
Re: [ceph-users] Looking for experience
> On 09.01.2020 at 16:10, Wido den Hollander <w...@42on.com> wrote:
>
> [...]
>
> Compression might work, but only if the data is compressible.
>
> EC usually writes very fast, so that's good. I would recommend a lot of
> spindles: more spindles == more OSDs == more performance.
>
> So instead of using 12 TB drives you can consider 6 TB or 8 TB drives.

Currently we have a lot of 5 TB 2.5" drives in place, so we could use them.
We would like to start with around 4,000 IOPS and 250 MB per second while
using 24-drive boxes. We could place one or two NVMe PCIe cards in them.

Stefan
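One way to check whether such a box actually delivers the hoped-for
~4,000 IOPS / 250 MB/s for this write pattern is fio's rbd engine against a
test image. The pool and image names, and the 64 KB block size taken from the
discussion above, are assumptions:

    # bench-img must exist first, e.g.: rbd create --size 100G backup-test/bench-img
    fio --name=backup-sim --ioengine=rbd --clientname=admin \
        --pool=backup-test --rbdname=bench-img \
        --rw=write --bs=64k --iodepth=32 \
        --runtime=300 --time_based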
Re: [ceph-users] Looking for experience
I would try to scale horizontally with smaller Ceph nodes. That way you have
the advantage of being able to choose an EC profile that does not require too
much overhead, and you can still use failure domain "host".

Joachim

On 09.01.2020 at 15:31, Wido den Hollander wrote:
> [...]
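As a sketch of what that buys: with failure domain "host" an EC pool needs at
least k+m hosts, so more, smaller nodes allow e.g. k=4/m=2 (1.5x raw
overhead) instead of 3x replication. The profile, pool and image names and PG
counts below are only illustrative:

    ceph osd erasure-code-profile set backup-ec k=4 m=2 crush-failure-domain=host
    ceph osd pool create backup-data 1024 1024 erasure backup-ec
    # needed so RBD images can put their data on the EC pool
    ceph osd pool set backup-data allow_ec_overwrites true

    # RBD still wants a small replicated pool for image metadata
    ceph osd pool create backup-meta 64 64 replicated
    rbd create --size 10T --data-pool backup-data backup-meta/vm-disk-1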
Re: [ceph-users] Looking for experience
On 1/9/20 2:27 PM, Stefan Priebe - Profihost AG wrote:
> Hi Wido,
>
> [...]
>
> I would like to give a little bit more insight about this, and most
> probably some overhead we currently have in those numbers. Those values
> come from our old classic RAID storage boxes. Those use btrfs + zlib
> compression + subvolumes for those backups, and we've collected those
> numbers from all of them.
>
> The new system should just replicate snapshots from the live Ceph.
> Hopefully being able to use erasure coding and compression? ;-)

Compression might work, but only if the data is compressible.

EC usually writes very fast, so that's good. I would recommend a lot of
spindles: more spindles == more OSDs == more performance.

So instead of using 12 TB drives you can consider 6 TB or 8 TB drives.

Wido
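For reference, BlueStore compression is switched on per pool, so it can be
limited to the backup data pool; whether it helps depends entirely on how
compressible the backup data is. The pool name and algorithm below are only
an example:

    ceph osd pool set backup-data compression_mode aggressive
    ceph osd pool set backup-data compression_algorithm zstd    # or lz4/snappy/zlib
    # keep a compressed blob only if it shrinks to <= 87.5% of the original
    ceph osd pool set backup-data compression_required_ratio 0.875

With compression_mode passive, data is instead only compressed when the
client hints that it is compressible.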
Re: [ceph-users] Looking for experience
Hi Wido,

On 09.01.20 at 14:18, Wido den Hollander wrote:
> So if I read it correctly the writes will be 64 KB each.

may be ;-) see below

> That should be doable, but you probably want something like NVMe for
> DB+WAL.
>
> You might want to tune it so that larger writes also go into the WAL to
> speed up the ingress writes. But you mainly want more spindles rather than
> fewer.

I would like to give a little bit more insight about this, and most probably
some overhead we currently have in those numbers. Those values come from our
old classic RAID storage boxes. Those use btrfs + zlib compression +
subvolumes for those backups, and we've collected those numbers from all of
them.

The new system should just replicate snapshots from the live Ceph. Hopefully
being able to use erasure coding and compression? ;-)

Greets,
Stefan
Re: [ceph-users] Looking for experience
On 1/9/20 2:07 PM, Daniel Aberger - Profihost AG wrote:
> On 09.01.20 at 13:39, Janne Johansson wrote:
>>> I'm currently trying to work out a concept for a Ceph cluster which can
>>> be used as a target for backups which satisfies the following
>>> requirements:
>>>
>>> - approx. write speed of 40,000 IOPS and 2,500 MB/s
>>
>> You might need to have a large (at least non-1) number of writers to get
>> to that sum of operations, as opposed to trying to reach it with one
>> single stream written from one single client.
>
> We are aiming for about 100 writers.

So if I read it correctly the writes will be 64 KB each.

That should be doable, but you probably want something like NVMe for DB+WAL.

You might want to tune it so that larger writes also go into the WAL to speed
up the ingress writes. But you mainly want more spindles rather than fewer.

Wido
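The "larger writes also go into the WAL" tuning maps to BlueStore's
deferred-write threshold: writes at or below bluestore_prefer_deferred_size_hdd
are committed to the (ideally NVMe-backed) WAL first, acknowledged, and
flushed to the HDD later. A sketch of raising it so the expected 64 KB writes
take that path; the value is an assumption to benchmark, not a recommendation:

    # defer writes up to 128 KiB on HDD-backed OSDs (default is 32 KiB)
    ceph config set osd bluestore_prefer_deferred_size_hdd 131072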
Re: [ceph-users] Looking for experience
On 09.01.20 at 13:39, Janne Johansson wrote:
>> I'm currently trying to work out a concept for a Ceph cluster which can
>> be used as a target for backups which satisfies the following
>> requirements:
>>
>> - approx. write speed of 40,000 IOPS and 2,500 MB/s
>
> You might need to have a large (at least non-1) number of writers to get
> to that sum of operations, as opposed to trying to reach it with one
> single stream written from one single client.

We are aiming for about 100 writers.

Cheers
Re: [ceph-users] Looking for experience
> I'm currently trying to work out a concept for a Ceph cluster which can be
> used as a target for backups which satisfies the following requirements:
>
> - approx. write speed of 40,000 IOPS and 2,500 MB/s

You might need to have a large (at least non-1) number of writers to get to
that sum of operations, as opposed to trying to reach it with one single
stream written from one single client.

--
May the most significant bit of your life be positive.
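Janne's point is easy to reproduce with rados bench: one instance behaves
roughly like one writer, so approximating many writers means running several
instances in parallel, ideally from several client machines, against a test
pool. Pool and run names are placeholders:

    # one "writer": 60 s of 64 KiB object writes with 16 ops in flight
    rados bench -p backup-test 60 write -b 65536 -t 16 \
        --run-name writer-01 --no-cleanup

    # remove the benchmark objects afterwards
    rados -p backup-test cleanup --run-name writer-01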