[ceph-users] Re: Building a petabyte cluster from scratch

2019-12-03 Thread Martin Verges
Hello, * 2 x Xeon Silver 4212 (12C/24T) > I would choose single-CPU AMD EPYC systems for a lower price and better performance. Supermicro does have some good systems for AMD as well. * 16 x 10 TB nearline SAS HDD (8 bays for future needs) > Don't waste money here either. No real gain.

[ceph-users] Re: Error in add new ISCSI gateway

2019-12-03 Thread Mike Christie
Also, I do not think we saw whether the ceph-iscsi versions are the same. We saw the ceph versions are different, which is OK. If you were using an old ceph-iscsi version whose settings/values differ from a newer version's, then you could see this error. On 12/03/2019 06:47 PM, Jason Dillaman wrote:
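
A quick way to confirm whether the ceph-iscsi packages really match on both gateways (a sketch assuming RPM-based hosts; package names vary between releases, with older versions shipping ceph-iscsi-cli and ceph-iscsi-config instead of a single ceph-iscsi package):

$ rpm -qa | grep -i ceph-iscsi    # run on each gateway and compare the output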

[ceph-users] Re: Building a petabyte cluster from scratch

2019-12-03 Thread Alex Gorbachev
FYI for ZFS on RBD: https://github.com/zfsonlinux/zfs/issues/3324 We go with a more modest setting, with async set to 64, not 2. -- Alex Gorbachev Intelligent Systems Services Inc. On Tue, Dec 3, 2019 at 3:07 PM Fabien Sirjean wrote: > Hi Ceph users ! > > After years of using Ceph, we plan to

[ceph-users] Re: Building a petabyte cluster from scratch

2019-12-03 Thread Jack
> Cost of SSD vs. HDD is still 6:1 in favor of HDDs. It is not: you can buy fewer things for less money with HDDs - that is true, $/TB is better on spinning disks than on flash - but this is not the most important indicator, and by far: $/IOPS is another story indeed. On 12/3/19 9:46 PM,
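
To make the $/TB vs $/IOPS distinction concrete, a rough worked example with assumed prices (the numbers are illustrative, not from the thread): a 10 TB HDD at ~$250 delivering ~200 IOPS costs ~$25/TB but ~$1.25 per IOPS; a 4 TB SSD at ~$500 delivering ~50,000 IOPS costs ~$125/TB but ~$0.01 per IOPS. On $/TB the HDD wins roughly 5:1, while on $/IOPS the SSD wins by about two orders of magnitude.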

[ceph-users] Re: iSCSI Gateway reboots and permanent loss

2019-12-03 Thread Wesley Dillingham
Thanks. If I am reading this correctly, the ability to remove an iSCSI gateway would allow the remaining iSCSI gateways to take over the removed gateway's LUNs as of 3.0. That's good, we run 3.2. However, because the actual update of the central config object happens from the to-be-deleted

[ceph-users] Re: Building a petabyte cluster from scratch

2019-12-03 Thread Paul Emmerich
It's pretty pointless to discuss erasure coding vs replicated without knowing how it'll be used. There are setups where erasure coding is faster than replicated. You do need to write less data overall, so if that's your bottleneck then erasure coding will be faster. Paul -- Paul Emmerich
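
Worked out for the write-amplification point, using the 8+3 profile discussed in this thread: with 3x replication every logical byte becomes 3 bytes on disk, while with EC 8,3 it becomes 11/8 = 1.375 bytes, so a bandwidth-bound workload pushes roughly 3 / 1.375 ≈ 2.2x less data to the drives under erasure coding.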

[ceph-users] Re: Building a petabyte cluster from scratch

2019-12-03 Thread jesper
> If k=8,m=3 is too slow on HDDs, so you need replica 3 and SSD DB/WAL, > vs EC 8,3 on SSD, then that's (1/3) / (8/11) = 0.45 multiplier on the > SSD space required vs HDDs. > That brings it from 6x to 2.7x. Then you have the benefit of not > needing separate SSDs for DB/WAL both in hardware cost

[ceph-users] Re: Building a petabyte cluster from scratch

2019-12-03 Thread Nathan Fish
If k=8,m=3 is too slow on HDDs, so you need replica 3 and SSD DB/WAL, vs EC 8,3 on SSD, then that's (1/3) / (8/11) = 0.45 multiplier on the SSD space required vs HDDs. That brings it from 6x to 2.7x. Then you have the benefit of not needing separate SSDs for DB/WAL both in hardware cost and
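
Spelling out the arithmetic: replica 3 yields 1/3 of raw capacity as usable space and EC 8,3 yields 8/11, so the raw-capacity ratio is (1/3) / (8/11) = 11/24 ≈ 0.458 (the 0.45 in the message). Applied to the 6:1 price ratio quoted elsewhere in the thread, 6 x 0.458 ≈ 2.7.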

[ceph-users] Re: Can min_read_recency_for_promote be -1

2019-12-03 Thread Robert LeBlanc
On Mon, Dec 2, 2019 at 3:42 PM Romit Misra wrote: > > Hi Robert, > I am not quite sure if I get your question correctly, but what I understand > is that you want the inbound writes to land on the cache tier, which > presumably would be on faster media, possibly an SSD. > > From there you would
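
For reference, the recency knobs being discussed are ordinary pool settings applied to the cache-tier pool; a minimal sketch, with the pool name and values as placeholders rather than anything from the thread:

$ ceph osd pool set cache-pool hit_set_count 4
$ ceph osd pool set cache-pool hit_set_period 1200
$ ceph osd pool set cache-pool min_read_recency_for_promote 1
$ ceph osd pool set cache-pool min_write_recency_for_promote 1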

[ceph-users] Re: Building a petabyte cluster from scratch

2019-12-03 Thread jesper
>> * Hardware RAID with battery-backed write cache - will allow OSD to ack >> writes before hitting spinning rust. > Disagree. See my litany from a few months ago. Use a plain, IT-mode HBA. > Take the $$ you save and put it toward building your cluster out of SSDs > instead of HDDs. That way

[ceph-users] Re: Building a petabyte cluster from scratch

2019-12-03 Thread Jack
Hi, you will get slow performance: EC is slow, and HDDs are slow too. With 400 IOPS per device, you get 89,600 IOPS for the whole cluster, raw. With 8+3 EC, each logical write is mapped to 11 physical writes, so you get only 8,145 write IOPS (is my math correct?), which I find very low for a PB of storage. So,
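
The arithmetic written out, using the drive count implied here (224 devices, i.e. the 14 nodes x 16 HDDs mentioned elsewhere in the thread): 224 x 400 = 89,600 raw IOPS; with 8+3 EC each logical write costs 11 physical writes, so 89,600 / 11 ≈ 8,145 logical write IOPS.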

[ceph-users] Re: Building a petabyte cluster from scratch

2019-12-03 Thread Nathan Fish
Rather than a cache tier, I would put an NVMe device in each OSD box for BlueStore's DB and WAL. This will significantly improve small I/Os. 14*16 HDDs / 11 chunks = 20 HDDs' worth of write IOPS. If you expect these files to be written sequentially, this is probably OK. Mons and mgr on OSD nodes
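
A minimal sketch of creating an OSD with its BlueStore DB on a shared NVMe device (device names are placeholders; one DB partition or LV is typically carved out of the NVMe drive per OSD):

$ ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1
  # the WAL lands on the DB device automatically when --block.wal is not given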

[ceph-users] Re: Building a petabyte cluster from scratch

2019-12-03 Thread jesper
> After years of using Ceph, we plan to build soon a new cluster bigger than > what > we've done in the past. As the project is still in reflection, I'd like to > have your thoughts on our planned design : any feedback is welcome :) > > > ## Requirements > > * ~1 PB usable space for file storage,

[ceph-users] Building a petabyte cluster from scratch

2019-12-03 Thread Fabien Sirjean
Hi Ceph users! After years of using Ceph, we plan to soon build a new cluster, bigger than what we've done in the past. As the project is still at the planning stage, I'd like to have your thoughts on our planned design: any feedback is welcome :) ## Requirements * ~1 PB usable space for file
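
For rough sizing, using the hardware figures that come up later in the thread (14 nodes x 16 x 10 TB HDDs ≈ 2,240 TB raw): an 8+3 EC pool yields about 2,240 x 8/11 ≈ 1,630 TB usable, versus 2,240 / 3 ≈ 745 TB with 3x replication, before leaving headroom for rebalancing and the near-full ratios.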

[ceph-users] Re: iSCSI Gateway reboots and permanent loss

2019-12-03 Thread Mike Christie
I do not think it's going to do what you want when the node you want to delete is down. It looks like we only temporarily stop the gw from being exported. It does not update the gateway.cfg, because we do the config removal call on the node we want to delete. So gwcli would report success and

[ceph-users] Re: iSCSI Gateway reboots and permanent loss

2019-12-03 Thread Paul Emmerich
Gateway removal is indeed supported since ceph-iscsi 3.0 (or was it 2.7?) and it works while the gateway is offline :) Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89 1896585 90 On Tue,

[ceph-users] Re: iSCSI Gateway reboots and permanent loss

2019-12-03 Thread Jason Dillaman
If I recall correctly, the recent ceph-iscsi release supports the removal of a gateway via the "gwcli". I think the Ceph dashboard can do that as well. On Tue, Dec 3, 2019 at 1:59 PM Wesley Dillingham wrote: > > We utilize 4 iSCSI gateways in a cluster and have noticed the following > during
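
A hedged sketch of what that removal looks like in gwcli (the target IQN and gateway name are placeholders, and the exact syntax varies between ceph-iscsi releases; removing an offline gateway may require an extra confirmation option):

$ gwcli
/> cd /iscsi-targets/iqn.2003-01.com.example.iscsi-gw:iscsi-igw/gateways
> delete ceph-iscsi2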

[ceph-users] iSCSI Gateway reboots and permanent loss

2019-12-03 Thread Wesley Dillingham
We utilize 4 iSCSI gateways in a cluster and have noticed the following during patching cycles when we sequentially reboot single iSCSI-gateways: "gwcli" often hangs on the still-up iSCSI GWs but sometimes still functions and gives the message: "1 gateway is inaccessible - updates will be

[ceph-users] Re: Error in add new ISCSI gateway

2019-12-03 Thread Jason Dillaman
On Tue, Dec 3, 2019 at 12:58 PM Gesiel Galvão Bernardes wrote: >> >> Jason, > > > Returned a JSON with "data" and a different string. Were they supposed to be > the same? Yes, they should be the same since those are the hashes of the config. > Now I checked that ceph version is different in

[ceph-users] Re: Error in add new ISCSI gateway

2019-12-03 Thread Gesiel Galvão Bernardes
> Jason, Returned a JSON with "data" and a different string. Were they supposed to be the same? Now I checked that the ceph version differs between the gateways: ceph-iscsi1 is running 13.2.6 and ceph-iscsi3 is running 13.2.7. Could this be the problem? Gesiel On Tue, Dec 3, 2019 at 14:43, Jason

[ceph-users] Re: Error in add new ISCSI gateway

2019-12-03 Thread Jason Dillaman
You might need to reset the target API service. You can compare what the two REST APIs think the config is by running: $ curl --insecure --user : -X GET http://ceph-iscsi1:5000/api/sysinfo/checkconf $ curl --insecure --user : -X GET http://ceph-iscsi3:5000/api/sysinfo/checkconf (tweak username,
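
For completeness, a filled-in version with placeholder credentials (the real api_user/api_password come from /etc/ceph/iscsi-gateway.cfg, not from the thread):

$ curl --insecure --user admin:admin -X GET http://ceph-iscsi1:5000/api/sysinfo/checkconf
$ curl --insecure --user admin:admin -X GET http://ceph-iscsi3:5000/api/sysinfo/checkconf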

[ceph-users] Re: Error in add new ISCSI gateway

2019-12-03 Thread Gesiel Galvão Bernardes
Are exactly the same: [root@ceph-iscsi1 ~]# sha256sum /etc/ceph/iscsi-gateway.cfg 33620867c9c3c5e6a666df2bb461150d4a885db2989cd3aea812a29555fdc45a /etc/ceph/iscsi-gateway.cfg [root@ceph-iscsi3 ~]# sha256sum /etc/ceph/iscsi-gateway.cfg

[ceph-users] Re: Error in add new ISCSI gateway

2019-12-03 Thread Jason Dillaman
The sha256sum between the "/etc/ceph/iscsi-gateway.cfg" where you are running the 'gwcli' tool and `ceph-iscsi3` most likely mismatches. On Tue, Dec 3, 2019 at 12:19 PM Gesiel Galvão Bernardes wrote: > > Hi everyone, > > I have a problem trying to add an ISCSI gateway. The following error is >

[ceph-users] Re: Behavior of EC pool when a host goes offline

2019-12-03 Thread Robert LeBlanc
On Tue, Nov 26, 2019 at 7:45 PM majia xiao wrote: > Hi all, > > We have a Ceph (version 12.2.4) cluster that adopts EC pools, and it > consists of 10 hosts for OSDs. > > The corresponding commands to create the EC pool are listed as follows: > > > > ceph osd erasure-code-profile set
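
The truncated command above creates an erasure-code profile; for reference, a generic sketch of profile and pool creation on Luminous (profile name, k/m values and PG counts are placeholders, not the original poster's):

$ ceph osd erasure-code-profile set myprofile k=7 m=3 crush-failure-domain=host
$ ceph osd pool create ecpool 1024 1024 erasure myprofile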

[ceph-users] Re: why osd's heartbeat partner comes from another root tree?

2019-12-03 Thread Aleksey Gutikov
According to my understanding, an OSD's heartbeat partners only come from those OSDs that share the same PGs. Hello, that was my initial assumption too. But in my experience the set of heartbeat peers includes PG peers and some other OSDs. Actually it contains: - PG peers - next and

[ceph-users] how to speed up mounting a ceph fs when a node goes down unexpectedly in a ceph cluster

2019-12-03 Thread h...@portsip.cn
I created one Ceph cluster: node-1: mon, mgr, osd.0, mds; node-2: mon, mgr, osd.1, mds; node-3: mon, mgr, osd.2, mds. When the cluster is working normally, mounting with the command "mount -t ceph :/ /mnt -o name=admin,secret=" works fine. But when a node goes down unexpectedly (like a power-off), and using the same
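
One common cause of such a hang is mounting against a single monitor address; the kernel client accepts a comma-separated monitor list and can then fail over when one node is powered off. A sketch using the node names from the message (the secret is left elided as in the original):

$ mount -t ceph node-1:6789,node-2:6789,node-3:6789:/ /mnt -o name=admin,secret=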

[ceph-users] Re: Balancing PGs across OSDs

2019-12-03 Thread Thomas Schneider
Hi, I have set upmap_max_iterations 2 w/o any impact. In my opinion the issue is that the evaluation of OSDs data load is not working. Or can you explain why osdmaptool does not report anything to do? Regards Thomas Am 03.12.2019 um 08:26 schrieb Harald Staub: > Hi all > > Something to try: >