Re: [ceph-users] Fwd: Planning all flash cluster

2019-01-30 Thread Félix Barbeira
> Is there anything that obviously stands out as severely unbalanced? The R720XD comes with a H710 - instead of putting them in RAID0, I'm thinking a different HBA might be a better idea, any recommendations please? > Don't know that HBA. Does it support pass through mode or HBA mode? H710 card

Re: [ceph-users] Best practice for increasing number of pg and pgp

2019-01-30 Thread Janne Johansson
On Wed 30 Jan 2019 at 05:24, Linh Vu wrote: > > We use the ceph-gentle-split script from https://github.com/cernceph/ceph-scripts to > slowly increase by 16 pgs at a time until we hit the target. > > Somebody recommends that this adjustment should be done in multiple stages, > e.g. increase 1024 pg
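For readers who don't want the full script, a minimal sketch of the same gentle approach (hypothetical pool name "rbd", stepping from 4096 to 8192 by 16 PGs; the cernceph ceph-gentle-split script does essentially this with more safety checks):

    # step pg_num/pgp_num up by 16 and wait for the cluster to settle each time
    for pg in $(seq 4112 16 8192); do
        ceph osd pool set rbd pg_num  "$pg"
        ceph osd pool set rbd pgp_num "$pg"
        # crude wait: block until no PGs are creating/peering/degraded/misplaced
        while ceph -s | grep -qE 'creating|peering|degraded|misplaced'; do
            sleep 30
        done
    done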

Re: [ceph-users] Best practice for increasing number of pg and pgp

2019-01-30 Thread Matthew Vernon
Hi, On 30/01/2019 02:39, Albert Yue wrote: > As the number of OSDs increase in our cluster, we reach a point where > pg/osd is lower than recommend value and we want to increase it from > 4096 to 8192.  For an increase that small, I'd just do it in one go (and have done so on our production

Re: [ceph-users] Bionic Upgrade 12.2.10

2019-01-30 Thread ceph
Hello Scott, I've seen a solution from the croit guys. Perhaps this is related? https://croit.io/2018/09/23/2018-09-23-debian-mirror Greetz, Mehmet On 14 January 2019 20:33:59 CET, Scottix wrote: >Wow OK. >I wish there was some official stance on this. > >Now I got to remove those OSDs,

Re: [ceph-users] Question regarding client-network

2019-01-30 Thread Robert Sander
On 30.01.19 08:55, Buchberger, Carsten wrote: > So as long as there is IP connectivity between the client and the > client-network IP addresses of our Ceph cluster, everything is fine? Yes, client traffic is routable. Even inter-OSD traffic is routable; there are reports from people running

[ceph-users] Cluster Status:HEALTH_ERR for Full OSD

2019-01-30 Thread Fabio - NS3 srl
Hello guys, I have a Ceph cluster with a full OSD used by S3: ~# ceph health detail HEALTH_ERR 1 full osd(s); 1 near full osd(s) osd.2 is full at 95% osd.5 is near full at 85% I want to delete some buckets, but when I try to list the buckets: ~# radosgw-admin bucket list 2019-01-30 11:41:47.933621 7f467a9d0780  0

[ceph-users] moving a new hardware to cluster

2019-01-30 Thread Fabio Abreu
Hi everybody, I have a question about moving new SATA storage (new hardware as well) into a production rack with a huge amount of data. I think this movement creates new PGs and can reduce my performance if I do it wrong, and we don't have a lot of experience moving new hardware into the cluster.

[ceph-users] Simple API to have cluster healthcheck ?

2019-01-30 Thread PHARABOT Vincent
Hello, I have my cluster set up correctly now (thank you again for the help) I am seeking now a way to get cluster health thru API (REST) with curl command. I had a look at manager / RESTful and Dashboard but none seems to provide simple way to get cluster health RESTful module do a lot of

Re: [ceph-users] Simple API to have cluster healthcheck ?

2019-01-30 Thread Wido den Hollander
On 1/30/19 2:02 PM, PHARABOT Vincent wrote: > Hello, > > I have my cluster set up correctly now (thank you again for the help) > > I am seeking now a way to get cluster health thru API (REST) with curl > command. > > I had a look at manager / RESTful and Dashboard but none seems

Re: [ceph-users] Right way to delete OSD from cluster?

2019-01-30 Thread Fyodor Ustinov
Hi! But won't I end up with undersized objects after "ceph osd crush remove"? That is, isn't this the same thing as simply turning off the OSD and waiting for the cluster to recover? - Original Message - From: "Wido den Hollander" To: "Fyodor Ustinov" , "ceph-users" Sent:

Re: [ceph-users] ceph osd commit latency increase over time, until restart

2019-01-30 Thread Alexandre DERUMIER
>>I don't see any smoking gun here... :/ I need to test to compare when latency is going very high, but I need to wait a few more days/weeks. >>The main difference between a warm OSD and a cold one is that on startup >>the bluestore cache is empty. You might try setting the bluestore cache

[ceph-users] Right way to delete OSD from cluster?

2019-01-30 Thread Fyodor Ustinov
Hi! I thought I should first do "ceph osd out", wait for the relocation of the misplaced objects to finish, and after that do "ceph osd purge". But after "purge" the cluster starts relocating again. Maybe I'm doing something wrong? Then what is the correct way to delete an OSD from the cluster?

Re: [ceph-users] Right way to delete OSD from cluster?

2019-01-30 Thread Wido den Hollander
On 1/30/19 2:00 PM, Fyodor Ustinov wrote: > Hi! > > I thought I should first do "ceph osd out", wait for the relocation of > the misplaced objects to finish, and after that do "ceph osd purge". > But after "purge" the cluster starts relocating again. > > Maybe I'm doing something wrong? Then what

Re: [ceph-users] Simple API to have cluster healthcheck ?

2019-01-30 Thread Alexandru Cucu
Hello, Not exactly what you were looking for, but you could use the Prometheus plugin for ceph-mgr and get the health status from the metrics. curl -s http://ceph-mgr-node:9283/metrics | grep ^ceph_health_status On Wed, Jan 30, 2019 at 3:04 PM PHARABOT Vincent wrote: > > Hello, > > > > I
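As a rough sketch of turning that metric into a pass/fail check (host name is a placeholder; the module exports ceph_health_status as 0 for HEALTH_OK, 1 for HEALTH_WARN, 2 for HEALTH_ERR):

    # read the gauge from the prometheus mgr module (default port 9283)
    status=$(curl -s http://ceph-mgr-node:9283/metrics | awk '/^ceph_health_status/ {print $2}')
    if [ "${status%.*}" -eq 0 ]; then
        echo "cluster healthy"
    else
        echo "cluster unhealthy (ceph_health_status=$status)"
        exit 1
    fi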

Re: [ceph-users] ceph osd commit latency increase over time, until restart

2019-01-30 Thread Sage Weil
On Wed, 30 Jan 2019, Alexandre DERUMIER wrote: > Hi, > > here some new results, > different osd/ different cluster > > before osd restart latency was between 2-5ms > after osd restart is around 1-1.5ms > > http://odisoweb1.odiso.net/cephperf2/bad.txt (2-5ms) >

[ceph-users] ceph block - volume with RAID#0

2019-01-30 Thread M Ranga Swami Reddy
Hello - Can I use the ceph block volume with RAID#0? Are there any issues with this? Thanks Swami

Re: [ceph-users] Simple API to have cluster healthcheck ?

2019-01-30 Thread Lenz Grimmer
Hi, On 1/30/19 2:02 PM, PHARABOT Vincent wrote: > I have my cluster set up correctly now (thank you again for the help) What version of Ceph is this? > I am seeking now a way to get cluster health thru API (REST) with curl > command. > > I had a look at manager / RESTful and Dashboard but

Re: [ceph-users] Multisite Ceph setup sync issue

2019-01-30 Thread Krishna Verma
Hi Casey, Thanks for your reply. However, I tried the "--source-zone" option with the sync command but am getting the error below: Sync status from the slave gateway to master zone "noida1": [cephuser@zabbix-client ~]$ radosgw-admin sync status --source-zone noida1 2>/dev/null realm

Re: [ceph-users] Right way to delete OSD from cluster?

2019-01-30 Thread Scottix
I have generally gone the crush reweight 0 route. This way the drive can participate in the rebalance, and the rebalance only happens once. Then you can take it out and purge. If I am not mistaken, this is the safest way. ceph osd crush reweight 0 On Wed, Jan 30, 2019 at 7:45 AM Fyodor Ustinov
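A minimal sketch of that sequence, assuming osd.12 as the OSD being removed and a Luminous-or-later cluster where "ceph osd purge" exists:

    ceph osd crush reweight osd.12 0            # drain: data migrates off while the OSD stays up
    ceph -s                                     # wait until all PGs are active+clean again
    ceph osd out 12
    systemctl stop ceph-osd@12                  # on the host that carries the OSD
    ceph osd purge 12 --yes-i-really-mean-it    # removes it from CRUSH, the OSD map and auth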

Re: [ceph-users] Cluster Status:HEALTH_ERR for Full OSD

2019-01-30 Thread Amit Ghadge
A better way is to increase osd set-full-ratio slightly (0.97) and then remove the buckets. -AmitG On Wed, 30 Jan 2019, 21:30 Paul Emmerich, wrote: > Quick and dirty solution: take the full OSD down to issue the deletion > command ;) > > Better solutions: temporarily increase the full limit (ceph osd >

Re: [ceph-users] ceph osd commit latency increase over time, until restart

2019-01-30 Thread Mark Nelson
On 1/30/19 7:45 AM, Alexandre DERUMIER wrote: I don't see any smoking gun here... :/ I need to test to compare when latency are going very high, but I need to wait more days/weeks. The main difference between a warm OSD and a cold one is that on startup the bluestore cache is empty. You
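For reference, a hedged sketch of pinning the BlueStore cache size; the exact option and value depend on the release and on available RAM, and the 3 GiB used here is purely illustrative:

    # ceph.conf, [osd] section (illustrative value only):
    #   bluestore_cache_size_ssd = 3221225472
    # or injected at runtime; on some releases this is not picked up until the OSD restarts:
    ceph tell osd.* injectargs '--bluestore_cache_size_ssd 3221225472'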

Re: [ceph-users] Multisite Ceph setup sync issue

2019-01-30 Thread Amit Ghadge
Have you committed your changes on the slave gateway? First run the commit command on the slave gateway and then try again. -AmitG On Wed, 30 Jan 2019, 21:06 Krishna Verma, wrote: > Hi Casey, > > Thanks for your reply, however I tried with "--source-zone" option with > sync command but getting below error: > >
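A hedged sketch of the usual sequence on the secondary zone's gateway host (URL, keys and service name are placeholders and vary by deployment):

    radosgw-admin period pull --url=http://master-gw:8080 --access-key=SYSTEM_KEY --secret=SYSTEM_SECRET
    radosgw-admin period update --commit
    systemctl restart ceph-radosgw@rgw.$(hostname -s)   # adjust to the local rgw unit name
    radosgw-admin sync status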

Re: [ceph-users] Simple API to have cluster healthcheck ?

2019-01-30 Thread PHARABOT Vincent
Hello, Thanks for the info. But, nope, on Mimic (13.2.4) /api/health ends in 404 (/api/health/full, /api/health/minimal also...) Vincent -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Lenz Grimmer Sent: Wednesday, 30 January 2019 16:26 To:

[ceph-users] CephFS performance vs. underlying storage

2019-01-30 Thread Hector Martin
Hi list, I'm experimentally running single-host CephFS as a replacement for "traditional" filesystems. My setup is 8×8TB HDDs using dm-crypt, with CephFS on a 5+2 EC pool. All of the components are running on the same host (mon/osd/mds/kernel CephFS client). I've set the stripe_unit/object_size
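For context, CephFS layouts are controlled through virtual xattrs on files and directories; a minimal sketch with purely illustrative values (they only affect files created after the change):

    setfattr -n ceph.dir.layout.stripe_unit -v 4194304  /mnt/cephfs/some-dir
    setfattr -n ceph.dir.layout.object_size -v 16777216 /mnt/cephfs/some-dir
    getfattr -n ceph.dir.layout /mnt/cephfs/some-dir    # inspect the resulting layout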

Re: [ceph-users] Cluster Status:HEALTH_ERR for Full OSD

2019-01-30 Thread Paul Emmerich
Quick and dirty solution: take the full OSD down to issue the deletion command ;) Better solutions: temporarily increase the full limit (ceph osd set-full-ratio) or reduce the OSD's reweight (ceph osd reweight) Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at
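A quick sketch of both workarounds (values are examples; remember to restore the ratio once space has been freed):

    ceph osd set-full-ratio 0.97    # temporarily raise the full limit
    # ... delete the offending buckets/objects ...
    ceph osd set-full-ratio 0.95    # restore the default
    # alternatively, push some data off the full OSD:
    ceph osd reweight 2 0.9         # example: lower the reweight of osd.2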

Re: [ceph-users] ceph osd commit latency increase over time, until restart

2019-01-30 Thread Stefan Priebe - Profihost AG
Hi, On 30.01.19 at 14:59, Alexandre DERUMIER wrote: > Hi Stefan, >>> currently i'm in the process of switching back from jemalloc to tcmalloc >>> like suggested. This report makes me a little nervous about my change. > Well, I'm really not sure that it's a tcmalloc bug. > maybe bluestore

Re: [ceph-users] Multisite Ceph setup sync issue

2019-01-30 Thread Krishna Verma
Hi Amit, Still the same. Please see the output below. Anything else I can try? Please update. [cephuser@zabbix-client ~]$ radosgw-admin period update --commit 2>/dev/null Sending period to new master zone 71931e0e-1be6-449f-af34-edb4166c4e4a [cephuser@zabbix-client ~]$ sudo systemctl start

Re: [ceph-users] CEPH_FSAL Nfs-ganesha

2019-01-30 Thread David C
Hi Patrick, Thanks for the info. If I did multiple exports, how does that work in terms of the cache settings defined in ceph.conf? Are those settings per CephFS client or a shared cache? I.e. if I've defined client_oc_size, would that be per export? Cheers, On Tue, Jan 15, 2019 at 6:47 PM

Re: [ceph-users] Simple API to have cluster healthcheck ?

2019-01-30 Thread Martin Verges
Hello Vincent, when you install or migrate to croit, you get a large number of REST APIs (see https://croit.io/docs/v1809/cluster#get-cluster-status), and we support read-only users that you can create in our GUI. If you want to use our APIs from the CLI, you can use our httpie-auth plugin

Re: [ceph-users] Rezising an online mounted ext4 on a rbd - failed

2019-01-30 Thread Brian Godette
Did you mkfs with -O 64bit or have it in the [defaults] section of /etc/mke2fs.conf before creating the filesystem? If you didn't, 4TB is as big as it goes and it can't be changed after the fact. If the device is already larger than 4TB when you create the filesystem, mkfs does the right thing and
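A hedged sketch of checking for and enabling the 64bit feature (device name is a placeholder; the in-place conversion needs a reasonably recent e2fsprogs and an unmounted filesystem, so take a backup first):

    tune2fs -l /dev/rbd0 | grep -o 64bit        # is the feature already enabled?
    # offline conversion on e2fsprogs >= 1.43 (filesystem must be unmounted):
    #   resize2fs -b /dev/rbd0
    # for a new filesystem, enable it explicitly:
    mkfs.ext4 -O 64bit /dev/rbd0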

Re: [ceph-users] block storage over provisioning

2019-01-30 Thread Wido den Hollander
On 1/30/19 9:12 PM, Void Star Nill wrote: > Hello, > > When a Ceph block device is created with a given size, does Ceph > allocate all that space right away or is that allocated as the user > starts storing the data? > > I want to know if we can over provision the Ceph cluster. For example, >
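For what it's worth, RBD images are thin-provisioned, so space is only consumed as data is written; a quick way to compare provisioned and used size (pool/image names are placeholders):

    rbd create mypool/myimage --size 100G   # allocates no data up front
    rbd du mypool/myimage                   # PROVISIONED vs USED for the image
    ceph df                                 # actual cluster-wide usage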

Re: [ceph-users] ceph osd commit latency increase over time, until restart

2019-01-30 Thread Alexandre DERUMIER
>>If it does, probably only by accident. :) The autotuner in master is >>pretty dumb and mostly just grows/shrinks the caches based on the >>default ratios but accounts for the memory needed for rocksdb >>indexes/filters. It will try to keep the total OSD memory consumption >>below the

Re: [ceph-users] moving a new hardware to cluster

2019-01-30 Thread Fabio Abreu
Hi Martin, Thanks for your reply! Yes, I am using "osd recovery op priority", "osd max backfills", "osd recovery max active" and "osd client op priority" to try to minimize the impact of the cluster expansion. My Ceph version is 10.2.7 Jewel and I am moving 1 OSD, waiting for the recovery, and going to
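For reference, a hedged sketch of how those throttles are usually applied on Jewel (values are examples, not recommendations):

    # runtime injection across all OSDs
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'
    # and/or persisted in ceph.conf under [osd]:
    #   osd max backfills = 1
    #   osd recovery max active = 1
    #   osd recovery op priority = 1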

Re: [ceph-users] backfill_toofull while OSDs are not full

2019-01-30 Thread David Zafman
Strange, I can't reproduce this with v13.2.4.  I tried the following scenarios: pg acting 1, 0, 2 -> up 1, 0, 4 (osd.2 marked out).  The df on osd.2 shows 0 space, but only osd.4 (backfill target) checks full space. pg acting 1, 0, 2 -> up 4, 3, 5 (osd.1, 0, 2 all marked out).  The df for

Re: [ceph-users] CEPH_FSAL Nfs-ganesha

2019-01-30 Thread solarflow99
Can you do HA on the NFS shares? On Wed, Jan 30, 2019 at 9:10 AM David C wrote: > Hi Patrick > > Thanks for the info. If I did multiple exports, how does that work in > terms of the cache settings defined in ceph.conf, are those settings per > CephFS client or a shared cache? I.e if I've

Re: [ceph-users] CephFS performance vs. underlying storage

2019-01-30 Thread Marc Roos
I was wondering the same; from a 'default' setup I get this performance, no idea if this is bad, good or normal. [flattened benchmark table: 4k and 1024k random/sequential read and write workloads, with size, latency, IOPS and kB/s columns; data rows truncated]

[ceph-users] block storage over provisioning

2019-01-30 Thread Void Star Nill
Hello, When a Ceph block device is created with a given size, does Ceph allocate all that space right away or is that allocated as the user starts storing the data? I want to know if we can over provision the Ceph cluster. For example, if we have a cluster with 10G available space, am I allowed

Re: [ceph-users] backfill_toofull while OSDs are not full

2019-01-30 Thread Wido den Hollander
On 1/30/19 9:08 PM, David Zafman wrote: > > Strange, I can't reproduce this with v13.2.4.  I tried the following > scenarios: > > pg acting 1, 0, 2 -> up 1, 0 4 (osd.2 marked out).  The df on osd.2 > shows 0 space, but only osd.4 (backfill target) checks full space. > > pg acting 1, 0, 2 ->

Re: [ceph-users] ceph osd commit latency increase over time, until restart

2019-01-30 Thread Alexandre DERUMIER
>>Thanks. Is there any reason you monitor op_w_latency but not >>op_r_latency but instead op_latency? >> >>Also why do you monitor op_w_process_latency? but not op_r_process_latency? I monitor read too. (I have all metrics for osd sockets, and a lot of graphs). I just don't see latency
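For anyone following along, these counters come from the OSD admin socket; a minimal sketch (osd.0 and the use of jq are just for illustration):

    ceph daemon osd.0 perf dump | jq '.osd | {op_r_latency, op_w_latency, op_w_process_latency}'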

[ceph-users] Ceph mimic issue with snaptrimming.

2019-01-30 Thread Darius Kasparavičius
Hello, I have recently updated a cluster to Mimic. After the upgrade I started converting nodes to bluestore one by one. While Ceph was rebalancing I slapped a "nosnaptrim" on the cluster to save a bit of IO. After the rebalancing was done I enabled the snaptrim and my OSDs started flapping
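A hedged sketch of the knobs usually used to soften snap trimming (example values; defaults and availability differ between releases):

    ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.5 --osd_pg_max_concurrent_snap_trims 1'
    ceph osd unset nosnaptrim       # re-enable trimming once the cluster has calmed down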

Re: [ceph-users] block storage over provisioning

2019-01-30 Thread Void Star Nill
Thanks Wido. Appreciate the quick response. On Wed, 30 Jan 2019 at 12:27, Wido den Hollander wrote: > > > On 1/30/19 9:12 PM, Void Star Nill wrote: > > Hello, > > > > When a Ceph block device is created with a given size, does Ceph > > allocate all that space right away or is that allocated as the

Re: [ceph-users] ceph block - volume with RAID#0

2019-01-30 Thread M Ranga Swami Reddy
My thought was: a Ceph block volume with RAID#0 (meaning I mount Ceph block volumes to an instance/VM, and there I would like to configure those volumes with RAID0). Just to know if anyone is doing the same as above, and if yes, what are the constraints? Thanks Swami On Wed, Jan 30, 2019 at 7:56 PM Janne
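If the intent is striping two attached RBD volumes inside the guest, a minimal mdadm sketch (device names are guesses for a typical VM); note Janne's point further down that stacking striping layers rarely helps, since RBD already stripes across OSDs:

    mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/vdb /dev/vdc
    mkfs.ext4 /dev/md0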

Re: [ceph-users] Simple API to have cluster healthcheck ?

2019-01-30 Thread Lenz Grimmer
On 30 January 2019 19:33:14 CET, PHARABOT Vincent wrote: >Thanks for the info >But, nope, on Mimic (13.2.4) /api/health ends in 404 (/api/health/full, >/api/health/minimal also...) On which node did you try to access the API? Did you enable the Dashboard module in Ceph manager? Lenz --
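For reference, a quick sketch of enabling the module and finding out where it listens (the mgr host and port will differ per cluster):

    ceph mgr module enable dashboard
    ceph mgr services       # prints the dashboard URL served by the active mgr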

Re: [ceph-users] ceph osd commit latency increase over time, until restart

2019-01-30 Thread Alexandre DERUMIER
Hi Stefan, >>currently i'm in the process of switching back from jemalloc to tcmalloc >>like suggested. This report makes me a little nervous about my change. Well, I'm really not sure that it's a tcmalloc bug. Maybe it's bluestore related (I don't have filestore anymore to compare). I need to compare

Re: [ceph-users] Simple API to have cluster healthcheck ?

2019-01-30 Thread PHARABOT Vincent
Hi, Yes, it could do the job in the meantime. Thank you! Vincent -Original Message- From: Alexandru Cucu [mailto:m...@alexcucu.ro] Sent: Wednesday, 30 January 2019 14:31 To: PHARABOT Vincent Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Simple API to have cluster healthcheck ?

Re: [ceph-users] ceph block - volume with RAID#0

2019-01-30 Thread Janne Johansson
On Wed 30 Jan 2019 at 14:47, M Ranga Swami Reddy <swamire...@gmail.com> wrote: > Hello - Can I use the ceph block volume with RAID#0? Are there any > issues with this? > Hard to tell if you mean raid0 over a block volume or a block volume over raid0. Still, it is seldom a good idea to stack