[ceph-users] MDS locations

2017-12-21 Thread nigel davies
Hey all, is it OK to set up the MDS on the same servers that host the OSDs, or should they be on different servers?

Re: [ceph-users] Cephfs limits

2017-12-21 Thread nigel davies
Right, ok, I'll take a look. Can you do that after the pool/cephfs has been set up? On 21 Dec 2017 12:25 pm, "Yan, Zheng" wrote: > On Thu, Dec 21, 2017 at 6:18 PM, nigel davies wrote: > > Hey all, is it possible to set cephfs to have a space limit > > eg I

Re: [ceph-users] Cephfs NFS failover

2017-12-21 Thread nigel davies
Thanks all on this one. The ctdb worked amazingly; I just need to tweak the settings on it so the failover happens a tad faster. But all in all it works. Thanks for all your help. On 21 Dec 2017 9:08 am, "Robert Sander" wrote: > On 20.12.2017 18:45, nigel davies

Re: [ceph-users] cephfs mds millions of caps

2017-12-21 Thread Yan, Zheng
On Thu, Dec 21, 2017 at 11:46 PM, Webert de Souza Lima wrote: > Hello Zheng, > > Thanks for opening that issue on the bug tracker. > > Also thanks for that tip. Caps dropped from 1.6M to 600k for that client. An idle client shouldn't hold so many caps. > Is it safe to run in

Re: [ceph-users] Ceph as an Alternative to HDFS for Hadoop

2017-12-21 Thread Serkan Çoban
>Also, are there any benchmark comparisons between hdfs and ceph specifically >around performance of apps benefiting from data locality ? There will be no data locality in ceph, because all data is accessed over the network. On Fri, Dec 22, 2017 at 4:52 AM, Traiano Welcome

[ceph-users] Ceph as an Alternative to HDFS for Hadoop

2017-12-21 Thread Traiano Welcome
Hi List, I'm researching the possibility of using ceph as a drop-in replacement for hdfs for applications using spark and hadoop. I note that the jewel documentation states that it requires hadoop 1.1.x, which seems a little dated and would be of concern for people:
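
For a concrete sense of what "drop-in" means here: the CephFS Hadoop bindings are wired in through core-site.xml. A heavily hedged sketch follows; the property names are taken from the CephFS/Hadoop plugin documentation of that era and should be verified against the plugin version you deploy (host, port and paths are illustrative).

    <property>
      <name>fs.default.name</name>
      <value>ceph://mon-host:6789/</value>
    </property>
    <property>
      <name>fs.ceph.impl</name>
      <value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
    </property>
    <property>
      <name>ceph.conf.file</name>
      <value>/etc/ceph/ceph.conf</value>
    </property>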

Re: [ceph-users] Permissions for mon status command

2017-12-21 Thread David Turner
You aren't specifying your cluster user, only the keyring. So the connection command is still trying to use the default client.admin instead of client.python. Here's the connect line I use in my scripts. rados.Rados(conffile='/etc/ceph/ceph.conf', conf=dict(keyring = '
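
Spelled out, a minimal sketch of that connect pattern with the client name included (user name, paths and the health query are illustrative, not David's exact script):

    import json
    import rados

    # rados_id='python' makes librados authenticate as client.python
    # instead of the default client.admin.
    cluster = rados.Rados(
        conffile='/etc/ceph/ceph.conf',
        rados_id='python',
        conf=dict(keyring='/etc/ceph/ceph.client.python.keyring'))
    cluster.connect()

    # Same information as `ceph health`, fetched via a mon command.
    ret, outbuf, outs = cluster.mon_command(
        json.dumps({'prefix': 'health', 'format': 'json'}), b'')
    print(outbuf.decode())
    cluster.shutdown()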

[ceph-users] Ceph not reclaiming space or overhead?

2017-12-21 Thread Brian Woods
I will start by saying I am very new to ceph and am trying to teach myself the ins and outs. While doing this I have been creating and destroying pools as I experiment on some test hardware. Something I noticed was that when a pool is deleted, the space is not always freed 100%. This is true even

Re: [ceph-users] Permissions for mon status command

2017-12-21 Thread Alvaro Soto
Hi Andreas, I believe it is not a problem of caps; I have tested using the same cap on mon and I have the same problem, still looking into it. [client.python] key = AQDORjxaYHG9JxAA0qiZC0Rmf3qulsO3P/bZgw== caps mon = "allow r" # ceph -n client.python --keyring ceph.client.python.keyring health
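
For completeness, a hedged sketch of creating such a read-only client and testing it with the client name passed explicitly (user name and keyring path are illustrative); combined with David Turner's point above about specifying the cluster user when connecting, mon 'allow r' is sufficient for health/status queries:

    ceph auth get-or-create client.python mon 'allow r' \
        -o /etc/ceph/ceph.client.python.keyring
    ceph -n client.python --keyring /etc/ceph/ceph.client.python.keyring health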

[ceph-users] Permissions for mon status command

2017-12-21 Thread Andreas Calminder
Hi, I'm writing a small python script using librados to display cluster health, same info as ceph health detail shows. It works fine, but I'd rather not use the admin keyring for something like this. However, I have no clue what kind of caps I should or can set; I was kind of hoping that mon allow r

Re: [ceph-users] How to use vfs_ceph

2017-12-21 Thread David C
At a glance it looks OK; I've not tested this in a while. Silly question, but does your Samba package definitely ship with the Ceph vfs? That caught me out in the past. Have you tried exporting a sub dir? Maybe 777 it, although it shouldn't make a difference. On 21 Dec 2017 13:16, "Felix Stolte"

Re: [ceph-users] ceph-volume lvm deactivate/destroy/zap

2017-12-21 Thread Stefan Kooman
Quoting Dan van der Ster (d...@vanderster.com): > Thanks Stefan. But isn't there also some vgremove or lvremove magic > that needs to bring down these /dev/dm-... devices I have? Ah, you want to clean up properly before that. Sure: lvremove -f / vgremove pvremove /dev/ceph-device (should wipe
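
Put together, a hedged sketch of the full teardown being described (OSD id, VG/LV names and device are illustrative; ceph-volume lvm zap is available in recent 12.2.x releases, check yours):

    systemctl stop ceph-osd@42
    umount /var/lib/ceph/osd/ceph-42      # if the osd dir is still mounted
    lvremove -f <vg_name>/<lv_name>       # the LV backing the OSD
    vgremove <vg_name>
    pvremove /dev/ceph-device
    ceph-volume lvm zap /dev/ceph-device  # wipe leftover labels before pulling the disk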

Re: [ceph-users] Cache tier unexpected behavior: promote on lock

2017-12-21 Thread Захаров Алексей
Thanks for the answers! As it leads to a decrease of caching efficiency, I've opened an issue: http://tracker.ceph.com/issues/22528 15.12.2017, 23:03, "Gregory Farnum" : > On Thu, Dec 14, 2017 at 9:11 AM, Захаров Алексей > wrote: >> Hi, Gregory, >>

Re: [ceph-users] Many concurrent drive failures - How do I activate pgs?

2017-12-21 Thread Dénes Dolhay
If I were in your shoes, I would grab a failed disk which DOES NOT contain the data you need, and an oscilloscope, and start experimenting on it ... try to find debug test points on the panel etc. At the same time I would contact the factory or a data recovery company with a good reputation, and

Re: [ceph-users] cephfs mds millions of caps

2017-12-21 Thread Webert de Souza Lima
Hello Zheng, Thanks for opening that issue on the bug tracker. Also thanks for that tip. Caps dropped from 1.6M to 600k for that client. Is it safe to run in a cronjob? Let's say, once or twice a day during production? Thanks! Regards, Webert Lima DevOps Engineer at MAV Tecnologia *Belo
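
For anyone following the caps numbers in this thread, they can be read per client session on the active MDS; a small sketch (daemon name is illustrative):

    # On the MDS host: one entry per client session, including num_caps and inst
    ceph daemon mds.mds1 session ls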

Re: [ceph-users] ceph-volume lvm deactivate/destroy/zap

2017-12-21 Thread Dan van der Ster
On Thu, Dec 21, 2017 at 3:59 PM, Stefan Kooman wrote: > Quoting Dan van der Ster (d...@vanderster.com): >> Hi, >> >> For someone who is not an lvm expert, does anyone have a recipe for >> destroying a ceph-volume lvm osd? >> (I have a failed disk which I want to deactivate / wipe

Re: [ceph-users] MDS behind on trimming

2017-12-21 Thread Stefan Kooman
Quoting Dan van der Ster (d...@vanderster.com): > Hi, > > We've used double the defaults for around 6 months now and haven't had any > behind on trimming errors in that time. > >mds log max segments = 60 >mds log max expiring = 40 > > Should be simple to try. Yup, and works like a

Re: [ceph-users] Not timing out watcher

2017-12-21 Thread Ilya Dryomov
On Thu, Dec 21, 2017 at 3:04 PM, Serguei Bezverkhi (sbezverk) wrote: > Hi Ilya, > > Here you go, no k8s services running this time: > > sbezverk@kube-4:~$ sudo rbd map raw-volume --pool kubernetes --id admin -m > 192.168.80.233 --key=AQCeHO1ZILPPDRAA7zw3d76bplkvTwzoosybvA==

Re: [ceph-users] ceph-volume lvm deactivate/destroy/zap

2017-12-21 Thread Stefan Kooman
Quoting Dan van der Ster (d...@vanderster.com): > Hi, > > For someone who is not an lvm expert, does anyone have a recipe for > destroying a ceph-volume lvm osd? > (I have a failed disk which I want to deactivate / wipe before > physically removing from the host, and the tooling for this doesn't

[ceph-users] [luminous 12.2.2] Cluster write performance degradation problem (possibly tcmalloc related)

2017-12-21 Thread shadow_lin
My testing cluster is an all-hdd cluster with 12 osds (10T hdd each). I monitor luminous 12.2.2 write performance and osd memory usage with grafana graphs for statistic logging. The test is done by using fio on a mounted rbd with the following fio parameters: fio -directory=fiotest -direct=1 -thread
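
The fio command line is truncated in the archive; purely as an illustration of this kind of sequential-write test against a mounted rbd (the flag values are guesses, not shadow_lin's actual parameters):

    fio -directory=fiotest -direct=1 -thread -rw=write -bs=4M \
        -ioengine=libaio -iodepth=32 -numjobs=1 -size=100G \
        -runtime=600 -group_reporting -name=rbd-seq-write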

[ceph-users] ceph-volume lvm deactivate/destroy/zap

2017-12-21 Thread Dan van der Ster
Hi, For someone who is not an lvm expert, does anyone have a recipe for destroying a ceph-volume lvm osd? (I have a failed disk which I want to deactivate / wipe before physically removing from the host, and the tooling for this doesn't exist yet http://tracker.ceph.com/issues/22287) >

[ceph-users] Gateway timeout

2017-12-21 Thread Brent Kennedy
I have noticed over the years (been using ceph since 2013) that when an OSD attached to a single physical drive (JBOD setup) is failing, at times this will cause rados gateways to go offline. I have two clusters running (one on firefly and one on hammer, both scheduled for upgrades

Re: [ceph-users] Not timing out watcher

2017-12-21 Thread Serguei Bezverkhi (sbezverk)
Hi Ilya, Here you go, no k8s services running this time: sbezverk@kube-4:~$ sudo rbd map raw-volume --pool kubernetes --id admin -m 192.168.80.233 --key=AQCeHO1ZILPPDRAA7zw3d76bplkvTwzoosybvA== /dev/rbd0 sbezverk@kube-4:~$ sudo rbd status raw-volume --pool kubernetes --id admin -m
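
A hedged way to double-check who holds the watch on that image while reproducing this (pool/image follow the example above; the header object name comes from the image id shown by rbd info):

    rbd info raw-volume --pool kubernetes        # note block_name_prefix: rbd_data.<id>
    rados -p kubernetes listwatchers rbd_header.<id>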

Re: [ceph-users] cephfs mds millions of caps

2017-12-21 Thread Yan, Zheng
On Thu, Dec 21, 2017 at 7:33 PM, Webert de Souza Lima wrote: > I have upgraded the kernel on a client node (one that has close-to-zero > traffic) used for tests. > > { > "reconnecting" : false, > "id" : 1620266, > "num_leases" : 0, > "inst" :

Re: [ceph-users] MDS behind on trimming

2017-12-21 Thread Yan, Zheng
On Thu, Dec 21, 2017 at 9:32 PM, Stefan Kooman wrote: > Hi, > > We have two MDS servers. One active, one active-standby. While doing a > parallel rsync of 10 threads with loads of files, dirs, subdirs we get > the following HEALTH_WARN: > > ceph health detail > HEALTH_WARN 2 MDSs

Re: [ceph-users] MDS behind on trimming

2017-12-21 Thread Dan van der Ster
Hi, We've used double the defaults for around 6 months now and haven't had any behind on trimming errors in that time. mds log max segments = 60 mds log max expiring = 40 Should be simple to try. -- dan On Thu, Dec 21, 2017 at 2:32 PM, Stefan Kooman wrote: > Hi, > >
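
A sketch of both ways to apply those values (assuming they go in the [mds] section; injectargs changes running daemons, the ceph.conf entry makes it persistent):

    # ceph.conf on the MDS hosts
    [mds]
    mds log max segments = 60
    mds log max expiring = 40

    # or on a running cluster
    ceph tell mds.* injectargs '--mds_log_max_segments=60 --mds_log_max_expiring=40'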

Re: [ceph-users] Many concurrent drive failures - How do I activate pgs?

2017-12-21 Thread David Herselman
Hi, I assume this can only be a physical manufacturing flaw or a firmware bug? Do Intel publish advisories on recalled equipment? Should others be concerned about using Intel DC S4600 SSD drives? Could this be an electrical issue on the Hot Swap Backplane or BMC firmware issue? Either way, all

[ceph-users] MDS behind on trimming

2017-12-21 Thread Stefan Kooman
Hi, We have two MDS servers. One active, one active-standby. While doing a parallel rsync of 10 threads with loads of files, dirs, subdirs we get the following HEALTH_WARN: ceph health detail HEALTH_WARN 2 MDSs behind on trimming MDS_TRIM 2 MDSs behind on trimming mdsmds2(mds.0): Behind on

Re: [ceph-users] [Luminous 12.2.2] Cluster performance drops after certain point of time

2017-12-21 Thread shadow_lin
Thanks for your information, but I don't think that is my case. My cluster doesn't have any ssds. 2017-12-21 lin.yunfan From: Denes Dolhay Sent: 2017-12-18 06:41 Subject: Re: [ceph-users] [Luminous 12.2.2] Cluster performance drops after certain point of time

[ceph-users] How to use vfs_ceph

2017-12-21 Thread Felix Stolte
Hello folks, is anybody using the vfs_ceph module for exporting cephfs as samba shares? We are running ceph jewel with cephx enabled. The manpage of vfs_ceph only references the option ceph:config_file. How do I need to configure my share (or maybe ceph.conf)? log.smbd: '/' does not exist or
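
For comparison, a minimal smb.conf share roughly along the lines of the vfs_ceph manpage (share name, path and ceph user are illustrative; ceph:user_id may not exist in older Samba builds, so check your manpage):

    [cephfs]
        path = /
        vfs objects = ceph
        ceph:config_file = /etc/ceph/ceph.conf
        ceph:user_id = samba
        kernel share modes = no
        read only = no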

Re: [ceph-users] Added two OSDs, 10% of pgs went inactive

2017-12-21 Thread Daniel K
Caspar, I found Nick Fisk's post yesterday http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-December/023223.html and set osd_max_pg_per_osd_hard_ratio = 4 in my ceph.conf on the OSDs and restarted the 10TB OSDs. The PGs went back active and recovery is complete now. My setup is similar
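
The knob in question, as a hedged ceph.conf sketch (Luminous refuses to activate PGs once a disk exceeds mon_max_pg_per_osd times this ratio; the values mirror the thread, not a recommendation, and defaults can differ between point releases):

    [global]
    mon_max_pg_per_osd = 200            # luminous-era default
    osd_max_pg_per_osd_hard_ratio = 4   # allow up to 4x during expansion, then restart the OSDs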

Re: [ceph-users] Cephfs limis

2017-12-21 Thread Yan, Zheng
On Thu, Dec 21, 2017 at 6:18 PM, nigel davies wrote: > Hey all, is it possible to set cephfs to have a space limit > eg I'd like to set my cephfs to have a limit of 20TB > and my s3 storage to have 4TB for example > you can set pool quota on cephfs data pools > thanks > >
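
Zheng's suggestion in concrete form; a hedged sketch where pool names and sizes are illustrative:

    # cap the cephfs data pool at 20 TB and an RGW data pool at 4 TB
    ceph osd pool set-quota cephfs_data max_bytes $((20 * 1024**4))
    ceph osd pool set-quota default.rgw.buckets.data max_bytes $((4 * 1024**4))
    ceph osd pool get-quota cephfs_data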

Re: [ceph-users] ceph status doesnt show available and used disk space after upgrade

2017-12-21 Thread kevin parrikar
accidentally removed mailing list email, ++ceph-users. Thanks a lot JC for looking into this issue. I am really out of ideas. ceph.conf on the mgr node, which is also a monitor node: [global] fsid = 06c5c906-fc43-499f-8a6f-6c8e21807acf mon_initial_members = node-16 node-30 node-31 mon_host = 172.16.1.9

Re: [ceph-users] Many concurrent drive failures - How do I activate pgs?

2017-12-21 Thread Denes Dolhay
Hi, Since many ceph clusters use intel ssds and admins do recommend them, they are probably very good drives. My own experiences however are not so good with them. (About 70% of our intel drives ran into the 8mb bug at my previous job, 5xx and DC35xx series both, latest firmware at that

Re: [ceph-users] cephfs mds millions of caps

2017-12-21 Thread Webert de Souza Lima
I have upgraded the kernel on a client node (one that has close-to-zero traffic) used for tests. { "reconnecting" : false, "id" : 1620266, "num_leases" : 0, "inst" : "client.1620266 10.0.0.111:0/3921220890", "state" : "open", "completed_requests" : 0,

Re: [ceph-users] Slow backfilling with bluestore, ssd and metadata pools

2017-12-21 Thread Burkhard Linke
Hi, On 12/21/2017 11:43 AM, Richard Hesketh wrote: On 21/12/17 10:28, Burkhard Linke wrote: OSD config section from ceph.conf: [osd] osd_scrub_sleep = 0.05 osd_journal_size = 10240 osd_scrub_chunk_min = 1 osd_scrub_chunk_max = 1 max_pg_per_osd_hard_ratio = 4.0 osd_max_pg_per_osd_hard_ratio =

Re: [ceph-users] POOL_NEARFULL

2017-12-21 Thread Konstantin Shalygin
"Update your ceph.conf file": this also does not help. I have created ticket http://tracker.ceph.com/issues/22520

Re: [ceph-users] Not timing out watcher

2017-12-21 Thread Ilya Dryomov
On Wed, Dec 20, 2017 at 6:20 PM, Serguei Bezverkhi (sbezverk) wrote: > It took 30 minutes for the Watcher to time out after an ungraceful restart. Is > there a way to limit it to something a bit more reasonable? Like 1-3 minutes? > > On 2017-12-20, 12:01 PM, "Serguei Bezverkhi

Re: [ceph-users] Not timing out watcher

2017-12-21 Thread Ilya Dryomov
On Wed, Dec 20, 2017 at 6:56 PM, Jason Dillaman wrote: > ... looks like this watch "timeout" was introduced in the kraken > release [1] so if you don't see this issue with a Jewel cluster, I > suspect that's the cause. > > [1] https://github.com/ceph/ceph/pull/11378 Strictly

Re: [ceph-users] Slow backfilling with bluestore, ssd and metadata pools

2017-12-21 Thread Richard Hesketh
On 21/12/17 10:28, Burkhard Linke wrote: > OSD config section from ceph.conf: > > [osd] > osd_scrub_sleep = 0.05 > osd_journal_size = 10240 > osd_scrub_chunk_min = 1 > osd_scrub_chunk_max = 1 > max_pg_per_osd_hard_ratio = 4.0 > osd_max_pg_per_osd_hard_ratio = 4.0 > bluestore_cache_size_hdd =

Re: [ceph-users] Proper way of removing osds

2017-12-21 Thread Burkhard Linke
Hi, On 12/21/2017 11:03 AM, Karun Josy wrote: Hi, This is how I remove an OSD from the cluster: * Take it out: ceph osd out osdid. Wait for the balancing to finish. * Mark it down: ceph osd down osdid. Then purge it: ceph osd purge osdid --yes-i-really-mean-it. While purging

Re: [ceph-users] Proper way of removing osds

2017-12-21 Thread Richard Hesketh
On 21/12/17 10:21, Konstantin Shalygin wrote: >> Is this the correct way to remove OSDs, or am I doing something wrong? > The generic way for maintenance (e.g. disk replacement) is to rebalance by changing the osd > weight: > > > ceph osd crush reweight osdid 0 > > the cluster migrates data "from this osd" > >

[ceph-users] Slow backfilling with bluestore, ssd and metadata pools

2017-12-21 Thread Burkhard Linke
Hi, we are in the process of migrating our hosts to bluestore. Each host has 12 HDDs (6TB / 4TB) and two Intel P3700 NVME SSDs with 375 GB capacity. The new bluestore OSDs are created by ceph-volume: ceph-volume lvm create --bluestore --block.db /dev/nvmeXn1pY --data /dev/sdX1 6 OSDs

Re: [ceph-users] Proper way of removing osds

2017-12-21 Thread Konstantin Shalygin
Is this the correct way to remove OSDs, or am I doing something wrong? The generic way for maintenance (e.g. disk replacement) is to rebalance by changing the osd weight: ceph osd crush reweight osdid 0. The cluster migrates data "from this osd". When HEALTH_OK, you can safely remove this OSD: ceph osd out osd_id
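
The whole sequence end to end, as a hedged sketch (osd id is illustrative; in Luminous, ceph osd purge folds the old crush remove / auth del / osd rm steps into one command):

    ceph osd crush reweight osd.12 0          # drain: data migrates off this OSD
    # wait for HEALTH_OK, then:
    ceph osd out 12
    systemctl stop ceph-osd@12                # on the host carrying the OSD
    ceph osd purge 12 --yes-i-really-mean-it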

[ceph-users] Cephfs limits

2017-12-21 Thread nigel davies
Hey all, is it possible to set cephfs to have a space limit? eg I'd like to set my cephfs to have a limit of 20TB and my s3 storage to have 4TB, for example. Thanks

[ceph-users] Proper way of removing osds

2017-12-21 Thread Karun Josy
Hi, This is how I remove an OSD from the cluster: - Take it out: ceph osd out osdid. Wait for the balancing to finish. - Mark it down: ceph osd down osdid. Then purge it: ceph osd purge osdid --yes-i-really-mean-it. While purging I can see there is another rebalancing occurring.

Re: [ceph-users] Cephfs NFS failover

2017-12-21 Thread Robert Sander
On 20.12.2017 18:45, nigel davies wrote: > Hey all > > Can anyone advise on how I can do this? You can use ctdb for that and run an active/active NFS cluster: https://wiki.samba.org/index.php/Setting_up_CTDB_for_Clustered_NFS The cluster filesystem can be a CephFS. This also works with
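
A hedged sketch of the pieces a minimal two-node CTDB setup needs on top of a shared CephFS mount (addresses, interface and paths are illustrative; variable names and file locations vary between ctdb versions, so follow the wiki page above):

    # /etc/ctdb/nodes: one internal IP per cluster node
    10.0.0.11
    10.0.0.12

    # /etc/ctdb/public_addresses: floating IPs that ctdb moves on failover
    192.168.1.100/24 eth0

    # /etc/sysconfig/ctdb (or /etc/default/ctdb): the recovery lock must
    # live on the shared CephFS so only one recovery master can hold it
    CTDB_RECOVERY_LOCK=/mnt/cephfs/ctdb/.ctdb.lock
    CTDB_MANAGES_NFS=yes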