Re: [ceph-users] [ceph-calamari] Does anyone understand Calamari??
On 13/05/2015, at 11.23, Steffen W Sørensen ste...@me.com wrote: On 13/05/2015, at 04.08, Gregory Meno gm...@redhat.com wrote: Ideally I would like everything in /var/log/calamari. Be sure to set calamari.conf like so:
[shadow_man@vpm107 ~]$ grep DEBUG /etc/calamari/calamari.conf
log_level = DEBUG
db_log_level = DEBUG
log_level = DEBUG
then restart cthulhu and apache, visit http://essperf3/api/v2/cluster and http://essperf3/ and then share the logs here. Hopefully something obvious will be off in either the calamari or cthulhu log. Since I have a similar issue, I'll sneak my log data in here as well, no offence… … had a similar issue, dunno what changed, but just by revisiting our calamari UI it seems to be working again… it knows of our cluster at least :) Only it won't update the [health] state, which seems stuck, but IO and other stats are updated fine. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] [ceph-calamari] Does anyone understand Calamari??
On 13/05/2015, at 04.08, Gregory Meno gm...@redhat.com wrote: Ideally I would like everything in /var/log/calamari. Be sure to set calamari.conf like so:
[shadow_man@vpm107 ~]$ grep DEBUG /etc/calamari/calamari.conf
log_level = DEBUG
db_log_level = DEBUG
log_level = DEBUG
then restart cthulhu and apache, visit http://essperf3/api/v2/cluster and http://essperf3 and then share the logs here. Hopefully something obvious will be off in either the calamari or cthulhu log.
Since I have a similar issue, I'll sneak my log data in here as well, no offence…
-rw-r--r-- 1 www-data www-data   1554 May 13 11:14 info.log
-rw-r--r-- 1 root     root      29311 May 13 11:14 httpd_error.log
-rw-r--r-- 1 root     root       4599 May 13 11:14 httpd_access.log
-rw-r--r-- 1 www-data www-data    739 May 13 11:14 calamari.log
-rw-r--r-- 1 root     root     238047 May 13 11:14 cthulhu.log
root@node1:/var/log/calamari# cat calamari.log
2015-05-13 04:05:40,787 - metric_access - django.request Not Found: /favicon.ico
2015-05-13 04:14:02,263 - DEBUG - django.request.profile [17.5249576569ms] /api/v2/cluster
2015-05-13 04:14:02,263 - DEBUG - django.request.profile RPC timing for 'list_clusters': 3.89504432678/3.89504432678/3.89504432678 avg/min/max ms
2015-05-13 04:14:02,263 - DEBUG - django.request.profile Total time in RPC: 3.89504432678ms
2015-05-13 04:14:06,172 - DEBUG - django.request.profile [15.8069133759ms] /api/v2/cluster
2015-05-13 04:14:06,173 - DEBUG - django.request.profile RPC timing for 'list_clusters': 2.44808197021/2.44808197021/2.44808197021 avg/min/max ms
2015-05-13 04:14:06,173 - DEBUG - django.request.profile Total time in RPC: 2.44808197021ms
root@node1:/var/log/calamari# tail cthulhu.log
2015-05-13 11:14:46,694 - DEBUG - cthulhu nivcsw: 102
2015-05-13 11:14:46,709 - DEBUG - cthulhu Eventer.on_tick
2015-05-13 11:14:46,710 - INFO - cthulhu Eventer._emit: 2015-05-13 09:14:46.710030+00:00/WARNING/Cluster 'ceph' is late reporting in
2015-05-13 11:14:46,710 - INFO - sqlalchemy.engine.base.Engine BEGIN (implicit)
2015-05-13 11:14:46,711 - INFO - sqlalchemy.engine.base.Engine INSERT INTO cthulhu_event (when, severity, message, fsid, fqdn, service_type, service_id) VALUES (%(when)s, %(severity)s, %(message)s, %(fsid)s, %(fqdn)s, %(service_type)s, %(service_id)s) RETURNING cthulhu_event.id
2015-05-13 11:14:46,711 - INFO - sqlalchemy.engine.base.Engine {'severity': 3, 'when': datetime.datetime(2015, 5, 13, 9, 14, 46, 710030, tzinfo=tzutc()), 'fqdn': None, 'service_type': None, 'service_id': None, 'message': Cluster 'ceph' is late reporting in, 'fsid': u'16fe2dcf-2629-422f-a649-871deba78bcd'}
2015-05-13 11:14:46,713 - DEBUG - sqlalchemy.engine.base.Engine Col ('id',)
2015-05-13 11:14:46,714 - DEBUG - sqlalchemy.engine.base.Engine Row (54,)
2015-05-13 11:14:46,714 - INFO - sqlalchemy.engine.base.Engine COMMIT
2015-05-13 11:14:56,710 - DEBUG - cthulhu Eventer.on_tick
root@node1:/var/log/calamari# tail httpd_error.log
[Wed May 13 04:14:05 2015] [warn]   File /usr/lib/python2.7/dist-packages/git/__init__.py, line 20, in _init_externals
[Wed May 13 04:14:05 2015] [warn]     import gitdb
[Wed May 13 04:14:05 2015] [warn]   File /usr/lib/python2.7/dist-packages/gitdb/__init__.py, line 25, in module
[Wed May 13 04:14:05 2015] [warn]     _init_externals()
[Wed May 13 04:14:05 2015] [warn]   File /usr/lib/python2.7/dist-packages/gitdb/__init__.py, line 17, in _init_externals
[Wed May 13 04:14:05 2015] [warn]     __import__(module)
[Wed May 13 04:14:05 2015] [warn]   File /usr/lib/python2.7/dist-packages/async/__init__.py, line 36, in module
[Wed May 13 04:14:05 2015] [warn]     _init_signals()
[Wed May 13 04:14:05 2015] [warn]   File /usr/lib/python2.7/dist-packages/async/__init__.py, line 26, in _init_signals
[Wed May 13 04:14:05 2015] [warn]     signal.signal(signal.SIGINT, thread_interrupt_handler)
root@node1:/var/log/calamari# tail httpd_access.log
ip - - [13/May/2015:11:14:02 +0200] GET /static/rest_framework/js/default.js HTTP/1.1 304 209
ip - - [13/May/2015:11:14:04 +0200] GET /api/v2/cluster HTTP/1.1 200 2258
ip - - [13/May/2015:11:14:06 +0200] GET /static/rest_framework/css/bootstrap.min.css HTTP/1.1 304 211
ip - - [13/May/2015:11:14:06 +0200] GET /static/rest_framework/css/bootstrap-tweaks.css HTTP/1.1 304 209
ip - - [13/May/2015:11:14:06 +0200] GET /static/rest_framework/css/prettify.css HTTP/1.1 304 209
ip - - [13/May/2015:11:14:06 +0200] GET /static/rest_framework/js/jquery-1.8.1-min.js HTTP/1.1 304 211
ip - - [13/May/2015:11:14:06 +0200] GET /static/rest_framework/js/bootstrap.min.js HTTP/1.1 304 210
ip - - [13/May/2015:11:14:06 +0200] GET /static/rest_framework/css/default.css HTTP/1.1 304 209
ip - - [13/May/2015:11:14:06 +0200] GET /static/rest_framework/js/prettify-min.js HTTP/1.1 304 210
ip - - [13/May/2015:11:14:06 +0200] GET /static/rest_framework/js/default.js HTTP/1.1 304 209
wget /api/v2/cluster returns:
GET /api/v2/cluster HTTP 200 OK
Vary: Accept
Content-Type:
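For reference, the DEBUG setup Gregory describes above can be scripted roughly like this on the Calamari node (a sketch; the sed expression and service names are assumptions - on RHEL/CentOS the web server is httpd rather than apache2):

    # raise every log_level / db_log_level in calamari.conf to DEBUG
    sed -i 's/^\(db_\)\{0,1\}log_level *=.*/\1log_level = DEBUG/' /etc/calamari/calamari.conf
    grep -n 'log_level' /etc/calamari/calamari.conf    # verify all three lines now say DEBUG
    service cthulhu restart
    service apache2 restart
    tail -f /var/log/calamari/cthulhu.log /var/log/calamari/calamari.log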
Re: [ceph-users] New Calamari server
On 12/05/2015, at 19.51, Bruce McFarland bruce.mcfarl...@taec.toshiba.com wrote: I am having a similar issue. The cluster is up and salt is running on, and has accepted keys from, all nodes, including the monitor. I can issue salt and salt/ceph.py commands from the Calamari master, including 'salt \* ceph.get_heartbeats', which returns from all nodes including the monitor with the monmap epoch etc. Calamari reports that it sees all of the Ceph servers, but not a Ceph cluster. Is there a salt event besides ceph.get_heartbeats that the Calamari master requires to recognize the cluster? ~ +1, I can get the boot time but not the heartbeats, so I would still say salt is working:
root@node1:/# salt node2 ceph.get_boot_time
node2:
    1431002208
root@node1:/# salt node2 ceph.get_heartbeats
node2:
    The minion function caused an exception: Traceback (most recent call last):
      File /usr/lib/python2.7/dist-packages/salt/minion.py, line 1020, in _thread_return
        return_data = func(*args, **kwargs)
      File /var/cache/salt/minion/extmods/modules/ceph.py, line 467, in get_heartbeats
        service_data = service_status(filename)
      File /var/cache/salt/minion/extmods/modules/ceph.py, line 526, in service_status
        fsid = json.loads(admin_socket(socket_path, ['status'], 'json'))['cluster_fsid']
    KeyError: 'cluster_fsid'
/Steffen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
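The KeyError suggests the 'status' admin-socket command on one of the daemons on node2 returns JSON without a cluster_fsid field (the salt ceph.py module loops over every socket under /var/run/ceph). A quick way to spot the odd daemon out - a rough check, assuming the default socket location:

    for s in /var/run/ceph/*.asok; do
        echo == $s
        ceph --admin-daemon $s status || echo no status from $s
    done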
[ceph-users] sparse RBD devices
I’ve live migrated RBD images of our VMs (with ext4 FS) through our Proxmox PVE cluster from one pool to another, and now it seems those devices are no longer as sparse as before, i.e. pool usage has grown to almost the sum of the full image sizes. Wondering if there’s a way to trim the RBD images so they become sparse again? fstrim doesn’t seem to be supported on the virtual devices in the VMs. TIA /Steffen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
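For the record, two things that may bring the sparseness back (both are assumptions about the setup, so test on a throw-away image first): fstrim only works inside the guest if the virtual disk is attached with discard support (e.g. virtio-scsi with discard=on in the PVE VM config), and an offline copy re-thins the image because zeroed blocks are not written to the destination:

    # inside the guest, once the disk supports discard:
    fstrim -v /
    # offline alternative with the VM stopped (image/pool names here are made up):
    qemu-img convert -p -O raw rbd:rbd/vm-100-disk-1 rbd:rbd/vm-100-disk-1-sparse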
Re: [ceph-users] The first infernalis dev release will be v9.0.0
On 05/05/2015, at 18.52, Sage Weil sw...@redhat.com wrote: On Tue, 5 May 2015, Tony Harris wrote: So with this, will even numbers then be LTS? Since 9.0.0 is following 0.94.x/Hammer, and every other release is normally LTS, I'm guessing 10.x.x, 12.x.x, etc. will be LTS... It looks that way now, although I can't promise the pattern will hold! I read it like the major version is the release, i.e. Infernalis, Jewel etc., following the letter's position in the alphabet, I = 9th letter, so we will see all numbers 10, 11, 12, 13 … 25. A minor number of 2 will denote LTS, e.g. major-release.2.patch-level /Steffen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] xfs corruption, data disaster!
On 04/05/2015, at 15.01, Yujian Peng pengyujian5201...@126.com wrote: Alexandre DERUMIER aderumier@... writes: maybe this could help to repair pgs ? http://www.sebastien-han.fr/blog/2015/04/27/ceph-manually-repair-object/ (6 disks dying at the same time seems pretty strange. Do you have some kind of writeback cache enabled on these disks?) The only writeback cache is the raid card. Each disk is raid0. And do these cards have BBUs with working batteries… /Steffen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] How to estimate whether putting a journal on SSD will help with performance?
Also remember to drive your Ceph cluster as hard as you got means to, eg. tuning the VM OSes/IO sub systems like using multiple RBD devices per VM (to issue more out standing IOPs from VM IO subsystem), best IO scheduler, CPU power + memory per VM, also ensure low network latency + bandwidth between your rsyncing VMs etc. On 01/05/2015, at 11.13, Piotr Wachowicz piotr.wachow...@brightcomputing.com wrote: Thanks for your answer, Nick. Typically it's a single rsync session at a time (sometimes two, but rarely more concurrently). So it's a single ~5GB typical linux filesystem from one random VM to another random VM. Apart from using RBD Cache, is there any other way to improve the overall performance of such a use case in a Ceph cluster? In theory I guess we could always tarball it, and rsync the tarball, thus effectively using sequential IO rather than random. But that's simply not feasible for us at the moment. Any other ways? Sidequestion: does using RBDCache impact the way data is stored on the client? (e.g. a write call returning after data has been written to Journal (fast) vs written all the way to the OSD data store(slow)). I'm guessing it's always the first one, regardless of whether client uses RBDCache or not, right? My logic here is that otherwise that would imply that clients can impact the way OSDs behave, which could be dangerous in some situations. Kind Regards, Piotr On Fri, May 1, 2015 at 10:59 AM, Nick Fisk n...@fisk.me.uk mailto:n...@fisk.me.uk wrote: How many Rsync’s are doing at a time? If it is only a couple, you will not be able to take advantage of the full number of OSD’s, as each block of data is only located on 1 OSD (not including replicas). When you look at disk statistics you are seeing an average over time, so it will look like the OSD’s are not very busy, when in fact each one is busy for a very brief period. SSD journals will help your write latency, probably going down from around 15-30ms to under 5ms From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Piotr Wachowicz Sent: 01 May 2015 09:31 To: ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com Subject: [ceph-users] How to estimate whether putting a journal on SSD will help with performance? Is there any way to confirm (beforehand) that using SSDs for journals will help? We're seeing very disappointing Ceph performance. We have 10GigE interconnect (as a shared public/internal network). We're wondering whether it makes sense to buy SSDs and put journals on them. But we're looking for a way to verify that this will actually help BEFORE we splash cash on SSDs. The problem is that the way we have things configured now, with journals on spinning HDDs (shared with OSDs as the backend storage), apart from slow read/write performance to Ceph I already mention, we're also seeing fairly low disk utilization on OSDs. This low disk utilization suggests that journals are not really used to their max, which begs for the questions whether buying SSDs for journals will help. This kind of suggests that the bottleneck is NOT the disk. But,m yeah, we cannot really confirm that. Our typical data access use case is a lot of small random read/writes. We're doing a lot of rsyncing (entire regular linux filesystems) from one VM to another. We're using Ceph for OpenStack storage (kvm). Enabling RBD cache didn't really help all that much. So, is there any way to confirm beforehand that using SSDs for journals will help in our case? 
Kind Regards, Piotr ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
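One way to estimate the gain before buying: benchmark a candidate SSD with small synchronous direct writes, which is roughly the journal write pattern, and compare against the commit latency the spinners show today (a sketch; the fio parameters are only a starting point, and the test overwrites /dev/sdX):

    # current commit/apply latency per OSD:
    ceph osd perf
    # candidate SSD under journal-like load (O_DIRECT + sync 4k writes):
    fio --name=journal-test --filename=/dev/sdX --direct=1 --sync=1 \
        --rw=write --bs=4k --iodepth=1 --numjobs=1 --runtime=60 --time_based

If the SSD sustains such writes at well under a millisecond while fs_commit_latency on the current OSDs sits in the tens of milliseconds, journals on that SSD should noticeably help small random writes.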
Re: [ceph-users] Calamari server not working after upgrade 0.87-1 - 0.94-1
On 27/04/2015, at 15.51, Alexandre DERUMIER aderum...@odiso.com wrote: Hi, can you check on your ceph node /var/log/salt/minion ? I have had a similar problem; I needed to remove the old master key and restart the minion:
rm /etc/salt/pki/minion/minion_master.pub
/etc/init.d/salt-minion restart
(I don't know if calamari-ctl clear changes the salt master key) Apparently not, the master key is the same. Before clearing we still had various perf. data updated, like IOPS per pool, cpu etc., only the cluster PG info seemed stuck on old cluster info, so we thought we wanted to clear the slate and start all over. How to make calamari properly aware of an existing cluster? /Steffen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph-fuse unable to run through screen ?
On 23/04/2015, at 12.48, Burkhard Linke burkhard.li...@computational.bio.uni-giessen.de wrote: Hi, I had a similar problem during reboots. It was solved by adding '_netdev' to the options for the fstab entry. Otherwise the system may try to mount the cephfs mount point before the network is available. Didn't know of the _netdev mount option, it's nice to learn something new, thanks for sharing :) /Steffen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph Hammer question..
I have a cluster currently on Giant - is Hammer stable/ready for production use? Assume so; we upgraded a 0.87-1 to 0.94-1, and the only thing that came up was that Hammer now warns if you have too many PGs (300/OSD), which it turned out I and others had. So we had to do pool consolidation in order to achieve an OK health status again; otherwise Hammer is doing fine. /Steffen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
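To see where the warning comes from on an existing cluster, compare the per-pool pg_num values against the OSD count (a rough check):

    ceph -s | grep pgmap                                   # total PG count across all pools
    ceph osd dump | grep ^pool | grep -o 'pg_num [0-9]*'   # pg_num per pool
    ceph osd pool get <poolname> pg_num                    # or per single pool

The warning fires when (sum of pg_num times pool size) divided by the number of OSDs exceeds the mon_pg_warn_max_per_osd threshold, 300 by default in Hammer as far as I recall.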
Re: [ceph-users] Cephfs: proportion of data between data pool and metadata pool
But in the menu, the use case cephfs only doesn't exist and I have no idea of the %data for each of the pools metadata and data. So, what is the proportion (approximately) of %data between the data pool and the metadata pool of cephfs in a cephfs-only cluster? Is it rather metadata=20%, data=80%? Is it rather metadata=10%, data=90%? Is it rather metadata= 5%, data=95%? etc. Assuming mileage varies here, depending on the ratio between the number of entries in your Ceph FS vs their sizes, e.g. many small files vs few large ones. So you are probably the best one to estimate this yourself :) /Steffen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
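Rather than guessing, the ratio can simply be measured on a running cluster; the metadata pool is usually a small fraction unless there are huge numbers of tiny files (standard commands, output layout varies a bit between releases):

    ceph df        # per-pool USED - compare the cephfs metadata pool against the data pool
    rados df       # object counts and KB per pool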
Re: [ceph-users] ceph-fuse unable to run through screen ?
On 23/04/2015, at 10.24, Florent B flor...@coppint.com wrote: I come back with this problem because it persists even after upgrading to Hammer. With CephFS, it does not work, and the only workaround I found does not work 100% of the time: I also found issues at reboot, because starting the ceph-fuse daemon will possibly create the mount point directory, so I got this: in /etc/fstab I have:
id=admin /var/lib/ceph/backup fuse.ceph defaults 0 0
then I run the below script from /etc/rc.local :
#!/bin/sh
# after boot Ceph FS doesn't get mounted, so run this to verify and optionally mount
mounted=`df -h /var/lib/ceph/backup/. | grep -c '^ceph-fuse'`
if [ $mounted -eq 1 ]; then
    echo CephFS is mounted
else
    echo CephFS is not mounted, clearing mountpoint
    cd /var/lib/ceph
    mv backup backup.old
    mkdir backup
    # assume it is in the fstab
    mount /var/lib/ceph/backup
    # a bit dangerous :/ so ONLY on mount success
    [ $? -eq 0 ] && rm -rf backup.old
fi
It seems to work most of the time, otherwise I run the script once by hand :) /Steffen shell: bash -c mountpoint /var/www/sites/default/files || rm -Rf /var/www/sites/default/files/{*,*.*,.*}; screen -d -m -S cephfs-drupal mount /var/www/sites/default/files Sometimes it mounts, sometimes not... that's really weird. My mount point is configured with daemonize=false, because if I set it to true, it never works ! I really do not understand what the problem is. What does ceph-fuse need to mount correctly 100% of the time ?? Thank you. On 03/18/2015 10:42 AM, Florent B wrote: In fact, my problem is not related to Ansible. For example, make a bash script:
#!/bin/bash
mountpoint /mnt/cephfs || mount /mnt/cephfs
And run it with screen:
screen mount-me.sh
Directory is not mounted ! What is this ? :D If you run the script without screen, all works fine ! Is there any kind of particular return system with ceph-fuse ? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] replace dead SSD journal
On 17/04/2015, at 21.07, Andrija Panic andrija.pa...@gmail.com wrote: nah… Samsung 850 PRO 128GB - dead after 3 months - 2 of these died... wear level is 96%, so only 4% worn... (yes I know these are not enterprise, etc…) Damn… but maybe your surname says it all - Don't Panic :) But making sure SSD devices of the same type aren't of nearly the same age, and doing preventive replacement rotation, might be good practice I guess. /Steffen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] metadata management in case of ceph object storage and ceph block storage
On 17/04/2015, at 07.33, Josef Johansson jose...@gmail.com wrote: To your question, which I’m not sure I understand completely. So yes, you don’t need the MDS if you just keep track of block storage and object storage. (i.e. images for KVM) So the Mon keeps track of the metadata for the Pool and PG Well there really ain’t no metadata at all as with a traditional File System, monitors keep track of status of OSDs. Client compute which OSDs to go talk to to get to wanted objects. thus no need for central meta data service to tell clients where data are stored. Ceph is a distributed object storage system with potential no SPF and ability to scale out. Try studying Ross’ slides f.ex. here: http://www.slideshare.net/buildacloud/ceph-intro-and-architectural-overview-by-ross-turk http://www.slideshare.net/buildacloud/ceph-intro-and-architectural-overview-by-ross-turk or many other good intros on the net, youtube etc. Clients of a Ceph Cluster can access ‘objects’ (blobs with data) through several means, programatic with librados, as virtual block devices through librbd+librados, and finally as a S3 service through rados GW over http[s] the meta data (users + ACLs, buckets+data…) for S3 objects are stored in various pools in Ceph. CephFS built on top of a Ceph object store can best be compared with combination of a POSIX File System and other Networked File Systems f.ex. NFS,CiFS, AFP, only with a different protocol + access mean (FUSE daemon or kernel module). As it implements a regular file name space, it needs to store meta data of which files exist in such a name space, this is the job of the MDS server(s) which of course uses Ceph object store pools to persistent store this file system meta data info and the MDS keep track of all the files, hence the MDS should have at least 10x the memory of what the Mon have. Hmm 10x memory isn’t a rule of thumb in my book, it all depends of use case at hand. MDS tracks meta data of files stored in a CephFS, which usually is far from all data of a cluster unless CephFS is the only usage of course :) Many use Ceph for sharing virtual block devices among multiple Hypervisors as disk devices for virtual machines (VM images), f.ex. with Openstack, Proxmox etc. I’m no Ceph expert, especially not on CephFS, but this is my picture of it :) Maybe the architecture docs could help you out? http://docs.ceph.com/docs/master/architecture/#cluster-map http://docs.ceph.com/docs/master/architecture/#cluster-map Hope that resolves your question. Cheers, Josef On 06 Apr 2015, at 18:51, pragya jain prag_2...@yahoo.co.in mailto:prag_2...@yahoo.co.in wrote: Please somebody reply my queries. Thank yuo - Regards Pragya Jain Department of Computer Science University of Delhi Delhi, India On Saturday, 4 April 2015 3:24 PM, pragya jain prag_2...@yahoo.co.in mailto:prag_2...@yahoo.co.in wrote: hello all! As the documentation said One of the unique features of Ceph is that it decouples data and metadata. for applying the mechanism of decoupling, Ceph uses Metadata Server (MDS) cluster. MDS cluster manages metadata operations, like open or rename a file On the other hand, Ceph implementation for object storage as a service and block storage as a service does not require MDS implementation. My question is: In case of object storage and block storage, how does Ceph manage the metadata? Please help me to understand this concept more clearly. 
Thank you - Regards Pragya Jain Department of Computer Science University of Delhi Delhi, India ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] replace dead SSD journal
I have 1 SSD that hosted 6 OSDs' journals, that is dead, so 6 OSDs down, ceph rebalanced etc. Now I have a new SSD inside, and I will partition it etc - but would like to know how to proceed now with the journal recreation for those 6 OSDs that are down. Well, assuming the OSDs are down with the journal device lost, and as data has been rebalanced/re-replicated elsewhere, I would scratch these 6x downed+out OSDs+journals and build 6 new OSDs, adding them to the cluster capacity after properly maintaining the CRUSH map to remove the crashed OSDs; see the removal sketch below. Should I flush the journal (where to, the journals don't still exist...?), or just recreate the journal from scratch (making symbolic links again: ln -s /dev/$DISK$PART /var/lib/ceph/osd/ceph-$ID/journal) and starting the OSDs. I expect the following procedure, but would like confirmation please:
rm -f /var/lib/ceph/osd/ceph-$ID/journal   (sym link)
ln -s /dev/SDAxxx /var/lib/ceph/osd/ceph-$ID/journal
ceph-osd -i $ID --mkjournal
ll /var/lib/ceph/osd/ceph-$ID/journal
service ceph start osd.$ID
Any thoughts greatly appreciated ! Thanks, -- Andrija Panić ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
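For the scratch-and-rebuild route suggested above, the usual removal steps look roughly as follows (a sketch - double-check each $ID against ceph osd tree before running, since this permanently drops the OSDs from the cluster):

    for ID in 10 11 12 13 14 15; do     # the six dead OSD ids - made up here
        ceph osd out $ID                # most likely already out
        ceph osd crush remove osd.$ID
        ceph auth del osd.$ID
        ceph osd rm $ID
    done
    # then recreate each OSD with its journal on a partition of the new SSD, e.g. with ceph-deploy:
    # ceph-deploy osd create node1:/dev/sdb:/dev/sdg1      (hypothetical device names)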
Re: [ceph-users] Upgrade from Giant 0.87-1 to Hammer 0.94-1
That later change would have _increased_ the number of recommended PGs, not decreased it. Weird, as our Giant health status was OK before upgrading to Hammer… With your cluster 2048 PGs total (all pools combined!) would be the sweet spot, see: http://ceph.com/pgcalc/ Had read this originally when creating the cluster. It seems to me that you increased PG counts assuming that the formula is per pool. Well yes, maybe; I believe we bumped PGs because the status complaint in Giant mentioned explicit pool names, e.g. too few PGs in pool-name… so we naturally bumped the mentioned pools slightly up to the next power of two until health stopped complaining, and yes we wondered about this relatively high total number of PGs for the cluster, as we initially had read pgcalc and thought we understood this. ceph.com not responding presently… - are you saying one needs to consider in advance the number of pools in a cluster and factor this in when calculating the number of PGs? - If so, how to decide which pool gets what #PGs, as this is set per pool, especially if one can't pre-calculate the amount of objects ending up in each pool? But yes, understood also that more pools mean more PGs per OSD; does this imply that using different pools to segregate various data, f.ex. per application in the same cluster, is a bad idea? Using pools as a sort of name space segregation makes it easy f.ex. to remove/migrate data per application, and is thus a handy segregation tool IMHO. - Is the BCP to consolidate data in few pools per cluster? /Steffen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
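For reference, the rule of thumb behind pgcalc counts PGs across all pools and after replication, roughly:

    # total_pgs ~= (num_osds * 100) / replica_size, spread over the pools
    # and rounded up per pool to a power of two; e.g. with 24 OSDs and size 2:
    echo $(( 24 * 100 / 2 ))     # 1200 -> next power of two gives the ~2048 mentioned above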
Re: [ceph-users] Upgrade from Giant 0.87-1 to Hammer 0.94-1
On 16/04/2015, at 01.48, Steffen W Sørensen ste...@me.com wrote: Also our calamari web UI won't authenticate anymore, can’t see any issues in any log under /var/log/calamari, any hints on what to look for are appreciated, TIA! Well this morning it will authenticate me, but seems calamari can’t talk to cluster anymore, wondering where to start digging… or will I need to rebuilt newer version to talk with a hammer cluster? # dpkg -l | egrep -i calamari\|ceph ii calamari-clients 1.2.3.1-2-gc1f14b2all Inktank Calamari user interface ii calamari-server1.3-rc-16-g321cd58amd64 Inktank package containing the Calamari management srever Are this version of calamari able to monitor a Hammer cluster like below? ii ceph 0.94.1-1~bpo70+1 amd64 distributed storage and file system ii ceph-common0.94.1-1~bpo70+1 amd64 common utilities to mount and interact with a ceph storage cluster ii ceph-deploy1.5.23~bpo70+1all Ceph-deploy is an easy to use configuration tool ii ceph-fs-common 0.94.1-1~bpo70+1 amd64 common utilities to mount and interact with a ceph file system ii ceph-fuse 0.94.1-1~bpo70+1 amd64 FUSE-based client for the Ceph distributed file system ii ceph-mds 0.94.1-1~bpo70+1 amd64 metadata server for the ceph distributed file system ii curl 7.29.0-1~bpo70+1.ceph amd64 command line tool for transferring data with URL syntax ii libcephfs1 0.94.1-1~bpo70+1 amd64 Ceph distributed file system client library ii libcurl3:amd64 7.29.0-1~bpo70+1.ceph amd64 easy-to-use client-side URL transfer library (OpenSSL flavour) ii libcurl3-gnutls:amd64 7.29.0-1~bpo70+1.ceph amd64 easy-to-use client-side URL transfer library (GnuTLS flavour) ii libleveldb1:amd64 1.12.0-1~bpo70+1.ceph amd64 fast key-value storage library ii python-ceph0.94.1-1~bpo70+1 amd64 Meta-package for python libraries for the Ceph libraries ii python-cephfs 0.94.1-1~bpo70+1 amd64 Python libraries for the Ceph libcephfs library ii python-rados 0.94.1-1~bpo70+1 amd64 Python libraries for the Ceph librados library ii python-rbd 0.94.1-1~bpo70+1 amd64 Python libraries for the Ceph librbd library TIA /Steffen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Upgrade from Giant 0.87-1 to Hammer 0.94-1
On 16/04/2015, at 11.09, Christian Balzer ch...@gol.com wrote: On Thu, 16 Apr 2015 10:46:35 +0200 Steffen W Sørensen wrote: That later change would have _increased_ the number of recommended PGs, not decreased it. Weird, as our Giant health status was OK before upgrading to Hammer… I'm pretty sure the too many check was added around then, and the too little warning one earlier. Okay, might explain why too many showed up now :) It seems to me that you increased PG counts assuming that the formula is per pool. Well yes, maybe; I believe we bumped PGs because the status complaint in Giant mentioned explicit pool names, e.g. too few PGs in pool-name… Probably something like less than 20 PGs or some such, right? Probably yes, at least fewer than what seemed good for proper distribution. Your cluster (OSD count) needs (should really, it is not a hard failure but a warning) to be high enough to satisfy the minimum amount of PGs, so (too) many pools with a small cluster will leave you between a rock and a hard place. Right, maybe pgcalc should mention/explain a bit about considering the number of pools ahead of time as well... - are you saying one needs to consider in advance the number of pools in a cluster and factor this in when calculating the number of PGs? Yes. Of course the idea is that pools consume space, so if you have many, you also will have more OSDs to spread your PGs around. In this case we wanted to test out radosgw S3 and thus needed to create the required number of pools, which increased the PG count. But so far no real data in the GW pools, as it failed to work with our S3-compatible app. Now we have removed those pools again and are back down to 4 pools, two for CephFS and two for RBD images, each with 1024 PGs, but still too many PGs; will try to consolidate the two RBD pools into one or two new ones with fewer PGs… - If so, how to decide which pool gets what #PGs, as this is set per pool, especially if one can't pre-calculate the amount of objects ending up in each pool? Dead reckoning. As in, you should have some idea which pool is going to receive how much data. Certainly, but unless you have a large enough cluster and pools that have predictable utilization, fewer pools are the answer, because this makes it easier to match PGs against the number of OSDs, I see. It would be nice somehow if #PGs could be decoupled from pools, but then again how to figure out where each pool's objects are… Just convenient to have all data from a single app in a separate pool/name space to easily see usage and perform management tasks :/ It is for me, as I have clusters of similar small size and only one type of usage, RBD images. So they have 1 or 2 pools and that's it. This also results in the smoothest data distribution possible of course. Right, thanks 4 sharing! /Steffen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Upgrade from Giant 0.87-1 to Hammer 0.94-1
Also our calamari web UI won't authenticate anymore, can’t see any issues in any log under /var/log/calamari, any hints on what to look for are appreciated, TIA! # dpkg -l | egrep -i calamari\|ceph ii calamari-clients 1.2.3.1-2-gc1f14b2all Inktank Calamari user interface ii calamari-server1.3-rc-16-g321cd58amd64 Inktank package containing the Calamari management srever ii ceph 0.94.1-1~bpo70+1 amd64 distributed storage and file system ii ceph-common0.94.1-1~bpo70+1 amd64 common utilities to mount and interact with a ceph storage cluster ii ceph-deploy1.5.23~bpo70+1all Ceph-deploy is an easy to use configuration tool ii ceph-fs-common 0.94.1-1~bpo70+1 amd64 common utilities to mount and interact with a ceph file system ii ceph-fuse 0.94.1-1~bpo70+1 amd64 FUSE-based client for the Ceph distributed file system ii ceph-mds 0.94.1-1~bpo70+1 amd64 metadata server for the ceph distributed file system ii curl 7.29.0-1~bpo70+1.ceph amd64 command line tool for transferring data with URL syntax ii libcephfs1 0.94.1-1~bpo70+1 amd64 Ceph distributed file system client library ii libcurl3:amd64 7.29.0-1~bpo70+1.ceph amd64 easy-to-use client-side URL transfer library (OpenSSL flavour) ii libcurl3-gnutls:amd64 7.29.0-1~bpo70+1.ceph amd64 easy-to-use client-side URL transfer library (GnuTLS flavour) ii libleveldb1:amd64 1.12.0-1~bpo70+1.ceph amd64 fast key-value storage library ii python-ceph0.94.1-1~bpo70+1 amd64 Meta-package for python libraries for the Ceph libraries ii python-cephfs 0.94.1-1~bpo70+1 amd64 Python libraries for the Ceph libcephfs library ii python-rados 0.94.1-1~bpo70+1 amd64 Python libraries for the Ceph librados library ii python-rbd 0.94.1-1~bpo70+1 amd64 Python libraries for the Ceph librbd library On 16/04/2015, at 00.41, Steffen W Sørensen ste...@me.com wrote: Hi, Successfully upgrade a small development 4x node Giant 0.87-1 cluster to Hammer 0.94-1, each node with 6x OSD - 146GB, 19 pools, mainly 2 in usage. Only minor thing now ceph -s complaining over too may PGs, previously Giant had complain of too few, so various pools were bumped up till health status was okay as before upgrading. Admit, that after bumping PGs up in Giant we had changed pool sizes from 3 to 2 min 1 in fear of perf. when backfilling/recovering PGs. # ceph -s cluster 16fe2dcf-2629-422f-a649-871deba78bcd health HEALTH_WARN too many PGs per OSD (1237 max 300) monmap e29: 3 mons at {0=10.0.3.4:6789/0,1=10.0.3.2:6789/0,2=10.0.3.1:6789/0} election epoch 1370, quorum 0,1,2 2,1,0 mdsmap e142: 1/1/1 up {0=2=up:active}, 1 up:standby osdmap e3483: 24 osds: 24 up, 24 in pgmap v3719606: 14848 pgs, 19 pools, 530 GB data, 133 kobjects 1055 GB used, 2103 GB / 3159 GB avail 14848 active+clean Can we just reduce PGs again and should we decrement in minor steps one pool at a time… Any thoughts, TIA! /Steffen 1. restart the monitor daemons on each node 2. then, restart the osd daemons on each node 3. then, restart the mds daemons on each node 4. then, restart the radosgw daemon on each node Regards. -- François Lafont ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Upgrade from Giant 0.87-1 to Hammer 0.94-1
Hi, Successfully upgraded a small development 4x node Giant 0.87-1 cluster to Hammer 0.94-1, each node with 6x OSD - 146GB, 19 pools, mainly 2 in use. Only minor thing: now ceph -s complains of too many PGs, whereas previously Giant had complained of too few, so various pools were bumped up till health status was okay before upgrading. Admittedly, after bumping PGs up in Giant we had changed pool sizes from 3 to 2 min 1, in fear of perf. issues when backfilling/recovering PGs.
# ceph -s
    cluster 16fe2dcf-2629-422f-a649-871deba78bcd
     health HEALTH_WARN too many PGs per OSD (1237 max 300)
     monmap e29: 3 mons at {0=10.0.3.4:6789/0,1=10.0.3.2:6789/0,2=10.0.3.1:6789/0} election epoch 1370, quorum 0,1,2 2,1,0
     mdsmap e142: 1/1/1 up {0=2=up:active}, 1 up:standby
     osdmap e3483: 24 osds: 24 up, 24 in
      pgmap v3719606: 14848 pgs, 19 pools, 530 GB data, 133 kobjects 1055 GB used, 2103 GB / 3159 GB avail 14848 active+clean
Can we just reduce PGs again, and should we decrement in minor steps, one pool at a time… Any thoughts, TIA! /Steffen
1. restart the monitor daemons on each node
2. then, restart the osd daemons on each node
3. then, restart the mds daemons on each node
4. then, restart the radosgw daemon on each node
Regards. -- François Lafont ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
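On Debian/wheezy with the sysvinit scripts those four steps map to something like the following, run node by node and waiting for HEALTH_OK in between (a sketch; adjust for upstart/systemd setups):

    service ceph restart mon       # 1. on each monitor node
    service ceph restart osd       # 2. then on each OSD node
    service ceph restart mds       # 3. then on the MDS node(s)
    service radosgw restart        # 4. then on any radosgw node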
Re: [ceph-users] more human readable log to track request or using mapreduce for data statistics
On 26/03/2015, at 09.05, 池信泽 xmdx...@gmail.com wrote: hi, ceph: Currently, the command ”ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_historic_ops“ may return as below:
{ description: osd_op(client.4436.1:11617 rb.0.1153.6b8b4567.0192 [] 2.8eb4757c ondisk+write e92),
  received_at: 2015-03-25 19:41:47.146145,
  age: 2.186521,
  duration: 1.237882,
  type_data: [ commit sent; apply or cleanup,
    { client: client.4436, tid: 11617},
    [ { time: 2015-03-25 19:41:47.150803, event: event1},
      { time: 2015-03-25 19:41:47.150873, event: event2},
      { time: 2015-03-25 19:41:47.150895, event: event3},
      { time: 2015-03-25 19:41:48.384027, event: event4}]]}
Seems like JSON format, so consider doing your custom conversion by some means of a CLI tool that converts the JSON into plain strings. I think this message is not so suitable for grepping logs or using mapreduce for data statistics. Say I want to know the write request avg latency for each rbd every day; if we could output all the latency info in just one line, it would be very easy to achieve. The output log might then look something like this:
2015-03-26 03:30:53.859759 osd=osd.0 pg=2.11 op=(client.4436.1:11617 rb.0.1153.6b8b4567.0192 [] 2.8eb4757c ondisk+write e92) received_at=1427355253 age=2.186521 duration=1.237882 tid=11617 client=client.4436 event1=20ms event2=300ms event3=400ms event4=100ms.
In the above: duration means the time between (reply_to_client_stamp - request_received_stamp); event1 means the time between (the event1_stamp - request_received_stamp); ... event4 means the time between (the event4_stamp - request_received_stamp). Now, if we output every log line as above, it would be very easy to know the write request avg latency for each rbd every day. Or if I use grep it is much easier to find out which stage is the bottleneck. -- Regards, xinze ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] more human readable log to track request or using mapreduce for data statistics
On 26/03/2015, at 12.14, 池信泽 xmdx...@gmail.com wrote: It is not so convenience to do conversion in custom. Because there are many kinds of log in ceph-osd.log. we only need some of them including latency. But now, It is hard to grep the log what we want and decode them. Still run output through a pipe which either knows and reads json and either print directly what your need and/or stores data i whatever data repository you what to accumulate statistic in. eg.: ceph —admin-daemon … dump_history | myjsonreaderNformatter.php | grep, awk, sed, cut, posix-1 filter-cmd Don’t expect ceph developers to alter ceph code base to complement your exact need when you still wants to filter output through grep whatever anyway ImHO :) 2015-03-26 16:38 GMT+08:00 Steffen W Sørensen ste...@me.com: On 26/03/2015, at 09.05, 池信泽 xmdx...@gmail.com wrote: hi,ceph: Currently, the command ”ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_historic_ops“ may return as below: { description: osd_op(client.4436.1:11617 rb.0.1153.6b8b4567.0192 [] 2.8eb4757c ondisk+write e92), received_at: 2015-03-25 19:41:47.146145, age: 2.186521, duration: 1.237882, type_data: [ commit sent; apply or cleanup, { client: client.4436, tid: 11617}, [ { time: 2015-03-25 19:41:47.150803, event: event1}, { time: 2015-03-25 19:41:47.150873, event: event2}, { time: 2015-03-25 19:41:47.150895, event: event3}, { time: 2015-03-25 19:41:48.384027, event: event4}]]} Seems like JSON format So consider doing your custom conversion by some means of CLI convert json format to string I think this message is not so suitable for grep log or using mapreduce for data statistics. Such as, I want to know the write request avg latency for each rbd everyday. If we could output the all latency in just one line, it would be very easy to achieve it. Such as, the output log maybe something like this: 2015-03-26 03:30:53.859759 osd=osd.0 pg=2.11 op=(client.4436.1:11617 rb.0.1153.6b8b4567.0192 [] 2.8eb4757c ondisk+write e92) received_at=1427355253 age=2.186521 duration=1.237882 tid=11617 client=client.4436 event1=20ms event2=300ms event3=400ms event4=100ms. The above: duration means: the time between (reply_to_client_stamp - request_received_stamp) event1 means: the time between (the event1_stamp - request_received_stamp) ... event4 means: the time between (the event4_stamp - request_received_stamp) Now, If we output the every log as above. it would be every easy to know the write request avg latency for each rbd everyday. Or if I use grep it is more easy to find out which stage is the bottleneck. -- Regards, xinze ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Regards, xinze ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
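As an illustration of the pipe-through-a-small-converter idea, a rough filter could look like this (a sketch: the top-level key of dump_historic_ops differs between releases, so the script just grabs the first list of ops it finds; flatten_ops.py is a made-up name):

    #!/usr/bin/env python
    # flatten_ops.py - turn dump_historic_ops JSON into one greppable line per op
    import json, sys

    doc = json.load(sys.stdin)
    ops = doc if isinstance(doc, list) else next(
        (v for v in doc.values() if isinstance(v, list)), [])
    for op in ops:
        td = op.get("type_data") or []
        # the last element of type_data is the event list in the sample above
        events = td[-1] if (isinstance(td, list) and td and isinstance(td[-1], list)) else []
        evstr = " ".join("%s=%s" % (e.get("event"), e.get("time"))
                         for e in events if isinstance(e, dict))
        print("%s received_at=%s duration=%s %s" % (
            op.get("description"), op.get("received_at"), op.get("duration"), evstr))

Used as: ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_historic_ops | python flatten_ops.py | grep osd_op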
Re: [ceph-users] Calamari Deployment
On 26/03/2015, at 17.18, LaBarre, James (CTR) A6IT james.laba...@cigna.com wrote: For that matter, is there a way to build Calamari without going the whole vagrant path at all? Some way of just building it through command-line tools? I would be building it on an Openstack instance, no GUI. Seems silly to have to install an entire virtualbox environment inside something that’s already a VM. Agreed... if U wanted to built in on your server farm/cloud stack env. I just built my packages for Debian Wheezy (with CentOS+RHEL rpms as a bonus) on my desktop Mac/OS-X with use of virtualbox and vagrant ( vagrant is an easy disposable built-env:) From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of JESUS CHAVEZ ARGUELLES Sent: Monday, March 02, 2015 3:00 PM To: ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com Subject: [ceph-users] Calamari Deployment Does anybody know how to succesful install Calamari in rhel7 ? I have tried the vagrant thug without sucesss and it seems like a nightmare there is a Kind of Sidur when you do vagrant up where it seems not to find the vm path... Regards Jesus Chavez SYSTEMS ENGINEER-C.SALES jesch...@cisco.com mailto:jesch...@cisco.com Phone: +52 55 5267 3146 tel:+52%2055%205267%203146 Mobile: +51 1 5538883255 tel:+51%201%205538883255 CCIE - 44433 -- CONFIDENTIALITY NOTICE: If you have received this email in error, please immediately notify the sender by e-mail at the address shown. This email transmission may contain confidential information. This information is intended only for the use of the individual(s) or entity to whom it is intended even if addressed incorrectly. Please delete it from your files if you are not the intended recipient. Thank you for your compliance. Copyright (c) 2015 Cigna == ___ ceph-users mailing list ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Migrating objects from one pool to another?
On 26/03/2015, at 22.53, Steffen W Sørensen ste...@me.com wrote: On 26/03/2015, at 21.07, J-P Methot jpmet...@gtcomm.net mailto:jpmet...@gtcomm.net wrote: That's a great idea. I know I can setup cinder (the openstack volume manager) as a multi-backend manager and migrate from one backend to the other, each backend linking to different pools of the same ceph cluster. What bugs me though is that I'm pretty sure the image store, glance, wouldn't let me do that. Additionally, since the compute component also has its own ceph pool, I'm pretty sure it won't let me migrate the data through openstack. Hm wouldn’t it be possible to do something similar ala: # list object from src pool rados ls objects loop | filter-obj-id | while read obj; do # export $obj to local disk rados -p pool-wth-too-many-pgs get $obj # import $obj from local disk to new pool rados -p better-sized-pool put $obj done and of course when done redirect glance to new pool :) Not sure, but this might require you to quenching the object usage from openstack during migration, dunno, maybe ask openstack community if it’s possible to live migration of objects first :/ possible split/partition list of objects into multiple concurrent loops, possible from multiple boxes as seems fit for resources at hand, cpu, memory, network, ceph perf. /Steffen On 3/26/2015 3:54 PM, Steffen W Sørensen wrote: On 26/03/2015, at 20.38, J-P Methot jpmet...@gtcomm.net wrote: Lately I've been going back to work on one of my first ceph setup and now I see that I have created way too many placement groups for the pools on that setup (about 10 000 too many). I believe this may impact performances negatively, as the performances on this ceph cluster are abysmal. Since it is not possible to reduce the number of PGs in a pool, I was thinking of creating new pools with a smaller number of PGs, moving the data from the old pools to the new pools and then deleting the old pools. I haven't seen any command to copy objects from one pool to another. Would that be possible? I'm using ceph for block storage with openstack, so surely there must be a way to move block devices from a pool to another, right? What I did a one point was going one layer higher in my storage abstraction, and created new Ceph pools and used those for new storage resources/pool in my VM env. (ProxMox) on top of Ceph RBD and then did a live migration of virtual disks there, assume you could do the same in OpenStack. My 0.02$ /Steffen -- == Jean-Philippe Méthot Administrateur système / System administrator GloboTech Communications Phone: 1-514-907-0050 Toll Free: 1-(888)-GTCOMM1 Fax: 1-(514)-907-0750 jpmet...@gtcomm.net http://www.gtcomm.net ___ ceph-users mailing list ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Migrating objects from one pool to another?
On 26/03/2015, at 23.01, Gregory Farnum g...@gregs42.com wrote: On Thu, Mar 26, 2015 at 2:53 PM, Steffen W Sørensen ste...@me.com wrote: On 26/03/2015, at 21.07, J-P Methot jpmet...@gtcomm.net wrote: That's a great idea. I know I can setup cinder (the openstack volume manager) as a multi-backend manager and migrate from one backend to the other, each backend linking to different pools of the same ceph cluster. What bugs me though is that I'm pretty sure the image store, glance, wouldn't let me do that. Additionally, since the compute component also has its own ceph pool, I'm pretty sure it won't let me migrate the data through openstack. Hm, wouldn't it be possible to do something similar ala:
# list objects from src pool
rados ls objects loop | filter-obj-id | while read obj; do
    # export $obj to local disk
    rados -p pool-wth-too-many-pgs get $obj
    # import $obj from local disk to new pool
    rados -p better-sized-pool put $obj
done
You would also have issues with snapshots if you do this on an RBD pool. That's unfortunately not feasible. What isn't possible - exporting/importing objects out of and into pools, or are you referring to the snapshot issues? /Steffen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Migrating objects from one pool to another?
On 26/03/2015, at 20.38, J-P Methot jpmet...@gtcomm.net wrote: Lately I've been going back to work on one of my first ceph setups, and now I see that I have created way too many placement groups for the pools on that setup (about 10 000 too many). I believe this may impact performance negatively, as the performance on this ceph cluster is abysmal. Since it is not possible to reduce the number of PGs in a pool, I was thinking of creating new pools with a smaller number of PGs, moving the data from the old pools to the new pools and then deleting the old pools. I haven't seen any command to copy objects from one pool to another. Would that be possible? I'm using ceph for block storage with openstack, so surely there must be a way to move block devices from a pool to another, right? What I did at one point was going one layer higher in my storage abstraction, and created new Ceph pools and used those for new storage resources/pools in my VM env. (ProxMox) on top of Ceph RBD, and then did a live migration of the virtual disks there; I assume you could do the same in OpenStack. My 0.02$ /Steffen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Migrating objects from one pool to another?
On 26/03/2015, at 21.07, J-P Methot jpmet...@gtcomm.net wrote: That's a great idea. I know I can setup cinder (the openstack volume manager) as a multi-backend manager and migrate from one backend to the other, each backend linking to different pools of the same ceph cluster. What bugs me though is that I'm pretty sure the image store, glance, wouldn't let me do that. Additionally, since the compute component also has its own ceph pool, I'm pretty sure it won't let me migrate the data through openstack. Hm, wouldn't it be possible to do something similar ala:
# list objects from src pool
rados ls objects loop | filter-obj-id | while read obj; do
    # export $obj to local disk
    rados -p pool-wth-too-many-pgs get $obj
    # import $obj from local disk to new pool
    rados -p better-sized-pool put $obj
done
possibly splitting/partitioning the list of objects into multiple concurrent loops, possibly from multiple boxes, as seems fit for the resources at hand: cpu, memory, network, ceph perf. /Steffen On 3/26/2015 3:54 PM, Steffen W Sørensen wrote: On 26/03/2015, at 20.38, J-P Methot jpmet...@gtcomm.net wrote: Lately I've been going back to work on one of my first ceph setups, and now I see that I have created way too many placement groups for the pools on that setup (about 10 000 too many). I believe this may impact performance negatively, as the performance on this ceph cluster is abysmal. Since it is not possible to reduce the number of PGs in a pool, I was thinking of creating new pools with a smaller number of PGs, moving the data from the old pools to the new pools and then deleting the old pools. I haven't seen any command to copy objects from one pool to another. Would that be possible? I'm using ceph for block storage with openstack, so surely there must be a way to move block devices from a pool to another, right? What I did at one point was going one layer higher in my storage abstraction, and created new Ceph pools and used those for new storage resources/pools in my VM env. (ProxMox) on top of Ceph RBD, and then did a live migration of the virtual disks there; I assume you could do the same in OpenStack. My 0.02$ /Steffen -- == Jean-Philippe Méthot Administrateur système / System administrator GloboTech Communications Phone: 1-514-907-0050 Toll Free: 1-(888)-GTCOMM1 Fax: 1-(514)-907-0750 jpmet...@gtcomm.net http://www.gtcomm.net ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
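Fleshed out a bit, the loop could look like this - a crude sketch that copies plain object data only (no snapshots, no xattrs/omap), so it is not suitable for live RBD images, and clients should be quiesced first; the pool names are the placeholders from above:

    SRC=pool-wth-too-many-pgs
    DST=better-sized-pool
    rados -p $SRC ls | while read obj; do
        rados -p $SRC get "$obj" /tmp/obj.$$ && rados -p $DST put "$obj" /tmp/obj.$$
    done
    rm -f /tmp/obj.$$

If memory serves there is also a rados cppool <src> <dst> command in this era that does much the same in one go, with the same caveats.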
Re: [ceph-users] All client writes block when 2 of 3 OSDs down
On 26/03/2015, at 23.36, Somnath Roy somnath@sandisk.com wrote: Got most portion of it, thanks ! But I'm still not able to get why, when the second node is down, the client is not able to connect with a single monitor left in the cluster? 1 monitor can form a quorum and should be sufficient for a cluster to run. To have quorum you need a strict majority of the monitors (floor(N/2)+1); a single surviving monitor out of the three here cannot be a majority, and neither can one out of two, hence the advice to always run at least 3 monitors. Thanks Regards Somnath -Original Message- From: Gregory Farnum [mailto:g...@gregs42.com] Sent: Thursday, March 26, 2015 3:29 PM To: Somnath Roy Cc: Lee Revell; ceph-users@lists.ceph.com Subject: Re: [ceph-users] All client writes block when 2 of 3 OSDs down On Thu, Mar 26, 2015 at 3:22 PM, Somnath Roy somnath@sandisk.com wrote: Greg, Couple of dumb questions maybe. 1. If you see, the clients are connecting fine with two monitors in the cluster. 2 monitors can never form a quorum, but, 1 can, so, why with 1 monitor (which is I guess happening after making 2 nodes down) it is not able to connect ? A quorum is a strict majority of the total membership. 2 monitors can form a quorum just fine if there are either 2 or 3 total membership. (As long as those two agree on every action, it cannot be lost.) We don't *recommend* configuring systems with an even number of monitors, because it increases the number of total possible failures without increasing the number of failures that can be tolerated. (3 monitors requires 2 in quorum, 4 does too. Same for 5 and 6, 7 and 8, etc etc.) 2. Also, my understanding is while IO is going on *no* monitor interaction will be on that path, so, why will the client io be stopped because the monitor quorum is not there ? If min_size = 1 is properly set it should be able to serve IO as long as 1 OSD (node) is up, isn't it ? Well, the remaining OSD won't be able to process IO because it's lost its peers, and it can't reach any monitors to do updates or get new maps. (Monitors which are not in quorum will not allow clients to connect.) The clients will eventually stop serving IO if they know they can't reach a monitor, although I don't remember exactly how that's triggered. In this particular case, though, the client probably just tried to do an op against the dead osd, realized it couldn't, and tried to fetch a map from the monitors. When that failed it went into search mode, which is what the logs are showing you. -Greg Thanks Regards Somnath -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Gregory Farnum Sent: Thursday, March 26, 2015 2:40 PM To: Lee Revell Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] All client writes block when 2 of 3 OSDs down On Thu, Mar 26, 2015 at 2:30 PM, Lee Revell rlrev...@gmail.com wrote: On Thu, Mar 26, 2015 at 4:40 PM, Gregory Farnum g...@gregs42.com wrote: Has the OSD actually been detected as down yet? I believe it has, however I can't directly check because ceph health starts to hang when I down the second node. Oh. You need to keep a quorum of your monitors running (just the monitor processes, not of everything in the system) or nothing at all is going to work. That's how we prevent split brain issues. You'll also need to set that min size on your existing pools (ceph osd pool pool set min_size 1 or similar) to change their behavior; the config option only takes effect for newly-created pools. (Thus the default.)
I've done this, however the behavior is the same:
$ for f in `ceph osd lspools | sed 's/[0-9]//g' | sed 's/,//g'`; do ceph osd pool set $f min_size 1; done
set pool 0 min_size to 1
set pool 1 min_size to 1
set pool 2 min_size to 1
set pool 3 min_size to 1
set pool 4 min_size to 1
set pool 5 min_size to 1
set pool 6 min_size to 1
set pool 7 min_size to 1
$ ceph -w
    cluster db460aa2-5129-4aaa-8b2e-43eac727124e
     health HEALTH_WARN 1 mons down, quorum 0,1 ceph-node-1,ceph-node-2
     monmap e3: 3 mons at {ceph-node-1=192.168.122.121:6789/0,ceph-node-2=192.168.122.131:6789/0,ceph-node-3=192.168.122.141:6789/0}, election epoch 194, quorum 0,1 ceph-node-1,ceph-node-2
     mdsmap e94: 1/1/1 up {0=ceph-node-1=up:active}
     osdmap e362: 3 osds: 2 up, 2 in
      pgmap v5913: 840 pgs, 8 pools, 7441 MB data, 994 objects 25329 MB used, 12649 MB / 40059 MB avail 840 active+clean
2015-03-26 17:23:56.009938 mon.0 [INF] pgmap v5913: 840 pgs: 840 active+clean; 7441 MB data, 25329 MB used, 12649 MB / 40059 MB avail
2015-03-26 17:25:51.042802 mon.0 [INF] pgmap v5914: 840 pgs: 840 active+clean; 7441 MB data, 25329 MB used, 12649 MB / 40059 MB avail; 0 B/s rd, 260 kB/s wr, 13 op/s
2015-03-26 17:25:56.046491 mon.0 [INF] pgmap v5915: 840 pgs: 840 active+clean; 7441 MB data, 25333 MB used, 12645 MB / 40059 MB avail; 0 B/s rd, 943 kB/s
Re: [ceph-users] Migrating objects from one pool to another?
On 26/03/2015, at 23.13, Gregory Farnum g...@gregs42.com wrote: The procedure you've outlined won't copy snapshots, just the head objects. Preserving the proper snapshot metadata and inter-pool relationships on rbd images I think isn't actually possible when trying to change pools. This wasn't meant for migrating an RBD pool, but pure object/Swift pools… Anyway, it seems Glance http://docs.openstack.org/developer/glance/architecture.html#basic-architecture supports multiple stores http://docs.openstack.org/developer/glance/configuring.html#configuring-multiple-swift-accounts-stores so assume one could use a glance client to extract/download images into a local file format (raw, qcow2, vmdk…) as well as upload images to glance. And as glance images aren't 'live' like virtual disk images, one could also download glance images from one glance store via a local file and upload them back into a different glance back end store. Again, this is probably better than dealing at a lower abstraction level and having to know its internal storage structures, and it avoids what you're pointing out, Greg. On Thu, Mar 26, 2015 at 3:05 PM, Steffen W Sørensen ste...@me.com wrote: On 26/03/2015, at 23.01, Gregory Farnum g...@gregs42.com wrote: On Thu, Mar 26, 2015 at 2:53 PM, Steffen W Sørensen ste...@me.com wrote: On 26/03/2015, at 21.07, J-P Methot jpmet...@gtcomm.net wrote: That's a great idea. I know I can setup cinder (the openstack volume manager) as a multi-backend manager and migrate from one backend to the other, each backend linking to different pools of the same ceph cluster. What bugs me though is that I'm pretty sure the image store, glance, wouldn't let me do that. Additionally, since the compute component also has its own ceph pool, I'm pretty sure it won't let me migrate the data through openstack. Hm, wouldn't it be possible to do something similar ala:
# list objects from src pool
rados ls objects loop | filter-obj-id | while read obj; do
    # export $obj to local disk
    rados -p pool-wth-too-many-pgs get $obj
    # import $obj from local disk to new pool
    rados -p better-sized-pool put $obj
done
You would also have issues with snapshots if you do this on an RBD pool. That's unfortunately not feasible. What isn't possible - exporting/importing objects out of and into pools, or the snapshot issues? /Steffen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Mapping users to different rgw pools
My vague understanding is that this is mapped through the zone associated with the specific user. So define your desired pools, map zones to those pools, and assign users to the desired regions+zones and thus to different pools per user. On 13/03/2015, at 07.48, Sreenath BH bhsreen...@gmail.com wrote: Hi all, Can one Rados gateway support more than one pool for storing objects? And as a follow-up question, is there a way to map different users to separate rgw pools so that their objects get stored in different pools? thanks, Sreenath ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Giant 0.87 update on CentOs 7
All, Got a test cluster on a 4 node CentOs 7, which fails to pull updates, any hints: [root@n1 ~]# rpm -qa | grep -i ceph python-ceph-0.87-0.el7.centos.x86_64 ceph-release-1-0.el7.noarch ceph-deploy-1.5.21-0.noarch ceph-common-0.87-0.el7.centos.x86_64 libcephfs1-0.87-0.el7.centos.x86_64 ceph-0.87-0.el7.centos.x86_64 [root@n1 ~]# cat /etc/yum.repos.d/ceph.repo [Ceph] name=Ceph packages for $basearch baseurl=http://ceph.com/rpm-giant/el7/$basearch enabled=1 gpgcheck=1 type=rpm-md gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc priority=1 [Ceph-noarch] name=Ceph noarch packages baseurl=http://ceph.com/rpm-giant/el7/noarch enabled=1 gpgcheck=1 type=rpm-md gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc priority=1 [ceph-source] name=Ceph source packages baseurl=http://ceph.com/rpm-giant/el7/SRPMS enabled=1 gpgcheck=1 type=rpm-md gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc priority=1 # Isn’t above yum repos corrects as Dependency checks fails talking of Firefly 0.80 or what? [root@n1 ~]# yum -y update … --- Package python-ceph.x86_64 1:0.87-0.el7.centos will be obsoleted --- Package python-ceph-compat.x86_64 1:0.80.7-0.4.el7 will be obsoleting --- Package python-cephfs.x86_64 1:0.80.7-0.4.el7 will be obsoleting -- Processing Dependency: libcephfs1 = 1:0.80.7 for package: 1:python-cephfs-0.80.7-0.4.el7.x86_64 --- Package python-rados.x86_64 1:0.80.7-0.4.el7 will be obsoleting -- Processing Dependency: librados2 = 1:0.80.7 for package: 1:python-rados-0.80.7-0.4.el7.x86_64 --- Package python-rbd.x86_64 1:0.80.7-0.4.el7 will be obsoleting -- Processing Dependency: librbd1 = 1:0.80.7 for package: 1:python-rbd-0.80.7-0.4.el7.x86_64 -- Finished Dependency Resolution Error: Package: 1:python-rados-0.80.7-0.4.el7.x86_64 (epel) Requires: librados2 = 1:0.80.7 Installed: 1:librados2-0.87-0.el7.centos.x86_64 (@Ceph) librados2 = 1:0.87-0.el7.centos Available: 1:librados2-0.86-0.el7.centos.x86_64 (Ceph) librados2 = 1:0.86-0.el7.centos Error: Package: 1:python-cephfs-0.80.7-0.4.el7.x86_64 (epel) Requires: libcephfs1 = 1:0.80.7 Installed: 1:libcephfs1-0.87-0.el7.centos.x86_64 (@Ceph) libcephfs1 = 1:0.87-0.el7.centos Available: 1:libcephfs1-0.86-0.el7.centos.x86_64 (Ceph) libcephfs1 = 1:0.86-0.el7.centos Error: Package: 1:python-rbd-0.80.7-0.4.el7.x86_64 (epel) Requires: librbd1 = 1:0.80.7 Installed: 1:librbd1-0.87-0.el7.centos.x86_64 (@Ceph) librbd1 = 1:0.87-0.el7.centos Available: 1:librbd1-0.86-0.el7.centos.x86_64 (Ceph) librbd1 = 1:0.86-0.el7.centos You could try using --skip-broken to work around the problem You could try running: rpm -Va --nofiles --nodigest [root@n1 ~]# ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
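The conflicts above come from EPEL shipping 0.80.7 python bindings that obsolete the 0.87 packages from the Ceph repo; one possible workaround (a sketch - the package globs are an assumption and may need adjusting) is to exclude them in the epel repo definition so the priority=1 Ceph repo wins:

    # /etc/yum.repos.d/epel.repo - add to the [epel] section (sketch)
    exclude=python-ceph* python-cephfs* python-rados* python-rbd*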
Re: [ceph-users] Giant 0.87 update on CentOs 7
On 22/03/2015, at 22.28, Steffen W Sørensen ste...@me.com wrote: All, Got a test cluster on a 4 node CentOs 7, which fails to pull updates, any hints: [root@n1 ~]# rpm -qa | grep -i ceph python-ceph-0.87-0.el7.centos.x86_64 ceph-release-1-0.el7.noarch ceph-deploy-1.5.21-0.noarch ceph-common-0.87-0.el7.centos.x86_64 libcephfs1-0.87-0.el7.centos.x86_64 ceph-0.87-0.el7.centos.x86_64 [root@n1 ~]# cat /etc/yum.repos.d/ceph.repo [Ceph] name=Ceph packages for $basearch baseurl=http://ceph.com/rpm-giant/el7/$basearch http://ceph.com/rpm-giant/el7/$basearch enabled=1 gpgcheck=1 type=rpm-md gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc priority=1 [Ceph-noarch] name=Ceph noarch packages baseurl=http://ceph.com/rpm-giant/el7/noarch http://ceph.com/rpm-giant/el7/noarch enabled=1 gpgcheck=1 type=rpm-md gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc priority=1 [ceph-source] name=Ceph source packages baseurl=http://ceph.com/rpm-giant/el7/SRPMS http://ceph.com/rpm-giant/el7/SRPMS enabled=1 gpgcheck=1 type=rpm-md gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc priority=1 # Isn’t above yum repos corrects as Dependency checks fails talking of Firefly 0.80 or what? [root@n1 ~]# yum -y update … --- Package python-ceph.x86_64 1:0.87-0.el7.centos will be obsoleted --- Package python-ceph-compat.x86_64 1:0.80.7-0.4.el7 will be obsoleting --- Package python-cephfs.x86_64 1:0.80.7-0.4.el7 will be obsoleting -- Processing Dependency: libcephfs1 = 1:0.80.7 for package: 1:python-cephfs-0.80.7-0.4.el7.x86_64 --- Package python-rados.x86_64 1:0.80.7-0.4.el7 will be obsoleting -- Processing Dependency: librados2 = 1:0.80.7 for package: 1:python-rados-0.80.7-0.4.el7.x86_64 --- Package python-rbd.x86_64 1:0.80.7-0.4.el7 will be obsoleting -- Processing Dependency: librbd1 = 1:0.80.7 for package: 1:python-rbd-0.80.7-0.4.el7.x86_64 -- Finished Dependency Resolution Error: Package: 1:python-rados-0.80.7-0.4.el7.x86_64 (epel) Requires: librados2 = 1:0.80.7 Installed: 1:librados2-0.87-0.el7.centos.x86_64 (@Ceph) librados2 = 1:0.87-0.el7.centos Available: 1:librados2-0.86-0.el7.centos.x86_64 (Ceph) librados2 = 1:0.86-0.el7.centos Error: Package: 1:python-cephfs-0.80.7-0.4.el7.x86_64 (epel) Requires: libcephfs1 = 1:0.80.7 Installed: 1:libcephfs1-0.87-0.el7.centos.x86_64 (@Ceph) libcephfs1 = 1:0.87-0.el7.centos Available: 1:libcephfs1-0.86-0.el7.centos.x86_64 (Ceph) libcephfs1 = 1:0.86-0.el7.centos Error: Package: 1:python-rbd-0.80.7-0.4.el7.x86_64 (epel) Requires: librbd1 = 1:0.80.7 Installed: 1:librbd1-0.87-0.el7.centos.x86_64 (@Ceph) librbd1 = 1:0.87-0.el7.centos Available: 1:librbd1-0.86-0.el7.centos.x86_64 (Ceph) librbd1 = 1:0.86-0.el7.centos You could try using --skip-broken to work around the problem You could try running: rpm -Va --nofiles --nodigest [root@n1 ~]# :) Now disabling epel which seems the confusing Repo above just renders me with TOs from http://ceph.com http://ceph.com/… are Ceph.com http://ceph.com/ down currently? [root@n1 ~]# yum -y --disablerepo epel --disablerepo ceph-source update Loaded plugins: fastestmirror, priorities http://ceph.com/rpm-giant/el7/x86_64/repodata/repomd.xml: [Errno 12] Timeout on http://ceph.com/rpm-giant/el7/x86_64/repodata/repomd.xml: (28, 'Connection timed out after 30403 milliseconds') Trying other mirror. 
http://ceph.com/rpm-giant/el7/x86_64/repodata/repomd.xml: [Errno 12] Timeout on http://ceph.com/rpm-giant/el7/x86_64/repodata/repomd.xml: (28, 'Connection timed out after 30042 milliseconds') Trying other mirror. … /Steffen___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Giant 0.87 update on CentOs 7
:) Now disabling epel, which seems to be the confusing repo above, just renders me with timeouts from http://ceph.com… is Ceph.com down currently? http://eu.ceph.com answers currently… probably the trans-atlantic line or my provider :/ [root@n1 ~]# yum -y --disablerepo epel --disablerepo ceph-source update Loaded plugins: fastestmirror, priorities http://ceph.com/rpm-giant/el7/x86_64/repodata/repomd.xml: [Errno 12] Timeout on http://ceph.com/rpm-giant/el7/x86_64/repodata/repomd.xml: (28, 'Connection timed out after 30403 milliseconds') Trying other mirror. http://ceph.com/rpm-giant/el7/x86_64/repodata/repomd.xml: [Errno 12] Timeout on http://ceph.com/rpm-giant/el7/x86_64/repodata/repomd.xml: (28, 'Connection timed out after 30042 milliseconds') Trying other mirror. … /Steffen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
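If ceph.com itself keeps timing out, one option is to point the baseurl at the EU mirror, which carries the same layout (a sketch; only the baseurl lines change, and eu.ceph.com availability is an assumption):

    [Ceph]
    name=Ceph packages for $basearch
    baseurl=http://eu.ceph.com/rpm-giant/el7/$basearch
    enabled=1
    gpgcheck=1
    type=rpm-md
    gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
    priority=1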
Re: [ceph-users] cciss driver package for RHEL7
On 19/03/2015, at 17.46, O'Reilly, Dan daniel.orei...@dish.com wrote: The problem with using the hpsa driver is that I need to install RHEL 7.1 on a Proliant system using the SmartArray 400 controller. Therefore, I need a driver that supports it to even install RHEL 7.1. RHEL 7.1 doesn’t generically recognize that controller out of the box. I known, got the same issue when utilizing old proliants for test/PoC with newer SW. Maybe we should try to use such old raid ctlrs similar to this for OSD journaling and avoid wearability issues as with SSDs :) /Steffen From: Steffen W Sørensen [mailto:ste...@me.com] Sent: Thursday, March 19, 2015 10:08 AM To: O'Reilly, Dan Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] cciss driver package for RHEL7 On 19/03/2015, at 15.57, O'Reilly, Dan daniel.orei...@dish.com wrote: I understand there’s a KMOD_CCISS package available. However, I can’t find it for download. Anybody have any ideas? Oh I believe HP swapped cciss for hpsa (Smart Array) driver long ago… so maybe only download cciss latest source and then compile your self, or… Sourceforge says: *New* The cciss driver has been removed from RHEL7 and SLES12. If you really want cciss on RHEL7 checkout the elrepo directory. /Steffen signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Issue with Ceph mons starting up- leveldb store
On 19/03/2015, at 15.50, Andrew Diller dill...@gmail.com wrote: We moved the data dir over (/var/lib/ceph/mon) from one of the good ones to this 3rd node, but it won't start- we see this error, after which no further logging occurs: 2015-03-19 06:25:05.395210 7fcb57f1c7c0 -1 failed to create new leveldb store 2015-03-19 06:25:05.417716 7f272ae0d7c0 0 ceph version 0.61.9 (7440dcd135750839fa0f00263f80722ff6f51e90), process ceph-mon, pid 37967 Does anyone have an idea why the mon process would have issues creating the leveldb store (we've seen this error since the outage) and where does it create it? Is it part of the paxos implementation? Just guessing... maybe the simple offen RC, permission on dirs along the path. /Steffen___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
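A quick way to check that guess (a sketch; the mon id and path are placeholders, adjust to the actual cluster):

    # verify ownership and permissions along the mon data path
    ls -ld /var/lib/ceph /var/lib/ceph/mon /var/lib/ceph/mon/ceph-node3
    # on a 0.61-era cluster the daemons typically run as root, so root should own the store
    chown -R root:root /var/lib/ceph/mon/ceph-node3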
Re: [ceph-users] cciss driver package for RHEL7
On 19/03/2015, at 15.57, O'Reilly, Dan daniel.orei...@dish.com wrote: I understand there’s a KMOD_CCISS package available. However, I can’t find it for download. Anybody have any ideas? Oh I believe HP swapped cciss for hpsa (Smart Array) driver long ago… so maybe only download cciss latest source and then compile your self, or… Sourceforge http://cciss.sourceforge.net/ says: *New* The cciss driver has been removed from RHEL7 and SLES12. If you really want cciss on RHEL7 checkout the elrepo http://elrepo.org/ directory. /Steffen___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] [SPAM] Changing pg_num = RBD VM down !
On 16/03/2015, at 12.23, Alexandre DERUMIER aderum...@odiso.com wrote: We use Proxmox, so I think it uses librbd ? As It's me that I made the proxmox rbd plugin, I can confirm that yes, it's librbd ;) Is the ceph cluster on dedicated nodes ? or vms are running on same nodes than osd daemons ? My cluster have Ceph OSDs+MONs on seperate PVE nodes, no VMs And I precise that not all VMs on that pool crashed, only some of them (a large majority), and on a same host, some crashed and others not. Is the vm crashed, like no more qemu process ? or is it the guest os which is crashed ? Hmm long time now, remember VM status was stopped, resumed didn't work aka they were started again asap :) (do you use virtio, virtio-scsi or ide for your guest ?) virtio /Steffen signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] [SPAM] Changing pg_num = RBD VM down !
On 16/03/2015, at 11.14, Florent B flor...@coppint.com wrote: On 03/16/2015 11:03 AM, Alexandre DERUMIER wrote: This is strange, that could be: - qemu crash, maybe a bug in rbd block storage (if you use librbd) - oom-killer on you host (any logs ?) what is your qemu version ? Now, we have version 2.1.3. Some VMs that stopped were running for a long time, but some other had only 4 days uptime. And I precise that not all VMs on that pool crashed, only some of them (a large majority), and on a same host, some crashed and others not. We use Proxmox, so I think it uses librbd ? I had the same issue once also when bumping up PG_NUM, the majority of my Proxmox VMs stopped. I believe this might be due to heavy rebalancing causing timeouts when VMs try to do IO ops and thus generating kernel panics. Next time around I want to go in smaller increments of pg_num and hopefully avoid this. I follow the need for more PGs when having more OSDs, but how come PGs get too few when adding more objects/data to a pool? /Steffen signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
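A sketch of the smaller-increments idea (pool name, target and step are placeholders; the waits are crude, the point is just to let each bump settle before the next so client IO isn't starved):

    #!/bin/bash
    POOL=rbd          # placeholder pool name
    TARGET=1024       # placeholder final pg_num
    STEP=128
    CUR=$(ceph osd pool get "$POOL" pg_num | awk '{print $2}')
    while [ "$CUR" -lt "$TARGET" ]; do
        CUR=$((CUR + STEP)); [ "$CUR" -gt "$TARGET" ] && CUR=$TARGET
        ceph osd pool set "$POOL" pg_num "$CUR"
        sleep 30                               # give the new PGs time to be created
        ceph osd pool set "$POOL" pgp_num "$CUR"
        until ceph health | grep -q HEALTH_OK; do sleep 30; done   # wait for rebalancing to settle
    done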
Re: [ceph-users] Add monitor unsuccesful
On 12/03/2015, at 03.08, Jesus Chavez (jeschave) jesch...@cisco.com wrote: Thanks Steffen I have followed everything not sure what is going on, the mon keyring and client admin are individual? Per mon host? Or do I need to copy from the first initial mon node? I'm no expert, but I would assume the keyring could be either, as long as it has the right permissions. I followed the manual route once with success /Steffen signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Add monitor unsuccesful
On 12/03/2015, at 20.00, Jesus Chavez (jeschave) jesch...@cisco.com wrote: Thats what I thought and did actually the monmap and keyring were copied to the new monitor and there with 2 elements I did the mkfs thing and still have that Messages, do I need osd configured? Because I have non and I am not sure if it is requiered ... Also is weird that monmap is not taking the new monitor I think I should try to configure the 3 monitors as initial monitors an see how it goes Dunno about your config, but I seem to remember when I decommissioned one mon instance and addition of a new on another node that I needed to have mon.id section in ceph.conf inorder to be able to start the monitor. ceph.conf snippet: [osd] osd mount options xfs = rw,noatime,nobarrier,logbsize=256k,logbufs=8,allocsize=4M,attr2,delaylog,inode64,noquota keyring = /var/lib/ceph/osd/ceph-$id/keyring ; Tuning ;# By default, Ceph makes 3 replicas of objects. If you want to make four ;# copies of an object the default value--a primary copy and three replica ;# copies--reset the default values as shown in 'osd pool default size'. ;# If you want to allow Ceph to write a lesser number of copies in a degraded ;# state, set 'osd pool default min size' to a number less than the ;# 'osd pool default size' value. osd pool default size = 2 # Write an object 2 times. osd pool default min size = 1 # Allow writing one copy in a degraded state. ;# Ensure you have a realistic number of placement groups. We recommend ;# approximately 100 per OSD. E.g., total number of OSDs multiplied by 100 ;# divided by the number of replicas (i.e., osd pool default size). So for ;# 10 OSDs and osd pool default size = 3, we'd recommend approximately ;# (100 * 10) / 3 = 333. ;# got 24 OSDs = 1200 pg, but this is not a full production site, so let's settle for 1024 to lower cpu load osd pool default pg num = 1024 osd pool default pgp num = 1024 client cache size = 131072 osd client op priority = 40 osd op threads = 8 osd client message size cap = 512 filestore min sync interval = 10 filestore max sync interval = 60 ;filestore queue max bytes = 10485760 ;filestore queue max ops = 50 ;filestore queue committing max ops = 500 ;filestore queue committing max bytes = 104857600 ;filestore op threads = 2 recovery max active = 2 recovery op priority = 30 osd max backfills = 2 ; Journal Tuning journal size = 5120 ;journal max write bytes = 1073714824 ;journal max write entries = 1 ;journal queue max ops = 5 ;journal queue max bytes = 1048576 [mon.0] host = node4 mon addr = 10.0.3.4:6789 [mon.1] host = node2 mon addr = 10.0.3.2:6789 [mon.2] host = node1 mon addr = 10.0.3.1:6789 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Duplication name Container
On 11/03/2015, at 15.31, Wido den Hollander w...@42on.com wrote: On 03/11/2015 03:23 PM, Jimmy Goffaux wrote: Hello All, I use Ceph in production for several months. but i have an errors with Ceph Rados Gateway for multiple users. I am faced with the following error: Error trying to create container 'xs02': 409 Conflict: BucketAlreadyExists Which corresponds to the documentation : http://ceph.com/docs/master/radosgw/s3/bucketops/ By which means I can avoid this kind of problem? You can not. Bucket names are unique inside the RADOS Gateway. Just as with Amazon S3. Well it can be avoided but not at the Ceph level but at your Application level :) Either ignore already exist errors in your App or try to verify bucket exists before creating buckets... /Steffen signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
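From the gateway side one can at least confirm whether the name is already taken before the application tries to (re)create it (a sketch; the bucket name is the one from the error above, and the non-zero exit for a missing bucket is an assumption worth verifying on your release):

    radosgw-admin bucket stats --bucket=xs02 >/dev/null 2>&1 \
        && echo "bucket xs02 already exists - reuse it or pick another name" \
        || echo "bucket xs02 looks free"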
Re: [ceph-users] S3 RadosGW - Create bucket OP
On 11/03/2015, at 08.19, Steffen W Sørensen ste...@me.com wrote: On 10/03/2015, at 23.31, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: What kind of application is that? Commercial Email platform from Openwave.com Maybe it could be worked around using an apache rewrite rule. In any case, I opened issue #11091. Okay, how, by rewriting the response? Thanks, where can tickets be followed/viewed? Ah here: http://tracker.ceph.com/projects/rgw/issues Not at the moment. There's already issue #6961, I bumped its priority higher, and we'll take a look at it. Please also backport to Giant if possible :) /Steffen signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Add monitor unsuccesful
On 12/03/2015, at 00.55, Jesus Chavez (jeschave) jesch...@cisco.com wrote: can anybody tell me a good blog link that explain how to add monitor? I have tried manually and also with ceph-deploy without success =( Dunno if these might help U: http://ceph.com/docs/master/rados/operations/add-or-rm-mons/#adding-a-monitor-manual http://cephnotes.ksperis.com/blog/2013/08/29/mon-failed-to-start /Steffen signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
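Condensed from the first link, the manual route boils down to roughly this (a sketch; mon id and address are placeholders, and the docs remain the authoritative reference):

    # on the new monitor host (placeholders: mon id node3, addr 10.0.3.3)
    mkdir -p /var/lib/ceph/mon/ceph-node3
    ceph auth get mon. -o /tmp/mon.keyring                      # existing mon. key
    ceph mon getmap -o /tmp/monmap                              # current monmap
    ceph-mon -i node3 --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
    ceph mon add node3 10.0.3.3:6789                            # register it in the monmap
    service ceph start mon.node3    # plus a matching [mon.node3] section in ceph.conf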
Re: [ceph-users] S3 RadosGW - Create bucket OP
On 11/03/2015, at 08.19, Steffen W Sørensen ste...@me.com wrote: On 10/03/2015, at 23.31, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: What kind of application is that? Commercial Email platform from Openwave.com Maybe it could be worked around using an apache rewrite rule. In any case, I opened issue #11091. Okay, how, by rewriting the response? Thanks, where can tickets be followed/viewed? Asked my vendor what confuses their App about the reply. Would be nice if they could work against Ceph S3 :) 2. at every create bucket OP the GW create what looks like new containers for ACLs in .rgw pool, is this normal or howto avoid such multiple objects clottering the GW pools? Is there something wrong since I get multiple ACL object for this bucket everytime my App tries to recreate same bucket or is this a feature/bug in radosGW? That's a bug. Ok, any resolution/work-around to this? Not at the moment. There's already issue #6961, I bumped its priority higher, and we'll take a look at it. Thanks! BTW running Giant: [root@rgw ~]# rpm -qa| grep -i ceph httpd-tools-2.2.22-1.ceph.el6.x86_64 ceph-common-0.87.1-0.el6.x86_64 mod_fastcgi-2.4.7-1.ceph.el6.x86_64 libcephfs1-0.87.1-0.el6.x86_64 xfsprogs-3.1.1-14_ceph.el6.x86_64 ceph-radosgw-0.87.1-0.el6.x86_64 httpd-2.2.22-1.ceph.el6.x86_64 python-ceph-0.87.1-0.el6.x86_64 ceph-0.87.1-0.el6.x86_64 [root@rgw ~]# uname -a Linux rgw.sprawl.dk 2.6.32-504.8.1.el6.x86_64 #1 SMP Wed Jan 28 21:11:36 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux [root@rgw ~]# cat /etc/redhat-release CentOS release 6.6 (Final) signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] S3 RadosGW - Create bucket OP
On 10/03/2015, at 23.31, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: What kind of application is that? Commercial Email platform from Openwave.com Maybe it could be worked around using an apache rewrite rule. In any case, I opened issue #11091. Okay, how, by rewriting the response? Thanks, where can tickets be followed/viewed? Asked my vendor what confuses their App about the reply. Would be nice if they could work against Ceph S3 :) 2. at every create bucket OP the GW create what looks like new containers for ACLs in .rgw pool, is this normal or howto avoid such multiple objects clottering the GW pools? Is there something wrong since I get multiple ACL object for this bucket everytime my App tries to recreate same bucket or is this a feature/bug in radosGW? That's a bug. Ok, any resolution/work-around to this? Not at the moment. There's already issue #6961, I bumped its priority higher, and we'll take a look at it. Thanks! /Steffen signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] EC Pool and Cache Tier Tuning
On 09/03/2015, at 22.44, Nick Fisk n...@fisk.me.uk wrote: Either option #1 or #2 depending on if your data has hot spots or you need to use EC pools. I'm finding that the cache tier can actually slow stuff down depending on how much data is in the cache tier vs on the slower tier. Writes will be about the same speed for both solutions, reads will be a lot faster using a cache tier if the data resides in it. Of course, a large cache tier miss rate would be a 'hit' on perf :) Assuming that RBD client/OS page caching do help read OPs to some degree, though memory can't cache as much data as a larger SSD. /Steffen -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Steffen Winther Sent: 09 March 2015 20:47 To: ceph-users@lists.ceph.com Subject: Re: [ceph-users] EC Pool and Cache Tier Tuning Nick Fisk nick@... writes: My Ceph cluster comprises of 4 Nodes each with the following:- 10x 3TB WD Red Pro disks - EC pool k=3 m=3 (7200rpm) 2x S3700 100GB SSD's (20k Write IOPs) for HDD Journals 1x S3700 400GB SSD (35k Write IOPs) for cache tier - 3x replica If I have following 4x node config: 2x S3700 200GB SSD's 4x 4TB HDDs What config to aim for to optimize RBD write/read OPs: 1x S3700 200GB SSD for 4x journals 1x S3700 200GB cache tier 4x 4TB HDD OSD disk or: 2x S3700 200GB SSD for 2x journals 4x 4TB HDD OSD disk or: 2x S3700 200GB cache tier 4x 4TB HDD OSD disk /Steffen ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
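For the cache-tier variants, the tiering itself gets wired up along these lines (a sketch with placeholder pool names; hit_set and target sizing need tuning per the docs for your workload):

    # put an SSD pool as writeback cache in front of the HDD/EC pool (sketch)
    ceph osd tier add cold-pool hot-cache
    ceph osd tier cache-mode hot-cache writeback
    ceph osd tier set-overlay cold-pool hot-cache
    # minimal sizing/behaviour knobs - values are placeholders
    ceph osd pool set hot-cache hit_set_type bloom
    ceph osd pool set hot-cache hit_set_count 1
    ceph osd pool set hot-cache hit_set_period 3600
    ceph osd pool set hot-cache target_max_bytes 200000000000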
Re: [ceph-users] tgt and krbd
On 06/03/2015, at 22.47, Jake Young jak3...@gmail.com wrote: I wish there was a way to incorporate a local cache device into tgt with librbd backends. What about a ram disk device like rapid disk+cache in front of your rbd block device http://www.rapiddisk.org/?page_id=15#rapiddisk /Steffen I could try that in my VM to prototype the solution before I buy hardware. Note this from given URL: 'Do not use this in a virtual guest or with a loopback device. You will not see any performance improvements for reasons I do not feel like explaining at the moment. In fact, the performance will be worse in such environments. Only use this with an actual physical disk device.' RAM based cache is pretty dangerous for this application. If I reboot the VM and don't disconnect the initiators, there would most likely be data corruption, or at the very least data loss. I know, it's a trade of between perf vs integrity, but also note this: 'In RapidCache, all writes are cached to a rxdsk volume but also written to disk immediately. All disk reads are cached.' 'Enable general block device caching for: (1) Locally attached disk devices and (2) Remotely attached disk devices mapped over a Storage Area Network (SAN).' Haven't utilized such my self yet. I'm still wondering about the difference between rapid cache vs normal linux page cache. Believe rapid cache might ack write faster to an app and then handle the write to spindle following, and again looking at authors numbers write seems identical to spindle numbers so maybe write ack isn't given before staged to underlying spindle. Need to dig deeper into rapid cache. /Steffen signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] S3 RadosGW - Create bucket OP
On 06/03/2015, at 12.24, Steffen W Sørensen ste...@me.com wrote: 3. What are BCP for maintaining GW pools - need I run something like GC / cleanup OPs / log object pruning etc., any pointers to docs here? Is this all the maintenance one should consider on pools for a GW instance? http://ceph.com/docs/master/radosgw/purge-temp/ /Steffen signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
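The usual housekeeping behind that page looks roughly like this (a sketch; GC normally runs by itself, so the manual calls are mostly for inspection or catching up, and the dates are placeholders):

    radosgw-admin gc list                              # objects pending garbage collection
    radosgw-admin gc process                           # run a GC pass now instead of waiting
    radosgw-admin usage trim --end-date=2015-03-01     # prune old usage log entries
    radosgw-admin temp remove --date=2015-03-01        # per the purge-temp doc; mainly for older releases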
[ceph-users] S3 RadosGW - Create bucket OP
Hi Check the S3 Bucket OPS at : http://ceph.com/docs/master/radosgw/s3/bucketops/ I've read that as well, but I'm having other issues getting an App to run against our Ceph S3 GW, maybe you have a few hints on this as well... Got the cluster working for rbd+cephFS and have initial verified the S3 service as working. But have two-three issues currently: 1. when the App initial launches it want to create a bucket to hold App data, seems the bucket gets created, only the App (which is known to work against AWS + Scality S3) doesn't recognize the response so it never gets to run. Got a tcp dump of: Request: PUT /mssCl/ HTTP/1.1 Host: rgw.gsp Authorization: AWS auth id Date: Fri, 06 Mar 2015 10:41:14 GMT Content-Length: 0 Response: HTTP/1.1 200 OK Date: Fri, 06 Mar 2015 10:41:14 GMT Server: Apache/2.2.22 (Fedora) Connection: close Transfer-Encoding: chunked Content-Type: application/xml This response makes the App say: S3.createBucket, class S3, code UnexpectedContent, message Inconsistency in S3 response. error response is not a valid xml message Are our S3 GW not responding properly? 2. at every create bucket OP the GW create what looks like new containers for ACLs in .rgw pool, is this normal or howto avoid such multiple objects clottering the GW pools? # rados -p .rgw ls .bucket.meta.mssCl:default.6309817.1 .bucket.meta.mssCl:default.6187712.3 .bucket.meta.mssCl:default.6299841.7 .bucket.meta.mssCl:default.6309817.5 .bucket.meta.mssCl:default.6187712.2 .bucket.meta.mssCl:default.6187712.19 .bucket.meta.mssCl:default.6187712.12 mssCl ... # rados -p .rgw listxattr .bucket.meta.mssCl:default.6187712.12 ceph.objclass.version user.rgw.acl the GW log shows: 2015-03-06 09:16:04.505655 7fd3ac5e6700 1 == starting new request req=0x7fd3d8009c40 = 2015-03-06 09:16:04.506131 7fd3ac5e6700 2 req 2:0.000484::PUT /mssCl/::initializing 2015-03-06 09:16:04.507110 7fd3ac5e6700 10 host=rgw.gsp rgw_dns_name=rgw.gsp.sprawl.dk 2015-03-06 09:16:04.510138 7fd3ac5e6700 10 s-object=NULL s-bucket=mssCl 2015-03-06 09:16:04.511087 7fd3ac5e6700 2 req 2:0.005439:s3:PUT /mssCl/::getting op 2015-03-06 09:16:04.512041 7fd3ac5e6700 2 req 2:0.006395:s3:PUT /mssCl/:create_bucket:authorizing 2015-03-06 09:16:04.515173 7fd3ac5e6700 20 get_obj_state: rctx=0x7fd3dc007600 obj=.users:WL4EJJYTLVYXEHNR6QSA state=0x7fd3dc00c5d8 s-pre fetch_data=0 2015-03-06 09:16:04.516185 7fd3ac5e6700 10 cache get: name=.users+WL4EJJYTLVYXEHNR6QSA : type miss (requested=6, cached=3) 2015-03-06 09:16:04.542851 7fd3ac5e6700 10 cache put: name=.users+WL4EJJYTLVYXEHNR6QSA 2015-03-06 09:16:04.542933 7fd3ac5e6700 10 moving .users+WL4EJJYTLVYXEHNR6QSA to cache LRU end 2015-03-06 09:16:04.544187 7fd3ac5e6700 20 get_obj_state: s-obj_tag was set empty 2015-03-06 09:16:04.546138 7fd3ac5e6700 10 cache get: name=.users+WL4EJJYTLVYXEHNR6QSA : hit 2015-03-06 09:16:04.550060 7fd3ac5e6700 20 get_obj_state: rctx=0x7fd3dc007600 obj=.users.uid:mx9mss state=0x7fd3dc00daa8 s-prefetch_data =0 2015-03-06 09:16:04.551777 7fd3ac5e6700 10 cache get: name=.users.uid+mx9mss : hit 2015-03-06 09:16:04.551927 7fd3ac5e6700 20 get_obj_state: s-obj_tag was set empty 2015-03-06 09:16:04.554924 7fd3ac5e6700 10 cache get: name=.users.uid+mx9mss : hit 2015-03-06 09:16:04.559059 7fd3ac5e6700 10 chain_cache_entry: cache_locator=.users.uid+mx9mss 2015-03-06 09:16:04.565872 7fd3ac5e6700 10 get_canon_resource(): dest=/mssCl/ 2015-03-06 09:16:04.566637 7fd3ac5e6700 10 auth_hdr: PUT Fri, 06 Mar 2015 08:16:03 GMT /mssCl/ 2015-03-06 09:16:04.600259 7fd3ac5e6700 15 calculated 
digest=7nava2kuDurTiUVqxpn8OkP5I10= 2015-03-06 09:16:04.600529 7fd3ac5e6700 15 auth_sign=7nava2kuDurTiUVqxpn8OkP5I10= 2015-03-06 09:16:04.600563 7fd3ac5e6700 15 compare=0 2015-03-06 09:16:04.600653 7fd3ac5e6700 2 req 2:0.095001:s3:PUT /mssCl/:create_bucket:reading permissions 2015-03-06 09:16:04.600779 7fd3ac5e6700 2 req 2:0.095133:s3:PUT /mssCl/:create_bucket:init op 2015-03-06 09:16:04.600836 7fd3ac5e6700 2 req 2:0.095191:s3:PUT /mssCl/:create_bucket:verifying op mask 2015-03-06 09:16:04.600888 7fd3ac5e6700 20 required_mask= 2 user.op_mask=7 2015-03-06 09:16:04.600925 7fd3ac5e6700 2 req 2:0.095281:s3:PUT /mssCl/:create_bucket:verifying op permissions 2015-03-06 09:16:04.643915 7fd3ac5e6700 2 req 2:0.138261:s3:PUT /mssCl/:create_bucket:verifying op params 2015-03-06 09:16:04.644421 7fd3ac5e6700 2 req 2:0.138775:s3:PUT /mssCl/:create_bucket:executing 2015-03-06 09:16:04.644912 7fd3ac5e6700 20 get_obj_state: rctx=0x7fd3ac5e55d0 obj=.rgw:mssCl state=0x7fd3dc00ebe8 s-prefetch_data=0 2015-03-06 09:16:04.645328 7fd3ac5e6700 10 cache get: name=.rgw+mssCl : type miss (requested=6, cached=19) 2015-03-06 09:16:04.665683 7fd3ac5e6700 10 cache put: name=.rgw+mssCl 2015-03-06 09:16:04.665844 7fd3ac5e6700 10 moving .rgw+mssCl to cache LRU end 2015-03-06 09:16:04.666369 7fd3ac5e6700 20 get_obj_state: s-obj_tag was set empty 2015-03-06 09:16:04.666437 7fd3ac5e6700 0 WARNING: couldn't
Re: [ceph-users] tgt and krbd
On 06/03/2015, at 16.50, Jake Young jak3...@gmail.com wrote: After seeing your results, I've been considering experimenting with that. Currently, my iSCSI proxy nodes are VMs. I would like to build a few dedicated servers with fast SSDs or fusion-io devices. It depends on my budget, it's hard to justify getting a card that costs 10x the rest of the server... I would run all my tgt instances in containers pointing to the rbd disk+cache device. A fusion-io device could support many tgt containers. I don't really want to go back to krbd. I have a few rbd's that are format 2 with striping, there aren't any stable kernels that support that (or any kernels at all yet for fancy striping). I wish there was a way to incorporate a local cache device into tgt with librbd backends. What about a ram disk device like rapid disk+cache in front of your rbd block device http://www.rapiddisk.org/?page_id=15#rapiddisk /Steffen signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Mail not reaching the list?
On 01/03/2015, at 06.03, Sudarshan Pathak sushan@gmail.com wrote: Mail is landed in Spam. Here is message from google: Why is this message in Spam? It has a from address in yahoo.com but has failed yahoo.com's required tests for authentication. Learn more Maybe Tony didn't send through a Yahoo SMTP service, thus didn't get his outbound messages properly DKIM signed by Yahoo thus it's requested DMARC policy will not be valid hence Gmail as requested by Yahoo policy should rejects such messages: ~$ dig +short txt _dmarc.yahoo.com v=DMARC1\; p=reject\; sp=none\; pct=100\; rua=mailto:dmarc-yahoo-...@yahoo-inc.com, mailto:dmarc_y_...@yahoo.com\;; Regards, Sudarshan Pathak On Sat, Feb 28, 2015 at 9:25 PM, Tony Harris kg4...@yahoo.com wrote: Hi, I've sent a couple of emails to the list since subscribing, but I've never seen them reach the list; I was just wondering if there was something wrong? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed
On 27/02/2015, at 17.20, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote: I'd look at two things first. One is the '{fqdn}' string, which I'm not sure whether that's the actual string that you have, or whether you just replaced it for the sake of anonymity. The second is the port number, which should be fine, but maybe the fact that it appears as part of the script uri triggers some issue. When launching radosgw it logs this: ... 2015-02-27 18:33:58.663960 7f200b67a8a0 20 rados-read obj-ofs=0 read_ofs=0 read_len=524288 2015-02-27 18:33:58.675821 7f200b67a8a0 20 rados-read r=0 bl.length=678 2015-02-27 18:33:58.676532 7f200b67a8a0 10 cache put: name=.rgw.root+zone_info.default 2015-02-27 18:33:58.676573 7f200b67a8a0 10 moving .rgw.root+zone_info.default to cache LRU end 2015-02-27 18:33:58.677415 7f200b67a8a0 2 zone default is master 2015-02-27 18:33:58.677666 7f200b67a8a0 20 get_obj_state: rctx=0x2a85cd0 obj=.rgw.root:region_map state=0x2a86498 s-prefetch_data=0 2015-02-27 18:33:58.677760 7f200b67a8a0 10 cache get: name=.rgw.root+region_map : miss 2015-02-27 18:33:58.709411 7f200b67a8a0 10 cache put: name=.rgw.root+region_map 2015-02-27 18:33:58.709846 7f200b67a8a0 10 adding .rgw.root+region_map to cache LRU end 2015-02-27 18:33:58.957336 7f1ff17f2700 2 garbage collection: start 2015-02-27 18:33:58.959189 7f1ff0df1700 20 BucketsSyncThread: start 2015-02-27 18:33:58.985486 7f200b67a8a0 0 framework: fastcgi 2015-02-27 18:33:58.985778 7f200b67a8a0 0 framework: civetweb 2015-02-27 18:33:58.985879 7f200b67a8a0 0 framework conf key: port, val: 7480 2015-02-27 18:33:58.986462 7f200b67a8a0 0 starting handler: civetweb 2015-02-27 18:33:59.032173 7f1fc3fff700 20 UserSyncThread: start 2015-02-27 18:33:59.214739 7f200b67a8a0 0 starting handler: fastcgi 2015-02-27 18:33:59.286723 7f1fb59e8700 10 allocated request req=0x2aa1b20 2015-02-27 18:34:00.533188 7f1fc3fff700 20 RGWRados::pool_iterate: got {my user name} 2015-02-27 18:34:01.038190 7f1ff17f2700 2 garbage collection: stop 2015-02-27 18:34:01.670780 7f1fc3fff700 20 RGWUserStatsCache: sync user={my user name} 2015-02-27 18:34:01.687730 7f1fc3fff700 0 ERROR: can't read user header: ret=-2 2015-02-27 18:34:01.689734 7f1fc3fff700 0 ERROR: sync_user() failed, user={my user name} ret=-2 Why does it seem to find my radosgw defined user name as a pool and what might bring it to fail to read user header? /Steffen signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed
That's the old way of defining pools. The new way involves defining a zone and placement targets for that zone. Then you can have different default placement targets for different users. Any URLs/pointers to better understand such matters? Do you have any special config in your ceph.conf? E.g., did you modify the rgw_enable_apis configurable by any chance? # tail -20 /etc/ceph/ceph.conf [client.radosgw.owmblob] keyring = /etc/ceph/ceph.client.radosgw.keyring host = rgw user = apache rgw data = /var/lib/ceph/radosgw/ceph-rgw log file = /var/log/radosgw/client.radosgw.owmblob.log debug rgw = 20 rgw enable log rados = true rgw enable ops log = true rgw enable apis = s3 rgw cache enabled = true rgw cache lru size = 1 rgw socket path = /var/run/ceph/ceph.radosgw.owmblob.fastcgi.sock ;#rgw host = localhost ;#rgw port = 8004 rgw dns name = {fqdn} rgw print continue = true rgw thread pool size = 20 What is the purpose of the data directory btw? /Steffen signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed
Sorry forgot to send to the list... Begin forwarded message: From: Steffen W Sørensen ste...@me.com Subject: Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed Date: 27. feb. 2015 18.29.51 CET To: Yehuda Sadeh-Weinraub yeh...@redhat.com It seems that your request did find its way to the gateway, but the question here is why doesn't it match to a known operation. This really looks like a valid list all buckets request, so I'm not sure what's happening. I'd look at two things first. One is the '{fqdn}' string, which I'm not sure whether that's the actual string that you have, or whether you just replaced it for the sake of anonymity. I replaced for anonymity thou I run on private IP but still :) The second is the port number, which should be fine, but maybe the fact that it appears as part of the script uri triggers some issue. Hmm will try with default port 80... though I would assume that anything before the 'slash' gets cut off as part of the hostname[:port] portion. Makes not difference using port 80. ... 2015-02-27 18:15:43.402729 7f37889e0700 20 SERVER_PORT=80 2015-02-27 18:15:43.402747 7f37889e0700 20 SERVER_PROTOCOL=HTTP/1.1 2015-02-27 18:15:43.402765 7f37889e0700 20 SERVER_SIGNATURE= 2015-02-27 18:15:43.402783 7f37889e0700 20 SERVER_SOFTWARE=Apache/2.2.22 (Fedora) 2015-02-27 18:15:43.402814 7f37889e0700 1 == starting new request req=0x7f37b80083d0 = 2015-02-27 18:15:43.403157 7f37889e0700 2 req 1:0.000345::GET /::initializing 2015-02-27 18:15:43.403491 7f37889e0700 10 host={fqdn} rgw_dns_name={fqdn} 2015-02-27 18:15:43.404624 7f37889e0700 2 req 1:0.001816::GET /::http status=405 2015-02-27 18:15:43.404676 7f37889e0700 1 == req done req=0x7f37b80083d0 http_status=405 == 2015-02-27 18:15:43.404901 7f37889e0700 20 process_request() returned -2003 I'm not sure how to define my radosgw user, i made one with full rights key type s3: # radosgw-admin user info --uid='{user name}' { user_id: {user name}, display_name: test user for testlab, email: {email}, suspended: 0, max_buckets: 1000, auid: 0, subusers: [], keys: [ { user: {user name}, access_key: WL4EJJYTLVYXEHNR6QSA, secret_key: {secret}}], swift_keys: [], caps: [], op_mask: read, write, delete, default_placement: , placement_tags: [], bucket_quota: { enabled: false, max_size_kb: -1, max_objects: -1}, user_quota: { enabled: false, max_size_kb: -1, max_objects: -1}, temp_url_keys: []} When authenticating to the S3 API should I then use the unencrypted access key string or the encrypted seen above plus my secret? Howto verify if I authenticate successfully through S3 maybe this is my problem? test example: #!/usr/bin/python import boto import boto.s3.connection access_key = 'WL4EJJYTLVYXEHNR6QSA' secret_key = '{secret}' conn = boto.connect_s3( aws_access_key_id = access_key, aws_secret_access_key = secret_key, host = '{fqdn}', port = 8005, debug = 1, is_secure=False, # uncomment if you are not using ssl calling_format = boto.s3.connection.OrdinaryCallingFormat(), ) ## Any access on conn object fails with 405 not allowed for bucket in conn.get_all_buckets(): print {name}\t{created}.format( name = bucket.name, created = bucket.creation_date, ) bucket = conn.create_bucket('my-new-bucket') How does one btw control/map a user to/with a Ceph Pool or will an user with full right be able to create Ceph Pools through the admin API? I've added a pool to radosgw before creating my user with --pool=owmblob option not sure though that this will 'limit' a user to a default pool like that. 
Would have thought that this would set the default_placement attribute on the user then. Any good URLs to doc on the understanding of such matters as ACL, users and pool mapping etc in a gateway are also appreciated. # radosgw-admin pools list [ { name: owmblob}] signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed
rgw enable apis = s3 Commenting this out makes it work :) [root@rgw tests3]# ./lsbuckets.py [root@rgw tests3]# ./lsbuckets.py my-new-bucket 2015-02-27T17:49:04.000Z [root@rgw tests3]# ... 2015-02-27 18:49:22.601578 7f48f2bdd700 20 rgw_create_bucket returned ret=-17 bucket=my-new-bucket(@{i=.rgw.buckets.index,e=.rgw.buckets.extra}.rgw.buckets[default.5234475.2]) 2015-02-27 18:49:22.625672 7f48f2bdd700 2 req 4:0.350444:s3:PUT /my-new-bucket/:create_bucket:http status=200 2015-02-27 18:49:22.625758 7f48f2bdd700 1 == req done req=0x7f4938007810 http_status=200 == ... Why I just wants a S3 API available not admin API? /Steffen signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
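For the record, the documented values for that setting include s3, swift, swift_auth and admin; if the intent was only to limit which frontends are exposed rather than drop the line entirely, something like this could be tried instead (a sketch - given the 405 behaviour above, test before relying on it):

    [client.radosgw.owmblob]
    rgw enable apis = s3, admin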
[ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed
Hi, Newbie to RadosGW+Ceph, but learning... Got a running Ceph Cluster working with rbd+CephFS clients. Now I'm trying to verify a RadosGW S3 api, but seems to have an issue with RadosGW access. I get the error (not found anything searching so far...): S3ResponseError: 405 Method Not Allowed when trying to access the rgw. Apache vhost access log file says: 10.20.0.29 - - [27/Feb/2015:14:09:04 +0100] GET / HTTP/1.1 405 27 - Boto/2.34.0 Python/2.6.6 Linux/2.6.32-504.8.1.el6.x86_64 and Apache's general error_log file says: [Fri Feb 27 14:09:04 2015] [warn] FastCGI: 10.20.0.29 GET http://{fqdn}:8005/ auth AWS WL4EJJYTLVYXEHNR6QSA:X6XR4z7Gr9qTMNDphTNlRUk3gfc= RadosGW seems to launch and run fine, though /var/log/messages at launches says: Feb 27 14:12:34 rgw kernel: radosgw[14985]: segfault at e0 ip 003fb36cb1dc sp 7fffde221410 error 4 in librados.so.2.0.0[3fb320+6d] # ps -fuapache UIDPID PPID C STIME TTY TIME CMD apache 15113 15111 0 14:07 ?00:00:00 /usr/sbin/fcgi- apache 15114 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15115 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15116 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15117 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15118 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15119 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15120 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15121 15111 0 14:07 ?00:00:00 /usr/sbin/httpd apache 15224 1 1 14:12 ?00:00:25 /usr/bin/radosgw -n client.radosgw.owmblob RadosGW create my FastCGI socket and a default .asok, (not sure why/what default socket are meant for) as well as the configured log file though it never logs anything... # tail -18 /etc/ceph/ceph.conf: [client.radosgw.owmblob] keyring = /etc/ceph/ceph.client.radosgw.keyring host = rgw rgw data = /var/lib/ceph/radosgw/ceph-rgw log file = /var/log/radosgw/client.radosgw.owmblob.log debug rgw = 20 rgw enable log rados = true rgw enable ops log = true rgw enable apis = s3 rgw cache enabled = true rgw cache lru size = 1 rgw socket path = /var/run/ceph/ceph.radosgw.owmblob.fastcgi.sock ;#rgw host = localhost ;#rgw port = 8004 rgw dns name = {fqdn} rgw print continue = true rgw thread pool size = 20 Turned out /etc/init.d/ceph-radosgw didn't chown $USER even when log_file didn't exist, assuming radosgw creates this log file when opening it, only it creates it as root not $USER, thus not output, manually chowning it and restarting GW gives output ala: 2015-02-27 15:25:14.464112 7fef463e9700 20 enqueued request req=0x25dea40 2015-02-27 15:25:14.465750 7fef463e9700 20 RGWWQ: 2015-02-27 15:25:14.465786 7fef463e9700 20 req: 0x25dea40 2015-02-27 15:25:14.465864 7fef463e9700 10 allocated request req=0x25e3050 2015-02-27 15:25:14.466214 7fef431e4700 20 dequeued request req=0x25dea40 2015-02-27 15:25:14.466677 7fef431e4700 20 RGWWQ: empty 2015-02-27 15:25:14.467888 7fef431e4700 20 CONTENT_LENGTH=0 2015-02-27 15:25:14.467922 7fef431e4700 20 DOCUMENT_ROOT=/var/www/html 2015-02-27 15:25:14.467941 7fef431e4700 20 FCGI_ROLE=RESPONDER 2015-02-27 15:25:14.467958 7fef431e4700 20 GATEWAY_INTERFACE=CGI/1.1 2015-02-27 15:25:14.467976 7fef431e4700 20 HTTP_ACCEPT_ENCODING=identity 2015-02-27 15:25:14.469476 7fef431e4700 20 HTTP_AUTHORIZATION=AWS WL4EJJYTLVYXEHNR6QSA:OAT0zVItGyp98T5mALeHz4p1fcg= 2015-02-27 15:25:14.469516 7fef431e4700 20 HTTP_DATE=Fri, 27 Feb 2015 14:25:14 GMT 2015-02-27 15:25:14.469533 7fef431e4700 20 HTTP_HOST={fqdn}:8005 2015-02-27 15:25:14.469550 7fef431e4700 20 HTTP_USER_AGENT=Boto/2.34.0 Python/2.6.6 Linux/2.6.32-504.8.1.el6.x86_64 2015-02-27 
15:25:14.469571 7fef431e4700 20 PATH=/sbin:/usr/sbin:/bin:/usr/bin 2015-02-27 15:25:14.469589 7fef431e4700 20 QUERY_STRING= 2015-02-27 15:25:14.469607 7fef431e4700 20 REMOTE_ADDR=10.20.0.29 2015-02-27 15:25:14.469624 7fef431e4700 20 REMOTE_PORT=34386 2015-02-27 15:25:14.469641 7fef431e4700 20 REQUEST_METHOD=GET 2015-02-27 15:25:14.469658 7fef431e4700 20 REQUEST_URI=/ 2015-02-27 15:25:14.469677 7fef431e4700 20 SCRIPT_FILENAME=/var/www/html/s3gw.fcgi 2015-02-27 15:25:14.469694 7fef431e4700 20 SCRIPT_NAME=/ 2015-02-27 15:25:14.469711 7fef431e4700 20 SCRIPT_URI=http://{fqdn}:8005/ 2015-02-27 15:25:14.469730 7fef431e4700 20 SCRIPT_URL=/ 2015-02-27 15:25:14.469748 7fef431e4700 20 SERVER_ADDR=10.20.0.29 2015-02-27 15:25:14.469765 7fef431e4700 20 SERVER_ADMIN={email} 2015-02-27 15:25:14.469782 7fef431e4700 20 SERVER_NAME={fqdn} 2015-02-27 15:25:14.469801 7fef431e4700 20 SERVER_PORT=8005 2015-02-27 15:25:14.469818 7fef431e4700 20 SERVER_PROTOCOL=HTTP/1.1 2015-02-27 15:25:14.469835 7fef431e4700 20 SERVER_SIGNATURE= 2015-02-27 15:25:14.469852 7fef431e4700 20 SERVER_SOFTWARE=Apache/2.2.22 (Fedora) 2015-02-27
Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed
On 27/02/2015, at 18.51, Steffen W Sørensen ste...@me.com wrote: rgw enable apis = s3 Commenting this out makes it work :) Thanks for helping on this initial issue! [root@rgw tests3]# ./lsbuckets.py [root@rgw tests3]# ./lsbuckets.py my-new-bucket 2015-02-27T17:49:04.000Z [root@rgw tests3]# ... 2015-02-27 18:49:22.601578 7f48f2bdd700 20 rgw_create_bucket returned ret=-17 bucket=my-new-bucket(@{i=.rgw.buckets.index,e=.rgw.buckets.extra}.rgw.buckets[default.5234475.2]) 2015-02-27 18:49:22.625672 7f48f2bdd700 2 req 4:0.350444:s3:PUT /my-new-bucket/:create_bucket:http status=200 2015-02-27 18:49:22.625758 7f48f2bdd700 1 == req done req=0x7f4938007810 http_status=200 == ... Into which pool does such user data (buckets and objects) gets stored and possible howto direct user data into a dedicated pool? [root@rgw ~]# rados df pool name category KB objects clones degraded unfound rdrd KB wrwr KB .intent-log - 000 0 00000 .log- 110 0 00022 .rgw- 140 0 0 17 14 104 .rgw.buckets- 000 0 00000 .rgw.buckets.extra - 000 0 00000 .rgw.buckets.index - 010 0 02030 .rgw.control- 080 0 00000 .rgw.gc - 0 320 0 0 8302 8302 55560 .rgw.root - 130 0 0 929 61833 .usage - 000 0 00000 .users - 110 0 06453 .users.email- 110 0 03253 .users.swift- 000 0 00000 .users.uid - 120 0 0 65 54 164 Assume a bucket is a naming container for objects in a pool maybe similar to a directory with files. /Steffen signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Minor flaw in /etc/init.d/ceph-radosgw script
Hi Seems there's a minor flaw in the CentOS/RHEL init script: line 91 reads: daemon --user="$user" "$RADOSGW -n $name" should IMHO be: daemon --user="$user" $RADOSGW -n $name to avoid the /etc/rc.d/init.d/functions:__pids_var_run line 151 complaint from dirname :) /Steffen signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
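Reconstructed as a diff, the suggested one-liner would look something like this (a sketch; the reasoning is that the quoted command string makes __pids_var_run derive a pid-file name containing spaces, which is what dirname then complains about):

    # /etc/init.d/ceph-radosgw, around line 91
    -        daemon --user="$user" "$RADOSGW -n $name"
    +        daemon --user="$user" $RADOSGW -n $name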
Re: [ceph-users] too few pgs in cache tier
On 27/02/2015, at 17.04, Udo Lembke ulem...@polarzone.de wrote: ceph health detail HEALTH_WARN pool ssd-archiv has too few pgs Slightly different, but I had an issue with my Ceph cluster underneath a PVE cluster yesterday. I had two Ceph pools for RBD virt disks, vm_images (boot hdd images) + rbd_data (extra hdd images). Then while adding pools for a rados GW (.rgw.*), health status suddenly said that my vm_images pool had too few PGs, thus I ran: ceph osd pool set vm_images pg_num larger_number ceph osd pool set vm_images pgp_num larger_number This kicked off a 20 min rebalancing with a lot of IO in the Ceph cluster; eventually the cluster was fine again, only almost all my PVE VMs ended up in a stopped state, wondering why, a watchdog thingy maybe... /Steffen signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] RadosGW S3ResponseError: 405 Method Not Allowed
On 27/02/2015, at 19.02, Steffen W Sørensen ste...@me.com wrote: Into which pool does such user data (buckets and objects) gets stored and possible howto direct user data into a dedicated pool? [root@rgw ~]# rados df pool name category KB objects clones degraded unfound rdrd KB wrwr KB .intent-log - 000 0 00000 .log- 110 0 00022 .rgw- 140 0 0 17 14 104 .rgw.buckets- 000 0 00000 .rgw.buckets.extra - 000 0 00000 .rgw.buckets.index - 010 0 02030 .rgw.control- 080 0 00000 .rgw.gc - 0 320 0 0 8302 8302 55560 .rgw.root - 130 0 0 929 61833 .usage - 000 0 00000 .users - 110 0 06453 .users.email- 110 0 03253 .users.swift- 000 0 00000 .users.uid - 120 0 0 65 54 164 So it's mapped into a zone (at least on my Giant version 0.87) and in my simple non-federated config it's in the default region+zone: [root@rgw ~]# radosgw-admin region list { default_info: { default_region: default}, regions: [ default]} [root@rgw ~]# radosgw-admin zone list { zones: [ default]} [root@rgw ~]# radosgw-admin region get { name: default, api_name: , is_master: true, endpoints: [], master_zone: , zones: [ { name: default, endpoints: [], log_meta: false, log_data: false}], placement_targets: [ { name: default-placement, tags: []}], default_placement: default-placement} [root@rgw ~]# radosgw-admin zone get { domain_root: .rgw, control_pool: .rgw.control, gc_pool: .rgw.gc, log_pool: .log, intent_log_pool: .intent-log, usage_log_pool: .usage, user_keys_pool: .users, user_email_pool: .users.email, user_swift_pool: .users.swift, user_uid_pool: .users.uid, system_key: { access_key: , secret_key: }, placement_pools: [ { key: default-placement, val: { index_pool: .rgw.buckets.index, data_pool: .rgw.buckets, data_extra_pool: .rgw.buckets.extra}}]} and my user if associated with the default region+zone, thus it's data goes into .rgw.buckets + .rgw.buckets.index [+ .rgw.buckets.extra] Buckets seems a naming container at the radosgw level, above the underlying Ceph pool abstraction level, 'just' providing object persistence for radosgw abstraction/object FS on top of Ceph Pools... I think. So more users associated with same region+zone can share buckets+objects? Would be nice with a drawing showing abstractions at the different levels possible woth links to details on administration at different levels :) Lot of stuff to grasp for a newbie just in the need of a S3 service for an App usage :) /Steffen signature.asc Description: Message signed with OpenPGP using GPGMail ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
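To actually steer a user's buckets into a dedicated pool, the rough recipe pieced together from the region/zone JSON above is to add a placement target and point the user's default_placement at it (a sketch: pool and target names are placeholders, the JSON edits are done by hand, and exact flags vary a bit between releases):

    # 1) add the target to the region
    radosgw-admin region get > region.json
    #    edit region.json: add { "name": "owmblob-placement", "tags": []} under placement_targets
    radosgw-admin region set < region.json

    # 2) map the target to dedicated pools in the zone
    radosgw-admin zone get > zone.json
    #    edit zone.json: add under placement_pools:
    #    { "key": "owmblob-placement",
    #      "val": { "index_pool": ".owmblob.index", "data_pool": ".owmblob.data", "data_extra_pool": ".owmblob.extra"}}
    radosgw-admin zone set < zone.json
    radosgw-admin regionmap update

    # 3) point the user at the new target
    radosgw-admin metadata get user:mx9mss > user.json
    #    edit user.json: set "default_placement": "owmblob-placement"
    radosgw-admin metadata put user:mx9mss < user.json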