Re: [ceph-users] CephFS msg length greater than osd_max_write_size
Re: [ceph-users] CephFS msg length greater than osd_max_write_size

Thanks for the reply! We will be more proactive about evicting clients in the future rather than waiting.

One followup, however: it seems that the filesystem going read-only was only a WARNING state, which didn't immediately catch our eye due to some other rebalancing operations. Is there a reason this wouldn't be a HEALTH_ERR condition, since it represents a significant service degradation?

Thanks!
Ryan

> On May 22, 2019, at 4:20 AM, Yan, Zheng wrote:
>
> On Tue, May 21, 2019 at 6:10 AM Ryan Leimenstoll wrote:
>>
>> Hi all,
>>
>> We recently encountered an issue where our CephFS filesystem unexpectedly
>> was set to read-only. When we look at some of the logs from the daemons I
>> can see the following:
>>
>> On the MDS:
>> ...
>> 2019-05-18 16:34:24.341 7fb3bd610700 -1 mds.0.89098 unhandled write error (90) Message too long, force readonly...
>> 2019-05-18 16:34:24.341 7fb3bd610700  1 mds.0.cache force file system read-only
>> 2019-05-18 16:34:24.341 7fb3bd610700  0 log_channel(cluster) log [WRN] : force file system read-only
>> 2019-05-18 16:34:41.289 7fb3c0616700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
>> 2019-05-18 16:34:41.289 7fb3c0616700  0 mds.beacon.objmds00 Skipping beacon heartbeat to monitors (last acked 4.00101s ago); MDS internal heartbeat is not healthy!
>> ...
>>
>> On one of the OSDs it was most likely targeting:
>> ...
>> 2019-05-18 16:34:24.140 7f8134e6c700 -1 osd.602 pg_epoch: 682796 pg[49.20b( v 682796'15706523 (682693'15703449,682796'15706523] local-lis/les=673041/673042 n=10524 ec=245563/245563 lis/c 673041/673041 les/c/f 673042/673042/0 673038/673041/668565) [602,530,558] r=0 lpr=673041 crt=682796'15706523 lcod 682796'15706522 mlcod 682796'15706522 active+clean] do_op msg data len 95146005 > osd_max_write_size 94371840 on osd_op(mds.0.89098:48609421 49.20b 49:d0630e4c:::mds0_sessionmap:head [omap-set-header,omap-set-vals] snapc 0=[] ondisk+write+known_if_redirected+full_force e682796) v8
>> 2019-05-18 17:10:33.695 7f813466b700  0 log_channel(cluster) log [DBG] : 49.31c scrub starts
>> 2019-05-18 17:10:34.980 7f813466b700  0 log_channel(cluster) log [DBG] : 49.31c scrub ok
>> 2019-05-18 22:17:37.320 7f8134e6c700 -1 osd.602 pg_epoch: 683434 pg[49.20b( v 682861'15706526 (682693'15703449,682861'15706526] local-lis/les=673041/673042 n=10525 ec=245563/245563 lis/c 673041/673041 les/c/f 673042/673042/0 673038/673041/668565) [602,530,558] r=0 lpr=673041 crt=682861'15706526 lcod 682859'15706525 mlcod 682859'15706525 active+clean] do_op msg data len 95903764 > osd_max_write_size 94371840 on osd_op(mds.0.91565:357877 49.20b 49:d0630e4c:::mds0_sessionmap:head [omap-set-header,omap-set-vals,omap-rm-keys] snapc 0=[] ondisk+write+known_if_redirected+full_force e683434) v8
>> …
>>
>> During this time there were some health concerns with the cluster.
>> Significantly, since the error above seems to be related to the SessionMap,
>> we had a client that had a few blocked requests for over 35948 secs (it's a
>> member of a compute cluster so we let the node drain/finish jobs before
>> rebooting). We have also had some issues with certain OSDs running older
>> hardware staying up/responding timely to heartbeats after upgrading to
>> Nautilus, although that seems to be an iowait/load issue that we are
>> actively working to mitigate separately.
>>
>
> This prevents the MDS from trimming completed requests recorded in the
> session, which results in a very large session item. To recover, blacklist
> the client that has the blocked requests, then restart the MDS.
>
>> We are running Nautilus 14.2.1 on RHEL7.6. There is only one MDS rank, with
>> an active/standby setup between two MDS nodes. MDS clients are mounted using
>> the RHEL7.6 kernel driver.
>>
>> My read here would be that the MDS is sending too large a message to the
>> OSD; however, my understanding was that the MDS should be using
>> osd_max_write_size to determine the size of that message [0]. Is this maybe
>> a bug in how this is calculated on the MDS side?
>>
>> Thanks!
>> Ryan Leimenstoll
>> rleim...@umiacs.umd.edu
>> University of Maryland Institute for Advanced Computer Studies
>>
>> [0] https://www.spinics.net/lists/ceph-devel/msg11951.html

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
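[Editor's note: a minimal sketch of the recovery advice above. The client ID, address, and daemon names are placeholders, and exact session-listing fields vary by release; this is an operational outline, not a tested procedure.]

```shell
# Find the session of the client with blocked requests (id/addr are hypothetical).
ceph daemon mds.objmds00 session ls

# Evict the session by client id; by default this also blacklists the client.
ceph tell mds.0 client evict id=12345

# Alternatively, blacklist the client address directly at the OSD layer.
ceph osd blacklist add 10.0.0.42:0/123456789

# Then restart the MDS so it comes back without the oversized session state.
systemctl restart ceph-mds@objmds00
```
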
[ceph-users] CephFS msg length greater than osd_max_write_size
Hi all,

We recently encountered an issue where our CephFS filesystem unexpectedly was set to read-only. When we look at some of the logs from the daemons I can see the following:

On the MDS:
...
2019-05-18 16:34:24.341 7fb3bd610700 -1 mds.0.89098 unhandled write error (90) Message too long, force readonly...
2019-05-18 16:34:24.341 7fb3bd610700  1 mds.0.cache force file system read-only
2019-05-18 16:34:24.341 7fb3bd610700  0 log_channel(cluster) log [WRN] : force file system read-only
2019-05-18 16:34:41.289 7fb3c0616700  1 heartbeat_map is_healthy 'MDSRank' had timed out after 15
2019-05-18 16:34:41.289 7fb3c0616700  0 mds.beacon.objmds00 Skipping beacon heartbeat to monitors (last acked 4.00101s ago); MDS internal heartbeat is not healthy!
...

On one of the OSDs it was most likely targeting:
...
2019-05-18 16:34:24.140 7f8134e6c700 -1 osd.602 pg_epoch: 682796 pg[49.20b( v 682796'15706523 (682693'15703449,682796'15706523] local-lis/les=673041/673042 n=10524 ec=245563/245563 lis/c 673041/673041 les/c/f 673042/673042/0 673038/673041/668565) [602,530,558] r=0 lpr=673041 crt=682796'15706523 lcod 682796'15706522 mlcod 682796'15706522 active+clean] do_op msg data len 95146005 > osd_max_write_size 94371840 on osd_op(mds.0.89098:48609421 49.20b 49:d0630e4c:::mds0_sessionmap:head [omap-set-header,omap-set-vals] snapc 0=[] ondisk+write+known_if_redirected+full_force e682796) v8
2019-05-18 17:10:33.695 7f813466b700  0 log_channel(cluster) log [DBG] : 49.31c scrub starts
2019-05-18 17:10:34.980 7f813466b700  0 log_channel(cluster) log [DBG] : 49.31c scrub ok
2019-05-18 22:17:37.320 7f8134e6c700 -1 osd.602 pg_epoch: 683434 pg[49.20b( v 682861'15706526 (682693'15703449,682861'15706526] local-lis/les=673041/673042 n=10525 ec=245563/245563 lis/c 673041/673041 les/c/f 673042/673042/0 673038/673041/668565) [602,530,558] r=0 lpr=673041 crt=682861'15706526 lcod 682859'15706525 mlcod 682859'15706525 active+clean] do_op msg data len 95903764 > osd_max_write_size 94371840 on osd_op(mds.0.91565:357877 49.20b 49:d0630e4c:::mds0_sessionmap:head [omap-set-header,omap-set-vals,omap-rm-keys] snapc 0=[] ondisk+write+known_if_redirected+full_force e683434) v8
…

During this time there were some health concerns with the cluster. Significantly, since the error above seems to be related to the SessionMap, we had a client that had a few blocked requests for over 35948 secs (it's a member of a compute cluster so we let the node drain/finish jobs before rebooting). We have also had some issues with certain OSDs running older hardware staying up/responding timely to heartbeats after upgrading to Nautilus, although that seems to be an iowait/load issue that we are actively working to mitigate separately.

We are running Nautilus 14.2.1 on RHEL7.6. There is only one MDS rank, with an active/standby setup between two MDS nodes. MDS clients are mounted using the RHEL7.6 kernel driver.

My read here would be that the MDS is sending too large a message to the OSD; however, my understanding was that the MDS should be using osd_max_write_size to determine the size of that message [0]. Is this maybe a bug in how this is calculated on the MDS side?

Thanks!
Ryan Leimenstoll
rleim...@umiacs.umd.edu
University of Maryland Institute for Advanced Computer Studies

[0] https://www.spinics.net/lists/ceph-devel/msg11951.html
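[Editor's note: for readers hitting the same "Message too long" error, a sketch of how one might inspect (and, while debugging, temporarily raise) osd_max_write_size. The value is in MB and defaults to 90; raising it only masks the oversized SessionMap write rather than fixing the root cause. Daemon names and the value are illustrative.]

```shell
# Show the current value on a running OSD (via its admin socket).
ceph daemon osd.602 config get osd_max_write_size

# Temporarily raise it cluster-wide using Nautilus's centralized config (value in MB).
ceph config set osd osd_max_write_size 120

# Older style: inject at runtime without persisting across restarts.
ceph tell osd.* injectargs '--osd_max_write_size 120'
```
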
Re: [ceph-users] rados block on SSD - performance - how to tune and get insight?
I just ran your test on a cluster with 5 hosts: 2x Intel 6130, 12x 860 Evo 2TB SSD per host (6 per SAS3008), 2x bonded 10GB NIC, 2x Arista switches. Pool with 3x replication.

rados bench -p scbench -b 4096 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4096 bytes to objects of size 4096 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_dc1-kube-01_3458991
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    0       0         0         0         0         0            -           0
    1      16      5090      5074   19.7774   19.8203   0.00312568  0.00315352
    2      16     10441     10425   20.3276   20.9023   0.00332591  0.00307105
    3      16     15548     15532    20.201   19.9492   0.00337573  0.00309134
    4      16     20906     20890   20.3826   20.9297   0.00282902  0.00306437
    5      16     26107     26091   20.3686   20.3164   0.00269844  0.00306698
    6      16     31246     31230   20.3187   20.0742   0.00339814  0.00307462
    7      16     36372     36356   20.2753   20.0234   0.00286653   0.0030813
    8      16     41470     41454   20.2293   19.9141   0.00272051  0.00308839
    9      16     46815     46799   20.3011   20.8789   0.00284063  0.00307738
Total time run:         10.0035
Total writes made:      51918
Write size:             4096
Object size:            4096
Bandwidth (MB/sec):     20.2734
Stddev Bandwidth:       0.464082
Max bandwidth (MB/sec): 20.9297
Min bandwidth (MB/sec): 19.8203
Average IOPS:           5189
Stddev IOPS:            118
Max IOPS:               5358
Min IOPS:               5074
Average Latency(s):     0.00308195
Stddev Latency(s):      0.00142825
Max latency(s):         0.0267947
Min latency(s):         0.00217364

rados bench -p scbench 10 rand
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)   avg lat(s)
    0       0         0         0         0         0            -            0
    1      15     39691     39676    154.95   154.984   0.00027022  0.000395993
    2      16     83701     83685   163.416    171.91  0.000318949  0.000375363
    3      15    129218    129203   168.199   177.805  0.000300898  0.000364647
    4      15    173733    173718   169.617   173.887  0.000311723   0.00036156
    5      15    216073    216058   168.769   165.391  0.000407594  0.000363371
    6      16    260381    260365   169.483   173.074  0.000323371  0.000361829
    7      15    306838    306823   171.193   181.477  0.000284247  0.000358199
    8      15    353675    353660   172.661   182.957  0.000338128  0.000355139
    9      15    399221    399206   173.243   177.914  0.000422527   0.00035393
Total time run:       10.0003
Total reads made:     446353
Read size:            4096
Object size:          4096
Bandwidth (MB/sec):   174.351
Average IOPS:         44633
Stddev IOPS:          2220
Max IOPS:             46837
Min IOPS:             39676
Average Latency(s):   0.000351679
Max latency(s):       0.00530195
Min latency(s):       0.000135292

On Thu, Feb 7, 2019 at 2:17 AM wrote:
> Hi List
>
> We are in the process of moving to the next usecase for our ceph cluster
> (bulk, cheap, slow, erasure-coded CephFS storage was the first - and
> that works fine).
>
> We're currently on Luminous / BlueStore; if upgrading is deemed to
> change what we're seeing then please let us know.
>
> We have 6 OSD hosts, each with one 1TB S4510 SSD. Connected
> through an H700 MegaRAID PERC BBWC, each disk as a single-disk RAID0,
> scheduler set to deadline, nomerges = 1, rotational = 0.
>
> Each disk "should" give approximately 36K IOPS random write and double
> that for random read.
>
> The pool is set up with 3x replication. We would like a "scaleout" setup of
> well-performing SSD block devices - potentially to host databases and
> things like that. I read through this nice document [0]; I know the
> HW is radically different from mine, but I still think I'm in the
> very low end of what 6 x S4510 should be capable of doing.
>
> Since it is IOPS I care about, I have lowered block size to 4096 -- 4M
> blocksize nicely saturates the NICs in both directions.
>
> $ sudo rados bench -p scbench -b 4096 10 write --no-cleanup
> hints = 1
> Maintaining 16 concurrent writes of 4096 bytes to objects of size 4096 for up to 10 seconds or 0 objects
> Object prefix: benchmark_data_torsk2_11207
>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
>     0       0         0         0         0         0            -           0
>     1      16      5857      5841   22.8155   22.8164   0.00238437  0.00273434
>     2      15     11768     11753   22.9533   23.0938    0.0028559  0.00271944
>     3      16     17264     17248   22.4564   21.4648       0.0024  0.00278101
>     4      16     22857     22841   22.3037   21.8477     0.002716  0.00280023
>     5      16     28462     28446   22.2213   21.8945   0.00220186    0.002811
>     6      16     34216     34200   22.2635   22.4766   0.00234315  0.00280552
>     7      16     39616
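[Editor's note: rados bench measures the object layer; for RBD-level numbers one would typically use fio's rbd engine instead. A hedged invocation sketch; the pool and image names are placeholders, and fio must be built with rbd support.]

```shell
# 4K random-write test against an RBD image, roughly comparable to the
# rados bench runs above (pool "scbench", image "fio-test" are assumptions).
fio --ioengine=rbd --clientname=admin --pool=scbench --rbdname=fio-test \
    --name=4k-randwrite --rw=randwrite --bs=4k --iodepth=16 \
    --direct=1 --runtime=30 --time_based
```
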
Re: [ceph-users] Object Gateway Cloud Sync to S3
On Tue, Feb 5, 2019 at 3:35 PM Ryan wrote:
> I've been trying to configure the cloud sync module to push changes to an
> Amazon S3 bucket without success. I've configured the module according to
> the docs with the trivial configuration settings. Is there an error log I
> should be checking? Is "radosgw-admin sync status
> --rgw-zone=mycloudtierzone" the correct command to check status?
>
> Thanks,
> Ryan

It turns out I can get it to sync as long as I leave "radosgw-admin --rgw-zone=aws-docindex data sync run --source-zone=default" running. I thought with Mimic the sync was built into the ceph-radosgw service? I'm running version 13.2.4.

I'm also seeing these errors on the console while running that command:

2019-02-05 17:40:10.679 7fb1ef06b680  0 meta sync: ERROR: RGWBackoffControlCR called coroutine returned -2
2019-02-05 17:40:10.694 7fb1ef06b680  0 RGW-SYNC:data:sync:shard[25]: ERROR: failed to read remote data log info: ret=-2
2019-02-05 17:40:10.695 7fb1ef06b680  0 meta sync: ERROR: RGWBackoffControlCR called coroutine returned -2
2019-02-05 17:40:10.711 7fb1ef06b680  0 RGW-SYNC:data:sync:shard[43]: ERROR: failed to read remote data log info: ret=-2
2019-02-05 17:40:10.712 7fb1ef06b680  0 meta sync: ERROR: RGWBackoffControlCR called coroutine returned -2
2019-02-05 17:40:10.720 7fb1ef06b680  0 meta sync: ERROR: RGWBackoffControlCR called coroutine returned -2

Additionally, "radosgw-admin --rgw-zone=aws-docindex data sync error list --source-zone=default" is showing numerous error code 39 responses:

"message": "failed to sync bucket instance: (39) Directory not empty"
"message": "failed to sync object(39) Directory not empty"

When it successfully completes I see the following:

  metadata sync syncing
    full sync: 0/64 shards
    incremental sync: 64/64 shards
    metadata is caught up with master
  data sync source: af57fe9a-43a7-4998-9574-4016f5fa6661 (default)
    syncing
    full sync: 0/128 shards
    incremental sync: 128/128 shards
    data is caught up with source

When I stop the "data sync run" the status will just sit on:

  data sync source: af57fe9a-43a7-4998-9574-4016f5fa6661 (default)
    syncing
    full sync: 0/128 shards
    incremental sync: 128/128 shards
    data is behind on 1 shards
    behind shards: [75]
    oldest incremental change not applied: 2019-02-05 17:44:51.0.367478s

Thanks,
Ryan
[ceph-users] Object Gateway Cloud Sync to S3
I've been trying to configure the cloud sync module to push changes to an Amazon S3 bucket without success. I've configured the module according to the docs with the trivial configuration settings. Is there an error log I should be checking? Is "radosgw-admin sync status --rgw-zone=mycloudtierzone" the correct command to check status?

Thanks,
Ryan
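[Editor's note: for reference, a sketch of the basic cloud sync zone setup the post alludes to. Zone/zonegroup names, endpoint, and credentials are all placeholders; consult the cloud sync module docs for the full set of tier-config keys.]

```shell
# Create a cloud-tier zone in the existing zonegroup.
radosgw-admin zone create --rgw-zonegroup=default --rgw-zone=mycloudtierzone \
    --tier-type=cloud

# Point the tier at the remote S3 endpoint and credentials.
radosgw-admin zone modify --rgw-zone=mycloudtierzone \
    --tier-config=connection.endpoint=https://s3.amazonaws.com,connection.access_key=AKIA...,connection.secret=SECRET

# Commit the period and restart the gateways so sync threads pick it up.
radosgw-admin period update --commit
systemctl restart ceph-radosgw.target
```
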
[ceph-users] deleting a file
Hello.

I am deleting files via s3cmd and for the most part have no issue. Every once in a while, though, I get a positive response that a file has been deleted, but when I check back the next day the file is still there. I was wondering if there is a way to delete a file from within Ceph? I don't want to go through the RADOS Gateway but instead SSH into the system and delete the file.

Thank you and happy holidays.

Rhys Ryan
Data Architect
NOAA
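[Editor's note: for the archives, RGW-served objects should be removed through radosgw-admin rather than raw rados operations, so the bucket index stays consistent. A sketch; the bucket and object names are placeholders, and delayed garbage collection is one plausible (unconfirmed) cause of the behavior described.]

```shell
# Remove a single object via the gateway's admin tool (keeps the bucket index consistent).
radosgw-admin object rm --bucket=mybucket --object=path/to/file

# If a deleted object appears to linger, it may be awaiting garbage collection;
# force a GC pass, including objects not yet past their grace period.
radosgw-admin gc process --include-all
```
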
Re: [ceph-users] "rgw relaxed s3 bucket names" and underscores
Nope, you are right. I think it was just boto catching this for me and I took that for granted. I think that is the behavior I would expect too: S3-compliant restrictions on create, and allow legacy buckets to remain.

Anyway, noticed you created a ticket [0] in the tracker for this, thanks!

Best,
Ryan

[0] https://tracker.ceph.com/issues/36293

> On Oct 2, 2018, at 6:08 PM, Robin H. Johnson wrote:
>
> On Tue, Oct 02, 2018 at 12:37:02PM -0400, Ryan Leimenstoll wrote:
>> I was hoping to get some clarification on what "rgw relaxed s3 bucket
>> names = false" is intended to filter.
> Yes, it SHOULD have caught this case, but does not.
>
> Are you sure it rejects the uppercase? My test also showed that it did
> NOT reject the uppercase as intended.
>
> This code did used to work; I contributed to the logic and discussion
> for earlier versions. A related part I wanted was allowing access to
> existing buckets w/ relaxed names, but disallowing creation of relaxed
> names.
>
> --
> Robin Hugh Johnson
> Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
> E-Mail   : robb...@gentoo.org
> GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
> GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
[ceph-users] "rgw relaxed s3 bucket names" and underscores
Hi all,

I was hoping to get some clarification on what "rgw relaxed s3 bucket names = false" is intended to filter. In our cluster (Luminous 12.2.8, serving S3) it seems that RGW, with that setting set to false, is still allowing buckets with underscores in the name to be created, although this is now prohibited by Amazon in US-East and seemingly all of their other regions [0]. Since clients typically follow Amazon's direction, should RGW be rejecting underscores in these names to be in compliance? (I did notice it already rejects uppercase letters.)

Thanks much!
Ryan Leimenstoll
rleim...@umiacs.umd.edu

[0] https://docs.aws.amazon.com/AmazonS3/latest/dev/BucketRestrictions.html
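[Editor's note: the strict (non-relaxed) naming rules the thread argues RGW should enforce can be stated compactly. The check below is an illustrative client-side sketch of Amazon's published rules, not RGW's actual validation code.]

```python
import re

# Strict S3 bucket-name rules: 3-63 chars; lowercase letters, digits,
# hyphens, dots; must start and end with a letter or digit; no underscores,
# no consecutive dots, and must not look like an IPv4 address.
_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
_IP_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")

def is_strict_s3_bucket_name(name: str) -> bool:
    if not _NAME_RE.match(name):   # catches uppercase, underscores, bad length
        return False
    if _IP_RE.match(name):         # IP-shaped names are disallowed
        return False
    if ".." in name:               # no consecutive dots
        return False
    return True
```

Running it against the cases in this thread: `my_bucket` (underscore) and `MyBucket` (uppercase) are both rejected, while `my-bucket` passes.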
Re: [ceph-users] cephfs-data-scan safety on active filesystem
Hi Gregg, John,

Thanks for the warning. It was definitely conveyed that they are dangerous. I thought the online part was implied to be a bad idea, but just wanted to verify.

John,

We were mostly operating off of what the mds logs reported. After bringing the mds back online and active, we mounted the volume using the kernel driver on one host and started a recursive ls through the root of the filesystem to see what was broken. There were seemingly two main paths of the tree that were affected initially, both reporting errors like the following in the mds log (I've swapped out the paths):

Group 1:
2018-05-04 12:04:38.004029 7fc81f69a700 -1 log_channel(cluster) log [ERR] : dir 0x10011125556 object missing on disk; some files may be lost (/cephfs/redacted1/path/dir1)
2018-05-04 12:04:38.028861 7fc81f69a700 -1 log_channel(cluster) log [ERR] : dir 0x1001112bf14 object missing on disk; some files may be lost (/cephfs/redacted1/path/dir2)
2018-05-04 12:04:38.030504 7fc81f69a700 -1 log_channel(cluster) log [ERR] : dir 0x10011131118 object missing on disk; some files may be lost (/cephfs/redacted1/path/dir3)

Group 2:
2018-05-04 13:24:29.495892 7fc81f69a700 -1 log_channel(cluster) log [ERR] : dir 0x1001102c5f6 object missing on disk; some files may be lost (/cephfs/redacted2/path/dir1)

Some of the paths it complained about were empty via ls, although trying to rm [-r] them via the mount failed with an error suggesting files still exist in the directory. We removed the dir objects in the metadata pool that it was still warning about (rados -p metapool rm 10011125556., for example). This cleaned up errors on this path. We then did the same for Group 2. After this, we initiated a recursive scrub with the mds daemon on the root of the filesystem to run over the weekend. In retrospect, we probably should have done the data scan steps mentioned in the disaster recovery guide before bringing the system online.

The cluster is currently healthy (or, rather, reporting healthy) and has been for a while. My understanding here is that we would need something like the cephfs-data-scan steps to recreate metadata or at least identify (for cleanup) objects that may have been stranded in the data pool. Is there any way, likely with another tool, to do this for an active cluster? If not, is this something that can be done with some amount of safety on an offline system? (Not sure how long it would take; the data pool is ~100T large w/ 242 million objects, and downtime is a big pain point for our users with deadlines.)

Thanks,
Ryan

> On May 8, 2018, at 5:05 AM, John Spray <jsp...@redhat.com> wrote:
>
> On Mon, May 7, 2018 at 8:50 PM, Ryan Leimenstoll
> <rleim...@umiacs.umd.edu> wrote:
>> Hi All,
>>
>> We recently experienced a failure with our 12.2.4 cluster running a CephFS
>> instance that resulted in some data loss due to a seemingly problematic OSD
>> blocking IO on its PGs. We restarted the (single active) mds daemon during
>> this, which caused damage due to the journal not having the chance to flush
>> back. We reset the journal, session table, and fs to bring the filesystem
>> online. We then removed some directories/inodes that were causing the
>> cluster to report damaged metadata (and were otherwise visibly broken by
>> navigating the filesystem).
>
> This may be over-optimistic of me, but is there any chance you kept a
> detailed record of exactly what damage was reported, and what you did
> to the filesystem so far? It's hard to give any intelligent advice on
> repairing it when we don't know exactly what was broken, and a bunch
> of unknown repair-ish things have already manipulated the metadata
> behind the scenes.
>
> John
>
>> With that, there are now some paths that seem to have been orphaned (which
>> we expected). We did not run the 'cephfs-data-scan' tool [0] in the name of
>> getting the system back online ASAP. Now that the filesystem is otherwise
>> stable, can we initiate a scan_links operation with the mds active safely?
>>
>> [0] http://docs.ceph.com/docs/luminous/cephfs/disaster-recovery/#recovery-from-missing-metadata-objects
>>
>> Thanks much,
>> Ryan Leimenstoll
>> rleim...@umiacs.umd.edu
>> University of Maryland Institute for Advanced Computer Studies
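[Editor's note: the offline procedure the thread refers to, sketched per the Luminous disaster recovery guide. Filesystem and pool names are placeholders; with ~242M objects this will take a long time, though the scans can be parallelized across multiple workers. This is an outline of the documented steps, not advice specific to this cluster.]

```shell
# Take the filesystem offline (Luminous-era syntax) and fail the active MDS.
ceph fs set cephfs cluster_down true
ceph mds fail 0

# Rebuild backward pointers and recover metadata from the data pool; each
# phase can be sharded across workers with --worker_n/--worker_m.
cephfs-data-scan scan_extents cephfs_data
cephfs-data-scan scan_inodes cephfs_data
cephfs-data-scan scan_links

# Bring the filesystem back online afterwards.
ceph fs set cephfs cluster_down false
```
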
[ceph-users] cephfs-data-scan safety on active filesystem
Hi All,

We recently experienced a failure with our 12.2.4 cluster running a CephFS instance that resulted in some data loss due to a seemingly problematic OSD blocking IO on its PGs. We restarted the (single active) mds daemon during this, which caused damage due to the journal not having the chance to flush back. We reset the journal, session table, and fs to bring the filesystem online. We then removed some directories/inodes that were causing the cluster to report damaged metadata (and were otherwise visibly broken by navigating the filesystem).

With that, there are now some paths that seem to have been orphaned (which we expected). We did not run the 'cephfs-data-scan' tool [0] in the name of getting the system back online ASAP. Now that the filesystem is otherwise stable, can we initiate a scan_links operation with the mds active safely?

[0] http://docs.ceph.com/docs/luminous/cephfs/disaster-recovery/#recovery-from-missing-metadata-objects

Thanks much,
Ryan Leimenstoll
rleim...@umiacs.umd.edu
University of Maryland Institute for Advanced Computer Studies
Re: [ceph-users] change radosgw object owner
Hi Robin,

Thanks for the pointer! My one concern, though, is that it didn't seem to update the original object owner's quota, which is a bit of a sticking point. Is this expected (and is there a workaround)? I will admit to being a bit naive to how radosgw's quota system works under the hood.

Thanks,
Ryan

> On Mar 6, 2018, at 2:54 PM, Robin H. Johnson <robb...@gentoo.org> wrote:
>
> On Tue, Mar 06, 2018 at 02:40:11PM -0500, Ryan Leimenstoll wrote:
>> Hi all,
>>
>> We are trying to move a bucket in radosgw from one user to another in an
>> effort to both change ownership and attribute the storage usage of the data
>> to the receiving user's quota.
>>
>> I have unlinked the bucket and linked it to the new user using:
>>
>> radosgw-admin bucket unlink --bucket=$MYBUCKET --uid=$USER
>> radosgw-admin bucket link --bucket=$MYBUCKET --bucket-id=$BUCKET_ID --uid=$NEWUSER
>>
>> However, perhaps as expected, the owner of all the objects in the
>> bucket remains $USER. I don't believe changing the owner is a
>> supported operation from the S3 protocol; however, it would be very
>> helpful to have the ability to do this on the radosgw backend. This is
>> especially useful for large buckets/datasets where copying the objects
>> out and into radosgw could be time consuming.
> At the raw radosgw-admin level, you should be able to do it with
> bi-list/bi-get/bi-put. The downside here is that I don't think the BI ops are
> exposed in the HTTP Admin API, so it's going to be really expensive to chown
> lots of objects.
>
> Using a quick example:
> # radosgw-admin \
>     --uid UID-CENSORED \
>     --bucket BUCKET-CENSORED \
>     bi get \
>     --object=OBJECTNAME-CENSORED
> {
>     "type": "plain",
>     "idx": "OBJECTNAME-CENSORED",
>     "entry": {
>         "name": "OBJECTNAME-CENSORED",
>         "instance": "",
>         "ver": {
>             "pool": 5,
>             "epoch": 266028
>         },
>         "locator": "",
>         "exists": "true",
>         "meta": {
>             "category": 1,
>             "size": 1066,
>             "mtime": "2016-11-17 17:01:29.668746Z",
>             "etag": "e7a75c39df3d123c716d5351059ad2d9",
>             "owner": "UID-CENSORED",
>             "owner_display_name": "UID-CENSORED",
>             "content_type": "image/png",
>             "accounted_size": 1066,
>             "user_data": ""
>         },
>         "tag": "default.293024600.1188196",
>         "flags": 0,
>         "pending_map": [],
>         "versioned_epoch": 0
>     }
> }
>
> --
> Robin Hugh Johnson
> Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
> E-Mail   : robb...@gentoo.org
> GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
> GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
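[Editor's note: building on Robin's example, an untested sketch of the bi get / edit / bi put round-trip for a single object. Bucket, object, and user names are placeholders, the jq edit is a hypothetical helper step, and the bucket index should not be rewritten while clients are actively writing. As noted above, this does not reconcile quota stats.]

```shell
# Dump the bucket index entry for one object.
radosgw-admin bi get --bucket=mybucket --object=myobject > entry.json

# Rewrite the owner fields to the new user (jq is assumed to be installed).
jq '.entry.meta.owner = "newuser" | .entry.meta.owner_display_name = "newuser"' \
    entry.json > entry-new.json

# Write the modified entry back into the bucket index.
radosgw-admin bi put --bucket=mybucket --object=myobject --infile=entry-new.json
```
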
[ceph-users] change radosgw object owner
Hi all,

We are trying to move a bucket in radosgw from one user to another in an effort to both change ownership and attribute the storage usage of the data to the receiving user's quota.

I have unlinked the bucket and linked it to the new user using:

radosgw-admin bucket unlink --bucket=$MYBUCKET --uid=$USER
radosgw-admin bucket link --bucket=$MYBUCKET --bucket-id=$BUCKET_ID --uid=$NEWUSER

However, perhaps as expected, the owner of all the objects in the bucket remains $USER. I don't believe changing the owner is a supported operation from the S3 protocol; however, it would be very helpful to have the ability to do this on the radosgw backend. This is especially useful for large buckets/datasets where copying the objects out and into radosgw could be time consuming.

Is this something that is currently possible within radosgw? We are running Ceph 12.2.2.

Thanks,
Ryan Leimenstoll
rleim...@umiacs.umd.edu
University of Maryland Institute for Advanced Computer Studies
Re: [ceph-users] rgw resharding operation seemingly won't end
Thanks for the response Yehuda.

Status:

[root@objproxy02 UMobjstore]# radosgw-admin reshard status --bucket=$bucket_name
[
    {
        "reshard_status": 1,
        "new_bucket_instance_id": "8b980d5b-23de-41f9-8b14-84a5bbc3f1c9.47370206.1",
        "num_shards": 4
    }
]

I cleared the flag using the bucket check --fix command and will keep an eye on that tracker issue. Do you have any insight into why the RGWs ultimately paused/reloaded and failed to come back? I am happy to provide more information that could assist. At the moment we are somewhat nervous to reenable dynamic sharding as it seems to have contributed to this problem.

Thanks,
Ryan

> On Oct 9, 2017, at 5:26 PM, Yehuda Sadeh-Weinraub <yeh...@redhat.com> wrote:
>
> On Mon, Oct 9, 2017 at 1:59 PM, Ryan Leimenstoll
> <rleim...@umiacs.umd.edu> wrote:
>> Hi all,
>>
>> We recently upgraded to Ceph 12.2.1 (Luminous) from 12.2.0; however, we are now
>> seeing issues running radosgw. Specifically, it appears an automatically
>> triggered resharding operation won't end, despite the jobs being cancelled
>> (radosgw-admin reshard cancel). I have also disabled dynamic sharding for
>> the time being in the ceph.conf.
>>
>> [root@objproxy02 ~]# radosgw-admin reshard list
>> []
>>
>> The two buckets were also reported in the `radosgw-admin reshard list`
>> before our RGW frontends paused recently (and only came back after a service
>> restart). These two buckets cannot currently be written to at this point
>> either.
>>
>> 2017-10-06 22:41:19.547260 7f90506e9700  0 block_while_resharding ERROR: bucket is still resharding, please retry
>> 2017-10-06 22:41:19.547411 7f90506e9700  0 WARNING: set_req_state_err err_no=2300 resorting to 500
>> 2017-10-06 22:41:19.547729 7f90506e9700  0 ERROR: RESTFUL_IO(s)->complete_header() returned err=Input/output error
>> 2017-10-06 22:41:19.548570 7f90506e9700  1 == req done req=0x7f90506e3180 op status=-2300 http_status=500 ==
>> 2017-10-06 22:41:19.548790 7f90506e9700  1 civetweb: 0x55766d111000: $MY_IP_HERE$ - - [06/Oct/2017:22:33:47 -0400] "PUT /$REDACTED_BUCKET_NAME$/$REDACTED_KEY_NAME$ HTTP/1.1" 1 0 - Boto3/1.4.7 Python/2.7.12 Linux/4.9.43-17.39.amzn1.x86_64 exec-env/AWS_Lambda_python2.7 Botocore/1.7.2 Resource
>> [.. slightly later in the logs ..]
>> 2017-10-06 22:41:53.516272 7f90406c9700  1 rgw realm reloader: Frontends paused
>> 2017-10-06 22:41:53.528703 7f907893f700  0 ERROR: failed to clone shard, completion_mgr.get_next() returned ret=-125
>> 2017-10-06 22:44:32.049564 7f9074136700  0 ERROR: keystone revocation processing returned error r=-22
>> 2017-10-06 22:59:32.059222 7f9074136700  0 ERROR: keystone revocation processing returned error r=-22
>>
>> Can anyone advise on the best path forward to stop the current sharding
>> states and avoid this moving forward?
>
> What does 'radosgw-admin reshard status --bucket=<bucket>' return?
> I think just manually resharding the buckets should clear this flag;
> is that not an option?
> Manual reshard: radosgw-admin bucket reshard --bucket=<bucket> --num-shards=<num>
>
> Also, 'radosgw-admin bucket check --fix' might clear that flag.
>
> For some reason it seems that the reshard cancellation code is not
> clearing that flag on the bucket index header (pretty sure it used to
> do it at one point). I'll open a tracker ticket.
>
> Thanks,
> Yehuda
>
>> Some other details:
>> - 3 rgw instances
>> - Ceph Luminous 12.2.1
>> - 584 active OSDs, rgw bucket index is on Intel NVMe OSDs
>>
>> Thanks,
>> Ryan Leimenstoll
>> rleim...@umiacs.umd.edu
>> University of Maryland Institute for Advanced Computer Studies
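[Editor's note: consolidating the advice in this thread into one sequence, for readers landing here with the same stuck-reshard symptom. Bucket name and shard count are placeholders.]

```shell
# Inspect the stuck reshard state (reshard_status 1 means "in progress").
radosgw-admin reshard status --bucket=mybucket

# Option 1: finish the job by manually resharding to a new shard count.
radosgw-admin bucket reshard --bucket=mybucket --num-shards=32

# Option 2: clear the stale resharding flag on the bucket index header.
radosgw-admin bucket check --fix --bucket=mybucket
```
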
[ceph-users] rgw resharding operation seemingly won't end
Hi all, We recently upgraded to Ceph 12.2.1 (Luminous) from 12.2.0 however are now seeing issues running radosgw. Specifically, it appears an automatically triggered resharding operation won’t end, despite the jobs being cancelled (radosgw-admin reshard cancel). I have also disabled dynamic sharding for the time being in the ceph.conf. [root@objproxy02 ~]# radosgw-admin reshard list [] The two buckets were also reported in the `radosgw-admin reshard list` before our RGW frontends paused recently (and only came back after a service restart). These two buckets cannot currently be written to at this point either. 2017-10-06 22:41:19.547260 7f90506e9700 0 block_while_resharding ERROR: bucket is still resharding, please retry 2017-10-06 22:41:19.547411 7f90506e9700 0 WARNING: set_req_state_err err_no=2300 resorting to 500 2017-10-06 22:41:19.547729 7f90506e9700 0 ERROR: RESTFUL_IO(s)->complete_header() returned err=Input/output error 2017-10-06 22:41:19.548570 7f90506e9700 1 == req done req=0x7f90506e3180 op status=-2300 http_status=500 == 2017-10-06 22:41:19.548790 7f90506e9700 1 civetweb: 0x55766d111000: $MY_IP_HERE$ - - [06/Oct/2017:22:33:47 -0400] "PUT / $REDACTED_BUCKET_NAME$/$REDACTED_KEY_NAME$ HTTP/1.1" 1 0 - Boto3/1.4.7 Python/2.7.12 Linux/4.9.43-17.3 9.amzn1.x86_64 exec-env/AWS_Lambda_python2.7 Botocore/1.7.2 Resource [.. slightly later in the logs..] 2017-10-06 22:41:53.516272 7f90406c9700 1 rgw realm reloader: Frontends paused 2017-10-06 22:41:53.528703 7f907893f700 0 ERROR: failed to clone shard, completion_mgr.get_next() returned ret=-125 2017-10-06 22:44:32.049564 7f9074136700 0 ERROR: keystone revocation processing returned error r=-22 2017-10-06 22:59:32.059222 7f9074136700 0 ERROR: keystone revocation processing returned error r=-22 Can anyone advise on the best path forward to stop the current sharding states and avoid this moving forward? 
Some other details:
- 3 rgw instances
- Ceph Luminous 12.2.1
- 584 active OSDs, rgw bucket index is on Intel NVMe OSDs

Thanks,
Ryan Leimenstoll
rleim...@umiacs.umd.edu
University of Maryland Institute for Advanced Computer Studies
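[Not from the thread, but for anyone landing here in the same state: a sketch of the relevant radosgw-admin subcommands, assuming a Luminous-era radosgw-admin and a hypothetical bucket name `mybucket`:]

```shell
# List any resharding operations still queued or running
radosgw-admin reshard list

# Check the resharding status recorded on a specific bucket
radosgw-admin reshard status --bucket=mybucket

# Cancel a pending reshard for that bucket
radosgw-admin reshard cancel --bucket=mybucket

# Sanity-check the bucket index afterwards
radosgw-admin bucket check --bucket=mybucket
```

[These require a reachable cluster, so treat this as a crib sheet rather than something to paste blindly.]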
[ceph-users] Luminous RGW dynamic sharding
Hi all,

I noticed Luminous now has dynamic sharding for RGW bucket indices as a production option. Does anyone know of any potential caveats or issues we should be aware of before enabling this? Beyond the Luminous v12.2.0 release notes and a few mailing list entries from during the release candidate phase, I haven't seen much mention of it. For some time now we have been experiencing blocked requests when deep scrubbing PGs in our bucket index, so this could be quite useful for us.

Thanks,
Ryan Leimenstoll
rleim...@umiacs.umd.edu
University of Maryland Institute for Advanced Computer Studies
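[For reference, a hedged ceph.conf fragment showing the Luminous options that control dynamic resharding; the section name and values are illustrative, not recommendations:]

```ini
[client.rgw.objproxy01]
# Turn automatic (dynamic) bucket index resharding on or off
rgw dynamic resharding = true
# Objects-per-shard threshold that triggers a reshard
rgw max objs per shard = 100000
# How often the resharding thread scans for oversized buckets (seconds)
rgw reshard thread interval = 600
```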
[ceph-users] RGW Multisite Sync Memory Usage
Hi all,

We are currently trying to migrate our RGW object storage service from one zone to another (in the same zonegroup), in part to make use of erasure-coded data pools. That being said, the rgw daemon is reliably getting OOM-killed on the rgw origin host serving the original zone (and thus the current production data) as a result of high rgw memory usage. We are willing to consider more memory for the rgw daemon's hosts to solve this problem, but were wondering what would be expected memory-wise (at least as a rule of thumb). I noticed there were a few memory-related rgw sync fixes in 10.2.9, but so far upgrading hasn't seemed to prevent the crashing.

Some details about our cluster:
- Ceph version: 10.2.9
- OS: RHEL 7.3
- 584 OSDs
- Serving RBD, CephFS, and RGW
- RGW origin hosts: virtualized via KVM/QEMU, RHEL 7.3; 32GB memory; 12 virtual cores (hypervisor processors: Intel E5-2630)

First zone data and index pools:

pool name           KB  objects  clones  degraded  unfound  rd  rd KB  wr  wr KB
.rgw.buckets        112190858231 3423974600 0 2713542251 265848150719475841837 153970795085
.rgw.buckets.index  0 497200 0 3721485483 5926323574 360300980

Thanks,
Ryan Leimenstoll
University of Maryland Institute for Advanced Computer Studies
Re: [ceph-users] ceph-dbg package for Xenial (ubuntu-16.04.x) broken
Inspecting the ceph-dbg packages under http://download.ceph.com/debian-jewel/pool/main/c/ceph/ it looks like this is an ongoing issue and not specific to just 10.2.2. Specifically, there are only 2 ceph-dbg package versions:

ceph-dbg_10.0.2-1trusty_amd64.deb
ceph-dbg_10.0.2-1~bpo80+1_amd64.deb

There aren't even 10.0.2 'ceph' packages there, only 10.1.x and 10.2.x versions of the actual binaries. So it seems that there are literally no debug packages available for any of the Debian-based Jewel releases. This seems like a systemic issue. I've created an issue on the tracker: http://tracker.ceph.com/issues/16912

On Wed, Aug 3, 2016 at 1:30 PM, Ken Dreyer <kdre...@redhat.com> wrote:
> For some reason, during the v10.2.2 release,
> ceph-dbg_10.2.2-1xenial_amd64.deb did not get transferred to
> http://download.ceph.com/debian-jewel/pool/main/c/ceph/
>
> - Ken
>
> On Wed, Aug 3, 2016 at 12:27 PM, J. Ryan Earl <o...@jryanearl.us> wrote:
> > Hello,
> >
> > New to the list. I'm working on performance tuning and testing a new Ceph
> > cluster built on Ubuntu 16.04 LTS and the newest "Jewel" Ceph release. I'm in
> > the process of collecting stack frames as part of a profiling inspection
> > using FlameGraph (https://github.com/brendangregg/FlameGraph) to inspect
> > where the CPU is spending time, but need to load the 'dbg' packages to get
> > symbol information. However, it appears the 'ceph-dbg' package has broken
> > dependencies:
> >
> > ceph1.oak:/etc/apt# apt-get install ceph-dbg
> > Reading package lists... Done
> > Building dependency tree
> > Reading state information... Done
> > Some packages could not be installed.
> > This may mean that you have requested an impossible situation or if you are
> > using the unstable distribution that some required packages have not yet
> > been created or been moved out of Incoming. The following information may
> > help to resolve the situation:
> >
> > The following packages have unmet dependencies:
> >  ceph-dbg : Depends: ceph (= 10.2.2-0ubuntu0.16.04.2) but 10.2.2-1xenial is to be installed
> > E: Unable to correct problems, you have held broken packages.
> >
> > Any ideas on how to quickly work around this issue so I can continue
> > performance profiling?
> >
> > Thank you,
> > -JR
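[One hedged workaround, not from the thread: the unmet dependency above is apt preferring the Ubuntu-archive `ceph` build over the download.ceph.com one, so pinning both packages to the same upstream version string may resolve it; the version shown is an example:]

```shell
# See which repos provide which candidate versions
apt-cache policy ceph ceph-dbg

# Force both packages to the same upstream build
# (version string is illustrative; use whatever apt-cache policy reports)
apt-get install ceph=10.2.2-1xenial ceph-dbg=10.2.2-1xenial
```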
[ceph-users] ceph-dbg package for Xenial (ubuntu-16.04.x) broken
Hello,

New to the list. I'm working on performance tuning and testing a new Ceph cluster built on Ubuntu 16.04 LTS and the newest "Jewel" Ceph release. I'm in the process of collecting stack frames as part of a profiling inspection using FlameGraph (https://github.com/brendangregg/FlameGraph) to inspect where the CPU is spending time, but need to load the 'dbg' packages to get symbol information. However, it appears the 'ceph-dbg' package has broken dependencies:

ceph1.oak:/etc/apt# apt-get install ceph-dbg
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have requested an impossible situation or if you are using the unstable distribution that some required packages have not yet been created or been moved out of Incoming. The following information may help to resolve the situation:

The following packages have unmet dependencies:
 ceph-dbg : Depends: ceph (= 10.2.2-0ubuntu0.16.04.2) but 10.2.2-1xenial is to be installed
E: Unable to correct problems, you have held broken packages.

Any ideas on how to quickly work around this issue so I can continue performance profiling?

Thank you,
-JR
Re: [ceph-users] High 0.94.5 OSD memory use at 8GB RAM/TB raw disk during recovery
> On Nov 30, 2015, at 6:52 PM, Laurent GUERBY <laur...@guerby.net> wrote:
>
> Hi,
>
> We lost a disk today in our ceph cluster, so we added a new machine with
> 4 disks to replace the capacity, and we activated the straw1 tunable too
> (we also tried straw2 but we quickly backed out this change).
>
> During recovery OSDs started crashing on all of our machines,
> the issue being OSD RAM usage that goes very high, e.g.:
>
> 24078 root 20 0 27.784g 0.026t 10888 S 5.9 84.9 16:23.63 /usr/bin/ceph-osd --cluster=ceph -i 41 -f
> /dev/sda1 2.7T 2.2T 514G 82% /var/lib/ceph/osd/ceph-41
>
> That's about 8GB resident RAM per TB of disk, way above
> what we provisioned ~ 2-4 GB RAM/TB.

We had something vaguely similar (not nearly that dramatic, though!) happen to us. During a recovery (actually, I think this was rebalancing after upgrading from an earlier version of Ceph), our OSDs took so much memory they would get killed by the oom_killer, and we couldn't keep the cluster up long enough to get back to healthy.

A solution for us was to enable zswap; previously we had been running with no swap at all. If you are running a kernel newer than 3.11 (you might want more recent than that, as I believe there were major fixes after 3.17), then enabling zswap allows the kernel to compress pages in memory before needing to touch disk. The default max pool size for this is 20% of memory. There is extra CPU time to compress/decompress, but it's much faster than going to disk, and the OSD data appears to be quite compressible. For us, nothing actually made it to the disk, but a swapfile must be enabled for zswap to do its work.

https://www.kernel.org/doc/Documentation/vm/zswap.txt
http://askubuntu.com/questions/471912/zram-vs-zswap-vs-zcache-ultimate-guide-when-to-use-which-one

Add "zswap.enabled=1" to your kernel boot parameters and reboot. If you have no swap file/partition/disk/whatever, then you need one for zswap to actually do anything.
Here is an example, but use whatever sizes, locations, and process you prefer:

dd if=/dev/zero of=/var/swap bs=1M count=8192
chmod 600 /var/swap
mkswap /var/swap
swapon /var/swap

Consider adding it to /etc/fstab:

/var/swap  swap  swap  defaults  0 0

This got us through the rebalancing. The OSDs eventually returned to normal, but we've just left zswap enabled with no apparent problems. I don't know that it will be enough for your situation, but it might help.

Ryan
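[To confirm zswap actually took effect after rebooting, the module parameters and, on kernels with debugfs mounted, the runtime stats can be inspected; these sysfs/debugfs paths are the ones described in the zswap.txt document linked above:]

```shell
# Should print Y if zswap.enabled=1 took effect
cat /sys/module/zswap/parameters/enabled

# Pool size cap (percent of RAM) and compressor in use
cat /sys/module/zswap/parameters/max_pool_percent
cat /sys/module/zswap/parameters/compressor

# Runtime statistics (stored_pages, pool_total_size, ...), if debugfs is mounted
grep -r . /sys/kernel/debug/zswap/ 2>/dev/null
```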
Re: [ceph-users] hanging nfsd requests on an RBD to NFS gateway
> On Oct 22, 2015, at 3:57 PM, John-Paul Robinson <j...@uab.edu> wrote:
>
> Hi,
>
> Has anyone else experienced a problem with RBD-to-NFS gateways blocking
> nfsd server requests when their ceph cluster has a placement group that
> is not servicing I/O for some reason, e.g. too few replicas or an osd
> with slow request warnings?

We have experienced exactly that kind of problem, except that it sometimes happens even when ceph health reports "HEALTH_OK". This has been incredibly vexing for us.

If the cluster is unhealthy for some reason, then I'd expect your/our symptoms, as writes can't be completed. I'm guessing that you have file systems with barriers turned on. Whichever file system has a barrier write stuck on the problem pg will cause any other process trying to write anywhere in that FS also to block. This likely means a cascade of nfsd processes will block as they each try to service various client writes to that FS. Even though, theoretically, the rest of the "disk" (rbd) and other file systems might still be writable, the NFS processes will still be in uninterruptible sleep just because of that stuck write request (or such is my understanding).

Disabling barriers on the gateway machine might postpone the problem (never tried it and don't want to) until you hit your vm.dirty_bytes or vm.dirty_ratio thresholds, but it is dangerous, as you could much more easily lose data. You'd be better off solving the underlying issues when they happen (too few replicas available or overloaded osds).

For us, even when the cluster reports itself as healthy, we sometimes have this problem. All nfsd processes block. sync blocks. echo 3 > /proc/sys/vm/drop_caches blocks. There is a persistent 4-8MB "Dirty" in /proc/meminfo. None of the osds log slow requests. Everything seems fine on the osds and mons. Neither CPU nor I/O load is extraordinary on the ceph nodes, but at least one file system on the gateway machine will stop accepting writes.
If we just wait, the situation resolves itself in 10 to 30 minutes. A forced reboot of the NFS gateway "solves" the performance problem, but is annoying and dangerous (we unmount all of the file systems that can still be unmounted, but the stuck ones lead us to a sysrq-b).

This is on Scientific Linux 6.7 systems with elrepo 4.1.10 kernels running Ceph Firefly (0.80.10) and XFS file systems exported over NFS and Samba.

Ryan
Re: [ceph-users] hanging nfsd requests on an RBD to NFS gateway
> On Oct 22, 2015, at 10:19 PM, John-Paul Robinson <j...@uab.edu> wrote:
>
> A few clarifications on our experience:
>
> * We have 200+ rbd images mounted on our RBD-NFS gateway. (There's
> nothing easier for a user to understand than "your disk is full".)

Same here, and agreed. It sounds like our situations are similar except for my blocking-on-an-apparently-healthy-cluster issue.

> * I'd expect more contention potential with a single shared RBD back
> end, but with many distinct and presumably isolated backend RBD images,
> I've always been surprised that *all* the nfsd tasks hang. This leads me
> to think it's an nfsd issue rather than an rbd issue. (I realize this
> is an rbd list, looking for shared experience. ;) )

It's definitely possible. I've experienced exactly the behavior you're seeing. My guess is that when an nfsd thread blocks and goes dark, affected clients (even if it's only one) will retransmit their requests thinking there's a network issue, causing more nfsds to go dark until all the server threads are stuck (that could be hogwash, but it fits the behavior). Or perhaps there are enough individual clients writing to the affected NFS volume that they consume all the available nfsd threads (I'm not sure about your client-to-FS and nfsd thread ratio, but that is plausible in my situation). I think some testing with xfs_freeze and non-critical nfs servers/clients is called for. I don't think this part is related to ceph except that it happens to be providing the underlying storage. I'm fairly certain that my problem with an apparently healthy cluster blocking writes is a ceph problem, but I haven't figured out what the source of that is.

> * I haven't seen any difference between reads and writes. Any access to
> any backing RBD store from the NFS client hangs.

All NFS clients are hung, but in my situation, it's usually only 1-3 local file systems that stop accepting writes.
NFS is completely unresponsive, but local and remote Samba operations on the unaffected file systems are totally happy. I don't have a solution to the NFS issue, but I've seen it all too often. I wonder whether setting a huge number of threads and/or playing with client retransmit times would help, but I suspect this problem is just intrinsic to Linux NFS servers.

Ryan
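[The xfs_freeze experiment suggested above might look like this on a non-critical export; the mount point is hypothetical, and xfs_freeze requires root:]

```shell
# Suspend writes to one exported filesystem; NFS writes to it should block
xfs_freeze -f /srv/nfs/testvol

# While frozen, observe: do other exports stay responsive, or do all
# nfsd threads on the gateway eventually wedge?

# Thaw and confirm blocked client writes complete
xfs_freeze -u /srv/nfs/testvol
```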
[ceph-users] getting started
Hi,

I'm brand new to Ceph, attempting to follow the Getting Started guide (http://ceph.com/docs/master/start/) with 2 VMs. I completed the Preflight without issue. I completed the Storage Cluster Quick Start (http://ceph.com/docs/master/start/quick-ceph-deploy/), but have some questions:

- The *Single Node Quick Start* grey box -- does 'single node' mean if you're running the whole thing on a single machine, if you have only one server node like the diagram at the top of the page, or if you're only running one OSD process? I'm not sure if I need to make the `osd crush chooseleaf type` change.
- Are the LIST, ZAP, and ADD OSDS ON STANDALONE DISKS sections an alternative to the MULTIPLE OSDS ON THE OS DISK (DEMO ONLY) section? I thought I set up my OSDs already on /tmp/osd{0,1}.
- Moving on to the Block Device Quick Start (http://ceph.com/docs/master/start/quick-rbd/) -- it says "To use this guide, you must have executed the procedures in the Object Store Quick Start guide first" -- but the link to the Object Store Quick Start actually points to the Storage Cluster Quick Start (http://ceph.com/docs/master/start/quick-ceph-deploy/) -- which is it?
- Most importantly, it says "Ensure your Ceph Storage Cluster is in an active + clean state before working with the Ceph Block Device" -- how can I tell if my cluster is active+clean? The only ceph* command on the admin node is ceph-deploy, and running `ceph` on the server node:

ceph@jr-ceph2:~$ ceph
2013-09-16 16:53:10.880267 7feb96c1b700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
2013-09-16 16:53:10.880271 7feb96c1b700 0 librados: client.admin initialization error (2) No such file or directory
Error connecting to cluster: ObjectNotFound

Thanks in advance for any help, and apologies if I missed anything obvious.
Re: [ceph-users] getting started
Thanks, running as root does give me status, but not clean:

r...@jr-ceph2.vm:~# ceph status
  cluster 9059dfad-924a-425c-a20b-17dc1d53111e
   health HEALTH_WARN 91 pgs degraded; 192 pgs stuck unclean; recovery 21/42 degraded (50.000%)
   monmap e1: 1 mons at {jr-ceph2=10.88.26.55:6789/0}, election epoch 2, quorum 0 jr-ceph2
   osdmap e10: 2 osds: 2 up, 2 in
   pgmap v2715: 192 pgs: 101 active+remapped, 91 active+degraded; 9518 bytes data, 9148 MB used, 362 GB / 391 GB avail; 21/42 degraded (50.000%)
   mdsmap e4: 1/1/1 up {0=jr-ceph2.XXX=up:active}

I don't see anything telling in the ceph logs. Should I wait for the new quickstart?

On Mon, Sep 16, 2013 at 2:27 PM, John Wilkins <john.wilk...@inktank.com> wrote:

We will have a new update to the quick start this week.

On Mon, Sep 16, 2013 at 12:18 PM, Alfredo Deza <alfredo.d...@inktank.com> wrote:

On Mon, Sep 16, 2013 at 12:58 PM, Justin Ryan <justin.r...@kixeye.com> wrote:

Hi, I'm brand new to Ceph, attempting to follow the Getting Started guide with 2 VMs. I completed the Preflight without issue. I completed the Storage Cluster Quick Start, but have some questions:

The Single Node Quick Start grey box -- does 'single node' mean if you're running the whole thing on a single machine, if you have only one server node like the diagram at the top of the page, or if you're only running one OSD process? I'm not sure if I need to make the `osd crush chooseleaf type` change.

Are the LIST, ZAP, and ADD OSDS ON STANDALONE DISKS sections an alternative to the MULTIPLE OSDS ON THE OS DISK (DEMO ONLY) section? I thought I set up my OSDs already on /tmp/osd{0,1}.

Moving on to the Block Device Quick Start -- it says To use this guide, you must have executed the procedures in the Object Store Quick Start guide first -- but the link to the Object Store Quick Start actually points to the Storage Cluster Quick Start -- which is it?
Most importantly, it says Ensure your Ceph Storage Cluster is in an active + clean state before working with the Ceph Block Device --- how can I tell if my cluster is active+clean? The only ceph* command on the admin node is ceph-deploy, and running `ceph` on the server node:

ceph@jr-ceph2:~$ ceph
2013-09-16 16:53:10.880267 7feb96c1b700 -1 monclient(hunting): ERROR: missing keyring, cannot use cephx for authentication
2013-09-16 16:53:10.880271 7feb96c1b700 0 librados: client.admin initialization error (2) No such file or directory
Error connecting to cluster: ObjectNotFound

There is a ticket open for this, but you basically need super-user permissions here to run (any?) ceph commands.

Thanks in advance for any help, and apologies if I missed anything obvious.

--
John Wilkins
Senior Technical Writer
Inktank
john.wilk...@inktank.com
(415) 425-9599
http://inktank.com
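[For the record, checking for active+clean, and the quickstart's own suggested remedy for a single-host two-OSD cluster that stays degraded, might look like this; run from a node holding the admin keyring, as root or via sudo as noted above:]

```shell
# Overall health; ideally HEALTH_OK with all pgs active+clean
sudo ceph health
sudo ceph status

# On a single host, the default CRUSH rule replicates across hosts,
# so pgs stay degraded. The quickstart's suggested ceph.conf setting:
#   [global]
#   osd crush chooseleaf type = 0
```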
[ceph-users] Format 2 Image support in the RBD driver
I've not been following the list for long, so forgive me if this has been covered, but is there a plan for format 2 image support in the kernel RBD driver? I assume, with Linux 3.9 in the RC phase, it's not likely to appear there?

Thanks!

NOTICE: Protect the information in this message in accordance with the company's security policies. If you received this message in error, immediately notify the sender and destroy all copies.
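[For context, not part of the original question: format 2 images were already usable via the userspace librbd path even before krbd could map them. A sketch with hypothetical pool/image names; newer rbd CLIs spell the flag --image-format, older releases used --format:]

```shell
# Create a 1 GB format 2 image in pool "rbd"
rbd create --image-format 2 --size 1024 rbd/myimage

# Format 2 is what enables snapshot-based cloning:
rbd snap create rbd/myimage@base
rbd snap protect rbd/myimage@base
rbd clone rbd/myimage@base rbd/myclone
```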