Re: [ceph-users] The OSD can be “down” but still “in”.
Thanks for the reply. If the OSD is the primary for a PG, then all I/O to that PG will stop, which may lead to application failure.

On Tue, Jan 22, 2019 at 5:32 PM Matthew Vernon wrote:
>
> Hi,
>
> On 22/01/2019 10:02, M Ranga Swami Reddy wrote:
> > Hello - If an OSD is shown as down but still "in", what
> > will happen with write/read operations on this down OSD?
>
> It depends ;-)
>
> In a typical 3-way replicated setup with min_size 2, writes to placement
> groups on that OSD will still go ahead - when 2 replicas are written OK,
> the write will complete. Once the OSD comes back up, these writes
> will then be replicated to that OSD. If it stays down for long enough to
> be marked out, then PGs on that OSD will be replicated elsewhere.
>
> If you had min_size 3 as well, then writes would block until the OSD was
> back up (or marked out and the PGs replicated to another OSD).
>
> Regards,
>
> Matthew
>
> --
> The Wellcome Sanger Institute is operated by Genome Research
> Limited, a charity registered in England with number 1021457 and a
> company registered in England with number 2742969, whose registered
> office is 215 Euston Road, London, NW1 2BE.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
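Matthew's min_size rule can be captured as a tiny predicate (an illustrative sketch, not Ceph code; `write_proceeds` is a made-up name):

```python
def write_proceeds(replicas_up, min_size):
    """A PG keeps accepting writes while at least min_size replicas
    (counting the acting primary) are up; below that, I/O blocks."""
    return replicas_up >= min_size


# size=3, min_size=2: one OSD down, writes still complete
assert write_proceeds(replicas_up=2, min_size=2)
# size=3, min_size=3: any OSD down blocks writes until it is back up or marked out
assert not write_proceeds(replicas_up=2, min_size=3)
```

A pool's current min_size can be checked with `ceph osd pool get <pool> min_size`.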
Re: [ceph-users] MDS performance issue
On Wed, Jan 23, 2019 at 10:02 AM Albert Yue wrote:
>
> But with enough memory on the MDS, I can just cache all metadata in memory.
> Right now there are around 500GB of metadata on the ssd. So this is not enough?
>

The MDS needs to track lots of extra information for each object. For 500GB
of metadata, the MDS may need 1TB or more of memory.

> On Tue, Jan 22, 2019 at 5:48 PM Yan, Zheng wrote:
>>
>> On Tue, Jan 22, 2019 at 10:49 AM Albert Yue wrote:
>> >
>> > Hi Yan Zheng,
>> >
>> > In your opinion, can we resolve this issue by moving the MDS to a 512GB
>> > or 1TB memory machine?
>> >
>>
>> The problem is on the client side, especially clients with large memory.
>> I don't think enlarging the mds cache size is a good idea. You can
>> periodically check each kernel client's /sys/kernel/debug/ceph/xxx/caps
>> and run 'echo 2 > /proc/sys/vm/drop_caches' if a client uses too many
>> caps (for example 10k).
>>
>> > On Mon, Jan 21, 2019 at 10:49 PM Yan, Zheng wrote:
>> >>
>> >> On Mon, Jan 21, 2019 at 11:16 AM Albert Yue wrote:
>> >> >
>> >> > Dear Ceph Users,
>> >> >
>> >> > We have set up a cephFS cluster with 6 osd machines, each with 16 8TB
>> >> > harddisks. Ceph version is luminous 12.2.5. We created one data pool
>> >> > with these hard disks and another metadata pool with 3 ssds. We
>> >> > created an MDS with 65GB cache size.
>> >> >
>> >> > But our users keep complaining that cephFS is too slow. What we
>> >> > observed is that cephFS is fast when we switch to a new MDS instance;
>> >> > once the cache fills up (which happens very fast), clients become very
>> >> > slow when performing some basic filesystem operations such as `ls`.
>> >> >
>> >>
>> >> It seems that clients hold lots of unused inodes in their icache, which
>> >> prevents the mds from trimming the corresponding objects from its cache.
>> >> mimic has the command "ceph daemon mds.x cache drop" to ask a client to
>> >> drop its cache. I'm also working on a patch that makes the kernel client
>> >> release unused inodes.
>> >>
>> >> For luminous, there is not much we can do, except periodically run
>> >> "echo 2 > /proc/sys/vm/drop_caches" on each client.
>> >>
>> >> > What we know is our users are putting lots of small files into
>> >> > cephFS; now there are around 560 million files. We didn't see high CPU
>> >> > wait on the MDS instance, and the metadata pool used just around 200MB
>> >> > of space.
>> >> >
>> >> > My question is, what is the relationship between the metadata pool and
>> >> > the MDS? Is this performance issue caused by the hardware behind the
>> >> > metadata pool? If the metadata pool only used 200MB of space, and we saw
>> >> > 3k iops on each of these three ssds, why can't the MDS cache all these
>> >> > 200MB in memory?
>> >> >
>> >> > Thanks very much!
>> >> >
>> >> > Best Regards,
>> >> >
>> >> > Albert
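Yan's suggestion (periodically check each kernel client's caps count and drop caches when it gets large) can be scripted. A rough sketch; the exact layout of the caps debug file varies by kernel version, so the `used <N>` line format parsed here is an assumption to verify against your own clients' /sys/kernel/debug/ceph/xxx/caps files:

```python
def caps_in_use(caps_text):
    """Pull the 'used' count out of the text of a kernel client's caps
    debug file. Assumes lines of the form 'used <N>' (an assumption
    about luminous-era kernels); returns 0 if no such line is found."""
    for line in caps_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "used":
            return int(fields[1])
    return 0


def should_drop_caches(caps_text, threshold=10000):
    """True when the client holds more caps than the ~10k threshold Yan
    mentions, i.e. when 'echo 2 > /proc/sys/vm/drop_caches' is worth
    running on that client."""
    return caps_in_use(caps_text) > threshold
```

On a real client this would read the actual caps file and, when it returns True, write '2' to /proc/sys/vm/drop_caches as root.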
Re: [ceph-users] Broken CephFS stray entries?
On Tue, Jan 22, 2019 at 10:42 PM Dan van der Ster wrote:
>
> On Tue, Jan 22, 2019 at 3:33 PM Yan, Zheng wrote:
> >
> > On Tue, Jan 22, 2019 at 9:08 PM Dan van der Ster wrote:
> > >
> > > Hi Zheng,
> > >
> > > We also just saw this today and got a bit worried.
> > > Should we change to:
> > >
> >
> > What is the error message (on a stray dir or another dir)? Does the
> > cluster ever enable multi-active mds?
> >
>
> It was during an upgrade from v12.2.8 to v12.2.10, with 5 active MDS's
> during the upgrade.
>
> 2019-01-22 10:08:22.629545 mds.p01001532184554 mds.2
> 128.142.39.144:6800/268398 36 : cluster [WRN] replayed op
> client.54045065:2282648,2282514 used ino 0x3001c85b193 but session
> next is 0x3001c28f018
> 2019-01-22 10:08:22.629617 mds.p01001532184554 mds.2
> 128.142.39.144:6800/268398 37 : cluster [WRN] replayed op
> client.54045065:2282649,2282514 used ino 0x3001c85b194 but session
> next is 0x3001c28f018
> 2019-01-22 10:08:22.629652 mds.p01001532184554 mds.2
> 128.142.39.144:6800/268398 38 : cluster [WRN] replayed op
> client.54045065:2282650,2282514 used ino 0x3001c85b195 but session
> next is 0x3001c28f018
> 2019-01-22 10:08:37.373704 mon.cephflax-mon-9b406e0261 mon.0
> 137.138.121.135:6789/0 2748 : cluster [INF] daemon mds.p01001532184554
> is now active in filesystem cephfs as rank 2
> 2019-01-22 10:08:37.805675 mon.cephflax-mon-9b406e0261 mon.0
> 137.138.121.135:6789/0 2749 : cluster [INF] Health check cleared:
> FS_DEGRADED (was: 1 filesystem is degraded)
> 2019-01-22 10:08:39.784260 mds.p01001532184554 mds.2
> 128.142.39.144:6800/268398 547 : cluster [ERR] bad/negative dir
> size on 0x61b f(v27 m2019-01-22 10:07:38.509466 0=-1+1)
> 2019-01-22 10:08:39.784271 mds.p01001532184554 mds.2
> 128.142.39.144:6800/268398 548 : cluster [ERR] unmatched fragstat
> on 0x61b, inode has f(v28 m2019-01-22 10:07:38.509466 0=-1+1),
> dirfrags have f(v0 m2019-01-22 10:07:38.509466 1=0+1)

An incorrect fragstat on a stray dir is not a big deal; the mds uses it
only for printing debug/warning messages. But an incorrect fragstat on
another dir may need manual intervention, so I'd prefer not to change it
to a 'warning' message.

Regards
Yan, Zheng

> 2019-01-22 10:10:02.605036 mon.cephflax-mon-9b406e0261 mon.0
> 137.138.121.135:6789/0 2803 : cluster [INF] Health check cleared:
> MDS_INSUFFICIENT_STANDBY (was: insufficient standby MDS daemons
> available)
> 2019-01-22 10:10:02.605089 mon.cephflax-mon-9b406e0261 mon.0
> 137.138.121.135:6789/0 2804 : cluster [INF] Cluster is now healthy
>
> > > diff --git a/src/mds/CInode.cc b/src/mds/CInode.cc
> > > index e8c1bc8bc1..e2539390fb 100644
> > > --- a/src/mds/CInode.cc
> > > +++ b/src/mds/CInode.cc
> > > @@ -2040,7 +2040,7 @@ void CInode::finish_scatter_gather_update(int type)
> > >
> > >    if (pf->fragstat.nfiles < 0 ||
> > >        pf->fragstat.nsubdirs < 0) {
> > > -    clog->error() << "bad/negative dir size on "
> > > +    clog->warn() << "bad/negative dir size on "
> > >                  << dir->dirfrag() << " " << pf->fragstat;
> > >      assert(!"bad/negative fragstat" == g_conf->mds_verify_scatter);
> > >
> > > @@ -2077,7 +2077,7 @@ void CInode::finish_scatter_gather_update(int type)
> > >      if (state_test(CInode::STATE_REPAIRSTATS)) {
> > >        dout(20) << " dirstat mismatch, fixing" << dendl;
> > >      } else {
> > > -      clog->error() << "unmatched fragstat on " << ino() << ", inode has "
> > > +      clog->warn() << "unmatched fragstat on " << ino() << ", inode has "
> > >                     << pi->dirstat << ", dirfrags have " << dirstat;
> > >        assert(!"unmatched fragstat" == g_conf->mds_verify_scatter);
> > >      }
> > >
> > >
> > > Cheers, Dan
> > >
> > > On Sat, Oct 20, 2018 at 2:33 AM Yan, Zheng wrote:
> > >>
> > >> no action is required. mds fixes this type of error automatically.
> > >> On Fri, Oct 19, 2018 at 6:59 PM Burkhard Linke wrote:
> > >> >
> > >> > Hi,
> > >> >
> > >> > upon failover or restart, our MDS complains that something is wrong with
> > >> > one of the stray directories:
> > >> >
> > >> > 2018-10-19 12:56:06.442151 7fc908e2d700 -1 log_channel(cluster) log
> > >> > [ERR] : bad/negative dir size on 0x607 f(v133 m2018-10-19
> > >> > 12:51:12.016360 -4=-5+1)
> > >> > 2018-10-19 12:56:06.442182 7fc908e2d700 -1 log_channel(cluster) log
> > >> > [ERR] : unmatched fragstat on 0x607, inode has f(v134 m2018-10-19
> > >> > 12:51:12.016360 -4=-5+1), dirfrags have f(v0 m2018-10-19 12:51:12.016360
> > >> > 1=0+1)
> > >> >
> > >> > How do we handle this problem?
> > >> >
> > >> > Regards,
> > >> >
> > >> > Burkhard
Re: [ceph-users] cephfs performance degraded very fast
On Tue, Jan 22, 2019 at 8:24 PM renjianxinlover wrote:
>
> hi,
>    at times, under cache pressure or caps release failure, client apps' mounts get stuck.
>    my use case is a kubernetes cluster with automatic kernel client mounts on the nodes.
>    is anyone facing the same issue, or does anyone have a solution?
> Brs
>

If you mean "client.xxx failing to respond to capability release", you'd
better make sure all clients are up to date (newest version of ceph-fuse,
recent kernel).
Re: [ceph-users] Process stuck in D+ on cephfs mount
On Wed, Jan 23, 2019 at 5:50 AM Marc Roos wrote:
>
>
> I got one again
>
> [] wait_on_page_bit_killable+0x83/0xa0
> [] __lock_page_or_retry+0xb2/0xc0
> [] filemap_fault+0x3b7/0x410
> [] ceph_filemap_fault+0x13c/0x310 [ceph]
> [] __do_fault+0x4c/0xc0
> [] do_read_fault.isra.42+0x43/0x130
> [] handle_mm_fault+0x6b1/0x1040
> [] __do_page_fault+0x154/0x450
> [] do_page_fault+0x35/0x90
> [] page_fault+0x28/0x30
> [] 0x
>

This is likely caused by a hung osd request. Is your cluster healthy?

> > check /proc//stack to find where it is stuck
> >
> >>
> >> I have a process stuck in D+ writing to a cephfs kernel mount. Can
> >> anything be done about this? (without rebooting)
> >>
> >> CentOS Linux release 7.5.1804 (Core)
> >> Linux 3.10.0-514.21.2.el7.x86_64
> >>
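Before digging into kernel stacks, it helps to enumerate which processes are actually stuck in uninterruptible sleep. A small sketch using standard /proc/<pid>/stat parsing (nothing cephfs-specific; the function names are made up):

```python
import glob


def task_state(stat_line):
    """Extract the one-letter state field from a /proc/<pid>/stat line.

    The comm field is parenthesised and may contain spaces, so split on
    the last ')' rather than naively on whitespace.
    """
    return stat_line.rpartition(')')[2].split()[0]


def d_state_pids():
    """Return pids currently in uninterruptible sleep (state 'D')."""
    pids = []
    for path in glob.glob('/proc/[0-9]*/stat'):
        try:
            with open(path) as f:
                if task_state(f.read()) == 'D':
                    pids.append(int(path.split('/')[2]))
        except OSError:
            continue  # the process exited while we were scanning
    return pids
```

Each returned pid is then a candidate for the /proc stack inspection suggested above.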
Re: [ceph-users] Using Ceph central backup storage - Best practice creating pools
If you use librados directly, it's up to you to ensure you can identify
your objects. RADOS stores objects, not files, so when you provide your
object ids you need to come up with a convention that lets you correctly
identify them. If you need to provide metadata (i.e. a list of all
existing backups, when they were taken, etc.) then again you need to
manage that yourself (probably in dedicated metadata objects). Using
RADOS namespaces (like one per database) is probably a good idea.

Also keep in mind that, for example, Bluestore has a maximum object size
of 4GB, so mapping files 1:1 to objects is probably not a wise approach;
you should break up your files into smaller chunks when storing them.
There is libradosstriper, which handles the striping of large objects
transparently, but I'm not sure if it has support for RADOS namespaces.
Using RGW instead might be an easier route to go down.

On Wed, 23 Jan 2019 at 10:10, cmonty14 <74cmo...@gmail.com> wrote:
> My backup client is using librados.
> I understand that defining a pool for the same application is recommended.
>
> However this would not answer my other questions:
> How can I identify a backup created by client A that I want to restore
> on another client Z?
> I mean typically client A would write a backup file identified by the
> filename.
> Would it be possible on client Z to identify this backup file by
> filename? If yes, how?
>
> On Tue, 22 Jan 2019 at 15:07, wrote:
> >
> > Hi,
> >
> > Ceph's pools are meant to let you define specific engineering rules
> > and/or applications (rbd, cephfs, rgw).
> > They are not designed to be created in massive numbers (see pgs etc).
> > So, create a pool for each engineering ruleset, and store your data in them.
> > For what is left of your project, I believe you have to implement that
> > on top of Ceph.
> >
> > For instance, let's say you simply create a pool, with an rbd volume in it.
> > You then create a filesystem on that, and map it on some server.
> > Finally, you can push your files to that mountpoint, using various
> > Linux users, acls or whatever: beyond that point, there is nothing more
> > specific to Ceph, it is "just" a mounted filesystem.
> >
> > Regards,
> >
> > On 01/22/2019 02:16 PM, cmonty14 wrote:
> > > Hi,
> > >
> > > my use case for Ceph is providing central backup storage.
> > > This means I will back up multiple databases in the Ceph storage cluster.
> > >
> > > This is my question:
> > > What is the best practice for creating pools & images?
> > > Should I create multiple pools, i.e. one pool per database?
> > > Or should I create a single pool "backup" and use a namespace when
> > > writing data to the pool?
> > >
> > > This is the security demand that should be considered:
> > > DB-owner A can only modify the files that belong to A; other files
> > > (owned by B, C or D) are accessible for A.
> > >
> > > And there's another issue:
> > > How can I identify a backup created by client A that I want to restore
> > > on another client Z?
> > > I mean typically client A would write a backup file identified by the
> > > filename.
> > > Would it be possible on client Z to identify this backup file by
> > > filename? If yes, how?
> > >
> > >
> > > THX
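The two pieces of do-it-yourself bookkeeping described above (a self-describing object-id convention and chunking below BlueStore's 4GB object cap) can be sketched in a few lines. The `client/db/timestamp/chunk-NNNNNN` layout and the 64MB chunk size are illustrative choices, not anything Ceph prescribes:

```python
CHUNK_SIZE = 64 * 1024 * 1024  # example chunk size, well under the 4GB object limit


def backup_object_id(client, database, timestamp, chunk):
    """Encode who/what/when into the RADOS object id so any client
    (A or Z) can later locate the backup by name alone."""
    return f"{client}/{database}/{timestamp}/chunk-{chunk:06d}"


def parse_backup_object_id(object_id):
    """Recover (client, database, timestamp, chunk) from an object id."""
    client, database, timestamp, chunk = object_id.split("/")
    return client, database, timestamp, int(chunk.split("-")[1])


def chunk_layout(total_size, chunk_size=CHUNK_SIZE):
    """Split a backup of total_size bytes into (index, offset, length)
    triples, one per RADOS object to be written."""
    return [(i, off, min(chunk_size, total_size - off))
            for i, off in enumerate(range(0, total_size, chunk_size))]
```

Each triple would then become one object write (e.g. `write_full` via the librados bindings) in the per-database namespace suggested above.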
Re: [ceph-users] read-only mounts of RBD images on multiple nodes for parallel reads
Thanks all for the great advice and input.

Regarding Mykola's suggestion to use read-only snapshots: what is the
overhead of creating these snapshots? I assume these are copy-on-write
snapshots, so there's no extra space consumed except for the metadata?

Thanks,
Shridhar

On Fri, 18 Jan 2019 at 04:10, Ilya Dryomov wrote:
> On Fri, Jan 18, 2019 at 11:25 AM Mykola Golub wrote:
> >
> > On Thu, Jan 17, 2019 at 10:27:20AM -0800, Void Star Nill wrote:
> > > Hi,
> > >
> > > We are trying to use Ceph in our products to address some of our use
> > > cases. We think the Ceph block device fits us well. One of the use cases
> > > is that we have a number of jobs running in containers that need
> > > read-only access to shared data. The data is written once and consumed
> > > multiple times. I have read through some of the similar discussions and
> > > the recommendations on using CephFS for these situations, but in our case
> > > the block device makes more sense as it fits well with other use cases
> > > and restrictions we have around this use case.
> > >
> > > The following scenario seems to work as expected when we tried it on a
> > > test cluster, but we wanted to get an expert opinion to see if there
> > > would be any issues in production. The usage scenario is as follows:
> > >
> > > - A block device is created with the "--image-shared" option:
> > >
> > > rbd create mypool/foo --size 4G --image-shared
> >
> > "--image-shared" just means that the created image will have the
> > "exclusive-lock" feature and all other features that depend on it
> > disabled. It is useful for scenarios where one wants simultaneous write
> > access to the image (e.g. when using a shared-disk cluster fs like
> > ocfs2) and does not want a performance penalty due to "exclusive-lock"
> > being ping-ponged between writers.
> >
> > For your scenario it is not necessary, but it is ok.
> >
> > > - The image is mapped to a host, formatted in ext4 format (or another
> > > file format), mounted to a directory in read/write mode, and data is
> > > written to it. Please note that the image will be mapped in exclusive
> > > write mode -- no other read/write mounts are allowed at this time.
> >
> > The map "exclusive" option works only for images with the
> > "exclusive-lock" feature enabled and in this case prevents automatic
> > exclusive lock transitions (the ping-pong mentioned above) from one
> > writer to another. And in this case it will not prevent mapping and
> > mounting it ro, and probably even rw (I am not familiar enough with
> > the kernel rbd implementation to be sure here), though in the last case
> > the write will fail.
>
> With -o exclusive, in addition to preventing automatic lock
> transitions, the kernel will attempt to acquire the lock at map time
> (i.e. before allowing any I/O) and return an error from "rbd map" in
> case the lock cannot be acquired.
>
> However, the fact the image is mapped -o exclusive on one host doesn't
> mean that it can't be mapped without -o exclusive on another host. If
> you then try to write through the non-exclusive mapping, the write will
> block until the exclusive mapping goes away, resulting in hung tasks in
> uninterruptible sleep state -- a much less pleasant failure mode.
>
> So make sure that all writers use -o exclusive.
>
> Thanks,
>
> Ilya
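Ilya's point about mixing exclusive and non-exclusive mappings is easy to mis-read, so here is a toy model of the behaviour he describes (purely illustrative Python, not kernel rbd code; the class and method names are invented):

```python
class ExclusiveMapModel:
    """Toy model: '-o exclusive' acquires the lock at map time and fails
    if another holder exists; a plain map always succeeds, but writes
    through it block while someone else holds the lock."""

    def __init__(self):
        self.lock_holder = None

    def map(self, host, exclusive=False):
        if exclusive:
            if self.lock_holder not in (None, host):
                # models "rbd map" returning an error at map time
                raise RuntimeError("rbd map: lock held by another host")
            self.lock_holder = host

    def write_blocks(self, host):
        # the hung-task failure mode: a write through a non-exclusive
        # mapping waits until the exclusive mapping goes away
        return self.lock_holder is not None and self.lock_holder != host


img = ExclusiveMapModel()
img.map("host-a", exclusive=True)   # acquires the lock up front
img.map("host-b")                   # still allowed: not exclusive...
assert img.write_blocks("host-b")   # ...but its writes would hang
```

Hence Ilya's advice: make every writer map with -o exclusive, so a second writer fails fast at map time instead of hanging in D state.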
Re: [ceph-users] Using Ceph central backup storage - Best practice creating pools
AFAIK, the only AAA available with librados works at pool granularity.
So, if you create a ceph user with access to your pool, they will get
access to all the content stored in that pool.
If you want to use librados for your use case, you will need to
implement, in your code, the application logic required for your
security needs.

So, to answer precisely:

"How can I identify a backup created by client A that I want to restore
on another client Z?"
You cannot; a client will get access to all the content of the pool,
including others' backups (which are keys, at the rados level).

"Would it be possible on client Z to identify this backup file by
filename? If yes, how?"
At the rados level, AFAIK, there is no metadata associated with a key,
so you have to include that information in the key name (the keys are
what you are calling "backup", "file", etc.).

Regards,

On 01/22/2019 10:09 PM, cmonty14 wrote:
> My backup client is using librados.
> I understand that defining a pool for the same application is recommended.
>
> However this would not answer my other questions:
> How can I identify a backup created by client A that I want to restore
> on another client Z?
> I mean typically client A would write a backup file identified by the
> filename.
> Would it be possible on client Z to identify this backup file by
> filename? If yes, how?
>
> On Tue, 22 Jan 2019 at 15:07, wrote:
>>
>> Hi,
>>
>> Ceph's pools are meant to let you define specific engineering rules
>> and/or applications (rbd, cephfs, rgw).
>> They are not designed to be created in massive numbers (see pgs etc).
>> So, create a pool for each engineering ruleset, and store your data in them.
>> For what is left of your project, I believe you have to implement that
>> on top of Ceph.
>>
>> For instance, let's say you simply create a pool, with an rbd volume in it.
>> You then create a filesystem on that, and map it on some server.
>> Finally, you can push your files to that mountpoint, using various
>> Linux users, acls or whatever: beyond that point, there is nothing more
>> specific to Ceph, it is "just" a mounted filesystem.
>>
>> Regards,
>>
>> On 01/22/2019 02:16 PM, cmonty14 wrote:
>>> Hi,
>>>
>>> my use case for Ceph is providing central backup storage.
>>> This means I will back up multiple databases in the Ceph storage cluster.
>>>
>>> This is my question:
>>> What is the best practice for creating pools & images?
>>> Should I create multiple pools, i.e. one pool per database?
>>> Or should I create a single pool "backup" and use a namespace when
>>> writing data to the pool?
>>>
>>> This is the security demand that should be considered:
>>> DB-owner A can only modify the files that belong to A; other files
>>> (owned by B, C or D) are accessible for A.
>>>
>>> And there's another issue:
>>> How can I identify a backup created by client A that I want to restore
>>> on another client Z?
>>> I mean typically client A would write a backup file identified by the
>>> filename.
>>> Would it be possible on client Z to identify this backup file by
>>> filename? If yes, how?
>>>
>>>
>>> THX
[ceph-users] Spec for Ceph Mon+Mgr?
Hi.

We're currently co-locating our mons with the head node of our Hadoop
installation. That may be giving us some problems (we don't know yet),
so I'm speculating about moving them to dedicated hardware.

It is hard to get specifications "small" enough; the spec for the mon is
where we usually virtualize our way out of it, which seems very wrong here.

Are other people just co-locating it with something random, or what are
others typically using in a small ceph cluster (< 100 OSDs, 7 OSD hosts)?

Thanks.

Jesper
Re: [ceph-users] Broken CephFS stray entries?
On Tue, Jan 22, 2019 at 3:33 PM Yan, Zheng wrote: > > On Tue, Jan 22, 2019 at 9:08 PM Dan van der Ster wrote: > > > > Hi Zheng, > > > > We also just saw this today and got a bit worried. > > Should we change to: > > > > What is the error message (on stray dir or other dir)? does the > cluster ever enable multi-acitive mds? > It was during an upgrade from v12.2.8 to v12.2.10. 5 active MDS's during the upgrade. 2019-01-22 10:08:22.629545 mds.p01001532184554 mds.2 128.142.39.144:6800/268398 36 : cluster [WRN] replayed op client.54045065:2282648,2282514 used ino 0x3001c85b193 but session next is 0x3001c28f018 2019-01-22 10:08:22.629617 mds.p01001532184554 mds.2 128.142.39.144:6800/268398 37 : cluster [WRN] replayed op client.54045065:2282649,2282514 used ino 0x3001c85b194 but session next is 0x3001c28f018 2019-01-22 10:08:22.629652 mds.p01001532184554 mds.2 128.142.39.144:6800/268398 38 : cluster [WRN] replayed op client.54045065:2282650,2282514 used ino 0x3001c85b195 but session next is 0x3001c28f018 2019-01-22 10:08:37.373704 mon.cephflax-mon-9b406e0261 mon.0 137.138.121.135:6789/0 2748 : cluster [INF] daemon mds.p01001532184554 is now active in filesystem cephfs as rank 2 2019-01-22 10:08:37.805675 mon.cephflax-mon-9b406e0261 mon.0 137.138.121.135:6789/0 2749 : cluster [INF] Health check cleared: FS_DEGRADED (was: 1 filesystem is degraded) 2019-01-22 10:08:39.784260 mds.p01001532184554 mds.2 128.142.39.144:6800/268398 547 : cluster [ERR] bad/negative dir size on 0x61b f(v27 m2019-01-22 10:07:38.509466 0=-1+1) 2019-01-22 10:08:39.784271 mds.p01001532184554 mds.2 128.142.39.144:6800/268398 548 : cluster [ERR] unmatched fragstat on 0x61b, inode has f(v28 m2019-01-22 10:07:38.509466 0=-1+1), dirfrags have f(v0 m2019-01-22 10:07:38.509466 1=0+1) 2019-01-22 10:10:02.605036 mon.cephflax-mon-9b406e0261 mon.0 137.138.121.135:6789/0 2803 : cluster [INF] Health check cleared: MDS_INSUFFICIENT_STANDBY (was: insufficient standby MDS daemons available) 2019-01-22 10:10:02.605089 
mon.cephflax-mon-9b406e0261 mon.0 137.138.121.135:6789/0 2804 : cluster [INF] Cluster is now healthy > > diff --git a/src/mds/CInode.cc b/src/mds/CInode.cc > > index e8c1bc8bc1..e2539390fb 100644 > > --- a/src/mds/CInode.cc > > +++ b/src/mds/CInode.cc > > @@ -2040,7 +2040,7 @@ void CInode::finish_scatter_gather_update(int type) > > > > if (pf->fragstat.nfiles < 0 || > > pf->fragstat.nsubdirs < 0) { > > - clog->error() << "bad/negative dir size on " > > + clog->warn() << "bad/negative dir size on " > > << dir->dirfrag() << " " << pf->fragstat; > > assert(!"bad/negative fragstat" == g_conf->mds_verify_scatter); > > > > @@ -2077,7 +2077,7 @@ void CInode::finish_scatter_gather_update(int type) > > if (state_test(CInode::STATE_REPAIRSTATS)) { > > dout(20) << " dirstat mismatch, fixing" << dendl; > > } else { > > - clog->error() << "unmatched fragstat on " << ino() << ", inode > > has " > > + clog->warn() << "unmatched fragstat on " << ino() << ", inode > > has " > > << pi->dirstat << ", dirfrags have " << dirstat; > > assert(!"unmatched fragstat" == g_conf->mds_verify_scatter); > > } > > > > > > Cheers, Dan > > > > > > On Sat, Oct 20, 2018 at 2:33 AM Yan, Zheng wrote: > >> > >> no action is required. mds fixes this type of error atomically. > >> On Fri, Oct 19, 2018 at 6:59 PM Burkhard Linke > >> wrote: > >> > > >> > Hi, > >> > > >> > > >> > upon failover or restart, or MDS complains that something is wrong with > >> > one of the stray directories: > >> > > >> > > >> > 2018-10-19 12:56:06.442151 7fc908e2d700 -1 log_channel(cluster) log > >> > [ERR] : bad/negative dir size on 0x607 f(v133 m2018-10-19 > >> > 12:51:12.016360 -4=-5+1) > >> > 2018-10-19 12:56:06.442182 7fc908e2d700 -1 log_channel(cluster) log > >> > [ERR] : unmatched fragstat on 0x607, inode has f(v134 m2018-10-19 > >> > 12:51:12.016360 -4=-5+1), dirfrags have f(v0 m2018-10-19 12:51:12.016360 > >> > 1=0+1) > >> > > >> > > >> > How do we handle this problem? 
> >> > > >> > > >> > Regards, > >> > > >> > Burkhard > >> > > >> > > >> > ___ > >> > ceph-users mailing list > >> > ceph-users@lists.ceph.com > >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > >> ___ > >> ceph-users mailing list > >> ceph-users@lists.ceph.com > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Broken CephFS stray entries?
On Tue, Jan 22, 2019 at 9:08 PM Dan van der Ster wrote: > > Hi Zheng, > > We also just saw this today and got a bit worried. > Should we change to: > What is the error message (on stray dir or other dir)? Does the cluster ever enable multi-active MDS? > diff --git a/src/mds/CInode.cc b/src/mds/CInode.cc > index e8c1bc8bc1..e2539390fb 100644 > --- a/src/mds/CInode.cc > +++ b/src/mds/CInode.cc > @@ -2040,7 +2040,7 @@ void CInode::finish_scatter_gather_update(int type) > > if (pf->fragstat.nfiles < 0 || > pf->fragstat.nsubdirs < 0) { > - clog->error() << "bad/negative dir size on " > + clog->warn() << "bad/negative dir size on " > << dir->dirfrag() << " " << pf->fragstat; > assert(!"bad/negative fragstat" == g_conf->mds_verify_scatter); > > @@ -2077,7 +2077,7 @@ void CInode::finish_scatter_gather_update(int type) > if (state_test(CInode::STATE_REPAIRSTATS)) { > dout(20) << " dirstat mismatch, fixing" << dendl; > } else { > - clog->error() << "unmatched fragstat on " << ino() << ", inode > has " > + clog->warn() << "unmatched fragstat on " << ino() << ", inode has > " > << pi->dirstat << ", dirfrags have " << dirstat; > assert(!"unmatched fragstat" == g_conf->mds_verify_scatter); > } > > > Cheers, Dan > > > On Sat, Oct 20, 2018 at 2:33 AM Yan, Zheng wrote: >> >> no action is required. mds fixes this type of error atomically. >> On Fri, Oct 19, 2018 at 6:59 PM Burkhard Linke >> wrote: >> > >> > Hi, >> > >> > >> > upon failover or restart, our MDS complains that something is wrong with >> > one of the stray directories: >> > >> > >> > 2018-10-19 12:56:06.442151 7fc908e2d700 -1 log_channel(cluster) log >> > [ERR] : bad/negative dir size on 0x607 f(v133 m2018-10-19 >> > 12:51:12.016360 -4=-5+1) >> > 2018-10-19 12:56:06.442182 7fc908e2d700 -1 log_channel(cluster) log >> > [ERR] : unmatched fragstat on 0x607, inode has f(v134 m2018-10-19 >> > 12:51:12.016360 -4=-5+1), dirfrags have f(v0 m2018-10-19 12:51:12.016360 >> > 1=0+1) >> > >> > >> > How do we handle this problem? 
>> > >> > >> > Regards, >> > >> > Burkhard >> > >> > >> > ___ >> > ceph-users mailing list >> > ceph-users@lists.ceph.com >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] monitor cephfs mount io's
Hi Marc, My point was that there was no way to do that for a kernel mount except from the client that consumes the mounted RBDs. Mohamad On 1/21/19 4:29 AM, Marc Roos wrote: > > Hi Mohamad, How do you do that client side, I am having currently two > kernel mounts? > > > > > > -Original Message- > From: Mohamad Gebai [mailto:mge...@suse.de] > Sent: 17 January 2019 15:57 > To: Marc Roos; ceph-users > Subject: Re: [ceph-users] monitor cephfs mount io's > > You can do that either straight from your client, or by querying the > perf dump if you're using ceph-fuse. > > Mohamad > > On 1/17/19 6:19 AM, Marc Roos wrote: >> How / where can I monitor the ios on cephfs mount / client? >> >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
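For reference, a rough sketch of both approaches discussed above — the admin-socket path and client name below are the defaults, not verified against Marc's setup, so adjust them to your own client instance:

```shell
# ceph-fuse clients expose per-client performance counters via their admin socket
ceph daemon /var/run/ceph/ceph-client.admin.asok perf dump

# kernel clients only expose in-flight request state through debugfs,
# readable on the client host itself (requires root and a mounted debugfs)
cat /sys/kernel/debug/ceph/*/osdc   # outstanding OSD (data) requests
cat /sys/kernel/debug/ceph/*/mdsc   # outstanding MDS (metadata) requests
```

Beyond that, ordinary tools such as iostat run on the client are often the simplest way to watch throughput on a kernel mount.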
Re: [ceph-users] Using Ceph central backup storage - Best practice creating pools
Hi, Ceph pools are meant to let you define specific engineering rules and/or applications (rbd, cephfs, rgw). They are not designed to be created in a massive fashion (see PGs etc.). So, create a pool for each engineering ruleset, and store your data in them. For what is left of your project, I believe you have to implement that on top of Ceph. For instance, let's say you simply create a pool, with an RBD volume in it. You then create a filesystem on that, and map it on some server. Finally, you can push your files to that mountpoint, using various Linux users, ACLs or whatever: beyond that point, there is nothing more specific to Ceph, it is "just" a mounted filesystem. Regards, On 01/22/2019 02:16 PM, cmonty14 wrote: > Hi, > > my use case for Ceph is providing a central backup storage. > This means I will backup multiple databases in Ceph storage cluster. > > This is my question: > What is the best practice for creating pools & images? > Should I create multiple pools, means one pool per database? > Or should I create a single pool "backup" and use namespace when writing > data in the pool? > > This is the security demand that should be considered: > DB-owner A can only modify the files that belong to A; other files > (owned by B, C or D) are accessible for A. > > And there's another issue: > How can I identify a backup created by client A that I want to restore > on another client Z? > I mean typically client A would write a backup file identified by the > filename. > Would it be possible on client Z to identify this backup file by > filename? If yes, how? > > > THX
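The "pool with an RBD volume, filesystem on top" idea can be sketched roughly like this — the pool name, PG count, image size and mountpoint are made-up placeholders, not recommendations:

```shell
# create a replicated pool for backups (PG count must fit your cluster)
ceph osd pool create backup 128 128 replicated
ceph osd pool application enable backup rbd
rbd create backup/db-dumps --size 500G

# on the backup server:
rbd map backup/db-dumps            # prints the mapped device, e.g. /dev/rbd0
mkfs.xfs /dev/rbd0
mkdir -p /mnt/backup && mount /dev/rbd0 /mnt/backup
# from here on it is "just" a mounted filesystem: apply users/ACLs as usual
```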
[ceph-users] backfill_toofull while OSDs are not full
Hi, I've got a couple of PGs which are stuck in backfill_toofull, but none of them are actually full. "up": [ 999, 1900, 145 ], "acting": [ 701, 1146, 1880 ], "backfill_targets": [ "145", "999", "1900" ], "acting_recovery_backfill": [ "145", "701", "999", "1146", "1880", "1900" ], I checked all these OSDs, but they are all <75% utilization. full_ratio 0.95 backfillfull_ratio 0.9 nearfull_ratio 0.9 So I started checking all the PGs and I noticed that each of these PGs has one OSD in the 'acting_recovery_backfill' set which is marked as out. In this case osd.1880 is marked as out and thus its capacity is shown as zero. [ceph@ceph-mgr ~]$ ceph osd df|grep 1880 1880 hdd 4.545990 0 B 0 B 0 B 00 27 [ceph@ceph-mgr ~]$ This is on a Mimic 13.2.4 cluster. Is this expected, or is this an unknown side-effect of one of the OSDs being marked as out? Thanks, Wido
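For anyone hitting the same state, a few commands that help narrow it down — a sketch only; the state filter on `ceph pg ls` should be available on Mimic, but verify on your version:

```shell
ceph health detail | grep -i backfill_toofull   # which PGs are affected
ceph pg ls backfill_toofull                     # list PGs currently in that state
ceph osd df tree                                # per-OSD utilisation vs the ratios
ceph osd dump | grep ratio                      # full/backfillfull/nearfull in effect
```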
Re: [ceph-users] RBD client hangs
Your "mon" cap should be "profile rbd" instead of "allow r" [1]. [1] http://docs.ceph.com/docs/master/rbd/rados-rbd-cmds/#create-a-block-device-user On Mon, Jan 21, 2019 at 9:05 PM ST Wong (ITSC) wrote: > > Hi, > > > Is this an upgraded or a fresh cluster? > It's a fresh cluster. > > > Does client.acapp1 have the permission to blacklist other clients? You can > > check with "ceph auth get client.acapp1". > > No, it's our first Ceph cluster with basic setup for testing, without any > blacklist implemented. > > --- cut here --- > # ceph auth get client.acapp1 > exported keyring for client.acapp1 > [client.acapp1] > key = > caps mds = "allow r" > caps mgr = "allow r" > caps mon = "allow r" > caps osd = "allow rwx pool=2copy, allow rwx pool=4copy" > --- cut here --- > > Thanks a lot. > /st > > > > -Original Message- > From: Ilya Dryomov > Sent: Monday, January 21, 2019 7:33 PM > To: ST Wong (ITSC) > Cc: ceph-users@lists.ceph.com > Subject: Re: [ceph-users] RBD client hangs > > On Mon, Jan 21, 2019 at 11:43 AM ST Wong (ITSC) wrote: > > > > Hi, we’re trying mimic on an VM farm. It consists 4 OSD hosts (8 OSDs) and > > 3 MON. We tried mounting as RBD and CephFS (fuse and kernel mount) on > > different clients without problem. > > Is this an upgraded or a fresh cluster? > > > > > Then one day we perform failover test and stopped one of the OSD. Not sure > > if it’s related but after that testing, the RBD client freeze when trying > > to mount the rbd device. 
> > > > > > > > Steps to reproduce: > > > > > > > > # modprobe rbd > > > > > > > > (dmesg) > > > > [ 309.997587] Key type dns_resolver registered > > > > [ 310.043647] Key type ceph registered > > > > [ 310.044325] libceph: loaded (mon/osd proto 15/24) > > > > [ 310.054548] rbd: loaded > > > > > > > > # rbd -n client.acapp1 map 4copy/foo > > > > /dev/rbd0 > > > > > > > > # rbd showmapped > > > > id pool image snap device > > > > 0 4copy foo -/dev/rbd0 > > > > > > > > > > > > Then hangs if I tried to mount or reboot the server after rbd map. There > > are lot of error in dmesg, e.g. > > > > > > > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: blacklist of client74700 > > failed: -13 > > > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: failed to acquire lock: -13 > > > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: no lock owners detected > > > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: client74700 seems dead, > > breaking lock > > > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: blacklist of client74700 > > failed: -13 > > > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: failed to acquire lock: -13 > > > > Jan 20 03:43:32 acapp1 kernel: rbd: rbd0: no lock owners detected > > Does client.acapp1 have the permission to blacklist other clients? You can > check with "ceph auth get client.acapp1". If not, follow step 6 of > http://docs.ceph.com/docs/master/releases/luminous/#upgrade-from-jewel-or-kraken. > > Thanks, > > Ilya > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com -- Jason ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
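For completeness, the cap change Jason suggests would look roughly like this. Note that `ceph auth caps` replaces *all* of a client's caps, so the existing mds/mgr caps must be restated; the client and pool names come from the thread:

```shell
# give the client the rbd profile so it may blacklist dead lock holders
ceph auth caps client.acapp1 \
    mon 'profile rbd' \
    osd 'profile rbd pool=2copy, profile rbd pool=4copy' \
    mds 'allow r' mgr 'allow r'

ceph auth get client.acapp1   # verify the resulting caps
```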
[ceph-users] Using Ceph central backup storage - Best practice creating pools
Hi, my use case for Ceph is providing a central backup storage. This means I will backup multiple databases in Ceph storage cluster. This is my question: What is the best practice for creating pools & images? Should I create multiple pools, means one pool per database? Or should I create a single pool "backup" and use namespace when writing data in the pool? This is the security demand that should be considered: DB-owner A can only modify the files that belong to A; other files (owned by B, C or D) are accessible for A. And there's another issue: How can I identify a backup created by client A that I want to restore on another client Z? I mean typically client A would write a backup file identified by the filename. Would it be possible on client Z to identify this backup file by filename? If yes, how? THX ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Broken CephFS stray entries?
Hi Zheng, We also just saw this today and got a bit worried. Should we change to: diff --git a/src/mds/CInode.cc b/src/mds/CInode.cc index e8c1bc8bc1..e2539390fb 100644 --- a/src/mds/CInode.cc +++ b/src/mds/CInode.cc @@ -2040,7 +2040,7 @@ void CInode::finish_scatter_gather_update(int type) if (pf->fragstat.nfiles < 0 || pf->fragstat.nsubdirs < 0) { - clog->error() << "bad/negative dir size on " + clog->warn() << "bad/negative dir size on " << dir->dirfrag() << " " << pf->fragstat; assert(!"bad/negative fragstat" == g_conf->mds_verify_scatter); @@ -2077,7 +2077,7 @@ void CInode::finish_scatter_gather_update(int type) if (state_test(CInode::STATE_REPAIRSTATS)) { dout(20) << " dirstat mismatch, fixing" << dendl; } else { - clog->error() << "unmatched fragstat on " << ino() << ", inode has " + clog->warn() << "unmatched fragstat on " << ino() << ", inode has " << pi->dirstat << ", dirfrags have " << dirstat; assert(!"unmatched fragstat" == g_conf->mds_verify_scatter); } Cheers, Dan On Sat, Oct 20, 2018 at 2:33 AM Yan, Zheng wrote: > no action is required. mds fixes this type of error atomically. > On Fri, Oct 19, 2018 at 6:59 PM Burkhard Linke > wrote: > > > > Hi, > > > > > > upon failover or restart, or MDS complains that something is wrong with > > one of the stray directories: > > > > > > 2018-10-19 12:56:06.442151 7fc908e2d700 -1 log_channel(cluster) log > > [ERR] : bad/negative dir size on 0x607 f(v133 m2018-10-19 > > 12:51:12.016360 -4=-5+1) > > 2018-10-19 12:56:06.442182 7fc908e2d700 -1 log_channel(cluster) log > > [ERR] : unmatched fragstat on 0x607, inode has f(v134 m2018-10-19 > > 12:51:12.016360 -4=-5+1), dirfrags have f(v0 m2018-10-19 12:51:12.016360 > > 1=0+1) > > > > > > How do we handle this problem? 
> > > > > > Regards, > > > > Burkhard > > > > > > ___ > > ceph-users mailing list > > ceph-users@lists.ceph.com > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] migrate ceph-disk to ceph-volume fails with dmcrypt
On Tue, Jan 22, 2019 at 6:45 AM Manuel Lausch wrote: > > Hi, > > we want upgrade our ceph clusters from jewel to luminous. And also want > to migrate the osds to ceph-volume described in > http://docs.ceph.com/docs/luminous/ceph-volume/simple/scan/#ceph-volume-simple-scan > > The clusters are running since dumpling and are setup with dmcrypt. > Since dumpling there are until now three different types of dmcrypt > > plain dmcrypt with keys local > luks with keys local > luks with keys on the ceph monitors > > Now it seems only the last type can be migrated to ceph-volume. > > ceph-volume simple scan trys to mount a lockbox which does not exists > on the older OSDs. Are those OSDs not supported with ceph-volume? This is one case we didn't anticipate :/ We supported the wonky lockbox setup and thought we wouldn't need to go further back, although we did add support for both plain and luks keys. Looking through the code, it is very tightly couple to storing/retrieving keys from the monitors, and I don't know what workarounds might be possible here other than throwing away the OSD and deploying a new one (I take it this is not an option for you at all) > > This are the errors: > > # ceph-volume simple scan /var/lib/ceph/osd/ceph-183 > stderr: lsblk: /var/lib/ceph/osd/ceph-183: not a block device > stderr: lsblk: /var/lib/ceph/osd/ceph-183: not a block device > Running command: /usr/sbin/cryptsetup status > /dev/mapper/21ad7722-002f-464c-b460-a8976a7b4872 > Running command: /usr/sbin/cryptsetup status > 21ad7722-002f-464c-b460-a8976a7b4872 > Running command: mount -v /tmp/tmp3t1WRC > stderr: mount: is write-protected, mounting read-only > stderr: mount: unknown filesystem type '(null)' > --> RuntimeError: command returned non-zero exit status: 32 > > > and this is in the ceph-volume.log > > [2019-01-22 12:39:31,456][ceph_volume.process][INFO ] Running command: > /usr/sbin/blkid -p /dev/mapper/9b68b7e9-854e-498a-8381-4eef128a9d7a > [2019-01-22 
12:39:31,533][ceph_volume.devices.simple.scan][INFO ] detecting > if argument is a device or a directory: /var/lib/ceph/osd/ceph-183 > [2019-01-22 12:39:31,533][ceph_volume.devices.simple.scan][INFO ] will scan > directly, path is a directory > [2019-01-22 12:39:31,533][ceph_volume.devices.simple.scan][INFO ] will scan > encrypted OSD directory at path: /var/lib/ceph/osd/ceph-183 > [2019-01-22 12:39:31,534][ceph_volume.process][INFO ] Running command: > /usr/sbin/blkid -s PARTUUID -o value /dev/sdv1 > [2019-01-22 12:39:31,539][ceph_volume.process][INFO ] stdout > 21ad7722-002f-464c-b460-a8976a7b4872 > [2019-01-22 12:39:31,540][ceph_volume.process][INFO ] Running command: > /usr/sbin/cryptsetup status 21ad7722-002f-464c-b460-a8976a7b4872 > [2019-01-22 12:39:31,546][ceph_volume.process][INFO ] stdout > /dev/mapper/21ad7722-002f-464c-b460-a8976a7b4872 is active and is in use. > [2019-01-22 12:39:31,547][ceph_volume.process][INFO ] stdout type:PLAIN > [2019-01-22 12:39:31,547][ceph_volume.process][INFO ] stdout cipher: > aes-cbc-essiv:sha256 > [2019-01-22 12:39:31,547][ceph_volume.process][INFO ] stdout keysize: 256 > bits > [2019-01-22 12:39:31,547][ceph_volume.process][INFO ] stdout key location: > dm-crypt > [2019-01-22 12:39:31,547][ceph_volume.process][INFO ] stdout device: > /dev/sdv1 > [2019-01-22 12:39:31,547][ceph_volume.process][INFO ] stdout sector size: > 512 > [2019-01-22 12:39:31,547][ceph_volume.process][INFO ] stdout offset: 0 > sectors > [2019-01-22 12:39:31,547][ceph_volume.process][INFO ] stdout size: > 7805646479 sectors > [2019-01-22 12:39:31,547][ceph_volume.process][INFO ] stdout mode: > read/write > [2019-01-22 12:39:31,548][ceph_volume.process][INFO ] Running command: mount > -v /tmp/tmp3t1WRC > [2019-01-22 12:39:31,597][ceph_volume.process][INFO ] stderr mount: is > write-protected, mounting read-only > [2019-01-22 12:39:31,622][ceph_volume.process][INFO ] stderr mount: unknown > filesystem type '(null)' > [2019-01-22 
12:39:31,622][ceph_volume][ERROR ] exception caught by decorator > Traceback (most recent call last): > File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 59, > in newfunc > return f(*a, **kw) > File "/usr/lib/python2.7/site-packages/ceph_volume/main.py", line 148, in > main > terminal.dispatch(self.mapper, subcommand_args) > File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, > in dispatch > instance.main() > File "/usr/lib/python2.7/site-packages/ceph_volume/devices/simple/main.py", > line 33, in main > terminal.dispatch(self.mapper, self.argv) > File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, > in dispatch > instance.main() > File "/usr/lib/python2.7/site-packages/ceph_volume/devices/simple/scan.py", > line 353, in main > self.scan(args) > File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 16, > in is_root > return func(*a,
[ceph-users] cephfs performance degraded very fast
Hi, from time to time, under cache pressure or after a caps release failure, client application mounts get stuck. My use case is a Kubernetes cluster with automatic kernel client mounts on the nodes. Has anyone faced the same issue, or found a related solution? Brs
Re: [ceph-users] The OSD can be “down” but still “in”.
Hi, On 22/01/2019 10:02, M Ranga Swami Reddy wrote: > Hello - If an OSD shown as down and but its still "in" state..what > will happen with write/read operations on this down OSD? It depends ;-) In a typical 3-way replicated setup with min_size 2, writes to placement groups on that OSD will still go ahead - when 2 replicas are written OK, then the write will complete. Once the OSD comes back up, these writes will then be replicated to that OSD. If it stays down for long enough to be marked out, then pgs on that OSD will be replicated elsewhere. If you had min_size 3 as well, then writes would block until the OSD was back up (or marked out and the pgs replicated to another OSD). Regards, Matthew -- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
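A quick way to check which of Matthew's cases applies to a given pool and OSD — a sketch; replace the pool name and OSD id with your own:

```shell
ceph osd pool get rbd size       # replica count of the pool
ceph osd pool get rbd min_size   # writes block if the acting set drops below this
ceph osd tree down               # OSDs currently down (they may still be "in")
ceph pg ls-by-osd osd.12         # PGs mapped to a given OSD (osd.12 is an example)
```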
[ceph-users] migrate ceph-disk to ceph-volume fails with dmcrypt
Hi, we want upgrade our ceph clusters from jewel to luminous. And also want to migrate the osds to ceph-volume described in http://docs.ceph.com/docs/luminous/ceph-volume/simple/scan/#ceph-volume-simple-scan The clusters are running since dumpling and are setup with dmcrypt. Since dumpling there are until now three different types of dmcrypt plain dmcrypt with keys local luks with keys local luks with keys on the ceph monitors Now it seems only the last type can be migrated to ceph-volume. ceph-volume simple scan trys to mount a lockbox which does not exists on the older OSDs. Are those OSDs not supported with ceph-volume? This are the errors: # ceph-volume simple scan /var/lib/ceph/osd/ceph-183 stderr: lsblk: /var/lib/ceph/osd/ceph-183: not a block device stderr: lsblk: /var/lib/ceph/osd/ceph-183: not a block device Running command: /usr/sbin/cryptsetup status /dev/mapper/21ad7722-002f-464c-b460-a8976a7b4872 Running command: /usr/sbin/cryptsetup status 21ad7722-002f-464c-b460-a8976a7b4872 Running command: mount -v /tmp/tmp3t1WRC stderr: mount: is write-protected, mounting read-only stderr: mount: unknown filesystem type '(null)' --> RuntimeError: command returned non-zero exit status: 32 and this is in the ceph-volume.log [2019-01-22 12:39:31,456][ceph_volume.process][INFO ] Running command: /usr/sbin/blkid -p /dev/mapper/9b68b7e9-854e-498a-8381-4eef128a9d7a [2019-01-22 12:39:31,533][ceph_volume.devices.simple.scan][INFO ] detecting if argument is a device or a directory: /var/lib/ceph/osd/ceph-183 [2019-01-22 12:39:31,533][ceph_volume.devices.simple.scan][INFO ] will scan directly, path is a directory [2019-01-22 12:39:31,533][ceph_volume.devices.simple.scan][INFO ] will scan encrypted OSD directory at path: /var/lib/ceph/osd/ceph-183 [2019-01-22 12:39:31,534][ceph_volume.process][INFO ] Running command: /usr/sbin/blkid -s PARTUUID -o value /dev/sdv1 [2019-01-22 12:39:31,539][ceph_volume.process][INFO ] stdout 21ad7722-002f-464c-b460-a8976a7b4872 [2019-01-22 
12:39:31,540][ceph_volume.process][INFO ] Running command: /usr/sbin/cryptsetup status 21ad7722-002f-464c-b460-a8976a7b4872 [2019-01-22 12:39:31,546][ceph_volume.process][INFO ] stdout /dev/mapper/21ad7722-002f-464c-b460-a8976a7b4872 is active and is in use. [2019-01-22 12:39:31,547][ceph_volume.process][INFO ] stdout type:PLAIN [2019-01-22 12:39:31,547][ceph_volume.process][INFO ] stdout cipher: aes-cbc-essiv:sha256 [2019-01-22 12:39:31,547][ceph_volume.process][INFO ] stdout keysize: 256 bits [2019-01-22 12:39:31,547][ceph_volume.process][INFO ] stdout key location: dm-crypt [2019-01-22 12:39:31,547][ceph_volume.process][INFO ] stdout device: /dev/sdv1 [2019-01-22 12:39:31,547][ceph_volume.process][INFO ] stdout sector size: 512 [2019-01-22 12:39:31,547][ceph_volume.process][INFO ] stdout offset: 0 sectors [2019-01-22 12:39:31,547][ceph_volume.process][INFO ] stdout size: 7805646479 sectors [2019-01-22 12:39:31,547][ceph_volume.process][INFO ] stdout mode: read/write [2019-01-22 12:39:31,548][ceph_volume.process][INFO ] Running command: mount -v /tmp/tmp3t1WRC [2019-01-22 12:39:31,597][ceph_volume.process][INFO ] stderr mount: is write-protected, mounting read-only [2019-01-22 12:39:31,622][ceph_volume.process][INFO ] stderr mount: unknown filesystem type '(null)' [2019-01-22 12:39:31,622][ceph_volume][ERROR ] exception caught by decorator Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 59, in newfunc return f(*a, **kw) File "/usr/lib/python2.7/site-packages/ceph_volume/main.py", line 148, in main terminal.dispatch(self.mapper, subcommand_args) File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in dispatch instance.main() File "/usr/lib/python2.7/site-packages/ceph_volume/devices/simple/main.py", line 33, in main terminal.dispatch(self.mapper, self.argv) File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in dispatch instance.main() File 
"/usr/lib/python2.7/site-packages/ceph_volume/devices/simple/scan.py", line 353, in main self.scan(args) File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 16, in is_root return func(*a, **kw) File "/usr/lib/python2.7/site-packages/ceph_volume/devices/simple/scan.py", line 244, in scan osd_metadata = self.scan_encrypted(osd_path) File "/usr/lib/python2.7/site-packages/ceph_volume/devices/simple/scan.py", line 169, in scan_encrypted with system.tmp_mount(lockbox) as lockbox_path: File "/usr/lib/python2.7/site-packages/ceph_volume/util/system.py", line 145, in __enter__ self.path File "/usr/lib/python2.7/site-packages/ceph_volume/process.py", line 153, in run raise RuntimeError(msg) RuntimeError: command returned non-zero exit status: 32 ceph version 12.2.10 (177915764b752804194937482a39e95e0ca3de94) luminous (stable) Regards Manuel --
Re: [ceph-users] Using Ceph central backup storage - Best practice creating pools
Hi Thomas, What is the best practice for creating pools & images? Should I create multiple pools, means one pool per database? Or should I create a single pool "backup" and use namespace when writing data in the pool? I don't think one pool per DB is reasonable. If the number of DBs increases you'll have to create more pools and change the respective auth settings. One pool for your DB backups would suffice, and restricting user access is possible on rbd image level. You can grant read/write access for one client and only read access for other clients, you have to create different clients for that, see [1] for more details. Regards, Eugen [1] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-February/024424.html Zitat von Thomas <74cmo...@gmail.com>: Hi, my use case for Ceph is serving a central backup storage. This means I will backup multiple databases in Ceph storage cluster. This is my question: What is the best practice for creating pools & images? Should I create multiple pools, means one pool per database? Or should I create a single pool "backup" and use namespace when writing data in the pool? This is the security demand that should be considered: DB-owner A can only modify the files that belong to A; other files (owned by B, C or D) are accessible for A. And there's another issue: How can I identify a backup created by client A that I want to restore on another client Z? I mean typically client A would write a backup file identified by the filename. Would it be possible on client Z to identify this backup file by filename? If yes, how? THX ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
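The read/write vs read-only split Eugen mentions can be expressed with the rbd cap profiles at pool level, e.g. as below (client and pool names are placeholders; per-image restrictions are discussed in the linked thread):

```shell
# writer: may create and write backup images in the shared pool
ceph auth get-or-create client.db-a \
    mon 'profile rbd' osd 'profile rbd pool=backup'

# restore host: may only read images in the pool
ceph auth get-or-create client.restore-z \
    mon 'profile rbd' osd 'profile rbd-read-only pool=backup'
```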
[ceph-users] The OSD can be “down” but still “in”.
Hello - If an OSD is shown as down but is still in the "in" state, what will happen with write/read operations on this down OSD? Thanks Swami
Re: [ceph-users] MDS performance issue
On Tue, Jan 22, 2019 at 10:49 AM Albert Yue wrote: > > Hi Yan Zheng, > > In your opinion, can we resolve this issue by move MDS to a 512GB or 1TB > memory machine? > The problem is from the client side, especially clients with large memory. I don't think enlarging the MDS cache size is a good idea. You can periodically check each kernel client's /sys/kernel/debug/ceph/xxx/caps, and run 'echo 2 > /proc/sys/vm/drop_caches' if a client holds too many caps (for example 10k). > On Mon, Jan 21, 2019 at 10:49 PM Yan, Zheng wrote: >> >> On Mon, Jan 21, 2019 at 11:16 AM Albert Yue >> wrote: >> > >> > Dear Ceph Users, >> > >> > We have set up a cephFS cluster with 6 osd machines, each with 16 8TB >> > harddisk. Ceph version is luminous 12.2.5. We created one data pool with >> > these hard disks and created another meta data pool with 3 ssd. We created >> > a MDS with 65GB cache size. >> > >> > But our users are keep complaining that cephFS is too slow. What we >> > observed is cephFS is fast when we switch to a new MDS instance, once the >> > cache fills up (which will happen very fast), client became very slow when >> > performing some basic filesystem operation such as `ls`. >> > >> >> It seems that clients hold lots of unused inodes their icache, which >> prevent mds from trimming corresponding objects from its cache. mimic >> has command "ceph daemon mds.x cache drop" to ask client to drop its >> cache. I'm also working on a patch that make kclient client release >> unused inodes. >> >> For luminous, there is not much we can do, except periodically run >> "echo 2 > /proc/sys/vm/drop_caches" on each client. >> >> >> > What we know is our user are putting lots of small files into the cephFS, >> > now there are around 560 Million files. We didn't see high CPU wait on MDS >> > instance and meta data pool just used around 200MB space. >> > >> > My question is, what is the relationship between the metadata pool and >> > MDS? 
Is this performance issue caused by the hardware behind meta data >> > pool? Why the meta data pool only used 200MB space, and we saw 3k iops on >> > each of these three ssds, why can't MDS cache all these 200MB into memory? >> > >> > Thanks very much! >> > >> > >> > Best Regards, >> > >> > Albert >> > >> > ___ >> > ceph-users mailing list >> > ceph-users@lists.ceph.com >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
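Yan's suggestion, written out as a small script to run on each kernel client — it assumes debugfs is mounted at the default location, and the 10k threshold is just the example figure from the thread:

```shell
#!/bin/sh
# check how many caps each kernel CephFS client on this host is holding
threshold=10000
for d in /sys/kernel/debug/ceph/*/; do
    [ -e "${d}caps" ] || continue
    n=$(wc -l < "${d}caps")     # rough count: the file includes a few header lines
    echo "${d}: ${n} lines in caps"
    if [ "$n" -gt "$threshold" ]; then
        # drop clean dentry/inode caches so the MDS can trim its own cache
        echo 2 > /proc/sys/vm/drop_caches
    fi
done
```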
Re: [ceph-users] quick questions about a 5-node homelab setup
Den tis 22 jan. 2019 kl 00:50 skrev Brian Topping : > > I've scrounged up 5 old Atom Supermicro nodes and would like to run them > > 365/7 for limited production as RBD with Bluestore (ideally latest 13.2.4 > > Mimic), triple copy redundancy. Underlying OS is a Debian 9 64 bit, minimal > > install. > > The other thing to consider about a lab is “what do you want to learn?” If > reliability isn’t an issue (ie you aren’t putting your family pictures on > it), regardless of the cluster technology, you can often learn basics more > quickly without the overhead of maintaining quorums and all that stuff on day > one. So at risk of being a heretic, start small, for instance with single > mon/manager and add more later. Well, if you start small with one OSD, you are going to run into "the defaults will work against you" since as you make your first pool, it will want to place 3 copies on the separate hosts, so not only are you trying to get accustomed to ceph terms and technologies, you are also working against the whole cluster idea by not building a cluster at all, so you will encounter problems regular ceph admins don't see ever, so chances of getting help is smaller. Things like "OSD will pre-allocate so much data a 10G OSD crashes at start" or "my pool wont start since my pgs are in a bad state since I have only one OSD or only one host and I didn't change the crush rules" is just something people starting small will ever experience. Anyone with 3 or more real hosts with real drives attached just will never see it. Telling people to learn clusters by building a non-cluster might be counter-productive. When you have a working ceph cluster you can practice in getting it to run on a rpi with a usb stick for a drive, but starting at that will make you fight two or more unknowns at the same time, both ceph being new to you, and un-clustering a cluster software suite. 
(and possibly running on non-x86_64 for a third unknown) -- May the most significant bit of your life be positive. 
[ceph-users] predict impact of crush tunables change
Dear all, I have a Luminous cluster with tunables profile "hammer" - now all my hammer clients are gone and I could raise the tunables level to "jewel". Is there any good way to predict the data movement caused by such a config change? br wolfgang
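One way to estimate the movement offline is to compare PG mappings between the current osdmap and a copy with jewel tunables. A sketch — the main hammer-to-jewel change is `chooseleaf_stable`, but verify the exact crushtool flag name on your version before relying on this:

```shell
ceph osd getmap -o osdmap                        # grab the current map
osdmaptool osdmap --export-crush crush.bin
crushtool -i crush.bin --set-chooseleaf-stable 1 -o crush.jewel
osdmaptool osdmap --import-crush crush.jewel -o osdmap.jewel

osdmaptool osdmap       --test-map-pgs-dump > pgs.before
osdmaptool osdmap.jewel --test-map-pgs-dump > pgs.after
diff pgs.before pgs.after | grep -c '^<'         # rough count of PG mappings that change
```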
[ceph-users] Does "mark_unfound_lost delete" only delete missing/unfound objects of a PG
Hello. I have a question about `ceph pg {pg.num} mark_unfound_lost delete`. Will this only delete objects which are unfound, or the whole PG which you put in as an argument? Objects (oids) which I can see with `ceph pg {pg.num} list_missing`? So in the case below, would it remove the object "rbd_data.e53c3c27c0089c.01c7"? If so that would be great, since e53c3c27c0089c is not linked to any volume any more.

ceph pg 65.2d0 list_missing
{
    "offset": {
        "oid": "",
        "key": "",
        "snapid": 0,
        "hash": 0,
        "max": 0,
        "pool": -9223372036854775808,
        "namespace": ""
    },
    "num_missing": 2,
    "num_unfound": 2,
    "objects": [
        {
            "oid": {
                "oid": "rbd_data.e53c3c27c0089c.01c7",
                "key": "",
                "snapid": -2,
                "hash": 3857244880,
                "max": 0,
                "pool": 65,
                "namespace": ""
            },
            "need": "613065'497508155",
            "have": "0'0",
            "locations": []
        },
        {
            "oid": {
                "oid": "rbd_data.e53c3c27c0089c.0059",
                "key": "",
                "snapid": -2,
                "hash": 2362866384,
                "max": 0,
                "pool": 65,
                "namespace": ""
            },
            "need": "612939'497508050",
            "have": "0'0",
            "locations": []
        }
    ],
    "more": 0
}

Thanks in advance. Mathijs
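For what it's worth, `mark_unfound_lost` is documented as acting only on the unfound objects of the PG, not on the PG as a whole — but please verify against the docs for your release before running it. A small sketch for pulling just the object names out of a saved `list_missing` dump (the JSON here is a trimmed sample of the output above; on a live cluster pipe `ceph pg 65.2d0 list_missing` in instead):

```shell
# save a trimmed list_missing dump to work on
cat > /tmp/list_missing.json <<'EOF'
{ "num_missing": 2, "num_unfound": 2, "objects": [
  { "oid": { "oid": "rbd_data.e53c3c27c0089c.01c7" }, "need": "613065'497508155" },
  { "oid": { "oid": "rbd_data.e53c3c27c0089c.0059" }, "need": "612939'497508050" } ] }
EOF

# extract the inner "oid" values (the unfound object names), deduplicated
grep -o '"oid": "[^"]*"' /tmp/list_missing.json \
    | cut -d'"' -f4 | sort -u | tee /tmp/unfound_oids.txt
```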
[ceph-users] krbd reboot hung
I'm using krbd to map an rbd device to a VM. When the device is mounted,
rebooting the OS hangs for more than 7 minutes; on bare metal it can be more
than 15 minutes. Even with the latest 5.0.0 kernel the problem still occurs.
Here are the console logs with the 4.15.18 kernel and a Mimic rbd client;
the reboot seems to be stuck in the umount of the rbd device:

[ OK ] Stopped Update UTMP about System Boot/Shutdown.
[ OK ] Stopped Create Volatile Files and Directories.
[ OK ] Stopped target Local File Systems.
Unmounting /run/user/110281572...
Unmounting /var/tmp...
Unmounting /root/test...
Unmounting /run/user/78402...
Unmounting Configuration File System...
[ OK ] Stopped Configure read-only root support.
[ OK ] Unmounted /var/tmp.
[ OK ] Unmounted /run/user/78402.
[ OK ] Unmounted /run/user/110281572.
[ OK ] Stopped target Swap.
[ OK ] Unmounted Configuration File System.
[ 189.919062] libceph: mon4 XX.XX.XX.XX:6789 session lost, hunting for new mon
[ 189.950085] libceph: connect XX.XX.XX.XX:6789 error -101
[ 189.950764] libceph: mon3 XX.XX.XX.XX:6789 connect error
[ 190.687090] libceph: connect XX.XX.XX.XX:6789 error -101
[ 190.694197] libceph: mon3 XX.XX.XX.XX:6789 connect error
[ 191.711080] libceph: connect XX.XX.XX.XX:6789 error -101
[ 191.745254] libceph: mon3 XX.XX.XX.XX:6789 connect error
[ 193.695065] libceph: connect XX.XX.XX.XX:6789 error -101
[ 193.727694] libceph: mon3 XX.XX.XX.XX:6789 connect error
[ 197.087076] libceph: connect XX.XX.XX.XX:6789 error -101
[ 197.121077] libceph: mon4 XX.XX.XX.XX:6789 connect error
[ 197.663082] libceph: connect XX.XX.XX.XX:6789 error -101
[ 197.680671] libceph: mon4 XX.XX.XX.XX:6789 connect error
[ 198.687122] libceph: connect XX.XX.XX.XX:6789 error -101
[ 198.719253] libceph: mon4 XX.XX.XX.XX:6789 connect error
[ 200.671136] libceph: connect XX.XX.XX.XX:6789 error -101
[ 200.702717] libceph: mon4 XX.XX.XX.XX:6789 connect error
[ 204.703115] libceph: connect XX.XX.XX.XX:6789 error -101
[ 204.736586] libceph: mon4 XX.XX.XX.XX:6789 connect error
[ 209.887141] libceph: connect XX.XX.XX.XX:6789 error -101
[ 209.918721] libceph: mon0 XX.XX.XX.XX:6789 connect error
[ 210.719078] libceph: connect XX.XX.XX.XX:6789 error -101
[ 210.750378] libceph: mon0 XX.XX.XX.XX:6789 connect error
[ 211.679118] libceph: connect XX.XX.XX.XX:6789 error -101
[ 211.712246] libceph: mon0 XX.XX.XX.XX:6789 connect error
[ 213.663116] libceph: connect XX.XX.XX.XX:6789 error -101
[ 213.696943] libceph: mon0 XX.XX.XX.XX:6789 connect error
[ 217.695062] libceph: connect XX.XX.XX.XX:6789 error -101
[ 217.728511] libceph: mon0 XX.XX.XX.XX:6789 connect error
[ 225.759109] libceph: connect XX.XX.XX.XX:6789 error -101
[ 225.775869] libceph: mon0 XX.XX.XX.XX:6789 connect error
[ 233.951062] libceph: connect XX.XX.XX.XX:6789 error -101
[ 233.951997] libceph: mon3 XX.XX.XX.XX:6789 connect error
[ 234.719114] libceph: connect XX.XX.XX.XX:6789 error -101
[ 234.720083] libceph: mon3 XX.XX.XX.XX:6789 connect error
[ 235.679112] libceph: connect XX.XX.XX.XX:6789 error -101
[ 235.680060] libceph: mon3 XX.XX.XX.XX:6789 connect error
[ 237.663088] libceph: connect XX.XX.XX.XX:6789 error -101
[ 237.664121] libceph: mon3 XX.XX.XX.XX:6789 connect error
[ 241.695082] libceph: connect XX.XX.XX.XX:6789 error -101
[ 241.696500] libceph: mon3 XX.XX.XX.XX:6789 connect error
[ 249.823095] libceph: connect XX.XX.XX.XX:6789 error -101
[ 249.824101] libceph: mon3 XX.XX.XX.XX:6789 connect error
[ 264.671119] libceph: connect XX.XX.XX.XX:6789 error -101
[ 264.672102] libceph: mon0 XX.XX.XX.XX:6789 connect error
[ 265.695109] libceph: connect XX.XX.XX.XX:6789 error -101
[ 265.696106] libceph: mon0 XX.XX.XX.XX:6789 connect error
[ 266.719145] libceph: connect XX.XX.XX.XX:6789 error -101
[ 266.720204] libceph: mon0 XX.XX.XX.XX:6789 connect error
[ 268.703121] libceph: connect XX.XX.XX.XX:6789 error -101
[ 268.704110] libceph: mon0 XX.XX.XX.XX:6789 connect error
[ 272.671115] libceph: connect XX.XX.XX.XX:6789 error -101
[ 272.672159] libceph: mon0 XX.XX.XX.XX:6789 connect error
[ 281.055087] libceph: connect XX.XX.XX.XX:6789 error -101
[ 281.056577] libceph: mon0 XX.XX.XX.XX:6789 connect error
[ 294.879098] libceph: connect XX.XX.XX.XX:6789 error -101
[ 294.880230] libceph: mon3 XX.XX.XX.XX:6789 connect error
[ 295.711107] libceph: connect XX.XX.XX.XX:6789 error -101
[ 295.712102] libceph: mon3 XX.XX.XX.XX:6789 connect error
[ 296.671090] libceph: connect XX.XX.XX.XX:6789 error -101
[ 296.672082] libceph: mon3 XX.XX.XX.XX:6789 connect error
[ 298.719086] libceph: connect XX.XX.XX.XX:6789 error -101
[ 298.720027] libceph: mon3 XX.XX.XX.XX:6789 connect error
[ 302.687077] libceph: connect XX.XX.XX.XX:6789 error -101
[ 302.688103] libceph: mon3 XX.XX.XX.XX:6789 connect error
[ 310.751132] libceph: connect XX.XX.XX.XX:6789 error -101
[ 310.763103] libceph: mon3 XX.XX.XX.XX:6789 connect error
[ 325.087096]
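The pattern in the log (umount still in flight after networking has been torn down, so libceph can no longer reach any mon and retries until a timeout) suggests an ordering problem: the rbd-backed filesystem should be unmounted while the network is still up. One common mitigation, offered here as an assumption rather than a confirmed fix for this report, is to mark the mount as a network mount so systemd orders it before network shutdown, e.g. in /etc/fstab:

```
# /etc/fstab -- hypothetical entry; device path and mount point are examples.
# _netdev tells systemd this filesystem depends on the network, so it is
# unmounted before networking is stopped during reboot/shutdown.
/dev/rbd/rbd/myimage  /root/test  ext4  defaults,noatime,_netdev  0 0
```

If the image is mapped via the rbdmap service, making the mount unit depend on that service (so unmount happens before `rbd unmap` and before the network goes down) follows the same idea.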
[ceph-users] RadosGW replication and failover issues
Hi,

We are running the following radosgw (Luminous 12.2.8) replication scenario:

1) We have 2 clusters, each running a radosgw; Cluster1 is defined as master
   and Cluster2 as slave.
2) We create a number of buckets with objects via master and slave.
3) We shut down Cluster1.
4) We execute failover on Cluster2:
   radosgw-admin zone modify --master --default
   radosgw-admin period update --commit
5) We create some new buckets and delete some existing buckets that were
   created in step 2.
6) We restart Cluster1 and execute:
   radosgw-admin realm pull
   radosgw-admin period pull
7) We see that the resync finished successfully, and Cluster1 is now defined
   as slave with Cluster2 as master.

The issue: we now see in Cluster1 the buckets that were deleted in step 5
(while this cluster was down). We waited a while to see if maybe there were
some leftover objects that would be deleted by GC, but even after a few
hours those buckets are still visible in Cluster1 and not visible in
Cluster2.

We also tried a variant of step 6 where we restart Cluster1 and execute only
`radosgw-admin period pull`. But then sync gets stuck, both clusters are
defined as masters, and Cluster1's current period is the one before the last
period of Cluster2.

How can we fix this issue? Is there some config command that should be
called during failover?

Thanks,
Rom
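For reference, the failover/failback sequence in the Ceph multisite documentation looks roughly like the sketch below. The endpoint URL, zone name, and system-user keys are placeholders, not values from this thread; the key point is that the recovered old master should `realm pull` from the new master's endpoint (with the realm's system-user credentials) and restart its gateway, rather than committing a competing period of its own:

```
# On Cluster2, promote its zone to master while Cluster1 is down:
radosgw-admin zone modify --rgw-zone=zone2 --master --default
radosgw-admin period update --commit
systemctl restart ceph-radosgw@rgw.$(hostname -s)

# Later, on the recovered Cluster1, pull realm and period from the new
# master instead of acting on stale local state:
radosgw-admin realm pull --url=http://cluster2-rgw:8080 \
    --access-key=SYSTEM_ACCESS_KEY --secret=SYSTEM_SECRET_KEY
radosgw-admin period pull --url=http://cluster2-rgw:8080 \
    --access-key=SYSTEM_ACCESS_KEY --secret=SYSTEM_SECRET_KEY
systemctl restart ceph-radosgw@rgw.$(hostname -s)
```

Restarting the gateway on the recovered cluster after the pull matters, since a running radosgw can otherwise keep serving (and syncing from) the old period, which matches the "both masters / stuck sync" symptom described above.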