Re: [ceph-users] cephfs slow delete
On Fri, Oct 14, 2016 at 7:45 PM, Mykola Dvornik wrote:
> I was doing parallel deletes until the point when there were >1M objects
> in the stray directory. Then the delete fails with a 'no space left'
> error. If one then deep-scrubs the PGs containing the corresponding
> metadata, they turn out to be inconsistent. In the worst case one ends up
> with virtually empty folders that have a size of 16EB. Those are
> impossible to delete as they are 'non-empty'.

Yeah, as far as I can tell these are unrelated. You just got unlucky. :)
-Greg

> -Mykola
>
> From: Gregory Farnum
> Sent: Saturday, 15 October 2016 05:02
> To: Mykola Dvornik
> Cc: Heller, Chris; ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] cephfs slow delete
>
> On Fri, Oct 14, 2016 at 6:26 PM, Mykola Dvornik wrote:
>> If you are running 10.2.3 on your cluster, then I would strongly
>> recommend NOT deleting files in parallel, as you might hit
>> http://tracker.ceph.com/issues/17177
>
> I don't think these have anything to do with each other. What gave you
> the idea simultaneous deletes could invoke that issue?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] cephfs slow delete
I was doing parallel deletes until the point when there were >1M objects in
the stray directory. Then the delete fails with a 'no space left' error. If
one then deep-scrubs the PGs containing the corresponding metadata, they
turn out to be inconsistent. In the worst case one ends up with virtually
empty folders that have a size of 16EB. Those are impossible to delete as
they are 'non-empty'.

-Mykola

From: Gregory Farnum
Sent: Saturday, 15 October 2016 05:02
To: Mykola Dvornik
Cc: Heller, Chris; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] cephfs slow delete

On Fri, Oct 14, 2016 at 6:26 PM, Mykola Dvornik wrote:
> If you are running 10.2.3 on your cluster, then I would strongly
> recommend NOT deleting files in parallel, as you might hit
> http://tracker.ceph.com/issues/17177

I don't think these have anything to do with each other. What gave you the
idea simultaneous deletes could invoke that issue?
Re: [ceph-users] cephfs slow delete
On Fri, Oct 14, 2016 at 6:26 PM, Mykola Dvornik wrote:
> If you are running 10.2.3 on your cluster, then I would strongly
> recommend NOT deleting files in parallel, as you might hit
> http://tracker.ceph.com/issues/17177

I don't think these have anything to do with each other. What gave you the
idea simultaneous deletes could invoke that issue?
Re: [ceph-users] cephfs slow delete
If you are running 10.2.3 on your cluster, then I would strongly recommend
NOT deleting files in parallel, as you might hit
http://tracker.ceph.com/issues/17177

-Mykola

From: Heller, Chris
Sent: Saturday, 15 October 2016 03:36
To: Gregory Farnum
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] cephfs slow delete

Just a thought, but since a directory tree is a first-class item in cephfs,
could the wire protocol be extended with a “recursive delete” operation,
specifically for cases like this?

On 10/14/16, 4:16 PM, "Gregory Farnum" wrote:

    On Fri, Oct 14, 2016 at 1:11 PM, Heller, Chris wrote:
    > Ok. Since I’m running through the Hadoop/ceph api, there is no
    > syscall boundary, so there is a simple place to improve the
    > throughput here. Good to know, I’ll work on a patch…

    Ah yeah, if you're in whatever they call the recursive tree delete
    function you can unroll that loop a whole bunch. I forget where the
    boundary is so you may need to go play with the JNI code; not sure.
    -Greg
Re: [ceph-users] cephfs slow delete
Just a thought, but since a directory tree is a first-class item in cephfs,
could the wire protocol be extended with a “recursive delete” operation,
specifically for cases like this?

On 10/14/16, 4:16 PM, "Gregory Farnum" wrote:

    On Fri, Oct 14, 2016 at 1:11 PM, Heller, Chris wrote:
    > Ok. Since I’m running through the Hadoop/ceph api, there is no
    > syscall boundary, so there is a simple place to improve the
    > throughput here. Good to know, I’ll work on a patch…

    Ah yeah, if you're in whatever they call the recursive tree delete
    function you can unroll that loop a whole bunch. I forget where the
    boundary is so you may need to go play with the JNI code; not sure.
    -Greg
Re: [ceph-users] Missing arm64 Ubuntu packages for 10.2.3
On 10/14/16, 2:29 PM, "Alfredo Deza" wrote:
>On Thu, Oct 13, 2016 at 5:19 PM, Stillwell, Bryan J wrote:
>> On 10/13/16, 2:32 PM, "Alfredo Deza" wrote:
>>
>>>On Thu, Oct 13, 2016 at 11:33 AM, Stillwell, Bryan J wrote:
>>>> I have a basement cluster that is partially built with Odroid-C2
>>>> boards and when I attempted to upgrade to the 10.2.3 release I
>>>> noticed that this release doesn't have an arm64 build. Are there any
>>>> plans on continuing to make arm64 builds?
>>>
>>>We have a couple of machines for building ceph releases on ARM64 but
>>>unfortunately they sometimes have issues and since Arm64 is
>>>considered a "nice to have" at the moment we usually skip them if
>>>anything comes up.
>>>
>>>So it is an on-and-off kind of situation (I don't recall what happened
>>>for 10.2.3)
>>>
>>>But since you've asked, I can try to get them built and see if we can
>>>get 10.2.3 out.
>>
>> Sounds good, thanks Alfredo!
>
>10.2.3 arm64 for xenial (and centos7) is out. We only have xenial
>available for arm64, hopefully that will work for you.

Thanks Alfredo, but I'm only seeing xenial arm64 dbg packages here:
http://download.ceph.com/debian-jewel/pool/main/c/ceph/

There's also a report on IRC that the Packages file no longer contains the
10.2.3 amd64 packages for xenial.

Bryan
Re: [ceph-users] Missing arm64 Ubuntu packages for 10.2.3
On Thu, Oct 13, 2016 at 5:19 PM, Stillwell, Bryan J wrote:
> On 10/13/16, 2:32 PM, "Alfredo Deza" wrote:
>
>>On Thu, Oct 13, 2016 at 11:33 AM, Stillwell, Bryan J wrote:
>>> I have a basement cluster that is partially built with Odroid-C2
>>> boards and when I attempted to upgrade to the 10.2.3 release I noticed
>>> that this release doesn't have an arm64 build. Are there any plans on
>>> continuing to make arm64 builds?
>>
>>We have a couple of machines for building ceph releases on ARM64 but
>>unfortunately they sometimes have issues and since Arm64 is
>>considered a "nice to have" at the moment we usually skip them if
>>anything comes up.
>>
>>So it is an on-and-off kind of situation (I don't recall what happened
>>for 10.2.3)
>>
>>But since you've asked, I can try to get them built and see if we can
>>get 10.2.3 out.
>
> Sounds good, thanks Alfredo!

10.2.3 arm64 for xenial (and centos7) is out. We only have xenial
available for arm64, hopefully that will work for you.

> Bryan
Re: [ceph-users] cephfs slow delete
On Fri, Oct 14, 2016 at 1:11 PM, Heller, Chris wrote:
> Ok. Since I’m running through the Hadoop/ceph api, there is no syscall
> boundary, so there is a simple place to improve the throughput here.
> Good to know, I’ll work on a patch…

Ah yeah, if you're in whatever they call the recursive tree delete
function you can unroll that loop a whole bunch. I forget where the
boundary is so you may need to go play with the JNI code; not sure.
-Greg
Re: [ceph-users] cephfs slow delete
Ok. Since I’m running through the Hadoop/ceph api, there is no syscall
boundary, so there is a simple place to improve the throughput here. Good
to know, I’ll work on a patch…

On 10/14/16, 3:58 PM, "Gregory Farnum" wrote:

    On Fri, Oct 14, 2016 at 11:41 AM, Heller, Chris wrote:
    > Unfortunately, it was all in the unlink operation. Looks as if it
    > took nearly 20 hours to remove the dir; round trip is a killer
    > there. What can be done to reduce RTT to the MDS? Does the client
    > really have to sequentially delete directories or can it have
    > internal batching or parallelization?

    It's bound by the same syscall APIs as anything else. You can spin
    off multiple deleters; I'd either keep them on one client (if you
    want to work within a single directory) or if using multiple clients
    assign them to different portions of the hierarchy. That will let you
    parallelize across the IO latency until you hit a cap on the MDS'
    total throughput (should be 1-10k deletes/s based on latest tests
    IIRC).
    -Greg

    > -Chris
    >
    > On 10/13/16, 4:22 PM, "Gregory Farnum" wrote:
    >
    > On Thu, Oct 13, 2016 at 12:44 PM, Heller, Chris wrote:
    > > I have a directory I’ve been trying to remove from cephfs (via
    > > cephfs-hadoop); the directory is a few hundred gigabytes in size
    > > and contains a few million files, but not in a single sub
    > > directory. I started the delete yesterday at around 6:30 EST, and
    > > it’s still progressing. I can see from (ceph osd df) that the
    > > overall data usage on my cluster is decreasing, but at the rate
    > > it's going it will be a month before the entire sub directory is
    > > gone. Is a recursive delete of a directory known to be a slow
    > > operation in CephFS, or have I hit upon some bad configuration?
    > > What steps can I take to better debug this scenario?
    >
    > Is it the actual unlink operation taking a long time, or just the
    > reduction in used space? Unlinks require a round trip to the MDS
    > unfortunately, but you should be able to speed things up at least
    > some by issuing them in parallel on different directories.
    >
    > If it's the used space, you can let the MDS issue more RADOS delete
    > ops by adjusting the "mds max purge files" and "mds max purge ops"
    > config values.
    > -Greg
Re: [ceph-users] cephfs slow delete
On Fri, Oct 14, 2016 at 11:41 AM, Heller, Chris wrote:
> Unfortunately, it was all in the unlink operation. Looks as if it took
> nearly 20 hours to remove the dir; round trip is a killer there. What
> can be done to reduce RTT to the MDS? Does the client really have to
> sequentially delete directories or can it have internal batching or
> parallelization?

It's bound by the same syscall APIs as anything else. You can spin off
multiple deleters; I'd either keep them on one client (if you want to work
within a single directory) or if using multiple clients assign them to
different portions of the hierarchy. That will let you parallelize across
the IO latency until you hit a cap on the MDS' total throughput (should be
1-10k deletes/s based on latest tests IIRC).
-Greg

> -Chris
>
> On 10/13/16, 4:22 PM, "Gregory Farnum" wrote:
>
> On Thu, Oct 13, 2016 at 12:44 PM, Heller, Chris wrote:
> > I have a directory I’ve been trying to remove from cephfs (via
> > cephfs-hadoop); the directory is a few hundred gigabytes in size and
> > contains a few million files, but not in a single sub directory. I
> > started the delete yesterday at around 6:30 EST, and it’s still
> > progressing. I can see from (ceph osd df) that the overall data usage
> > on my cluster is decreasing, but at the rate it's going it will be a
> > month before the entire sub directory is gone. Is a recursive delete
> > of a directory known to be a slow operation in CephFS, or have I hit
> > upon some bad configuration? What steps can I take to better debug
> > this scenario?
>
> Is it the actual unlink operation taking a long time, or just the
> reduction in used space? Unlinks require a round trip to the MDS
> unfortunately, but you should be able to speed things up at least some
> by issuing them in parallel on different directories.
>
> If it's the used space, you can let the MDS issue more RADOS delete
> ops by adjusting the "mds max purge files" and "mds max purge ops"
> config values.
> -Greg
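[Editor's note] Greg's suggestion, several deleters on one client with each
worker taking a different subdirectory, can be sketched with plain shell
tools. This is only a sketch: the real target path is your CephFS mount
(e.g. something like /mnt/cephfs/big_dir, a hypothetical path); the demo
below operates on a throwaway temporary tree instead.

```shell
#!/bin/bash
# Sketch: fan out one "rm -rf" worker per top-level subdirectory, up to
# 8 at a time, so the per-unlink MDS round trips overlap instead of
# serializing. TARGET stands in for the real CephFS directory.
TARGET=$(mktemp -d)                  # demo tree instead of a real mount
mkdir -p "$TARGET"/sub{1..4}/deep
touch "$TARGET"/sub{1..4}/deep/file  # a few files to delete

# One worker per subdirectory, at most 8 concurrent deleters (-P 8).
ls -1 "$TARGET" | xargs -P 8 -I{} rm -rf "$TARGET/{}"

ls -A "$TARGET" | wc -l              # prints 0: nothing left
rmdir "$TARGET"
```

The `-P` value is the knob to play with: raise it until the MDS delete
throughput ceiling Greg mentions becomes the bottleneck.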
[ceph-users] radosgw keystone integration in mitaka
Hi All,

Recently upgraded from Kilo->Mitaka on my OpenStack deploy, and now my
radosgw nodes (jewel) are unable to validate keystone tokens. Initially I
thought it was because radosgw relies on admin_token (which is a bad idea,
but ...) and that's now deprecated. I verified the token was still in
keystone.conf and fixed it when I found it had been commented out of
keystone-paste.ini, but even after fixing that and restarting keystone I
get:

-- grep req-a5030a83-f265-4b25-b6e5-1918c978f824 /var/log/keystone/keystone.log

2016-10-14 15:12:47.631 35977 WARNING keystone.middleware.auth
[req-a5030a83-f265-4b25-b6e5-1918c978f824 - - - - -] Deprecated:
build_auth_context middleware checking for the admin token is deprecated
as of the Mitaka release and will be removed in the O release. If your
deployment requires use of the admin token, update keystone-paste.ini so
that admin_token_auth is before build_auth_context in the paste pipelines,
otherwise remove the admin_token_auth middleware from the paste pipelines.

2016-10-14 15:12:47.671 35977 INFO keystone.common.wsgi
[req-a5030a83-f265-4b25-b6e5-1918c978f824 - - - - -] GET
https://nimbus-1.csail.mit.edu:35358/v2.0/tokens/

2016-10-14 15:12:47.672 35977 WARNING oslo_log.versionutils
[req-a5030a83-f265-4b25-b6e5-1918c978f824 - - - - -] Deprecated:
validate_token of the v2 API is deprecated as of Mitaka in favor of a
similar function in the v3 API and may be removed in Q.

2016-10-14 15:12:47.684 35977 WARNING keystone.common.wsgi
[req-a5030a83-f265-4b25-b6e5-1918c978f824 - - - - -] You are not
authorized to perform the requested action: identity:validate_token

I've dug through keystone/policy.json, and identity:validate_token is
authorized to "role:admin or is_admin:1", which I *think* should cover the
token use case... but not 100% sure.

Can radosgw use a proper keystone user so I can avoid the admin_token mess
(http://docs.ceph.com/docs/jewel/radosgw/keystone/ seems to indicate no)?

Or does anyone see where in my keystone chain I might have dropped a link?

Thanks,
-Jon
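[Editor's note] For what it's worth, Jewel-era radosgw does grow options
for authenticating against Keystone with a service user instead of the
shared admin token. A minimal sketch, assuming a Keystone v3 endpoint; the
user, password, domain, and project values below are hypothetical
placeholders, and the option set should be verified against your exact
10.2.x build:

```ini
; ceph.conf on the radosgw node: service-user auth instead of the
; deprecated rgw_keystone_admin_token (all values are placeholders)
[client.rgw.gateway]
rgw_keystone_url = https://keystone.example.com:35357
rgw_keystone_api_version = 3
rgw_keystone_admin_user = radosgw-svc
rgw_keystone_admin_password = secret
rgw_keystone_admin_domain = default
rgw_keystone_admin_project = service
rgw_keystone_accepted_roles = admin,_member_
```

If those options work on your build, the "seems to indicate no" in the
docs may simply be a documentation gap; worth trying on a staging gateway
first.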
Re: [ceph-users] cephfs slow delete
Unfortunately, it was all in the unlink operation. Looks as if it took
nearly 20 hours to remove the dir; round trip is a killer there. What can
be done to reduce RTT to the MDS? Does the client really have to
sequentially delete directories or can it have internal batching or
parallelization?

-Chris

On 10/13/16, 4:22 PM, "Gregory Farnum" wrote:

    On Thu, Oct 13, 2016 at 12:44 PM, Heller, Chris wrote:
    > I have a directory I’ve been trying to remove from cephfs (via
    > cephfs-hadoop); the directory is a few hundred gigabytes in size and
    > contains a few million files, but not in a single sub directory. I
    > started the delete yesterday at around 6:30 EST, and it’s still
    > progressing. I can see from (ceph osd df) that the overall data
    > usage on my cluster is decreasing, but at the rate it's going it
    > will be a month before the entire sub directory is gone. Is a
    > recursive delete of a directory known to be a slow operation in
    > CephFS, or have I hit upon some bad configuration? What steps can I
    > take to better debug this scenario?

    Is it the actual unlink operation taking a long time, or just the
    reduction in used space? Unlinks require a round trip to the MDS
    unfortunately, but you should be able to speed things up at least
    some by issuing them in parallel on different directories.

    If it's the used space, you can let the MDS issue more RADOS delete
    ops by adjusting the "mds max purge files" and "mds max purge ops"
    config values.
    -Greg
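[Editor's note] The "mds max purge files" / "mds max purge ops" tuning
Greg mentions would look something like the fragment below. The values are
purely illustrative, not recommendations; I believe the Jewel-era defaults
are 64 and 8192 respectively, but check them on your own build before
changing anything.

```ini
; ceph.conf on the MDS host: let the MDS keep more RADOS deletes in
; flight while it drains the stray directories (illustrative values)
[mds]
mds max purge files = 256     ; files being purged in parallel
mds max purge ops = 32768     ; in-flight RADOS delete ops
```

The same settings can usually be applied at runtime without a restart via
`ceph tell mds.* injectargs`, which makes it easy to back the change out
if purge traffic starts to hurt client IO.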
[ceph-users] Even data distribution across OSD - Impossible Achievement?
Hi all,

After encountering a warning about one of my OSDs running out of space, I
tried to study better how data distribution works. I'm running a Hammer
Ceph cluster, v0.94.7.

I did some tests with crushtool, trying to figure out how to achieve even
data distribution across OSDs. Let's take this simple CRUSH map:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable straw_calc_version 1
tunable chooseleaf_vary_r 1

# devices
# ceph-osd-001
device 0 osd.0    # sata-p
device 1 osd.1    # sata-p
device 3 osd.3    # sata-p
device 4 osd.4    # sata-p
device 5 osd.5    # sata-p
device 7 osd.7    # sata-p
device 9 osd.9    # sata-p
device 10 osd.10  # sata-p
device 11 osd.11  # sata-p
device 13 osd.13  # sata-p
# ceph-osd-002
device 14 osd.14  # sata-p
device 15 osd.15  # sata-p
device 16 osd.16  # sata-p
device 18 osd.18  # sata-p
device 19 osd.19  # sata-p
device 21 osd.21  # sata-p
device 23 osd.23  # sata-p
device 24 osd.24  # sata-p
device 25 osd.25  # sata-p
device 26 osd.26  # sata-p
# ceph-osd-003
device 28 osd.28  # sata-p
device 29 osd.29  # sata-p
device 30 osd.30  # sata-p
device 31 osd.31  # sata-p
device 32 osd.32  # sata-p
device 33 osd.33  # sata-p
device 34 osd.34  # sata-p
device 35 osd.35  # sata-p
device 36 osd.36  # sata-p
device 41 osd.41  # sata-p

# types
type 0 osd
type 1 server
type 3 datacenter

# buckets
### CEPH-OSD-003 ###
server ceph-osd-003-sata-p {
    id -12
    alg straw
    hash 0  # rjenkins1
    item osd.28 weight 1.000
    item osd.29 weight 1.000
    item osd.30 weight 1.000
    item osd.31 weight 1.000
    item osd.32 weight 1.000
    item osd.33 weight 1.000
    item osd.34 weight 1.000
    item osd.35 weight 1.000
    item osd.36 weight 1.000
    item osd.41 weight 1.000
}
### CEPH-OSD-002 ###
server ceph-osd-002-sata-p {
    id -9
    alg straw
    hash 0  # rjenkins1
    item osd.14 weight 1.000
    item osd.15 weight 1.000
    item osd.16 weight 1.000
    item osd.18 weight 1.000
    item osd.19 weight 1.000
    item osd.21 weight 1.000
    item osd.23 weight 1.000
    item osd.24 weight 1.000
    item osd.25 weight 1.000
    item osd.26 weight 1.000
}
### CEPH-OSD-001 ###
server ceph-osd-001-sata-p {
    id -5
    alg straw
    hash 0  # rjenkins1
    item osd.0 weight 1.000
    item osd.1 weight 1.000
    item osd.3 weight 1.000
    item osd.4 weight 1.000
    item osd.5 weight 1.000
    item osd.7 weight 1.000
    item osd.9 weight 1.000
    item osd.10 weight 1.000
    item osd.11 weight 1.000
    item osd.13 weight 1.000
}
# DATACENTER
datacenter dc1 {
    id -1
    alg straw
    hash 0  # rjenkins1
    item ceph-osd-001-sata-p weight 10.000
    item ceph-osd-002-sata-p weight 10.000
    item ceph-osd-003-sata-p weight 10.000
}

# rules
rule sata-p {
    ruleset 0
    type replicated
    min_size 2
    max_size 10
    step take dc1
    step chooseleaf firstn 0 type server
    step emit
}
# end crush map

Basically it's 30 OSDs spanned across 3 servers. One rule exists, the
classic 3-replica one.

cephadm@cephadm01:/etc/ceph/$ crushtool -i crushprova.c --test --show-utilization --num-rep 3 --tree --max-x 1

ID   WEIGHT  TYPE NAME
 -1  30.0    datacenter milano1
 -5  10.0        server ceph-osd-001-sata-p
  0   1.0            osd.0
  1   1.0            osd.1
  3   1.0            osd.3
  4   1.0            osd.4
  5   1.0            osd.5
  7   1.0            osd.7
  9   1.0            osd.9
 10   1.0            osd.10
 11   1.0            osd.11
 13   1.0            osd.13
 -9  10.0        server ceph-osd-002-sata-p
 14   1.0            osd.14
 15   1.0            osd.15
 16   1.0            osd.16
 18   1.0            osd.18
 19   1.0            osd.19
 21   1.0            osd.21
 23   1.0            osd.23
 24   1.0            osd.24
 25   1.0            osd.25
 26   1.0            osd.26
-12  10.0        server ceph-osd-003-sata-p
 28   1.0            osd.28
 29   1.0            osd.29
 30   1.0            osd.30
 31   1.0            osd.31
 32   1.0            osd.32
 33   1.0            osd.33
 34   1.0            osd.34
 35   1.0            osd.35
 36   1.0            osd.36
 41   1.0            osd.41

rule 0 (sata-performance), x = 0..1023, numrep = 3..3
rule 0 (sata-performance) num_rep 3 result size == 3: 1024/1024
  device 0:  stored : 95   expected : 102.49
  device 1:  stored : 95   expected : 102.49
  device 3:  stored : 104  expected : 102.49
  device 4:  stored : 95   expected : 102.49
  device 5:  stored : 110  expected : 102.49
  device 7:  stored : 111  expected : 102.49
  device 9:  stored : 106  expected : 102.49
  device 10: stored : 97   expected : 102.49
  device 11: stored : 105  expected : 102.49
  device 13: stored : 106  expected : 102.49
  device 14: stored : 107  expected : 102.49
  device 15: stored : 107  expected : 102.49
  device 16: stored : 101  expected : 102.49
  device 18: stored : 93   expected : 102.49
  device 19: stored : 102  expected : 102.49
  device 21: stored : 112  expected : 102.49
  device 23: stored : 115  expected : 102.49
  device 24: stored : 95   expected : 102.49
  device 25: stored : 98   expected : 102.49
  device 26: stored : 94   expected : 102.49
  device 28: stored : 92   expected : 102.49
  device 29: stored : 87   expected : 102.49
  device 30: stored : 109  expected : 102.49
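[Editor's note] A quick way to quantify the imbalance the crushtool run
predicts is to compute the spread over the per-device "stored" counts. The
values below are copied from the (truncated) output above, 23 devices'
worth, each with an expected share of 102.49:

```shell
#!/bin/bash
# Spread of the per-device "stored" counts from the crushtool run above.
stored="95 95 104 95 110 111 106 97 105 106 107 107 101 93 102 112 115 95 98 94 92 87 109"
echo "$stored" | tr ' ' '\n' | awk '
  NR == 1 { min = max = $1 }
  { sum += $1; if ($1 > max) max = $1; if ($1 < min) min = $1 }
  END { mean = sum / NR;
        printf "mean=%.1f min=%d max=%d spread=%.1f%%\n",
               mean, min, max, 100 * (max - min) / mean }'
# prints: mean=101.6 min=87 max=115 spread=27.6%
```

A gap of roughly a quarter of the mean between the emptiest and fullest
device is normal behaviour for straw placement at this sample size; more
PGs (or reweighting) narrows it but never eliminates it.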
Re: [ceph-users] resolve split brain situation in ceph cluster
On Fri, Oct 14, 2016 at 7:27 AM, Manuel Lausch wrote:
> Hi,
>
> I need some help to fix a broken cluster. I think we broke the cluster,
> but I want to know your opinion and if you see a possibility to recover
> it.
>
> Let me explain what happened.
>
> We have a cluster (version 0.94.9) in two datacenters (A and B), with 12
> nodes of 60 OSDs each per datacenter. In A we have 3 monitor nodes and
> in B 2. The crush rule and replication factor force two replicas in each
> datacenter.
>
> We write objects via librados into the cluster. The objects are
> immutable, so they are either present or absent.
>
> In this cluster we tested what happens if datacenter A fails and we need
> to bring up the cluster in B by creating a monitor quorum in B. We did
> this by cutting off the network connection between the two datacenters.
> The OSDs from DC B went down as expected. Then we removed the mon nodes
> from the monmap in B (by extracting it offline and editing it). Our
> clients then wrote data into both independent cluster parts before we
> stopped the mons in A. (YES, I know. This is a really bad thing.)

This story line seems to be missing some points. How did you cut off the
network connection? What leads you to believe the OSDs accepted writes on
both sides of the split? Did you edit the monmap in both data centers, or
just DC A (that you wanted to remain alive)? What monitor counts do you
have in each DC?
-Greg

> Now we try to join the two sides again. But so far without success.
>
> Only the OSDs in B are running. The OSDs in A started, but they stay
> down. In the mon log we see a lot of „...(leader).pg v3513957 ignoring
> stats from non-active osd“ alerts.
>
> We see that the current osdmap epoch in the running cluster is „28873“,
> while on the OSDs in A the epoch is „29003“. We assume that this is the
> reason why the OSDs won't join.
>
> BTW: This is only a test cluster, so no important data are harmed.
>
> Regards
> Manuel
[ceph-users] resolve split brain situation in ceph cluster
Hi,

I need some help to fix a broken cluster. I think we broke the cluster,
but I want to know your opinion and if you see a possibility to recover
it.

Let me explain what happened.

We have a cluster (version 0.94.9) in two datacenters (A and B), with 12
nodes of 60 OSDs each per datacenter. In A we have 3 monitor nodes and in
B 2. The crush rule and replication factor force two replicas in each
datacenter.

We write objects via librados into the cluster. The objects are immutable,
so they are either present or absent.

In this cluster we tested what happens if datacenter A fails and we need
to bring up the cluster in B by creating a monitor quorum in B. We did
this by cutting off the network connection between the two datacenters.
The OSDs from DC B went down as expected. Then we removed the mon nodes
from the monmap in B (by extracting it offline and editing it). Our
clients then wrote data into both independent cluster parts before we
stopped the mons in A. (YES, I know. This is a really bad thing.)

Now we try to join the two sides again. But so far without success.

Only the OSDs in B are running. The OSDs in A started, but they stay down.
In the mon log we see a lot of „...(leader).pg v3513957 ignoring stats
from non-active osd“ alerts.

We see that the current osdmap epoch in the running cluster is „28873“,
while on the OSDs in A the epoch is „29003“. We assume that this is the
reason why the OSDs won't join.

BTW: This is only a test cluster, so no important data are harmed.

Regards
Manuel

--
Manuel Lausch

Systemadministrator
Cloud Services

1&1 Mail & Media Development & Technology GmbH | Brauerstraße 48 | 76135
Karlsruhe | Germany
Phone: +49 721 91374-1847
E-Mail: manuel.lau...@1und1.de | Web: www.1und1.de

Amtsgericht Montabaur, HRB 5452

Geschäftsführer: Frank Einhellinger, Thomas Ludwig, Jan Oetjen

Member of United Internet

This e-mail may contain confidential and/or privileged information. If you
are not the intended recipient of this e-mail, you are hereby notified
that saving, distribution or use of the content of this e-mail in any way
is prohibited. If you have received this e-mail in error, please notify
the sender and delete the e-mail.
[ceph-users] Calc the number of shards needed for a bucket
Hi,

I'd like to know if someone of you has some kind of a formula to set the
right number of shards for a bucket.

We currently have a bucket with 30M objects and expect that it will grow
to 50M. At the moment we have 64 shards configured, but I was told that
this is much too low.

Any hints / formulas for me?

Thanks,
Ansgar
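[Editor's note] A commonly cited rule of thumb (a guideline, not an
official hard limit) is to keep on the order of 100,000 objects per index
shard. For an expected 50M objects that works out to:

```shell
#!/bin/bash
# Rough shard sizing: ceil(expected_objects / objects_per_shard).
# 100000 objects/shard is a commonly cited guideline, not a hard limit.
objects=50000000
per_shard=100000
shards=$(( (objects + per_shard - 1) / per_shard ))
echo "suggested shard count: $shards"
# prints: suggested shard count: 500
```

Note that in this era the shard count (e.g. via
rgw_override_bucket_index_max_shards) only applies to newly created
buckets; changing an existing bucket means resharding it, which, as far as
I recall, is an offline radosgw-admin operation in Jewel.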
Re: [ceph-users] Stuck at "Setting up ceph-osd (10.2.3-1~bpo80+1)"
On 16-10-13 22:46, Chris Murray wrote:
> On 13/10/2016 11:49, Henrik Korkuc wrote:
>> Is apt/dpkg doing something now? Is the problem repeatable, e.g. by
>> killing the upgrade and starting it again? Are there any stuck
>> systemctl processes? I had no problems upgrading 10.2.x clusters to
>> 10.2.3.
>>
>> On 16-10-13 13:41, Chris Murray wrote:
>>> On 22/09/2016 15:29, Chris Murray wrote:
>>>> Hi all,
>>>>
>>>> Might anyone be able to help me troubleshoot an "apt-get
>>>> dist-upgrade" which is stuck at "Setting up ceph-osd
>>>> (10.2.3-1~bpo80+1)"? I'm upgrading from 10.2.2. The two OSDs on this
>>>> node are up, and think they are version 10.2.3, but the upgrade
>>>> doesn't appear to be finishing ... ?
>>>>
>>>> Thank you in advance,
>>>> Chris
>>>
>>> Hi,
>>>
>>> Are there possibly any pointers to help troubleshoot this? I've got a
>>> test system on which the same thing has happened. The cluster's
>>> status is "HEALTH_OK" before starting. I'm running Debian Jessie.
>>>
>>> dpkg.log only has the following:
>>>
>>> 2016-10-13 11:37:25 configure ceph-osd:amd64 10.2.3-1~bpo80+1
>>> 2016-10-13 11:37:25 status half-configured ceph-osd:amd64 10.2.3-1~bpo80+1
>>>
>>> At this point, the upgrade gets stuck and doesn't go any further.
>>> Where could I look for the next clue?
>>>
>>> Thanks,
>>> Chris
>
> Thank you Henrik,
>
> I see it's a systemctl process that's stuck. It is reproducible for me
> on every run of
>
> dpkg --configure -a
>
> And, indeed, reproducible across two separate machines. I'll pursue the
> stuck "/bin/systemctl start ceph-osd.target".

You can try to check if `systemctl daemon-reexec` helps to solve this
problem. I couldn't find a link quickly, but it seems that Jessie systemd
sometimes manages to get stuck on systemctl calls.

> Thanks again,
> Chris