Re: [ceph-users] Can't upgrade to MDS version 12.2.8

2018-09-02 Thread Yan, Zheng
On Mon, Sep 3, 2018 at 1:57 AM Marlin Cremers wrote: > > Hey there, > > So I now have a problem since none of my MDSes can start anymore. > > They are stuck in the resolve state since Ceph thinks there are still MDSes > alive, which I can see when I run: > need mds log to check why mds are
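To capture the MDS log being asked for here, one possible approach (a sketch only; the mds name is a placeholder and assumes the admin socket is reachable on the MDS host) is to raise the debug level and watch the daemon log:
$ ceph daemon mds.<name> config set debug_mds 20   # raise MDS logging verbosity
$ tail -f /var/log/ceph/ceph-mds.<name>.log        # watch while the MDS sits in resolve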

[ceph-users] Packages for debian in Ceph repo

2018-09-02 Thread aradian
Can anyone confirm if the Ceph repos for Debian/Ubuntu contain packages for Debian? I'm not seeing any, but maybe I'm missing something... I'm seeing ceph-deploy install an older version of Ceph on the nodes (from the Debian repo) and then fail when I run "ceph-deploy osd ..." because ceph-

Re: [ceph-users] "no valid command found" when running "ceph-deploy osd create"

2018-09-02 Thread David Wahler
On Sun, Sep 2, 2018 at 1:31 PM Alfredo Deza wrote: > > On Sun, Sep 2, 2018 at 12:00 PM, David Wahler wrote: > > Ah, ceph-volume.log pointed out the actual problem: > > > > RuntimeError: Cannot use device (/dev/storage/bluestore). A vg/lv path > > or an existing device is needed > > That is odd,

Re: [ceph-users] "no valid command found" when running "ceph-deploy osd create"

2018-09-02 Thread Alfredo Deza
On Sun, Sep 2, 2018 at 12:00 PM, David Wahler wrote: > Ah, ceph-volume.log pointed out the actual problem: > > RuntimeError: Cannot use device (/dev/storage/bluestore). A vg/lv path > or an existing device is needed That is odd, is it possible that the error log wasn't the one that matched what

Re: [ceph-users] Understanding the output of dump_historic_ops

2018-09-02 Thread ceph
Hi Ronnie, On 2 September 2018 at 13:32:05 MESZ, Ronnie Lazar wrote: >Hello, > >I'm trying to understand the output of the dump_historic_ops admin sock >command. >I can't find information on what are the meaning of the different >states >that an OP can be in. >For example, in the following
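For context, a minimal sketch of how that output is obtained (the osd id is a placeholder); each entry's "events" list is a sequence of timestamped stages the op has passed through:
$ ceph daemon osd.<id> dump_historic_ops    # recent slow/completed ops kept by the op tracker
$ ceph daemon osd.<id> dump_ops_in_flight   # ops currently being processed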

Re: [ceph-users] Can't upgrade to MDS version 12.2.8

2018-09-02 Thread Marlin Cremers
Hey there, So I now have a problem since none of my MDSes can start anymore. They are stuck in the resolve state since Ceph thinks there are still MDSes alive, which I can see when I run: ceph mds deactivate k8s:0 Error EEXIST: mds.4:0 not active (???) ceph mds deactivate k8s:1 Error EEXIST:

Re: [ceph-users] Help Basically..

2018-09-02 Thread David Turner
Agreed on not zapping the disks until your cluster is healthy again. Marking them out and seeing how healthy you can get in the meantime is a good idea. On Sun, Sep 2, 2018, 1:18 PM Ronny Aasen wrote: > On 02.09.2018 17:12, Lee wrote: > > Should I just out the OSD's first or completely zap them

Re: [ceph-users] Help Basically..

2018-09-02 Thread Ronny Aasen
On 02.09.2018 17:12, Lee wrote: Should I just out the OSD's first or completely zap them and recreate? Or delete and let the cluster repair itself? On the second node, when it started back up, I had problems with the journals for IDs 5 and 7; they were also recreated. All the rest are still the

[ceph-users] Professional Support Required

2018-09-02 Thread Lee
Hi, I follow and use your articles regularly to help with our Ceph environment. I am looking for urgent help with our infrastructure after a series of outages over the weekend brought our Ceph environment to its knees. The system is 0.94.5 and deployed as part of OpenStack. In the series

Re: [ceph-users] Help Basically..

2018-09-02 Thread Lee
Ok, rather than going gung-ho at this... 1. I have set out 31, 24, 21, 18, 15, 14, 13, 6 and 7, 5 (10 is a new OSD), which gives me: ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY -1 23.65970 root default -5 8.18990 host data33-a4 13 0.90999 osd.13 up 0
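A minimal sketch of marking that list of OSDs out in one pass (ids taken from the list above):
$ for id in 31 24 21 18 15 14 13 7 6 5; do ceph osd out $id; done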

Re: [ceph-users] "no valid command found" when running "ceph-deploy osd create"

2018-09-02 Thread David Wahler
Ah, ceph-volume.log pointed out the actual problem: RuntimeError: Cannot use device (/dev/storage/bluestore). A vg/lv path or an existing device is needed When I changed "--data /dev/storage/bluestore" to "--data storage/bluestore", everything worked fine. I agree that the ceph-deploy logs are
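For reference, a sketch of the corrected invocation, with the hostname as a placeholder; the LV is referenced as vg/lv rather than as a /dev path:
$ ceph-deploy osd create --data storage/bluestore <hostname>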

Re: [ceph-users] Help Basically..

2018-09-02 Thread Lee
Should I just out the OSD's first or completely zap them and recreate? Or delete and let the cluster repair itself? On the second node, when it started back up, I had problems with the journals for IDs 5 and 7; they were also recreated. All the rest are still the originals. I know that some PG's are

Re: [ceph-users] Help Basically..

2018-09-02 Thread David Turner
The problem is that `ceph-osd --flush-journal` never ran successfully against the old SSD journal drive. All of the OSDs that used the dead journal need to be removed from the cluster, wiped, and added back in. The data on them is not 100% consistent because the old journal died. Any word
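A rough sketch of removing one such OSD on a Hammer-era cluster, using osd.5 purely as an example id (the daemon stop command depends on the node's init system):
$ ceph osd out 5
# stop the ceph-osd daemon for id 5 on its host, then:
$ ceph osd crush remove osd.5
$ ceph auth del osd.5
$ ceph osd rm 5
# afterwards zap the disk and re-create the OSD against a healthy journal device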

Re: [ceph-users] Help Basically..

2018-09-02 Thread Lee
I followed:
$ journal_uuid=$(sudo cat /var/lib/ceph/osd/ceph-0/journal_uuid)
$ sudo sgdisk --new=1:0:+20480M --change-name=1:'ceph journal' --partition-guid=1:$journal_uuid --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdk
Then
$ sudo ceph-osd --mkjournal -i 20
$ sudo

Re: [ceph-users] Help Basically..

2018-09-02 Thread Lee
> > > Hi David, > > Yes, health detail outputs all the errors etc and recovery / backfill is > going on, just taking time; 25% misplaced and 1.5% degraded. > > I can list out the pools and see sizes etc.. > > My main problem is I have no client IO from a read perspective, I cannot > start VMs in

Re: [ceph-users] Help Basically..

2018-09-02 Thread Lee
Hi David, Yes, health detail outputs all the errors etc and recovery / backfill is going on, just taking time; 25% misplaced and 1.5% degraded. I can list out the pools and see sizes etc.. My main problem is I have no client IO from a read perspective, I cannot start VMs in OpenStack and ceph -w

Re: [ceph-users] Help Basically..

2018-09-02 Thread David Turner
When the first node went offline with a dead SSD journal, all of the data on the OSDs was useless. Unless you could flush the journals, you can't guarantee that a write the cluster thinks happened actually made it to the disk. The proper procedure here is to remove those OSDs and add them again as

Re: [ceph-users] 3x replicated rbd pool ssd data spread across 4 osd's

2018-09-02 Thread Jack
Well, you have more than one pool here. pg_num = 8, size = 3 -> 24 PGs; the extra 48 PGs come from somewhere else. About the PG distribution, check out the balancer module. tldr: that distribution is computed based on an algorithm, it is thus predictable (that is the point) but the perfect
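A quick way to check the per-pool numbers behind that arithmetic (a sketch; pool names are whatever the cluster defines):
$ ceph osd pool ls detail   # shows pg_num and replica size for every pool
# e.g. this pool contributes 8 PGs x 3 replicas = 24 PG copies; the ~72 copies
# visible across the 4 OSDs mean roughly 48 copies belong to other pools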

Re: [ceph-users] Slow requests from bluestore osds

2018-09-02 Thread Brett Chancellor
The warnings look like this:
6 ops are blocked > 32.768 sec on osd.219
1 osds have slow requests
On Sun, Sep 2, 2018, 8:45 AM Alfredo Deza wrote: > On Sat, Sep 1, 2018 at 12:45 PM, Brett Chancellor > wrote: > > Hi Cephers, > > I am in the process of upgrading a cluster from Filestore to
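One way to dig into those (a sketch, using osd.219 from the warning above; the daemon commands have to be run on the host carrying that OSD):
$ ceph health detail                        # lists which OSDs have slow requests
$ ceph daemon osd.219 dump_ops_in_flight    # what the blocked ops are waiting on
$ ceph daemon osd.219 dump_historic_ops     # recently completed slow ops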

Re: [ceph-users] 3x replicated rbd pool ssd data spread across 4 osd's

2018-09-02 Thread Marc Roos
So that changes the question to: why is ceph not distributing the pg's evenly across four osd's?
[@c01 ~]# ceph osd df | egrep '^19|^20|^21|^30'
19 ssd 0.48000 1.0 447G 133G 313G 29.81 0.70 16
20 ssd 0.48000 1.0 447G 158G 288G 35.40 0.83 19
21 ssd 0.48000 1.0

[ceph-users] Can't upgrade to MDS version 12.2.8

2018-09-02 Thread Marlin Cremers
Hello there, I've tried setting up some MDS VMs with version 12.2.8 but they are unable to replay, which appears to be caused by an error on the monitors:
2018-09-01 18:52:39.101001 7fb7c4c4f700  1 mon.mon2@1(peon).mds e3320 mds mds.? 10.14.4.62:6800/605610442 can't write to fsmap

Re: [ceph-users] Slow requests from bluestore osds

2018-09-02 Thread Alfredo Deza
On Sat, Sep 1, 2018 at 12:45 PM, Brett Chancellor wrote: > Hi Cephers, > I am in the process of upgrading a cluster from Filestore to bluestore, > but I'm concerned about frequent warnings popping up against the new > bluestore devices. I'm frequently seeing messages like this, although the >

Re: [ceph-users] "no valid command found" when running "ceph-deploy osd create"

2018-09-02 Thread Alfredo Deza
There should be useful logs from ceph-volume in /var/log/ceph/ceph-volume.log that might show a bit more here. I would also try the command that fails directly on the server (sans ceph-deploy) to see what it is that is actually failing. Seems like the ceph-deploy log output is a bit out of order
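A sketch of what running it directly on the node might look like, assuming the same vg/lv used elsewhere in this thread:
$ ceph-volume lvm create --bluestore --data storage/bluestore
$ tail -f /var/log/ceph/ceph-volume.log   # ceph-volume's own log, independent of ceph-deploy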

Re: [ceph-users] 3x replicated rbd pool ssd data spread across 4 osd's

2018-09-02 Thread Jack
ceph osd df will get you more information: variation & PG count for each OSD. Ceph does not place data on a per-object basis, but on a per-PG basis; the data distribution is thus not perfect. You may increase your pg_num, and/or use the mgr balancer module.
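A minimal sketch of turning the balancer on for a Luminous or later cluster (upmap mode additionally requires all clients to be at least Luminous):
$ ceph mgr module enable balancer
$ ceph balancer mode upmap      # or crush-compat for older clients
$ ceph balancer on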

Re: [ceph-users] Help Basically..

2018-09-02 Thread David C
Does "ceph health detail" work? Have you manually confirmed the OSDs on the nodes are working? What was the replica size of the pools? Are you seeing any progress with the recovery? On Sun, Sep 2, 2018 at 9:42 AM Lee wrote: > Running 0.94.5 as part of a Openstack enviroment, our ceph setup is

[ceph-users] Understanding the output of dump_historic_ops

2018-09-02 Thread Ronnie Lazar
Hello, I'm trying to understand the output of the dump_historic_ops admin sock command. I can't find information on the meaning of the different states that an OP can be in. For example, in the following excerpt: { "description": "MOSDPGPush(1.a5 421/239

[ceph-users] 3x replicated rbd pool ssd data spread across 4 osd's

2018-09-02 Thread Marc Roos
If I have only one rbd ssd pool, 3x replicated, and 4 ssd osd's, why are these objects so unevenly spread across the four osd's? Should they all not have 162G?
[@c01 ]# ceph osd status 2>&1
| id | host | used |

[ceph-users] No announce for 12.2.8 / available in repositories

2018-09-02 Thread Nicolas Huillard
Hi all, I just noticed that 12.2.8 was available on the repositories, without any announcement. Since upgrading to unannounced 12.2.6 was a bad idea, I'll wait a bit anyway ;-) Where can I find info on this bugfix release? Nothing there: http://lists.ceph.com/pipermail/ceph-announce-ceph.com/ TIA

[ceph-users] Help Basically..

2018-09-02 Thread Lee
Running 0.94.5 as part of an OpenStack environment, our Ceph setup is 3x OSD nodes and 3x MON nodes. Yesterday we had an aircon outage in our hosting environment: 1 OSD node failed (offline with the journal SSD dead), leaving us with 2 nodes running correctly. 2 hours later a second OSD node failed, complaining

[ceph-users] "no valid command found" when running "ceph-deploy osd create"

2018-09-02 Thread David Wahler
Hi all, I'm attempting to get a small Mimic cluster running on ARM, starting with a single node. Since there don't seem to be any Debian ARM64 packages in the official Ceph repository, I had to build from source, which was fairly straightforward. After installing the .deb packages that I built