Re: [ceph-users] HEALTH_WARN 1 MDSs report oversized cache

2019-12-05 Thread Ranjan Ghosh
Ah, I understand now. Makes a lot of sense. Well, we have a LOT of small
files, so that might be the reason. I'll keep an eye on whether the
message shows up again.

Thank you!

Ranjan


On 05.12.19 at 19:40, Patrick Donnelly wrote:
> On Thu, Dec 5, 2019 at 9:45 AM Ranjan Ghosh  wrote:
>> Ah, that seems to have fixed it. Hope it stays that way. I've raised it
>> to 4 GB. Thanks to you both!
> Just be aware the warning could come back. You just moved the goal posts.
>
> The 1GB default is probably too low for most deployments, I have a PR
> to increase this: https://github.com/ceph/ceph/pull/32042
>
>> Although I have to say that the message is IMHO *very* misleading: "1
>> MDSs report oversized cache" sounds to me like the cache is too large
>> (i.e. wasting RAM unnecessarily). Shouldn't the message rather be "1
>> MDSs report *undersized* cache"? Weird.
> No. It means the MDS cache is larger than its target. This means the
> MDS cannot trim its cache to go back under the limit. This could be
> for many reasons but probably due to clients not releasing
> capabilities, perhaps due to a bug.
>
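
For reference, a minimal sketch of how the limit can be raised persistently
on a Nautilus-era cluster (assuming the centralized config store; 4294967296
is simply the 4 GB figure discussed above, expressed in bytes):

===

# store the new limit for all MDS daemons
ceph config set mds mds_cache_memory_limit 4294967296

# confirm the value the cluster now hands out
ceph config get mds mds_cache_memory_limit

===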


Re: [ceph-users] HEALTH_WARN 1 MDSs report oversized cache

2019-12-05 Thread Ranjan Ghosh
Hi,

Ah, that seems to have fixed it. Hope it stays that way. I've raised it
to 4 GB. Thanks to you both!

Although I have to say that the message is IMHO *very* misleading: "1
MDSs report oversized cache" sounds to me like the cache is too large
(i.e. wasting RAM unnecessarily). Shouldn't the message rather be "1
MDSs report *undersized* cache"? Weird.

That's why I was wondering how small it should be to make Ceph happy but
still be sufficient. If I had known that this message meant that the
cache is too small, then I would have obviously just raised it until the
message disappeared.

Thanks again for your help! Much appreciated.

BR

Ranjan


On 05.12.19 at 16:47, Nathan Fish wrote:
> MDS cache size scales with the number of files recently opened by
> clients. If you have RAM to spare, increase "mds cache memory limit".
> I have raised mine from the default of 1 GiB to 32 GiB. My rough
> estimate is 2.5 KiB per inode in recent use.
>
>
> On Thu, Dec 5, 2019 at 10:39 AM Ranjan Ghosh  wrote:
>> Okay, now, after I settled the issue with the oneshot service thanks to
>> the amazing help of Paul and Richard (thanks again!), I still wonder:
>>
>> What could I do about that MDS warning:
>>
>> ===
>>
>> health: HEALTH_WARN
>>
>> 1 MDSs report oversized cache
>>
>> ===
>>
>> If anybody has any ideas? I tried googling it, of course, but came up
>> with no really relevant info on how to actually solve this.
>>
>>
>> BR
>>
>> Ranjan
>>
>>
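
Going by Nathan's rough 2.5 KiB-per-inode estimate, a back-of-the-envelope
check of what a given limit buys (approximate figures, not an official formula):

===

1 GiB default:  1048576 KiB / 2.5 KiB  ≈  ~420,000 recently used inodes
4 GiB limit:    4194304 KiB / 2.5 KiB  ≈  ~1.7 million recently used inodes

===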



Re: [ceph-users] HEALTH_WARN 1 MDSs report oversized cache

2019-12-05 Thread Ranjan Ghosh
Okay, now, after I settled the issue with the oneshot service thanks to
the amazing help of Paul and Richard (thanks again!), I still wonder:

What could I do about that MDS warning:

===

health: HEALTH_WARN

1 MDSs report oversized cache

===

If anybody has any ideas? I tried googling it, of course, but came up
with no really relevant info on how to actually solve this.


BR

Ranjan




Re: [ceph-users] What does the ceph-volume@simple-crazyhexstuff SystemD service do? And what to do about oversized MDS cache?

2019-12-05 Thread Ranjan Ghosh
Hi Richard,

Ah, I think I understand, now, brilliant. It's *supposed* to do exactly
that. Mount it once on boot and then just exit. So everything is working
as intended. Great.

Thanks

Ranjan


On 05.12.19 at 15:18, Richard wrote:
> On 2019-12-05 7:19 AM, Ranjan Ghosh wrote:
>> Why is my service marked inactive/dead? Shouldn't it be running?
>
> Look up the systemd service type "oneshot".  The service did its job of
> performing the mount and has now exited.
>
> Systemd is a beast.  It does many things.  A service isn't a daemon.
> It's different from BSD init or SysV init.
>
> Congrats on your upgrade, btw!  good job!
>
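
For anyone curious, a generic sketch of what a oneshot unit looks like (this
is an illustration, not the literal contents of ceph-volume@.service, which
can be inspected with "systemctl cat ceph-volume@.service"):

===

[Unit]
Description=Example one-shot activation task

[Service]
Type=oneshot
# Without RemainAfterExit=yes the unit shows up as "inactive (dead)" after a
# successful run - which is exactly what the status output discussed here shows.
ExecStart=/bin/true

===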



Re: [ceph-users] What does the ceph-volume@simple-crazyhexstuff SystemD service do? And what to do about oversized MDS cache?

2019-12-05 Thread Ranjan Ghosh
Hi Paul,

thanks for the explanation. I didn't know about the JSON file yet.
That's certainly good to know. What I still don't understand, though:
Why is my service marked inactive/dead? Shouldn't it be running?

If I run:

systemctl start
ceph-volume@simple-0-6585a10b-917f-4458-a464-b4dd729ef174.service

nothing seems to happen:

===

root@yak1 /etc/ceph/osd # systemctl status
ceph-volume@simple-0-6585a10b-917f-4458-a464-b4dd729ef174.service
● ceph-volume@simple-0-6585a10b-917f-4458-a464-b4dd729ef174.service -
Ceph Volume activation: simple-0-6585a10b-917f-4458-a464-b4dd729ef174
   Loaded: loaded (/lib/systemd/system/ceph-volume@.service; enabled;
vendor preset: enabled)
   Active: inactive (dead) since Thu 2019-12-05 14:14:08 CET; 2min 13s ago
 Main PID: 27281 (code=exited, status=0/SUCCESS)

Dec 05 14:14:08 yak1 systemd[1]: Starting Ceph Volume activation:
simple-0-6585a10b-917f-4458-a464-b4dd729ef174...
Dec 05 14:14:08 yak1 systemd[1]:
ceph-volume@simple-0-6585a10b-917f-4458-a464-b4dd729ef174.service:
Current command vanished from the unit file, execution of the command
Dec 05 14:14:08 yak1 sh[27281]: Running command: /usr/sbin/ceph-volume
simple trigger 0-6585a10b-917f-4458-a464-b4dd729ef174
Dec 05 14:14:08 yak1 systemd[1]:
ceph-volume@simple-0-6585a10b-917f-4458-a464-b4dd729ef174.service:
Succeeded.
Dec 05 14:14:08 yak1 systemd[1]: Started Ceph Volume activation:
simple-0-6585a10b-917f-4458-a464-b4dd729ef174.

===

It says status=0/SUCCESS and "Succeeded" in the log. But then again, why
is "Started Ceph Volume activation" the last log entry? It sounds like
something is unfinished.

The mount point seems to be mounted perfectly, though:

/dev/sdb1 on /var/lib/ceph/osd/ceph-0 type xfs
(rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,noquota)

Shouldn't that service be running continually?


BR

Ranjan


On 05.12.19 at 13:25, Paul Emmerich wrote:
> The ceph-volume services make sure that the right partitions are
> mounted at /var/lib/ceph/osd/ceph-X
>
> In "simple" mode the service gets the necessary information from a
> json file (long-hex-string.json) in /etc/ceph
>
> ceph-volume simple scan/activate create the json file and systemd unit.
>
> ceph-disk used udev instead for the activation which was *very* messy
> and a frequent cause of long startup delays (seen > 40 minutes on
> encrypted ceph-disk OSDs)
>
> Paul
>
> -- 
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
>
> On Thu, Dec 5, 2019 at 1:03 PM Ranjan Ghosh <gh...@pw6.de> wrote:
>
> Hi all,
>
> After upgrading to Ubuntu 19.10 and consequently from Mimic to
> Nautilus, I had a mini-shock when my OSDs didn't come up. Okay, I
> should have read the docs more closely, I had to do:
>
> # ceph-volume simple scan /dev/sdb1
>
> # ceph-volume simple activate --all
>
> Hooray. The OSDs came back to life. And I saw that some weird
> services were created. Didn't give that much thought at first, but
> later I noticed there is now a new service in town:
>
> ===
>
> root@yak1 ~ # systemctl status
> ceph-volume@simple-0-6585a10b-917f-4458-a464-b4dd729ef174.service
>
> ceph-volume@simple-0-6585a10b-917f-4458-a464-b4dd729ef174.service
> - Ceph Volume activation: simple-0-6585a10b-917f-4458-a464-b4dd729ef174
>    Loaded: loaded (/lib/systemd/system/ceph-volume@.service;
> enabled; vendor preset: enabled)
>    Active: inactive (dead) since Wed 2019-12-04 23:29:15 CET; 13h ago
>  Main PID: 10048 (code=exited, status=0/SUCCESS)
>
> ===
>
> Hmm. It's dead. But my cluster is alive & kicking, though.
> Everything is working. Why is this needed? Should I be worried? Or
> can I safely delete that service from /etc/systemd/... since it's
> not running anyway?
>
> Another, probably minor issue:
>
> I still get a HEALTH_WARN "1 MDSs report oversized cache". But it
> doesn't tell me any details and I cannot find anything in the
> logs. What should I do to resolve this? Set
> mds_cache_memory_limit? How do I determine an acceptable value?
>
>
> Thank you / Best regards
>
> Ranjan
>
>
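
A quick, hedged way to double-check that the simple-mode plumbing Paul
describes is all in place (the UUID is the one from the status output above):

===

ls /etc/ceph/osd/*.json
systemctl is-enabled ceph-volume@simple-0-6585a10b-917f-4458-a464-b4dd729ef174.service
findmnt /var/lib/ceph/osd/ceph-0

===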


[ceph-users] What does the ceph-volume@simple-crazyhexstuff SystemD service do? And what to do about oversized MDS cache?

2019-12-05 Thread Ranjan Ghosh
Hi all,

After upgrading to Ubuntu 19.10 and consequently from Mimic to Nautilus,
I had a mini-shock when my OSDs didn't come up. Okay, I should have read
the docs more closely, I had to do:

# ceph-volume simple scan /dev/sdb1

# ceph-volume simple activate --all

Hooray. The OSDs came back to life. And I saw that some weird services
were created. Didn't give that much thought at first, but later I
noticed there is now a new service in town:

===

root@yak1 ~ # systemctl status
ceph-volume@simple-0-6585a10b-917f-4458-a464-b4dd729ef174.service
ceph-volume@simple-0-6585a10b-917f-4458-a464-b4dd729ef174.service - Ceph
Volume activation: simple-0-6585a10b-917f-4458-a464-b4dd729ef174
   Loaded: loaded (/lib/systemd/system/ceph-volume@.service; enabled;
vendor preset: enabled)
   Active: inactive (dead) since Wed 2019-12-04 23:29:15 CET; 13h ago
 Main PID: 10048 (code=exited, status=0/SUCCESS)

===

Hmm. It's dead. But my cluster is alive & kicking, though. Everything is
working. Why is this needed? Should I be worried? Or can I safely delete
that service from /etc/systemd/... since it's not running anyway?

Another, probably minor issue:

I still get a HEALTH_WARN "1 MDSs report oversized cache". But it
doesn't tell me any details and I cannot find anything in the logs. What
should I do to resolve this? Set mds_cache_memory_limit? How do I
determine an acceptable value?


Thank you / Best regards

Ranjan



Re: [ceph-users] HEALTH_WARN - 3 modules have failed dependencies

2019-05-01 Thread Ranjan Ghosh

Ah, after researching some more I think I got hit by this bug:

https://github.com/ceph/ceph/pull/25585

At least that's exactly what I see in the logs: "Interpreter change 
detected - this module can only be loaded into one interpreter per process."


Ceph modules don't seem to work at all with the newest Ubuntu version. 
Only one module can be loaded. Sad :-(


Hope this will be fixed soon...
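
In the meantime, a hedged sketch of how to see which enabled modules are
affected, and the commonly tried stopgap of restarting the active mgr
(<mgr-id> is a placeholder for however your mgr instance is named):

===

ceph health detail
ceph mgr module ls
systemctl restart ceph-mgr@<mgr-id>.service

===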


On 30.04.19 at 21:18, Ranjan Ghosh wrote:


Hi my beloved Ceph list,

After an upgrade from Ubuntu Cosmic to Ubuntu Disco (and the Ceph
packages accordingly updated from 13.2.2 to 13.2.4), I now get this when I
enter "ceph health":


HEALTH_WARN 3 modules have failed dependencies

"ceph mgr module ls" only reports those 3 modules enabled:

"enabled_modules": [
    "dashboard",
    "restful",
    "status"
    ],
...

Then I found this page here:

docs.ceph.com/docs/master/rados/operations/health-checks

Under "MGR_MODULE_DEPENDENCY" it says:

"An enabled manager module is failing its dependency check. This 
health check should come with an explanatory message from the module 
about the problem."


What is "this health check"? If the page talks about "ceph health" or 
"ceph -s" then, no, there is no explanatory message there on what's wrong.


Furthermore, it says:

"This health check is only applied to enabled modules. If a module is 
not enabled, you can see whether it is reporting dependency issues in 
the output of ceph module ls."


The command "ceph module ls", however, doesn't exist. If "ceph mgr 
module ls" is really meant, then I get this:


{
    "enabled_modules": [
    "dashboard",
    "restful",
    "status"
    ],
    "disabled_modules": [
    {
    "name": "balancer",
    "can_run": true,
    "error_string": ""
    },
    {
    "name": "hello",
    "can_run": false,
    "error_string": "Interpreter change detected - this module 
can only be loaded into one interpreter per process."

    },
    {
    "name": "influx",
    "can_run": false,
    "error_string": "Interpreter change detected - this module 
can only be loaded into one interpreter per process."

    },
    {
    "name": "iostat",
    "can_run": false,
    "error_string": "Interpreter change detected - this module 
can only be loaded into one interpreter per process."

    },
    {
    "name": "localpool",
    "can_run": false,
    "error_string": "Interpreter change detected - this module 
can only be loaded into one interpreter per process."

    },
    {
    "name": "prometheus",
    "can_run": false,
    "error_string": "Interpreter change detected - this module 
can only be loaded into one interpreter per process."

    },
    {
    "name": "selftest",
    "can_run": false,
    "error_string": "Interpreter change detected - this module 
can only be loaded into one interpreter per process."

    },
    {
    "name": "smart",
    "can_run": false,
    "error_string": "Interpreter change detected - this module 
can only be loaded into one interpreter per process."

    },
    {
    "name": "telegraf",
    "can_run": false,
    "error_string": "Interpreter change detected - this module 
can only be loaded into one interpreter per process."

    },
    {
    "name": "telemetry",
    "can_run": false,
    "error_string": "Interpreter change detected - this module 
can only be loaded into one interpreter per process."

    },
    {
    "name": "zabbix",
    "can_run": false,
    "error_string": "Interpreter change detected - this module 
can only be loaded into one interpreter per process."

    }
    ]
}

Usually the Ceph documentation is great, very detailed and helpful. 
But I can find nothing on how to resolve this problem. Any help is 
much appreciated.


Thank you / Best regards

Ranjan







[ceph-users] HEALTH_WARN - 3 modules have failed dependencies

2019-04-30 Thread Ranjan Ghosh

Hi my beloved Ceph list,

After an upgrade from Ubuntu Cosmic to Ubuntu Disco (and the Ceph
packages accordingly updated from 13.2.2 to 13.2.4), I now get this when I enter
"ceph health":


HEALTH_WARN 3 modules have failed dependencies

"ceph mgr module ls" only reports those 3 modules enabled:

"enabled_modules": [
    "dashboard",
    "restful",
    "status"
    ],
...

Then I found this page here:

docs.ceph.com/docs/master/rados/operations/health-checks

Under "MGR_MODULE_DEPENDENCY" it says:

"An enabled manager module is failing its dependency check. This health 
check should come with an explanatory message from the module about the 
problem."


What is "this health check"? If the page talks about "ceph health" or 
"ceph -s" then, no, there is no explanatory message there on what's wrong.


Furthermore, it says:

"This health check is only applied to enabled modules. If a module is 
not enabled, you can see whether it is reporting dependency issues in 
the output of ceph module ls."


The command "ceph module ls", however, doesn't exist. If "ceph mgr 
module ls" is really meant, then I get this:


{
    "enabled_modules": [
    "dashboard",
    "restful",
    "status"
    ],
    "disabled_modules": [
    {
    "name": "balancer",
    "can_run": true,
    "error_string": ""
    },
    {
    "name": "hello",
    "can_run": false,
    "error_string": "Interpreter change detected - this module 
can only be loaded into one interpreter per process."

    },
    {
    "name": "influx",
    "can_run": false,
    "error_string": "Interpreter change detected - this module 
can only be loaded into one interpreter per process."

    },
    {
    "name": "iostat",
    "can_run": false,
    "error_string": "Interpreter change detected - this module 
can only be loaded into one interpreter per process."

    },
    {
    "name": "localpool",
    "can_run": false,
    "error_string": "Interpreter change detected - this module 
can only be loaded into one interpreter per process."

    },
    {
    "name": "prometheus",
    "can_run": false,
    "error_string": "Interpreter change detected - this module 
can only be loaded into one interpreter per process."

    },
    {
    "name": "selftest",
    "can_run": false,
    "error_string": "Interpreter change detected - this module 
can only be loaded into one interpreter per process."

    },
    {
    "name": "smart",
    "can_run": false,
    "error_string": "Interpreter change detected - this module 
can only be loaded into one interpreter per process."

    },
    {
    "name": "telegraf",
    "can_run": false,
    "error_string": "Interpreter change detected - this module 
can only be loaded into one interpreter per process."

    },
    {
    "name": "telemetry",
    "can_run": false,
    "error_string": "Interpreter change detected - this module 
can only be loaded into one interpreter per process."

    },
    {
    "name": "zabbix",
    "can_run": false,
    "error_string": "Interpreter change detected - this module 
can only be loaded into one interpreter per process."

    }
    ]
}

Usually the Ceph documentation is great, very detailed and helpful. But 
I can find nothing on how to resolve this problem. Any help is much 
appreciated.


Thank you / Best regards

Ranjan






Re: [ceph-users] Urgent: Reduced data availability / All pgs inactive

2019-02-21 Thread Ranjan Ghosh

Wow. Thank you so much Irek! Your help saved me from a lot of trouble...

It turned out to be indeed a firewall issue. Port 6800 in one direction 
wasn't open.
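
For anyone else running into this: the ports Ceph daemons need between nodes
are 6789/tcp for the mons (plus 3300/tcp for msgr2 on Nautilus and later) and
the 6800-7300/tcp range for OSD/MGR/MDS daemons. A hedged sketch assuming ufw
on Ubuntu (adapt to whatever firewall is in use):

===

ufw allow 6789/tcp
ufw allow 3300/tcp
ufw allow 6800:7300/tcp

===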



On 21.02.19 at 07:05, Irek Fasikhov wrote:

Hi,

You have problems with MGR.
http://docs.ceph.com/docs/master/rados/operations/pg-states/
"The ceph-mgr hasn't yet received any information about the PG's state
from an OSD since mgr started up."


Thu, 21 Feb 2019 at 09:04, Irek Fasikhov <malm...@gmail.com>:


Hi,

You have problems with MGR.
http://docs.ceph.com/docs/master/rados/operations/pg-states/
"The ceph-mgr hasn't yet received any information about the PG's
state from an OSD since mgr started up."


Wed, 20 Feb 2019 at 23:10, Ranjan Ghosh <gh...@pw6.de>:

Hi all,

hope someone can help me. After restarting a node of my
2-node-cluster suddenly I get this:

root@yak2 /var/www/projects # ceph -s
  cluster:
    id: 749b2473-9300-4535-97a6-ee6d55008a1b
    health: HEALTH_WARN
    Reduced data availability: 200 pgs inactive

  services:
    mon: 3 daemons, quorum yak1,yak2,yak0
    mgr: yak0.planwerk6.de(active), standbys: yak1.planwerk6.de,
yak2.planwerk6.de
    mds: cephfs-1/1/1 up  {0=yak1.planwerk6.de=up:active}, 1 up:standby
    osd: 2 osds: 2 up, 2 in

  data:
    pools:   2 pools, 200 pgs
    objects: 0  objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs: 100.000% pgs unknown
 200 unknown

And this:


root@yak2 /var/www/projects # ceph health detail
HEALTH_WARN Reduced data availability: 200 pgs inactive
PG_AVAILABILITY Reduced data availability: 200 pgs inactive
    pg 1.34 is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 1.35 is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 1.36 is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 1.37 is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 1.38 is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 1.39 is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 1.3a is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 1.3b is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 1.3c is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 1.3d is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 1.3e is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 1.3f is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 1.40 is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 1.41 is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 1.42 is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 1.43 is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 1.44 is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 1.45 is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 1.46 is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 1.47 is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 1.48 is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 1.49 is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 1.4a is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 1.4b is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 1.4c is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 1.4d is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 2.34 is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 2.35 is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 2.36 is stuck inactive for 3506.815664, current state
unknown, last acting []
    pg 2.38 is stuck inactive for 3506.815664, cur

[ceph-users] Urgent: Reduced data availability / All pgs inactive

2019-02-20 Thread Ranjan Ghosh

Hi all,

hope someone can help me. After restarting a node of my 2-node-cluster 
suddenly I get this:


root@yak2 /var/www/projects # ceph -s
  cluster:
    id: 749b2473-9300-4535-97a6-ee6d55008a1b
    health: HEALTH_WARN
    Reduced data availability: 200 pgs inactive

  services:
    mon: 3 daemons, quorum yak1,yak2,yak0
    mgr: yak0.planwerk6.de(active), standbys: yak1.planwerk6.de, 
yak2.planwerk6.de

    mds: cephfs-1/1/1 up  {0=yak1.planwerk6.de=up:active}, 1 up:standby
    osd: 2 osds: 2 up, 2 in

  data:
    pools:   2 pools, 200 pgs
    objects: 0  objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs: 100.000% pgs unknown
 200 unknown

And this:


root@yak2 /var/www/projects # ceph health detail
HEALTH_WARN Reduced data availability: 200 pgs inactive
PG_AVAILABILITY Reduced data availability: 200 pgs inactive
    pg 1.34 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 1.35 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 1.36 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 1.37 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 1.38 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 1.39 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 1.3a is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 1.3b is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 1.3c is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 1.3d is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 1.3e is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 1.3f is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 1.40 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 1.41 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 1.42 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 1.43 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 1.44 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 1.45 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 1.46 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 1.47 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 1.48 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 1.49 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 1.4a is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 1.4b is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 1.4c is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 1.4d is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 2.34 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 2.35 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 2.36 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 2.38 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 2.39 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 2.3a is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 2.3b is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 2.3c is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 2.3d is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 2.3e is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 2.3f is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 2.40 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 2.41 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 2.42 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 2.43 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 2.44 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 2.45 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 2.46 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 2.47 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 2.48 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 2.49 is stuck inactive for 3506.815664, current state unknown, 
last acting []
    pg 2.4a is stuck inactive for 3506.815664, current state unknown, 
last acting []
    

[ceph-users] ceph pg dump

2018-06-14 Thread Ranjan Ghosh

Hi all,

we have two small clusters (3 nodes each) called alpha and beta. One 
node (alpha0/beta0) is on a remote site and only has monitor & manager. 
The two other nodes (alpha/beta-1/2) have all 4 services and contain the 
OSDs and are connected via an internal network. In short:


alpha0 -- alpha1--alpha2

beta0 -- beta1--beta2

For a few weeks now, I haven't been able to run "ceph pg stat" or "ceph pg dump" or
anything of the like on alpha0. It works flawlessly on all other nodes,
including beta0. When I start such a command on alpha0, it just hangs
forever.


I wonder what could be the reason? Could it be some firewall issue? I 
see nothing in the logs. Any ideas on how to debug this? I assume it's 
not necessary to run this command on a node with OSDs, right, because it 
works on beta0? I could swear it worked on alpha0 as well for many 
months... I wonder what happened. Weird.
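
If I understand correctly, the pg stats are served by the active mgr since
Luminous, so maybe alpha0 simply can't reach the mgr's port? One thing I might
try to narrow it down (just a guess; <mgr-host> is a placeholder):

===

# show which addresses the client tries to contact
ceph --debug_ms=1 pg stat

# check raw TCP reachability of the active mgr (port as reported by "ceph mgr dump")
nc -zv <mgr-host> 6800

===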


Thank you,
Ranjan






Re: [ceph-users] Cluster degraded after Ceph Upgrade 12.2.1 => 12.2.2

2018-04-26 Thread Ranjan Ghosh

HI Ronny,

Thanks for the detailed answer. It's much appreciated! I will keep this 
in the back of my mind, but for now the cost is prohibitive as we're 
using these servers not as storage-only boxes but as full-fledged servers
(i.e. Ceph is mounted locally, there's a webserver and database). And 2
servers can be connected with a crossover cable; 3 servers would
require a switch and so on. It adds up quite quickly if you are really 
on a tight budget. Sometimes it's not so easy to advocate for new 
hardware as the benefits are not apparent to everyone :-)


 In addition, a reason why we're using Ceph in the first place is that
we can do easy maintenance: one server keeps running while the
other catches up as soon as it comes back online. With 2/2 we'd lose
exactly that - so it's a no-go. Of course, if the second node goes down
as well we have a problem, but OTOH: no new changes will happen as no
writes will then happen anyway. And in addition both servers are
equipped with hardware RAID and BBU. In combination with our solid
backup, I'm currently willing to take the risk. If we grow further, we
might want to look at the 3/2 solution, though. Thanks again for letting 
me know about the underlying reasons!


Best regards,

Ranjan


On 25.04.2018 at 19:40, Ronny Aasen wrote:
The difference in cost between 2 and 3 servers is not HUGE, but the
reliability difference between a size 2/1 pool and a 3/2 pool is
massive. A 2/1 pool is just a single fault during maintenance away
from data loss, but you need multiple simultaneous faults, and
very bad luck, to break a 3/2 pool.


I would rather recommend using 2/2 pools if you are willing to accept
a little downtime when a disk dies.  The cluster I/O would stop until
the disks backfill to cover for the lost disk,
but that is better than having inconsistent PGs or data loss because a
disk crashed during a routine reboot, or 2 disks


Also worth reading:
https://www.spinics.net/lists/ceph-users/msg32895.html - a good
explanation.


You have good backups and are willing to restore the whole pool. And
it is of course your privilege to run 2/1 pools, but be mindful of
the risks of doing so.



kind regards
Ronny Aasen

BTW: I did not know Ubuntu automagically rebooted after an upgrade. You
can probably avoid that reboot somehow in Ubuntu and do the restarts
of services manually, if you wish to maintain service during the upgrade.





On 25.04.2018 11:52, Ranjan Ghosh wrote:
Thanks a lot for your detailed answer. The problem for us, however, 
was that we use the Ceph packages that come with the Ubuntu 
distribution. If you do a Ubuntu upgrade, all packages are upgraded 
in one go and the server is rebooted. You cannot influence anything 
or start/stop services one-by-one etc. This was concerning me, because
the upgrade instructions didn't mention anything about an alternative
or what to do in this case. But someone here enlightened me that - in
general - it all doesn't matter that much *if you are just accepting a
downtime*. And, indeed, it all worked nicely. We stopped all services 
on all servers, upgraded the Ubuntu version, rebooted all servers and 
were ready to go again. Didn't encounter any problems there. The only 
problem turned out to be our own fault and simply a firewall 
misconfiguration.


And, yes, we're running a "size:2 min_size:1" because we're on a very 
tight budget. If I understand correctly, this means: Make changes of 
files to one server. *Eventually* copy them to the other server. I 
hope this *eventually* means after a few minutes. Up until now I've 
never experienced *any* problems with file integrity with this 
configuration. In fact, Ceph is incredibly stable. Amazing. I have 
never ever had any issues whatsoever with broken files/partially 
written files, files that contain garbage etc. Even after 
starting/stopping services, rebooting etc. With GlusterFS and other 
Cluster file system I've experienced many such problems over the 
years, so this is what makes Ceph so great. I have now a lot of trust 
in Ceph, that it will eventually repair everything :-) And: If a file 
that has been written a few seconds ago is really lost, it wouldn't be
that bad for our use-case. It's a web-server. Most important stuff is 
in the DB. We have hourly backups of everything. In a huge emergency, 
we could even restore the backup from an hour ago if we really had 
to. Not nice, but if it happens every 6 years or sth due to some 
freak hardware failure, I think it is manageable. I accept it's not 
the recommended/perfect solution if you have infinite amounts of 
money at your hands, but in our case, I think it's not extremely 
audacious either to do it like this, right?



On 11.04.2018 at 19:25, Ronny Aasen wrote:

Ceph upgrades are usually not a problem:
Ceph has to be upgraded in the right order. Normally, when each
service is on its own machine, this is not difficult,
but when you have mon, mgr, osd, mds, a
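
For anyone wanting to check what their pools are actually set to after reading
Ronny's size/min_size advice above, a hedged sketch (<pool> is a placeholder):

===

ceph osd pool ls detail
ceph osd pool get <pool> size
ceph osd pool get <pool> min_size
ceph osd pool set <pool> min_size 2

===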

Re: [ceph-users] Cluster degraded after Ceph Upgrade 12.2.1 => 12.2.2

2018-04-25 Thread Ranjan Ghosh
Thanks a lot for your detailed answer. The problem for us, however, was 
that we use the Ceph packages that come with the Ubuntu distribution. If 
you do a Ubuntu upgrade, all packages are upgraded in one go and the 
server is rebooted. You cannot influence anything or start/stop services 
one-by-one etc. This was concerning me, because the upgrade instructions
didn't mention anything about an alternative or what to do in this case.
But someone here enlightened me that - in general - it all doesn't matter
that much *if you are just accepting a downtime*. And, indeed, it all 
worked nicely. We stopped all services on all servers, upgraded the 
Ubuntu version, rebooted all servers and were ready to go again. Didn't 
encounter any problems there. The only problem turned out to be our own 
fault and simply a firewall misconfiguration.


And, yes, we're running a "size:2 min_size:1" because we're on a very 
tight budget. If I understand correctly, this means: Make changes of 
files to one server. *Eventually* copy them to the other server. I hope 
this *eventually* means after a few minutes. Up until now I've never 
experienced *any* problems with file integrity with this configuration. 
In fact, Ceph is incredibly stable. Amazing. I have never ever had any 
issues whatsoever with broken files/partially written files, files that 
contain garbage etc. Even after starting/stopping services, rebooting 
etc. With GlusterFS and other Cluster file system I've experienced many 
such problems over the years, so this is what makes Ceph so great. I 
have now a lot of trust in Ceph, that it will eventually repair 
everything :-) And: If a file that has been written a few seconds ago is 
really lost, it wouldn't be that bad for our use-case. It's a web-server.
Most important stuff is in the DB. We have hourly backups of everything. 
In a huge emergency, we could even restore the backup from an hour ago 
if we really had to. Not nice, but if it happens every 6 years or sth 
due to some freak hardware failure, I think it is manageable. I accept 
it's not the recommended/perfect solution if you have infinite amounts 
of money at your hands, but in our case, I think it's not extremely 
audacious either to do it like this, right?



On 11.04.2018 at 19:25, Ronny Aasen wrote:

Ceph upgrades are usually not a problem:
Ceph has to be upgraded in the right order. Normally, when each
service is on its own machine, this is not difficult,
but when you have mon, mgr, osd, mds, and clients on the same host you
have to do it a bit carefully.


i tend to have a terminal open with "watch ceph -s" running, and i 
never do another service until the health is ok again.


First apt upgrade the packages on all the hosts. This only updates the
software on disk and not the running services.
Then do the restart of services in the right order, and only on one
host at a time.


mons: first you restart the mon service on all mon running hosts.
all the 3 mons are active at the same time, so there is no "shifting 
around" but make sure the quorum is ok again before you do the next mon.


mgr: then restart mgr on all hosts that run mgr. there is only one 
active mgr at the time now, so here there will be a bit of shifting 
around. but it is only for statistics/management so it may affect your 
ceph -s command, but not the cluster operation.


osd: restart osd processes one osd at the time, make sure health_ok 
before doing the next osd process. do this for all hosts that have osd's


mds: restart mds's one at the time. you will notice the standby mds 
taking over for the mds that was restarted. do both.


Clients: restart clients, that means remount filesystems, migrate or
restart VMs, or restart whatever process uses the old Ceph libraries.



about pools:
since you only have 2 OSDs you can obviously not be running the
recommended 3-replica pools? This makes me worry that you may be
running size:2 min_size:1 pools and are running a daily risk of
data loss due to corruption and inconsistencies, especially when you
restart OSDs.


If your pools are size:2 min_size:2 then your cluster will fail when
any OSD is restarted, until the OSD is up and healthy again, but you
have less chance of data loss than with 2/1 pools.


If you added an OSD on a third host you could run size:3 min_size:2, the
recommended config where you have both redundancy and high
availability.



kind regards
Ronny Aasen







On 11.04.2018 17:42, Ranjan Ghosh wrote:
Ah, nevermind, we've solved it. It was a firewall issue. The only 
thing that's weird is that it became an issue immediately after an 
update. Perhaps it has sth. to do with monitor nodes shifting around 
or anything. Well, thanks again for your quick support, though. It's 
much appreciated.


BR

Ranjan


On 11.04.2018 at 17:07, Ranjan Ghosh wrote:
Thank you for your answer. Do you have any specifics on which thread 
you're talking about? Would be very interested to read a
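
As a compact companion to Ronny's restart order above, the per-service restarts
roughly look like this on a systemd-based install (a hedged sketch; <hostname>,
<id>, <osd-id> and <name> are placeholders, and health should be re-checked
after every single step):

===

systemctl restart ceph-mon@<hostname>.service   # one mon host at a time, wait for quorum
systemctl restart ceph-mgr@<id>.service         # one mgr at a time
systemctl restart ceph-osd@<osd-id>.service     # one OSD at a time, wait for HEALTH_OK
systemctl restart ceph-mds@<name>.service       # one MDS at a time, the standby takes over

===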

Re: [ceph-users] Cluster degraded after Ceph Upgrade 12.2.1 => 12.2.2

2018-04-11 Thread Ranjan Ghosh
Ah, never mind, we've solved it. It was a firewall issue. The only thing
that's weird is that it became an issue immediately after an update.
Perhaps it has something to do with monitor nodes shifting around or
something like that. Well, thanks again for your quick support, though. It's much
appreciated.


BR

Ranjan


On 11.04.2018 at 17:07, Ranjan Ghosh wrote:
Thank you for your answer. Do you have any specifics on which thread 
you're talking about? Would be very interested to read about a success 
story, because I fear that if I update the other node that the whole 
cluster comes down.



On 11.04.2018 at 10:47, Marc Roos wrote:

I think you have to update all osd's, mon's etc. I can remember running
into a similar issue. You should be able to find more about this in
the mailing list archive.



-Original Message-
From: Ranjan Ghosh [mailto:gh...@pw6.de]
Sent: woensdag 11 april 2018 16:02
To: ceph-users
Subject: [ceph-users] Cluster degraded after Ceph Upgrade 12.2.1 =>
12.2.2

Hi all,

We have a two-node cluster (with a third "monitoring-only" node). Over
the last months, everything ran *perfectly* smooth. Today, I did an
Ubuntu "apt-get upgrade" on one of the two servers. Among others, the
ceph packages were upgraded from 12.2.1 to 12.2.2. A minor release
update, one might think. But, to my surprise, after restarting the
services, Ceph is now in degraded state :-( (see below). Only the first
node - which is still on 12.2.1 - seems to be running. I did a bit of
research and found this:

https://ceph.com/community/new-luminous-pg-overdose-protection/

I did set "mon_max_pg_per_osd = 300" to no avail. Don't know if this is
the problem at all.

Looking at the status it seems we have 264 pgs, right? When I enter
"ceph osd df" (which I found on another website claiming it should print
the number of PGs per OSD), it just hangs (need to abort with Ctrl+C).

Hope anybody can help me. The cluster now works with the single node,
but it is definitely quite worrying because we don't have redundancy.

Thanks in advance,

Ranjan


root@tukan2 /var/www/projects # ceph -s
    cluster:
      id: 19895e72-4a0c-4d5d-ae23-7f631ec8c8e4
      health: HEALTH_WARN
      insufficient standby MDS daemons available
      Reduced data availability: 264 pgs inactive
      Degraded data redundancy: 264 pgs unclean

    services:
      mon: 3 daemons, quorum tukan1,tukan2,tukan0
      mgr: tukan0(active), standbys: tukan2
      mds: cephfs-1/1/1 up  {0=tukan2=up:active}
      osd: 2 osds: 2 up, 2 in

    data:
      pools:   3 pools, 264 pgs
      objects: 0 objects, 0 bytes
      usage:   0 kB used, 0 kB / 0 kB avail
      pgs: 100.000% pgs unknown



Re: [ceph-users] Cluster degraded after Ceph Upgrade 12.2.1 => 12.2.2

2018-04-11 Thread Ranjan Ghosh
Thank you for your answer. Do you have any specifics on which thread 
you're talking about? Would be very interested to read about a success 
story, because I fear that if I update the other node, the whole
cluster will come down.



On 11.04.2018 at 10:47, Marc Roos wrote:

I think you have to update all osd's, mon's etc. I can remember running
into similar issue. You should be able to find more about this in
mailing list archive.



-Original Message-
From: Ranjan Ghosh [mailto:gh...@pw6.de]
Sent: woensdag 11 april 2018 16:02
To: ceph-users
Subject: [ceph-users] Cluster degraded after Ceph Upgrade 12.2.1 =>
12.2.2

Hi all,

We have a two-node cluster (with a third "monitoring-only" node). Over
the last months, everything ran *perfectly* smooth. Today, I did an
Ubuntu "apt-get upgrade" on one of the two servers. Among others, the
ceph packages were upgraded from 12.2.1 to 12.2.2. A minor release
update, one might think. But, to my surprise, after restarting the
services, Ceph is now in degraded state :-( (see below). Only the first
node - which is still on 12.2.1 - seems to be running. I did a bit of
research and found this:

https://ceph.com/community/new-luminous-pg-overdose-protection/

I did set "mon_max_pg_per_osd = 300" to no avail. Don't know if this is
the problem at all.

Looking at the status it seems we have 264 pgs, right? When I enter
"ceph osd df" (which I found on another website claiming it should print
the number of PGs per OSD), it just hangs (need to abort with Ctrl+C).

Hope anybody can help me. The cluster now works with the single node,
but it is definitely quite worrying because we don't have redundancy.

Thanks in advance,

Ranjan


root@tukan2 /var/www/projects # ceph -s
    cluster:
      id: 19895e72-4a0c-4d5d-ae23-7f631ec8c8e4
      health: HEALTH_WARN
      insufficient standby MDS daemons available
      Reduced data availability: 264 pgs inactive
      Degraded data redundancy: 264 pgs unclean

    services:
      mon: 3 daemons, quorum tukan1,tukan2,tukan0
      mgr: tukan0(active), standbys: tukan2
      mds: cephfs-1/1/1 up  {0=tukan2=up:active}
      osd: 2 osds: 2 up, 2 in

    data:
      pools:   3 pools, 264 pgs
      objects: 0 objects, 0 bytes
      usage:   0 kB used, 0 kB / 0 kB avail
      pgs: 100.000% pgs unknown



[ceph-users] Cluster degraded after Ceph Upgrade 12.2.1 => 12.2.2

2018-04-11 Thread Ranjan Ghosh

Hi all,

We have a two-node cluster (with a third "monitoring-only" node). Over
the last months, everything ran *perfectly* smooth. Today, I did an 
Ubuntu "apt-get upgrade" on one of the two servers. Among others, the 
ceph packages were upgraded from 12.2.1 to 12.2.2. A minor release 
update, one might think. But, to my surprise, after restarting the 
services, Ceph is now in degraded state :-( (see below). Only the first 
node - which is still on 12.2.1 - seems to be running. I did a bit of
research and found this:


https://ceph.com/community/new-luminous-pg-overdose-protection/

I did set "mon_max_pg_per_osd = 300" to no avail. Don't know if this is 
the problem at all.


Looking at the status it seems we have 264 pgs, right? When I enter 
"ceph osd df" (which I found on another website claiming it should print 
the number of PGs per OSD), it just hangs (need to abort with Ctrl+C).


Hope anybody can help me. The cluster now works with the single node,
but it is definitely quite worrying because we don't have redundancy.


Thanks in advance,

Ranjan


root@tukan2 /var/www/projects # ceph -s
  cluster:
    id: 19895e72-4a0c-4d5d-ae23-7f631ec8c8e4
    health: HEALTH_WARN
    insufficient standby MDS daemons available
    Reduced data availability: 264 pgs inactive
    Degraded data redundancy: 264 pgs unclean

  services:
    mon: 3 daemons, quorum tukan1,tukan2,tukan0
    mgr: tukan0(active), standbys: tukan2
    mds: cephfs-1/1/1 up  {0=tukan2=up:active}
    osd: 2 osds: 2 up, 2 in

  data:
    pools:   3 pools, 264 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs: 100.000% pgs unknown



[ceph-users] Ubuntu upgrade Zesty => Aardvark, Implications for Ceph?

2017-11-13 Thread Ranjan Ghosh

Hi everyone,

In January, support for Ubuntu Zesty will run out and we're planning to 
upgrade our servers to Aardvark. We have a two-node-cluster (and one 
additional monitoring-only server) and we're using the packages that 
come with the distro. We have mounted CephFS on the same server with the 
kernel client in FSTab. AFAIK, Aardvark includes Ceph 12.0. What would 
happen if we used the usual "do-release-upgrade" to upgrade the servers 
one-by-one? I assume the procedure described here 
"http://ceph.com/releases/v12-2-0-luminous-released/; (section "Upgrade 
from Jewel or Kraken") probably won't work for us, because 
"do-release-upgrade" will upgrade all packages (including the ceph ones) 
at once and then reboots the machine. So we cannot really upgrade only 
the monitoring nodes. And I'd rather avoid switching to PPAs beforehand. 
So, what are the real consequences if we upgrade all servers one-by-one 
with "do-release-upgrade" and then reboot all the nodes? Is it only the 
downtime why this isn't recommended, or do we lose data? Any other
recommendations on how to tackle this?


Thank you / BR

Ranjan





Re: [ceph-users] Blocked requests problem

2017-08-22 Thread Ranjan Ghosh
Hm. That's quite weird. On our cluster, when I set "noscrub",
"nodeep-scrub", scrubbing will always stop pretty quickly (a few
minutes). I wonder why this doesn't happen on your cluster. When exactly
did you set the flags? Perhaps it just needs some more time... Or there
might be a disk problem that keeps the scrubbing from ever finishing. Perhaps it's
really a good idea, just like you proposed, to shut down the
corresponding OSDs. But those are just my thoughts. Perhaps some Ceph pro
can shed some light on the possible reasons why a scrubbing might get
stuck and how to resolve this.
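
For reference, the commands involved would roughly be the following (a hedged
sketch; osd.29 is the primary of the stuck PG in Ramazan's output quoted below):

===

ceph osd set noscrub
ceph osd set nodeep-scrub
ceph pg dump | grep scrubbing      # find the stuck PG and its primary OSD
systemctl restart ceph-osd@29      # restarting the primary is the commonly suggested way to abort the stuck scrub
ceph osd unset noscrub
ceph osd unset nodeep-scrub

===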



On 22.08.2017 at 18:58, Ramazan Terzi wrote:

Hi Ranjan,

Thanks for your reply. I did set the noscrub and nodeep-scrub flags. But the active
scrubbing operation still isn't completing. The scrubbing operation is always on the same
pg (20.1e).

$ ceph pg dump | grep scrub
dumped all in format plain
pg_stat objects mip degrmispunf bytes   log disklog state   
state_stamp v   reportedup  up_primary  acting  
acting_primary  last_scrub  scrub_stamp last_deep_scrub deep_scrub_stamp
20.1e   25189   0   0   0   0   98359116362 30483048
active+clean+scrubbing  2017-08-21 04:55:13.354379  6930'2393   
6930:20949058   [29,31,3]   29  [29,31,3]   29  6712'22950171   
2017-08-20 04:46:59.208792  6712'22950171   2017-08-20 04:46:59.208792


$ ceph -s
 cluster 
  health HEALTH_WARN
 33 requests are blocked > 32 sec
 noscrub,nodeep-scrub flag(s) set
  monmap e9: 3 mons at 
{ceph-mon01=**:6789/0,ceph-mon02=**:6789/0,ceph-mon03=**:6789/0}
 election epoch 84, quorum 0,1,2 ceph-mon01,ceph-mon02,ceph-mon03
  osdmap e6930: 36 osds: 36 up, 36 in
 flags noscrub,nodeep-scrub,sortbitwise,require_jewel_osds
   pgmap v17667617: 1408 pgs, 5 pools, 24779 GB data, 6494 kobjects
 70497 GB used, 127 TB / 196 TB avail
 1407 active+clean
1 active+clean+scrubbing


Thanks,
Ramazan



On 22 Aug 2017, at 18:52, Ranjan Ghosh <gh...@pw6.de> wrote:

Hi Ramazan,

I'm no Ceph expert, but what I can say from my experience using Ceph is:

1) During "Scrubbing", Ceph can be extremely slow. This is probably where your "blocked 
requests" are coming from. BTW: Perhaps you can even find out which processes are currently blocking 
with: ps aux | grep "D". You might even want to kill some of those and/or shutdown services in 
order to relieve some stress from the machine until it recovers.

2) I usually have the following in my ceph.conf. This lets the scrubbing only 
run between midnight and 6 AM (hopefully the time of least demand; adjust as 
necessary)  - and with the lowest priority.

#Reduce impact of scrub.
osd_disk_thread_ioprio_priority = 7
osd_disk_thread_ioprio_class = "idle"
osd_scrub_end_hour = 6

3) The Scrubbing begin and end hour will always work. The low priority mode, 
however, works (AFAIK!) only with CFQ I/O Scheduler. Show your current 
scheduler like this (replace sda with your device):

cat /sys/block/sda/queue/scheduler

You can also echo to this file to set a different scheduler.


With these settings you can perhaps alleviate the problem enough that the
scrubbing runs over many nights until it finishes. Again, AFAIK, it doesn't have
to finish in one night. It will continue the next night and so on.

The Ceph experts say scrubbing is important. Don't know why, but I just believe 
them. They've built this complex stuff after all :-)

Thus, you can use "noscrub"/"nodeepscrub" to quickly get a hung server back to 
work, but you should not let it run like this forever and a day.

Hope this helps at least a bit.

BR,

Ranjan


On 22.08.2017 at 15:20, Ramazan Terzi wrote:

Hello,

I have a Ceph Cluster with specifications below:
3 x Monitor node
6 x Storage Node (6 disk per Storage Node, 6TB SATA Disks, all disks have SSD 
journals)
Distributed public and private networks. All NICs are 10Gbit/s
osd pool default size = 3
osd pool default min size = 2

Ceph version is Jewel 10.2.6.

My cluster is active and a lot of virtual machines running on it (Linux and 
Windows VM's, database clusters, web servers etc).

During normal use, cluster slowly went into a state of blocked requests. 
Blocked requests periodically incrementing. All OSD's seems healthy. Benchmark, 
iowait, network tests, all of them succeed.

Yesterday, 08:00:
$ ceph health detail
HEALTH_WARN 3 requests are blocked > 32 sec; 3 osds have slow requests
1 ops are blocked > 134218 sec on osd.31
1 ops are blocked > 134218 sec on osd.3
1 ops are blocked > 8388.61 sec on osd.29
3 osds have slow requests

Today, 16:05:
$ ceph health detail
HEALTH_WARN 32 requests are blocked > 32 sec; 3 osds have slow requests
1 ops are blocked > 134218 sec on osd.31
1 ops are blocked > 134218 sec on osd.3
16

Re: [ceph-users] Blocked requests problem

2017-08-22 Thread Ranjan Ghosh

Hi Ramazan,

I'm no Ceph expert, but what I can say from my experience using Ceph is:

1) During "Scrubbing", Ceph can be extremely slow. This is probably 
where your "blocked requests" are coming from. BTW: Perhaps you can even 
find out which processes are currently blocking with: ps aux | grep "D". 
You might even want to kill some of those and/or shut down services in
order to relieve some stress from the machine until it recovers.
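
A slightly more precise way to list only the processes that are really stuck in
uninterruptible sleep (a hedged sketch; a plain grep for "D" also matches
unrelated lines):

===

ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'

===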


2) I usually have the following in my ceph.conf. This lets the scrubbing 
only run between midnight and 6 AM (hopefully the time of least demand; 
adjust as necessary)  - and with the lowest priority.


#Reduce impact of scrub.
osd_disk_thread_ioprio_priority = 7
osd_disk_thread_ioprio_class = "idle"
osd_scrub_end_hour = 6

3) The Scrubbing begin and end hour will always work. The low priority 
mode, however, works (AFAIK!) only with CFQ I/O Scheduler. Show your 
current scheduler like this (replace sda with your device):


cat /sys/block/sda/queue/scheduler

You can also echo to this file to set a different scheduler.
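
For example, switching sda to CFQ (which the ioprio settings above rely on)
could look like this - note it needs root and does not persist across reboots:

===

cat /sys/block/sda/queue/scheduler
echo cfq > /sys/block/sda/queue/scheduler

===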


With these settings you can perhaps alleviate the problem enough that
the scrubbing runs over many nights until it finishes. Again, AFAIK, it
doesn't have to finish in one night. It will continue the next night and
so on.


The Ceph experts say scrubbing is important. Don't know why, but I just 
believe them. They've built this complex stuff after all :-)


Thus, you can use "noscrub"/"nodeepscrub" to quickly get a hung server 
back to work, but you should not let it run like this forever and a day.


Hope this helps at least a bit.

BR,

Ranjan


On 22.08.2017 at 15:20, Ramazan Terzi wrote:

Hello,

I have a Ceph Cluster with specifications below:
3 x Monitor node
6 x Storage Node (6 disk per Storage Node, 6TB SATA Disks, all disks have SSD 
journals)
Distributed public and private networks. All NICs are 10Gbit/s
osd pool default size = 3
osd pool default min size = 2

Ceph version is Jewel 10.2.6.

My cluster is active and a lot of virtual machines running on it (Linux and 
Windows VM's, database clusters, web servers etc).

During normal use, cluster slowly went into a state of blocked requests. 
Blocked requests periodically incrementing. All OSD's seems healthy. Benchmark, 
iowait, network tests, all of them succeed.

Yesterday, 08:00:
$ ceph health detail
HEALTH_WARN 3 requests are blocked > 32 sec; 3 osds have slow requests
1 ops are blocked > 134218 sec on osd.31
1 ops are blocked > 134218 sec on osd.3
1 ops are blocked > 8388.61 sec on osd.29
3 osds have slow requests

Today, 16:05:
$ ceph health detail
HEALTH_WARN 32 requests are blocked > 32 sec; 3 osds have slow requests
1 ops are blocked > 134218 sec on osd.31
1 ops are blocked > 134218 sec on osd.3
16 ops are blocked > 134218 sec on osd.29
11 ops are blocked > 67108.9 sec on osd.29
2 ops are blocked > 16777.2 sec on osd.29
1 ops are blocked > 8388.61 sec on osd.29
3 osds have slow requests

$ ceph pg dump | grep scrub
dumped all in format plain
pg_stat objects mip degrmispunf bytes   log disklog state   
state_stamp v   reportedup  up_primary  acting  
acting_primary  last_scrub  scrub_stamp last_deep_scrub deep_scrub_stamp
20.1e   25183   0   0   0   0   98332537930 30663066
active+clean+scrubbing  2017-08-21 04:55:13.354379  6930'23908781   
6930:20905696   [29,31,3]   29  [29,31,3]   29  6712'22950171   
2017-08-20 04:46:59.208792  6712'22950171   2017-08-20 04:46:59.208792

The active scrub does not finish (about 24 hours now). I have not restarted any OSD
in the meantime.
I'm thinking of setting the noscrub, nodeep-scrub, norebalance, nobackfill, and norecover
flags and restarting OSDs 3, 29 and 31. Will this solve my problem? Or does anyone have a
suggestion about this problem?

Thanks,
Ramazan


[ceph-users] WBThrottle

2017-08-22 Thread Ranjan Ghosh

Hi Ceph gurus,

I've got the following problem with our Ceph installation (Jewel): There 
are various websites served from the CephFS mount. Sometimes, when I 
copy many new (large?) files onto this mount, it seems that after a 
certain delay, everything grinds to a halt. No websites are served; 
processes are in D state; probably until Ceph has written everything to 
disk. Then after a while, everything recovers. Obviously, it would be 
great if I could tune some values to make the experience more "even", 
i.e. it can be a bit slower in general but OTOH without such huge 
"spikes" in performance...


Now, first, I discovered there is "filestore flusher" documented here:

http://docs.ceph.com/docs/jewel/rados/configuration/filestore-config-ref/?highlight=flusher

Weirdly, when I use

ceph --admin-daemon /bla/bla config show

then I cannot see anything about this config option. Does it still exist?

Then I found this somewhat cryptic page:

http://docs.ceph.com/docs/jewel/dev/osd_internals/wbthrottle/

It says: "The flusher was not an adequate solution to this problem since 
it forced writeback of small writes too eagerly killing performance."


Perhaps the "filestore flusher" was removed? But why is it still documented?

On the other hand, "config show" lists many "wbthrottle" options:

"filestore_wbthrottle_enable": "true",
"filestore_wbthrottle_xfs_bytes_hard_limit": "419430400",
"filestore_wbthrottle_xfs_bytes_start_flusher": "41943040",
"filestore_wbthrottle_xfs_inodes_hard_limit": "5000",
"filestore_wbthrottle_xfs_inodes_start_flusher": "500",
"filestore_wbthrottle_xfs_ios_hard_limit": "5000",
"filestore_wbthrottle_xfs_ios_start_flusher": "500",

I couldn't find them documented under docs.ceph.com; however, they are 
documented here:


https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html/configuration_guide/file_store_configuration_reference

Quite confusing! Now, I wonder: Could/should I modify (raise/lower) some 
of these values (we're using XFS)? Should I perhaps disable the 
WBThrottle altogether for my use case?
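
Just so it's clear what I mean, this is the kind of change I'm considering - a 
sketch only, using the option names from the Red Hat reference above; I don't 
know yet whether disabling or merely raising the thresholds is the better idea:

# ceph.conf on the OSD nodes
[osd]
    filestore_wbthrottle_enable = false
    # or, less drastic: raise the start_flusher thresholds instead, e.g.
    # filestore_wbthrottle_xfs_bytes_start_flusher = 419430400
    # filestore_wbthrottle_xfs_ios_start_flusher = 5000
    # filestore_wbthrottle_xfs_inodes_start_flusher = 5000

# or injected at runtime, without restarting the OSDs:
ceph tell osd.* injectargs '--filestore_wbthrottle_enable=false'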


Thank you,

Ranjan








___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] num_caps

2017-05-15 Thread Ranjan Ghosh
Ah, I understand it much better now. Thank you so much for explaining. I 
hope/assume the caps don't prevent other clients from accessing the files 
in some way, right?


+1, though, for the idea to be able to specify a timeout. We have an 
rsync backup job which runs over the whole filesystem every few hours to 
do an incremental backup. If you have only one cronjob like that, you 
consequently end up with caps on all files - permanently - up to the defined 
mds cache size. Even if the backup is finished after a few minutes. 
Which I think goes a bit overboard just for a backup, even though you say 
the actual performance impact of all those caps is not that bad... :-/
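
(What I might try on the backup machine, following the "client cache size" hint 
below - a sketch, and only applicable if the backup mount goes through 
ceph-fuse rather than the kernel client:)

# ceph.conf on the backup client
[client]
    client cache size = 4096   # I believe the default is 16384 cached inodes; fewer cached inodes -> fewer caps held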




On 15.05.2017 at 14:49, John Spray wrote:

On Mon, May 15, 2017 at 1:36 PM, Henrik Korkuc <li...@kirneh.eu> wrote:

On 17-05-15 13:40, John Spray wrote:

On Mon, May 15, 2017 at 10:40 AM, Ranjan Ghosh <gh...@pw6.de> wrote:

Hi all,

When I run "ceph daemon mds. session ls" I always get a fairly
large
number for num_caps (200.000). Is this normal? I thought caps are sth.
like
open/locked files meaning a client is holding a cap on a file and no
other
client can access it during this time.

Capabilities are much broader than that, they cover clients keeping
some fresh metadata in their cache, even if the client isn't doing
anything with the file at that moment.  It's common for a client to
accumulate a large number of capabilities in normal operation, as it
keeps the metadata for many files in cache.

You can adjust the "client cache size" setting on the fuse client to
encourage it to cache metadata on fewer files and thereby hold onto
fewer capabilities if you want.

John

Is there an option (or planned option) for clients to release caps after
some time of not being used?

In my testing I saw that clients tend to hold on to caps indefinitely.

Currently in prod I have a use case where there are over 8 million caps and a little
over 800k inodes_with_caps.

Both the MDS and client caches operate on a LRU, size-limited basis.
That means that if they aren't hitting their size thresholds, they
will tend to keep lots of stuff in cache indefinitely.

One could add a behaviour that also actively expires cached metadata
if it has not been used for a certain period of time, but it's not
clear what the right time threshold would be, and whether it would be
desirable for most users.  If we free up memory because the system is
quiet this minute/hour, then it potentially just creates an issue when
we get busy again and need that memory back.

With caching/resources generally, there's a conflict between the
desire to keep things in cache in case they're needed again, and the
desire to evict things from cache so that we have lots of free space
available for new entries.  Which one is better is entirely workload
dependent: there is clearly scope to add different behaviours as
options, but it's hard to know how much people would really use them --
the sanity of the defaults is the most important thing.  I do think
there's a reasonable argument that part of the sane defaults should
not be to keep something in cache if it hasn't been used for e.g. a
day or more.

BTW, clients do have an additional behaviour where they will drop
unneeded caps when an MDS restarts, to avoid making a newly started
MDS do a lot of unnecessary work to restore those caps, so the
overhead of all those extra caps isn't quite as much as one might
first imagine.

John





How can I debug this if it is a cause
of concern? Is there any way to debug on which files the caps are held
exactly?

Thank you,

Ranjan

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] num_caps

2017-05-15 Thread Ranjan Ghosh

Hi all,

When I run "ceph daemon mds. session ls" I always get a fairly 
large number for num_caps (200.000). Is this normal? I thought caps are 
sth. like open/locked files meaning a client is holding a cap on a file 
and no other client can access it during this time. How can I debug this 
if it is a cause of concern? Is there any way to debug on which files 
the caps are held excatly?


Thank you,

Ranjan

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Unsolved questions

2017-02-06 Thread Ranjan Ghosh

Hi everyone,

I've now been running our two-node mini-cluster for some months. OSD, MDS and 
Monitor is running on both nodes. Additionally there is a very small 
third node which is only running a third monitor but no MDS/OSD. On both 
main servers, CephFS is mounted via FSTab/Kernel driver. The mounted 
folder is /var/www hosting many websites. We use Ceph in this situation 
to achieve redundancy so we can easily switch over to the other node in 
case one of them fails. Kernel version is 4.9.6. For the most part, it's 
running great and the performance of the filesystem is very good. Only 
some stubborn problems/questions have still remained over the whole time 
and I'd like to settle them once and for all:


1) Every once in a while, some processes (PHP) accessing the filesystem 
get stuck in a D-state (uninterruptible sleep). I wonder if this happens 
due to network fluctuations (both servers are connected via a simple 
Gigabit crosslink cable) or how to diagnose this. Why exactly does this 
happen in the first place? And what is the proper way to get these 
processes out of this situation? Why doesn't a timeout happen or anything 
else? I've read about client eviction, but when I enter "ceph daemon 
mds.node1 session ls" I only see two "entries" - one for each server. 
But I don't want to evict all processes on the server, obviously. Only 
the stuck process. So far, the only method I found to remove the D 
process is to reboot. Which is of course not a great solution. When I 
tried to only restart the MDS service instead of rebooting, many more 
processes got stuck and the load was >500 (not CPU most probably but due 
to processes waiting for I/O).
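
(For what it's worth, this is how I currently hunt for the stuck processes - 
plain Linux, nothing Ceph-specific; the wchan column at least hints at what 
they are blocked on:)

ps axo pid,stat,wchan:30,cmd | awk '$2 ~ /^D/'
cat /proc/<PID>/stack   # as root: shows the kernel call chain the process is stuck in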


I found this thread here: 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-May/001513.html


Is this (still) relevant for my problem? And I read somewhere that you 
should not mount the folder on the same server the MDS runs on - unless 
you have a "newer" kernel (can't find where I've read this). The 
information was a bit older, though, so I wondered whether 4.9.6 isn't 
sufficient or whether this is still a problem at all...


2) A second, also still unsolved problem: Most of the time "ceph health" 
shows sth. like: "Client node2 failing to respond to cache pressure". 
Restarting the MDS removes this message for a while before it appears 
again. I could remove the message by setting "mds cache size" higher 
than the total number of files/folders on the whole filesystem. Which is 
obviously not a greatly scalable solution. The message doesn't seem to 
cause any problems, though. Nevertheless, I'd like to solve this. BTW: 
When I run "session ls" I see the number of caps held (num_caps) very 
high (8). Doesn't this mean that so many files are open/occupied by 
one or more processes? Is this normal? I have some cronjobs running 
from time to time which run find or chmod over the filesystem. Could 
they be responsible for this? Is there some value to have Ceph release 
those "caps" faster/earlier?


Thank you / BR

Ranjan


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Very Small Cluster

2016-10-24 Thread Ranjan Ghosh
Thanks JC & Greg, I've changed the "mon osd min down reporters" to 1. 
According to this:


http://docs.ceph.com/docs/jewel/rados/configuration/mon-osd-interaction/

the default is already 1, though. I don't remember the value before I 
changed it everywhere, so I can't say for sure now. But I think it was 2 
despite what the docs say. Whatever. It's now 1 everywhere.
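
(Concretely, I changed it like this - a sketch of what I did; the injectargs 
call only lasts until the next mon restart, while the ceph.conf entry makes it 
permanent:)

# ceph.conf on the monitor nodes
[mon]
    mon osd min down reporters = 1

# and/or at runtime, once per monitor (replace <id> with the mon's name):
ceph tell mon.<id> injectargs '--mon-osd-min-down-reporters=1'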


Another somewhat weird thing I found was: When I check the values of an 
OSD(!) with "ceph daemon osd.0 config show | sort | grep mon_osd" I see 
an entry "mon osd min down reporters". I can even change it. But 
according to the docs, this is just a setting for monitors. Why does it 
appear there? Does it influence anything? If not: Is there a way to only 
show relevant config entries for a daemon?


Then, when checking the doc page mentioned above and reading the 
descriptions of the multitude of config settings, I wonder: How can I 
properly estimate the time until my cluster works again? Since I get 
hung requests until the failed node is finally declared *down*, this 
time is obviously quite important for me. What exactly is the sequence 
of events when a node fails (i.e. someone accidentally hits the power 
off button)? My (possibly totally wrong & dumb) idea:


1) osd0 fails/doesn't answer

2) osd1 pings osd0 every 6 seconds ( osd heartbeat interval). Thus, 
after 6 seconds max. osd1 notices osd0 *could be* down.


3) After another 20 seconds (osd heartbeat grace), osd1 decides osd0 is 
definitely down.


4) Another 120 seconds might elapse ( osd mon report interval max) until 
osd1 reports the bad news to the monitor.


5) The monitor gets the information about failed osd0 and since "mon osd 
min down reporters" is 1, this single osd is sufficent for the monitor 
to believe the bad news that osd0 is unresponsive.


6) But since "mon osd min down reports" is 3, all the stuff up until now 
has to happen 3 times in a row until the monitor finally realizes osd0 
is *really* unresponsive.


7) After another 900 seconds (mon osd report timeout) of waiting in 
hope of news that osd0 is still/back alive, the monitor marks 
osd0 as down


8) After another 300 seconds (mon osd down out interval) the monitor 
marks osd0 as down+out



So, after my possibly very naive understanding, it takes 3*(6+20+120) + 
900 + 300 = 1638 seconds (roughly 27 minutes) from the event "someone 
accidentally hit the power off switch" to "osd0 is marked down+out".


Correct? I expect not. Which config variables did I misunderstand?
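
(For reference, these are the options I'm plugging into that little 
calculation, dumped from the running daemons - the same "config show" trick 
as above; adjust osd.0 and mon.<id> to your own daemon names:)

ceph daemon osd.0 config show | egrep 'osd_heartbeat_interval|osd_heartbeat_grace|osd_mon_report_interval_max'
ceph daemon mon.<id> config show | egrep 'mon_osd_min_down_report|mon_osd_report_timeout|mon_osd_down_out_interval'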


Thank you

Ranjan




On 29.09.2016 at 20:48, LOPEZ Jean-Charles wrote:

mon_osd_min_down_reporters by default set to 2

I guess you’ll have to set it to 1 in your case

JC

On Sep 29, 2016, at 08:16, Gregory Farnum <gfar...@redhat.com> wrote:


I think the problem is that Ceph requires a certain number of OSDs or 
a certain number of reports of failure before it marks an OSD down. 
These thresholds are not tuned for a 2-OSD cluster; you probably want 
to set them to 1.
Also keep in mind that the OSDs provide a grace period of 20-30 
seconds before they'll report somebody down; this helps prevent 
spurious recovery but means you will get paused IO on an unclean 
shutdown.


I can't recall the exact config options off-hand, but it's something 
like "mon osd min down reports". Search the docs for that. :)

-Greg

On Thursday, September 29, 2016, Peter Maloney 
<peter.malo...@brockmann-consult.de> wrote:


On 09/29/16 14:07, Ranjan Ghosh wrote:
> Wow. Amazing. Thanks a lot!!! This works. 2 (hopefully) last
questions
> on this issue:
>
> 1) When the first node is coming back up, I can just call "ceph
osd up
> 0" and Ceph will start auto-repairing everything everything, right?
> That is, if there are e.g. new files that were created during
the time
> the first node was down, they will (sooner or later) get replicated
> there?
Nope, there is no "ceph osd up <id>"; you just start the osd, and it
already gets recognized as up. (if you don't like this, you set it out,
not just down; and there is a "ceph osd in <id>" to undo that.)
>
> 2) If I don't call "osd down" manually (perhaps at the weekend when
> I'm not at the office) when a node dies - did I understand
correctly
> that the "hanging" I experienced is temporary and that after a few
> minutes (don't want to try out now) the node should also go down
> automatically?
I believe so, yes.

Also, FYI, RBD images don't seem to have this issue, and work
right away
on a 3 osd cluster. Maybe cephfs would also work better with a
3rd osd,
even an empty one (weight=0). (and I had an unresolved issue
testing the
 

Re: [ceph-users] Ceph Very Small Cluster

2016-09-29 Thread Ranjan Ghosh
Wow. Amazing. Thanks a lot!!! This works. 2 (hopefully) last questions 
on this issue:


1) When the first node is coming back up, I can just call "ceph osd up 
0" and Ceph will start auto-repairing everything everything, right? That 
is, if there are e.g. new files that were created during the time the 
first node was down, they will (sooner or later) get replicated there?


2) If I don't call "osd down" manually (perhaps at the weekend when I'm 
not at the office) when a node dies - did I understand correctly that 
the "hanging" I experienced is temporary and that after a few minutes 
(don't want to try out now) the node should also go down automatically?


BR,
Ranjan


On 29.09.2016 at 13:00, Peter Maloney wrote:


And also you could try:
 ceph osd down <id>


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Very Small Cluster

2016-09-29 Thread Ranjan Ghosh

Hi Vasu,

thank you for your answer.

Yes, all the pools have min_size 1:

root@uhu2 /scripts # ceph osd lspools
0 rbd,1 cephfs_data,2 cephfs_metadata,
root@uhu2 /scripts # ceph osd pool get cephfs_data min_size
min_size: 1
root@uhu2 /scripts # ceph osd pool get cephfs_metadata min_size
min_size: 1

I stopped all the Ceph services gracefully on the first machine. But, 
just to get this straight: What if the first machine really suffered a 
catastrophic failure? My expectation was that the second machine would just 
keep on running and serving files. This is why we are using a cluster 
in the first place... Or is this expectation already wrong?


When I stop the services on node1, I get this:

# ceph pg stat
2016-09-29 11:51:09.514814 7fcba012f700  0 -- :/1939885874 >> 
136.243.82.227:6789/0 pipe(0x7fcb9c05a730 sd=3 :0 s=1 pgs=0 cs=0 l=1 
c=0x7fcb9c05c3f0).fault
v41732: 264 pgs: 264 active+clean; 18514 MB data, 144 GB used, 3546 GB / 
3690 GB avail; 1494 B/s rd, 0 op/s


So, my question still is: Is there a way to (preferably) automatically 
avoid such a situation? Or at least manually tell the second node to 
keep on working and forget about those files?


BR,
Ranjan



On 28.09.2016 at 18:25, Vasu Kulkarni wrote:


Are all the pools using min_size 1? Did you check pg stat and see which ones
are waiting? Some steps to debug further are described here:
  http://docs.ceph.com/docs/jewel/rados/operations/monitoring-osd-pg/

Also, did you shut down the server abruptly while it was busy?




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Very Small Cluster

2016-09-28 Thread Ranjan Ghosh

Hi everyone,

Up until recently, we were using GlusterFS to have two web servers in 
sync so we could take one down and switch back and forth between them - 
e.g. for maintenance or failover. Usually, both were running, though. 
The performance was abysmal, unfortunately. Copying many small files on 
the file system caused outages for several minutes - simply 
unacceptable. So I found Ceph. It's fairly new but I thought I'd give it 
a try. I liked especially the good, detailed documentation, the 
configurability and the many command-line tools which allow you to find 
out what is going on with your Cluster. All of this is severly lacking 
with GlusterFS IMHO.


Because we're on a very tiny budget for this project, we cannot currently 
have more than two file system servers. I added a small virtual server, 
though, only for monitoring. So at least we have 3 monitor nodes. I 
also created 3 MDSs, though as far as I understood, two are only for 
standby.


server0: Admin (Deployment started from here) + Monitor + MDS
server1: Monitor + MDS + OSD
server2: Monitor + MDS + OSD

So, the OSDs are on server1 and server2, which are next to each other, 
connected by a local Gigabit Ethernet connection. The cluster is mounted 
(also on server1 and server2) as /var/www and Apache is serving files 
off the cluster.


I've used these configuration settings:

osd pool default size = 2
osd pool default min_size = 1

My idea was that by default everything should be replicated on 2 servers 
i.e. each file is normally written on server1 and server2. In case of 
emergency though (one server has a failure), it's better to keep 
operating and only write the file to one server. Therefore, i set 
min_size = 1. My further understanding is (correct me if I'm wrong), 
that when the server comes back online, the files that were written to 
only 1 server during the outage will automatically be replicated to the 
server that has come back online.


So far, so good. With two servers now online, the performance is 
light-years away from sluggish GlusterFS. I've also worked with 
XtreemFS, OCFS2 and AFS and never had such good performance with any 
cluster. In fact, it's so blazingly fast that I had to check twice that I 
really had the cluster mounted and wasn't accidentally working on the 
hard drive. Impressive. I can edit files on server1 and they are 
immediately changed on server2 and vice versa. Great!


Unfortunately, when I now stop all Ceph services on server1, the 
websites on server2 start to hang/freeze. And "ceph health" shows "#x 
blocked requests". Now, what I don't understand: Why is it blocking? 
Shouldn't both servers have the file? And didn't I set min_size to "1"? 
And if there are a few files (could be some unimportant stuff) that are 
missing on one of the servers: How can I abort the blocking? I'd rather 
have a missing file or whatever than a completely blocked website.


Are my files really duplicated 1:1 - or are they perhaps spread evenly 
between both OSDs? Do I have to edit the CRUSH map to achieve real 
"RAID-1"-style replication? Is there a command to find out, for a 
specific file, where it actually resides and whether it has really been 
replicated?
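
(I believe this can be checked roughly like this - a sketch, assuming the data 
pool is called cephfs_data as above; CephFS names its RADOS objects after the 
file's inode number in hex, so this is a manual procedure rather than an 
official command:)

# inode number of the file, in hex
printf '%x\n' "$(stat -c %i /var/www/some/file)"

# the file's first object is <hex-inode>.00000000 - ask Ceph which OSDs it maps to
ceph osd map cephfs_data <hex-inode>.00000000

# and confirm the pool really keeps 2 copies
ceph osd pool get cephfs_data size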


Thank you!
Ranjan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com