Re: [ceph-users] All OSD fails after few requests to RGW

2017-05-10 Thread Anton Dmitriev
"recent enough version of the ceph-objectstore-tool" - sounds very 
interesting. Would it be released in one of next Jewel minor releases?


On 10.05.2017 19:03, David Turner wrote:
PG subfolder splitting is the primary reason people are going to be 
deploying Luminous and Bluestore much faster than any other major 
release of Ceph.  Bluestore removes the concept of subfolders in PGs.


I have had clusters that reached what seemed a hardcoded maximum of 
12,800 objects in a subfolder.  It would take an osd_heartbeat_grace 
of 240 or 300 to let them finish splitting their subfolders without 
being marked down.  Recently I came across a cluster that had a 
setting of 240 objects per subfolder before splitting, so it was 
splitting all the time, and several of the OSDs took longer than 30 
seconds to finish splitting into subfolders.  That led to more 
problems as we started adding backfilling to everything and we lost a 
significant amount of throughput on the cluster.


I have yet to manage a cluster with a recent enough version of the 
ceph-objectstore-tool (hopefully I'll have one this month) that 
includes the ability to take an osd offline, split the subfolders, 
then bring it back online.  If you set up a way to monitor how big 
your subfolders are getting, you can leave the ceph settings as high 
as you want, and then go in and perform maintenance on your cluster 1 
failure domain at a time splitting all of the PG subfolders on the 
OSDs.  This approach would prevent this from ever happening in the wild.
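
As a rough illustration of the kind of monitoring he describes (counting
objects per filestore subfolder on one OSD), something like the following
could work; the data path, osd id and threshold are assumptions, adjust to
your own layout:

OSD=/var/lib/ceph/osd/ceph-12/current
find "$OSD" -mindepth 1 -type d -name 'DIR_*' | while read -r d; do
    n=$(find "$d" -maxdepth 1 -type f | wc -l)
    # flag subfolders that are getting close to a split
    [ "$n" -gt 10000 ] && echo "$n objects in $d"
done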


On Wed, May 10, 2017 at 5:37 AM Piotr Nowosielski wrote:


It is difficult for me to clearly state why some PGs have not been migrated.
crushmap settings? Weight of OSD?

One thing is certain - you will not find any information about the split
process in the logs ...

pn

-Original Message-
From: Anton Dmitriev [mailto:t...@enumnet.ru ]
Sent: Wednesday, May 10, 2017 10:14 AM
To: Piotr Nowosielski;
ceph-users@lists.ceph.com 
Subject: Re: [ceph-users] All OSD fails after few requests to RGW

When I created the cluster, I made a mistake in the configuration and set
the split parameter to 32 and merge to 40, so 32*40*16 = 20480 files per folder.
After that I changed split to 8 and increased the number of pg and pgp from
2048 to 4096 for the pool where the problem occurs. While it was
backfilling I observed that placement groups were backfilling from one set
of 3 OSDs to another set of 3 OSDs (replicated size = 3), so I concluded
that PGs are completely re-created when increasing PG and PGP for a pool,
and that after this process the number of files per directory should be OK.
But when backfilling finished I found many directories in this pool with
~20 000 files. Why did increasing the PG num not help? Or maybe after this
process some files will be deleted with some delay?

I couldn't find any information about the directory split process in the
logs, even with osd and filestore debug set to 20. What pattern, and in
which log, do I need to grep for to find it?

On 10.05.2017 10:36, Piotr Nowosielski wrote:
> You can:
> - change these parameters and use ceph-objectstore-tool
> - add OSD host - rebuild the cluster will reduce the number of files
> in the directories
> - wait until "split" operations are over ;-)
>
> In our case, we could afford to wait until the "split" operation is
> over (we have 2 clusters in slightly different configurations
storing
> the same data)
>
> hint:
> When creating a new pool, use the parameter "expected_num_objects"
>
https://www.suse.com/documentation/ses-4/book_storage_admin/data/ceph_
> pools_operate.html
>
> Piotr Nowosielski
> Senior Systems Engineer
> Zespół Infrastruktury 5
> Grupa Allegro sp. z o.o.
> Tel: +48 512 08 55 92
>
>
> -Original Message-
> From: Anton Dmitriev [mailto:t...@enumnet.ru
]
> Sent: Wednesday, May 10, 2017 9:19 AM
> To: Piotr Nowosielski;
> ceph-users@lists.ceph.com 
> Subject: Re: [ceph-users] All OSD fails after few requests to RGW
>
> How did you solved it? Set new split/merge thresholds, and manually
> applied it by ceph-objectstore-tool --data-path
> /var/lib/ceph/osd/ceph-${osd_num} --journal-path
> /var/lib/ceph/osd/ceph-${osd_num}/journal
> --log-file=/var/log/ceph/objectstore_tool.${osd_num}.log --op
> apply-layout-settings --pool default.rgw.buckets.data
>
> on each OSD?
>
> How I can see in logs, 

Re: [ceph-users] Rebalancing causing IO Stall/IO Drops to zero

2017-05-10 Thread Alex Gorbachev
On Thu, May 4, 2017 at 8:40 AM Osama Hasebou  wrote:

> Hi Everyone,
>
> We keep running into stalled IOs / they also drop almost to zero, whenever
> a node suddenly would go down or if there was a large amount of rebalancing
> going on and once rebalancing is completed, we would also get stalled io
> for 2-10 mins.
>
> Has anyone seen this behaviour before and found a way to fix this? We are
> seeing this on Ceph Hammer and also on Jewel.
>

 Please check the setting "mon osd down out subtree limit". Setting it to host
would prevent OSDs from being automatically marked out when an entire host fails.

Also

osd-recovery-max-active (my setting is 5)
osd-recovery-threads (my setting is 3)
osd-max-backfills (my setting is 5)
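
For reference, those settings would look roughly like this in ceph.conf
(values as above; option names written in the underscore form):

[mon]
mon_osd_down_out_subtree_limit = host

[osd]
osd_recovery_max_active = 5
osd_recovery_threads = 3
osd_max_backfills = 5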

Hth,
Alex

>
> Thanks.
>
> Regards,
> Ossi
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
-- 
--
Alex Gorbachev
Storcium
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] trouble starting ceph @ boot

2017-05-10 Thread Peter Maloney
On 05/10/17 22:07, David Turner wrote:
> Are you mounting your OSDs using fstab or anything else?  Ceph uses
> udev rules and partition identifiers to know what a disk is and where
> to mount it, assuming that you have your GUIDs set properly on your
> disks.  ceph-deploy does this by default.
>
> On Wed, May 10, 2017 at 3:46 PM David Turner  > wrote:
>
> `update-rc.d 'ceph' defaults 99`
> That should put it last in the boot order.  The '99' here is a
> number 01-99 where the lower the number the earlier in the boot
> sequence the service is started.  To see what order your service
> is set to start and stop, `ls /etc/rc*.d/*{service}`.  Each rc#
> represents the runlevels.  K## is the order that services will be
> stopped, S## is the order that services will be started.  After
> you run the above command, it should change Ceph to S99.  If you
> want to fine tune it, you can see which services are starting up
> after ceph and see if you can locate the specific one that is
> causing your problems.
>
I think that might be sys-v specific.

> On Wed, May 10, 2017 at 3:34 PM  > wrote:
>
> David,
>
>  
>
> ceph tell osd.12 version replies version 11.2.0
>
>  
>
> Distro is Ubuntu 14.04.5 LTS (trusty) which utilizes upstart
> for ceph.
>
>  
>
> I don’t see a good way ensure last in an event based system
> like upstart.
>
>  
>
But speaking of sys-v... I am using Ubuntu 14.04 with ceph, and I don't
use the upstart stuff. I don't like it, except that it works great if
you just pretend it's sys-v. :)

So to use it like sysv, just do this for mons, osds, etc.:

rm /var/lib/ceph/osd/ceph-.../upstart
touch /var/lib/ceph/osd/ceph-.../sysvinit

And then start it the sys-v way, like:

service ceph start osd.0

Use update-rc.d to change the order. And you can see the order of the
sys-v side of it like:  ls -1 /etc/rc2.d/  (where 2 is your runlevel)
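
Putting the above together for a single OSD, assuming osd.0 and runlevel 2
(purely illustrative):

rm /var/lib/ceph/osd/ceph-0/upstart
touch /var/lib/ceph/osd/ceph-0/sysvinit
update-rc.d ceph defaults 99
service ceph start osd.0
ls -1 /etc/rc2.d/ | grep -i ceph    # should now show S99ceph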
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] trouble starting ceph @ boot

2017-05-10 Thread vida.zach
I was able to set the order to 99 as you indicated, but the /var/log/upstart/ceph
logs still complain excessively about: unable to look up group 'ceph': (34)
Numerical result out of range

Mounting is done via /etc/fstab for the OSDs, which are XFS-formatted HDDs.
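
One thing that might be worth testing (purely a sketch of the sleep/retry
idea mentioned further down this thread, not something verified here) is a
wait loop at the top of the pre-start script in /etc/init/ceph-osd.conf:

# hypothetical fragment: wait up to 30s for the 'ceph' group to resolve
n=0
until getent group ceph >/dev/null 2>&1; do
    n=$((n+1))
    [ "$n" -ge 30 ] && break
    sleep 1
done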

-Zach
From: David Turner
Sent: Wednesday, May 10, 2017 4:07 PM
To: vida.z...@gmail.com; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] trouble starting ceph @ boot

Are you mounting your OSDs using fstab or anything else?  Ceph uses udev rules 
and partition identifiers to know what a disk is and where to mount it, 
assuming that you have your GUIDs set properly on your disks.  ceph-deploy does 
this by default.

On Wed, May 10, 2017 at 3:46 PM David Turner  wrote:
`update-rc.d 'ceph' defaults 99`
That should put it last in the boot order.  The '99' here is a number 01-99 
where the lower the number the earlier in the boot sequence the service is 
started.  To see what order your service is set to start and stop, `ls 
/etc/rc*.d/*{service}`.  Each rc# represents the runlevels.  K## is the order 
that services will be stopped, S## is the order that services will be started.  
After you run the above command, it should change Ceph to S99.  If you want to 
fine tune it, you can see which services are starting up after ceph and see if 
you can locate the specific one that is causing your problems.

On Wed, May 10, 2017 at 3:34 PM  wrote:
David, 
 
ceph tell osd.12 version replies version 11.2.0
 
Distro is Ubuntu 14.04.5 LTS (trusty) which utilizes upstart for ceph. 
 
I don’t see a good way ensure last in an event based system like upstart. 
 
For the record I already tried after networking and after filesystems are 
mounted to and that didn’t seem to help things. 
 
 
 
From: David Turner
Sent: Wednesday, May 10, 2017 3:21 PM

To: vida.z...@gmail.com; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] trouble starting ceph @ boot
 
I would probably just make it start last in the boot order.  Depending on your 
distribution/version, that will be as simple as setting it to 99 for starting 
up.  Which distribution/version are you running? 
 
On Wed, May 10, 2017 at 2:36 PM  wrote:
David,
 
I get what you are saying. Do you have a suggestion as to what service I make 
ceph-osd depend on to reliable start?
 
My understanding is that these daemons should all be sort of independent of 
each other. 
 
-Zach
 
 
 
From: David Turner
Sent: Wednesday, May 10, 2017 1:18 PM
To: vida.z...@gmail.com; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] trouble starting ceph @ boot
 
Have you attempted to place the ceph-osd startup later in the boot process.  
Which distribution/version are you running?  Each does it slightly different.  
This can be problematic for some services, very commonly in cases where a 
network drive is mapped and used by a service like mysql (terrible example, but 
effective).  If you try to start mysql before the network is up and the drive 
is mapped, then mysql will fail.  Some work arounds are to put a sleep in the 
init script, or retry (similar to what you did), but ultimately, you probably 
want to set a requisite service to have started or just place the service in a 
later starting position.
 
On Wed, May 10, 2017 at 9:43 AM  wrote:
System: Ubuntu Trusty 14.04

Release : Kraken


Issue:

When starting ceph-osd daemon on boot via upstart. Error message in 
/var/log/upstart/ceph-osd-ceph_#.log reports 3 attempt to start the service 
with the errors message below



starting osd.12 at - osd_data /var/lib/ceph/osd/ceph-12 
/var/lib/ceph/osd/ceph-12/journal

2017-05-09 13:38:34.507004 7f6d46a2e980 -1 journal FileJournal::_open: 
disabling aio for non-block journal. Use journal_force_aio to force use of aio 
anyway

2017-05-09 13:38:38.432333 7f6d46a2e980 -1 osd.12 2284024 PGs are upgrading

unable to look up group 'ceph': (34) Numerical result out of range

unable to look up group 'ceph': (34) Numerical result out of range

unable to look up group 'ceph': (34) Numerical result out of range



Workaround:



If I configure /etc/init/ceph-osd.conf like so



-respawn limit 3 1800

+respawn limit unlimited



I get roughly 20 attempts to start the each osd daemon and then it successfully 
starts.



Starting the daemons by hand works just fine after boot.



Possible reasons:



NSCD is being utilized and may not have started yet. However disabling this 
service doesn’t not improve starting the service without the workaround in 
place.





The message seems to be coming global/global_init.cc



./global/global_init.cc- struct passwd *p = 0;

./global/global_init.cc- getpwnam_r(g_conf->setuser.c_str(), , buf, 
sizeof(buf), );

./global/global_init.cc- if (!p) {

./global/global_init.cc- cerr << "unable to look up user '" << g_conf->setuser 
<< "'"

./global/global_init.cc- << std::endl;

./global/global_init.cc- exit(1);

./global/global_init.cc- }


Re: [ceph-users] trouble starting ceph @ boot

2017-05-10 Thread David Turner
Are you mounting your OSDs using fstab or anything else?  Ceph uses udev
rules and partition identifiers to know what a disk is and where to mount
it, assuming that you have your GUIDs set properly on your disks.
 ceph-deploy does this by default.

On Wed, May 10, 2017 at 3:46 PM David Turner  wrote:

> `update-rc.d 'ceph' defaults 99`
> That should put it last in the boot order.  The '99' here is a number
> 01-99 where the lower the number the earlier in the boot sequence the
> service is started.  To see what order your service is set to start and
> stop, `ls /etc/rc*.d/*{service}`.  Each rc# represents the runlevels.  K##
> is the order that services will be stopped, S## is the order that services
> will be started.  After you run the above command, it should change Ceph to
> S99.  If you want to fine tune it, you can see which services are starting
> up after ceph and see if you can locate the specific one that is causing
> your problems.
>
> On Wed, May 10, 2017 at 3:34 PM  wrote:
>
>> David,
>>
>>
>>
>> ceph tell osd.12 version replies version 11.2.0
>>
>>
>>
>> Distro is Ubuntu 14.04.5 LTS (trusty) which utilizes upstart for ceph.
>>
>>
>>
>> I don’t see a good way ensure last in an event based system like upstart.
>>
>>
>>
>> For the record I already tried after networking and after filesystems are
>> mounted to and that didn’t seem to help things.
>>
>>
>>
>>
>>
>>
>>
>> *From: *David Turner 
>> *Sent: *Wednesday, May 10, 2017 3:21 PM
>>
>>
>> *To: *vida.z...@gmail.com; ceph-users@lists.ceph.com
>> *Subject: *Re: [ceph-users] trouble starting ceph @ boot
>>
>>
>>
>> I would probably just make it start last in the boot order.  Depending on
>> your distribution/version, that will be as simple as setting it to 99 for
>> starting up.  Which distribution/version are you running?
>>
>>
>>
>> On Wed, May 10, 2017 at 2:36 PM  wrote:
>>
>> David,
>>
>>
>>
>> I get what you are saying. Do you have a suggestion as to what service I
>> make ceph-osd depend on to reliable start?
>>
>>
>>
>> My understanding is that these daemons should all be sort of independent
>> of each other.
>>
>>
>>
>> -Zach
>>
>>
>>
>>
>>
>>
>>
>> *From: *David Turner 
>> *Sent: *Wednesday, May 10, 2017 1:18 PM
>> *To: *vida.z...@gmail.com; ceph-users@lists.ceph.com
>> *Subject: *Re: [ceph-users] trouble starting ceph @ boot
>>
>>
>>
>> Have you attempted to place the ceph-osd startup later in the boot
>> process.  Which distribution/version are you running?  Each does it
>> slightly different.  This can be problematic for some services, very
>> commonly in cases where a network drive is mapped and used by a service
>> like mysql (terrible example, but effective).  If you try to start mysql
>> before the network is up and the drive is mapped, then mysql will fail.
>> Some work arounds are to put a sleep in the init script, or retry (similar
>> to what you did), but ultimately, you probably want to set a requisite
>> service to have started or just place the service in a later starting
>> position.
>>
>>
>>
>> On Wed, May 10, 2017 at 9:43 AM  wrote:
>>
>> System: Ubuntu Trusty 14.04
>>
>> Release : Kraken
>>
>>
>> Issue:
>>
>> When starting ceph-osd daemon on boot via upstart. Error message in
>> /var/log/upstart/ceph-osd-ceph_#.log reports 3 attempt to start the service
>> with the errors message below
>>
>>
>>
>> starting osd.12 at - osd_data /var/lib/ceph/osd/ceph-12
>> /var/lib/ceph/osd/ceph-12/journal
>>
>> 2017-05-09 13:38:34.507004 7f6d46a2e980 -1 journal FileJournal::_open:
>> disabling aio for non-block journal. Use journal_force_aio to force use of
>> aio anyway
>>
>> 2017-05-09 13:38:38.432333 7f6d46a2e980 -1 osd.12 2284024 PGs are
>> upgrading
>>
>> unable to look up group 'ceph': (34) Numerical result out of range
>>
>> unable to look up group 'ceph': (34) Numerical result out of range
>>
>> unable to look up group 'ceph': (34) Numerical result out of range
>>
>>
>>
>> Workaround:
>>
>>
>>
>> If I configure /etc/init/ceph-osd.conf like so
>>
>>
>>
>> -respawn limit 3 1800
>>
>> +respawn limit unlimited
>>
>>
>>
>> I get roughly 20 attempts to start the each osd daemon and then it
>> successfully starts.
>>
>>
>>
>> Starting the daemons by hand works just fine after boot.
>>
>>
>>
>> Possible reasons:
>>
>>
>>
>> NSCD is being utilized and may not have started yet. However disabling
>> this service doesn’t not improve starting the service without the
>> workaround in place.
>>
>>
>>
>>
>>
>> The message seems to be coming global/global_init.cc
>>
>>
>>
>> ./global/global_init.cc- struct passwd *p = 0;
>>
>> ./global/global_init.cc- getpwnam_r(g_conf->setuser.c_str(), , buf,
>> sizeof(buf), );
>>
>> ./global/global_init.cc- if (!p) {
>>
>> ./global/global_init.cc- cerr << "unable to look up user '" <<
>> g_conf->setuser << "'"
>>
>> ./global/global_init.cc- << std::endl;
>>
>> 

Re: [ceph-users] trouble starting ceph @ boot

2017-05-10 Thread vida.zach
David, 

ceph tell osd.12 version replies version 11.2.0

Distro is Ubuntu 14.04.5 LTS (trusty) which utilizes upstart for ceph. 

I don’t see a good way ensure last in an event based system like upstart. 

For the record I already tried after networking and after filesystems are 
mounted to and that didn’t seem to help things. 



From: David Turner
Sent: Wednesday, May 10, 2017 3:21 PM
To: vida.z...@gmail.com; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] trouble starting ceph @ boot

I would probably just make it start last in the boot order.  Depending on your 
distribution/version, that will be as simple as setting it to 99 for starting 
up.  Which distribution/version are you running? 

On Wed, May 10, 2017 at 2:36 PM  wrote:
David,
 
I get what you are saying. Do you have a suggestion as to what service I make 
ceph-osd depend on to reliable start?
 
My understanding is that these daemons should all be sort of independent of 
each other. 
 
-Zach
 
 
 
From: David Turner
Sent: Wednesday, May 10, 2017 1:18 PM
To: vida.z...@gmail.com; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] trouble starting ceph @ boot
 
Have you attempted to place the ceph-osd startup later in the boot process.  
Which distribution/version are you running?  Each does it slightly different.  
This can be problematic for some services, very commonly in cases where a 
network drive is mapped and used by a service like mysql (terrible example, but 
effective).  If you try to start mysql before the network is up and the drive 
is mapped, then mysql will fail.  Some work arounds are to put a sleep in the 
init script, or retry (similar to what you did), but ultimately, you probably 
want to set a requisite service to have started or just place the service in a 
later starting position.
 
On Wed, May 10, 2017 at 9:43 AM  wrote:
System: Ubuntu Trusty 14.04

Release : Kraken


Issue:

When starting ceph-osd daemon on boot via upstart. Error message in 
/var/log/upstart/ceph-osd-ceph_#.log reports 3 attempt to start the service 
with the errors message below



starting osd.12 at - osd_data /var/lib/ceph/osd/ceph-12 
/var/lib/ceph/osd/ceph-12/journal

2017-05-09 13:38:34.507004 7f6d46a2e980 -1 journal FileJournal::_open: 
disabling aio for non-block journal. Use journal_force_aio to force use of aio 
anyway

2017-05-09 13:38:38.432333 7f6d46a2e980 -1 osd.12 2284024 PGs are upgrading

unable to look up group 'ceph': (34) Numerical result out of range

unable to look up group 'ceph': (34) Numerical result out of range

unable to look up group 'ceph': (34) Numerical result out of range



Workaround:



If I configure /etc/init/ceph-osd.conf like so



-respawn limit 3 1800

+respawn limit unlimited



I get roughly 20 attempts to start the each osd daemon and then it successfully 
starts.



Starting the daemons by hand works just fine after boot.



Possible reasons:



NSCD is being utilized and may not have started yet. However disabling this 
service doesn’t not improve starting the service without the workaround in 
place.





The message seems to be coming global/global_init.cc



./global/global_init.cc- struct passwd *p = 0;

./global/global_init.cc- getpwnam_r(g_conf->setuser.c_str(), , buf, 
sizeof(buf), );

./global/global_init.cc- if (!p) {

./global/global_init.cc- cerr << "unable to look up user '" << g_conf->setuser 
<< "'"

./global/global_init.cc- << std::endl;

./global/global_init.cc- exit(1);

./global/global_init.cc- }

./global/global_init.cc- uid = p->pw_uid;

./global/global_init.cc- gid = p->pw_gid;

./global/global_init.cc- uid_string = g_conf->setuser;

./global/global_init.cc- }

./global/global_init.cc- }

./global/global_init.cc- if (g_conf->setgroup.length() > 0) {

./global/global_init.cc- gid = atoi(g_conf->setgroup.c_str());

./global/global_init.cc- if (!gid) {

./global/global_init.cc- char buf[4096];

./global/global_init.cc- struct group gr;

./global/global_init.cc- struct group *g = 0;

./global/global_init.cc- getgrnam_r(g_conf->setgroup.c_str(), , buf, 
sizeof(buf), );

./global/global_init.cc- if (!g) {

./global/global_init.cc: cerr << "unable to look up group '" << 
g_conf->setgroup << "'"

./global/global_init.cc- << ": " << cpp_strerror(errno) << std::endl;

./global/global_init.cc- exit(1);

./global/global_init.cc- }

./global/global_init.cc- gid = g->gr_gid;

./global/global_init.cc- gid_string = g_conf->setgroup;

./global/global_init.cc- }

./global/global_init.cc- }



34 as an error code seems to correspond to ERANGE Insufficient buffer space 
supplied. I assume this is because getgrnam_r() returns NULL if it can’t find 
the group.



But as to why the group isn’t retrievable I have no idea, As

getent group ceph

ceph:x:59623:ceph



GID changed for security reasons.



Additional Information:



I also see this in boot.log not sure if it is related

failed: 'ulimit -n 32768; /usr/bin/ceph-mds -i cephstorelx2 --pid-file 

Re: [ceph-users] trouble starting ceph @ boot

2017-05-10 Thread David Turner
I would probably just make it start last in the boot order.  Depending on
your distribution/version, that will be as simple as setting it to 99 for
starting up.  Which distribution/version are you running?

On Wed, May 10, 2017 at 2:36 PM  wrote:

> David,
>
>
>
> I get what you are saying. Do you have a suggestion as to what service I
> make ceph-osd depend on to reliable start?
>
>
>
> My understanding is that these daemons should all be sort of independent
> of each other.
>
>
>
> -Zach
>
>
>
>
>
>
>
> *From: *David Turner 
> *Sent: *Wednesday, May 10, 2017 1:18 PM
> *To: *vida.z...@gmail.com; ceph-users@lists.ceph.com
> *Subject: *Re: [ceph-users] trouble starting ceph @ boot
>
>
>
> Have you attempted to place the ceph-osd startup later in the boot
> process.  Which distribution/version are you running?  Each does it
> slightly different.  This can be problematic for some services, very
> commonly in cases where a network drive is mapped and used by a service
> like mysql (terrible example, but effective).  If you try to start mysql
> before the network is up and the drive is mapped, then mysql will fail.
> Some work arounds are to put a sleep in the init script, or retry (similar
> to what you did), but ultimately, you probably want to set a requisite
> service to have started or just place the service in a later starting
> position.
>
>
>
> On Wed, May 10, 2017 at 9:43 AM  wrote:
>
> System: Ubuntu Trusty 14.04
>
> Release : Kraken
>
>
> Issue:
>
> When starting ceph-osd daemon on boot via upstart. Error message in
> /var/log/upstart/ceph-osd-ceph_#.log reports 3 attempt to start the service
> with the errors message below
>
>
>
> starting osd.12 at - osd_data /var/lib/ceph/osd/ceph-12
> /var/lib/ceph/osd/ceph-12/journal
>
> 2017-05-09 13:38:34.507004 7f6d46a2e980 -1 journal FileJournal::_open:
> disabling aio for non-block journal. Use journal_force_aio to force use of
> aio anyway
>
> 2017-05-09 13:38:38.432333 7f6d46a2e980 -1 osd.12 2284024 PGs are upgrading
>
> unable to look up group 'ceph': (34) Numerical result out of range
>
> unable to look up group 'ceph': (34) Numerical result out of range
>
> unable to look up group 'ceph': (34) Numerical result out of range
>
>
>
> Workaround:
>
>
>
> If I configure /etc/init/ceph-osd.conf like so
>
>
>
> -respawn limit 3 1800
>
> +respawn limit unlimited
>
>
>
> I get roughly 20 attempts to start the each osd daemon and then it
> successfully starts.
>
>
>
> Starting the daemons by hand works just fine after boot.
>
>
>
> Possible reasons:
>
>
>
> NSCD is being utilized and may not have started yet. However disabling
> this service doesn’t not improve starting the service without the
> workaround in place.
>
>
>
>
>
> The message seems to be coming global/global_init.cc
>
>
>
> ./global/global_init.cc- struct passwd *p = 0;
>
> ./global/global_init.cc- getpwnam_r(g_conf->setuser.c_str(), , buf,
> sizeof(buf), );
>
> ./global/global_init.cc- if (!p) {
>
> ./global/global_init.cc- cerr << "unable to look up user '" <<
> g_conf->setuser << "'"
>
> ./global/global_init.cc- << std::endl;
>
> ./global/global_init.cc- exit(1);
>
> ./global/global_init.cc- }
>
> ./global/global_init.cc- uid = p->pw_uid;
>
> ./global/global_init.cc- gid = p->pw_gid;
>
> ./global/global_init.cc- uid_string = g_conf->setuser;
>
> ./global/global_init.cc- }
>
> ./global/global_init.cc- }
>
> ./global/global_init.cc- if (g_conf->setgroup.length() > 0) {
>
> ./global/global_init.cc- gid = atoi(g_conf->setgroup.c_str());
>
> ./global/global_init.cc- if (!gid) {
>
> ./global/global_init.cc- char buf[4096];
>
> ./global/global_init.cc- struct group gr;
>
> ./global/global_init.cc- struct group *g = 0;
>
> ./global/global_init.cc- getgrnam_r(g_conf->setgroup.c_str(), , buf,
> sizeof(buf), );
>
> ./global/global_init.cc- if (!g) {
>
> ./global/global_init.cc: cerr << "unable to look up group '" <<
> g_conf->setgroup << "'"
>
> ./global/global_init.cc- << ": " << cpp_strerror(errno) << std::endl;
>
> ./global/global_init.cc- exit(1);
>
> ./global/global_init.cc- }
>
> ./global/global_init.cc- gid = g->gr_gid;
>
> ./global/global_init.cc- gid_string = g_conf->setgroup;
>
> ./global/global_init.cc- }
>
> ./global/global_init.cc- }
>
>
>
> 34 as an error code seems to correspond to ERANGE Insufficient buffer
> space supplied. I assume this is because getgrnam_r() returns NULL if it
> can’t find the group.
>
>
>
> But as to why the group isn’t retrievable I have no idea, As
>
> getent group ceph
>
> ceph:x:59623:ceph
>
>
>
> GID changed for security reasons.
>
>
>
> Additional Information:
>
>
>
> I also see this in boot.log not sure if it is related
>
> failed: 'ulimit -n 32768; /usr/bin/ceph-mds -i cephstorelx2 --pid-file
> /var/run/ceph/mds.cephstorelx2//mds.cephstorelx2.pid -c /etc/ceph/ceph.conf
> --cluster ceph --setuser ceph --setgroup ceph '
>
>
> Any pointers would be helpful.
>
>
> -Zach
>
> 

Re: [ceph-users] trouble starting ceph @ boot

2017-05-10 Thread vida.zach
David,

I get what you are saying. Do you have a suggestion as to what service I make 
ceph-osd depend on to reliable start?

My understanding is that these daemons should all be sort of independent of 
each other. 

-Zach



From: David Turner
Sent: Wednesday, May 10, 2017 1:18 PM
To: vida.z...@gmail.com; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] trouble starting ceph @ boot

Have you attempted to place the ceph-osd startup later in the boot process.  
Which distribution/version are you running?  Each does it slightly different.  
This can be problematic for some services, very commonly in cases where a 
network drive is mapped and used by a service like mysql (terrible example, but 
effective).  If you try to start mysql before the network is up and the drive 
is mapped, then mysql will fail.  Some work arounds are to put a sleep in the 
init script, or retry (similar to what you did), but ultimately, you probably 
want to set a requisite service to have started or just place the service in a 
later starting position.

On Wed, May 10, 2017 at 9:43 AM  wrote:
System: Ubuntu Trusty 14.04

Release : Kraken


Issue:

When starting ceph-osd daemon on boot via upstart. Error message in 
/var/log/upstart/ceph-osd-ceph_#.log reports 3 attempt to start the service 
with the errors message below



starting osd.12 at - osd_data /var/lib/ceph/osd/ceph-12 
/var/lib/ceph/osd/ceph-12/journal

2017-05-09 13:38:34.507004 7f6d46a2e980 -1 journal FileJournal::_open: 
disabling aio for non-block journal. Use journal_force_aio to force use of aio 
anyway

2017-05-09 13:38:38.432333 7f6d46a2e980 -1 osd.12 2284024 PGs are upgrading

unable to look up group 'ceph': (34) Numerical result out of range

unable to look up group 'ceph': (34) Numerical result out of range

unable to look up group 'ceph': (34) Numerical result out of range



Workaround:



If I configure /etc/init/ceph-osd.conf like so



-respawn limit 3 1800

+respawn limit unlimited



I get roughly 20 attempts to start the each osd daemon and then it successfully 
starts.



Starting the daemons by hand works just fine after boot.



Possible reasons:



NSCD is being utilized and may not have started yet. However disabling this 
service doesn’t not improve starting the service without the workaround in 
place.





The message seems to be coming global/global_init.cc



./global/global_init.cc- struct passwd *p = 0;

./global/global_init.cc- getpwnam_r(g_conf->setuser.c_str(), , buf, 
sizeof(buf), );

./global/global_init.cc- if (!p) {

./global/global_init.cc- cerr << "unable to look up user '" << g_conf->setuser 
<< "'"

./global/global_init.cc- << std::endl;

./global/global_init.cc- exit(1);

./global/global_init.cc- }

./global/global_init.cc- uid = p->pw_uid;

./global/global_init.cc- gid = p->pw_gid;

./global/global_init.cc- uid_string = g_conf->setuser;

./global/global_init.cc- }

./global/global_init.cc- }

./global/global_init.cc- if (g_conf->setgroup.length() > 0) {

./global/global_init.cc- gid = atoi(g_conf->setgroup.c_str());

./global/global_init.cc- if (!gid) {

./global/global_init.cc- char buf[4096];

./global/global_init.cc- struct group gr;

./global/global_init.cc- struct group *g = 0;

./global/global_init.cc- getgrnam_r(g_conf->setgroup.c_str(), , buf, 
sizeof(buf), );

./global/global_init.cc- if (!g) {

./global/global_init.cc: cerr << "unable to look up group '" << 
g_conf->setgroup << "'"

./global/global_init.cc- << ": " << cpp_strerror(errno) << std::endl;

./global/global_init.cc- exit(1);

./global/global_init.cc- }

./global/global_init.cc- gid = g->gr_gid;

./global/global_init.cc- gid_string = g_conf->setgroup;

./global/global_init.cc- }

./global/global_init.cc- }



34 as an error code seems to correspond to ERANGE Insufficient buffer space 
supplied. I assume this is because getgrnam_r() returns NULL if it can’t find 
the group.



But as to why the group isn’t retrievable I have no idea, As

getent group ceph

ceph:x:59623:ceph



GID changed for security reasons.



Additional Information:



I also see this in boot.log not sure if it is related

failed: 'ulimit -n 32768; /usr/bin/ceph-mds -i cephstorelx2 --pid-file 
/var/run/ceph/mds.cephstorelx2//mds.cephstorelx2.pid -c /etc/ceph/ceph.conf 
--cluster ceph --setuser ceph --setgroup ceph '


Any pointers would be helpful.

-Zach
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] trouble starting ceph @ boot

2017-05-10 Thread Peter Maloney
On 05/10/17 15:34, vida.z...@gmail.com wrote:
>
> System: Ubuntu Trusty 14.04
>
> Release : Kraken
>
>
> Issue:
>
> When starting ceph-osd daemon on boot via upstart. Error message in
> /var/log/upstart/ceph-osd-ceph_#.log reports 3 attempt to start the
> service with the errors message below
>
>
>
> starting osd.12 at - osd_data /var/lib/ceph/osd/ceph-12
> /var/lib/ceph/osd/ceph-12/journal
>
> 2017-05-09 13:38:34.507004 7f6d46a2e980 -1 journal FileJournal::_open:
> disabling aio for non-block journal. Use journal_force_aio to force
> use of aio anyway
>
This is a bad message... it usually means you forgot to make a block
device for the journal, so it made a file instead, which will be slow
for many reasons, including disabling aio. It's far better to make a 2nd
partition for a colocated journal, even if you don't want a separate
device. And then you have to point to it either with a symlink named
journal, or in ceph.conf. And don't use raw names like /dev/sdb1; use
something like /dev/disk/by-partlabel/journal_osd0 or a partuuid or
something else unique and stable.
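
A minimal sketch of both ways of pointing at such a partition (the osd id
and partition label here are illustrative only):

# symlink inside the OSD data dir:
ln -s /dev/disk/by-partlabel/journal_osd12 /var/lib/ceph/osd/ceph-12/journal

# or in ceph.conf:
[osd.12]
osd journal = /dev/disk/by-partlabel/journal_osd12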

But that's not your failing to start problem.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] trouble starting ceph @ boot

2017-05-10 Thread David Turner
Have you attempted to place the ceph-osd startup later in the boot
process.  Which distribution/version are you running?  Each does it
slightly different.  This can be problematic for some services, very
commonly in cases where a network drive is mapped and used by a service
like mysql (terrible example, but effective).  If you try to start mysql
before the network is up and the drive is mapped, then mysql will fail.
Some work arounds are to put a sleep in the init script, or retry (similar
to what you did), but ultimately, you probably want to set a requisite
service to have started or just place the service in a later starting
position.

On Wed, May 10, 2017 at 9:43 AM  wrote:

> System: Ubuntu Trusty 14.04
>
> Release : Kraken
>
>
> Issue:
>
> When starting ceph-osd daemon on boot via upstart. Error message in
> /var/log/upstart/ceph-osd-ceph_#.log reports 3 attempt to start the service
> with the errors message below
>
>
>
> starting osd.12 at - osd_data /var/lib/ceph/osd/ceph-12
> /var/lib/ceph/osd/ceph-12/journal
>
> 2017-05-09 13:38:34.507004 7f6d46a2e980 -1 journal FileJournal::_open:
> disabling aio for non-block journal. Use journal_force_aio to force use of
> aio anyway
>
> 2017-05-09 13:38:38.432333 7f6d46a2e980 -1 osd.12 2284024 PGs are upgrading
>
> unable to look up group 'ceph': (34) Numerical result out of range
>
> unable to look up group 'ceph': (34) Numerical result out of range
>
> unable to look up group 'ceph': (34) Numerical result out of range
>
>
>
> Workaround:
>
>
>
> If I configure /etc/init/ceph-osd.conf like so
>
>
>
> -respawn limit 3 1800
>
> +respawn limit unlimited
>
>
>
> I get roughly 20 attempts to start the each osd daemon and then it
> successfully starts.
>
>
>
> Starting the daemons by hand works just fine after boot.
>
>
>
> Possible reasons:
>
>
>
> NSCD is being utilized and may not have started yet. However disabling
> this service doesn’t not improve starting the service without the
> workaround in place.
>
>
>
>
>
> The message seems to be coming global/global_init.cc
>
>
>
> ./global/global_init.cc- struct passwd *p = 0;
>
> ./global/global_init.cc- getpwnam_r(g_conf->setuser.c_str(), , buf,
> sizeof(buf), );
>
> ./global/global_init.cc- if (!p) {
>
> ./global/global_init.cc- cerr << "unable to look up user '" <<
> g_conf->setuser << "'"
>
> ./global/global_init.cc- << std::endl;
>
> ./global/global_init.cc- exit(1);
>
> ./global/global_init.cc- }
>
> ./global/global_init.cc- uid = p->pw_uid;
>
> ./global/global_init.cc- gid = p->pw_gid;
>
> ./global/global_init.cc- uid_string = g_conf->setuser;
>
> ./global/global_init.cc- }
>
> ./global/global_init.cc- }
>
> ./global/global_init.cc- if (g_conf->setgroup.length() > 0) {
>
> ./global/global_init.cc- gid = atoi(g_conf->setgroup.c_str());
>
> ./global/global_init.cc- if (!gid) {
>
> ./global/global_init.cc- char buf[4096];
>
> ./global/global_init.cc- struct group gr;
>
> ./global/global_init.cc- struct group *g = 0;
>
> ./global/global_init.cc- getgrnam_r(g_conf->setgroup.c_str(), , buf,
> sizeof(buf), );
>
> ./global/global_init.cc- if (!g) {
>
> ./global/global_init.cc: cerr << "unable to look up group '" <<
> g_conf->setgroup << "'"
>
> ./global/global_init.cc- << ": " << cpp_strerror(errno) << std::endl;
>
> ./global/global_init.cc- exit(1);
>
> ./global/global_init.cc- }
>
> ./global/global_init.cc- gid = g->gr_gid;
>
> ./global/global_init.cc- gid_string = g_conf->setgroup;
>
> ./global/global_init.cc- }
>
> ./global/global_init.cc- }
>
>
>
> 34 as an error code seems to correspond to ERANGE Insufficient buffer
> space supplied. I assume this is because getgrnam_r() returns NULL if it
> can’t find the group.
>
>
>
> But as to why the group isn’t retrievable I have no idea, As
>
> getent group ceph
>
> ceph:x:59623:ceph
>
>
>
> GID changed for security reasons.
>
>
>
> Additional Information:
>
>
>
> I also see this in boot.log not sure if it is related
>
> failed: 'ulimit -n 32768; /usr/bin/ceph-mds -i cephstorelx2 --pid-file
> /var/run/ceph/mds.cephstorelx2//mds.cephstorelx2.pid -c /etc/ceph/ceph.conf
> --cluster ceph --setuser ceph --setgroup ceph '
>
>
> Any pointers would be helpful.
>
>
> -Zach
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS Performance

2017-05-10 Thread Webert de Souza Lima
On Tue, May 9, 2017 at 9:07 PM, Brady Deetz  wrote:

> So with email, you're talking about lots of small reads and writes. In my
> experience with dicom data (thousands of 20KB files per directory), cephfs
> doesn't perform very well at all on platter drivers. I haven't experimented
> with pure ssd configurations, so I can't comment on that.
>

Yes, that's pretty much why I'm using cache tiering on SSDs.


> Somebody may correct me here, but small block io on writes just makes
> latency all that much more important due to the need to wait for your
> replicas to be written before moving on to the next block.
>

I think that is correct. Smaller blocks = more I/O, so SSDs benefit a lot.


> Without know exact hardware details, my brain is immediately jumping to
> networking constraints. 2 or 3 spindle drives can pretty much saturate a
> 1gbps link. As soon as you create contention for that resource, you create
> system load for iowait and latency.
>
> You mentioned you don't control the network. Maybe you can scale down and
> out.
>

 I'm constrained to the topology I showed you for now. I did plan another
(see https://creately.com/diagram/j1eyig9i/7wloXLNOAYjeregBGkvelMXL50%3D)
but it won't be possible at this time.
 That setup would have a 10 Gbit interconnect link.

On Wed, May 10, 2017 at 3:55 AM, John Spray  wrote:

>
> Hmm, to understand this better I would start by taking cache tiering
> out of the mix, it adds significant complexity.
>
> The "-direct=1" part could be significant here: when you're using RBD,
> that's getting handled by ext4, and then ext4 is potentially still
> benefiting from some caching at the ceph layer.  With CephFS on the
> other hand, it's getting handled by CephFS, and CephFS will be
> laboriously doing direct access to OSD.
>
> John


I won't be able to change that for now; I would need another testing cluster.
The point of direct=1 was to remove any possibility of caching in the middle.
That fio suite was suggested by user peetaur on the IRC channel (thanks :)
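
For context, the kind of run being compared was along these lines (a sketch
only; the mount point and job parameters are illustrative, not the exact job
file used):

fio --name=smallblock --directory=/mnt/cephfs/fiotest \
    --rw=randwrite --bs=4k --size=1G --numjobs=4 \
    --direct=1 --ioengine=libaio --group_reporting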

Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] All OSD fails after few requests to RGW

2017-05-10 Thread David Turner
PG subfolder splitting is the primary reason people are going to be
deploying Luminous and Bluestore much faster than any other major release
of Ceph.  Bluestore removes the concept of subfolders in PGs.

I have had clusters that reached what seemed a hardcoded maximum of 12,800
objects in a subfolder.  It would take an osd_heartbeat_grace of 240 or 300
to let them finish splitting their subfolders without being marked down.
Recently I came across a cluster that had a setting of 240 objects per
subfolder before splitting, so it was splitting all the time, and several
of the OSDs took longer than 30 seconds to finish splitting into
subfolders.  That led to more problems as we started adding backfilling to
everything and we lost a significant amount of throughput on the cluster.

I have yet to manage a cluster with a recent enough version of the
ceph-objectstore-tool (hopefully I'll have one this month) that includes
the ability to take an osd offline, split the subfolders, then bring it
back online.  If you set up a way to monitor how big your subfolders are
getting, you can leave the ceph settings as high as you want, and then go
in and perform maintenance on your cluster 1 failure domain at a time
splitting all of the PG subfolders on the OSDs.  This approach would prevent
this from ever happening in the wild.
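
Combined with the apply-layout-settings invocation Anton quotes later in
this thread, a per-OSD maintenance pass would look roughly like this (a
sketch only; the osd id is illustrative and the stop/start commands depend
on your init system):

ceph osd set noout
stop ceph-osd id=12
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
    --journal-path /var/lib/ceph/osd/ceph-12/journal \
    --log-file=/var/log/ceph/objectstore_tool.12.log \
    --op apply-layout-settings --pool default.rgw.buckets.data
start ceph-osd id=12
ceph osd unset noout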

On Wed, May 10, 2017 at 5:37 AM Piotr Nowosielski <
piotr.nowosiel...@allegrogroup.com> wrote:

> It is difficult for me to clearly state why some PGs have not been
> migrated.
> crushmap settings? Weight of OSD?
>
> One thing is certain - you will not find any information about the split
> process in the logs ...
>
> pn
>
> -Original Message-
> From: Anton Dmitriev [mailto:t...@enumnet.ru]
> Sent: Wednesday, May 10, 2017 10:14 AM
> To: Piotr Nowosielski ;
> ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] All OSD fails after few requests to RGW
>
> When I created cluster, I made a mistake in configuration, and set split
> parameter to 32 and merge to 40, so 32*40*16 = 20480 files per folder.
> After that I changed split to 8, and increased number of pg and pgp from
> 2048 to 4096 for pool, where problem occurs. While it was backfilling I
> observed, that placement groups were backfilling from one set of 3 OSD to
> another set of 3 OSD (replicated size = 3), so I made a conclusion, that
> PGs
> are completely recreating while increasing PG and PGP for pool and after
> this process number of files per directory must be Ok. But when backfilling
> finished I found many directories in this pool with ~20
> 000 files. Why Increasing PG num did not helped? Or maybe after this
> process
> some files will be deleted with some delay?
>
> I couldn`t find any information about directory split process in logs, also
> with osd and filestore debug 20. What pattern and in what log I need to
> grep
> for finding it?
>
> On 10.05.2017 10:36, Piotr Nowosielski wrote:
> > You can:
> > - change these parameters and use ceph-objectstore-tool
> > - add OSD host - rebuild the cluster will reduce the number of files
> > in the directories
> > - wait until "split" operations are over ;-)
> >
> > In our case, we could afford to wait until the "split" operation is
> > over (we have 2 clusters in slightly different configurations storing
> > the same data)
> >
> > hint:
> > When creating a new pool, use the parameter "expected_num_objects"
> > https://www.suse.com/documentation/ses-4/book_storage_admin/data/ceph_
> > pools_operate.html
> >
> > Piotr Nowosielski
> > Senior Systems Engineer
> > Zespół Infrastruktury 5
> > Grupa Allegro sp. z o.o.
> > Tel: +48 512 08 55 92
> >
> >
> > -Original Message-
> > From: Anton Dmitriev [mailto:t...@enumnet.ru]
> > Sent: Wednesday, May 10, 2017 9:19 AM
> > To: Piotr Nowosielski ;
> > ceph-users@lists.ceph.com
> > Subject: Re: [ceph-users] All OSD fails after few requests to RGW
> >
> > How did you solved it? Set new split/merge thresholds, and manually
> > applied it by ceph-objectstore-tool --data-path
> > /var/lib/ceph/osd/ceph-${osd_num} --journal-path
> > /var/lib/ceph/osd/ceph-${osd_num}/journal
> > --log-file=/var/log/ceph/objectstore_tool.${osd_num}.log --op
> > apply-layout-settings --pool default.rgw.buckets.data
> >
> > on each OSD?
> >
> > How I can see in logs, that split occurs?
> >
> > On 10.05.2017 10:13, Piotr Nowosielski wrote:
> >> Hey,
> >> We had similar problems. Look for information on "Filestore merge and
> >> split".
> >>
> >> Some explain:
> >> The OSD, after reaching a certain number of files in the directory
> >> (it depends of 'filestore merge threshold' and 'filestore split
> multiple'
> >> parameters) rebuilds the structure of this directory.
> >> If the files arrives, the OSD creates new subdirectories and moves
> >> some of the files there.
> >> If the files are missing the OSD will reduce the number of
> >> subdirectories.
> >>
> >>
> >> --
> >> Piotr Nowosielski
> >> 

[ceph-users] trouble starting ceph @ boot

2017-05-10 Thread vida.zach
System: Ubuntu Trusty 14.04

Release : Kraken


Issue:

When starting the ceph-osd daemon on boot via upstart, the log in
/var/log/upstart/ceph-osd-ceph_#.log reports 3 attempts to start the service
with the error messages below



starting osd.12 at - osd_data /var/lib/ceph/osd/ceph-12 
/var/lib/ceph/osd/ceph-12/journal

2017-05-09 13:38:34.507004 7f6d46a2e980 -1 journal FileJournal::_open: 
disabling aio for non-block journal. Use journal_force_aio to force use of aio 
anyway

2017-05-09 13:38:38.432333 7f6d46a2e980 -1 osd.12 2284024 PGs are upgrading

unable to look up group 'ceph': (34) Numerical result out of range

unable to look up group 'ceph': (34) Numerical result out of range

unable to look up group 'ceph': (34) Numerical result out of range



Workaround:



If I configure /etc/init/ceph-osd.conf like so



-respawn limit 3 1800

+respawn limit unlimited



I get roughly 20 attempts to start each osd daemon and then it successfully
starts.



Starting the daemons by hand works just fine after boot.



Possible reasons:



NSCD is being utilized and may not have started yet. However, disabling this
service does not make ceph-osd start without the workaround in place.





The message seems to be coming from global/global_init.cc:



./global/global_init.cc- struct passwd *p = 0;

./global/global_init.cc- getpwnam_r(g_conf->setuser.c_str(), , buf, 
sizeof(buf), );

./global/global_init.cc- if (!p) {

./global/global_init.cc- cerr << "unable to look up user '" << g_conf->setuser 
<< "'"

./global/global_init.cc- << std::endl;

./global/global_init.cc- exit(1);

./global/global_init.cc- }

./global/global_init.cc- uid = p->pw_uid;

./global/global_init.cc- gid = p->pw_gid;

./global/global_init.cc- uid_string = g_conf->setuser;

./global/global_init.cc- }

./global/global_init.cc- }

./global/global_init.cc- if (g_conf->setgroup.length() > 0) {

./global/global_init.cc- gid = atoi(g_conf->setgroup.c_str());

./global/global_init.cc- if (!gid) {

./global/global_init.cc- char buf[4096];

./global/global_init.cc- struct group gr;

./global/global_init.cc- struct group *g = 0;

./global/global_init.cc- getgrnam_r(g_conf->setgroup.c_str(), , buf, 
sizeof(buf), );

./global/global_init.cc- if (!g) {

./global/global_init.cc: cerr << "unable to look up group '" << 
g_conf->setgroup << "'"

./global/global_init.cc- << ": " << cpp_strerror(errno) << std::endl;

./global/global_init.cc- exit(1);

./global/global_init.cc- }

./global/global_init.cc- gid = g->gr_gid;

./global/global_init.cc- gid_string = g_conf->setgroup;

./global/global_init.cc- }

./global/global_init.cc- }



Error code 34 seems to correspond to ERANGE (insufficient buffer space
supplied). I assume this is because getgrnam_r() returns NULL if it can’t find
the group.



But as to why the group isn’t retrievable I have no idea, as

getent group ceph

ceph:x:59623:ceph



GID changed for security reasons.



Additional Information:



I also see this in boot.log; not sure if it is related:

failed: 'ulimit -n 32768; /usr/bin/ceph-mds -i cephstorelx2 --pid-file 
/var/run/ceph/mds.cephstorelx2//mds.cephstorelx2.pid -c /etc/ceph/ceph.conf 
--cluster ceph --setuser ceph --setgroup ceph '


Any pointers would be helpful.

-Zach
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v12.0.2 Luminous (dev) released

2017-05-10 Thread Jurian Broertjes
I'm having issues with this as well. Since no new dev build is available 
yet, I tried the gitbuilder route, but that seems to be outdated.
e.g. http://gitbuilder.ceph.com/ceph-deb-jessie-x86_64-basic/ref/ (the last
build was in January and Luminous is missing).


Are building from source or downgrading my only options, or am I missing
something?


Best regards,
Jurian

On 25-04-17 15:16, Sage Weil wrote:

I think this commit just missed 12.0.2:

commit 32b1b0476ad0d6a50d84732ce96cda6ee09f6bec
Author: Sage Weil 
Date:   Mon Apr 10 17:36:37 2017 -0400

 mon/OSDMonitor: tolerate upgrade from post-kraken dev cluster
 
 If the 'creating' pgs key is missing, move on without crashing.
 
 Signed-off-by: Sage Weil 


You can cherry-pick that or run a mon built from the master branch.
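
A rough sketch of the cherry-pick route (the hash is the commit above; build
and install steps are whatever you normally use):

git clone https://github.com/ceph/ceph.git && cd ceph
git checkout v12.0.2
git cherry-pick 32b1b0476ad0d6a50d84732ce96cda6ee09f6bec
# rebuild and reinstall ceph-mon from this tree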

sage



On Tue, 25 Apr 2017, Dan van der Ster wrote:


Created ticket to follow up: http://tracker.ceph.com/issues/19769



On Tue, Apr 25, 2017 at 11:34 AM, Dan van der Ster  wrote:

Could this change be the culprit?

commit 973829132bf7206eff6c2cf30dd0aa32fb0ce706
Author: Sage Weil 
Date:   Fri Mar 31 09:33:19 2017 -0400

 mon/OSDMonitor: spinlock -> std::mutex

 I think spinlock is dangerous here: we're doing semi-unbounded
 work (decode).  Also seemingly innocuous code like dout macros
 take mutexes.

 Signed-off-by: Sage Weil 


diff --git a/src/mon/OSDMonitor.cc b/src/mon/OSDMonitor.cc
index 543338bdf3..6fa5e8de4b 100644
--- a/src/mon/OSDMonitor.cc
+++ b/src/mon/OSDMonitor.cc
@@ -245,7 +245,7 @@ void OSDMonitor::update_from_paxos(bool *need_bootstrap)
  bufferlist bl;
  mon->store->get(OSD_PG_CREATING_PREFIX, "creating", bl);
  auto p = bl.begin();
-std::lock_guard l(creating_pgs_lock);
+std::lock_guard l(creating_pgs_lock);
  creating_pgs.decode(p);
  dout(7) << __func__ << " loading creating_pgs e" <<
creating_pgs.last_scan_epoch << dendl;
}
...


Cheers, Dan


On Tue, Apr 25, 2017 at 11:15 AM, Dan van der Ster  wrote:

Hi,

The mon's on my test luminous cluster do not start after upgrading
from 12.0.1 to 12.0.2. Here is the backtrace:

  0> 2017-04-25 11:06:02.897941 7f467ddd7880 -1 *** Caught signal
(Aborted) **
  in thread 7f467ddd7880 thread_name:ceph-mon

  ceph version 12.0.2 (5a1b6b3269da99a18984c138c23935e5eb96f73e)
  1: (()+0x797e7f) [0x7f467e58ce7f]
  2: (()+0xf370) [0x7f467d18d370]
  3: (gsignal()+0x37) [0x7f467a44f1d7]
  4: (abort()+0x148) [0x7f467a4508c8]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f467ad539d5]
  6: (()+0x5e946) [0x7f467ad51946]
  7: (()+0x5e973) [0x7f467ad51973]
  8: (()+0x5eb93) [0x7f467ad51b93]
  9: (ceph::buffer::list::iterator_impl::copy(unsigned int,
char*)+0xa5) [0x7f467e2fc715]
  10: (creating_pgs_t::decode(ceph::buffer::list::iterator&)+0x3c)
[0x7f467e211e8c]
  11: (OSDMonitor::update_from_paxos(bool*)+0x225a) [0x7f467e1cd16a]
  12: (PaxosService::refresh(bool*)+0x1a5) [0x7f467e196335]
  13: (Monitor::refresh_from_paxos(bool*)+0x19b) [0x7f467e12953b]
  14: (Monitor::init_paxos()+0x115) [0x7f467e129975]
  15: (Monitor::preinit()+0x93d) [0x7f467e13b07d]
  16: (main()+0x2518) [0x7f467e07f848]
  17: (__libc_start_main()+0xf5) [0x7f467a43bb35]
  18: (()+0x32671e) [0x7f467e11b71e]
  NOTE: a copy of the executable, or `objdump -rdS ` is
needed to interpret this.

Cheers, Dan


On Mon, Apr 24, 2017 at 5:49 PM, Abhishek Lekshmanan  wrote:

This is the third development checkpoint release of Luminous, the next
long term
stable release.

Major changes from v12.0.1
--
* The original librados rados_objects_list_open (C) and objects_begin
   (C++) object listing API, deprecated in Hammer, has finally been
   removed.  Users of this interface must update their software to use
   either the rados_nobjects_list_open (C) and nobjects_begin (C++) API or
   the new rados_object_list_begin (C) and object_list_begin (C++) API
   before updating the client-side librados library to Luminous.

   Object enumeration (via any API) with the latest librados version
   and pre-Hammer OSDs is no longer supported.  Note that no in-tree
   Ceph services rely on object enumeration via the deprecated APIs, so
   only external librados users might be affected.

   The newest (and recommended) rados_object_list_begin (C) and
   object_list_begin (C++) API is only usable on clusters with the
   SORTBITWISE flag enabled (Jewel and later).  (Note that this flag is
   required to be set before upgrading beyond Jewel.)

* CephFS clients without the 'p' flag in their authentication capability
   string will no longer be able to set quotas or any layout fields.  This
   flag previously only restricted modification of the pool and namespace
   fields in layouts.

* CephFS directory fragmentation (large directory support) is enabled
   by default on new filesystems.  To enable it on existing 

Re: [ceph-users] All OSD fails after few requests to RGW

2017-05-10 Thread Piotr Nowosielski
It is difficult for me to clearly state why some PGs have not been migrated.
crushmap settings? Weight of OSD?

One thing is certain - you will not find any information about the split
process in the logs ...

pn

-Original Message-
From: Anton Dmitriev [mailto:t...@enumnet.ru]
Sent: Wednesday, May 10, 2017 10:14 AM
To: Piotr Nowosielski ;
ceph-users@lists.ceph.com
Subject: Re: [ceph-users] All OSD fails after few requests to RGW

When I created cluster, I made a mistake in configuration, and set split
parameter to 32 and merge to 40, so 32*40*16 = 20480 files per folder.
After that I changed split to 8, and increased number of pg and pgp from
2048 to 4096 for pool, where problem occurs. While it was backfilling I
observed, that placement groups were backfilling from one set of 3 OSD to
another set of 3 OSD (replicated size = 3), so I made a conclusion, that PGs
are completely recreating while increasing PG and PGP for pool and after
this process number of files per directory must be Ok. But when backfilling
finished I found many directories in this pool with ~20
000 files. Why Increasing PG num did not helped? Or maybe after this process
some files will be deleted with some delay?

I couldn`t find any information about directory split process in logs, also
with osd and filestore debug 20. What pattern and in what log I need to grep
for finding it?

On 10.05.2017 10:36, Piotr Nowosielski wrote:
> You can:
> - change these parameters and use ceph-objectstore-tool
> - add OSD host - rebuild the cluster will reduce the number of files
> in the directories
> - wait until "split" operations are over ;-)
>
> In our case, we could afford to wait until the "split" operation is
> over (we have 2 clusters in slightly different configurations storing
> the same data)
>
> hint:
> When creating a new pool, use the parameter "expected_num_objects"
> https://www.suse.com/documentation/ses-4/book_storage_admin/data/ceph_
> pools_operate.html
>
> Piotr Nowosielski
> Senior Systems Engineer
> Zespół Infrastruktury 5
> Grupa Allegro sp. z o.o.
> Tel: +48 512 08 55 92
>
>
> -Original Message-
> From: Anton Dmitriev [mailto:t...@enumnet.ru]
> Sent: Wednesday, May 10, 2017 9:19 AM
> To: Piotr Nowosielski ;
> ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] All OSD fails after few requests to RGW
>
> How did you solved it? Set new split/merge thresholds, and manually
> applied it by ceph-objectstore-tool --data-path
> /var/lib/ceph/osd/ceph-${osd_num} --journal-path
> /var/lib/ceph/osd/ceph-${osd_num}/journal
> --log-file=/var/log/ceph/objectstore_tool.${osd_num}.log --op
> apply-layout-settings --pool default.rgw.buckets.data
>
> on each OSD?
>
> How I can see in logs, that split occurs?
>
> On 10.05.2017 10:13, Piotr Nowosielski wrote:
>> Hey,
>> We had similar problems. Look for information on "Filestore merge and
>> split".
>>
>> Some explain:
>> The OSD, after reaching a certain number of files in the directory
>> (it depends of 'filestore merge threshold' and 'filestore split multiple'
>> parameters) rebuilds the structure of this directory.
>> If the files arrives, the OSD creates new subdirectories and moves
>> some of the files there.
>> If the files are missing the OSD will reduce the number of
>> subdirectories.
>>
>>
>> --
>> Piotr Nowosielski
>> Senior Systems Engineer
>> Zespół Infrastruktury 5
>> Grupa Allegro sp. z o.o.
>> Tel: +48 512 08 55 92
>>
>>
>>
>>
>> -Original Message-
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
>> Of Anton Dmitriev
>> Sent: Wednesday, May 10, 2017 8:14 AM
>> To: ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] All OSD fails after few requests to RGW
>>
>> Hi!
>>
>> I increased pg_num and pgp_num for pool default.rgw.buckets.data from
>> 2048 to 4096, and it seems the situation became a bit better: the
>> cluster dies after 20-30 PUTs, not after 1. Could someone please give
>> me some recommendations on how to rescue the cluster?
>>
>> On 27.04.2017 09:59, Anton Dmitriev wrote:
>>> The cluster was running well for a long time, but last week
>>> OSDs started to fail.
>>> We use the cluster as image storage for OpenNebula with a small load and
>>> as object storage with a high load.
>>> Sometimes the disks of some OSDs are utilized at 100%, iostat shows avgqu-sz
>>> over 1000 while reading or writing only a few kilobytes per second;
>>> the OSDs on these disks become unresponsive and the cluster marks them down.
>>> We lowered the load on the object storage and the situation became better.
>>>
>>> 

Re: [ceph-users] All OSD fails after few requests to RGW

2017-05-10 Thread Anton Dmitriev
When I created the cluster I made a mistake in the configuration and set the split
parameter to 32 and merge to 40, so 32*40*16 = 20480 files per folder.
After that I changed split to 8, and increased pg_num and pgp_num from
2048 to 4096 for the pool where the problem occurs. While it was backfilling I
observed that placement groups were backfilling from one set of 3 OSDs
to another set of 3 OSDs (replicated size = 3), so I concluded
that PGs are completely recreated when PG and PGP are increased for a pool,
and that after this process the number of files per directory should be OK. But
when backfilling finished I found many directories in this pool with ~20 000
files. Why did increasing pg_num not help? Or maybe some files will be deleted
with some delay after this process?


I couldn't find any information about the directory split process in the logs,
even with osd and filestore debug at 20. What pattern, and in which log, do I
need to grep for to find it?
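One way to see whether splitting is actually needed or has happened, without relying
on log messages, is to count files directly in the PG directories on a FileStore OSD.
A rough sketch (assumes the usual FileStore layout under current/; osd id 33 is only
an example, and the scan can take a while on a full disk):

    # report the single subdirectories holding the most files on one OSD
    find /var/lib/ceph/osd/ceph-33/current -type d | while read -r d; do
        printf '%8d %s\n' "$(find "$d" -maxdepth 1 -type f | wc -l)" "$d"
    done | sort -rn | head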


On 10.05.2017 10:36, Piotr Nowosielski wrote:

You can:
- change these parameters and use ceph-objectstore-tool
- add an OSD host - the resulting rebalance will reduce the number of files in the
directories
- wait until "split" operations are over ;-)

In our case, we could afford to wait until the "split" operation is over (we
have 2 clusters in slightly different configurations storing the same data)

hint:
When creating a new pool, use the parameter "expected_num_objects"
https://www.suse.com/documentation/ses-4/book_storage_admin/data/ceph_pools_operate.html
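As a sketch of that hint (the pool name, PG counts, ruleset name and object count
below are placeholders, and the exact positional syntax may vary between releases),
passing expected_num_objects at creation time lets FileStore pre-split the
directories up front instead of splitting them under load later:

    # create a pool with directories pre-split for the expected object count
    ceph osd pool create mypool 4096 4096 replicated \
        replicated_ruleset 500000000   # last argument: expected_num_objects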

Piotr Nowosielski
Senior Systems Engineer
Zespół Infrastruktury 5
Grupa Allegro sp. z o.o.
Tel: +48 512 08 55 92


-Original Message-
From: Anton Dmitriev [mailto:t...@enumnet.ru]
Sent: Wednesday, May 10, 2017 9:19 AM
To: Piotr Nowosielski ;
ceph-users@lists.ceph.com
Subject: Re: [ceph-users] All OSD fails after few requests to RGW

How did you solve it? Did you set new split/merge thresholds and then manually apply
them with ceph-objectstore-tool --data-path
/var/lib/ceph/osd/ceph-${osd_num} --journal-path
/var/lib/ceph/osd/ceph-${osd_num}/journal
--log-file=/var/log/ceph/objectstore_tool.${osd_num}.log --op
apply-layout-settings --pool default.rgw.buckets.data

on each OSD?

How can I see in the logs that a split occurs?

On 10.05.2017 10:13, Piotr Nowosielski wrote:

Hey,
We had similar problems. Look for information on "Filestore merge and
split".

Some explanation:
After reaching a certain number of files in a directory (which depends on
the 'filestore merge threshold' and 'filestore split multiple' parameters)
the OSD rebuilds the structure of that directory.
As files arrive, the OSD creates new subdirectories and moves
some of the files there.
As files are removed, the OSD reduces the number of subdirectories.


--
Piotr Nowosielski
Senior Systems Engineer
Zespół Infrastruktury 5
Grupa Allegro sp. z o.o.
Tel: +48 512 08 55 92




-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
Of Anton Dmitriev
Sent: Wednesday, May 10, 2017 8:14 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] All OSD fails after few requests to RGW

Hi!

I increased pg_num and pgp_num for pool default.rgw.buckets.data from
2048 to 4096, and it seems the situation became a bit better: the
cluster dies after 20-30 PUTs, not after 1. Could someone please give
me some recommendations on how to rescue the cluster?

On 27.04.2017 09:59, Anton Dmitriev wrote:

The cluster was running well for a long time, but last week OSDs
started to fail.
We use the cluster as image storage for OpenNebula with a small load and
as object storage with a high load.
Sometimes the disks of some OSDs are utilized at 100%, iostat shows avgqu-sz
over 1000 while reading or writing only a few kilobytes per second; the OSDs
on these disks become unresponsive and the cluster marks them down. We
lowered the load on the object storage and the situation became better.

Yesterday the situation became worse:
if the RGWs are disabled and there are no requests to the object storage,
the cluster performs well, but if we enable the RGWs and make a few PUTs or
GETs, all non-SSD OSDs on all storage nodes end up in the same situation
described above.
iotop shows that xfsaild/ burns the disks.

trace-cmd record -e xfs\*  for 10 seconds shows 10 million objects;
as I understand it, that means ~360 000 objects per OSD over
10 seconds
 $ wc -l t.t
10256873 t.t

fragmentation on one of these disks is about 3%

more information about cluster:

https://yadi.sk/d/Y63mXQhl3HPvwt

also debug logs for osd.33 while problem occurs

https://yadi.sk/d/kiqsMF9L3HPvte

debug_osd = 20/20
debug_filestore = 

Re: [ceph-users] All OSD fails after few requests to RGW

2017-05-10 Thread Anton Dmitriev
How did you solve it? Did you set new split/merge thresholds and then manually
apply them with
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-${osd_num}
--journal-path /var/lib/ceph/osd/ceph-${osd_num}/journal
--log-file=/var/log/ceph/objectstore_tool.${osd_num}.log --op
apply-layout-settings --pool default.rgw.buckets.data


on each OSD?

How can I see in the logs that a split occurs?
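If the installed ceph-objectstore-tool is recent enough to support
apply-layout-settings, the usual pattern is to run it with the OSD stopped, one OSD
(or one failure domain) at a time. A rough sketch, assuming the new thresholds are
already in ceph.conf and that systemd manages the OSDs (on Ubuntu 14.04 with upstart
the stop/start commands differ):

    # offline subfolder split for a single OSD - adjust the id to your host
    ceph osd set noout                    # keep PGs from being remapped away
    systemctl stop ceph-osd@33            # cleanly stop the OSD
    ceph-objectstore-tool \
        --data-path /var/lib/ceph/osd/ceph-33 \
        --journal-path /var/lib/ceph/osd/ceph-33/journal \
        --op apply-layout-settings \
        --pool default.rgw.buckets.data
    systemctl start ceph-osd@33
    ceph osd unset noout                  # once the OSD is back up and in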

On 10.05.2017 10:13, Piotr Nowosielski wrote:

Hey,
We had similar problems. Look for information on "Filestore merge and
split".

Some explanation:
After reaching a certain number of files in a directory (which depends on
the 'filestore merge threshold' and 'filestore split multiple' parameters)
the OSD rebuilds the structure of that directory.
As files arrive, the OSD creates new subdirectories and moves some of
the files there.
As files are removed, the OSD reduces the number of subdirectories.


--
Piotr Nowosielski
Senior Systems Engineer
Zespół Infrastruktury 5
Grupa Allegro sp. z o.o.
Tel: +48 512 08 55 92




-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Anton Dmitriev
Sent: Wednesday, May 10, 2017 8:14 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] All OSD fails after few requests to RGW

Hi!

I increased pg_num and pgp_num for pool default.rgw.buckets.data from
2048 to 4096, and it seems the situation became a bit better: the cluster
dies after 20-30 PUTs, not after 1. Could someone please give me some
recommendations on how to rescue the cluster?

On 27.04.2017 09:59, Anton Dmitriev wrote:

The cluster was running well for a long time, but last week OSDs
started to fail.
We use the cluster as image storage for OpenNebula with a small load and
as object storage with a high load.
Sometimes the disks of some OSDs are utilized at 100%, iostat shows avgqu-sz
over 1000 while reading or writing only a few kilobytes per second; the OSDs
on these disks become unresponsive and the cluster marks them down. We
lowered the load on the object storage and the situation became better.

Yesterday the situation became worse:
if the RGWs are disabled and there are no requests to the object storage,
the cluster performs well, but if we enable the RGWs and make a few PUTs or
GETs, all non-SSD OSDs on all storage nodes end up in the same situation
described above.
iotop shows that xfsaild/ burns the disks.

trace-cmd record -e xfs\*  for 10 seconds shows 10 million objects;
as I understand it, that means ~360 000 objects per OSD over 10
seconds
$ wc -l t.t
10256873 t.t

fragmentation on one of these disks is about 3%

more information about cluster:

https://yadi.sk/d/Y63mXQhl3HPvwt

also debug logs for osd.33 while problem occurs

https://yadi.sk/d/kiqsMF9L3HPvte

debug_osd = 20/20
debug_filestore = 20/20
debug_tp = 20/20



Ubuntu 14.04
$ uname -a
Linux storage01 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29
20:22:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Ceph 10.2.7

7 storages: Supermicro 28 osd 4tb 7200 JBOD + journal raid10 4 ssd
intel 3510 800gb + 2 osd SSD intel 3710 400gb for rgw meta and index
One of these storage nodes differs only in the number of OSDs: it has 26 OSDs
on 4tb disks instead of 28 like the others

Storages connect to each other by bonded 2x10gbit Clients connect to
storages by bonded 2x1gbit

in 5 storages 2 x CPU E5-2650v2  and 256 gb RAM in 2 storages 2 x CPU
E5-2690v3  and 512 gb RAM

7 mons
3 rgw

Help me please to rescue the cluster.




--
Dmitriev Anton

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Dmitriev Anton

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] All OSD fails after few requests to RGW

2017-05-10 Thread Piotr Nowosielski
Hey,
We had similar problems. Look for information on "Filestore merge and
split".

Some explanation:
After reaching a certain number of files in a directory (which depends on
the 'filestore merge threshold' and 'filestore split multiple' parameters)
the OSD rebuilds the structure of that directory.
As files arrive, the OSD creates new subdirectories and moves some of
the files there.
As files are removed, the OSD reduces the number of subdirectories.
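To check which values an OSD is actually running with, the admin socket can be
queried; a small sketch (osd.33 is just an example id):

    # show the live split/merge settings of one OSD via its admin socket
    ceph daemon osd.33 config get filestore_merge_threshold
    ceph daemon osd.33 config get filestore_split_multiple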


--
Piotr Nowosielski
Senior Systems Engineer
Zespół Infrastruktury 5
Grupa Allegro sp. z o.o.
Tel: +48 512 08 55 92




-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Anton Dmitriev
Sent: Wednesday, May 10, 2017 8:14 AM
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] All OSD fails after few requests to RGW

Hi!

I increased pg_num and pgp_num for pool default.rgw.buckets.data from
2048 to 4096, and it seems the situation became a bit better: the cluster
dies after 20-30 PUTs, not after 1. Could someone please give me some
recommendations on how to rescue the cluster?

On 27.04.2017 09:59, Anton Dmitriev wrote:
> The cluster was running well for a long time, but last week OSDs
> started to fail.
> We use the cluster as image storage for OpenNebula with a small load and
> as object storage with a high load.
> Sometimes the disks of some OSDs are utilized at 100%, iostat shows avgqu-sz
> over 1000 while reading or writing only a few kilobytes per second; the OSDs
> on these disks become unresponsive and the cluster marks them down. We
> lowered the load on the object storage and the situation became better.
>
> Yesterday the situation became worse:
> if the RGWs are disabled and there are no requests to the object storage,
> the cluster performs well, but if we enable the RGWs and make a few PUTs or
> GETs, all non-SSD OSDs on all storage nodes end up in the same situation
> described above.
> iotop shows that xfsaild/ burns the disks.
>
> trace-cmd record -e xfs\*  for 10 seconds shows 10 million objects;
> as I understand it, that means ~360 000 objects per OSD over 10
> seconds
>$ wc -l t.t
> 10256873 t.t
>
> fragmentation on one of these disks is about 3%
>
> more information about cluster:
>
> https://yadi.sk/d/Y63mXQhl3HPvwt
>
> also debug logs for osd.33 while problem occurs
>
> https://yadi.sk/d/kiqsMF9L3HPvte
>
> debug_osd = 20/20
> debug_filestore = 20/20
> debug_tp = 20/20
>
>
>
> Ubuntu 14.04
> $ uname -a
> Linux storage01 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29
> 20:22:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
>
> Ceph 10.2.7
>
> 7 storages: Supermicro 28 osd 4tb 7200 JBOD + journal raid10 4 ssd
> intel 3510 800gb + 2 osd SSD intel 3710 400gb for rgw meta and index
> One of these storage nodes differs only in the number of OSDs: it has 26 OSDs
> on 4tb disks instead of 28 like the others
>
> Storages connect to each other by bonded 2x10gbit Clients connect to
> storages by bonded 2x1gbit
>
> in 5 storages 2 x CPU E5-2650v2  and 256 gb RAM in 2 storages 2 x CPU
> E5-2690v3  and 512 gb RAM
>
> 7 mons
> 3 rgw
>
> Help me please to rescue the cluster.
>
>


--
Dmitriev Anton

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph health warn MDS failing to respond to cache pressure

2017-05-10 Thread gjprabu
Hi John,

Thanks for your reply. We are using the version below for both the client and
the MDS (ceph version 10.2.2).



Regards

Prabu GJ




 On Wed, 10 May 2017 12:29:06 +0530 John Spray jsp...@redhat.com 
wrote 




On Thu, May 4, 2017 at 7:28 AM, gjprabu gjpr...@zohocorp.com wrote: 

 Hi Team, 

 

 We are running cephfs with 5 OSD and 3 Mon and 1 MDS. There is a 

 Health Warn "failing to respond to cache pressure". Kindly advise how to fix 

 this issue. 

 

This is usually due to buggy old clients, and occasionally due to a 

buggy old MDS. What client and MDS versions are you using? 

 

John 

 

 

 

 cluster b466e09c-f7ae-4e89-99a7-99d30eba0a13 

 health HEALTH_WARN 

 mds0: Client integ-hm8-1.csez.zohocorpin.com failing to respond 

 to cache pressure 

 mds0: Client integ-hm5 failing to respond to cache pressure 

 mds0: Client integ-hm9 failing to respond to cache pressure 

 mds0: Client integ-hm2 failing to respond to cache pressure 

 monmap e2: 3 mons at 

 
{intcfs-mon1=192.168.113.113:6789/0,intcfs-mon2=192.168.113.114:6789/0,intcfs-mon3=192.168.113.72:6789/0}
 

 election epoch 16, quorum 0,1,2 

 intcfs-mon3,intcfs-mon1,intcfs-mon2 

 fsmap e79409: 1/1/1 up {0=intcfs-osd1=up:active}, 1 up:standby 

 osdmap e3343: 5 osds: 5 up, 5 in 

 flags sortbitwise 

 pgmap v13065759: 564 pgs, 3 pools, 5691 GB data, 12134 kobjects 

 11567 GB used, 5145 GB / 16713 GB avail 

 562 active+clean 

 2 active+clean+scrubbing+deep 

 client io 8090 kB/s rd, 29032 kB/s wr, 25 op/s rd, 129 op/s wr 

 

 

 Regards 

 Prabu GJ 

 

 ___ 

 ceph-users mailing list 

 ceph-users@lists.ceph.com 

 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

 






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph health warn MDS failing to respond to cache pressure

2017-05-10 Thread gjprabu
Hi Webert,



  Thanks for your reply. Can you please suggest the ceph pg values for the data
and metadata pools? I have set 128 for data and 128 for metadata; is this correct?





Regards

Prabu GJ




 On Thu, 04 May 2017 17:04:38 +0530 Webert de Souza Lima 
webert.b...@gmail.com wrote 




I have faced the same problem many times. Usually it doesn't cause anything 
bad, but I had a 30 min system outage twice because of this.

It might be because of the number of inodes on your ceph filesystem. Go to the 
MDS server and do (supposing your mds server id is intcfs-osd1):



 ceph daemon mds.intcfs-osd1 perf dump mds



Look for the inode_max and inodes values.

inode_max is the maximum number of inodes to cache and inodes is the number of
inodes currently in the cache.




If it is full, mount the cephfs with the "-o dirstat" option and cat the
mountpoint, for example:



 mount -t ceph  10.0.0.1:6789:/ /mnt -o 
dirstat,name=admin,secretfile=/etc/ceph/admin.secret

 cat /mnt




Look for the rentries number. If it is larger than inode_max, raise the mds
cache size option in ceph.conf to a number that fits and restart the MDS
(beware: this will cause cephfs to stall for a while; do it at your own risk).
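A compact way to do that check, as a sketch (assumes the admin socket is reachable
on the MDS host; the cache size below is only an example value):

    # compare cached inodes against the cache limit on the active MDS
    ceph daemon mds.intcfs-osd1 perf dump mds | python -m json.tool \
        | grep -E '"(inodes|inode_max)"'
    # if inodes is pinned at inode_max, raise the limit in ceph.conf and restart:
    #   [mds]
    #   mds cache size = 4000000   # example value; the default is 100000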



Regards,



Webert Lima

DevOps Engineer at MAV Tecnologia

Belo Horizonte - Brasil






On Thu, May 4, 2017 at 3:28 AM, gjprabu gjpr...@zohocorp.com wrote:








___

ceph-users mailing list 

ceph-users@lists.ceph.com 

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 




Hi Team,



  We are running cephfs with 5 OSD and 3 Mon and 1 MDS. There is a Health 
Warn "failing to respond to cache pressure". Kindly advise how to fix this issue.





cluster b466e09c-f7ae-4e89-99a7-99d30eba0a13

 health HEALTH_WARN

mds0: Client integ-hm8-1.csez.zohocorpin.com failing to respond to 
cache pressure

mds0: Client integ-hm5 failing to respond to cache pressure

mds0: Client integ-hm9 failing to respond to cache pressure

mds0: Client integ-hm2 failing to respond to cache pressure

 monmap e2: 3 mons at 
{intcfs-mon1=192.168.113.113:6789/0,intcfs-mon2=192.168.113.114:6789/0,intcfs-mon3=192.168.113.72:6789/0}

election epoch 16, quorum 0,1,2 intcfs-mon3,intcfs-mon1,intcfs-mon2

  fsmap e79409: 1/1/1 up {0=intcfs-osd1=up:active}, 1 up:standby

 osdmap e3343: 5 osds: 5 up, 5 in

flags sortbitwise

  pgmap v13065759: 564 pgs, 3 pools, 5691 GB data, 12134 kobjects

11567 GB used, 5145 GB / 16713 GB avail

 562 active+clean

   2 active+clean+scrubbing+deep

  client io 8090 kB/s rd, 29032 kB/s wr, 25 op/s rd, 129 op/s wr





Regards

Prabu GJ





___

 ceph-users mailing list

 ceph-users@lists.ceph.com

 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

 





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph health warn MDS failing to respond to cache pressure

2017-05-10 Thread John Spray
On Thu, May 4, 2017 at 7:28 AM, gjprabu  wrote:
> Hi Team,
>
>   We are running cephfs with 5 OSD and 3 Mon and 1 MDS. There is
> Health Warn "failing to respond to cache pressure". Kindly advise how to fix
> this issue.

This is usually due to buggy old clients, and occasionally due to a
buggy old MDS.  What client and MDS versions are you using?

John

>
>
> cluster b466e09c-f7ae-4e89-99a7-99d30eba0a13
>  health HEALTH_WARN
> mds0: Client integ-hm8-1.csez.zohocorpin.com failing to respond
> to cache pressure
> mds0: Client integ-hm5 failing to respond to cache pressure
> mds0: Client integ-hm9 failing to respond to cache pressure
> mds0: Client integ-hm2 failing to respond to cache pressure
>  monmap e2: 3 mons at
> {intcfs-mon1=192.168.113.113:6789/0,intcfs-mon2=192.168.113.114:6789/0,intcfs-mon3=192.168.113.72:6789/0}
> election epoch 16, quorum 0,1,2
> intcfs-mon3,intcfs-mon1,intcfs-mon2
>   fsmap e79409: 1/1/1 up {0=intcfs-osd1=up:active}, 1 up:standby
>  osdmap e3343: 5 osds: 5 up, 5 in
> flags sortbitwise
>   pgmap v13065759: 564 pgs, 3 pools, 5691 GB data, 12134 kobjects
> 11567 GB used, 5145 GB / 16713 GB avail
>  562 active+clean
>2 active+clean+scrubbing+deep
>   client io 8090 kB/s rd, 29032 kB/s wr, 25 op/s rd, 129 op/s wr
>
>
> Regards
> Prabu GJ
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph MDS daemonperf

2017-05-10 Thread John Spray
On Tue, May 9, 2017 at 5:23 PM, Webert de Souza Lima
 wrote:
> Hi,
>
> by issuing `ceph daemonperf mds.x` I see the following columns:
>
> -mds-- --mds_server-- ---objecter--- -mds_cache- ---mds_log
> rlat inos caps|hsr  hcs  hcr |writ read actv|recd recy stry purg|segs evts subm|
>   0   95   41 |  0    0    0 |   0    0    0 |   0    0   25    0 |   1  628    0
>   0   95   41 |  0    0    0 |   0    0    0 |   0    0   25    0 |   1  628    0
>   0   95   41 |  0    0    0 |   0    0    0 |   0    0   25    0 |   1  628    0
>   0   95   41 |  0    0    0 |   0    0    0 |   0    0   25    0 |   1  628    0
>   0   95   41 |  0    0    0 |   0    0    0 |   0    0   25    0 |   1  628    0
>
> It's not clear to me what each column means, and I can't find it documented
> anywhere. Also the labels are confusing. Why is there both mds and mds_server?

The mds, mds_server etc refer to internal subsystems within the
ceph-mds process (their naming is arcane).

The abbreviated names for performance counters are the "nick" item in
the output of "ceph daemon mds.<id> perf schema" -- for sufficiently
recent code you should see a description field there too.
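For example, something along these lines shows the mapping from full counter names
to their abbreviated nicks (a sketch; jq is assumed to be installed, and older builds
may not populate the nick/description fields):

    # dump counter name -> nick/description for the mds_server subsystem
    ceph daemon mds.x perf schema \
        | jq '.mds_server | with_entries(.value |= {nick, description})'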

John


> Regards,
>
> Webert Lima
> DevOps Engineer at MAV Tecnologia
> Belo Horizonte - Brasil
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] All OSD fails after few requests to RGW

2017-05-10 Thread Anton Dmitriev

Hi!

I increased pg_num and pgp_num for pool default.rgw.buckets.data from
2048 to 4096, and it seems the situation became a bit better: the cluster
dies after 20-30 PUTs, not after 1. Could someone please give me some
recommendations on how to rescue the cluster?
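For the record, that kind of increase is normally done in two steps, pg_num first and
pgp_num once the new PGs have been created; a sketch with the same numbers as above:

    # grow the PG count of the bucket data pool
    ceph osd pool set default.rgw.buckets.data pg_num 4096
    ceph osd pool set default.rgw.buckets.data pgp_num 4096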


On 27.04.2017 09:59, Anton Dmitriev wrote:
The cluster was running well for a long time, but last week OSDs
started to fail.
We use the cluster as image storage for OpenNebula with a small load and
as object storage with a high load.
Sometimes the disks of some OSDs are utilized at 100%, iostat shows avgqu-sz
over 1000 while reading or writing only a few kilobytes per second; the OSDs
on these disks become unresponsive and the cluster marks them down. We
lowered the load on the object storage and the situation became better.


Yesterday the situation became worse:
if the RGWs are disabled and there are no requests to the object storage,
the cluster performs well, but if we enable the RGWs and make a few PUTs or
GETs, all non-SSD OSDs on all storage nodes end up in the same situation
described above.

iotop shows that xfsaild/ burns the disks.

trace-cmd record -e xfs\*  for 10 seconds shows 10 million objects;
as I understand it, that means ~360 000 objects per OSD over 10
seconds

   $ wc -l t.t
10256873 t.t
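Presumably the trace was produced and counted with something like the following (a
sketch, not the exact commands used); the per-OSD figure is simply the total divided
by the 28 OSDs in one node:

    # record 10 s of xfs tracepoints, dump to text, and estimate events per OSD
    trace-cmd record -e xfs\* sleep 10
    trace-cmd report > t.t
    wc -l t.t                    # ~10 256 873 lines here
    echo $((10256873 / 28))      # ~366 000 per OSD over the 10 s window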

fragmentation on one of these disks is about 3%

more information about cluster:

https://yadi.sk/d/Y63mXQhl3HPvwt

also debug logs for osd.33 while problem occurs

https://yadi.sk/d/kiqsMF9L3HPvte

debug_osd = 20/20
debug_filestore = 20/20
debug_tp = 20/20



Ubuntu 14.04
$ uname -a
Linux storage01 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29 
20:22:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux


Ceph 10.2.7

7 storages: Supermicro 28 osd 4tb 7200 JBOD + journal raid10 4 ssd 
intel 3510 800gb + 2 osd SSD intel 3710 400gb for rgw meta and index
One of these storage nodes differs only in the number of OSDs: it has 26 OSDs
on 4tb disks instead of 28 like the others


Storages connect to each other by bonded 2x10gbit
Clients connect to storages by bonded 2x1gbit

in 5 storages 2 x CPU E5-2650v2  and 256 gb RAM
in 2 storages 2 x CPU E5-2690v3  and 512 gb RAM

7 mons
3 rgw

Help me please to rescue the cluster.





--
Dmitriev Anton

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com