[ceph-users] Cannot delete files or folders from bucket

2015-07-15 Thread Lior Vizanski

Hi,


I have an issue where I cannot delete files or folders from buckets; there are
no issues when copying data over. Whenever I try to delete something I get an
Internal Error 500. Here is a sample from the radosgw log:


2015-07-12 17:51:33.216750 7f5daaf65700 15 calculated 
digest=4/aScqOXY8O45BFQds0OIzk=
2015-07-12 17:51:33.216756 7f5daaf65700 15 auth_sign=4/aScqOXf9hCXFQds0OIzk=
2015-07-12 17:51:33.216759 7f5daaf65700 15 compare=0
2015-07-12 17:51:33.216767 7f5daaf65700  2 req 17237:0.000278:s3:DELETE 
/cloudberrybucket/tmp/:delete_obj:reading permissions
2015-07-12 17:51:33.216810 7f5daaf65700 15 Read AccessControlPolicy<AccessControlPolicy 
xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>cbxx</ID><DisplayName>Cloudberry 
Labs User</DisplayName></Owner><AccessControlList><Grant><Grantee 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="CanonicalUser"><ID>cbxx</ID> 
<DisplayName>Cloudberry Labs User</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy>
2015-07-12 17:51:33.216828 7f5daaf65700  2 req 17237:0.000340:s3:DELETE 
/cloudberrybucket/tmp/:delete_obj:init op
2015-07-12 17:51:33.216838 7f5daaf65700  2 req 17237:0.000349:s3:DELETE 
/cloudberrybucket/tmp/:delete_obj:verifying op mask
2015-07-12 17:51:33.216842 7f5daaf65700 20 required_mask= 4 user.op_mask=7
2015-07-12 17:51:33.216847 7f5daaf65700  2 req 17237:0.000359:s3:DELETE 
/cloudberrybucket/tmp/:delete_obj:verifying op permissions
2015-07-12 17:51:33.216853 7f5daaf65700  5 Searching permissions for uid=cbxx 
mask=50
2015-07-12 17:51:33.216856 7f5daaf65700  5 Found permission: 15
2015-07-12 17:51:33.216860 7f5daaf65700  5 Searching permissions for group=1 
mask=50
2015-07-12 17:51:33.216864 7f5daaf65700  5 Permissions for group not found
2015-07-12 17:51:33.216867 7f5daaf65700  5 Searching permissions for group=2 
mask=50
2015-07-12 17:51:33.216870 7f5daaf65700  5 Permissions for group not found
2015-07-12 17:51:33.216873 7f5daaf65700  5 Getting permissions id=cbxx 
owner=cbxx perm=2
2015-07-12 17:51:33.216876 7f5daaf65700 10  uid=cbxx requested perm (type)=2, 
policy perm=2, user_perm_mask=2, acl perm=2
2015-07-12 17:51:33.216882 7f5daaf65700  2 req 17237:0.000394:s3:DELETE 
/cloudberrybucket/tmp/:delete_obj:verifying op params
2015-07-12 17:51:33.216888 7f5daaf65700  2 req 17237:0.000400:s3:DELETE 
/cloudberrybucket/tmp/:delete_obj:executing
2015-07-12 17:51:33.216925 7f5daaf65700 20 get_obj_state: rctx=0x7f5daaf641c0 
obj=cloudberrybucket:tmp/ state=0x7f5e0406be10 s->prefetch_data=0
2015-07-12 17:51:33.219125 7f5daaf65700 10 manifest: total_size = 0
2015-07-12 17:51:33.219133 7f5daaf65700 20 get_obj_state: setting s->obj_tag to 
default.16132.16299
2015-07-12 17:51:33.219141 7f5daaf65700 20 get_obj_state: rctx=0x7f5daaf641c0 
obj=cloudberrybucket:tmp/ state=0x7f5e0406be10 s->prefetch_data=0
2015-07-12 17:51:33.219166 7f5daaf65700 20 reading from 
.rgw:.bucket.meta.cloudberrybucket:default.16132.1
2015-07-12 17:51:33.219185 7f5daaf65700 20 get_obj_state: rctx=0x7f5daaf63600 
obj=.rgw:.bucket.meta.cloudberrybucket:default.16132.1 state=0x7f5e04051f60 
s->prefetch_data=0
2015-07-12 17:51:33.219200 7f5daaf65700 10 cache get: 
name=.rgw+.bucket.meta.cloudberrybucket:default.16132.1 : hit
2015-07-12 17:51:33.219212 7f5daaf65700 20 get_obj_state: s->obj_tag was set 
empty
2015-07-12 17:51:33.219221 7f5daaf65700 10 cache get: 
name=.rgw+.bucket.meta.cloudberrybucket:default.16132.1 : hit
2015-07-12 17:51:33.219246 7f5daaf65700 20  bucket index object: 
.dir.default.16132.1
2015-07-12 17:51:33.227267 7f5da5f5b700 20 get_obj_state: rctx=0x7f5da5f5a1c0 
obj=cloudberrybucket:MBS-f049230c-6628-4b4a-a025-7ff9dbed736c/CBB_TPL-SBS1/D:/DATA/docs/1019/2/גיבוי
 של G7889.wbk:/20050410160310/גיבוי של G7889.wbk state=0x7f5e100be420 
s->prefetch_data=0
2015-07-12 17:51:33.227294 7f5da5f5b700 10 setting object 
write_tag=default.16132.17234
2015-07-12 17:51:33.227381 7f5da5f5b700 20 reading from 
.rgw:.bucket.meta.cloudberrybucket:default.16132.1
2015-07-12 17:51:33.227400 7f5da5f5b700 20 get_obj_state: rctx=0x7f5da5f59140 
obj=.rgw:.bucket.meta.cloudberrybucket:default.16132.1 state=0x7f5e1b60 
s->prefetch_data=0
2015-07-12 17:51:33.227414 7f5da5f5b700 10 cache get: 
name=.rgw+.bucket.meta.cloudberrybucket:default.16132.1 : hit
2015-07-12 17:51:33.227427 7f5da5f5b700 20 get_obj_state: s->obj_tag was set 
empty
2015-07-12 17:51:33.227436 7f5da5f5b700 10 cache get: 
name=.rgw+.bucket.meta.cloudberrybucket:default.16132.1 : hit
2015-07-12 17:51:33.227498 7f5da5f5b700 20  bucket index object: 
.dir.default.16132.1
2015-07-12 17:51:33.235504 7f5daaf65700  0 WARNING: set_req_state_err err_no=95 
resorting to 500
2015-07-12 17:51:33.235706 7f5daaf65700  2 req 17237:0.019217:s3:DELETE 
/cloudberrybucket/tmp/:delete_obj:http status=500
2015-07-12 17:51:33.235716 7f5daaf65700  1 == req done req=0x7f5e080eda80 
http_status=500 ==


Ubuntu 14.04

ceph version 0.94.2
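
If it would help, I can turn up the radosgw logging and run a bucket index
check - something along these lines (bucket name taken from the log above; the
debug settings go in ceph.conf under the radosgw client section, followed by a
radosgw restart):

    # ceph.conf, [client.radosgw.<instance>] section - verbose rgw logging
    debug rgw = 20
    debug ms = 1

    # sanity-check the bucket index, then optionally try to repair it
    radosgw-admin bucket check --bucket=cloudberrybucket
    radosgw-admin bucket check --bucket=cloudberrybucket --check-objects --fix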


Thanks in advance,

Lior




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Any workaround for ImportError: No module named ceph_argparse?

2015-07-15 Thread Dan Mick
On 07/15/2015 11:11 AM, Deneau, Tom wrote:
> I just installed 9.0.2 on Trusty using ceph-deploy install --testing and I am 
> hitting
> the "ImportError:  No module named ceph_argparse" issue.
> 
> What is the best way to get around this issue and still run a version that is
> compatible with other (non-Ubuntu) nodes in the cluster that are running 
> 9.0.1?
> 
> -- Tom Deneau

Which command is prompting that error?


-- 
Dan Mick
Red Hat, Inc.
Ceph docs: http://ceph.com/docs
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] backing Hadoop with Ceph ??

2015-07-15 Thread John Spray



On 15/07/15 16:57, Shane Gibson wrote:



We are in the (very) early stages of considering testing backing 
Hadoop via Ceph - as opposed to HDFS.  I've seen a few very vague 
references to doing that, but haven't found any concrete info 
(architecture, configuration recommendations, gotchas, lessons 
learned, etc...).   I did find the ceph.com/docs/ info [1] which 
discusses use of CephFS for backing Hadoop - but this would be foolish 
for production clusters given that CephFS isn't yet considered 
production quality/grade.


For analytics workloads where you're handling ephemeral datasets or 
scratch data, you might find that self-supporting a cephfs instance is a 
workable solution.  The in-development fsck parts of cephfs are usually 
more of a concern for long term storage use cases, and for providing 
fully vendor-supported systems.  I'd encourage you to try out the 
hadoop+cephfs setup and let us know what kind of issues you hit, if any.


Cheers,
John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] rados-java issue tracking and release

2015-07-15 Thread Wido den Hollander
Hi,

On 14-07-15 23:05, Laszlo Hornyak wrote:
> Hi Wido,
> 
> That release was actually just a publish from the ceph repository into
> the central repository, I have copied the very same jar file to make
> sure there is no difference and I don't change basically anything else
> than the source of the jar file. Since central requires a source
> attachment, I attached the source at
> version 35314dadad5eaf48f48cddce6c61f80ef175e7a6 therefore indeed, not
> the latest, but the one that should match the published jar file.
> 

Ok, so I think I'd have to take a look at it some time. Fix some bugs
and publish a new version with various fixes in it.

Bump the version number and that should then be uploaded to the repo so
that the binary matches the source.

Wido

> Best regards,
> Laszlo
> 
> 
> 
> On Tue, Jul 14, 2015 at 11:58 AM, Wido den Hollander wrote:
> 
> Hi,
> 
> On 14-07-15 11:05, Mingfai wrote:
> > hi,
> >
> > does anyone know who is maintaining rados-java and perform release to
> > the Maven central? In May, there was a release to Maven central *[1],
> > but the release version is not based on the latest code base from:
> > https://github.com/ceph/rados-java
> > I wonder if the one who do the Maven release could tag a version and
> > release the current snapshot.
> >
> 
> From the CloudStack project Laszlo pushed it to Maven central with my
> permission, but it seems he used a different source than the one from Github.
> 
> CC'ing him if he knows which source he used.
> 
> > Besides, I am not sure if the rados-java developers will notice any
> > issue reported in the ceph issue tracker. would it be better if the
> > rados-java project could enable issue tracking at github? thx
> >
> 
> I have to be honest that I simply forgot to look at the outstanding
> issues.
> 
> Any help is more than appreciated since I don't have the time to look at
> them.
> 
> Always feel free to send in a pull request on Github:
> https://github.com/ceph/rados-java/pulls
> 
> If it fixes an issue, please add that in the git commit message.
> 
> Wido
> 
> > [1]
> http://search.maven.org/#artifactdetails%7Ccom.ceph%7Crados%7C0.1.4%7Cjar
> >
> > regards,
> > mingfai
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com 
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> 
> 
> 
> 
> -- 
> 
> EOF
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] backing Hadoop with Ceph ??

2015-07-15 Thread Shane Gibson

Somnath - thanks for the reply ...

:-)  Haven't tried anything yet - just starting to gather info/input/direction 
for this solution.

Looking at the S3 API info [2] - there is no mention of support for the "S3a" 
API extensions - namely "rename" support.  The problem with backing via the S3 
API is that if you need to rename a large (say multi-GB) data object, you have 
to copy it to the new name and delete the original - a very IO-expensive 
operation - and something we do a lot of.  That in and of itself might be a 
deal breaker ...   Any idea/input/intention of supporting the S3a extensions 
within the RadosGW S3 API implementation?

Plus - it seems it's considered a "bad idea" to back Hadoop via S3 (and 
indirectly Ceph via RGW) [3]; though I'm not sure whether the architectural 
differences between Amazon's S3 implementation and the far superior Ceph make 
it more palatable?

~~shane

[2] http://ceph.com/docs/master/radosgw/s3/
[3] https://wiki.apache.org/hadoop/AmazonS3



On 7/15/15, 9:50 AM, "Somnath Roy" <somnath@sandisk.com> wrote:

Did you try to integrate ceph +rgw+s3 with Hadoop?

Sent from my iPhone

On Jul 15, 2015, at 8:58 AM, Shane Gibson <shane_gib...@symantec.com> wrote:



We are in the (very) early stages of considering testing backing Hadoop via 
Ceph - as opposed to HDFS.  I've seen a few very vague references to doing 
that, but haven't found any concrete info (architecture, configuration 
recommendations, gotchas, lessons learned, etc...).   I did find the 
ceph.com/docs/ info [1] which discusses use of CephFS 
for backing Hadoop - but this would be foolish for production clusters given 
that CephFS isn't yet considered production quality/grade.

Does anyone in the ceph-users community have experience with this that they'd 
be willing to share?   Preferably ... via use of Ceph - not via CephFS...but I 
am interested in any CephFS related experiences too.

If we were to do this, and Ceph proved out as a backing store to Hadoop - there 
is the potential to be creating a fairly large multi-Petabyte (100s ??) class 
backing store for Ceph.  We do a very large amount of analytics on a lot of 
data sets for security trending correlations, etc...

Our current Ceph experience is limited to a few small (90 x 4TB OSD size) 
clusters - which we are working towards putting in production for Glance/Cinder 
backing and for Block storage for various large storage need platforms (eg 
software and package repo/mirrors, etc...).

Thanks in  advance for any input, thoughts, or pointers ...

~~shane

[1] http://ceph.com/docs/master/cephfs/hadoop/


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Any workaround for ImportError: No module named ceph_argparse?

2015-07-15 Thread Deneau, Tom
I just installed 9.0.2 on Trusty using ceph-deploy install --testing and I am 
hitting
the "ImportError:  No module named ceph_argparse" issue.

What is the best way to get around this issue and still run a version that is
compatible with other (non-Ubuntu) nodes in the cluster that are running 9.0.1?
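
The only workaround I can think of so far is to find where (or whether) the
module actually landed and point PYTHONPATH at it, roughly:

    find / -name 'ceph_argparse.py' 2>/dev/null    # see if the module exists anywhere
    export PYTHONPATH=/path/containing/the/module:$PYTHONPATH
    ceph -s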

-- Tom Deneau

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] backing Hadoop with Ceph ??

2015-07-15 Thread Somnath Roy
Did you try to integrate ceph +rgw+s3 with Hadoop?

Sent from my iPhone

On Jul 15, 2015, at 8:58 AM, Shane Gibson <shane_gib...@symantec.com> wrote:



We are in the (very) early stages of considering testing backing Hadoop via 
Ceph - as opposed to HDFS.  I've seen a few very vague references to doing 
that, but haven't found any concrete info (architecture, configuration 
recommendations, gotchas, lessons learned, etc...).   I did find the 
ceph.com/docs/ info [1] which discusses use of CephFS 
for backing Hadoop - but this would be foolish for production clusters given 
that CephFS isn't yet considered production quality/grade.

Does anyone in the ceph-users community have experience with this that they'd 
be willing to share?   Preferably ... via use of Ceph - not via CephFS...but I 
am interested in any CephFS related experiences too.

If we were to do this, and Ceph proved out as a backing store to Hadoop - there 
is the potential to be creating a fairly large multi-Petabyte (100s ??) class 
backing store for Ceph.  We do a very large amount of analytics on a lot of 
data sets for security trending correlations, etc...

Our current Ceph experience is limited to a few small (90 x 4TB OSD size) 
clusters - which we are working towards putting in production for Glance/Cinder 
backing and for Block storage for various large storage need platforms (eg 
software and package repo/mirrors, etc...).

Thanks in  advance for any input, thoughts, or pointers ...

~~shane

[1] http://ceph.com/docs/master/cephfs/hadoop/


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] backing Hadoop with Ceph ??

2015-07-15 Thread Shane Gibson


We are in the (very) early stages of considering testing backing Hadoop via 
Ceph - as opposed to HDFS.  I've seen a few very vague references to doing 
that, but haven't found any concrete info (architecture, configuration 
recommendations, gotchas, lessons learned, etc...).   I did find the 
ceph.com/docs/ info [1] which discusses use of CephFS for backing Hadoop - but 
this would be foolish for production clusters given that CephFS isn't yet 
considered production quality/grade.

Does anyone in the ceph-users community have experience with this that they'd 
be willing to share?   Preferably ... via use of Ceph - not via CephFS...but I 
am interested in any CephFS related experiences too.

If we were to do this, and Ceph proved out as a backing store to Hadoop - there 
is the potential to be creating a fairly large multi-Petabyte (100s ??) class 
backing store for Ceph.  We do a very large amount of analytics on a lot of 
data sets for security trending correlations, etc...

Our current Ceph experience is limited to a few small (90 x 4TB OSD size) 
clusters - which we are working towards putting in production for Glance/Cinder 
backing and for Block storage for various large storage need platforms (eg 
software and package repo/mirrors, etc...).

Thanks in  advance for any input, thoughts, or pointers ...

~~shane

[1] http://ceph.com/docs/master/cephfs/hadoop/


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Can a cephfs "volume" get errors and how are they fixed?

2015-07-15 Thread John Spray



On 15/07/15 16:11, Roland Giesler wrote:



I mount cephfs in /etc/fstab and all seemed well for quite a few 
months.  Now, however, I have started seeing strange things like directories 
with corrupted file names in the file system.


When you encounter a serious issue, please tell us some details about 
it, like what version of ceph you are using, what client you are using 
(kernel, fuse, + version), whether you have any errors in your logs, etc.


I have a vague memory of a symptom like what you're describing happening 
with an older kernel client at some stage, but can't find a ticket about 
it right now.
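
For example, something along these lines would gather most of what we need
(assuming a single active MDS; substitute your MDS id and log paths):

    ceph --version
    uname -r                                  # kernel client version, if using the kernel mount
    ceph daemon mds.<id> session ls           # on the MDS host: connected clients and what they report
    grep -iE 'error|warn' /var/log/ceph/ceph-mds.*.log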




My question is: How can the filesystem be checked for errors and 
fixed?  Or does it heal itself automatically?  The disks are all 
formatted with btrfs.


The underlying data storage benefits from the resilience built into 
RADOS, i.e. you don't have to worry about drive failures etc.


CephFS's fsck is in development right now.  We note this in a big red 
box at the top of the documentation[1].


By the way, if data integrity is important to you, you would be better 
off with a more conservative configuration (btrfs is not used by most 
people in production, XFS is the default).


Regards,
John

1. http://ceph.com/docs/master/cephfs/

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Can a cephfs "volume" get errors and how are they fixed?

2015-07-15 Thread Roland Giesler
Hi all,

I have ceph cluster that has the following:

# ceph osd tree
# id    weight  type name       up/down reweight
-1      11.13   root default
-2      8.14            host h1
1       0.9                     osd.1   up      1
3       0.9                     osd.3   up      1
4       0.9                     osd.4   up      1
5       0.68                    osd.5   up      1
6       0.68                    osd.6   up      1
7       0.68                    osd.7   up      1
8       0.68                    osd.8   up      1
9       0.68                    osd.9   up      1
10      0.68                    osd.10  up      1
11      0.68                    osd.11  up      1
12      0.68                    osd.12  up      1
-3      0.45            host s3
2       0.45                    osd.2   up      1
-4      0.9             host s2
13      0.9                     osd.13  up      1
-5      1.64            host s1
14      0.29                    osd.14  up      1
0       0.27                    osd.0   up      1
15      0.27                    osd.15  up      1
16      0.27                    osd.16  up      1
17      0.27                    osd.17  up      1
18      0.27                    osd.18  up      1

s2 and s3 will get more drives in future, but this is the setup for now.

I mount cephfs in /etc/fstab and all seemed well for quite a few months.
Now, however, I have started seeing strange things like directories with
corrupted file names in the file system.

My question is: How can the filesystem be checked for errors and fixed?  Or
does it heal itself automatically?  The disks are all formatted with btrfs.

thanks

*Roland*
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to recover from: 1 pgs down; 10 pgs incomplete; 10 pgs stuck inactive; 10 pgs stuck unclean

2015-07-15 Thread Lionel Bouton
On 15/07/2015 10:55, Jelle de Jong wrote:
> On 13/07/15 15:40, Jelle de Jong wrote:
>> I was testing a ceph cluster with osd_pool_default_size = 2 and while
>> rebuilding the OSD on one ceph node a disk in an other node started
>> getting read errors and ceph kept taking the OSD down, and instead of me
>> executing ceph osd set nodown while the other node was rebuilding I kept
>> restarting the OSD for a while and ceph took the OSD in for a few
>> minutes and then taking it back down.
>>
>> I then removed the bad OSD from the cluster and later added it back in
>> with nodown flag set and a weight of zero, moving all the data away.
>> Then removed the OSD again and added a new OSD with a new hard drive.
>>
>> However I ended up with the following cluster status and I can't seem to
>> find how to get the cluster healthy again. I'm doing this as tests
>> before taking this ceph configuration in further production.
>>
>> http://paste.debian.net/plain/281922
>>
>> If I lost data, my bad, but how could I figure out in what pool the data
>> was lost and in what rbd volume (so what kvm guest lost data).
> Anybody that can help?
>
> Can I somehow reweight some OSD to resolve the problems? Or should I
> rebuild the whole cluster and lose all data?

If your min_size is 2, try setting it to 1 and restart each of your OSDs. If
ceph -s doesn't show any progress repairing your data, you'll have to
either get developers to help salvage what can be recovered from your disks or
rebuild the cluster with size=3 and restore your data.
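
Something like this for each affected pool (pool name is a placeholder):

    ceph osd pool get <pool> min_size     # check the current value first
    ceph osd pool set <pool> min_size 1
    ceph -s                               # watch whether the incomplete PGs start peering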

Lionel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Slow requests during ceph osd boot

2015-07-15 Thread Andrey Korolyov
On Wed, Jul 15, 2015 at 12:15 PM, Jan Schermer  wrote:
> We have the same problems, we need to start the OSDs slowly.
> The problem seems to be CPU congestion. A booting OSD will use all available 
> CPU power you give it, and if it doesn’t have enough nasty stuff happens 
> (this might actually be the manifestation of some kind of problem in our 
> setup as well).
> It doesn’t do that always - I was restarting our hosts this weekend and most 
> of them came up fine with simple “service ceph start”, some just sat there 
> spinning the CPU and not doing any real work (and the cluster was not very 
> happy about that).
>
> Jan
>
>
>> On 15 Jul 2015, at 10:53, Kostis Fardelas  wrote:
>>
>> Hello,
>> after some trial and error we concluded that if we start the 6 stopped
>> OSD daemons with a delay of 1 minute, we do not experience slow
>> requests (threshold is set on 30 sec), although there are some ops
>> that last up to 10s which is already high enough. I assume that if we
>> spread the delay more, the slow requests will vanish. The possibility
>> of not having tuned our setup to the most finest detail is not zeroed
>> out but I wonder if at any way we miss some ceph tuning in terms of
>> ceph configuration.
>>
>> We run firefly latest stable version.
>>
>> Regards,
>> Kostis
>>
>> On 13 July 2015 at 13:28, Kostis Fardelas  wrote:
>>> Hello,
>>> after rebooting a ceph node and the OSDs starting booting and joining
>>> the cluster, we experience slow requests that get resolved immediately
>>> after cluster recovers. It is important to note that before the node
>>> reboot, we set noout flag in order to prevent recovery - so there are
>>> only degraded PGs when OSDs shut down- and let the cluster handle the
>>> OSDs down/up in the lightest way.
>>>
>>> Is there any tunable we should consider in order to avoid service
>>> degradation for our ceph clients?
>>>
>>> Regards,
>>> Kostis
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


As far as I've seen this problem, the main issue for regular
disk-backed OSDs is IOPS starvation during some interval after
reading maps from the filestore and marking itself as 'in' - even if
in-memory caches are still hot, I/O will significantly degrade for a
short period. The possible workaround for an otherwise healthy cluster
and a node-wide restart is to set the norecover flag; it would greatly
reduce the chance of hitting slow operations. Of course it is applicable
only to a non-empty cluster with tens of percent of average
utilization on rotating media. I first pointed out this issue a couple
of years ago (it *does* break the 30s I/O SLA for the returning OSD, but
refilling the same OSDs from scratch would not violate that SLA,
giving a far longer completion time for a refill). From the UX side, it
would be great to introduce some kind of recovery throttle for newly
started OSDs, as recovery_delay_start does not prevent immediate
recovery procedures.
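
For a planned node restart that looks roughly like:

    ceph osd set noout
    ceph osd set norecover
    # restart the node / its OSDs and wait for them to rejoin
    ceph osd unset norecover
    ceph osd unset noout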
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Slow requests during ceph osd boot

2015-07-15 Thread Jan Schermer
We have the same problems; we need to start the OSDs slowly.
The problem seems to be CPU congestion. A booting OSD will use all available 
CPU power you give it, and if it doesn’t have enough, nasty stuff happens (this 
might actually be the manifestation of some kind of problem in our setup as 
well).
It doesn’t always do that - I was restarting our hosts this weekend and most of 
them came up fine with a simple “service ceph start”, but some just sat there 
spinning the CPU and not doing any real work (and the cluster was not very 
happy about that).

Jan


> On 15 Jul 2015, at 10:53, Kostis Fardelas  wrote:
> 
> Hello,
> after some trial and error we concluded that if we start the 6 stopped
> OSD daemons with a delay of 1 minute, we do not experience slow
> requests (threshold is set on 30 sec), although there are some ops
> that last up to 10s which is already high enough. I assume that if we
> spread the delay more, the slow requests will vanish. The possibility
> of not having tuned our setup to the most finest detail is not zeroed
> out but I wonder if at any way we miss some ceph tuning in terms of
> ceph configuration.
> 
> We run firefly latest stable version.
> 
> Regards,
> Kostis
> 
> On 13 July 2015 at 13:28, Kostis Fardelas  wrote:
>> Hello,
>> after rebooting a ceph node and the OSDs starting booting and joining
>> the cluster, we experience slow requests that get resolved immediately
>> after cluster recovers. It is important to note that before the node
>> reboot, we set noout flag in order to prevent recovery - so there are
>> only degraded PGs when OSDs shut down- and let the cluster handle the
>> OSDs down/up in the lightest way.
>> 
>> Is there any tunable we should consider in order to avoid service
>> degradation for our ceph clients?
>> 
>> Regards,
>> Kostis
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] VM with rbd volume hangs on write during load

2015-07-15 Thread Jan Schermer
We are getting the same log message as you, but not too often, and from what I 
gathered it is normal to see that.
Not sure how often you are seeing those.

How many disks are connected to that VM? Take a look in /proc/pid/fd and count 
the file descriptors, then compare that to /proc/pid/limits. My first guess is 
that the problem is here. Basically qemu needs to make a connection to every OSD 
hosting the PGs the RBD image is on, and that could be thousands (and since you 
have close to 500 osds I guess you are hitting that limit much sooner than most 
folks).
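
Something like this against the qemu process of the affected guest (the pid
lookup is only an example - adjust it to however you identify the VM's process):

    pid=$(pgrep -f qemu | head -1)           # qemu process of the affected guest
    ls /proc/$pid/fd | wc -l                 # open file descriptors right now
    grep 'Max open files' /proc/$pid/limits  # the limit it is running with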


Jan

> On 15 Jul 2015, at 10:23, Jeya Ganesh Babu Jegatheesan  
> wrote:
> 
> 
> 
> On 7/15/15, 12:17 AM, "Jan Schermer"  wrote:
> 
>> Do you have a comparison of the same workload on a different storage than
>> CEPH?
> 
> I dont have a direct comparison data with a different storage. I did ran
> similar workload with single VM which was booted off from a local disk and
> i didn't see any issue. But as it was a from a local disk(single disk),
> the performance was not comparable with the one with Ceph.
> 
>> 
>> I am asking because those messages indicate slow request - basically some
>> operation took 120s (which would look insanely high to a storage
>> administrator, but is sort of expected in a cloud environment). And even
>> on a regular direct attached storage, some OPs can take that look but
>> those issues are masked in drivers (they don't necessarily manifest as
>> "task blocked" or "soft lockups" but just as 100% iowait for periods of
>> time - which is considered normal under load).
>> 
>> In other words - are you seeing a real problem with your workload or just
>> the messages?
> 
> The issue is that the access to the volumes gets stuck and never recovers.
> The jbd2 kernel thread locks up. The system recovers only after a reboot.
> 
>> 
>> If you don't have any slow ops then you either have the warning set to
>> high, or those blocked operations consist of more than just one OP - it
>> adds up.
>> 
>> It could also be caused by a network problem (like misconfigured offloads
>> on network cards causing retransmissions/reorders and such) or if for
>> example you run out of file descriptors on either the client or server
>> side, it manifests as some requests getting stuck _without_ getting any
>> slow ops on the ceph cluster side.
> 
> Yes, we are checking the network as well. Btw i see the following messages
> in the osd logs (somewhere around 5-6 message for the day per osd), could
> this point that there is some network issue? would this cause a write to
> get stuck?
>   
>   ceph-osd.296.log:2015-07-14 19:17:04.907107 7ffd1fc43700  0 --
> 10.163.45.3:6893/2046 submit_message osd_op_reply(562032
> rbd_data.2f982f3c214f5
>   9de.003d [stat,set-alloc-hint object_size 4194304 write_size
> 4194304,write 1253376~4096] v54601'407287 uv407287 ack = 0) v6 remote
>   , 10.163.43.1:0/1076109, failed lossy con, dropping message 0x11cd5600
>   
>   2015-07-14 17:35:24.209722 7ffc67c4d700  0 -- 10.163.45.3:6893/2046 >>
> 10.163.42.14:0/1004886 pipe(0x18521c80 sd=255 :6893 s=0 pgs=0 cs=0 l=1
> c=0xd86a680).accept
>   replacing existing (lossy) channel (new one lossy=1)
> 
> 
>> 
>> 
>> And an obligatory question - you say your OSDs don't use much CPU, but
>> how are the disks? Aren't some of them 100% utilized when this happens?
> 
> Disk as well are not fully utilized, the iops and bandwidth are quite low.
> The load as such is evenly distributed across the disks.
> 
>> 
>> Jan
>> 
>>> On 15 Jul 2015, at 02:23, Jeya Ganesh Babu Jegatheesan
>>>  wrote:
>>> 
>>> 
>>> 
>>> On 7/14/15, 4:56 PM, "ceph-users on behalf of Wido den Hollander"
>>>  wrote:
>>> 
 On 07/15/2015 01:17 AM, Jeya Ganesh Babu Jegatheesan wrote:
> Hi,
> 
> We have a Openstack + Ceph cluster based on Giant release. We use ceph
> for the VMs volumes including the boot volumes. Under load, we see the
> write access to the volumes stuck from within the VM. The same would
> work after a VM reboot. The issue is seen with and without rbd cache.
> Let me know if this is some known issue and any way to debug further.
> The ceph cluster itself seems to be clean. We have currently disabled
> scrub and deep scrub. 'ceph -s' output as below.
> 
 
 Are you seeing slow requests in the system?
>>> 
>>> I dont see slow requests in the cluster.
>>> 
 
 Are any of the disks under the OSDs 100% busy or close to it?
>>> 
>>> Most of the OSDs use 20% of a core. There is no OSD process busy at
>>> 100%.
>>> 
 
 Btw, the amount of PGs is rather high. You are at 88, while the formula
 recommends:
 
 num_osd * 100 / 3 = 14k (cluster total)
>>> 
>>> We used 30 * num_osd per pool. We do have 4 pools, i believe thats the
>>> why
>>> the PG seems to be be high.
>>> 
 
 Wido
 
>   cluster eaaeaa55-a8e7-4531-a5eb-03d73028b59d
>health HEALTH_WARN noscrub,nodeep-scrub flag(s) s

Re: [ceph-users] Enclosure power failure pausing client IO till all connected hosts up

2015-07-15 Thread Mallikarjun Biradar
cluster state:
 osdmap e3240: 24 osds: 12 up, 12 in
  pgmap v46050: 1088 pgs, 2 pools, 20322 GB data, 5080 kobjects
4 GB used, 61841 GB / 84065 GB avail
4745644/10405374 objects degraded (45.608%);
3688079/10405374 objects misplaced (35.444%)
   5 stale+active+clean
  59 active+clean
  74 active+undersized+degraded+remapped+backfilling
  53 active+remapped
 577 active+undersized+degraded
  37 down+peering
 283 active+undersized+degraded+remapped+wait_backfill
recovery io 844 MB/s, 211 objects/s

On Wed, Jul 15, 2015 at 2:29 PM, Mallikarjun Biradar
 wrote:
> Sorry for delay in replying to this, as I was doing some retries on
> this issue and summarise.
>
>
> Tony,
> Setup details:
> Two storage box (each with 12 drives) , each connected with 4 hosts.
> Each host own 3 disk from storage box. Total of 24 OSD's.
> Failure domain is at Chassis level.
>
> OSD tree:
>  -1  164.2   root default
> -7  82.08   chassis chassis1
> -2  20.52   host host-1
> 0   6.84osd.0   up  1
> 1   6.84osd.1   up  1
> 2   6.84osd.2   up  1
> -3  20.52   host host-2
> 3   6.84osd.3   up  1
> 4   6.84osd.4   up  1
> 5   6.84osd.5   up  1
> -4  20.52   host host-3
> 6   6.84osd.6   up  1
> 7   6.84osd.7   up  1
> 8   6.84osd.8   up  1
> -5  20.52   host host-4
> 9   6.84osd.9   up  1
> 10  6.84osd.10  up  1
> 11  6.84osd.11  up  1
> -8  82.08   chassis chassis2
> -6  20.52   host host-5
> 12  6.84osd.12  up  1
> 13  6.84osd.13  up  1
> 14  6.84osd.14  up  1
> -9  20.52   host host-6
> 15  6.84osd.15  up  1
> 16  6.84osd.16  up  1
> 17  6.84osd.17  up  1
> -10 20.52   host host-7
> 18  6.84osd.18  up  1
> 19  6.84osd.19  up  1
> 20  6.84osd.20  up  1
> -11 20.52   host host-8
> 21  6.84osd.21  up  1
> 22  6.84osd.22  up  1
> 23  6.84osd.23  up  1
>
> Cluster had ~30TB of data. Client IO is in progress on cluster.
> After chassis1 underwent powercycle,
> 1> all OSD's under chassis2 were intact. Up & running
> 2> all OSD's under chassis1 were down as expected.
>
> But, client IO was paused untill all the hosts/OSD's under chassis1
> comes up. This issue is observed twice out of 5 attempts.
>
> Size is 2 & min_size is 1.
>
> -Thanks,
> Mallikarjun
>
>
> On Thu, Jul 9, 2015 at 8:01 PM, Tony Harris  wrote:
>> Sounds to me like you've put yourself at too much risk - *if* I'm reading
>> your message right about your configuration, you have multiple hosts
>> accessing OSDs that are stored on a single shared box - so if that single
>> shared box (single point of failure for multiple nodes) goes down it's
>> possible for multiple replicas to disappear at the same time which could
>> halt the operation of your cluster if the masters and the replicas are both
>> on OSDs within that single shared storage system...
>>
>> On Thu, Jul 9, 2015 at 5:42 AM, Mallikarjun Biradar
>>  wrote:
>>>
>>> Hi all,
>>>
>>> Setup details:
>>> Two storage enclosures each connected to 4 OSD nodes (Shared storage).
>>> Failure domain is Chassis (enclosure) level. Replication count is 2.
>>> Each host has allotted with 4 drives.
>>>
>>> I have active client IO running on cluster. (Random write profile with
>>> 4M block size & 64 Queue depth).
>>>
>>> One of enclosure had power loss. So all OSD's from hosts that are
>>> connected to this enclosure went down as expected.
>>>
>>> But client IO got paused. After some time enclosure & hosts connected
>>> to it came up.
>>> And all OSD's on that hosts came up.
>>>
>>> Till this time, cluster was not serving IO. Once all hosts & OSD's
>>> pertaining to that enclosure came up, client IO resumed.
>>>
>>>
>>> Can anybody help me why cluster not serving IO during enclosure
>>> failure. OR its a bug?
>>>
>>> -Thanks & regards,
>>> Mallikarjun Biradar
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.co

Re: [ceph-users] Enclosure power failure pausing client IO till all connected hosts up

2015-07-15 Thread Mallikarjun Biradar
Sorry for the delay in replying to this; I was doing some retries on
this issue so that I could summarise.


Tony,
Setup details:
Two storage boxes (each with 12 drives), each connected to 4 hosts.
Each host owns 3 disks from the storage box. Total of 24 OSDs.
Failure domain is at the chassis level.

OSD tree:
 -1  164.2   root default
-7  82.08   chassis chassis1
-2  20.52   host host-1
0   6.84osd.0   up  1
1   6.84osd.1   up  1
2   6.84osd.2   up  1
-3  20.52   host host-2
3   6.84osd.3   up  1
4   6.84osd.4   up  1
5   6.84osd.5   up  1
-4  20.52   host host-3
6   6.84osd.6   up  1
7   6.84osd.7   up  1
8   6.84osd.8   up  1
-5  20.52   host host-4
9   6.84osd.9   up  1
10  6.84osd.10  up  1
11  6.84osd.11  up  1
-8  82.08   chassis chassis2
-6  20.52   host host-5
12  6.84osd.12  up  1
13  6.84osd.13  up  1
14  6.84osd.14  up  1
-9  20.52   host host-6
15  6.84osd.15  up  1
16  6.84osd.16  up  1
17  6.84osd.17  up  1
-10 20.52   host host-7
18  6.84osd.18  up  1
19  6.84osd.19  up  1
20  6.84osd.20  up  1
-11 20.52   host host-8
21  6.84osd.21  up  1
22  6.84osd.22  up  1
23  6.84osd.23  up  1

The cluster had ~30TB of data, with client IO in progress on the cluster.
After chassis1 underwent a power cycle,
1> all OSDs under chassis2 were intact, up & running
2> all OSDs under chassis1 were down, as expected.

But client IO was paused until all the hosts/OSDs under chassis1
came up. This issue was observed twice out of 5 attempts.

Size is 2 & min_size is 1.
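
Next time it happens I can capture something like this, if that helps (pgid is
a placeholder for one of the down+peering PGs):

    ceph osd dump | grep pool           # confirm size/min_size on both pools
    ceph pg dump_stuck inactive
    ceph pg <pgid> query                # look at peering_blocked_by in the recovery_state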

-Thanks,
Mallikarjun


On Thu, Jul 9, 2015 at 8:01 PM, Tony Harris  wrote:
> Sounds to me like you've put yourself at too much risk - *if* I'm reading
> your message right about your configuration, you have multiple hosts
> accessing OSDs that are stored on a single shared box - so if that single
> shared box (single point of failure for multiple nodes) goes down it's
> possible for multiple replicas to disappear at the same time which could
> halt the operation of your cluster if the masters and the replicas are both
> on OSDs within that single shared storage system...
>
> On Thu, Jul 9, 2015 at 5:42 AM, Mallikarjun Biradar
>  wrote:
>>
>> Hi all,
>>
>> Setup details:
>> Two storage enclosures each connected to 4 OSD nodes (Shared storage).
>> Failure domain is Chassis (enclosure) level. Replication count is 2.
>> Each host has allotted with 4 drives.
>>
>> I have active client IO running on cluster. (Random write profile with
>> 4M block size & 64 Queue depth).
>>
>> One of enclosure had power loss. So all OSD's from hosts that are
>> connected to this enclosure went down as expected.
>>
>> But client IO got paused. After some time enclosure & hosts connected
>> to it came up.
>> And all OSD's on that hosts came up.
>>
>> Till this time, cluster was not serving IO. Once all hosts & OSD's
>> pertaining to that enclosure came up, client IO resumed.
>>
>>
>> Can anybody help me why cluster not serving IO during enclosure
>> failure. OR its a bug?
>>
>> -Thanks & regards,
>> Mallikarjun Biradar
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to recover from: 1 pgs down; 10 pgs incomplete; 10 pgs stuck inactive; 10 pgs stuck unclean

2015-07-15 Thread Jelle de Jong
On 13/07/15 15:40, Jelle de Jong wrote:
> I was testing a ceph cluster with osd_pool_default_size = 2 and while
> rebuilding the OSD on one ceph node a disk in an other node started
> getting read errors and ceph kept taking the OSD down, and instead of me
> executing ceph osd set nodown while the other node was rebuilding I kept
> restarting the OSD for a while and ceph took the OSD in for a few
> minutes and then taking it back down.
> 
> I then removed the bad OSD from the cluster and later added it back in
> with nodown flag set and a weight of zero, moving all the data away.
> Then removed the OSD again and added a new OSD with a new hard drive.
> 
> However I ended up with the following cluster status and I can't seem to
> find how to get the cluster healthy again. I'm doing this as tests
> before taking this ceph configuration in further production.
> 
> http://paste.debian.net/plain/281922
> 
> If I lost data, my bad, but how could I figure out in what pool the data
> was lost and in what rbd volume (so what kvm guest lost data).

Anybody that can help?

Can I somehow reweight some OSDs to resolve the problems? Or should I
rebuild the whole cluster and lose all data?
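
Is something like this the right way to map the broken PGs back to a pool (and
from there to the rbd images)?

    ceph health detail | grep -E 'down|incomplete'   # list the problem PGs
    ceph pg <pgid> query                             # state and which OSDs it is waiting for
    ceph osd lspools                                 # the number before the dot in a PG id is the pool id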

Kind regards,

Jelle de Jong
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Slow requests during ceph osd boot

2015-07-15 Thread Kostis Fardelas
Hello,
after some trial and error we concluded that if we start the 6 stopped
OSD daemons with a delay of 1 minute, we do not experience slow
requests (the threshold is set to 30 sec), although there are some ops
that last up to 10s, which is already high enough. I assume that if we
spread the delay out more, the slow requests will vanish. The possibility
of not having tuned our setup to the finest detail is not ruled
out, but I wonder whether we are missing some ceph tuning in terms of
ceph configuration.
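
For reference, the staggered start is essentially just the following (OSD ids
are examples; substitute however your init system starts individual OSDs):

    for id in 10 11 12 13 14 15; do
        start ceph-osd id=$id    # Upstart syntax; e.g. /etc/init.d/ceph start osd.$id elsewhere
        sleep 60
    done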

We run firefly latest stable version.

Regards,
Kostis

On 13 July 2015 at 13:28, Kostis Fardelas  wrote:
> Hello,
> after rebooting a ceph node and the OSDs starting booting and joining
> the cluster, we experience slow requests that get resolved immediately
> after cluster recovers. It is important to note that before the node
> reboot, we set noout flag in order to prevent recovery - so there are
> only degraded PGs when OSDs shut down- and let the cluster handle the
> OSDs down/up in the lightest way.
>
> Is there any tunable we should consider in order to avoid service
> degradation for our ceph clients?
>
> Regards,
> Kostis
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] VM with rbd volume hangs on write during load

2015-07-15 Thread Jeya Ganesh Babu Jegatheesan


On 7/15/15, 12:17 AM, "Jan Schermer"  wrote:

>Do you have a comparison of the same workload on a different storage than
>CEPH?

I don't have direct comparison data with a different storage backend. I did run a
similar workload with a single VM which was booted off a local disk and
I didn't see any issue. But as it was from a local disk (single disk),
the performance was not comparable with the one with Ceph.

>
>I am asking because those messages indicate slow request - basically some
>operation took 120s (which would look insanely high to a storage
>administrator, but is sort of expected in a cloud environment). And even
>on a regular direct attached storage, some OPs can take that look but
>those issues are masked in drivers (they don't necessarily manifest as
>"task blocked" or "soft lockups" but just as 100% iowait for periods of
>time - which is considered normal under load).
>
>In other words - are you seeing a real problem with your workload or just
>the messages?

The issue is that the access to the volumes gets stuck and never recovers.
The jbd2 kernel thread locks up. The system recovers only after a reboot.

>
>If you don't have any slow ops then you either have the warning set to
>high, or those blocked operations consist of more than just one OP - it
>adds up.
>
>It could also be caused by a network problem (like misconfigured offloads
>on network cards causing retransmissions/reorders and such) or if for
>example you run out of file descriptors on either the client or server
>side, it manifests as some requests getting stuck _without_ getting any
>slow ops on the ceph cluster side.

Yes, we are checking the network as well. Btw I see the following messages
in the osd logs (around 5-6 messages per day per osd). Could
this point to some network issue? Would this cause a write to
get stuck?

ceph-osd.296.log:2015-07-14 19:17:04.907107 7ffd1fc43700  0 --
10.163.45.3:6893/2046 submit_message osd_op_reply(562032
rbd_data.2f982f3c214f5
9de.003d [stat,set-alloc-hint object_size 4194304 write_size
4194304,write 1253376~4096] v54601'407287 uv407287 ack = 0) v6 remote
, 10.163.43.1:0/1076109, failed lossy con, dropping message 0x11cd5600

2015-07-14 17:35:24.209722 7ffc67c4d700  0 -- 10.163.45.3:6893/2046 >>
10.163.42.14:0/1004886 pipe(0x18521c80 sd=255 :6893 s=0 pgs=0 cs=0 l=1
c=0xd86a680).accept
replacing existing (lossy) channel (new one lossy=1)


> 
>
>And an obligatory question - you say your OSDs don't use much CPU, but
>how are the disks? Aren't some of them 100% utilized when this happens?

The disks as well are not fully utilized; the iops and bandwidth are quite low.
The load as such is evenly distributed across the disks.

>
>Jan
>
>> On 15 Jul 2015, at 02:23, Jeya Ganesh Babu Jegatheesan
>> wrote:
>> 
>> 
>> 
>> On 7/14/15, 4:56 PM, "ceph-users on behalf of Wido den Hollander"
>>  wrote:
>> 
>>> On 07/15/2015 01:17 AM, Jeya Ganesh Babu Jegatheesan wrote:
 Hi,
 
 We have a Openstack + Ceph cluster based on Giant release. We use ceph
 for the VMs volumes including the boot volumes. Under load, we see the
 write access to the volumes stuck from within the VM. The same would
 work after a VM reboot. The issue is seen with and without rbd cache.
 Let me know if this is some known issue and any way to debug further.
 The ceph cluster itself seems to be clean. We have currently disabled
 scrub and deep scrub. 'ceph -s' output as below.
 
>>> 
>>> Are you seeing slow requests in the system?
>> 
>> I dont see slow requests in the cluster.
>> 
>>> 
>>> Are any of the disks under the OSDs 100% busy or close to it?
>> 
>> Most of the OSDs use 20% of a core. There is no OSD process busy at
>>100%.
>> 
>>> 
>>> Btw, the amount of PGs is rather high. You are at 88, while the formula
>>> recommends:
>>> 
>>> num_osd * 100 / 3 = 14k (cluster total)
>> 
>> We used 30 * num_osd per pool. We do have 4 pools, i believe thats the
>>why
>> the PG seems to be be high.
>> 
>>> 
>>> Wido
>>> 
cluster eaaeaa55-a8e7-4531-a5eb-03d73028b59d
 health HEALTH_WARN noscrub,nodeep-scrub flag(s) set
 monmap e71: 9 mons at
 
{gngsvc009a=10.163.43.1:6789/0,gngsvc009b=10.163.43.2:6789/0,gngsvc010a
=1
 
0.163.43.5:6789/0,gngsvc010b=10.163.43.6:6789/0,gngsvc011a=10.163.43.9:
67
 
89/0,gngsvc011b=10.163.43.10:6789/0,gngsvc011c=10.163.43.11:6789/0,gngs
vm
 010d=10.163.43.8:6789/0,gngsvm011d=10.163.43.12:6789/0}, election
epoch
 22246, quorum 0,1,2,3,4,5,6,7,8
 
gngsvc009a,gngsvc009b,gngsvc010a,gngsvc010b,gngsvm010d,gngsvc011a,gngsv
c0
 11b,gngsvc011c,gngsvm011d
 osdmap e54600: 425 osds: 425 up, 425 in
flags noscrub,nodeep-scrub
  pgmap v13257438: 37620 pgs, 4 pools, 134 TB data, 35289 kobjects
402 TB used, 941 TB / 1344 TB avail
   37620 active+clean
  client

Re: [ceph-users] mds0: Client failing to respond to cache pressure

2015-07-15 Thread John Spray



On 15/07/15 04:06, Eric Eastman wrote:

Hi John,

I cut the test down to a single client running only Ganesha NFS
without any ceph drivers loaded on the Ceph FS client.  After deleting
all the files in the Ceph file system, rebooting all the nodes, I
restarted the create 5 million file test using 2 NFS clients to the
one Ceph file system node running Ganesha NFS. After a couple hours I
am seeing the  client ede-c2-gw01 failing to respond to cache pressure
error:


Thanks -- that's a very useful datapoint.  I've created a ticket here:
http://tracker.ceph.com/issues/12334

Looking forward to seeing if samba has the same issue.

John

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] VM with rbd volume hangs on write during load

2015-07-15 Thread Jan Schermer
Do you have a comparison of the same workload on a different storage than CEPH?

I am asking because those messages indicate slow request - basically some 
operation took 120s (which would look insanely high to a storage administrator, 
but is sort of expected in a cloud environment). And even on a regular direct 
attached storage, some OPs can take that long but those issues are masked in 
drivers (they don’t necessarily manifest as “task blocked” or “soft lockups” 
but just as 100% iowait for periods of time - which is considered normal under 
load).

In other words - are you seeing a real problem with your workload or just the 
messages?

If you don’t have any slow ops then you either have the warning set too high, or 
those blocked operations consist of more than just one OP - it adds up.

It could also be caused by a network problem (like misconfigured offloads on 
network cards causing retransmissions/reorders and such) or if for example you 
run out of file descriptors on either the client or server side, it manifests as 
some requests getting stuck _without_ getting any slow ops on the ceph cluster 
side. 

And an obligatory question - you say your OSDs don’t use much CPU, but how are 
the disks? Aren’t some of them 100% utilized when this happens?

Jan

> On 15 Jul 2015, at 02:23, Jeya Ganesh Babu Jegatheesan  
> wrote:
> 
> 
> 
> On 7/14/15, 4:56 PM, "ceph-users on behalf of Wido den Hollander"
>  wrote:
> 
>> On 07/15/2015 01:17 AM, Jeya Ganesh Babu Jegatheesan wrote:
>>> Hi,
>>> 
>>> We have a Openstack + Ceph cluster based on Giant release. We use ceph
>>> for the VMs volumes including the boot volumes. Under load, we see the
>>> write access to the volumes stuck from within the VM. The same would
>>> work after a VM reboot. The issue is seen with and without rbd cache.
>>> Let me know if this is some known issue and any way to debug further.
>>> The ceph cluster itself seems to be clean. We have currently disabled
>>> scrub and deep scrub. 'ceph -s' output as below.
>>> 
>> 
>> Are you seeing slow requests in the system?
> 
> I dont see slow requests in the cluster.
> 
>> 
>> Are any of the disks under the OSDs 100% busy or close to it?
> 
> Most of the OSDs use 20% of a core. There is no OSD process busy at 100%.
> 
>> 
>> Btw, the amount of PGs is rather high. You are at 88, while the formula
>> recommends:
>> 
>> num_osd * 100 / 3 = 14k (cluster total)
> 
> We used 30 * num_osd per pool. We do have 4 pools, I believe that's why
> the PG count seems to be high.
> 
>> 
>> Wido
>> 
>>>cluster eaaeaa55-a8e7-4531-a5eb-03d73028b59d
>>> health HEALTH_WARN noscrub,nodeep-scrub flag(s) set
>>> monmap e71: 9 mons at
>>> {gngsvc009a=10.163.43.1:6789/0,gngsvc009b=10.163.43.2:6789/0,gngsvc010a=1
>>> 0.163.43.5:6789/0,gngsvc010b=10.163.43.6:6789/0,gngsvc011a=10.163.43.9:67
>>> 89/0,gngsvc011b=10.163.43.10:6789/0,gngsvc011c=10.163.43.11:6789/0,gngsvm
>>> 010d=10.163.43.8:6789/0,gngsvm011d=10.163.43.12:6789/0}, election epoch
>>> 22246, quorum 0,1,2,3,4,5,6,7,8
>>> gngsvc009a,gngsvc009b,gngsvc010a,gngsvc010b,gngsvm010d,gngsvc011a,gngsvc0
>>> 11b,gngsvc011c,gngsvm011d
>>> osdmap e54600: 425 osds: 425 up, 425 in
>>>flags noscrub,nodeep-scrub
>>>  pgmap v13257438: 37620 pgs, 4 pools, 134 TB data, 35289 kobjects
>>>402 TB used, 941 TB / 1344 TB avail
>>>   37620 active+clean
>>>  client io 94059 kB/s rd, 313 MB/s wr, 4623 op/s
>>> 
>>> 
>>> The traces we see in the VM's kernel are as below.
>>> 
>>> [ 1080.552901] INFO: task jbd2/vdb-8:813 blocked for more than 120
>>> seconds.
>>> [ 1080.553027]   Tainted: GF  O 3.13.0-34-generic
>>> #60~precise1-Ubuntu
>>> [ 1080.553157] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
>>> disables this message.
>>> [ 1080.553295] jbd2/vdb-8  D 88003687e3e0 0   813  2
>>> 0x
>>> [ 1080.553298]  880444fadb48 0002 880455114440
>>> 880444fadfd8
>>> [ 1080.553302]  00014440 00014440 88044a9317f0
>>> 88044b7917f0
>>> [ 1080.553303]  880444fadb48 880455114cd8 88044b7917f0
>>> 811fc670
>>> [ 1080.553307] Call Trace:
>>> [ 1080.553309]  [] ? __wait_on_buffer+0x30/0x30
>>> [ 1080.553311]  [] schedule+0x29/0x70
>>> [ 1080.553313]  [] io_schedule+0x8f/0xd0
>>> [ 1080.553315]  [] sleep_on_buffer+0xe/0x20
>>> [ 1080.553316]  [] __wait_on_bit+0x62/0x90
>>> [ 1080.553318]  [] ? __wait_on_buffer+0x30/0x30
>>> [ 1080.553320]  [] out_of_line_wait_on_bit+0x7c/0x90
>>> [ 1080.553322]  [] ? wake_atomic_t_function+0x40/0x40
>>> [ 1080.553324]  [] __wait_on_buffer+0x2e/0x30
>>> [ 1080.553326]  []
>>> jbd2_journal_commit_transaction+0x136b/0x1520
>>> [ 1080.553329]  [] ? sched_clock_local+0x25/0x90
>>> [ 1080.553331]  [] ? finish_task_switch+0x128/0x170
>>> [ 1080.55]  [] ? try_to_del_timer_sync+0x4f/0x70
>>> [ 1080.553334]  [] kjournald2+0xb8/0x240
>>> [ 1080.553336]  [] ? __wake_up_sync+0x20/0x20
>>> [ 1080.553338]  [] ? commit_timeou