[ceph-users] ceph can't recognize ext4 extended attributes when --mkfs --mkkey

2015-03-03 Thread wsnote
ceph version 0.80.1
System: CentOS 6.5


[root@dn1 osd.6]# mount
/dev/sde1 on /cache4 type ext4 (rw,noatime,user_xattr) —— osd.6
/dev/sdf1 on /cache5 type ext4 (rw,noatime,user_xattr) —— osd.7
/dev/sdg1 on /cache6 type ext4 (rw,noatime,user_xattr) —— osd.8
/dev/sdh1 on /cache7 type ext4 (rw,noatime,user_xattr) —— osd.9
/dev/sdi1 on /cache8 type ext4 (rw,noatime,user_xattr) —— osd.10
/dev/sdj1 on /cache9 type ext4 (rw,noatime,user_xattr) —— osd.11


[root@dn1 osd.6]# ceph-osd -i 6 --mkfs --mkkey
2015-03-03 15:52:12.156548 7fba6de2b7a0 -1 journal FileJournal::_open: 
disabling aio for non-block journal.  Use journal_force_aio to force use of aio 
anyway
2015-03-03 15:52:12.468304 7fba6de2b7a0 -1 filestore(/cache4/osd.6) Extended 
attributes don't appear to work. Got error (95) Operation not supported. If you 
are using ext3 or ext4, be sure to mount the underlying file system with the 
'user_xattr' option.
2015-03-03 15:52:12.468367 7fba6de2b7a0 -1 filestore(/cache4/osd.6) 
FileStore::mount : error in _detect_fs: (95) Operation not supported
2015-03-03 15:52:12.468387 7fba6de2b7a0 -1 OSD::mkfs: couldn't mount 
ObjectStore: error -95
2015-03-03 15:52:12.468470 7fba6de2b7a0 -1  ** ERROR: error creating empty 
object store in /cache4/osd.6: (95) Operation not supported




[root@dn1 osd.6]# tail -f /var/log/ceph/osd.6.log
2015-03-03 15:52:11.770484 7fba6de2b7a0  0 ceph version 0.80.1 
(a38fe1169b6d2ac98b427334c12d7cf81f809b74), process ceph-osd, pid 30336
2015-03-03 15:52:12.156548 7fba6de2b7a0 -1 journal FileJournal::_open: 
disabling aio for non-block journal.  Use journal_force_aio to force use of aio 
anyway
2015-03-03 15:52:12.224362 7fba6de2b7a0  0 filestore(/cache4/osd.6) mkjournal 
created journal on /cache4/osd.6/journal
2015-03-03 15:52:12.274706 7fba6de2b7a0  0 
genericfilestorebackend(/cache4/osd.6) detect_features: FIEMAP ioctl is 
supported and appears to work
2015-03-03 15:52:12.274733 7fba6de2b7a0  0 
genericfilestorebackend(/cache4/osd.6) detect_features: FIEMAP ioctl is 
disabled via 'filestore fiemap' config option
2015-03-03 15:52:12.468181 7fba6de2b7a0  0 
genericfilestorebackend(/cache4/osd.6) detect_features: syscall(SYS_syncfs, fd) 
fully supported
2015-03-03 15:52:12.468304 7fba6de2b7a0 -1 filestore(/cache4/osd.6) Extended 
attributes don't appear to work. Got error (95) Operation not supported. If you 
are using ext3 or ext4, be sure to mount the underlying file system with the 
'user_xattr' option.
2015-03-03 15:52:12.468367 7fba6de2b7a0 -1 filestore(/cache4/osd.6) 
FileStore::mount : error in _detect_fs: (95) Operation not supported
2015-03-03 15:52:12.468387 7fba6de2b7a0 -1 OSD::mkfs: couldn't mount 
ObjectStore: error -95
2015-03-03 15:52:12.468470 7fba6de2b7a0 -1  ** ERROR: error creating empty 
object store in /cache4/osd.6: (95) Operation not supported
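
A quick way to rule out the filesystem itself is to test a user xattr by hand on
the mounted OSD directory (paths as above; xattr_test is just a scratch file):

  touch /cache4/osd.6/xattr_test
  setfattr -n user.test -v 1 /cache4/osd.6/xattr_test
  getfattr -n user.test /cache4/osd.6/xattr_test

For ext4 backends, the firefly-era documentation also recommends keeping large
xattrs in omap; a minimal ceph.conf sketch (verify the option against your exact
version):

[osd]
filestore xattr use omap = true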


Thanks!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-03 Thread Irek Fasikhov
What replication count do you have?

2015-03-03 15:14 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 Hi Irek,

 yes, stoping OSD (or seting it to OUT) resulted in only 3% of data
 degraded and moved/recovered.
 When I after that removed it from Crush map ceph osd crush rm id,
 that's when the stuff with 37% happened.

 And thanks Irek for help - could you kindly just let me know of the
 prefered steps when removing whole node?
 Do you mean I first stop all OSDs again, or just remove each OSD from
 crush map, or perhaps, just decompile cursh map, delete the node
 completely, compile back in, and let it heal/recover ?

 Do you think this would result in less data missplaces and moved arround ?

 Sorry for bugging you, I really appreaciate your help.

 Thanks

 On 3 March 2015 at 12:58, Irek Fasikhov malm...@gmail.com wrote:

 A large percentage of the rebuild of the cluster map (But low percentage
 degradation). If you had not made ceph osd crush rm id, the percentage
 would be low.
 In your case, the correct option is to remove the entire node, rather
 than each disk individually

 2015-03-03 14:27 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 Another question - I mentioned here 37% of objects being moved arround -
 this is MISPLACED object (degraded objects were 0.001%, after I removed 1
 OSD from cursh map (out of 44 OSD or so).

 Can anybody confirm this is normal behaviour - and are there any
 workarrounds ?

 I understand this is because of the object placement algorithm of CEPH,
 but still 37% of object missplaces just by removing 1 OSD from crush maps
 out of 44 make me wonder why this large percentage ?

 Seems not good to me, and I have to remove another 7 OSDs (we are
 demoting some old hardware nodes). This means I can potentialy go with 7 x
 the same number of missplaced objects...?

 Any thoughts ?

 Thanks

 On 3 March 2015 at 12:14, Andrija Panic andrija.pa...@gmail.com wrote:

 Thanks Irek.

 Does this mean, that after peering for each PG, there will be delay of
 10sec, meaning that every once in a while, I will have 10sec od the cluster
 NOT being stressed/overloaded, and then the recovery takes place for that
 PG, and then another 10sec cluster is fine, and then stressed again ?

 I'm trying to understand process before actually doing stuff (config
 reference is there on ceph.com but I don't fully understand the
 process)

 Thanks,
 Andrija

 On 3 March 2015 at 11:32, Irek Fasikhov malm...@gmail.com wrote:

 Hi.

 Use value osd_recovery_delay_start
 example:
 [root@ceph08 ceph]# ceph --admin-daemon
 /var/run/ceph/ceph-osd.94.asok config show  | grep 
 osd_recovery_delay_start
   osd_recovery_delay_start: 10

 2015-03-03 13:13 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 HI Guys,

 I yesterday removed 1 OSD from cluster (out of 42 OSDs), and it
 caused over 37% od the data to rebalance - let's say this is fine (this 
 is
 when I removed it frm Crush Map).

 I'm wondering - I have previously set some throtling mechanism, but
 during first 1h of rebalancing, my rate of recovery was going up to 1500
 MB/s - and VMs were unusable completely, and then last 4h of the duration
 of recover this recovery rate went down to, say, 100-200 MB.s and during
 this VM performance was still pretty impacted, but at least I could work
 more or a less

 So my question, is this behaviour expected, is throtling here working
 as expected, since first 1h was almoust no throtling applied if I check 
 the
 recovery rate 1500MB/s and the impact on Vms.
 And last 4h seemed pretty fine (although still lot of impact in
 general)

 I changed these throtling on the fly with:

 ceph tell osd.* injectargs '--osd_recovery_max_active 1'
 ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
 ceph tell osd.* injectargs '--osd_max_backfills 1'

 My Jorunals are on SSDs (12 OSD per server, of which 6 journals on
 one SSD, 6 journals on another SSD)  - I have 3 of these hosts.

 Any thought are welcome.
 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 --
 С уважением, Фасихов Ирек Нургаязович
 Моб.: +79229045757




 --

 Andrija Panić




 --

 Andrija Panić




 --
 С уважением, Фасихов Ирек Нургаязович
 Моб.: +79229045757




 --

 Andrija Panić




-- 
С уважением, Фасихов Ирек Нургаязович
Моб.: +79229045757
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-03 Thread Irek Fasikhov
Since you only have three nodes in the cluster, I recommend you add the new
nodes to the cluster first, and then delete the old ones.

2015-03-03 15:28 GMT+03:00 Irek Fasikhov malm...@gmail.com:

 What replication count do you have?

 2015-03-03 15:14 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 Hi Irek,

 yes, stoping OSD (or seting it to OUT) resulted in only 3% of data
 degraded and moved/recovered.
 When I after that removed it from Crush map ceph osd crush rm id,
 that's when the stuff with 37% happened.

 And thanks Irek for help - could you kindly just let me know of the
 prefered steps when removing whole node?
 Do you mean I first stop all OSDs again, or just remove each OSD from
 crush map, or perhaps, just decompile cursh map, delete the node
 completely, compile back in, and let it heal/recover ?

 Do you think this would result in less data missplaces and moved arround ?

 Sorry for bugging you, I really appreaciate your help.

 Thanks

 On 3 March 2015 at 12:58, Irek Fasikhov malm...@gmail.com wrote:

 A large percentage of the rebuild of the cluster map (But low percentage
 degradation). If you had not made ceph osd crush rm id, the percentage
 would be low.
 In your case, the correct option is to remove the entire node, rather
 than each disk individually

 2015-03-03 14:27 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 Another question - I mentioned here 37% of objects being moved arround
 - this is MISPLACED object (degraded objects were 0.001%, after I removed 1
 OSD from cursh map (out of 44 OSD or so).

 Can anybody confirm this is normal behaviour - and are there any
 workarrounds ?

 I understand this is because of the object placement algorithm of CEPH,
 but still 37% of object missplaces just by removing 1 OSD from crush maps
 out of 44 make me wonder why this large percentage ?

 Seems not good to me, and I have to remove another 7 OSDs (we are
 demoting some old hardware nodes). This means I can potentialy go with 7 x
 the same number of missplaced objects...?

 Any thoughts ?

 Thanks

 On 3 March 2015 at 12:14, Andrija Panic andrija.pa...@gmail.com
 wrote:

 Thanks Irek.

 Does this mean, that after peering for each PG, there will be delay of
 10sec, meaning that every once in a while, I will have 10sec od the 
 cluster
 NOT being stressed/overloaded, and then the recovery takes place for that
 PG, and then another 10sec cluster is fine, and then stressed again ?

 I'm trying to understand process before actually doing stuff (config
 reference is there on ceph.com but I don't fully understand the
 process)

 Thanks,
 Andrija

 On 3 March 2015 at 11:32, Irek Fasikhov malm...@gmail.com wrote:

 Hi.

 Use value osd_recovery_delay_start
 example:
 [root@ceph08 ceph]# ceph --admin-daemon
 /var/run/ceph/ceph-osd.94.asok config show  | grep 
 osd_recovery_delay_start
   osd_recovery_delay_start: 10

 2015-03-03 13:13 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 HI Guys,

 I yesterday removed 1 OSD from cluster (out of 42 OSDs), and it
 caused over 37% od the data to rebalance - let's say this is fine (this 
 is
 when I removed it frm Crush Map).

 I'm wondering - I have previously set some throtling mechanism, but
 during first 1h of rebalancing, my rate of recovery was going up to 1500
 MB/s - and VMs were unusable completely, and then last 4h of the 
 duration
 of recover this recovery rate went down to, say, 100-200 MB.s and during
 this VM performance was still pretty impacted, but at least I could work
 more or a less

 So my question, is this behaviour expected, is throtling here
 working as expected, since first 1h was almoust no throtling applied if 
 I
 check the recovery rate 1500MB/s and the impact on Vms.
 And last 4h seemed pretty fine (although still lot of impact in
 general)

 I changed these throtling on the fly with:

 ceph tell osd.* injectargs '--osd_recovery_max_active 1'
 ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
 ceph tell osd.* injectargs '--osd_max_backfills 1'

 My Jorunals are on SSDs (12 OSD per server, of which 6 journals on
 one SSD, 6 journals on another SSD)  - I have 3 of these hosts.

 Any thought are welcome.
 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 --
 С уважением, Фасихов Ирек Нургаязович
 Моб.: +79229045757




 --

 Andrija Panić




 --

 Andrija Panić




 --
 С уважением, Фасихов Ирек Нургаязович
 Моб.: +79229045757




 --

 Andrija Panić




 --
 С уважением, Фасихов Ирек Нургаязович
 Моб.: +79229045757




-- 
С уважением, Фасихов Ирек Нургаязович
Моб.: +79229045757
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Objects, created with Rados Gateway, have incorrect UTC timestamp

2015-03-03 Thread Sergey Arkhipov
Hi,

I have a problem with timestamps of objects created in Rados Gateway.
Timestamps are supposed to be in UTC, but instead I see a strange offset.

The server running Rados Gateway uses the MSK timezone (GMT+3). NTP is set up and
running correctly. Rados Gateway and Ceph have no objects (the usage log is
empty). Then I use Boto to create some buckets and objects:

$ date
Чтв Фев 26 11:29:05 MSK 2015
$ python fill_smth.py
$ date
Чтв Фев 26 11:29:16 MSK 2015

As you can see, my local time is *11*:29:05, in UTC it is *09*:29:05.
After that I fetch Rados Gateway usage log:

$ date
Чтв Фев 26 11:35:35 MSK 2015
$ radosgw-admin usage show --uid=2733d594-2f5a-46f7-9174-68000ce754c8
{ entries: [
{ owner: 2733d594-2f5a-46f7-9174-68000ce754c8,
  buckets: [
{ bucket: 0f2f1c7e-f420-4b36-8ff0-333fd9523902,
  time: 2015-02-26 05:00:00.00Z,
  epoch: 1424926800,
  categories: [
{ category: create_bucket,
  bytes_sent: 0,
  bytes_received: 0,
  ops: 1,
  successful_ops: 1},
{ category: get_obj,
  bytes_sent: 88,
  bytes_received: 0,
  ops: 4,
  successful_ops: 4},
{ category: list_bucket,
  bytes_sent: 1585,
  bytes_received: 0,
  ops: 1,
  successful_ops: 1},
{ category: put_obj,
  bytes_sent: 0,
  bytes_received: 88,
  ops: 4,
  successful_ops: 4}]},
{ bucket: 6ab239b4-1806-441f-8831-85fb3c0cf7a8,
  time: 2015-02-26 05:00:00.00Z,
  epoch: 1424926800,
  categories: [
{ category: create_bucket,
  bytes_sent: 0,
  bytes_received: 0,
  ops: 1,
  successful_ops: 1},
{ category: get_obj,
  bytes_sent: 110,
  bytes_received: 0,
  ops: 5,
  successful_ops: 5},
{ category: list_bucket,
  bytes_sent: 1916,
  bytes_received: 0,
  ops: 1,
  successful_ops: 1},
{ category: put_obj,
  bytes_sent: 0,
  bytes_received: 110,
  ops: 5,
  successful_ops: 5}]},
{ bucket: b461cb37-c7a0-4e56-8444-b190452f5c6a,
  time: 2015-02-26 05:00:00.00Z,
  epoch: 1424926800,
  categories: [
{ category: create_bucket,
  bytes_sent: 0,
  bytes_received: 0,
  ops: 1,
  successful_ops: 1},
{ category: get_obj,
  bytes_sent: 44,
  bytes_received: 0,
  ops: 2,
  successful_ops: 2},
{ category: list_bucket,
  bytes_sent: 923,
  bytes_received: 0,
  ops: 1,
  successful_ops: 1},
{ category: put_obj,
  bytes_sent: 0,
  bytes_received: 44,
  ops: 2,
  successful_ops: 2}]},
{ bucket: e7d7ef55-9eeb-4d43-9d58-48dd373261ba,
  time: 2015-02-26 05:00:00.00Z,
  epoch: 1424926800,
  categories: [
{ category: create_bucket,
  bytes_sent: 0,
  bytes_received: 0,
  ops: 1,
  successful_ops: 1},
{ category: get_obj,
  bytes_sent: 66,
  bytes_received: 0,
  ops: 3,
  successful_ops: 3},
{ category: list_bucket,
  bytes_sent: 1254,
  bytes_received: 0,
  ops: 1,
  successful_ops: 1},
{ category: put_obj,
  bytes_sent: 0,
  bytes_received: 66,
  ops: 3,
  successful_ops: 3}]},
{ bucket: 
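
(For what it's worth, the epoch and time fields above are internally consistent:
date -u -d @1424926800 prints Thu Feb 26 05:00:00 UTC 2015.)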

Re: [ceph-users] Question regarding rbd cache

2015-03-03 Thread Jason Dillaman
librbd caches data at a buffer / block level.  In a simplified example, if you 
are reading and writing random 4K blocks, the librbd cache would store only 
those individual 4K blocks.  Behind the scenes, it is possible for adjacent 
block buffers to be merged together within the librbd cache.  Therefore, if you 
read a whole object worth of adjacent blocks, the whole object could be stored 
in the cache as a single entry due to merging -- assuming no cache trimming 
occurred to evict blocks.

When a flush occurs, only buffers that are flagged as dirty are written 
back to the OSDs.  The whole object would not be written to the OSDs unless you 
wrote data to the whole object.
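
For reference, the cache size and writeback thresholds are tunable per client.
A minimal ceph.conf sketch (values are only illustrative, not a recommendation):

[client]
rbd cache = true
rbd cache size = 67108864            # total cache size in bytes (64 MB)
rbd cache target dirty = 33554432    # writeback starts once dirty data passes this
rbd cache max dirty = 50331648       # writes block once this much data is dirty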

-- 

Jason Dillaman 
Red Hat 
dilla...@redhat.com 
http://www.redhat.com 


- Original Message - 
From: Xu (Simon) Chen xche...@gmail.com 
To: ceph-users@lists.ceph.com 
Sent: Wednesday, February 25, 2015 7:12:01 PM 
Subject: [ceph-users] Question regarding rbd cache 

Hi folks, 

I am curious about how RBD cache works, whether it caches and writes back 
entire objects. For example, if my VM images are stored with order 23 (8MB 
blocks), would a 64MB rbd cache only be able to cache 8 objects at a time? Or 
does it work at a more granular fashion? Also, when a sync/flush happens, would 
the entire 8MB block be written back to ceph, or maybe some offset writes 
happens? 

Thanks. 
-Simon 


___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] cephfs filesystem layouts : authentication gotchas ?

2015-03-03 Thread SCHAER Frederic
Hi,

I am attempting to test the cephfs filesystem layouts.
I created a user with rights to write only in one pool :

client.puppet
key:zzz
caps: [mon] allow r
caps: [osd] allow rwx pool=puppet

I also created another pool in which I would assume this user is allowed to do 
nothing after I successfully configure things.
By the way: it looks like the "ceph fs ls" command is inconsistent when the 
cephfs is mounted (I used a locally compiled kmod-ceph rpm):

[root@ceph0 ~]# ceph fs ls
name: cephfs_puppet, metadata pool: puppet_metadata, data pools: [puppet ]
(umount /mnt ...)
[root@ceph0 ~]# ceph fs ls
name: cephfs_puppet, metadata pool: puppet_metadata, data pools: [puppet root ]

So, I have this pool named "root" that I added to the cephfs filesystem.
I then edited the filesystem xattrs :

[root@ceph0 ~]# getfattr -n ceph.dir.layout /mnt/root
getfattr: Removing leading '/' from absolute path names
# file: mnt/root
ceph.dir.layout=stripe_unit=4194304 stripe_count=1 object_size=4194304 
pool=root

I'm therefore assuming client.puppet should not be allowed to write or read 
anything in /mnt/root, which belongs to the root pool... but that is not the 
case.
On another machine where I mounted cephfs using the client.puppet key, I can do 
this :

The mount was done with the client.puppet key, not the admin one that is not 
deployed on that node :
1.2.3.4:6789:/ on /mnt type ceph 
(rw,relatime,name=puppet,secret=hidden,nodcache)

[root@dev7248 ~]# echo not allowed > /mnt/root/secret.notfailed
[root@dev7248 ~]#
[root@dev7248 ~]# cat /mnt/root/secret.notfailed
not allowed

And I can even see the xattrs inherited from the parent dir :
[root@dev7248 ~]# getfattr -n ceph.file.layout /mnt/root/secret.notfailed
getfattr: Removing leading '/' from absolute path names
# file: mnt/root/secret.notfailed
ceph.file.layout=stripe_unit=4194304 stripe_count=1 object_size=4194304 
pool=root

Whereas on the node where I mounted cephfs as ceph admin, I get nothing :
[root@ceph0 ~]# cat /mnt/root/secret.notfailed
[root@ceph0 ~]# ls -l /mnt/root/secret.notfailed
-rw-r--r-- 1 root root 12 Mar  3 15:27 /mnt/root/secret.notfailed

After some time, the file also gets empty on the puppet client host :
[root@dev7248 ~]# cat /mnt/root/secret.notfailed
[root@dev7248 ~]#
(but the metadata remained ?)

Also, as an unprivileged user, I can take ownership of a "secret" file by 
changing the extended attribute:

[root@dev7248 ~]# setfattr -n ceph.file.layout.pool -v puppet 
/mnt/root/secret.notfailed
[root@dev7248 ~]# getfattr -n ceph.file.layout /mnt/root/secret.notfailed
getfattr: Removing leading '/' from absolute path names
# file: mnt/root/secret.notfailed
ceph.file.layout=stripe_unit=4194304 stripe_count=1 object_size=4194304 
pool=puppet

But fortunately, I haven't succeeded yet (?) in reading that file...
My question therefore is : what am I doing wrong ?

Final question for those that read down here : it appears that before creating 
the cephfs filesystem, I used the puppet pool to store a test rbd instance.
And it appears I cannot get the list of cephfs objects in that pool, whereas I 
can get those that are on the newly created root pool :

[root@ceph0 ~]# rados -p puppet ls
test.rbd
rbd_directory
[root@ceph0 ~]# rados -p root ls
10a.
10b.

Bug, or feature ?

Thanks & regards


P.S : ceph release :

[root@dev7248 ~]# rpm -qa '*ceph*'
kmod-libceph-3.10.0-0.1.20150130gitee04310.el7.centos.x86_64
libcephfs1-0.87-0.el7.centos.x86_64
ceph-common-0.87-0.el7.centos.x86_64
ceph-0.87-0.el7.centos.x86_64
kmod-ceph-3.10.0-0.1.20150130gitee04310.el7.centos.x86_64
ceph-fuse-0.87.1-0.el7.centos.x86_64
python-ceph-0.87-0.el7.centos.x86_64
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] qemu-kvm and cloned rbd image

2015-03-03 Thread Jason Dillaman
Your procedure appears correct to me.  Would you mind re-running your cloned 
image VM with the following ceph.conf properties:

[client]
rbd cache off
debug rbd = 20
log file = /path/writeable/by/qemu.$pid.log

If you recreate the issue, would you mind opening a ticket at 
http://tracker.ceph.com/projects/rbd/issues?  

Thanks,

-- 

Jason Dillaman 
Red Hat 
dilla...@redhat.com 
http://www.redhat.com 


- Original Message -
From: koukou73gr koukou7...@yahoo.com
To: ceph-users@lists.ceph.com
Sent: Monday, March 2, 2015 7:16:08 AM
Subject: [ceph-users] qemu-kvm and cloned rbd image


Hello,

Today I thought I'd experiment with snapshots and cloning. So I did:

rbd import --image-format=2 vm-proto.raw rbd/vm-proto
rbd snap create rbd/vm-proto@s1
rbd snap protect rbd/vm-proto@s1
rbd clone rbd/vm-proto@s1 rbd/server
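
(A quick sanity check at this point, assuming the names above: "rbd info
rbd/server" should report format 2 and "parent: rbd/vm-proto@s1".)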

And then proceeded to create a qemu-kvm guest with rbd/server as its
backing store. The guest booted but as soon as it got to mount the root
fs, things got weird:

[...]
scsi2 : Virtio SCSI HBA
scsi 2:0:0:0: Direct-Access QEMU QEMU HARDDISK1.5. PQ: 0 ANSI: 5
sd 2:0:0:0: [sda] 20971520 512-byte logical blocks: (10.7 GB/10.0 GiB)
sd 2:0:0:0: [sda] Write Protect is off
sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support 
DPO or FUA
  sda: sda1 sda2
sd 2:0:0:0: [sda] Attached SCSI disk
dracut: Scanning devices sda2  for LVM logical volumes vg_main/lv_swap 
vg_main/lv_root
dracut: inactive '/dev/vg_main/lv_swap' [1.00 GiB] inherit
dracut: inactive '/dev/vg_main/lv_root' [6.50 GiB] inherit
EXT4-fs (dm-1): INFO: recovery required on readonly filesystem
EXT4-fs (dm-1): write access will be enabled during recovery
sd 2:0:0:0: [sda] abort
sd 2:0:0:0: [sda] abort
sd 2:0:0:0: [sda] abort
sd 2:0:0:0: [sda] abort
sd 2:0:0:0: [sda] abort
sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 2:0:0:0: [sda] Sense Key : Aborted Command [current]
sd 2:0:0:0: [sda] Add. Sense: I/O process terminated
sd 2:0:0:0: [sda] CDB: Write(10): 2a 00 00 b0 e0 d8 00 00 08 00
Buffer I/O error on device dm-1, logical block 1058331
lost page write due to I/O error on dm-1
sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 2:0:0:0: [sda] Sense Key : Aborted Command [current]
sd 2:0:0:0: [sda] Add. Sense: I/O process terminated
sd 2:0:0:0: [sda] CDB: Write(10): 2a 00 00 6f ba c8 00 00 08 00
[ ... snip ... snip ... more or less the same messages ]
end_request: I/O error, dev sda, sector 3129880
end_request: I/O error, dev sda, sector 11518432
end_request: I/O error, dev sda, sector 3194664
end_request: I/O error, dev sda, sector 3129824
end_request: I/O error, dev sda, sector 3194376
end_request: I/O error, dev sda, sector 11579664
end_request: I/O error, dev sda, sector 3129448
end_request: I/O error, dev sda, sector 3197856
end_request: I/O error, dev sda, sector 3129400
end_request: I/O error, dev sda, sector 7385360
end_request: I/O error, dev sda, sector 11515912
end_request: I/O error, dev sda, sector 11514112
__ratelimit: 12 callbacks suppressed
sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 2:0:0:0: [sda] Sense Key : Aborted Command [current]
sd 2:0:0:0: [sda] Add. Sense: I/O process terminated
sd 2:0:0:0: [sda] CDB: Write(10): 2a 00 00 af b0 80 00 00 10 00
__ratelimit: 12 callbacks suppressed
__ratelimit: 13 callbacks suppressed
Buffer I/O error on device dm-1, logical block 1048592
lost page write due to I/O error on dm-1
Buffer I/O error on device dm-1, logical block 1048593
lost page write due to I/O error on dm-1
sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 2:0:0:0: [sda] Sense Key : Aborted Command [current]
sd 2:0:0:0: [sda] Add. Sense: I/O process terminated
sd 2:0:0:0: [sda] CDB: Write(10): 2a 00 00 2f bf 00 00 00 08 00
Buffer I/O error on device dm-1, logical block 480
lost page write due to I/O error on dm-1
[... snip... more of the same ... ]
Buffer I/O error on device dm-1, logical block 475
lost page write due to I/O error on dm-1
Buffer I/O error on device dm-1, logical block 476
lost page write due to I/O error on dm-1
Buffer I/O error on device dm-1, logical block 477
lost page write due to I/O error on dm-1
sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 2:0:0:0: [sda] Sense Key : Aborted Command [current]
sd 2:0:0:0: [sda] Add. Sense: I/O process terminated
sd 2:0:0:0: [sda] CDB: Write(10): 2a 00 00 2f be 30 00 00 10 00
Buffer I/O error on device dm-1, logical block 454
lost page write due to I/O error on dm-1
sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 2:0:0:0: [sda] Sense Key : Aborted Command [current]
sd 2:0:0:0: [sda] Add. Sense: I/O process terminated
sd 2:0:0:0: [sda] CDB: Write(10): 2a 00 00 2f be 10 00 00 18 00
sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 2:0:0:0: [sda] Sense Key : Aborted Command [current]
sd 2:0:0:0: [sda] Add. Sense: I/O process terminated
sd 2:0:0:0: [sda] CDB: Write(10): 2a 00 00 2f be 

Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-03 Thread Andrija Panic
Thanks Irek.

Does this mean that, after peering for each PG, there will be a delay of
10 sec, meaning that every once in a while I will have 10 sec of the cluster
NOT being stressed/overloaded, then the recovery takes place for that
PG, then another 10 sec the cluster is fine, and then it is stressed again?

I'm trying to understand the process before actually doing stuff (the config
reference is there on ceph.com but I don't fully understand the process)

Thanks,
Andrija

On 3 March 2015 at 11:32, Irek Fasikhov malm...@gmail.com wrote:

 Hi.

 Use value osd_recovery_delay_start
 example:
 [root@ceph08 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok
 config show  | grep osd_recovery_delay_start
   osd_recovery_delay_start: 10

 2015-03-03 13:13 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 HI Guys,

 I yesterday removed 1 OSD from cluster (out of 42 OSDs), and it caused
 over 37% od the data to rebalance - let's say this is fine (this is when I
 removed it frm Crush Map).

 I'm wondering - I have previously set some throtling mechanism, but
 during first 1h of rebalancing, my rate of recovery was going up to 1500
 MB/s - and VMs were unusable completely, and then last 4h of the duration
 of recover this recovery rate went down to, say, 100-200 MB.s and during
 this VM performance was still pretty impacted, but at least I could work
 more or a less

 So my question, is this behaviour expected, is throtling here working as
 expected, since first 1h was almoust no throtling applied if I check the
 recovery rate 1500MB/s and the impact on Vms.
 And last 4h seemed pretty fine (although still lot of impact in general)

 I changed these throtling on the fly with:

 ceph tell osd.* injectargs '--osd_recovery_max_active 1'
 ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
 ceph tell osd.* injectargs '--osd_max_backfills 1'

 My Jorunals are on SSDs (12 OSD per server, of which 6 journals on one
 SSD, 6 journals on another SSD)  - I have 3 of these hosts.

 Any thought are welcome.
 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 --
 С уважением, Фасихов Ирек Нургаязович
 Моб.: +79229045757




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] More than 50% osds down, CPUs still busy; will the cluster recover without help?

2015-03-03 Thread Chris Murray
Ah yes, that's a good point :-)

Thank you for your assistance Greg, I'm understanding a little more about how 
Ceph operates under the hood now.

We're probably at a reasonable point for me to say I'll just switch the 
machines off and forget about them for a while. It's no great loss; I just 
wanted to see if the cluster would come back to life despite any mis-treatment, 
and how far it can be pushed with the limited resources on the Microservers.

Getting to the admin socket fails:

root@ceph26:~# ceph --admin-daemon /var/run/ceph/ceph-osd.1.asok help
admin_socket: exception getting command descriptions: [Errno 111] Connection 
refused

And after activity ceased on /dev/sdb ...

(60 second intervals again, snipped many hours of these sorts of figures)
sdb   5.52 0.00   801.27  0  48076
sdb   4.68 0.00   731.80  0  43908
sdb   5.25 0.00   792.80  0  47568
sdb  18.83   483.07   569.53  28984  34172
sdb  28.28   894.6035.40  53676   2124
sdb   0.00 0.00 0.00  0  0
sdb   0.00 0.00 0.00  0  0
sdb   0.00 0.00 0.00  0  0
sdb   0.00 0.00 0.00  0  0

... the log hadn't progressed beyond the below. Note the last entry was 13 
hours prior to activity on sdb ending, so whatever finished writing (then 
momentarily reading) this morning, didn't add anything to the log.

...
2015-03-02 18:24:45.942970 7f27f03ef780 15 filestore(/var/lib/ceph/osd/ceph-1) 
get_omap_iterator meta/39e3fb/pglog_4.57c/0//-1
2015-03-02 18:24:45.977857 7f27f03ef780 15 filestore(/var/lib/ceph/osd/ceph-1) 
_omap_rmkeys meta/39e3fb/pglog_4.57c/0//-1
2015-03-02 18:24:45.978400 7f27f03ef780 10 filestore oid: 
39e3fb/pglog_4.57c/0//-1 not skipping op, *spos 13288339.0.3
2015-03-02 18:24:45.978414 7f27f03ef780 10 filestore   header.spos 0.0.0
2015-03-02 18:24:45.986763 7f27f03ef780 15 filestore(/var/lib/ceph/osd/ceph-1) 
_omap_rmkeys meta/39e3fb/pglog_4.57c/0//-1
2015-03-02 18:24:45.987350 7f27f03ef780 10 filestore oid: 
39e3fb/pglog_4.57c/0//-1 not skipping op, *spos 13288339.0.4
2015-03-02 18:24:45.987363 7f27f03ef780 10 filestore   header.spos 0.0.0
2015-03-02 18:24:45.991651 7f27f03ef780 15 filestore(/var/lib/ceph/osd/ceph-1) 
_omap_setkeys meta/39e3fb/pglog_4.57c/0//-1
2015-03-02 18:24:45.992119 7f27f03ef780 10 filestore oid: 
39e3fb/pglog_4.57c/0//-1 not skipping op, *spos 13288339.0.5
2015-03-02 18:24:45.992128 7f27f03ef780 10 filestore   header.spos 0.0.0
2015-03-02 18:24:46.016116 7f27f03ef780 10 filestore(/var/lib/ceph/osd/ceph-1) 
_do_transaction on 0x1a92540
2015-03-02 18:24:46.016133 7f27f03ef780 15 filestore(/var/lib/ceph/osd/ceph-1) 
_omap_setkeys meta/16ef7597/infos/head//-1
2015-03-02 18:24:46.016542 7f27f03ef780 10 filestore oid: 
16ef7597/infos/head//-1 not skipping op, *spos 13288340.0.1
2015-03-02 18:24:46.016555 7f27f03ef780 10 filestore   header.spos 0.0.0
2015-03-02 18:24:48.855098 7f27e2fe0700 20 filestore(/var/lib/ceph/osd/ceph-1) 
sync_entry woke after 5.000291

The complete file is attached, in case it's of interest to anyone.

I get the feeling it's BTRFS which is the 'cause' here. I'm running a scrub in 
case it highlights anything wrong with the filesystem. If it all springs back 
to life, I'll post back here with my findings!

Thanks again for the pointers,
Chris

-Original Message-
From: Gregory Farnum [mailto:g...@gregs42.com] 
Sent: 02 March 2015 18:05
To: Chris Murray
Cc: ceph-users
Subject: Re: [ceph-users] More than 50% osds down, CPUs still busy; will the 
cluster recover without help?

You can turn the filestore up to 20 instead of 1. ;) You might also explore 
what information you can get out of the admin socket.
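
Concretely, that can be set in ceph.conf on the affected node and the OSD
restarted (a sketch; if the admin socket were responding, "ceph daemon osd.1
config set debug_filestore 20" would do the same thing live):

[osd]
debug filestore = 20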

You are correct that those numbers are the OSD epochs, although note that when 
the system is running you'll get output both for the OSD as a whole and for 
individual PGs within it (which can be lagging behind). I'm still pretty 
convinced the OSDs are simply stuck trying to bring their PGs up to date and 
are thrashing the maps on disk, but we're well past what I can personally 
diagnose without log diving.
-Greg

On Sat, Feb 28, 2015 at 11:51 AM, Chris Murray chrismurra...@gmail.com wrote:
 After noticing that the number increases by 101 on each attempt to 
 start osd.11, I figured I was only 7 iterations away from the output 
 being within 101 of 63675. So, I killed the osd process, started it 
 again, lather, rinse, repeat. I then did the same for other OSDs. Some 
 created very small logs, and some created logs into the gigabytes. 
 Grepping the latter for update_osd_stat showed me where the maps 
 were up to, and therefore which OSDs needed some special attention. 
 Some of the epoch numbers appeared to increase by themselves to a 
 point and then 

Re: [ceph-users] cephfs filesystem layouts : authentication gotchas ?

2015-03-03 Thread John Spray



On 03/03/2015 15:21, SCHAER Frederic wrote:


By the way : looks like the “ceph fs ls” command is inconsistent when 
the cephfs is mounted (I used a locally compiled kmod-ceph rpm):


[root@ceph0 ~]# ceph fs ls

name: cephfs_puppet, metadata pool: puppet_metadata, data pools: [puppet ]

(umount /mnt …)

[root@ceph0 ~]# ceph fs ls

name: cephfs_puppet, metadata pool: puppet_metadata, data pools: 
[puppet root ]



This is probably #10288, which was fixed in 0.87.1


So, I have this pool named “root” that I added in the cephfs filesystem.

I then edited the filesystem xattrs :

[root@ceph0 ~]# getfattr -n ceph.dir.layout /mnt/root

getfattr: Removing leading '/' from absolute path names

# file: mnt/root

ceph.dir.layout=stripe_unit=4194304 stripe_count=1 
object_size=4194304 pool=root


I’m therefore assuming client.puppet should not be allowed to write or 
read anything in /mnt/root, which belongs to the “root” pool… but that 
is not the case.


On another machine where I mounted cephfs using the client.puppet key, 
I can do this :


The mount was done with the client.puppet key, not the admin one that 
is not deployed on that node :


1.2.3.4:6789:/ on /mnt type ceph 
(rw,relatime,name=puppet,secret=hidden,nodcache)


[root@dev7248 ~]# echo not allowed  /mnt/root/secret.notfailed

[root@dev7248 ~]#

[root@dev7248 ~]# cat /mnt/root/secret.notfailed

not allowed

This is data you're seeing from the page cache, it hasn't been written 
to RADOS.


You have used the nodcache setting, but that doesn't mean what you 
think it does (it was about caching dentries, not data).  It's actually 
not even used in recent kernels (http://tracker.ceph.com/issues/11009).


You could try the nofsc option, but I don't know exactly how much 
caching that turns off -- the safer approach here is probably to do your 
testing using I/Os that have O_DIRECT set.
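
For example, something like this bypasses the page cache for both the write and
the read-back (a rough check; "directtest" is just a scratch file name):

  dd if=/dev/zero of=/mnt/root/directtest bs=4096 count=1 oflag=direct
  dd if=/mnt/root/directtest of=/dev/null bs=4096 count=1 iflag=direct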



And I can even see the xattrs inherited from the parent dir :

[root@dev7248 ~]# getfattr -n ceph.file.layout /mnt/root/secret.notfailed

getfattr: Removing leading '/' from absolute path names

# file: mnt/root/secret.notfailed

ceph.file.layout=stripe_unit=4194304 stripe_count=1 
object_size=4194304 pool=root


Whereas on the node where I mounted cephfs as ceph admin, I get nothing :

[root@ceph0 ~]# cat /mnt/root/secret.notfailed

[root@ceph0 ~]# ls -l /mnt/root/secret.notfailed

-rw-r--r-- 1 root root 12 Mar  3 15:27 /mnt/root/secret.notfailed

After some time, the file also gets empty on the “puppet client” host :

[root@dev7248 ~]# cat /mnt/root/secret.notfailed

[root@dev7248 ~]#

(but the metadata remained ?)

Right -- eventually the cache goes away, and you see the true (empty) 
state of the file.


Also, as an unpriviledged user, I can get ownership of a “secret” file 
by changing the extended attribute :


[root@dev7248 ~]# setfattr -n ceph.file.layout.pool -v puppet 
/mnt/root/secret.notfailed


[root@dev7248 ~]# getfattr -n ceph.file.layout /mnt/root/secret.notfailed

getfattr: Removing leading '/' from absolute path names

# file: mnt/root/secret.notfailed

ceph.file.layout=stripe_unit=4194304 stripe_count=1 
object_size=4194304 pool=puppet


Well, you're not really getting ownership of anything here: you're 
modifying the file's metadata, which you are entitled to do (pool 
permissions have nothing to do with file metadata).  There was a recent 
bug where a file's pool layout could be changed even if it had data, but 
that was about safety rather than permissions.


Final question for those that read down here : it appears that before 
creating the cephfs filesystem, I used the “puppet” pool to store a 
test rbd instance.


And it appears I cannot get the list of cephfs objects in that pool, 
whereas I can get those that are on the newly created “root” pool :


[root@ceph0 ~]# rados -p puppet ls

test.rbd

rbd_directory

[root@ceph0 ~]# rados -p root ls

10a.

10b.

Bug, or feature ?



I didn't see anything in your earlier steps that would have led to any 
objects in the puppet pool.


To get closer to the effect you're looking for, you probably need to 
combine your pool settings with some permissions on the folders, and do 
your I/O as a user other than root -- your user-level permissions would 
protect your metadata, and your pool permissions would protect your data.


There are also plans to make finer grained access control for the 
metadata, but that's not there yet.


Cheers,
John

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Rbd image's data deletion

2015-03-03 Thread Giuseppe Civitella
Hi all,

what happens to data contained in an rbd image when the image itself gets
deleted?
Is the data just unlinked, or is it destroyed in a way that makes it
unreadable?

thanks
Giuseppe
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Question about rados bench

2015-03-03 Thread Tony Harris
Hi all,

In my reading on the net about various implementations of Ceph, I came
across this website blog page (really doesn't give a lot of good
information but caused me to wonder):

http://avengermojo.blogspot.com/2014/12/cubieboard-cluster-ceph-test.html

Near the bottom, the person ran a rados bench test.  During the write
phase, there were several intervals where cur MB/s showed 0.  I
figure there must have been a bottleneck somewhere slowing down the
operation where data wasn't getting written.  Is something like that during
a benchmark test something that one should be concerned about?  Is there a
good procedure for tracking down where the bottleneck is (like if it's a
given OSD?)  Is the data cached and just taking a long time to write or is
it lost in an instance like that?
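
One way to narrow a stall like that down while the bench runs (commands as in
firefly-era releases; substitute real OSD ids and hosts):

  ceph -w                 # watch for slow request warnings during the run
  ceph osd perf           # per-OSD commit/apply latency; an outlier often means a slow disk
  iostat -x 5             # on each OSD host: await and %util of the underlying disks
  ceph tell osd.0 bench   # exercise a single suspect OSD in isolation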

-Tony
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-03 Thread Andrija Panic
Thx Irek. Number of replicas is 3.

I have 3 servers with 2 OSDs on them on 1g switch (1 OSD already
decommissioned), which is further connected to a new 10G switch/network
with 3 servers on it with 12 OSDs each.
I'm decommissioning old 3 nodes on 1G network...

So you suggest removing the whole node with its 2 OSDs manually from the crush
map? To my knowledge, ceph never places 2 replicas on 1 node; all 3 replicas
were originally distributed over all 3 nodes. So it should be safe to remove
the 2 OSDs at once together with the node itself... since the replica count
is 3...?

Thx again for your time
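
For reference, one commonly used way to limit the reshuffle when retiring OSDs
is to drain them via their crush weight first (a generic sketch, not specific
to this cluster; substitute the real ids):

  ceph osd crush reweight osd.<id> 0   # drain its data while the OSD is still up and in
  # wait for recovery to finish, then:
  ceph osd out <id>
  # stop the ceph-osd daemon on its host
  ceph osd crush remove osd.<id>
  ceph auth del osd.<id>
  ceph osd rm <id>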
On Mar 3, 2015 1:35 PM, Irek Fasikhov malm...@gmail.com wrote:

 Since you only have three nodes in the cluster, I recommend you add the new
 nodes to the cluster first, and then delete the old ones.

 2015-03-03 15:28 GMT+03:00 Irek Fasikhov malm...@gmail.com:

 What replication count do you have?

 2015-03-03 15:14 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 Hi Irek,

 yes, stoping OSD (or seting it to OUT) resulted in only 3% of data
 degraded and moved/recovered.
 When I after that removed it from Crush map ceph osd crush rm id,
 that's when the stuff with 37% happened.

 And thanks Irek for help - could you kindly just let me know of the
 prefered steps when removing whole node?
 Do you mean I first stop all OSDs again, or just remove each OSD from
 crush map, or perhaps, just decompile cursh map, delete the node
 completely, compile back in, and let it heal/recover ?

 Do you think this would result in less data missplaces and moved arround
 ?

 Sorry for bugging you, I really appreaciate your help.

 Thanks

 On 3 March 2015 at 12:58, Irek Fasikhov malm...@gmail.com wrote:

 A large percentage of the rebuild of the cluster map (But low
 percentage degradation). If you had not made ceph osd crush rm id, the
 percentage would be low.
 In your case, the correct option is to remove the entire node, rather
 than each disk individually

 2015-03-03 14:27 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 Another question - I mentioned here 37% of objects being moved arround
 - this is MISPLACED object (degraded objects were 0.001%, after I removed 
 1
 OSD from cursh map (out of 44 OSD or so).

 Can anybody confirm this is normal behaviour - and are there any
 workarrounds ?

 I understand this is because of the object placement algorithm of
 CEPH, but still 37% of object missplaces just by removing 1 OSD from crush
 maps out of 44 make me wonder why this large percentage ?

 Seems not good to me, and I have to remove another 7 OSDs (we are
 demoting some old hardware nodes). This means I can potentialy go with 7 x
 the same number of missplaced objects...?

 Any thoughts ?

 Thanks

 On 3 March 2015 at 12:14, Andrija Panic andrija.pa...@gmail.com
 wrote:

 Thanks Irek.

 Does this mean, that after peering for each PG, there will be delay
 of 10sec, meaning that every once in a while, I will have 10sec od the
 cluster NOT being stressed/overloaded, and then the recovery takes place
 for that PG, and then another 10sec cluster is fine, and then stressed
 again ?

 I'm trying to understand process before actually doing stuff (config
 reference is there on ceph.com but I don't fully understand the
 process)

 Thanks,
 Andrija

 On 3 March 2015 at 11:32, Irek Fasikhov malm...@gmail.com wrote:

 Hi.

 Use value osd_recovery_delay_start
 example:
 [root@ceph08 ceph]# ceph --admin-daemon
 /var/run/ceph/ceph-osd.94.asok config show  | grep 
 osd_recovery_delay_start
   osd_recovery_delay_start: 10

 2015-03-03 13:13 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 HI Guys,

 I yesterday removed 1 OSD from cluster (out of 42 OSDs), and it
 caused over 37% od the data to rebalance - let's say this is fine 
 (this is
 when I removed it frm Crush Map).

 I'm wondering - I have previously set some throtling mechanism, but
 during first 1h of rebalancing, my rate of recovery was going up to 
 1500
 MB/s - and VMs were unusable completely, and then last 4h of the 
 duration
 of recover this recovery rate went down to, say, 100-200 MB.s and 
 during
 this VM performance was still pretty impacted, but at least I could 
 work
 more or a less

 So my question, is this behaviour expected, is throtling here
 working as expected, since first 1h was almoust no throtling applied 
 if I
 check the recovery rate 1500MB/s and the impact on Vms.
 And last 4h seemed pretty fine (although still lot of impact in
 general)

 I changed these throtling on the fly with:

 ceph tell osd.* injectargs '--osd_recovery_max_active 1'
 ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
 ceph tell osd.* injectargs '--osd_max_backfills 1'

 My Jorunals are on SSDs (12 OSD per server, of which 6 journals on
 one SSD, 6 journals on another SSD)  - I have 3 of these hosts.

 Any thought are welcome.
 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 

Re: [ceph-users] problem in cephfs for remove empty directory

2015-03-03 Thread John Spray

On 03/03/2015 14:07, Daniel Takatori Ohara wrote:

*$ls test-daniel-old/*
total 0
drwx-- 1 rmagalhaes BioInfoHSL Users0 Mar  2 10:52 ./
drwx-- 1 rmagalhaes BioInfoHSL Users 773099838313 Mar  2 11:41 ../

*$rm -rf test-daniel-old/*
rm: cannot remove ‘test-daniel-old/’: Directory not empty

*$ls test-daniel-old/*
ls: cannot access 
test-daniel-old/M_S8_L001_R1-2_001.fastq.gz_ref.sam_fixed.bam: No such 
file or directory
ls: cannot access 
test-daniel-old/M_S8_L001_R1-2_001.fastq.gz_sylvio.sam_fixed.bam: No 
such file or directory
ls: cannot access 
test-daniel-old/M_S8_L002_R1-2_001.fastq.gz_ref.sam_fixed.bam: No such 
file or directory
ls: cannot access 
test-daniel-old/M_S8_L002_R1-2_001.fastq.gz_sylvio.sam_fixed.bam: No 
such file or directory
ls: cannot access 
test-daniel-old/M_S8_L003_R1-2_001.fastq.gz_ref.sam_fixed.bam: No such 
file or directory
ls: cannot access 
test-daniel-old/M_S8_L003_R1-2_001.fastq.gz_sylvio.sam_fixed.bam: No 
such file or directory
ls: cannot access 
test-daniel-old/M_S8_L004_R1-2_001.fastq.gz_ref.sam_fixed.bam: No such 
file or directory
ls: cannot access 
test-daniel-old/M_S8_L004_R1-2_001.fastq.gz_sylvio.sam_fixed.bam: No 
such file or directory

total 0
drwx-- 1 rmagalhaes BioInfoHSL Users0 Mar  2 10:52 ./
drwx-- 1 rmagalhaes BioInfoHSL Users 773099838313 Mar  2 11:41 ../
l? ? ?  ?   ?  ? 
M_S8_L001_R1-2_001.fastq.gz_ref.sam_fixed.bam
l? ? ?  ?   ?  ? 
M_S8_L001_R1-2_001.fastq.gz_sylvio.sam_fixed.bam
l? ? ?  ?   ?  ? 
M_S8_L002_R1-2_001.fastq.gz_ref.sam_fixed.bam
l? ? ?  ?   ?  ? 
M_S8_L002_R1-2_001.fastq.gz_sylvio.sam_fixed.bam
l? ? ?  ?   ?  ? 
M_S8_L003_R1-2_001.fastq.gz_ref.sam_fixed.bam
l? ? ?  ?   ?  ? 
M_S8_L003_R1-2_001.fastq.gz_sylvio.sam_fixed.bam
l? ? ?  ?   ?  ? 
M_S8_L004_R1-2_001.fastq.gz_ref.sam_fixed.bam
l? ? ?  ?   ?  ? 
M_S8_L004_R1-2_001.fastq.gz_sylvio.sam_fixed.bam
You don't say what version of the client (version of kernel, if it's the 
kernel client) this is.  It would appear that the client thinks there 
are some dentries that don't really exist.  You should enable verbose 
debug logs (with fuse client, debug client = 20) and reproduce this.  
It looks like you had similar issues (subject: problem for remove files 
in cephfs) a while back, when Yan Zheng also advised you to get some 
debug logs.
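
A minimal ceph.conf sketch for that on the fuse client (the log path is only an
example):

[client]
debug client = 20
log file = /var/log/ceph/client.$pid.log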


John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] problem in cephfs for remove empty directory

2015-03-03 Thread Gregory Farnum
On Tue, Mar 3, 2015 at 9:24 AM, John Spray john.sp...@redhat.com wrote:
 On 03/03/2015 14:07, Daniel Takatori Ohara wrote:

 $ls test-daniel-old/
 total 0
 drwx-- 1 rmagalhaes BioInfoHSL Users0 Mar  2 10:52 ./
 drwx-- 1 rmagalhaes BioInfoHSL Users 773099838313 Mar  2 11:41 ../

 $rm -rf test-daniel-old/
 rm: cannot remove ‘test-daniel-old/’: Directory not empty

 $ls test-daniel-old/
 ls: cannot access
 test-daniel-old/M_S8_L001_R1-2_001.fastq.gz_ref.sam_fixed.bam: No such file
 or directory
 ls: cannot access
 test-daniel-old/M_S8_L001_R1-2_001.fastq.gz_sylvio.sam_fixed.bam: No such
 file or directory
 ls: cannot access
 test-daniel-old/M_S8_L002_R1-2_001.fastq.gz_ref.sam_fixed.bam: No such file
 or directory
 ls: cannot access
 test-daniel-old/M_S8_L002_R1-2_001.fastq.gz_sylvio.sam_fixed.bam: No such
 file or directory
 ls: cannot access
 test-daniel-old/M_S8_L003_R1-2_001.fastq.gz_ref.sam_fixed.bam: No such file
 or directory
 ls: cannot access
 test-daniel-old/M_S8_L003_R1-2_001.fastq.gz_sylvio.sam_fixed.bam: No such
 file or directory
 ls: cannot access
 test-daniel-old/M_S8_L004_R1-2_001.fastq.gz_ref.sam_fixed.bam: No such file
 or directory
 ls: cannot access
 test-daniel-old/M_S8_L004_R1-2_001.fastq.gz_sylvio.sam_fixed.bam: No such
 file or directory
 total 0
 drwx-- 1 rmagalhaes BioInfoHSL Users0 Mar  2 10:52 ./
 drwx-- 1 rmagalhaes BioInfoHSL Users 773099838313 Mar  2 11:41 ../
 l? ? ?  ?   ??
 M_S8_L001_R1-2_001.fastq.gz_ref.sam_fixed.bam
 l? ? ?  ?   ??
 M_S8_L001_R1-2_001.fastq.gz_sylvio.sam_fixed.bam
 l? ? ?  ?   ??
 M_S8_L002_R1-2_001.fastq.gz_ref.sam_fixed.bam
 l? ? ?  ?   ??
 M_S8_L002_R1-2_001.fastq.gz_sylvio.sam_fixed.bam
 l? ? ?  ?   ??
 M_S8_L003_R1-2_001.fastq.gz_ref.sam_fixed.bam
 l? ? ?  ?   ??
 M_S8_L003_R1-2_001.fastq.gz_sylvio.sam_fixed.bam
 l? ? ?  ?   ??
 M_S8_L004_R1-2_001.fastq.gz_ref.sam_fixed.bam
 l? ? ?  ?   ??
 M_S8_L004_R1-2_001.fastq.gz_sylvio.sam_fixed.bam

 You don't say what version of the client (version of kernel, if it's the
 kernel client) this is.  It would appear that the client thinks there are
 some dentries that don't really exist.  You should enable verbose debug logs
 (with fuse client, debug client = 20) and reproduce this.  It looks like
 you had similar issues (subject: problem for remove files in cephfs) a
 while back, when Yan Zheng also advised you to get some debug logs.

In particular this is a known bug in older kernels and is fixed in new
enough ones. Unfortunately I don't have the bug link handy though. :(
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Cluster Address

2015-03-03 Thread Garg, Pankaj
Hi,
I have a ceph cluster that is contained within a rack (1 monitor and 5 OSD 
nodes). I kept the same public and private (cluster) address in the configuration.
I do have 2 NICs and 2 valid IP addresses (one internal-only and one external) 
for each machine.

Is it possible now to change the public network address, after the cluster is 
up and running?
I used ceph-deploy for the cluster. If I change the address of the public 
network in ceph.conf, do I need to propagate it to all the machines in the 
cluster, or is changing it on the monitor node enough?
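
For reference, the two networks are declared in the [global] section; a minimal
sketch with placeholder subnets (192.0.2.0/24 and 198.51.100.0/24 are examples
only):

[global]
public network = 192.0.2.0/24        # client-facing traffic
cluster network = 198.51.100.0/24    # OSD replication/heartbeat traffic

Note that every daemon reads its local ceph.conf, so a change like this
generally has to be pushed to all nodes; the monitor addresses additionally
live in the monmap, which has to be updated separately.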

Thanks
Pankaj
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Unbalanced cluster

2015-03-03 Thread Matt Conner
Hi All,

I have a cluster that I've been pushing data into in order to get an idea
of how full it can get prior to ceph marking the cluster full. Unfortunately,
each time I fill the cluster I end up with one disk that typically hits the
full ratio (0.95) while all other disks still have anywhere from 20-40%
free space (my latest attempt resulted in the cluster marking full at 60%
total usage). Any idea why the OSDs would be so unbalanced?

Few notes on the cluster:

   - It has 6 storage hosts with 143 total OSDs (typically 144 but it has
   one failed disk - removed from cluster)
   - All OSDs are 4TB drives
   - All OSDs are set to the same weight
   - The cluster is using host rules
   - Using ceph version 0.80.7


In terms of the Pool(s), I have been varying the number of pools from run
to run, following the PG calculator at http://ceph.com/pgcalc/ to determine
the number of placement groups. I have also attempted a few runs bumping up
the number of PGs, but it has only resulted in further imbalance.
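
Two commonly used checks/mitigations for this kind of skew, for reference
(generic commands, not specific to this cluster):

  ceph osd reweight-by-utilization 120   # lower the reweight of OSDs more than 20% above mean utilization
  ceph osd reweight osd.<id> 0.9         # or manually nudge a single over-full OSD down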

Any thoughts?

Thanks,

Matt
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] problem in cephfs for remove empty directory

2015-03-03 Thread Daniel Takatori Ohara
Hi John and Gregory,

The version of ceph client is 0.87 and the kernel is 3.13.

The debug logs are attached.

I have seen this problem with an older kernel, but I didn't find the solution
in the tracker.

Thanks,

Att.

---
Daniel Takatori Ohara.
System Administrator - Lab. of Bioinformatics
Molecular Oncology Center
Instituto Sírio-Libanês de Ensino e Pesquisa
Hospital Sírio-Libanês
Phone: +55 11 3155-0200 (extension 1927)
R: Cel. Nicolau dos Santos, 69
São Paulo-SP. 01308-060
http://www.bioinfo.mochsl.org.br


On Tue, Mar 3, 2015 at 2:26 PM, Gregory Farnum g...@gregs42.com wrote:

 On Tue, Mar 3, 2015 at 9:24 AM, John Spray john.sp...@redhat.com wrote:
  On 03/03/2015 14:07, Daniel Takatori Ohara wrote:
 
  $ls test-daniel-old/
  total 0
  drwx-- 1 rmagalhaes BioInfoHSL Users0 Mar  2 10:52 ./
  drwx-- 1 rmagalhaes BioInfoHSL Users 773099838313 Mar  2 11:41 ../
 
  $rm -rf test-daniel-old/
  rm: cannot remove ‘test-daniel-old/’: Directory not empty
 
  $ls test-daniel-old/
  ls: cannot access
  test-daniel-old/M_S8_L001_R1-2_001.fastq.gz_ref.sam_fixed.bam: No such
 file
  or directory
  ls: cannot access
  test-daniel-old/M_S8_L001_R1-2_001.fastq.gz_sylvio.sam_fixed.bam: No such
  file or directory
  ls: cannot access
  test-daniel-old/M_S8_L002_R1-2_001.fastq.gz_ref.sam_fixed.bam: No such
 file
  or directory
  ls: cannot access
  test-daniel-old/M_S8_L002_R1-2_001.fastq.gz_sylvio.sam_fixed.bam: No such
  file or directory
  ls: cannot access
  test-daniel-old/M_S8_L003_R1-2_001.fastq.gz_ref.sam_fixed.bam: No such
 file
  or directory
  ls: cannot access
  test-daniel-old/M_S8_L003_R1-2_001.fastq.gz_sylvio.sam_fixed.bam: No such
  file or directory
  ls: cannot access
  test-daniel-old/M_S8_L004_R1-2_001.fastq.gz_ref.sam_fixed.bam: No such
 file
  or directory
  ls: cannot access
  test-daniel-old/M_S8_L004_R1-2_001.fastq.gz_sylvio.sam_fixed.bam: No such
  file or directory
  total 0
  drwx-- 1 rmagalhaes BioInfoHSL Users0 Mar  2 10:52 ./
  drwx-- 1 rmagalhaes BioInfoHSL Users 773099838313 Mar  2 11:41 ../
  l? ? ?  ?   ??
  M_S8_L001_R1-2_001.fastq.gz_ref.sam_fixed.bam
  l? ? ?  ?   ??
  M_S8_L001_R1-2_001.fastq.gz_sylvio.sam_fixed.bam
  l? ? ?  ?   ??
  M_S8_L002_R1-2_001.fastq.gz_ref.sam_fixed.bam
  l? ? ?  ?   ??
  M_S8_L002_R1-2_001.fastq.gz_sylvio.sam_fixed.bam
  l? ? ?  ?   ??
  M_S8_L003_R1-2_001.fastq.gz_ref.sam_fixed.bam
  l? ? ?  ?   ??
  M_S8_L003_R1-2_001.fastq.gz_sylvio.sam_fixed.bam
  l? ? ?  ?   ??
  M_S8_L004_R1-2_001.fastq.gz_ref.sam_fixed.bam
  l? ? ?  ?   ??
  M_S8_L004_R1-2_001.fastq.gz_sylvio.sam_fixed.bam
 
  You don't say what version of the client (version of kernel, if it's the
  kernel client) this is.  It would appear that the client thinks there are
  some dentries that don't really exist.  You should enable verbose debug
 logs
  (with fuse client, debug client = 20) and reproduce this.  It looks
 like
  you had similar issues (subject: problem for remove files in cephfs) a
  while back, when Yan Zheng also advised you to get some debug logs.

 In particular this is a known bug in older kernels and is fixed in new
 enough ones. Unfortunately I don't have the bug link handy though. :(
 -Greg



log_mds.gz
Description: GNU Zip compressed data
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS Attributes Question Marks

2015-03-03 Thread Scottix
I did a bit more testing.
1. I tried on a newer kernel and was not able to recreate the problem;
maybe it is that kernel bug you mentioned, although it's not an exact
replica of the load.
2. I haven't tried the debug yet since I have to wait for the right moment.

One thing I realized (maybe it is not an issue) is that we are using a symlink
to a folder in the ceph mount.
ceph-fuse on /mnt/ceph type fuse.ceph-fuse
(rw,nosuid,nodev,noatime,user_id=0,group_id=0,default_permissions,allow_other)
lrwxrwxrwx 1 root   root   metadata -> /mnt/ceph/DataCenter/metadata
Not sure if that would create any issues.

Anyway we are going to update the machine soon so, I can report if we keep
having the issue.

Thanks for your support,
Scott


On Mon, Mar 2, 2015 at 4:07 PM Scottix scot...@gmail.com wrote:

 I'll try the following things and report back to you.

 1. I can get a new kernel on another machine and mount to the CephFS and
 see if I get the following errors.
 2. I'll run the debug and see if anything comes up.

 I'll report back to you when I can do these things.

 Thanks,
 Scottie

 On Mon, Mar 2, 2015 at 4:04 PM Gregory Farnum g...@gregs42.com wrote:

 I bet it's that permission issue combined with a minor bug in FUSE on
 that kernel, or maybe in the ceph-fuse code (but I've not seen it
 reported before, so I kind of doubt it). If you run ceph-fuse with
 debug client = 20 it will output (a whole lot of) logging to the
 client's log file and you could see what requests are getting
 processed by the Ceph code and how it's responding. That might let you
 narrow things down. It's certainly not any kind of timeout.
 -Greg

 On Mon, Mar 2, 2015 at 3:57 PM, Scottix scot...@gmail.com wrote:
  3 Ceph servers on Ubuntu 12.04.5 - kernel 3.13.0-29-generic
 
  We have an old server that we compiled the ceph-fuse client on
  Suse11.4 - kernel 2.6.37.6-0.11
  This is the only mount we have right now.
 
  We don't have any problems reading the files and the directory shows
 full
  775 permissions and doing a second ls fixes the problem.
 
  On Mon, Mar 2, 2015 at 3:51 PM Bill Sanders billysand...@gmail.com
 wrote:
 
  Forgive me if this is unhelpful, but could it be something to do with
  permissions of the directory and not Ceph at all?
 
  http://superuser.com/a/528467
 
  Bill
 
  On Mon, Mar 2, 2015 at 3:47 PM, Gregory Farnum g...@gregs42.com
 wrote:
 
  On Mon, Mar 2, 2015 at 3:39 PM, Scottix scot...@gmail.com wrote:
   We have a file system running CephFS and for a while we had this
 issue
   when
   doing an ls -la we get question marks in the response.
  
   -rw-r--r-- 1 wwwrun root14761 Feb  9 16:06
   data.2015-02-08_00-00-00.csv.bz2
   -? ? ?  ?   ??
   data.2015-02-09_00-00-00.csv.bz2
  
   If we do another directory listing it show up fine.
  
   -rw-r--r-- 1 wwwrun root14761 Feb  9 16:06
   data.2015-02-08_00-00-00.csv.bz2
   -rw-r--r-- 1 wwwrun root13675 Feb 10 15:21
   data.2015-02-09_00-00-00.csv.bz2
  
   It hasn't been a problem but just wanted to see if this is an issue,
   could
   the attributes be timing out? We do have a lot of files in the
   filesystem so
   that could be a possible bottleneck.
 
  Huh, that's not something I've seen before. Are the systems you're
  doing this on the same? What distro and kernel version? Is it reliably
  one of them showing the question marks, or does it jump between
  systems?
  -Greg
 
  
   We are using the ceph-fuse mount.
   ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578)
   We are planning to do the update soon to 87.1
  
   Thanks
   Scottie
  
  
   ___
   ceph-users mailing list
   ceph-users@lists.ceph.com
   http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
  
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] import-diff requires snapshot exists?

2015-03-03 Thread Steve Anthony
Hello,

I've been playing with backing up images from my production site
(running 0.87) to my backup site (running 0.87.1) using export/import
and export-diff/import-diff. After initially exporting and importing the
image (rbd/small to backup/small) I took a snapshot (called test1) on
the production cluster, ran export-diff from that snapshot, and then
attempted to import-diff the diff file on the backup cluster.

# rbd import-diff ./foo.diff backup/small
start snapshot 'test1' does not exist in the image, aborting
Importing image diff: 0% complete...failed.
rbd: import-diff failed: (22) Invalid argument

This works fine if I create a test1 snapshot on the backup cluster
before running import-diff. However, it appears that the changes get
written into backup/small not backup/small@test1. So unless I'm not
understanding something, it seems like the content of the snapshot on
the backup cluster is of no importance, which makes me wonder why it
must exist at all.

Any thoughts? Thanks!

-Steve

-- 
Steve Anthony
LTS HPC Support Specialist
Lehigh University
sma...@lehigh.edu




signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW do not populate log file

2015-03-03 Thread Italo Santos
After changing the ownership of the log file directory, everything became fine.
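
For anyone hitting the same thing, the fix amounts to something like the
following (the path and user are assumptions; use whatever your rgw log file
setting points at and the user your web server runs radosgw under):

  chown -R apache:apache /var/log/ceph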

Thanks for your help  

Regards.

Italo Santos
http://italosantos.com.br/


On Tuesday, March 3, 2015 at 00:35, zhangdongmao wrote:

 I have met this before.
 Because I use apache with rgw, radosgw is executed by the user 'apache', so 
 you have to make sure the apache user has permission to write to the log file.
  
 On 2015-03-03 07:06, Italo Santos wrote:
  Hello everyone,  
   
  I have a radosgw configured with the ceph.conf file below, but this 
  instance isn't generating any log entries at the log file path; the log is 
  always empty. But if I take a look at the apache access.log there are a lot 
  of entries.  
   
  Anyone knows why?  
   
  Regards.  
   
  Italo Santos  
  http://italosantos.com.br/
   
   
   
  ___ ceph-users mailing list 
  ceph-users@lists.ceph.com (mailto:ceph-users@lists.ceph.com) 
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com  
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com (mailto:ceph-users@lists.ceph.com)
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
  
  


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Cluster Address

2015-03-03 Thread J-P Methot
I had to go through the same experience of changing the public network 
address and it's not easy.  Ceph seems to keep a record of which IP 
address is associated with which OSD and a port number for the process. I 
was never able to find out where this record is kept or how to change it 
manually. Here's what I did, from memory:


1. Remove the network address I didn't want to use anymore from the 
ceph.conf and put the one I wanted to use instead. Don't worry, 
modifying the ceph.conf will not affect a currently running cluster 
unless you issue a command to it, like adding an OSD.
2. Remove each OSD one by one and then reinitialize them right after. 
You will lose the data that's on the OSD, but if your cluster is 
replicated properly and you do this operation one OSD at a time, you should 
not lose the copies of that data (a concrete sketch follows below).
3. Check the OSD status to make sure they use the proper IP. The command 
ceph osd dump will tell you if your OSDs are detected on the proper IP.

4. Remove and reinstall each monitor one by one.

If anybody else has another solution I'd be curious to hear it, but this 
is how I managed to do it, by basically reinstalling each component one 
by one.
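
For reference, the per-OSD cycle in steps 2 and 3 was roughly the following 
(osd.3 is only an example id, and this is from memory, so double-check it 
against the documentation before running it):

  ceph osd out 3
  # wait for the cluster to return to active+clean
  service ceph stop osd.3
  ceph osd crush remove osd.3
  ceph auth del osd.3
  ceph osd rm 3
  # recreate the OSD on the same disk, then verify the address it registered:
  ceph osd dump | grep "^osd.3 "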


On 3/3/2015 12:26 PM, Garg, Pankaj wrote:


Hi,

I have ceph cluster that is contained within a rack (1 Monitor and 5 
OSD nodes). I kept the same public and private address for configuration.


I do have 2 NICS and 2 valid IP addresses (one internal only and one 
external) for each machine.


Is it possible now, to change the Public Network address, after the 
cluster is up and running?


I had used Ceph-deploy for the cluster. If I change the address of the 
public network in Ceph.conf, do I need to propagate to all the 
machines in the cluster or just the Monitor Node is enough?


Thanks

Pankaj



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
==
Jean-Philippe Méthot
Administrateur système / System administrator
GloboTech Communications
Phone: 1-514-907-0050
Toll Free: 1-(888)-GTCOMM1
Fax: 1-(514)-907-0750
jpmet...@gtcomm.net
http://www.gtcomm.net

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] import-diff requires snapshot exists?

2015-03-03 Thread Steve Anthony
Jason,

Ah, ok that makes sense. I was forgetting snapshots are read-only. Thanks!

My plan was to do something like this. First, create a sync snapshot and
seed the backup:

rbd snap create rbd/small@sync
rbd export rbd/small@sync ./foo

rbd import ./foo backup/small
rbd snap create backup/small@sync

Then each day, create a daily snap on the backup cluster:

rbd snap create backup/small@2015-02-03

Then send that day's changes:

rbd export-diff --from-snap sync rbd/small ./foo.diff
rbd import-diff ./foo.diff backup/small

Then remove and recreate the sync snap marker to prepare for the next sync.

rbd snap rm rbd/small@sync
rbd snap rm backup/small@sync

rbd snap create rbd/small@sync
rbd snap create backup/small@sync

Finally remove any dated snapshots on the remote cluster outside the
retention window.
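
(For that last step I'd expect something along these lines, with the date being 
just an example:

  rbd snap ls backup/small
  rbd snap rm backup/small@2015-01-27

for each dated snapshot older than the window.)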

-Steve

On 03/03/2015 04:37 PM, Jason Dillaman wrote:
 Snapshots are read-only, so all changes to the image can only be applied to 
 the HEAD revision.

 In general, you should take a snapshot prior to export / export-diff to 
 ensure consistent images:

   rbd snap create rbd/small@snap1
   rbd export rbd/small@snap1 ./foo

   rbd import ./foo backup/small
   rbd snap create backup/small@snap1

   ** rbd/small and backup/small are now consistent through snap1 -- rbd/small 
 might have been modified post snapshot

   rbd snap create rbd/small@snap2
   rbd export-diff --from-snap snap1 rbd/small@snap2 ./foo.diff
   rbd import-diff ./foo.diff backup/small

   ** rbd/small and backup/small are now consistent through snap2.  
 import-diff automatically created backup/small@snap2 after importing all 
 changes. 

 -- Jason Dillaman Red Hat dilla...@redhat.com http://www.redhat.com
 - Original Message - From: Steve Anthony sma...@lehigh.edu
 To: ceph-users@lists.ceph.com Sent: Tuesday, March 3, 2015 2:06:44 PM
 Subject: [ceph-users] import-diff requires snapshot exists? Hello,
 I've been playing with backing up images from my production site
 (running 0.87) to my backup site (running 0.87.1) using export/import
 and export-diff/import-diff. After initially exporting and importing
 the image (rbd/small to backup/small) I took a snapshot (called test1)
 on the production cluster, ran export-diff from that snapshot, and
 then attempted to import-diff the diff file on the backup cluster. #
 rbd import-diff ./foo.diff backup/small start snapshot 'test1' does
 not exist in the image, aborting Importing image diff: 0%
 complete...failed. rbd: import-diff failed: (22) Invalid argument This
 works fine if I create a test1 snapshot on the backup cluster before
 running import-diff. However, it appears that the changes get written
 into backup/small not backup/small@test1. So unless I'm not
 understanding something, it seems like the content of the snapshot on
 the backup cluster is of no importance, which makes me wonder why it
 must exist at all. Any thoughts? Thanks! -Steve
 -- Steve Anthony LTS HPC Support Specialist Lehigh University
 sma...@lehigh.edu ___
 ceph-users mailing list ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Steve Anthony
LTS HPC Support Specialist
Lehigh University
sma...@lehigh.edu




signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Unexpected OSD down during deep-scrub

2015-03-03 Thread Italo Santos
Hello everyone,

I have a cluster with 5 hosts and 18 OSDs. Today I faced an unexpected 
issue where multiple OSDs went down.

The first OSD to go down was osd.8; a few minutes later, another OSD went down on 
the same host, osd.1. I tried to restart the OSDs (osd.8 and osd.1) but 
that didn't work, so I decided to put these OSDs out of the cluster and wait for 
the recovery to complete.

During the recovery, two more OSDs went down: osd.6 on another host, and, 
seconds later, osd.0 on the same host as the first OSD.

Looking at the "ceph -w" status I realised there were some slow/stuck ops and I 
decided to stop writes to the cluster. After that I restarted OSDs 0 and 6, both 
came back UP, and I was able to wait for the recovery to finish, which completed 
successfully.

I realised that when the first OSD went down, the cluster was performing a 
deep-scrub, and I found the trace below in the logs of osd.8. Can anyone help 
me understand why osd.8, and the other OSDs, unexpectedly went down?

Below is the osd.8 trace:

-2 2015-03-03 16:31:48.191796 7f91a388b700  5 -- op tracker -- seq: 
2633606, time: 2015-03-03 16:31:48.191796, event: done, op: 
osd_op(client.3880912.0:236
8430 notify.6 [watch ping cookie 140352686583296] 40.97c520d4 
ack+write+known_if_redirected e4231)
-1 2015-03-03 16:31:48.192174 7f91af8a3700  1 -- 10.32.30.11:6804/3991 == 
client.3880912 10.32.30.10:0/1001424 282597  ping magic: 0 v1  0+0+0 (0
0 0) 0xf500 con 0x1535c580
 0 2015-03-03 16:31:48.251131 7f91a0084700 -1 osd/ReplicatedPG.cc: In 
function 'void ReplicatedPG::issue_repop(ReplicatedPG::RepGather*, utime_t)' 
thread 7
f91a0084700 time 2015-03-03 16:31:48.169895
osd/ReplicatedPG.cc: 7494: FAILED assert(!i->mod_desc.empty())

 ceph version 0.92 (00a3ac3b67d93860e7f0b6e07319f11b14d0fec0)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x72) 
[0xcc86c2]
 2: (ReplicatedPG::issue_repop(ReplicatedPG::RepGather*, utime_t)+0x49c) 
[0x9624fc]
 3: (ReplicatedPG::simple_repop_submit(ReplicatedPG::RepGather*)+0x7a) 
[0x9698ba]
 4: (ReplicatedPG::_scrub(ScrubMap)+0x2e62) [0x99b072]
 5: (PG::scrub_compare_maps()+0x511) [0x90f0d1]
 6: (PG::chunky_scrub(ThreadPool::TPHandle)+0x204) [0x910bb4]
 7: (PG::scrub(ThreadPool::TPHandle)+0x3a3) [0x912c53]
 8: (OSD::ScrubWQ::_process(PG*, ThreadPool::TPHandle)+0x13) [0x7ebdd3]
 9: (ThreadPool::worker(ThreadPool::WorkThread*)+0x629) [0xcbade9]
 10: (ThreadPool::WorkThread::entry()+0x10) [0xcbbfe0]
 11: (()+0x6b50) [0x7f91bfe46b50]
 12: (clone()+0x6d) [0x7f91be8627bd]
 NOTE: a copy of the executable, or `objdump -rdS executable` is needed to 
interpret this.


At.

Italo Santos
http://italosantos.com.br/

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS Attributes Question Marks

2015-03-03 Thread Scottix
Ya we are not at 0.87.1 yet, possibly tomorrow. I'll let you know if it
still reports the same.

Thanks John,
--Scottie


On Tue, Mar 3, 2015 at 2:57 PM John Spray john.sp...@redhat.com wrote:

 On 03/03/2015 22:35, Scottix wrote:
  I was testing a little bit more and decided to run the
 cephfs-journal-tool
 
  I ran across some errors
 
  $ cephfs-journal-tool journal inspect
  2015-03-03 14:18:54.453981 7f8e29f86780 -1 Bad entry start ptr
  (0x2aebf6) at 0x2aeb32279b
  2015-03-03 14:18:54.539060 7f8e29f86780 -1 Bad entry start ptr
  (0x2aeb000733) at 0x2aeb322dd8
  2015-03-03 14:18:54.584539 7f8e29f86780 -1 Bad entry start ptr
  (0x2aeb000d70) at 0x2aeb323415
  2015-03-03 14:18:54.669991 7f8e29f86780 -1 Bad entry start ptr
  (0x2aeb0013ad) at 0x2aeb323a52
  2015-03-03 14:18:54.707724 7f8e29f86780 -1 Bad entry start ptr
  (0x2aeb0019ea) at 0x2aeb32408f
  Overall journal integrity: DAMAGED

 I expect this is http://tracker.ceph.com/issues/9977, which is fixed in
 master.

 You are in *very* bleeding edge territory here, and I'd suggest using
 the latest development release if you want to experiment with the latest
 CephFS tooling.

 Cheers,
 John

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.80.8 and librbd performance

2015-03-03 Thread Olivier Bonvalet
Is the kernel client affected by the problem?

On Tuesday, March 3, 2015 at 15:19 -0800, Sage Weil wrote:
 Hi,
 
 This is just a heads up that we've identified a performance regression in 
 v0.80.8 from previous firefly releases.  A v0.80.9 is working it's way 
 through QA and should be out in a few days.  If you haven't upgraded yet 
 you may want to wait.
 
 Thanks!
 sage
 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.80.8 and librbd performance

2015-03-03 Thread Olivier Bonvalet
On Tuesday, March 3, 2015 at 16:32 -0800, Sage Weil wrote:
 On Wed, 4 Mar 2015, Olivier Bonvalet wrote:
  Is the kernel client affected by the problem?
 
 Nope.  The kernel client is unaffected.. the issue is in librbd.
 
 sage
 


Ok, thanks for the clarification.
So I have to dig !


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS Attributes Question Marks

2015-03-03 Thread Scottix
I was testing a little bit more and decided to run the cephfs-journal-tool

I ran across some errors

$ cephfs-journal-tool journal inspect
2015-03-03 14:18:54.453981 7f8e29f86780 -1 Bad entry start ptr
(0x2aebf6) at 0x2aeb32279b
2015-03-03 14:18:54.539060 7f8e29f86780 -1 Bad entry start ptr
(0x2aeb000733) at 0x2aeb322dd8
2015-03-03 14:18:54.584539 7f8e29f86780 -1 Bad entry start ptr
(0x2aeb000d70) at 0x2aeb323415
2015-03-03 14:18:54.669991 7f8e29f86780 -1 Bad entry start ptr
(0x2aeb0013ad) at 0x2aeb323a52
2015-03-03 14:18:54.707724 7f8e29f86780 -1 Bad entry start ptr
(0x2aeb0019ea) at 0x2aeb32408f
Overall journal integrity: DAMAGED
Corrupt regions:
  0x2aeb3226a5-2aeb32279b
  0x2aeb32279b-2aeb322dd8
  0x2aeb322dd8-2aeb323415
  0x2aeb323415-2aeb323a52
  0x2aeb323a52-2aeb32408f
  0x2aeb32408f-2aeb3246cc

$ cephfs-journal-tool header get
{ magic: ceph fs volume v011,
  write_pos: 184430420380,
  expire_pos: 184389995327,
  trimmed_pos: 184389992448,
  stream_format: 1,
  layout: { stripe_unit: 4194304,
  stripe_count: 4194304,
  object_size: 4194304,
  cas_hash: 4194304,
  object_stripe_unit: 4194304,
  pg_pool: 4194304}}

$ cephfs-journal-tool event get summary
2015-03-03 14:32:50.102863 7f47c3006780 -1 Bad entry start ptr
(0x2aee8000e6) at 0x2aee800c25
2015-03-03 14:32:50.242576 7f47c3006780 -1 Bad entry start ptr
(0x2aee800b3f) at 0x2aee80167e
2015-03-03 14:32:50.486354 7f47c3006780 -1 Bad entry start ptr
(0x2aee800e4f) at 0x2aee80198e
2015-03-03 14:32:50.577443 7f47c3006780 -1 Bad entry start ptr
(0x2aee801f65) at 0x2aee802aa4
Events by type:
no output here


On Tue, Mar 3, 2015 at 12:01 PM Scottix scot...@gmail.com wrote:

 I did a bit more testing.
 1. I tried on a newer kernel and was not able to recreate the problem,
 maybe it is that kernel bug you mentioned. Although its not an exact
 replica of the load.
 2. I haven't tried the debug yet since I have to wait for the right moment.

 One thing I realized and maybe it is not an issue is we are using a
 symlink to a folder in the ceph mount.
 ceph-fuse on /mnt/ceph type fuse.ceph-fuse
 (rw,nosuid,nodev,noatime,user_id=0,group_id=0,default_permissions,allow_other)
 lrwxrwxrwx 1 root   root   metadata -> /mnt/ceph/DataCenter/metadata
 Not sure if that would create any issues.

 Anyway we are going to update the machine soon so, I can report if we keep
 having the issue.

 Thanks for your support,
 Scott


 On Mon, Mar 2, 2015 at 4:07 PM Scottix scot...@gmail.com wrote:

 I'll try the following things and report back to you.

 1. I can get a new kernel on another machine and mount to the CephFS and
 see if I get the following errors.
 2. I'll run the debug and see if anything comes up.

 I'll report back to you when I can do these things.

 Thanks,
 Scottie

 On Mon, Mar 2, 2015 at 4:04 PM Gregory Farnum g...@gregs42.com wrote:

 I bet it's that permission issue combined with a minor bug in FUSE on
 that kernel, or maybe in the ceph-fuse code (but I've not seen it
 reported before, so I kind of doubt it). If you run ceph-fuse with
 debug client = 20 it will output (a whole lot of) logging to the
 client's log file and you could see what requests are getting
 processed by the Ceph code and how it's responding. That might let you
 narrow things down. It's certainly not any kind of timeout.
 -Greg

 On Mon, Mar 2, 2015 at 3:57 PM, Scottix scot...@gmail.com wrote:
  3 Ceph servers on Ubuntu 12.04.5 - kernel 3.13.0-29-generic
 
  We have an old server that we compiled the ceph-fuse client on
  Suse11.4 - kernel 2.6.37.6-0.11
  This is the only mount we have right now.
 
  We don't have any problems reading the files and the directory shows
 full
  775 permissions and doing a second ls fixes the problem.
 
  On Mon, Mar 2, 2015 at 3:51 PM Bill Sanders billysand...@gmail.com
 wrote:
 
  Forgive me if this is unhelpful, but could it be something to do with
  permissions of the directory and not Ceph at all?
 
  http://superuser.com/a/528467
 
  Bill
 
  On Mon, Mar 2, 2015 at 3:47 PM, Gregory Farnum g...@gregs42.com
 wrote:
 
  On Mon, Mar 2, 2015 at 3:39 PM, Scottix scot...@gmail.com wrote:
   We have a file system running CephFS and for a while we had this
 issue
   when
   doing an ls -la we get question marks in the response.
  
   -rw-r--r-- 1 wwwrun root14761 Feb  9 16:06
   data.2015-02-08_00-00-00.csv.bz2
   -? ? ?  ?   ??
   data.2015-02-09_00-00-00.csv.bz2
  
   If we do another directory listing it show up fine.
  
   -rw-r--r-- 1 wwwrun root14761 Feb  9 16:06
   data.2015-02-08_00-00-00.csv.bz2
   -rw-r--r-- 1 wwwrun root13675 Feb 10 15:21
   data.2015-02-09_00-00-00.csv.bz2
  
   It hasn't been a problem but just wanted to see if this is an
 issue,
   could
   the attributes be timing out? We do have a lot of files in the
   filesystem so
   that could be a possible bottleneck.
 
  Huh, that's not something I've seen before. Are the systems you're
  doing this on the same? What 

Re: [ceph-users] CephFS Attributes Question Marks

2015-03-03 Thread John Spray

On 03/03/2015 22:35, Scottix wrote:

I was testing a little bit more and decided to run the cephfs-journal-tool

I ran across some errors

$ cephfs-journal-tool journal inspect
2015-03-03 14:18:54.453981 7f8e29f86780 -1 Bad entry start ptr 
(0x2aebf6) at 0x2aeb32279b
2015-03-03 14:18:54.539060 7f8e29f86780 -1 Bad entry start ptr 
(0x2aeb000733) at 0x2aeb322dd8
2015-03-03 14:18:54.584539 7f8e29f86780 -1 Bad entry start ptr 
(0x2aeb000d70) at 0x2aeb323415
2015-03-03 14:18:54.669991 7f8e29f86780 -1 Bad entry start ptr 
(0x2aeb0013ad) at 0x2aeb323a52
2015-03-03 14:18:54.707724 7f8e29f86780 -1 Bad entry start ptr 
(0x2aeb0019ea) at 0x2aeb32408f

Overall journal integrity: DAMAGED


I expect this is http://tracker.ceph.com/issues/9977, which is fixed in 
master.


You are in *very* bleeding edge territory here, and I'd suggest using 
the latest development release if you want to experiment with the latest 
CephFS tooling.


Cheers,
John
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] v0.80.8 and librbd performance

2015-03-03 Thread Sage Weil
Hi,

This is just a heads up that we've identified a performance regression in 
v0.80.8 from previous firefly releases.  A v0.80.9 is working its way 
through QA and should be out in a few days.  If you haven't upgraded yet 
you may want to wait.

Thanks!
sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Fwd: RPM Build Errors

2015-03-03 Thread Jesus Chavez (jeschave)

Has anyone in this DL had the error in this thread?

  Checking for unpackaged file(s): /usr/lib/rpm/check-files 
/home/vagrant/rpmbuild/BUILDROOT/calamari-server-1.3-rc_23_g4c41db3.el7.x86_64
  Wrote: 
/home/vagrant/rpmbuild/RPMS/x86_64/calamari-server-1.3-rc_23_g4c41db3.el7.x86_64.rpm
  Executing(%clean): /bin/sh -e /var/tmp/rpm-tmp.TF02LW
--
  ID: cp-artifacts-to-share calamari/repobuild/calamari-repo-*.tar.gz
Function: cmd.run
Name: cp calamari/repobuild/calamari-repo-*.tar.gz /git
  Result: False
 Comment: Command cp calamari/repobuild/calamari-repo-*.tar.gz /git run
 Started: 18:38:22.222920
Duration: 12.591 ms
 Changes:
  --
  pid:
  855
  retcode:
  1
  stderr:
  cp: cannot stat 'calamari/repobuild/calamari-repo-*.tar.gz': 
No such file or directory
  stdout:




Begin forwarded message:

From: Jesus Chavez (jeschave) jesch...@cisco.commailto:jesch...@cisco.com
Subject: RPM Build Errors
Date: March 3, 2015 at 5:47:14 PM CST
To: ceph-calam...@ceph.commailto:ceph-calam...@ceph.com

Hi everyone, I am having exactly the same issue. Does anybody know what's going 
on with this?

Thanks!


I've seen this kind of compiler error with too little memory.

  As for salt, it's set up by vagrant because it's using the salt
provider.  What error are you seeing?
On Jan 5, 2015 4:46 AM, John Spray john.spray at 
redhat.comhttp://lists.ceph.com/listinfo.cgi/ceph-calamari-ceph.com wrote:

 Forwarding from ceph-users to ceph-calamari.


 -- Forwarded message --
 From: Tony unixfly at 
 gmail.comhttp://lists.ceph.com/listinfo.cgi/ceph-calamari-ceph.com
 Date: Wed, Dec 24, 2014 at 7:03 PM
 Subject: [ceph-users] Calamari
 To: ceph-users at 
 ceph.comhttp://lists.ceph.com/listinfo.cgi/ceph-calamari-ceph.com


  Has anyone else run into this error?  I've tried different versions of
  GCC and even CentOS and RHEL to compile calamari, but it continues to
  fail. By the way, the instructions on the ceph website are not
  correct, because the VM used with vagrant isn't complete with
  whichever versions they used to compile with, and salt doesn't exist
  on the VM.

 Here is the error message I'm having below:

 I thought this was a memory issue but I changed memory in the virtual
 and even cores from 1 to 4, 8, 16 without luck.

 The below error still looks like a memory issue but I gave it 12G and
 still failed.

  gcc -pthread -fno-strict-aliasing -O2 -g -pipe -Wall
 -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
 --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC
 -fwrapv -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64
 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC
 -I/usr/include/python2.6 -c

 /home/vagrant/rpmbuild/BUILD/calamari-server-1.2.1/venv/build/cython/Cython/Compiler/Parsing.c
 -o
 build/temp.linux-x86_64-2.6/home/vagrant/rpmbuild/BUILD/calamari-server-1.2.1/venv/build/cython/Cython/Compiler/Parsing.o

   {standard input}: Assembler messages:

   {standard input}:19186: Warning: end of file not at
 end of a line; newline inserted

   {standard input}:19533: Error: unknown pseudo-op: `.lc2'

   gcc: Internal error: Killed (program cc1)

   Please submit a full bug report.

   See http://bugzilla.redhat.com/bugzilla for
 instructions.

   error: command 'gcc' failed with exit status 1

   
 Can't roll back Cython; was not uninstalled
   Cleaning up...
   Command
 /home/vagrant/rpmbuild/BUILD/calamari-server-1.2.1/venv/bin/python -c
 import
 setuptools;__file__='/home/vagrant/rpmbuild/BUILD/calamari-server-1.2.1/venv/build/cython/setup.py';exec(compile(open(__file__).read().replace('\r\n',
 '\n'), __file__, 'exec')) install --record
 /tmp/pip-6R8jlS-record/install-record.txt
 --single-version-externally-managed --install-headers

 /home/vagrant/rpmbuild/BUILD/calamari-server-1.2.1/venv/include/site/python2.6
 failed with error code 1 in
 /home/vagrant/rpmbuild/BUILD/calamari-server-1.2.1/venv/build/cython
   Storing complete log in /home/vagrant/.pip/pip.log


   RPM build errors:
 --
   ID: cp-artifacts-to-share
 calamari/repobuild/calamari-repo-rhel6.tar.gz
 Function: cmd.run
 Name: cp calamari/repobuild/calamari-repo-rhel6.tar.gz /git/
   Result: True
  Comment: Command cp
 calamari/repobuild/calamari-repo-rhel6.tar.gz /git/ run
  Started: 18:54:40.702746
 Duration: 123.319 ms
  Changes:
   --
   pid:
   10380
   retcode:
   0

Re: [ceph-users] CephFS Attributes Question Marks

2015-03-03 Thread John Spray

On 03/03/2015 22:57, John Spray wrote:

On 03/03/2015 22:35, Scottix wrote:
I was testing a little bit more and decided to run the 
cephfs-journal-tool


I ran across some errors

$ cephfs-journal-tool journal inspect
2015-03-03 14:18:54.453981 7f8e29f86780 -1 Bad entry start ptr 
(0x2aebf6) at 0x2aeb32279b
2015-03-03 14:18:54.539060 7f8e29f86780 -1 Bad entry start ptr 
(0x2aeb000733) at 0x2aeb322dd8
2015-03-03 14:18:54.584539 7f8e29f86780 -1 Bad entry start ptr 
(0x2aeb000d70) at 0x2aeb323415
2015-03-03 14:18:54.669991 7f8e29f86780 -1 Bad entry start ptr 
(0x2aeb0013ad) at 0x2aeb323a52
2015-03-03 14:18:54.707724 7f8e29f86780 -1 Bad entry start ptr 
(0x2aeb0019ea) at 0x2aeb32408f

Overall journal integrity: DAMAGED


I expect this is http://tracker.ceph.com/issues/9977, which is fixed 
in master.


You are in *very* bleeding edge territory here, and I'd suggest using 
the latest development release if you want to experiment with the 
latest CephFS tooling.
...although at the risk of contradicting myself, I now notice that this 
particular bugfix is one that we did backport for 0.87.1


John

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unexpected OSD down during deep-scrub

2015-03-03 Thread Yann Dupont


On 03/03/2015 22:03, Italo Santos wrote:


I realised that when the first OSD went down, the cluster was 
performing a deep-scrub, and I found the trace below in the logs of 
osd.8. Can anyone help me understand why osd.8, and the other OSDs, 
unexpectedly went down?




I'm afraid I've seen this this afternoon too on my test cluster, just 
after upgrading from 0.87 to 0.93. After an initially successful migration, 
some OSDs started to go down. All presented similar stack traces, with the 
magic word scrub in them:


ceph version 0.93 (bebf8e9a830d998eeaab55f86bb256d4360dd3c4)
 1: /usr/bin/ceph-osd() [0xbeb3dc]
 2: (()+0xf0a0) [0x7f8f3ca130a0]
 3: (gsignal()+0x35) [0x7f8f3b37d165]
 4: (abort()+0x180) [0x7f8f3b3803e0]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f8f3bbd389d]
 6: (()+0x63996) [0x7f8f3bbd1996]
 7: (()+0x639c3) [0x7f8f3bbd19c3]
 8: (()+0x63bee) [0x7f8f3bbd1bee]
 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x220) [0xcd74f0]
 10: (ReplicatedPG::issue_repop(ReplicatedPG::RepGather*, 
utime_t)+0x1fc) [0x97259c]
 11: (ReplicatedPG::simple_repop_submit(ReplicatedPG::RepGather*)+0x7a) 
[0x97344a]
 12: (ReplicatedPG::_scrub(ScrubMap, std::maphobject_t, 
std::pairunsigned int, unsigned int, std::lesshobject_t, 
std::allocatorstd::pairhobject_t const, std::pa

irunsigned int, unsigned intconst)+0x2e4d) [0x9a5ded]
 13: (PG::scrub_compare_maps()+0x658) [0x916378]
 14: (PG::chunky_scrub(ThreadPool::TPHandle)+0x202) [0x917ee2]
 15: (PG::scrub(ThreadPool::TPHandle)+0x3a3) [0x919f83]
 16: (OSD::ScrubWQ::_process(PG*, ThreadPool::TPHandle)+0x13) [0x7eff93]
 17: (ThreadPool::worker(ThreadPool::WorkThread*)+0x629) [0xcc8c49]
 18: (ThreadPool::WorkThread::entry()+0x10) [0xccac40]
 19: (()+0x6b50) [0x7f8f3ca0ab50]
 20: (clone()+0x6d) [0x7f8f3b42695d]

As a temporary measure, noscrub and nodeep-scrub are now set for this 
cluster, and all is working fine right now.
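
(For reference, that is simply:

  ceph osd set noscrub
  ceph osd set nodeep-scrub

with the matching ceph osd unset noscrub / ceph osd unset nodeep-scrub once a 
fix is in place.)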


So there is probably something wrong here. Need to investigate further.

Cheers,








___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.80.8 and librbd performance

2015-03-03 Thread Ken Dreyer
On 03/03/2015 04:19 PM, Sage Weil wrote:
 Hi,
 
 This is just a heads up that we've identified a performance regression in 
 v0.80.8 from previous firefly releases.  A v0.80.9 is working it's way 
 through QA and should be out in a few days.  If you haven't upgraded yet 
 you may want to wait.
 
 Thanks!
 sage

Hi Sage,

I've seen a couple Redmine tickets on this (eg
http://tracker.ceph.com/issues/9854 ,
http://tracker.ceph.com/issues/10956). It's not totally clear to me
which of the 70+ unreleased commits on the firefly branch fix this
librbd issue.  Is it only the three commits in
https://github.com/ceph/ceph/pull/3410 , or are there more?

- Ken
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Unexpected OSD down during deep-scrub

2015-03-03 Thread Loic Dachary
Hi Yann,

That seems related to http://tracker.ceph.com/issues/10536 which seems to be 
resolved. Could you create a new issue with a link to 10536 ? More logs and 
ceph report would also be useful to figure out why it resurfaced.

Thanks !


On 04/03/2015 00:04, Yann Dupont wrote:
 
 On 03/03/2015 22:03, Italo Santos wrote:

 I realised that when the first OSD went down, the cluster was performing a 
 deep-scrub, and I found the trace below in the logs of osd.8. Can anyone 
 help me understand why osd.8, and the other OSDs, unexpectedly went down?

 
 I'm afraid I've seen this this afternoon too on my test cluster, just after 
 upgrading from 0.87 to 0.93. After an initial migration success, some OSD 
 started to go down : All presented similar stack traces , with magic word 
 scrub in it :
 
 ceph version 0.93 (bebf8e9a830d998eeaab55f86bb256d4360dd3c4)
  1: /usr/bin/ceph-osd() [0xbeb3dc]
  2: (()+0xf0a0) [0x7f8f3ca130a0]
  3: (gsignal()+0x35) [0x7f8f3b37d165]
  4: (abort()+0x180) [0x7f8f3b3803e0]
  5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f8f3bbd389d]
  6: (()+0x63996) [0x7f8f3bbd1996]
  7: (()+0x639c3) [0x7f8f3bbd19c3]
  8: (()+0x63bee) [0x7f8f3bbd1bee]
  9: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
 const*)+0x220) [0xcd74f0]
  10: (ReplicatedPG::issue_repop(ReplicatedPG::RepGather*, utime_t)+0x1fc) 
 [0x97259c]
  11: (ReplicatedPG::simple_repop_submit(ReplicatedPG::RepGather*)+0x7a) 
 [0x97344a]
  12: (ReplicatedPG::_scrub(ScrubMap, std::maphobject_t, std::pairunsigned 
 int, unsigned int, std::lesshobject_t, std::allocatorstd::pairhobject_t 
 const, std::pa
 irunsigned int, unsigned intconst)+0x2e4d) [0x9a5ded]
  13: (PG::scrub_compare_maps()+0x658) [0x916378]
  14: (PG::chunky_scrub(ThreadPool::TPHandle)+0x202) [0x917ee2]
  15: (PG::scrub(ThreadPool::TPHandle)+0x3a3) [0x919f83]
  16: (OSD::ScrubWQ::_process(PG*, ThreadPool::TPHandle)+0x13) [0x7eff93]
  17: (ThreadPool::worker(ThreadPool::WorkThread*)+0x629) [0xcc8c49]
  18: (ThreadPool::WorkThread::entry()+0x10) [0xccac40]
  19: (()+0x6b50) [0x7f8f3ca0ab50]
  20: (clone()+0x6d) [0x7f8f3b42695d]
 
 As a temporary measure, noscrub and nodeep-scrub are now set for this 
 cluster, and all is working fine right now.
 
 So there is probably something wrong here. Need to investigate further.
 
 Cheers,
 
 
 
 
 
 
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Clustering a few NAS into a Ceph cluster

2015-03-03 Thread Loic Dachary
Hi Ceph,

Last weekend I discussed with a friend a use case many of us have thought 
about already: it would be cool to have a simple way to assemble Ceph-aware NAS 
boxes fresh from the store. I summarized the use case and the interface we 
discussed here:

  https://wiki.ceph.com/Clustering_a_few_NAS_into_a_Ceph_cluster

It is far from polished but I hope it will trigger some discussion. The best 
comment would be: wait, that already exists at URL ;-) But if that's not 
the case, maybe we can improve it.

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.80.8 and librbd performance

2015-03-03 Thread Sage Weil
On Wed, 4 Mar 2015, Olivier Bonvalet wrote:
 Is the kernel client affected by the problem?

Nope.  The kernel client is unaffected.. the issue is in librbd.

sage


 
 On Tuesday, March 3, 2015 at 15:19 -0800, Sage Weil wrote:
  Hi,
  
  This is just a heads up that we've identified a performance regression in 
  v0.80.8 from previous firefly releases.  A v0.80.9 is working it's way 
  through QA and should be out in a few days.  If you haven't upgraded yet 
  you may want to wait.
  
  Thanks!
  sage
  --
  To unsubscribe from this list: send the line unsubscribe ceph-devel in
  the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
  
 
 
 ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] EC configuration questions...

2015-03-03 Thread Don Doerner
Loic,

Thank you, I got it created.  One of these days, I am going to have to try to 
understand some of the crush map details...  Anyway, on to the next step!

-don-

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-03 Thread Robert LeBlanc
I would be inclined to shut down both OSDs in a node, let the cluster
recover. Once it is recovered, shut down the next two, let it recover.
Repeat until all the OSDs are taken out of the cluster. Then I would
set nobackfill and norecover. Then remove the hosts/disks from the
CRUSH map, then unset nobackfill and norecover.

That should give you a few small changes (when you shut down OSDs) and
then one big one to get everything in the final place. If you are
still adding new nodes, when nobackfill and norecover are set, you can
add them in so that the one big relocate fills the new drives too.
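
Roughly, the sequence I have in mind looks like this (the osd ids and host name 
are only placeholders for your old nodes, so adapt it to your setup):

  # drain one old node at a time:
  service ceph stop osd.40
  service ceph stop osd.41
  # wait for recovery to finish (HEALTH_OK), then repeat for the next node.
  # once all the old OSDs are down and the cluster is clean:
  ceph osd set nobackfill
  ceph osd set norecover
  ceph osd crush remove osd.40
  ceph osd crush remove osd.41
  ceph osd crush remove oldnode1      # and likewise for the other old hosts/OSDs
  ceph osd unset nobackfill
  ceph osd unset norecover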

On Tue, Mar 3, 2015 at 5:58 AM, Andrija Panic andrija.pa...@gmail.com wrote:
 Thx Irek. Number of replicas is 3.

 I have 3 servers with 2 OSDs on them on 1g switch (1 OSD already
 decommissioned), which is further connected to a new 10G switch/network with
 3 servers on it with 12 OSDs each.
 I'm decommissioning old 3 nodes on 1G network...

 So you suggest removing whole node with 2 OSDs manually from crush map?
 Per my knowledge, ceph never places 2 replicas on 1 node, all 3 replicas
 were originally been distributed over all 3 nodes. So anyway It could be
 safe to remove 2 OSDs at once together with the node itself...since replica
 count is 3...
 ?

 Thx again for your time

 On Mar 3, 2015 1:35 PM, Irek Fasikhov malm...@gmail.com wrote:

 Once you have only three nodes in the cluster.
 I recommend you add new nodes to the cluster, and then delete the old.

 2015-03-03 15:28 GMT+03:00 Irek Fasikhov malm...@gmail.com:

 You have a number of replication?

 2015-03-03 15:14 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 Hi Irek,

 yes, stoping OSD (or seting it to OUT) resulted in only 3% of data
 degraded and moved/recovered.
 When I after that removed it from Crush map ceph osd crush rm id,
 that's when the stuff with 37% happened.

 And thanks Irek for help - could you kindly just let me know of the
 prefered steps when removing whole node?
 Do you mean I first stop all OSDs again, or just remove each OSD from
 crush map, or perhaps, just decompile cursh map, delete the node 
 completely,
 compile back in, and let it heal/recover ?

 Do you think this would result in less data missplaces and moved arround
 ?

 Sorry for bugging you, I really appreaciate your help.

 Thanks

 On 3 March 2015 at 12:58, Irek Fasikhov malm...@gmail.com wrote:

 A large percentage of the rebuild of the cluster map (But low
 percentage degradation). If you had not made ceph osd crush rm id, the
 percentage would be low.
 In your case, the correct option is to remove the entire node, rather
 than each disk individually

 2015-03-03 14:27 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 Another question - I mentioned here 37% of objects being moved arround
 - this is MISPLACED object (degraded objects were 0.001%, after I 
 removed 1
 OSD from cursh map (out of 44 OSD or so).

 Can anybody confirm this is normal behaviour - and are there any
 workarrounds ?

 I understand this is because of the object placement algorithm of
 CEPH, but still 37% of object missplaces just by removing 1 OSD from 
 crush
 maps out of 44 make me wonder why this large percentage ?

 Seems not good to me, and I have to remove another 7 OSDs (we are
 demoting some old hardware nodes). This means I can potentialy go with 7 
 x
 the same number of missplaced objects...?

 Any thoughts ?

 Thanks

 On 3 March 2015 at 12:14, Andrija Panic andrija.pa...@gmail.com
 wrote:

 Thanks Irek.

 Does this mean, that after peering for each PG, there will be delay
 of 10sec, meaning that every once in a while, I will have 10sec od the
 cluster NOT being stressed/overloaded, and then the recovery takes 
 place for
 that PG, and then another 10sec cluster is fine, and then stressed 
 again ?

 I'm trying to understand process before actually doing stuff (config
 reference is there on ceph.com but I don't fully understand the process)

 Thanks,
 Andrija

 On 3 March 2015 at 11:32, Irek Fasikhov malm...@gmail.com wrote:

 Hi.

 Use value osd_recovery_delay_start
 example:
 [root@ceph08 ceph]# ceph --admin-daemon
 /var/run/ceph/ceph-osd.94.asok config show  | grep 
 osd_recovery_delay_start
   osd_recovery_delay_start: 10

 2015-03-03 13:13 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 HI Guys,

 I yesterday removed 1 OSD from cluster (out of 42 OSDs), and it
 caused over 37% od the data to rebalance - let's say this is fine 
 (this is
 when I removed it frm Crush Map).

 I'm wondering - I have previously set some throtling mechanism, but
 during first 1h of rebalancing, my rate of recovery was going up to 
 1500
 MB/s - and VMs were unusable completely, and then last 4h of the 
 duration of
 recover this recovery rate went down to, say, 100-200 MB.s and during 
 this
 VM performance was still pretty impacted, but at least I could work 
 more or
 a less

 So my question, is this behaviour expected, is throtling here
 working as expected, since first 1h was almoust no 

[ceph-users] problem in cephfs for remove empty directory

2015-03-03 Thread Daniel Takatori Ohara
Hi,

I have a problem when I try to remove an empty directory in cephfs. The
directory is empty, but it seems to have broken entries in the MDS.

*$ls test-daniel-old/*
total 0
drwx-- 1 rmagalhaes BioInfoHSL Users0 Mar  2 10:52 ./
drwx-- 1 rmagalhaes BioInfoHSL Users 773099838313 Mar  2 11:41 ../

*$rm -rf test-daniel-old/*
rm: cannot remove ‘test-daniel-old/’: Directory not empty

*$ls test-daniel-old/*
ls: cannot access
test-daniel-old/M_S8_L001_R1-2_001.fastq.gz_ref.sam_fixed.bam: No such file
or directory
ls: cannot access
test-daniel-old/M_S8_L001_R1-2_001.fastq.gz_sylvio.sam_fixed.bam: No such
file or directory
ls: cannot access
test-daniel-old/M_S8_L002_R1-2_001.fastq.gz_ref.sam_fixed.bam: No such file
or directory
ls: cannot access
test-daniel-old/M_S8_L002_R1-2_001.fastq.gz_sylvio.sam_fixed.bam: No such
file or directory
ls: cannot access
test-daniel-old/M_S8_L003_R1-2_001.fastq.gz_ref.sam_fixed.bam: No such file
or directory
ls: cannot access
test-daniel-old/M_S8_L003_R1-2_001.fastq.gz_sylvio.sam_fixed.bam: No such
file or directory
ls: cannot access
test-daniel-old/M_S8_L004_R1-2_001.fastq.gz_ref.sam_fixed.bam: No such file
or directory
ls: cannot access
test-daniel-old/M_S8_L004_R1-2_001.fastq.gz_sylvio.sam_fixed.bam: No such
file or directory
total 0
drwx-- 1 rmagalhaes BioInfoHSL Users0 Mar  2 10:52 ./
drwx-- 1 rmagalhaes BioInfoHSL Users 773099838313 Mar  2 11:41 ../
l? ? ?  ?   ??
M_S8_L001_R1-2_001.fastq.gz_ref.sam_fixed.bam
l? ? ?  ?   ??
M_S8_L001_R1-2_001.fastq.gz_sylvio.sam_fixed.bam
l? ? ?  ?   ??
M_S8_L002_R1-2_001.fastq.gz_ref.sam_fixed.bam
l? ? ?  ?   ??
M_S8_L002_R1-2_001.fastq.gz_sylvio.sam_fixed.bam
l? ? ?  ?   ??
M_S8_L003_R1-2_001.fastq.gz_ref.sam_fixed.bam
l? ? ?  ?   ??
M_S8_L003_R1-2_001.fastq.gz_sylvio.sam_fixed.bam
l? ? ?  ?   ??
M_S8_L004_R1-2_001.fastq.gz_ref.sam_fixed.bam
l? ? ?  ?   ??
M_S8_L004_R1-2_001.fastq.gz_sylvio.sam_fixed.bam


Att.

---
Daniel Takatori Ohara.
System Administrator - Lab. of Bioinformatics
Molecular Oncology Center
Instituto Sírio-Libanês de Ensino e Pesquisa
Hospital Sírio-Libanês
Phone: +55 11 3155-0200 (extension 1927)
R: Cel. Nicolau dos Santos, 69
São Paulo-SP. 01308-060
http://www.bioinfo.mochsl.org.br
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Some long running ops may lock osd

2015-03-03 Thread Erdem Agaoglu
Looking further, I guess what I was trying to describe was a simplified version of
the sharded threadpools released in giant. Is it possible for that to be
backported to firefly?

On Tue, Mar 3, 2015 at 9:33 AM, Erdem Agaoglu erdem.agao...@gmail.com
wrote:

 Thank you folks for bringing that up. I had some questions about sharding.
 We'd like blind buckets too, at least it's on the roadmap. For the current
 sharded implementation, what are the final details? Is number of shards
 defined per bucket or globally? Is there a way to split current indexes
 into shards?

 On the other hand what i'd like to point here is not necessarily
 large-bucket-index specific. The problem is the mechanism around thread
 pools. Any request may require locks on a pg and this should not block the
 requests for other pgs. I'm no expert but the threads may be able to
 requeue the requests to a locked pg, processing others for other pgs. Or
 maybe a thread per pg design was possible. Because, you know, it is
 somewhat OK not being able to do anything for a locked resource. Then you
 can go and improve your processing or your locks. But it's a whole
 different problem when a locked pg blocks requests for a few hundred other
 pgs in other pools for no good reason.

 On Tue, Mar 3, 2015 at 5:43 AM, Ben Hines bhi...@gmail.com wrote:

 Blind-bucket would be perfect for us, as we don't need to list the
 objects.

 We only need to list the bucket when doing a bucket deletion. If we
 could clean out/delete all objects in a bucket (without
 iterating/listing them) that would be ideal..

 On Mon, Mar 2, 2015 at 7:34 PM, GuangYang yguan...@outlook.com wrote:
  We have had good experience so far keeping each bucket less than 0.5
 million objects, by client side sharding. But I think it would be nice you
 can test at your scale, with your hardware configuration, as well as your
 expectation over the tail latency.
 
  Generally the bucket sharding should help, both for Write throughput
 and *stall with recovering/scrubbing*, but it comes with a prices -  The X
 shards you have for each bucket, the listing/trimming would be X times
 weighted, from OSD's load's point of view. There was discussion to
 implement: 1) blind bucket (for use cases bucket listing is not needed). 2)
 Un-ordered listing, which could improve the problem I mentioned above. They
 are on the roadmap...
 
  Thanks,
  Guang
 
 
  
  From: bhi...@gmail.com
  Date: Mon, 2 Mar 2015 18:13:25 -0800
  To: erdem.agao...@gmail.com
  CC: ceph-users@lists.ceph.com
  Subject: Re: [ceph-users] Some long running ops may lock osd
 
  We're seeing a lot of this as well. (as i mentioned to sage at
  SCALE..) Is there a rule of thumb at all for how big is safe to let a
  RGW bucket get?
 
  Also, is this theoretically resolved by the new bucket-sharding
  feature in the latest dev release?
 
  -Ben
 
  On Mon, Mar 2, 2015 at 11:08 AM, Erdem Agaoglu 
 erdem.agao...@gmail.com wrote:
  Hi Gregory,
 
  We are not using listomapkeys that way or in any way to be precise. I
 used
  it here just to reproduce the behavior/issue.
 
  What i am really interested in is if scrubbing-deep actually
 mitigates the
  problem and/or is there something that can be further improved.
 
  Or i guess we should go upgrade now and hope for the best :)
 
  On Mon, Mar 2, 2015 at 8:10 PM, Gregory Farnum g...@gregs42.com
 wrote:
 
  On Mon, Mar 2, 2015 at 7:56 AM, Erdem Agaoglu 
 erdem.agao...@gmail.com
  wrote:
  Hi all, especially devs,
 
  We have recently pinpointed one of the causes of slow requests in
 our
  cluster. It seems deep-scrubs on pg's that contain the index file
 for a
  large radosgw bucket lock the osds. Incresing op threads and/or disk
  threads
  helps a little bit, but we need to increase them beyond reason in
 order
  to
  completely get rid of the problem. A somewhat similar (and more
 severe)
  version of the issue occurs when we call listomapkeys for the index
  file,
  and since the logs for deep-scrubbing was much harder read, this
  inspection
  was based on listomapkeys.
 
  In this example osd.121 is the primary of pg 10.c91 which contains
 file
  .dir.5926.3 in .rgw.buckets pool. OSD has 2 op threads. Bucket
 contains
  ~500k objects. Standard listomapkeys call take about 3 seconds.
 
  time rados -p .rgw.buckets listomapkeys .dir.5926.3 /dev/null
  real 0m2.983s
  user 0m0.760s
  sys 0m0.148s
 
  In order to lock the osd we request 2 of them simultaneously with
  something
  like:
 
  rados -p .rgw.buckets listomapkeys .dir.5926.3 /dev/null 
  sleep 1
  rados -p .rgw.buckets listomapkeys .dir.5926.3 /dev/null 
 
  'debug_osd=30' logs show the flow like:
 
  At t0 some thread enqueue_op's my omap-get-keys request.
  Op-Thread A locks pg 10.c91 and dequeue_op's it and starts reading
 ~500k
  keys.
  Op-Thread B responds to several other requests during that 1 second
  sleep.
  They're generally extremely fast subops on other pgs.
  At t1 (about a second later) my 

Re: [ceph-users] backfill_toofull, but OSDs not full

2015-03-03 Thread wsnote
ceph 0.80.1
The same question.
I have deleted 1/4 of the data, but the problem didn't disappear.
Does anyone have another way to solve it?


At 2015-01-10 05:31:30,Udo Lembke ulem...@polarzone.de wrote:
Hi,
I had a similar effect two weeks ago - 1 PG backfill_toofull, and due to
reweighting and deleting there was enough free space, but the rebuild
process stopped after a while.

After stopping and starting ceph on the second node, the rebuild process ran
without trouble and the backfill_toofull PGs were gone.
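
(Concretely, that was something like service ceph restart on that node, or 
service ceph restart osd.N per OSD; the exact command depends on your init 
system.)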

This happens with firefly.

Udo

On 09.01.2015 21:29, c3 wrote:
 In this case the root cause was half denied reservations.

 http://tracker.ceph.com/issues/9626

 This stopped backfills since, those listed as backfilling were
 actually half denied and doing nothing. The toofull status is not
 checked until a free backfill slot happens, so everything was just stuck.

 Interestingly, the toofull was created by other backfills which were
 not stoppped.
 http://tracker.ceph.com/issues/9594

 Quite the log jam to clear.


 Quoting Craig Lewis cle...@centraldesktop.com:

 What was the osd_backfill_full_ratio?  That's the config that controls
 backfill_toofull.  By default, it's 85%.  The mon_osd_*_ratio affect the
 ceph status.

 I've noticed that it takes a while for backfilling to restart after
 changing osd_backfill_full_ratio.  Backfilling usually restarts for
 me in
 10-15 minutes.  Some PGs will stay in that state until the cluster is
 nearly done recoverying.

 I've only seen backfill_toofull happen after the OSD exceeds the
 ratio (so
 it's reactive, no proactive).  Mine usually happen when I'm
 rebalancing a
 nearfull cluster, and an OSD backfills itself toofull.




 On Mon, Jan 5, 2015 at 11:32 AM, c3 ceph-us...@lopkop.com wrote:

 Hi,

 I am wondering how a PG gets marked backfill_toofull.

 I reweighted several OSDs using ceph osd crush reweight. As
 expected, PG
 began moving around (backfilling).

 Some PGs got marked +backfilling (~10), some +wait_backfill (~100).

 But some are marked +backfill_toofull. My OSDs are between 25% and 72%
 full.

 Looking at ceph pg dump, I can find the backfill_toofull PGs and
 verified
 the OSDs involved are less than 72% full.

 Do backfill reservations include a size? Are these OSDs projected to be
 toofull, once the current backfilling complete? Some of the
 backfill_toofull and backfilling point to the same OSDs.

 I did adjust the full ratios, but that did not change the
 backfill_toofull
 status.
 ceph tell mon.\* injectargs '--mon_osd_full_ratio 0.95'
 ceph tell osd.\* injectargs '--osd_backfill_full_ratio 0.92'


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-03 Thread Irek Fasikhov
Hi.

Use the value osd_recovery_delay_start, for example:
[root@ceph08 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok
config show  | grep osd_recovery_delay_start
  osd_recovery_delay_start: 10
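
If you don't want to restart the OSDs, the same value can be injected at 
runtime, for example:

  ceph tell osd.* injectargs '--osd_recovery_delay_start 10'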

2015-03-03 13:13 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 HI Guys,

 I yesterday removed 1 OSD from cluster (out of 42 OSDs), and it caused
 over 37% od the data to rebalance - let's say this is fine (this is when I
 removed it frm Crush Map).

 I'm wondering - I have previously set some throtling mechanism, but during
 first 1h of rebalancing, my rate of recovery was going up to 1500 MB/s -
 and VMs were unusable completely, and then last 4h of the duration of
 recover this recovery rate went down to, say, 100-200 MB.s and during this
 VM performance was still pretty impacted, but at least I could work more or
 a less

 So my question, is this behaviour expected, is throtling here working as
 expected, since first 1h was almoust no throtling applied if I check the
 recovery rate 1500MB/s and the impact on Vms.
 And last 4h seemed pretty fine (although still lot of impact in general)

 I changed these throtling on the fly with:

 ceph tell osd.* injectargs '--osd_recovery_max_active 1'
 ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
 ceph tell osd.* injectargs '--osd_max_backfills 1'

 My Jorunals are on SSDs (12 OSD per server, of which 6 journals on one
 SSD, 6 journals on another SSD)  - I have 3 of these hosts.

 Any thought are welcome.
 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 
Best regards, Irek Fasikhov
Mobile: +79229045757
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Problems with shadow objects

2015-03-03 Thread Butkeev Stas
Hello, all

I have a ceph+RGW installation and have some problems with shadow objects.

For example:
#rados ls -p .rgw.buckets|grep default.4507.1

.
default.4507.1__shadow_test_s3.2/2vO4WskQNBGMnC8MGaYPSLfGkhQY76U.1_5
default.4507.1__shadow_test_s3.2/2vO4WskQNBGMnC8MGaYPSLfGkhQY76U.2_2
default.4507.1__shadow_test_s3.2/2vO4WskQNBGMnC8MGaYPSLfGkhQY76U.6_4
default.4507.1__shadow_test_s3.2/2vO4WskQNBGMnC8MGaYPSLfGkhQY76U.4_2
default.4507.1__shadow_test_s3.2/2vO4WskQNBGMnC8MGaYPSLfGkhQY76U.3_5
.

Please advise me and answer my questions:
1) How can I remove these shadow files?
2) What does the name of these shadow files mean?
For example, with a normal object:
# radosgw-admin object stat --bucket=dev --object=RegExp_tutorial.png
I receive information about this object.

With a shadow object:
default.4507.1_ is the bucket-id
radosgw-admin object stat --bucket=dev 
--object=_shadow_test_s3.2/2vO4WskQNBGMnC8MGaYPSLfGkhQY76U.2_7
ERROR: failed to stat object, returned error: (2) No such file or directory
How can I determine the name of this object?
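
(The raw object itself can at least be stat'ed directly through rados, for 
example:

  rados -p .rgw.buckets stat 'default.4507.1__shadow_test_s3.2/2vO4WskQNBGMnC8MGaYPSLfGkhQY76U.2_2'

but that still doesn't tell me which S3 object the shadow piece belongs to.)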

-- 
Best Regards,
Stanislav Butkeev
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Understand RadosGW logs

2015-03-03 Thread Daniel Schneller

Hi!

After realizing the problem with log rotation (see
http://thread.gmane.org/gmane.comp.file-systems.ceph.user/17708)
and fixing it, I now for the first time have some
meaningful (and recent) logs to look at.

While from an application perspective there seem
to be no issues, I would like to understand some
messages I find with relatively high frequency in
the logs:

Exhibit 1
-
2015-03-03 11:14:53.685361 7fcf4bfef700  0 ERROR: flush_read_list(): 
d->client_c->handle_data() returned -1
2015-03-03 11:15:57.476059 7fcf39ff3700  0 ERROR: flush_read_list(): 
d->client_c->handle_data() returned -1
2015-03-03 11:17:43.570986 7fcf25fcb700  0 ERROR: flush_read_list(): 
d->client_c->handle_data() returned -1
2015-03-03 11:22:00.881640 7fcf39ff3700  0 ERROR: flush_read_list(): 
d->client_c->handle_data() returned -1
2015-03-03 11:22:48.147011 7fcf35feb700  0 ERROR: flush_read_list(): 
d->client_c->handle_data() returned -1
2015-03-03 11:27:40.572723 7fcf50ff9700  0 ERROR: flush_read_list(): 
d->client_c->handle_data() returned -1
2015-03-03 11:29:40.082954 7fcf36fed700  0 ERROR: flush_read_list(): 
d->client_c->handle_data() returned -1
2015-03-03 11:30:32.204492 7fcf4dff3700  0 ERROR: flush_read_list(): 
d->client_c->handle_data() returned -1


I cannot find anything relevant by Googling for
that, apart from the actual line of code that
produces this line.
What does that mean? Is it an indication of data
corruption or are there more benign reasons for
this line?
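
One way to dig deeper, as a sketch: temporarily raise the RGW debug level in ceph.conf and restart the gateway (the section name below is an assumption and must match the radosgw instance name actually used in your ceph.conf):

[client.radosgw.gateway]
  # verbose RGW and messenger logging, only while investigating
  debug rgw = 20
  debug ms = 1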


Exhibit 2
--
I see several blocks like these:

2015-03-03 07:06:17.805772 7fcf36fed700  1 == starting new request 
req=0x7fcf5800f3b0 =
2015-03-03 07:06:17.836671 7fcf36fed700  0 
RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592 
part_ofs=0 rule->part_size=0
2015-03-03 07:06:17.836758 7fcf36fed700  0 
RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896 
part_ofs=0 rule->part_size=0
2015-03-03 07:06:17.836918 7fcf36fed700  0 
RGWObjManifest::operator++(): result: ofs=13055243 stripe_ofs=13055243 
part_ofs=0 rule->part_size=0
2015-03-03 07:06:18.263126 7fcf36fed700  1 == req done 
req=0x7fcf5800f3b0 http_status=200 ==

...
2015-03-03 09:27:29.855001 7fcf28fd1700  1 == starting new request 
req=0x7fcf580102a0 =
2015-03-03 09:27:29.866718 7fcf28fd1700  0 
RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592 
part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.866778 7fcf28fd1700  0 
RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896 
part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.866852 7fcf28fd1700  0 
RGWObjManifest::operator++(): result: ofs=13107200 stripe_ofs=13107200 
part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.866917 7fcf28fd1700  0 
RGWObjManifest::operator++(): result: ofs=17301504 stripe_ofs=17301504 
part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.875466 7fcf28fd1700  0 
RGWObjManifest::operator++(): result: ofs=21495808 stripe_ofs=21495808 
part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.884434 7fcf28fd1700  0 
RGWObjManifest::operator++(): result: ofs=25690112 stripe_ofs=25690112 
part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.906155 7fcf28fd1700  0 
RGWObjManifest::operator++(): result: ofs=29884416 stripe_ofs=29884416 
part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.914364 7fcf28fd1700  0 
RGWObjManifest::operator++(): result: ofs=34078720 stripe_ofs=34078720 
part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.940653 7fcf28fd1700  0 
RGWObjManifest::operator++(): result: ofs=38273024 stripe_ofs=38273024 
part_ofs=0 rule->part_size=0
2015-03-03 09:27:30.272816 7fcf28fd1700  0 
RGWObjManifest::operator++(): result: ofs=42467328 stripe_ofs=42467328 
part_ofs=0 rule->part_size=0
2015-03-03 09:27:31.125773 7fcf28fd1700  0 
RGWObjManifest::operator++(): result: ofs=46661632 stripe_ofs=46661632 
part_ofs=0 rule->part_size=0
2015-03-03 09:27:31.192661 7fcf28fd1700  0 ERROR: flush_read_list(): 
d->client_c->handle_data() returned -1
2015-03-03 09:27:31.194481 7fcf28fd1700  1 == req done 
req=0x7fcf580102a0 http_status=200 ==

...
2015-03-03 09:28:43.008517 7fcf2a7d4700  1 == starting new request 
req=0x7fcf580102a0 =
2015-03-03 09:28:43.016414 7fcf2a7d4700  0 
RGWObjManifest::operator++(): result: ofs=887579 stripe_ofs=887579 
part_ofs=0 rule->part_size=0
2015-03-03 09:28:43.022387 7fcf2a7d4700  1 == req done 
req=0x7fcf580102a0 http_status=200 ==


First, what is the req= value? Is that a thread ID?
I am asking because the same ID is used over and over
in the same file over time.

More importantly, what do the RGWObjManifest::operator++():...
lines mean? In the middle case above the block even ends
with one of the ERROR lines mentioned before, yet the HTTP
status is still 200, suggesting a successful operation.

Thanks in advance for shedding some light; I would like to
know whether I need to take some action, or at least keep an
eye on these via monitoring.

Cheers,
Daniel


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Rebalance/Backfill Throttling - anything missing here?

2015-03-03 Thread Irek Fasikhov
The large percentage comes from the rebuild of the cluster map (but the
percentage of degradation is low). If you had not run ceph osd crush rm <id>,
the percentage would be low.
In your case, the correct option is to remove the entire node, rather than
each disk individually.
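
For illustration only, a minimal sketch of removing a whole node at once, assuming a hypothetical host bucket named node4 that carries osd.36 - osd.43 (the IDs, host name and init command are assumptions and must be adapted):

# for each OSD on the node (shown here for osd.36):
ceph osd out 36
service ceph stop osd.36        # run on the node itself (sysvinit on CentOS 6)
ceph osd crush remove osd.36
ceph auth del osd.36
ceph osd rm 36
# once all its OSDs are gone, remove the empty host bucket from the CRUSH map:
ceph osd crush remove node4

Whether the OSDs are drained first or the whole host bucket is removed in one step changes how often data is moved, so it is worth trying on one node before doing the remaining ones.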

2015-03-03 14:27 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 Another question - I mentioned here 37% of objects being moved around -
 this is the MISPLACED object count (degraded objects were 0.001%) after I
 removed 1 OSD from the CRUSH map (out of 44 OSDs or so).

 Can anybody confirm this is normal behaviour - and are there any
 workarounds?

 I understand this is because of CEPH's object placement algorithm,
 but 37% of objects misplaced just by removing 1 OSD out of 44 from the CRUSH
 map makes me wonder why the percentage is so large.

 Seems not good to me, and I have to remove another 7 OSDs (we are demoting
 some old hardware nodes). This means I could potentially end up with 7 x the
 same number of misplaced objects...?

 Any thoughts?

 Thanks

 On 3 March 2015 at 12:14, Andrija Panic andrija.pa...@gmail.com wrote:

 Thanks Irek.

 Does this mean that after peering, for each PG there will be a delay of
 10 seconds, meaning that every once in a while I will have 10 seconds of the
 cluster NOT being stressed/overloaded, then the recovery takes place for that
 PG, then for another 10 seconds the cluster is fine, and then it is stressed again?

 I'm trying to understand the process before actually doing stuff (the config
 reference is there on ceph.com, but I don't fully understand the process).

 Thanks,
 Andrija

 On 3 March 2015 at 11:32, Irek Fasikhov malm...@gmail.com wrote:

 Hi.

 Use the osd_recovery_delay_start value, for example:
 [root@ceph08 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok
 config show  | grep osd_recovery_delay_start
   osd_recovery_delay_start: 10

 2015-03-03 13:13 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 Hi Guys,

 I yesterday removed 1 OSD from the cluster (out of 42 OSDs), and it caused
 over 37% of the data to rebalance - let's say this is fine (this happened
 when I removed it from the CRUSH map).

 I'm wondering - I had previously set some throttling mechanisms, but
 during the first hour of rebalancing the recovery rate was going up to 1500
 MB/s - and the VMs were completely unusable - and then for the last 4 hours
 of the recovery the rate went down to, say, 100-200 MB/s, and during
 this time VM performance was still pretty impacted, but at least I could
 work more or less.

 So my question: is this behaviour expected, and is the throttling here
 working as expected, given that during the first hour almost no throttling
 seemed to be applied, judging by the 1500 MB/s recovery rate and the impact
 on the VMs?
 The last 4 hours seemed pretty fine (although there was still a lot of
 impact in general).

 I changed these throttling settings on the fly with:

 ceph tell osd.* injectargs '--osd_recovery_max_active 1'
 ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
 ceph tell osd.* injectargs '--osd_max_backfills 1'

 My journals are on SSDs (12 OSDs per server, with 6 journals on one SSD
 and 6 journals on the other SSD) - I have 3 of these hosts.

 Any thoughts are welcome.
 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 --
 Best regards, Irek Fasikhov
 Mobile: +79229045757




 --

 Andrija Panić




 --

 Andrija Panić




-- 
Best regards, Irek Fasikhov
Mobile: +79229045757
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rebalance/Backfill Throttling - anything missing here?

2015-03-03 Thread Andrija Panic
Another question - I mentioned here 37% of objects being moved around -
this is the MISPLACED object count (degraded objects were 0.001%) after I
removed 1 OSD from the CRUSH map (out of 44 OSDs or so).

Can anybody confirm this is normal behaviour - and are there any
workarounds?

I understand this is because of CEPH's object placement algorithm, but
37% of objects misplaced just by removing 1 OSD out of 44 from the CRUSH map
makes me wonder why the percentage is so large.

Seems not good to me, and I have to remove another 7 OSDs (we are demoting
some old hardware nodes). This means I could potentially end up with 7 x the
same number of misplaced objects...?

Any thoughts?

Thanks

On 3 March 2015 at 12:14, Andrija Panic andrija.pa...@gmail.com wrote:

 Thanks Irek.

 Does this mean that after peering, for each PG there will be a delay of
 10 seconds, meaning that every once in a while I will have 10 seconds of the
 cluster NOT being stressed/overloaded, then the recovery takes place for that
 PG, then for another 10 seconds the cluster is fine, and then it is stressed again?

 I'm trying to understand the process before actually doing stuff (the config
 reference is there on ceph.com, but I don't fully understand the process).

 Thanks,
 Andrija

 On 3 March 2015 at 11:32, Irek Fasikhov malm...@gmail.com wrote:

 Hi.

 Use the osd_recovery_delay_start value, for example:
 [root@ceph08 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok
 config show  | grep osd_recovery_delay_start
   osd_recovery_delay_start: 10

 2015-03-03 13:13 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 Hi Guys,

 I yesterday removed 1 OSD from the cluster (out of 42 OSDs), and it caused
 over 37% of the data to rebalance - let's say this is fine (this happened
 when I removed it from the CRUSH map).

 I'm wondering - I had previously set some throttling mechanisms, but
 during the first hour of rebalancing the recovery rate was going up to 1500
 MB/s - and the VMs were completely unusable - and then for the last 4 hours
 of the recovery the rate went down to, say, 100-200 MB/s, and during
 this time VM performance was still pretty impacted, but at least I could
 work more or less.

 So my question: is this behaviour expected, and is the throttling here
 working as expected, given that during the first hour almost no throttling
 seemed to be applied, judging by the 1500 MB/s recovery rate and the impact
 on the VMs?
 The last 4 hours seemed pretty fine (although there was still a lot of
 impact in general).

 I changed these throttling settings on the fly with:

 ceph tell osd.* injectargs '--osd_recovery_max_active 1'
 ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
 ceph tell osd.* injectargs '--osd_max_backfills 1'

 My journals are on SSDs (12 OSDs per server, with 6 journals on one SSD
 and 6 journals on the other SSD) - I have 3 of these hosts.

 Any thoughts are welcome.
 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 --
 Best regards, Irek Fasikhov
 Mobile: +79229045757




 --

 Andrija Panić




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rebalance/Backfill Throttling - anything missing here?

2015-03-03 Thread Irek Fasikhov
osd_recovery_delay_start is the delay in seconds between recovery iterations
(see osd_recovery_max_active).

It is described here:
https://github.com/ceph/ceph/search?utf8=%E2%9C%93q=osd_recovery_delay_start
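
For example, a sketch of checking and changing it at runtime (the socket path and the value 20 are only examples):

# current value on one OSD, via its admin socket
ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok config show | grep osd_recovery_delay_start
# change it cluster-wide on the fly
ceph tell osd.* injectargs '--osd_recovery_delay_start 20'
# to keep it across restarts, set "osd recovery delay start = 20" under [osd] in ceph.conf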


2015-03-03 14:27 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 Another question - I mentioned here 37% of objects being moved around -
 this is the MISPLACED object count (degraded objects were 0.001%) after I
 removed 1 OSD from the CRUSH map (out of 44 OSDs or so).

 Can anybody confirm this is normal behaviour - and are there any
 workarounds?

 I understand this is because of CEPH's object placement algorithm,
 but 37% of objects misplaced just by removing 1 OSD out of 44 from the CRUSH
 map makes me wonder why the percentage is so large.

 Seems not good to me, and I have to remove another 7 OSDs (we are demoting
 some old hardware nodes). This means I could potentially end up with 7 x the
 same number of misplaced objects...?

 Any thoughts?

 Thanks

 On 3 March 2015 at 12:14, Andrija Panic andrija.pa...@gmail.com wrote:

 Thanks Irek.

 Does this mean that after peering, for each PG there will be a delay of
 10 seconds, meaning that every once in a while I will have 10 seconds of the
 cluster NOT being stressed/overloaded, then the recovery takes place for that
 PG, then for another 10 seconds the cluster is fine, and then it is stressed again?

 I'm trying to understand the process before actually doing stuff (the config
 reference is there on ceph.com, but I don't fully understand the process).

 Thanks,
 Andrija

 On 3 March 2015 at 11:32, Irek Fasikhov malm...@gmail.com wrote:

 Hi.

 Use the osd_recovery_delay_start value, for example:
 [root@ceph08 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok
 config show  | grep osd_recovery_delay_start
   osd_recovery_delay_start: 10

 2015-03-03 13:13 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 Hi Guys,

 I yesterday removed 1 OSD from the cluster (out of 42 OSDs), and it caused
 over 37% of the data to rebalance - let's say this is fine (this happened
 when I removed it from the CRUSH map).

 I'm wondering - I had previously set some throttling mechanisms, but
 during the first hour of rebalancing the recovery rate was going up to 1500
 MB/s - and the VMs were completely unusable - and then for the last 4 hours
 of the recovery the rate went down to, say, 100-200 MB/s, and during
 this time VM performance was still pretty impacted, but at least I could
 work more or less.

 So my question: is this behaviour expected, and is the throttling here
 working as expected, given that during the first hour almost no throttling
 seemed to be applied, judging by the 1500 MB/s recovery rate and the impact
 on the VMs?
 The last 4 hours seemed pretty fine (although there was still a lot of
 impact in general).

 I changed these throttling settings on the fly with:

 ceph tell osd.* injectargs '--osd_recovery_max_active 1'
 ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
 ceph tell osd.* injectargs '--osd_max_backfills 1'

 My journals are on SSDs (12 OSDs per server, with 6 journals on one SSD
 and 6 journals on the other SSD) - I have 3 of these hosts.

 Any thoughts are welcome.
 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 --
 Best regards, Irek Fasikhov
 Mobile: +79229045757




 --

 Andrija Panić




 --

 Andrija Panić




-- 
Best regards, Irek Fasikhov
Mobile: +79229045757
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rebalance/Backfill Throttling - anything missing here?

2015-03-03 Thread Andrija Panic
Hi Irek,

yes, stopping the OSD (or setting it to OUT) resulted in only 3% of the data
being degraded and moved/recovered.
When I afterwards removed it from the CRUSH map with ceph osd crush rm <id>,
that's when the thing with 37% happened.

And thanks, Irek, for the help - could you kindly just let me know the
preferred steps when removing a whole node?
Do you mean I should first stop all the OSDs again, or just remove each OSD
from the CRUSH map, or perhaps just decompile the CRUSH map, delete the node
completely, compile it back in, and let the cluster heal/recover?
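
For reference, a rough sketch of that decompile/edit/recompile route (file names are arbitrary):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit crushmap.txt: delete the host bucket and its item entries
crushtool -c crushmap.txt -o crushmap-new.bin
ceph osd setcrushmap -i crushmap-new.bin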

Do you think this would result in less data being misplaced and moved around?

Sorry for bugging you, I really appreciate your help.

Thanks

On 3 March 2015 at 12:58, Irek Fasikhov malm...@gmail.com wrote:

 The large percentage comes from the rebuild of the cluster map (but the
 percentage of degradation is low). If you had not run ceph osd crush rm <id>,
 the percentage would be low.
 In your case, the correct option is to remove the entire node, rather than
 each disk individually.

 2015-03-03 14:27 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 Another question - I mentioned here 37% of objects being moved around -
 this is the MISPLACED object count (degraded objects were 0.001%) after I
 removed 1 OSD from the CRUSH map (out of 44 OSDs or so).

 Can anybody confirm this is normal behaviour - and are there any
 workarounds?

 I understand this is because of CEPH's object placement algorithm,
 but 37% of objects misplaced just by removing 1 OSD out of 44 from the CRUSH
 map makes me wonder why the percentage is so large.

 Seems not good to me, and I have to remove another 7 OSDs (we are demoting
 some old hardware nodes). This means I could potentially end up with 7 x the
 same number of misplaced objects...?

 Any thoughts?

 Thanks

 On 3 March 2015 at 12:14, Andrija Panic andrija.pa...@gmail.com wrote:

 Thanks Irek.

 Does this mean that after peering, for each PG there will be a delay of
 10 seconds, meaning that every once in a while I will have 10 seconds of the
 cluster NOT being stressed/overloaded, then the recovery takes place for that
 PG, then for another 10 seconds the cluster is fine, and then it is stressed again?

 I'm trying to understand the process before actually doing stuff (the config
 reference is there on ceph.com, but I don't fully understand the process).

 Thanks,
 Andrija

 On 3 March 2015 at 11:32, Irek Fasikhov malm...@gmail.com wrote:

 Hi.

 Use the osd_recovery_delay_start value, for example:
 [root@ceph08 ceph]# ceph --admin-daemon /var/run/ceph/ceph-osd.94.asok
 config show  | grep osd_recovery_delay_start
   osd_recovery_delay_start: 10

 2015-03-03 13:13 GMT+03:00 Andrija Panic andrija.pa...@gmail.com:

 Hi Guys,

 I yesterday removed 1 OSD from the cluster (out of 42 OSDs), and it caused
 over 37% of the data to rebalance - let's say this is fine (this happened
 when I removed it from the CRUSH map).

 I'm wondering - I had previously set some throttling mechanisms, but
 during the first hour of rebalancing the recovery rate was going up to 1500
 MB/s - and the VMs were completely unusable - and then for the last 4 hours
 of the recovery the rate went down to, say, 100-200 MB/s, and during
 this time VM performance was still pretty impacted, but at least I could
 work more or less.

 So my question: is this behaviour expected, and is the throttling here
 working as expected, given that during the first hour almost no throttling
 seemed to be applied, judging by the 1500 MB/s recovery rate and the impact
 on the VMs?
 The last 4 hours seemed pretty fine (although there was still a lot of
 impact in general).

 I changed these throttling settings on the fly with:

 ceph tell osd.* injectargs '--osd_recovery_max_active 1'
 ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
 ceph tell osd.* injectargs '--osd_max_backfills 1'

 My journals are on SSDs (12 OSDs per server, with 6 journals on one SSD
 and 6 journals on the other SSD) - I have 3 of these hosts.

 Any thoughts are welcome.
 --

 Andrija Panić

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 --
 Best regards, Irek Fasikhov
 Mobile: +79229045757




 --

 Andrija Panić




 --

 Andrija Panić




 --
 Best regards, Irek Fasikhov
 Mobile: +79229045757




-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Rebalance/Backfill Throttling - anything missing here?

2015-03-03 Thread Andrija Panic
Hi Guys,

I yesterday removed 1 OSD from the cluster (out of 42 OSDs), and it caused over
37% of the data to rebalance - let's say this is fine (this happened when I
removed it from the CRUSH map).

I'm wondering - I had previously set some throttling mechanisms, but during
the first hour of rebalancing the recovery rate was going up to 1500 MB/s -
and the VMs were completely unusable - and then for the last 4 hours of the
recovery the rate went down to, say, 100-200 MB/s, and during this time VM
performance was still pretty impacted, but at least I could work more or
less.

So my question: is this behaviour expected, and is the throttling here working
as expected, given that during the first hour almost no throttling seemed to
be applied, judging by the 1500 MB/s recovery rate and the impact on the VMs?
The last 4 hours seemed pretty fine (although there was still a lot of impact
in general).

I changed these throttling settings on the fly with:

ceph tell osd.* injectargs '--osd_recovery_max_active 1'
ceph tell osd.* injectargs '--osd_recovery_op_priority 1'
ceph tell osd.* injectargs '--osd_max_backfills 1'
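
(A sketch of making the same throttle values persistent in ceph.conf, so they also apply after OSD restarts; the option names follow the usual [osd] section convention:)

[osd]
  osd max backfills = 1
  osd recovery max active = 1
  osd recovery op priority = 1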

My journals are on SSDs (12 OSDs per server, with 6 journals on one SSD
and 6 journals on the other SSD) - I have 3 of these hosts.

Any thoughts are welcome.
-- 

Andrija Panić
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com