[ceph-users] Restarting OSD leads to lower CPU usage
Hi, hoping someone can point me in the right direction. Some of my OSDs have higher CPU usage (and op latencies) than others. If I restart the OSD everything runs nicely for some time, then it creeps up again.

1) Most of my OSDs have ~40% CPU (core) usage (user+sys); some are closer to 80%. Restarting means the offending OSDs only use 40% again.
2) Average latencies and CPU usage on the host are the same - so it's not caused by the host that the OSD is running on.
3) I can't say exactly when or how the issue happens. I can't even say if it's the same OSDs. It seems it either happens when something heavy happens in the cluster (like dropping very old snapshots, or rebalancing) and then doesn't go away, or maybe it happens slowly over time and I can't find it in the graphs. Looking at the graphs it seems to be the former.

I have just one suspicion and that is the "fd cache size" - we have it set to 16384, but the open fds suggest there are more open files for the osd process (over 17K fds) - it varies by some hundreds between the OSDs. Maybe some are just slightly over the limit and the cache misses cause this? Restarting the OSD clears them (~2K) and they increase over time. I increased it to 32768 yesterday and it is consistently nice now, but it might take another few days to manifest… Could this explain it? Any other tips?

Thanks
Jan
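For reference, this is the FileStore-era fd cache setting; a minimal sketch of the config change described above, plus a way to compare it against the actual fd count of each OSD on a host:

    [osd]
    # FileStore fd cache; keep it above the observed per-OSD fd count
    filestore fd cache size = 32768

    # count open fds for every ceph-osd process on this host
    for pid in $(pidof ceph-osd); do
        echo "pid $pid: $(ls /proc/$pid/fd | wc -l) open fds"
    done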
Re: [ceph-users] krbd splitting large IO's into smaller IO's
On Thu, Jun 11, 2015 at 2:23 PM, Ilya Dryomov idryo...@gmail.com wrote:
On Wed, Jun 10, 2015 at 7:07 PM, Ilya Dryomov idryo...@gmail.com wrote:
On Wed, Jun 10, 2015 at 7:04 PM, Nick Fisk n...@fisk.me.uk wrote:
-Original Message-
From: Ilya Dryomov [mailto:idryo...@gmail.com]
Sent: 10 June 2015 14:06
To: Nick Fisk
Cc: ceph-users
Subject: Re: [ceph-users] krbd splitting large IO's into smaller IO's

On Wed, Jun 10, 2015 at 2:47 PM, Nick Fisk n...@fisk.me.uk wrote:

Hi,

Using the kernel RBD client with kernel 4.0.3 (I have also tried some older kernels with the same effect), IO is being split into smaller IOs, which is having a negative impact on performance.

cat /sys/block/sdc/queue/max_hw_sectors_kb
4096
cat /sys/block/rbd0/queue/max_sectors_kb
4096

Using DD: dd if=/dev/rbd0 of=/dev/null bs=4M

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
rbd0 0.00 0.00 201.50 0.00 25792.00 0.00 256.00 1.99 10.15 10.15 0.00 4.96 100.00

Using FIO with 4M blocks:

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
rbd0 0.00 0.00 232.00 0.00 118784.00 0.00 1024.00 11.29 48.58 48.58 0.00 4.31 100.00

Any ideas why IO sizes are limited to 128k (256 blocks) in DD's case and 512k in FIO's case?

128k vs 512k is probably buffered vs direct IO - add iflag=direct to your dd invocation.

Yes, thanks for this, that was the case.

As for the 512k - I'm pretty sure it's a regression in our switch to blk-mq. I tested it around 3.18-3.19 and saw steady 4M IOs. I hope we are just missing a knob - I'll take a look.

I've tested both 4.0.3 and 3.16 and both seem to split into 512k. Let me know if you need me to test any other particular version.

With 3.16 you are going to need to adjust max_hw_sectors_kb / max_sectors_kb as discussed in Dan's thread. The patch that fixed that in the block layer went into 3.19, blk-mq into 4.0 - try 3.19.

Sorry, should have mentioned, I had adjusted both of them on the 3.16 kernel to 4096. I will try 3.19 and let you know.

Better with 3.19, but should I not be seeing around 8192, or am I getting my blocks and bytes mixed up?

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
rbd0 72.00 0.00 24.00 0.00 49152.00 0.00 4096.00 1.96 82.67 82.67 0.00 41.58 99.80

I'd expect 8192. I'm getting a box for investigation.

OK, so this is a bug in the blk-mq part of the block layer. There is no plugging going on in the single hardware queue (i.e. krbd) case - it never once plugs the queue, and that means no request merging is done for your direct sequential read test. It gets 512k bios and those same 512k requests are issued to krbd. While queue plugging may not make sense in the multi-queue case, I'm pretty sure it's supposed to plug in the single-queue case. Looks like the use_plug logic in blk_sq_make_request() is busted.

It turns out to be a year-old regression. Before commit 07068d5b8ed8 ("blk-mq: split make request handler for multi and single queue") it used to be (reads are considered sync)

use_plug = !is_flush_fua && ((q->nr_hw_queues == 1) || !is_sync);

and now it is

use_plug = !is_flush_fua && !is_sync;

in a function that is only called if q->nr_hw_queues == 1. This is getting fixed by "blk-mq: fix plugging in blk_sq_make_request" from Jeff Moyer - http://article.gmane.org/gmane.linux.kernel/1941750. Looks like it's on its way to mainline along with some other blk-mq plugging fixes.

Thanks,
Ilya
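The symptoms above are easy to reproduce; a minimal sketch, assuming a mapped /dev/rbd0:

    # make sure the queue limits allow large requests
    cat /sys/block/rbd0/queue/max_sectors_kb
    echo 4096 > /sys/block/rbd0/queue/max_sectors_kb

    # direct sequential read; buffered IO gets split into 128k readahead-sized requests
    dd if=/dev/rbd0 of=/dev/null bs=4M iflag=direct &

    # watch avgrq-sz (in 512-byte sectors): 8192 = 4M requests, 1024 = 512k
    iostat -x 1 rbd0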
Re: [ceph-users] [Fwd: adding a monitor will result in cephx: verify_reply couldn't decrypt with error: error decoding block for decryption]
It is necessary to synchronize time.

2015-06-11 11:09 GMT+03:00 Makkelie, R (ITCDCC) - KLM ramon.makke...@klm.com:

I'm trying to add an extra monitor to my already existing cluster. I do this with ceph-deploy with the following command: ceph-deploy mon add mynewhost

ceph-deploy says it's all finished, but when I take a look at my new monitor host in the logs I see the following error:

cephx: verify_reply couldn't decrypt with error: error decoding block for decryption

and when I take a look in my existing monitor logs I see this error:

cephx: verify_authorizer could not decrypt ticket info: error: NSS AES final round failed: -8190

I tried gathering keys, copying keys, and reinstalling/purging the new monitor node.

greetz
Ramon

--
Best regards, Irek Fasikhov
Mob.: +79229045757
Re: [ceph-users] [Fwd: adding a monitor will result in cephx: verify_reply couldn't decrypt with error: error decoding block for decryption]
@Irek Fasikhov: as I said in my previous post, all my server clocks are in sync. I double-checked it several times just to be sure. I hope you have some other clues.

-Original Message-
From: Irek Fasikhov malm...@gmail.com
To: Makkelie, R (ITCDCC) - KLM ramon.makke...@klm.com
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Fwd: adding a monitor will result in cephx: verify_reply couldn't decrypt with error: error decoding block for decryption]
Date: Thu, 11 Jun 2015 12:38:10 +0300

Run the following command by hand: ntpdate NTPADDRESS

2015-06-11 12:36 GMT+03:00 Makkelie, R (ITCDCC) - KLM ramon.makke...@klm.com:

All Ceph-related servers have the same NTP server, and I double-checked the time and timezones - they are all correct.

-Original Message-
From: Irek Fasikhov malm...@gmail.com
To: Makkelie, R (ITCDCC) - KLM ramon.makke...@klm.com
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Fwd: adding a monitor will result in cephx: verify_reply couldn't decrypt with error: error decoding block for decryption]
Date: Thu, 11 Jun 2015 12:16:53 +0300

It is necessary to synchronize time.

2015-06-11 11:09 GMT+03:00 Makkelie, R (ITCDCC) - KLM ramon.makke...@klm.com:

I'm trying to add an extra monitor to my already existing cluster. I do this with ceph-deploy with the following command: ceph-deploy mon add mynewhost

ceph-deploy says it's all finished, but when I take a look at my new monitor host in the logs I see the following error:

cephx: verify_reply couldn't decrypt with error: error decoding block for decryption

and when I take a look in my existing monitor logs I see this error:

cephx: verify_authorizer could not decrypt ticket info: error: NSS AES final round failed: -8190

I tried gathering keys, copying keys, and reinstalling/purging the new monitor node.

greetz
Ramon

--
Best regards, Irek Fasikhov
Mob.: +79229045757
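When the clocks really are in sync, these decrypt errors usually point at mismatched keys rather than time skew. A sketch of one way to compare what the new mon has on disk with what the cluster expects (default paths and a cluster named "ceph" assumed):

    # on an existing, healthy monitor: the mon. key the cluster uses
    ceph auth get mon.

    # on the new monitor host: the key it was deployed with
    cat /var/lib/ceph/mon/ceph-$(hostname -s)/keyring

If the two mon. secrets differ, redeploy the new monitor with the correct ceph.mon.keyring.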
Re: [ceph-users] Restarting OSD leads to lower CPU usage
Hi Jan,

Can you get perf top running? It should show you where the OSDs are spinning...

Cheers, Dan

On Thu, Jun 11, 2015 at 11:21 AM, Jan Schermer j...@schermer.cz wrote:

Hi, hoping someone can point me in the right direction. Some of my OSDs have higher CPU usage (and op latencies) than others. If I restart the OSD everything runs nicely for some time, then it creeps up again.

1) Most of my OSDs have ~40% CPU (core) usage (user+sys); some are closer to 80%. Restarting means the offending OSDs only use 40% again.
2) Average latencies and CPU usage on the host are the same - so it's not caused by the host that the OSD is running on.
3) I can't say exactly when or how the issue happens. I can't even say if it's the same OSDs. It seems it either happens when something heavy happens in the cluster (like dropping very old snapshots, or rebalancing) and then doesn't go away, or maybe it happens slowly over time and I can't find it in the graphs. Looking at the graphs it seems to be the former.

I have just one suspicion and that is the "fd cache size" - we have it set to 16384, but the open fds suggest there are more open files for the osd process (over 17K fds) - it varies by some hundreds between the OSDs. Maybe some are just slightly over the limit and the cache misses cause this? Restarting the OSD clears them (~2K) and they increase over time. I increased it to 32768 yesterday and it is consistently nice now, but it might take another few days to manifest… Could this explain it? Any other tips?

Thanks
Jan
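A minimal sketch of profiling one OSD this way (the osd id 12 is a placeholder; adjust the pgrep pattern to your command line, and install debug symbols for useful ceph-osd symbol names):

    # live view of where one ceph-osd process spends CPU
    perf top -p $(pgrep -f 'ceph-osd.*-i 12' | head -n1)

    # or record for a while and inspect offline
    perf record -g -p <osd-pid> -- sleep 30
    perf report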
Re: [ceph-users] [Fwd: adding a monitor will result in cephx: verify_reply couldn't decrypt with error: error decoding block for decryption]
All Ceph-related servers have the same NTP server, and I double-checked the time and timezones - they are all correct.

-Original Message-
From: Irek Fasikhov malm...@gmail.com
To: Makkelie, R (ITCDCC) - KLM ramon.makke...@klm.com
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Fwd: adding a monitor will result in cephx: verify_reply couldn't decrypt with error: error decoding block for decryption]
Date: Thu, 11 Jun 2015 12:16:53 +0300

It is necessary to synchronize time.

2015-06-11 11:09 GMT+03:00 Makkelie, R (ITCDCC) - KLM ramon.makke...@klm.com:

I'm trying to add an extra monitor to my already existing cluster. I do this with ceph-deploy with the following command: ceph-deploy mon add mynewhost

ceph-deploy says it's all finished, but when I take a look at my new monitor host in the logs I see the following error:

cephx: verify_reply couldn't decrypt with error: error decoding block for decryption

and when I take a look in my existing monitor logs I see this error:

cephx: verify_authorizer could not decrypt ticket info: error: NSS AES final round failed: -8190

I tried gathering keys, copying keys, and reinstalling/purging the new monitor node.

greetz
Ramon

--
Best regards, Irek Fasikhov
Mob.: +79229045757
Re: [ceph-users] krbd splitting large IO's into smaller IO's
On Wed, Jun 10, 2015 at 7:07 PM, Ilya Dryomov idryo...@gmail.com wrote:
On Wed, Jun 10, 2015 at 7:04 PM, Nick Fisk n...@fisk.me.uk wrote:
-Original Message-
From: Ilya Dryomov [mailto:idryo...@gmail.com]
Sent: 10 June 2015 14:06
To: Nick Fisk
Cc: ceph-users
Subject: Re: [ceph-users] krbd splitting large IO's into smaller IO's

On Wed, Jun 10, 2015 at 2:47 PM, Nick Fisk n...@fisk.me.uk wrote:

Hi,

Using the kernel RBD client with kernel 4.0.3 (I have also tried some older kernels with the same effect), IO is being split into smaller IOs, which is having a negative impact on performance.

cat /sys/block/sdc/queue/max_hw_sectors_kb
4096
cat /sys/block/rbd0/queue/max_sectors_kb
4096

Using DD: dd if=/dev/rbd0 of=/dev/null bs=4M

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
rbd0 0.00 0.00 201.50 0.00 25792.00 0.00 256.00 1.99 10.15 10.15 0.00 4.96 100.00

Using FIO with 4M blocks:

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
rbd0 0.00 0.00 232.00 0.00 118784.00 0.00 1024.00 11.29 48.58 48.58 0.00 4.31 100.00

Any ideas why IO sizes are limited to 128k (256 blocks) in DD's case and 512k in FIO's case?

128k vs 512k is probably buffered vs direct IO - add iflag=direct to your dd invocation.

Yes, thanks for this, that was the case.

As for the 512k - I'm pretty sure it's a regression in our switch to blk-mq. I tested it around 3.18-3.19 and saw steady 4M IOs. I hope we are just missing a knob - I'll take a look.

I've tested both 4.0.3 and 3.16 and both seem to split into 512k. Let me know if you need me to test any other particular version.

With 3.16 you are going to need to adjust max_hw_sectors_kb / max_sectors_kb as discussed in Dan's thread. The patch that fixed that in the block layer went into 3.19, blk-mq into 4.0 - try 3.19.

Sorry, should have mentioned, I had adjusted both of them on the 3.16 kernel to 4096. I will try 3.19 and let you know.

Better with 3.19, but should I not be seeing around 8192, or am I getting my blocks and bytes mixed up?

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
rbd0 72.00 0.00 24.00 0.00 49152.00 0.00 4096.00 1.96 82.67 82.67 0.00 41.58 99.80

I'd expect 8192. I'm getting a box for investigation.

OK, so this is a bug in the blk-mq part of the block layer. There is no plugging going on in the single hardware queue (i.e. krbd) case - it never once plugs the queue, and that means no request merging is done for your direct sequential read test. It gets 512k bios and those same 512k requests are issued to krbd. While queue plugging may not make sense in the multi-queue case, I'm pretty sure it's supposed to plug in the single-queue case. Looks like the use_plug logic in blk_sq_make_request() is busted.

Thanks,
Ilya
Re: [ceph-users] [Fwd: adding a monitor will result in cephx: verify_reply couldn't decrypt with error: error decoding block for decryption]
Run the following command by hand: ntpdate NTPADDRESS

2015-06-11 12:36 GMT+03:00 Makkelie, R (ITCDCC) - KLM ramon.makke...@klm.com:

All Ceph-related servers have the same NTP server, and I double-checked the time and timezones - they are all correct.

-Original Message-
From: Irek Fasikhov malm...@gmail.com
To: Makkelie, R (ITCDCC) - KLM ramon.makke...@klm.com
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] [Fwd: adding a monitor will result in cephx: verify_reply couldn't decrypt with error: error decoding block for decryption]
Date: Thu, 11 Jun 2015 12:16:53 +0300

It is necessary to synchronize time.

2015-06-11 11:09 GMT+03:00 Makkelie, R (ITCDCC) - KLM ramon.makke...@klm.com:

I'm trying to add an extra monitor to my already existing cluster. I do this with ceph-deploy with the following command: ceph-deploy mon add mynewhost

ceph-deploy says it's all finished, but when I take a look at my new monitor host in the logs I see the following error:

cephx: verify_reply couldn't decrypt with error: error decoding block for decryption

and when I take a look in my existing monitor logs I see this error:

cephx: verify_authorizer could not decrypt ticket info: error: NSS AES final round failed: -8190

I tried gathering keys, copying keys, and reinstalling/purging the new monitor node.

greetz
Ramon

--
Best regards, Irek Fasikhov
Mob.: +79229045757
Re: [ceph-users] Restarting OSD leads to lower CPU usage
On 6/11/15 12:21, Jan Schermer wrote:

Hi, hoping someone can point me in the right direction. Some of my OSDs have higher CPU usage (and op latencies) than others. If I restart the OSD everything runs nicely for some time, then it creeps up again.

1) Most of my OSDs have ~40% CPU (core) usage (user+sys); some are closer to 80%. Restarting means the offending OSDs only use 40% again.
2) Average latencies and CPU usage on the host are the same - so it's not caused by the host that the OSD is running on.
3) I can't say exactly when or how the issue happens. I can't even say if it's the same OSDs. It seems it either happens when something heavy happens in the cluster (like dropping very old snapshots, or rebalancing) and then doesn't go away, or maybe it happens slowly over time and I can't find it in the graphs. Looking at the graphs it seems to be the former.

I have just one suspicion and that is the "fd cache size" - we have it set to 16384, but the open fds suggest there are more open files for the osd process (over 17K fds) - it varies by some hundreds between the OSDs. Maybe some are just slightly over the limit and the cache misses cause this? Restarting the OSD clears them (~2K) and they increase over time. I increased it to 32768 yesterday and it is consistently nice now, but it might take another few days to manifest… Could this explain it? Any other tips?

What about disk IO? Are the OSDs scrubbing or deep-scrubbing?

Thanks
Jan
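A quick sketch of checking whether scrubbing is in flight while the CPU is high (standard CLI, nothing cluster-specific assumed):

    ceph -s                             # health/pgmap lines list active scrubs
    ceph pg dump | grep -c scrubbing    # PGs currently scrubbing or deep-scrubbing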
Re: [ceph-users] Restarting OSD leads to lower CPU usage
On 11 Jun 2015, at 11:53, Henrik Korkuc li...@kirneh.eu wrote:
On 6/11/15 12:21, Jan Schermer wrote:

Hi, hoping someone can point me in the right direction. Some of my OSDs have higher CPU usage (and op latencies) than others. If I restart the OSD everything runs nicely for some time, then it creeps up again.

1) Most of my OSDs have ~40% CPU (core) usage (user+sys); some are closer to 80%. Restarting means the offending OSDs only use 40% again.
2) Average latencies and CPU usage on the host are the same - so it's not caused by the host that the OSD is running on.
3) I can't say exactly when or how the issue happens. I can't even say if it's the same OSDs. It seems it either happens when something heavy happens in the cluster (like dropping very old snapshots, or rebalancing) and then doesn't go away, or maybe it happens slowly over time and I can't find it in the graphs. Looking at the graphs it seems to be the former.

I have just one suspicion and that is the "fd cache size" - we have it set to 16384, but the open fds suggest there are more open files for the osd process (over 17K fds) - it varies by some hundreds between the OSDs. Maybe some are just slightly over the limit and the cache misses cause this? Restarting the OSD clears them (~2K) and they increase over time. I increased it to 32768 yesterday and it is consistently nice now, but it might take another few days to manifest… Could this explain it? Any other tips?

What about disk IO? Are the OSDs scrubbing or deep-scrubbing?

Nope, the OSDs are not scrubbing or deep-scrubbing, and I see the same amount of ops/sec on the OSD as before the restart. The things that are not yet at the previous (before-restart) level are:

a) threads: 2200 before restart, 2050 now, slowly going up
b) open files - changed with fdcache, ~17500 before restart, 31000 now
c) memory usage: rss 1.7 GiB before vs 1.1 now, vss 4.7 vs 3.5 now

The amount of work is still the same.

Jan
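For anyone wanting to track the same numbers, a sketch that iterates over every ceph-osd on a host:

    for pid in $(pidof ceph-osd); do
        echo "pid $pid: $(ls /proc/$pid/fd | wc -l) fds, $(ps -o nlwp= -p $pid) threads"
        ps -o rss=,vsz= -p $pid   # resident and virtual size, in KiB
    done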
Re: [ceph-users] krbd splitting large IO's into smaller IO's
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ilya Dryomov
Sent: 11 June 2015 12:33
To: Nick Fisk
Cc: ceph-users
Subject: Re: [ceph-users] krbd splitting large IO's into smaller IO's

On Thu, Jun 11, 2015 at 2:23 PM, Ilya Dryomov idryo...@gmail.com wrote:
On Wed, Jun 10, 2015 at 7:07 PM, Ilya Dryomov idryo...@gmail.com wrote:
On Wed, Jun 10, 2015 at 7:04 PM, Nick Fisk n...@fisk.me.uk wrote:
-Original Message-
From: Ilya Dryomov [mailto:idryo...@gmail.com]
Sent: 10 June 2015 14:06
To: Nick Fisk
Cc: ceph-users
Subject: Re: [ceph-users] krbd splitting large IO's into smaller IO's

On Wed, Jun 10, 2015 at 2:47 PM, Nick Fisk n...@fisk.me.uk wrote:

Hi,

Using the kernel RBD client with kernel 4.0.3 (I have also tried some older kernels with the same effect), IO is being split into smaller IOs, which is having a negative impact on performance.

cat /sys/block/sdc/queue/max_hw_sectors_kb
4096
cat /sys/block/rbd0/queue/max_sectors_kb
4096

Using DD: dd if=/dev/rbd0 of=/dev/null bs=4M

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
rbd0 0.00 0.00 201.50 0.00 25792.00 0.00 256.00 1.99 10.15 10.15 0.00 4.96 100.00

Using FIO with 4M blocks:

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
rbd0 0.00 0.00 232.00 0.00 118784.00 0.00 1024.00 11.29 48.58 48.58 0.00 4.31 100.00

Any ideas why IO sizes are limited to 128k (256 blocks) in DD's case and 512k in FIO's case?

128k vs 512k is probably buffered vs direct IO - add iflag=direct to your dd invocation.

Yes, thanks for this, that was the case.

As for the 512k - I'm pretty sure it's a regression in our switch to blk-mq. I tested it around 3.18-3.19 and saw steady 4M IOs. I hope we are just missing a knob - I'll take a look.

I've tested both 4.0.3 and 3.16 and both seem to split into 512k. Let me know if you need me to test any other particular version.

With 3.16 you are going to need to adjust max_hw_sectors_kb / max_sectors_kb as discussed in Dan's thread. The patch that fixed that in the block layer went into 3.19, blk-mq into 4.0 - try 3.19.

Sorry, should have mentioned, I had adjusted both of them on the 3.16 kernel to 4096. I will try 3.19 and let you know.

Better with 3.19, but should I not be seeing around 8192, or am I getting my blocks and bytes mixed up?

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
rbd0 72.00 0.00 24.00 0.00 49152.00 0.00 4096.00 1.96 82.67 82.67 0.00 41.58 99.80

I'd expect 8192. I'm getting a box for investigation.

OK, so this is a bug in the blk-mq part of the block layer. There is no plugging going on in the single hardware queue (i.e. krbd) case - it never once plugs the queue, and that means no request merging is done for your direct sequential read test. It gets 512k bios and those same 512k requests are issued to krbd. While queue plugging may not make sense in the multi-queue case, I'm pretty sure it's supposed to plug in the single-queue case. Looks like the use_plug logic in blk_sq_make_request() is busted.

It turns out to be a year-old regression. Before commit 07068d5b8ed8 ("blk-mq: split make request handler for multi and single queue") it used to be (reads are considered sync)

use_plug = !is_flush_fua && ((q->nr_hw_queues == 1) || !is_sync);

and now it is

use_plug = !is_flush_fua && !is_sync;

in a function that is only called if q->nr_hw_queues == 1. This is getting fixed by "blk-mq: fix plugging in blk_sq_make_request" from Jeff Moyer - http://article.gmane.org/gmane.linux.kernel/1941750. Looks like it's on its way to mainline along with some other blk-mq plugging fixes.

That's great, do you think it will make 4.2?

Thanks,
Ilya
Re: [ceph-users] Ceph giant installation fails on rhel 7.0
Have you configured and enabled the EPEL repo?

- Robert LeBlanc
GPG Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1

On Thu, Jun 11, 2015 at 6:26 AM, Shambhu Rajak wrote:

I am trying to install Ceph Giant on RHEL 7.0. While installing ceph-common-0.87.2-0.el7.x86_64.rpm I am getting the following dependency failure:

$ sudo yum install ceph-common-0.87.2-0.el7.x86_64.rpm
Loaded plugins: amazon-id, priorities, rhui-lb
Examining ceph-common-0.87.2-0.el7.x86_64.rpm: 1:ceph-common-0.87.2-0.el7.x86_64
Marking ceph-common-0.87.2-0.el7.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package ceph-common.x86_64 1:0.87.2-0.el7 will be installed
--> Processing Dependency: libtcmalloc.so.4()(64bit) for package: 1:ceph-common-0.87.2-0.el7.x86_64
--> Finished Dependency Resolution
Error: Package: 1:ceph-common-0.87.2-0.el7.x86_64 (/ceph-common-0.87.2-0.el7.x86_64)
       Requires: libtcmalloc.so.4()(64bit)
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest

So I am trying to install gperftools-libs to resolve the dependency, but I am unable to get the package using yum install. Can anyone help me with the complete list of dependencies to install Ceph Giant on RHEL 7.0?

Thanks,
Shambhu
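For reference, a sketch of enabling EPEL on RHEL 7, which provides gperftools-libs (the URL is the usual Fedora mirror; verify it for your environment):

    sudo rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
    sudo yum install gperftools-libs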
Re: [ceph-users] krbd splitting large IO's into smaller IO's
On Thu, Jun 11, 2015 at 5:30 PM, Nick Fisk n...@fisk.me.uk wrote:
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ilya Dryomov
Sent: 11 June 2015 12:33
To: Nick Fisk
Cc: ceph-users
Subject: Re: [ceph-users] krbd splitting large IO's into smaller IO's

On Thu, Jun 11, 2015 at 2:23 PM, Ilya Dryomov idryo...@gmail.com wrote:
On Wed, Jun 10, 2015 at 7:07 PM, Ilya Dryomov idryo...@gmail.com wrote:
On Wed, Jun 10, 2015 at 7:04 PM, Nick Fisk n...@fisk.me.uk wrote:

[earlier thread as above: buffered vs direct IO explains the 128k vs 512k split; with 3.19 avgrq-sz reaches 4096 instead of the expected 8192]

I'd expect 8192. I'm getting a box for investigation.

OK, so this is a bug in the blk-mq part of the block layer. There is no plugging going on in the single hardware queue (i.e. krbd) case - it never once plugs the queue, and that means no request merging is done for your direct sequential read test. It gets 512k bios and those same 512k requests are issued to krbd. While queue plugging may not make sense in the multi-queue case, I'm pretty sure it's supposed to plug in the single-queue case. Looks like the use_plug logic in blk_sq_make_request() is busted.

It turns out to be a year-old regression. Before commit 07068d5b8ed8 ("blk-mq: split make request handler for multi and single queue") it used to be (reads are considered sync)

use_plug = !is_flush_fua && ((q->nr_hw_queues == 1) || !is_sync);

and now it is

use_plug = !is_flush_fua && !is_sync;

in a function that is only called if q->nr_hw_queues == 1. This is getting fixed by "blk-mq: fix plugging in blk_sq_make_request" from Jeff Moyer - http://article.gmane.org/gmane.linux.kernel/1941750. Looks like it's on its way to mainline along with some other blk-mq plugging fixes.

That's great, do you think it will make 4.2?

Depends on Jens, but I think it will.

Thanks,
Ilya
Re: [ceph-users] Load balancing RGW and Scaleout
Hum, thanks David, I will check corosync. And maybe Consul could be a solution?

Sent from my iPhone

On 11 juin 2015, at 11:33, David Moreau Simard dmsim...@iweb.com wrote:

What I've seen work well is to set multiple A records for your RGW endpoint. Then, with something like corosync, you ensure that these multiple IP addresses are always bound somewhere. You can then have as many nodes in active-active mode as you want.
--
David Moreau Simard

On 2015-06-11 11:29 AM, Florent MONTHEL wrote:

Hi team,

Is it possible for you to share your radosgw setup, in order to use the maximum network bandwidth and have no SPOF? I have 5 servers on a 10Gb network and 3 radosgw on them. We would like to set up HAProxy on 1 node in front of the 3 RGWs, but:
- the SPOF becomes the HAProxy node
- max bandwidth will be the HAProxy node's (10Gb/s)

Thanks

Sent from my iPhone
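A minimal sketch of the multiple-A-records approach (names and addresses are placeholders):

    rgw.example.com.  60  IN  A  192.0.2.11
    rgw.example.com.  60  IN  A  192.0.2.12
    rgw.example.com.  60  IN  A  192.0.2.13

Clients round-robin across the records, and corosync (or keepalived) floats each address to a surviving node when an RGW host fails, so no single box sits in the data path.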
[ceph-users] v9.0.1 released
This development release is delayed a bit due to tooling changes in the build environment. As a result the next one (v9.0.2) will have a bit more work than is usual. Highlights here include lots of RGW Swift fixes, RBD feature work surrounding the new object map feature, more CephFS snapshot fixes, and a few important CRUSH fixes.

Notable Changes
---------------

* auth: cache/reuse crypto lib key objects, optimize msg signature check (Sage Weil)
* build: allow tcmalloc-minimal (Thorsten Behrens)
* build: do not build ceph-dencoder with tcmalloc (#10691 Boris Ranto)
* build: fix pg ref disabling (William A. Kennington III)
* build: install-deps.sh improvements (Loic Dachary)
* build: misc fixes (Boris Ranto, Ken Dreyer, Owen Synge)
* ceph-authtool: fix return code on error (Gerhard Muntingh)
* ceph-disk: fix zap sgdisk invocation (Owen Synge, Thorsten Behrens)
* ceph-disk: pass --cluster arg on prepare subcommand (Kefu Chai)
* ceph-fuse, libcephfs: drop inode when rmdir finishes (#11339 Yan, Zheng)
* ceph-fuse, libcephfs: fix uninline (#11356 Yan, Zheng)
* ceph-monstore-tool: fix store-copy (Huangjun)
* common: add perf counter descriptions (Alyona Kiseleva)
* common: fix throttle max change (Henry Chang)
* crush: fix crash from invalid 'take' argument (#11602 Shiva Rkreddy, Sage Weil)
* crush: fix divide-by-2 in straw2 (#11357 Yann Dupont, Sage Weil)
* deb: fix rest-bench-dbg and ceph-test-dbg dependencies (Ken Dreyer)
* doc: document region hostnames (Robin H. Johnson)
* doc: update release schedule docs (Loic Dachary)
* init-radosgw: run radosgw as root (#11453 Ken Dreyer)
* librados: fadvise flags per op (Jianpeng Ma)
* librbd: allow additional metadata to be stored with the image (Haomai Wang)
* librbd: better handling for dup flatten requests (#11370 Jason Dillaman)
* librbd: cancel in-flight ops on watch error (#11363 Jason Dillaman)
* librbd: default new images to format 2 (#11348 Jason Dillaman)
* librbd: fast diff implementation that leverages object map (Jason Dillaman)
* librbd: fix snapshot creation when other snap is active (#11475 Jason Dillaman)
* librbd: new diff_iterate2 API (Jason Dillaman)
* librbd: object map rebuild support (Jason Dillaman)
* logrotate.d: prefer service over invoke-rc.d (#11330 Win Hierman, Sage Weil)
* mds: avoid getting stuck in XLOCKDONE (#11254 Yan, Zheng)
* mds: fix integer truncation on large client ids (Henry Chang)
* mds: many snapshot and stray fixes (Yan, Zheng)
* mds: persist completed_requests reliably (#11048 John Spray)
* mds: separate safe_pos in Journaler (#10368 John Spray)
* mds: snapshot rename support (#3645 Yan, Zheng)
* mds: warn when clients fail to advance oldest_client_tid (#10657 Yan, Zheng)
* misc cleanups and fixes (Danny Al-Gaaf)
* mon: fix average utilization calc for 'osd df' (Mykola Golub)
* mon: fix variance calc in 'osd df' (Sage Weil)
* mon: improve callout to crushtool (Mykola Golub)
* mon: prevent bucket deletion when referenced by a crush rule (#11602 Sage Weil)
* mon: prime pg_temp when CRUSH map changes (Sage Weil)
* monclient: flush_log (John Spray)
* msgr: async: many many fixes (Haomai Wang)
* msgr: simple: fix clear_pipe (#11381 Haomai Wang)
* osd: add latency perf counters for tier operations (Xinze Chi)
* osd: avoid multiple hit set insertions (Zhiqiang Wang)
* osd: break PG removal into multiple iterations (#10198 Guang Yang)
* osd: check scrub state when handling map (Jianpeng Ma)
* osd: fix endless repair when object is unrecoverable (Jianpeng Ma, Kefu Chai)
* osd: fix pg resurrection (#11429 Samuel Just)
* osd: ignore non-existent osds in unfound calc (#10976 Mykola Golub)
* osd: increase default max open files (Owen Synge)
* osd: prepopulate needs_recovery_map when only one peer has missing (#9558 Guang Yang)
* osd: relax reply order on proxy read (#11211 Zhiqiang Wang)
* osd: skip promotion for flush/evict op (Zhiqiang Wang)
* osd: write journal header on clean shutdown (Xinze Chi)
* qa: run-make-check.sh script (Loic Dachary)
* rados bench: misc fixes (Dmitry Yatsushkevich)
* rados: fix error message on failed pool removal (Wido den Hollander)
* radosgw-admin: add 'bucket check' function to repair bucket index (Yehuda Sadeh)
* rbd: allow unmapping by spec (Ilya Dryomov)
* rbd: deprecate --new-format option (Jason Dillaman)
* rgw: do not set content-type if length is 0 (#11091 Orit Wasserman)
* rgw: don't use end_marker for namespaced object listing (#11437 Yehuda Sadeh)
* rgw: fail if parts not specified on multipart upload (#11435 Yehuda Sadeh)
* rgw: fix GET on swift account when limit == 0 (#10683 Radoslaw Zarzynski)
* rgw: fix broken stats in container listing (#11285 Radoslaw Zarzynski)
* rgw: fix bug in domain/subdomain splitting (Robin H. Johnson)
* rgw: fix civetweb max threads (#10243 Yehuda Sadeh)
* rgw: fix copy metadata, support X-Copied-From for swift (#10663 Radoslaw Zarzynski)
* rgw: fix locator for objects starting with _ (#11442 Yehuda Sadeh)
Re: [ceph-users] NFS interaction with RBD
Hi George,

Well that's strange. I wonder why our systems behave so differently. We've got: hypervisors running on Ubuntu 14.04; VMs with 9 Ceph volumes, 2 TB each; XFS instead of your ext4. Maybe the number of placement groups plays a major role as well. Jens-Christian may be able to give you the specifics of our Ceph cluster. I'm about to leave on vacation and don't have time to look that up anymore.

Best regards,
Christian

On 29 May 2015, at 14:42, Georgios Dimitrakakis gior...@acmac.uoc.gr wrote:

All,

I've tried to recreate the issue without success! My configuration is the following:

OS (Hypervisor + VM): CentOS 6.6 (2.6.32-504.1.3.el6.x86_64)
QEMU: qemu-kvm-0.12.1.2-2.415.el6.3ceph.x86_64
Ceph: ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047), 20x4TB OSDs equally distributed on two disk nodes, 3x monitors

OpenStack Cinder has been configured to provide RBD volumes from Ceph. I created 10x 500GB volumes, which were then all attached to a single virtual machine. All volumes were formatted twice for comparison, once using mkfs.xfs and once using mkfs.ext4. I did try to issue the commands all at the same time (or as close to that as possible). In both tests I didn't notice any interruption. It may have taken longer than doing one at a time, but the system was continuously up and everything was responding without problems. During these runs there were 100 open connections with one of the OSD nodes and 111 with the other one. So I guess I am not experiencing the issue due to the low number of OSDs I am having. Is my assumption correct?

Best regards,
George

Thanks a million for the feedback Christian! I've tried to recreate the issue with 10 RBD volumes mounted on a single server without success! I've issued the mkfs.xfs commands simultaneously (or at least as fast as I could in different terminals) without noticing any problems. Can you please tell me what the size of each RBD volume was? I have a feeling that mine were too small, and if so I have to test it on our bigger cluster. I've also thought that besides the QEMU version the underlying OS might also be important, so what was your testbed?

All the best,
George

Hi George,

In order to experience the error it was enough to simply run mkfs.xfs on all the volumes. In the meantime it became clear what the problem was:

~ ; cat /proc/183016/limits
...
Max open files    1024    4096    files
...

This can be changed by setting a decent value in /etc/libvirt/qemu.conf for max_files.

Regards,
Christian

On 27 May 2015, at 16:23, Jens-Christian Fischer jens-christian.fisc...@switch.ch wrote:

George, I will let Christian provide you the details. As far as I know, it was enough to just do an 'ls' on all of the attached drives.

We are using QEMU 2.0:

$ dpkg -l | grep qemu
ii ipxe-qemu 1.0.0+git-2013.c3d1e78-2ubuntu1 all PXE boot firmware - ROM images for qemu
ii qemu-keymaps 2.0.0+dfsg-2ubuntu1.11 all QEMU keyboard maps
ii qemu-system 2.0.0+dfsg-2ubuntu1.11 amd64 QEMU full system emulation binaries
ii qemu-system-arm 2.0.0+dfsg-2ubuntu1.11 amd64 QEMU full system emulation binaries (arm)
ii qemu-system-common 2.0.0+dfsg-2ubuntu1.11 amd64 QEMU full system emulation binaries (common files)
ii qemu-system-mips 2.0.0+dfsg-2ubuntu1.11 amd64 QEMU full system emulation binaries (mips)
ii qemu-system-misc 2.0.0+dfsg-2ubuntu1.11 amd64 QEMU full system emulation binaries (miscelaneous)
ii qemu-system-ppc 2.0.0+dfsg-2ubuntu1.11 amd64 QEMU full system emulation binaries (ppc)
ii qemu-system-sparc 2.0.0+dfsg-2ubuntu1.11 amd64 QEMU full system emulation binaries (sparc)
ii qemu-system-x86 2.0.0+dfsg-2ubuntu1.11 amd64 QEMU full system emulation binaries (x86)
ii qemu-utils 2.0.0+dfsg-2ubuntu1.11 amd64 QEMU utilities

cheers
jc
--
SWITCH
Jens-Christian Fischer, Peta Solutions
Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
phone +41 44 268 15 15, direct +41 44 268 15 71
jens-christian.fisc...@switch.ch
http://www.switch.ch

On 26.05.2015, at 19:12, Georgios Dimitrakakis gior...@acmac.uoc.gr wrote:

Jens-Christian, how did you test that? Did you just try to write to them simultaneously? Any other tests one can perform to verify it? In our installation we have a VM with 30 RBD volumes mounted which are all exported via NFS to other VMs. No one has complained for the moment, but the load/usage is
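A sketch of the fix mentioned above (the limit value is an example; pick one comfortably above volumes times expected fds per volume):

    # /etc/libvirt/qemu.conf
    max_files = 32768

Then restart libvirtd (e.g. 'service libvirt-bin restart' on Ubuntu 14.04) and power-cycle the guests so the new limit applies to the running qemu processes.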
Re: [ceph-users] Discuss: New default recovery config settings
hi, Jan

2015-06-01 15:43 GMT+08:00 Jan Schermer j...@schermer.cz:

We had to disable deep scrub or the cluster would be unusable - we need to turn it back on sooner or later, though. With minimal scrubbing and recovery settings, everything is mostly good. It turned out many issues we had were due to too few PGs - once we increased them from 4K to 16K everything sped up nicely (because the chunks are smaller), but during heavy activity we are still getting some "slow IOs".

How many PGs do you set? We get slow requests many times, but didn't relate them to the PG number. We follow the equation below for every pool:

Total PGs = (OSDs * 100) / pool size

Our cluster has 157 OSDs and 3 pools, and we set pg_num to 8192 for every pool, but OSD CPU utilization goes up to 300% after a restart - we think it's loading PGs during that period. We will try a different PG number when we next get slow requests.

thanks!

I believe there is an ionice knob in newer versions (we still run Dumpling), and that should do the trick no matter how much additional "load" is put on the OSDs. Everybody's bottleneck will be different - we run all flash, so disk IO is not a problem but an OSD daemon is - no ionice setting will help with that, it just needs to be faster ;-)

Jan

On 30 May 2015, at 01:17, Gregory Farnum g...@gregs42.com wrote:
On Fri, May 29, 2015 at 2:47 PM, Samuel Just sj...@redhat.com wrote:

Many people have reported that they need to lower the osd recovery config options to minimize the impact of recovery on client io. We are talking about changing the defaults as follows:

osd_max_backfills to 1 (from 10)
osd_recovery_max_active to 3 (from 15)
osd_recovery_op_priority to 1 (from 10)
osd_recovery_max_single_start to 1 (from 5)

I'm under the (possibly erroneous) impression that reducing the number of max backfills doesn't actually reduce recovery speed much (but will reduce memory use), but that dropping the op priority can. I'd rather we make users manually adjust values which can have a material impact on their data safety, even if most of them choose to do so. After all, even under our worst behavior we're still doing a lot better than a resilvering RAID array. ;)
-Greg

--
thanks
huangjun
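For anyone wanting to try the proposed values ahead of a release, a sketch - ceph.conf for daemons started later, injectargs for already-running ones:

    [osd]
    osd max backfills = 1
    osd recovery max active = 3
    osd recovery op priority = 1
    osd recovery max single start = 1

    # apply to running OSDs without a restart
    ceph tell osd.\* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 3'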
[ceph-users] Hardware cache settings recomendation
Hi,

Please help me with hardware cache settings on the controllers for best Ceph RBD performance. All Ceph hosts have one SSD drive for the journal. We are using 4 different controllers, all with BBU:

* HP Smart Array P400
* HP Smart Array P410i
* Dell PERC 6/i
* Dell PERC H700

I have to set the cache policy; on Dell the settings are:

* Read Policy
  o Read-Ahead (current)
  o No-Read-Ahead
  o Adaptive Read-Ahead
* Write Policy
  o Write-Back (current)
  o Write-Through
* Cache Policy
  o Cache I/O
  o Direct I/O (current)
* Disk Cache Policy
  o Default (current)
  o Enabled
  o Disabled

On HP controllers:

* Cache Ratio (current: 25% Read / 75% Write)
* Drive Write Cache
  o Enabled (current)
  o Disabled

And there is one more setting in the LogicalDrive option:

* Caching:
  o Enabled (current)
  o Disabled

Please verify my settings and give me some recommendations.

Best regards,
Mateusz
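For the PERC (LSI-based) controllers these settings can also be inspected and changed from the OS. A sketch using MegaCli - verify the exact syntax against your MegaCli version's help output, as option spellings vary between releases:

    # show the current cache policy of all logical drives
    MegaCli64 -LDGetProp -Cache -LAll -aAll

    # example changes: write-back, no read-ahead, direct IO
    MegaCli64 -LDSetProp WB -LAll -aAll
    MegaCli64 -LDSetProp NORA -LAll -aAll
    MegaCli64 -LDSetProp Direct -LAll -aAll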
[ceph-users] ceph mount error
Hi,

My ceph health is OK, and now I want to build a filesystem, referring to the CEPH FS QUICK START guide: http://ceph.com/docs/master/start/quick-cephfs/

However, I got an error when I used the command mount -t ceph 192.168.1.105:6789:/ /mnt/mycephfs :

mount error 22 = Invalid argument

I checked the manual and still don't know how to solve it. I am looking forward to your reply!
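Two common causes of "mount error 22" worth checking, sketched below (client.admin and default paths assumed): no MDS is running, or the kernel client was given no cephx credentials:

    # is there an active MDS?
    ceph mds stat

    # with cephx enabled the mount needs a name and secret
    mount -t ceph 192.168.1.105:6789:/ /mnt/mycephfs \
        -o name=admin,secret=$(ceph auth get-key client.admin)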
[ceph-users] query on ceph-deploy command
Hi,

I am trying to deploy ceph-hammer on 4 nodes (admin, monitor and 2 OSDs). My servers are behind a proxy server, so whenever I run apt-get update I need to export our proxy settings. When I run the command ceph-deploy install osd1 osd2 mon1, since all three nodes are behind the proxy, the command fails with the message below:

[osd1][DEBUG ] 0 upgraded, 0 newly installed, 0 to remove and 9 not upgraded.
[osd1][INFO ] Running command: sudo wget -O release.asc https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
[osd1][WARNIN] --2015-05-20 11:07:41-- https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
[osd1][WARNIN] Resolving ceph.com (ceph.com)... 208.113.241.137, 2607:f298:4:147::b05:fe2a
[osd1][WARNIN] Connecting to ceph.com (ceph.com)|208.113.241.137|:443... failed: Connection timed out.
[osd1][WARNIN] command returned non-zero exit status: 4
[osd1][INFO ] Running command: sudo apt-key add release.asc
[osd1][WARNIN] gpg: no valid OpenPGP data found.
[osd1][ERROR ] RuntimeError: command returned non-zero exit status: 2
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: apt-key add release.asc

Request your help in solving the above. How do I set a proxy for the user so that I am able to connect to ceph.com to download the file?

Thanks!
--
Best Regards
B.Vivek
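A sketch of one way to handle this (proxy host and port are placeholders): export the proxy in the remote user's environment and let sudo keep it, since ceph-deploy runs wget via sudo:

    # on each node, e.g. in ~/.bashrc or /etc/environment
    export http_proxy=http://proxy.example.com:3128
    export https_proxy=http://proxy.example.com:3128

    # in /etc/sudoers (edit with visudo), so 'sudo wget' sees the variables
    Defaults env_keep += "http_proxy https_proxy"

wget also honours /etc/wgetrc (http_proxy / https_proxy lines), which covers the sudo case as well.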
[ceph-users] Error in sys.exitfunc
OS: CentOS release 6.6 (Final)
kernel: 3.10.77-1.el6.elrepo.x86_64
Installed: ceph-deploy.noarch 0:1.5.23-0
Dependency Installed: python-argparse.noarch 0:1.2.1-2.el6.centos

I installed ceph-deploy following the manual, http://ceph.com/docs/master/start/quick-start-preflight/ . However, whenever I run ceph-deploy, the error "Error in sys.exitfunc:" appears. How can I solve it? I found the same error message on the web, http://www.spinics.net/lists/ceph-devel/msg21388.html , but I cannot find a way to solve the problem. I am looking forward to your reply!

Best wishes!
zhongbo

error message:

[root@node1 ~]# ceph-deploy
usage: ceph-deploy [-h] [-v | -q] [--version] [--username USERNAME]
                   [--overwrite-conf] [--cluster NAME] [--ceph-conf CEPH_CONF]
                   COMMAND ...

Easy Ceph deployment (ceph-deploy v1.5.23 banner)

Full documentation can be found at: http://ceph.com/ceph-deploy/docs

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         be more verbose
  -q, --quiet           be less verbose
  --version             the current installed version of ceph-deploy
  --username USERNAME   the username to connect to the remote host
  --overwrite-conf      overwrite an existing conf file on remote host (if present)
  --cluster NAME        name of the cluster
  --ceph-conf CEPH_CONF use (or reuse) a given ceph.conf file

commands:
  COMMAND     description
  new         Start deploying a new cluster, and write a CLUSTER.conf and keyring for it.
  install     Install Ceph packages on remote hosts.
  rgw         Deploy ceph RGW on remote hosts.
  mds         Deploy ceph MDS on remote hosts.
  mon         Deploy ceph monitor on remote hosts.
  gatherkeys  Gather authentication keys for provisioning new nodes.
  disk        Manage disks on a remote host.
  osd         Prepare a data disk on remote host.
  admin       Push configuration and client.admin key to a remote host.
  config      Push configuration file to a remote host.
  uninstall   Remove Ceph packages from remote hosts.
  purgedata   Purge (delete, destroy, discard, shred) any Ceph data from /var/lib/ceph
  purge       Remove Ceph packages from remote hosts and purge all data.
  forgetkeys  Remove authentication keys from the local directory.
  pkg         Manage packages on remote hosts.
  calamari    Install and configure Calamari nodes

Error in sys.exitfunc:
Re: [ceph-users] [Qemu-devel] rbd cache + libvirt
On Mon, Jun 08, 2015 at 07:49:15PM +0300, Andrey Korolyov wrote: On Mon, Jun 8, 2015 at 6:50 PM, Jason Dillaman dilla...@redhat.com wrote: Hmm ... looking at the latest version of QEMU, it appears that the RBD cache settings are changed prior to reading the configuration file instead of overriding the value after the configuration file has been read [1]. Try specifying the path to a new configuration file via the conf=/path/to/my/new/ceph.conf QEMU parameter where the RBD cache is explicitly disabled [2]. [1] http://git.qemu.org/?p=qemu.git;a=blob;f=block/rbd.c;h=fbe87e035b12aab2e96093922a83a3545738b68f;hb=HEAD#l478 [2] http://ceph.com/docs/master/rbd/qemu-rbd/#usage Actually the mentioned snippet presumes *expected* behavior with cache=xxx driving the overall cache behavior. Probably the pass itself (from cache=none to proper bitmask values in the backend properties) is broken in some way. CCing qemu-devel for possible bug. CCing Josh Durgin and Jeff Cody for block/rbd.c ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
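For reference, a minimal sketch of Jason's workaround, using the rbd: drive syntax from the ceph docs linked above (pool/image name and conf path are placeholders):

# /etc/ceph/ceph-nocache.conf -- referenced only by this guest
[client]
rbd cache = false

# point QEMU at the alternate conf when attaching the image:
qemu-system-x86_64 -m 1024 \
  -drive format=raw,file=rbd:rbd/vm-disk:conf=/etc/ceph/ceph-nocache.conf,cache=none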
Re: [ceph-users] Error in sys.exitfunc
Thank you for your reply. I encountered other problems when I installed Ceph.
#1. When I run the command ceph-deploy new ceph-0, I get the ceph.conf file below. However, there is not any information about osd pool default size or public network.
[root@ceph-2 my-cluster]# more ceph.conf
[global]
auth_service_required = cephx
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = 192.168.72.33
mon_initial_members = ceph-0
fsid = 74d682b5-2bf2-464c-8462-740f96bcc525
#2. I ignored problem #1 and continued to set up the Ceph Storage Cluster, but encountered an error when I ran the command 'ceph-deploy osd activate ceph-2:/mnt/sda'. I am following the manual, http://ceph.com/docs/master/start/quick-ceph-deploy/
error message
[root@ceph-0 my-cluster]# ceph-deploy osd prepare ceph-2:/mnt/sda
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.23): /usr/bin/ceph-deploy osd prepare ceph-2:/mnt/sda
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks ceph-2:/mnt/sda:
[ceph-2][DEBUG ] connected to host: ceph-2
[ceph-2][DEBUG ] detect platform information from remote host
[ceph-2][DEBUG ] detect machine type
[ceph_deploy.osd][INFO ] Distro info: CentOS 6.5 Final
[ceph_deploy.osd][DEBUG ] Deploying osd to ceph-2
[ceph-2][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph-2][INFO ] Running command: udevadm trigger --subsystem-match=block --action=add
[ceph_deploy.osd][DEBUG ] Preparing host ceph-2 disk /mnt/sda journal None activate False
[ceph-2][INFO ] Running command: ceph-disk -v prepare --fs-type xfs --cluster ceph -- /mnt/sda
[ceph-2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
[ceph-2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
[ceph-2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs
[ceph-2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
[ceph-2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
[ceph-2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size
[ceph-2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_cryptsetup_parameters
[ceph-2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_key_size
[ceph-2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_type
[ceph-2][WARNIN] DEBUG:ceph-disk:Preparing osd data dir /mnt/sda
[ceph-2][INFO ] checking OSD status...
[ceph-2][INFO ] Running command: ceph --cluster=ceph osd stat --format=json
[ceph_deploy.osd][DEBUG ] Host ceph-2 is now ready for osd use.
Error in sys.exitfunc:
[root@ceph-0 my-cluster]# ceph-deploy osd activate ceph-2:/mnt/sda
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.23): /usr/bin/ceph-deploy osd activate ceph-2:/mnt/sda
[ceph_deploy.osd][DEBUG ] Activating cluster ceph disks ceph-2:/mnt/sda:
[ceph-2][DEBUG ] connected to host: ceph-2
[ceph-2][DEBUG ] detect platform information from remote host
[ceph-2][DEBUG ] detect machine type
[ceph_deploy.osd][INFO ] Distro info: CentOS 6.5 Final
[ceph_deploy.osd][DEBUG ] activating host ceph-2 disk /mnt/sda
[ceph_deploy.osd][DEBUG ] will use init type: sysvinit
[ceph-2][INFO ] Running command: ceph-disk -v activate --mark-init sysvinit --mount /mnt/sda
[ceph-2][WARNIN] DEBUG:ceph-disk:Cluster uuid is af23707d-325f-4846-bba9-b88ec953be80
[ceph-2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
[ceph-2][WARNIN] DEBUG:ceph-disk:Cluster name is ceph
[ceph-2][WARNIN] DEBUG:ceph-disk:OSD uuid is ca9f6649-b4b8-46ce-a860-1d81eed4fd5e
[ceph-2][WARNIN] DEBUG:ceph-disk:Allocating OSD id...
[ceph-2][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd create --concise ca9f6649-b4b8-46ce-a860-1d81eed4fd5e
[ceph-2][WARNIN] 2015-05-14 17:37:10.988914 7f373bd34700 0 librados: client.bootstrap-osd authentication error (1) Operation not permitted
[ceph-2][WARNIN] Error connecting to cluster: PermissionError
[ceph-2][WARNIN] ceph-disk: Error: ceph osd create failed: Command '/usr/bin/ceph' returned non-zero exit status 1:
[ceph-2][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: ceph-disk -v activate --mark-init sysvinit --mount /mnt/sda
Error in
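Two observations on the above, sketched rather than definitive. For #1, ceph-deploy only writes a minimal conf; the quick-start has you add the pool-size and network settings by hand (the subnet below is a guess based on the mon_host shown above):

[global]
osd pool default size = 2
public network = 192.168.72.0/24

For #2, the activate step fails on cephx: client.bootstrap-osd on ceph-2 is not accepted by the monitor, which usually means the bootstrap keyring was never gathered or is stale. One way to refresh and verify it:

ceph-deploy gatherkeys ceph-0
# then compare what the cluster expects with what the OSD host has:
ceph auth get client.bootstrap-osd
cat /var/lib/ceph/bootstrap-osd/ceph.keyring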
[ceph-users] Is Ceph right for me?
Hello, Could somebody please advise me whether Ceph is suitable for our use? We are looking for a file system which is able to work over different locations which are connected by VPN. If one location were to go offline then the filesystem would stay online at both sites, and once the connection is regained the latest file version would take priority. The main use will be for website files, so the changes are most likely to be any uploaded files and cache files, as a lot of the data will be stored in a SQL database which is already replicated. With Kind Regards, Trevor Robinson CTO at Key4ce [Key4ce - IT Professionals] https://key4ce.com/ Skype: KeyMalus.Trev xmpp: t.robin...@im4ce.com Livechat: http://livechat.key4ce.com/ NL: +31 (0)40 290 3310 UK: +44 (0)1332 898 999 CN: +86 (0)7552 824 5985 The information contained in this message may be confidential and legally protected under applicable law. The message is intended solely for the addressee(s). If you are not the intended recipient, you are hereby notified that any use, forwarding, dissemination, or reproduction of this message is strictly prohibited and may be unlawful. If you are not the intended recipient, please contact the sender by return e-mail and destroy all copies of the original message. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] TR: High apply latency on OSD causes poor performance on VM
Hi, Could you take a look at my problem? It's about high latency on my OSDs on HP G8 servers (ceph01, ceph02 and ceph03). When I run a rados bench for 60 sec, the results are surprising: after a few seconds there is no traffic, then it resumes, and so on. Finally, the maximum latency is high and the VMs' disks freeze a lot.
#rados bench -p pool-test-g8 60 write
Maintaining 16 concurrent writes of 4194304 bytes for up to 60 seconds or 0 objects
Object prefix: benchmark_data_ceph02_56745
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
0 0 0 0 0 0 - 0
1 16 82 66 263.959 264 0.0549584 0.171148
2 16 134 118 235.97 208 0.344873 0.232103
3 16 189 173 230.639 220 0.015583 0.24581
4 16 248 232 231.973 236 0.0704699 0.252504
5 16 306 290 231.974 232 0.0229872 0.258343
6 16 371 355 236.64 260 0.27183 0.255469
7 16 419 403 230.26 192 0.0503492 0.263304
8 16 460 444 221.975 164 0.0157241 0.261779
9 16 506 490 217.754 184 0.199418 0.271501
10 16 518 502 200.778 48 0.0472324 0.269049
11 16 518 502 182.526 0 - 0.269049
12 16 556 540 179.981 76 0.100336 0.301616
13 16 607 591 181.827 204 0.173912 0.346105
14 16 655 639 182.552 192 0.0484904 0.339879
15 16 683 667 177.848 112 0.0504184 0.349929
16 16 746 730 182.481 252 0.276635 0.347231
17 16 807 791 186.098 244 0.391491 0.339275
18 16 845 829 184.203 152 0.188608 0.342021
19 16 850 834 175.561 20 0.960175 0.342717
2015-05-28 17:09:48.397376 min lat: 0.013532 max lat: 6.28387 avg lat: 0.346987
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
20 16 859 843 168.582 36 0.0182246 0.346987
21 16 863 847 161.316 16 3.18544 0.355051
22 16 897 881 160.165 136 0.0811037 0.371209
23 16 901 885 153.897 16 0.0482124 0.370793
24 16 943 927 154.484 168 0.63064 0.397204
25 15 997 982 157.104 220 0.0933448 0.392701
26 16 1058 1042 160.291 240 0.166463 0.385943
27 16 1088 1072 158.798 120 1.63882 0.388568
28 16 1125 1109 158.412 148 0.0511479 0.38419
29 16 1155 1139 157.087 120 0.162266 0.385898
30 16 1163 1147 152.917 32 0.0682181 0.383571
31 16 1190 1174 151.468 108 0.0489185 0.386665
32 16 1196 1180 147.485 24 2.95263 0.390657
33 16 1213 1197 145.076 68 0.0467788 0.389299
34 16 1265 1249 146.926 208 0.0153085 0.420687
35 16 1332 1316 150.384 268 0.0157061 0.42259
36 16 1374 1358 150.873 168 0.251626 0.417373
37 16 1402 1386 149.822 112 0.0475302 0.413886
38 16 1444 1428 150.3 168 0.0507577 0.421055
39 16 1500 1484 152.189 224 0.0489163 0.416872
2015-05-28 17:10:08.399434 min lat: 0.013532 max lat: 9.26596 avg lat: 0.415296
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
40 16 1530 1514 151.384 120 0.951713 0.415296
41 16 1551 1535 149.741 84 0.0686787 0.416571
42 16 1606 1590 151.413 220 0.0826855 0.41684
43 16 1656 1640 152.542 200 0.0706539 0.409974
44 16 1663 1647 149.712 28 0.046672 0.408476
45 16 1685 1669 148.34 88 0.0989566 0.424918
46 16 1707 1691 147.028 88 0.0490569 0.421116
47 16 1707 1691 143.9 0 - 0.421116
48 16 1707 1691 140.902 0 - 0.421116
49 16 1720 1704 139.088 17. 0.0480335 0.428997
50 16 1752 1736 138.866 128 0.053219 0.4416
51 16 1786 1770 138.809 136 0.602946 0.440357
52 16 1810 1794 137.986 96 0.0472518 0.438376
53 16 1831 1815 136.967 84 0.0148999 0.446801
54 16 1831 1815 134.43 0 - 0.446801
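When chasing this kind of stall it helps to see which OSDs carry the latency; a quick sketch using standard commands (the osd id is a placeholder, and the daemon command has to run on the host that carries that OSD):

# per-OSD commit/apply latency as the cluster sees it
ceph osd perf
# detailed counters from one suspect OSD's admin socket
ceph daemon osd.3 perf dump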
Re: [ceph-users] NFS interaction with RBD
Hi George In order to experience the error it was enough to simply run mkfs.xfs on all the volumes. In the meantime it became clear what the problem was:
~ ; cat /proc/183016/limits
...
Max open files            1024                 4096                 files
...
This can be changed by setting a decent value in /etc/libvirt/qemu.conf for max_files. Regards Christian
On 27 May 2015, at 16:23, Jens-Christian Fischer jens-christian.fisc...@switch.ch wrote: George, I will let Christian provide you the details. As far as I know, it was enough to just do a 'ls' on all of the attached drives. We are using Qemu 2.0:
$ dpkg -l | grep qemu
ii ipxe-qemu 1.0.0+git-2013.c3d1e78-2ubuntu1 all PXE boot firmware - ROM images for qemu
ii qemu-keymaps 2.0.0+dfsg-2ubuntu1.11 all QEMU keyboard maps
ii qemu-system 2.0.0+dfsg-2ubuntu1.11 amd64 QEMU full system emulation binaries
ii qemu-system-arm 2.0.0+dfsg-2ubuntu1.11 amd64 QEMU full system emulation binaries (arm)
ii qemu-system-common 2.0.0+dfsg-2ubuntu1.11 amd64 QEMU full system emulation binaries (common files)
ii qemu-system-mips 2.0.0+dfsg-2ubuntu1.11 amd64 QEMU full system emulation binaries (mips)
ii qemu-system-misc 2.0.0+dfsg-2ubuntu1.11 amd64 QEMU full system emulation binaries (miscelaneous)
ii qemu-system-ppc 2.0.0+dfsg-2ubuntu1.11 amd64 QEMU full system emulation binaries (ppc)
ii qemu-system-sparc 2.0.0+dfsg-2ubuntu1.11 amd64 QEMU full system emulation binaries (sparc)
ii qemu-system-x86 2.0.0+dfsg-2ubuntu1.11 amd64 QEMU full system emulation binaries (x86)
ii qemu-utils 2.0.0+dfsg-2ubuntu1.11 amd64 QEMU utilities
cheers jc -- SWITCH Jens-Christian Fischer, Peta Solutions Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland phone +41 44 268 15 15, direct +41 44 268 15 71 jens-christian.fisc...@switch.ch http://www.switch.ch http://www.switch.ch/stories
On 26.05.2015, at 19:12, Georgios Dimitrakakis gior...@acmac.uoc.gr wrote: Jens-Christian, how did you test that? Did you just try to write to them simultaneously? Any other tests that one can perform to verify that? In our installation we have a VM with 30 RBD volumes mounted which are all exported via NFS to other VMs. No one has complained for the moment but the load/usage is very minimal. If this problem really exists then very soon, once the trial phase is over, we will have millions of complaints :-( What version of QEMU are you using? We are using the one provided by Ceph in qemu-kvm-0.12.1.2-2.415.el6.3ceph.x86_64.rpm Best regards, George
I think we (i.e. Christian) found the problem: We created a test VM with 9 mounted RBD volumes (no NFS server). As soon as he hit all disks, we started to experience these 120 second timeouts. We realized that the QEMU process on the hypervisor is opening a TCP connection to every OSD for every mounted volume - exceeding the 1024 FD limit. So no deep scrubbing etc, but simply too many connections…
cheers jc -- SWITCH Jens-Christian Fischer, Peta Solutions Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland phone +41 44 268 15 15, direct +41 44 268 15 71 jens-christian.fisc...@switch.ch [3] http://www.switch.ch http://www.switch.ch/stories
On 25.05.2015, at 06:02, Christian Balzer wrote: Hello, lets compare your case with John-Paul's. Different OS and Ceph versions (thus we can assume different NFS versions as well). The only common thing is that both of you added OSDs and are likely suffering from delays stemming from Ceph re-balancing or deep-scrubbing.
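A minimal sketch of that libvirt change (the value is an example; size it at roughly one fd per OSD per attached volume, plus headroom):

# /etc/libvirt/qemu.conf
max_files = 32768

# restart libvirtd (Ubuntu 14.04 service name shown), then power-cycle the guests
service libvirt-bin restart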
Ceph logs will only pipe up when things have been blocked for more than 30 seconds; NFS might take offense at lower values (or the accumulation of several distributed delays). You added 23 OSDs; tell us more about your cluster, HW, network. Were these added to the existing 16 nodes, or are these on new storage nodes (so could there be something different with those nodes?), and how busy are your network and CPU? Running something like collectd to gather all ceph perf data and other data from the storage nodes and then feeding it to graphite (or similar) can be VERY helpful to identify if something is going wrong and what it is in particular. Otherwise run atop on your storage nodes to identify if CPU, network, or specific HDDs/OSDs are bottlenecks. Deep scrubbing can be _very_ taxing; do your problems persist if you inject into your running cluster an osd_scrub_sleep value of 0.5 (lower
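For reference, a sketch of that runtime injection (same value as suggested above; it takes effect immediately and does not persist across OSD restarts):

ceph tell osd.* injectargs '--osd_scrub_sleep 0.5'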
Re: [ceph-users] xfs corruption, data disaster!
On 5/11/15 9:47 AM, Ric Wheeler wrote: On 05/05/2015 04:13 AM, Yujian Peng wrote: Emmanuel Florac eflorac@... writes: On Mon, 4 May 2015 07:00:32 + (UTC) Yujian Peng pengyujian5201314 at 126.com wrote: I'm encountering a data disaster. I have a ceph cluster with 145 osd. The data center had a power problem yesterday, and all of the ceph nodes were down. But now I find that 6 disks (xfs) in 4 nodes have data corruption. Some disks are unable to mount, and some disks have IO errors in syslog.
mount: Structure needs cleaning
xfs_log_force: error 5 returned
I tried to repair one with xfs_repair -L /dev/sdx1, but the ceph-osd reported a leveldb error: Error initializing leveldb: Corruption: checksum mismatch I cannot start the 6 osds and 22 pgs are down. This is really a tragedy for me. Can you give me some idea to recover the xfs? Thanks very much! For XFS problems, ask the XFS ML: xfs at oss.sgi.com You didn't give enough details, by far. What version of kernel and distro are you running? If there were errors, please post extensive logs. If you have IO errors on some disks, you probably MUST replace them before going any further. Why did you run xfs_repair -L? Did you try xfs_repair without options first? Were you running the very very latest version of xfs_repair (3.2.2)? The OS is ubuntu 12.04.5 with kernel 3.13.0
uname -a
Linux ceph19 3.13.0-32-generic #57~precise1-Ubuntu SMP Tue Jul 15 03:51:20 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
cat /etc/issue
Ubuntu 12.04.5 LTS \n \l
xfs_repair -V
xfs_repair version 3.1.7
I've tried xfs_repair without options, but it showed me some errors, so I used the -L option. Thanks for your reply! Responding quickly to a couple of things: * xfs_repair -L wipes out the XFS log, not normally a good thing to do And if required due to an unreplayable log, often indicates some problem with the storage system. For example a volatile write cache not synced as needed, and lost along with a power loss, leading to a corrupted and unreplayable XFS log. * replacing disks with IO errors is not a great idea if you still need that data. You might want to copy the data from that disk to a new disk (same or greater size) and then try to repair that new disk. A lot depends on the type of IO error you see - you might have cable issues, HBA issues, or fairly normal read issues (which are not worth replacing a disk for). Just a note that XFS sometimes starts saying IO error when the filesystem has shut down; this isn't the same as a block-device-level IO error, but you haven't posted logs or anything, so I'm just guessing here. http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F -Eric You should work with your vendor's support team if you have a support contract or post to the XFS devel list (copied above) for help. Good luck! Ric ___ xfs mailing list x...@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
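A sketch of the copy-before-repair approach Ric describes, assuming GNU ddrescue and a spare disk at least as large (device names are placeholders):

# clone the failing disk, keeping a map file so the copy can resume after errors
ddrescue -f /dev/sdx /dev/sdy /root/sdx.map
# run the repair against the clone, never the original
xfs_repair /dev/sdy1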
[ceph-users] umount stuck on NFS gateways switch over by using Pacemaker
Hello, I have been testing NFS over RBD recently. I am trying to build the NFS HA environment under Ubuntu 14.04 for testing, and the package version information is as follows:
- Ubuntu 14.04 : 3.13.0-32-generic (Ubuntu 14.04.2 LTS)
- ceph : 0.80.9-0ubuntu0.14.04.2
- ceph-common : 0.80.9-0ubuntu0.14.04.2
- pacemaker (git20130802-1ubuntu2.3)
- corosync (2.3.3-1ubuntu1)
PS: I also tried ceph/ceph-common (0.87.1-1trusty and 0.87.2-1trusty) on a 3.13.0-48-generic (Ubuntu 14.04.2) server and got the same situations.
The environment has 5 nodes in the Ceph cluster (3 MONs and 5 OSDs) and two NFS gateways (nfs1 and nfs2) for high availability. I issued the command 'sudo service pacemaker stop' on 'nfs1' to force these resources to stop and be transferred to 'nfs2', and vice versa. When the two nodes are up and I issue 'sudo service pacemaker stop' on one node, the other node will take over all resources. Everything looks fine. Then I wait about 30 minutes, doing nothing to the NFS gateways, and repeat the previous steps to test the fail over procedure. I found the process state of 'umount' is 'D' (uninterruptible sleep); 'ps' showed the following result:
root 21047 0.0 0.0 17412 952 ? D 16:39 0:00 umount /mnt/block1
Have any idea to solve or work around this? Because 'umount' is stuck, neither the 'reboot' nor the 'shutdown' command works well, so unless I wait 20 minutes for the 'umount' to time out, the only thing I can do is power off the server directly. Any help would be much appreciated. I attached my configurations and loggings as follows.
Pacemaker configurations:
crm configure primitive p_rbd_map_1 ocf:ceph:rbd.in \
  params user=admin pool=block_data name=data01 cephconf=/etc/ceph/ceph.conf \
  op monitor interval=10s timeout=20s
crm configure primitive p_fs_rbd_1 ocf:heartbeat:Filesystem \
  params directory=/mnt/block1 fstype=xfs device=/dev/rbd1 \
  fast_stop=no options=noatime,nodiratime,nobarrier,inode64 \
  op monitor interval=20s timeout=40s \
  op start interval=0 timeout=60s \
  op stop interval=0 timeout=60s
crm configure primitive p_export_rbd_1 ocf:heartbeat:exportfs \
  params directory=/mnt/block1 clientspec=10.35.64.0/24 options=rw,async,no_subtree_check,no_root_squash fsid=1 \
  op monitor interval=10s timeout=20s \
  op start interval=0 timeout=40s
crm configure primitive p_vip_1 ocf:heartbeat:IPaddr2 \
  params ip=10.35.64.90 cidr_netmask=24 \
  op monitor interval=5
crm configure primitive p_nfs_server lsb:nfs-kernel-server \
  op monitor interval=10s timeout=30s
crm configure primitive p_rpcbind upstart:rpcbind \
  op monitor interval=10s timeout=30s
crm configure group g_rbd_share_1 p_rbd_map_1 p_fs_rbd_1 p_export_rbd_1 p_vip_1 \
  meta target-role=Started
crm configure group g_nfs p_rpcbind p_nfs_server \
  meta target-role=Started
crm configure clone clo_nfs g_nfs \
  meta globally-unique=false target-role=Started
'crm_mon' status results for normal condition:
Online: [ nfs1 nfs2 ]
Resource Group: g_rbd_share_1
  p_rbd_map_1 (ocf::ceph:rbd.in): Started nfs1
  p_fs_rbd_1 (ocf::heartbeat:Filesystem): Started nfs1
  p_export_rbd_1 (ocf::heartbeat:exportfs): Started nfs1
  p_vip_1 (ocf::heartbeat:IPaddr2): Started nfs1
Clone Set: clo_nfs [g_nfs]
  Started: [ nfs1 nfs2 ]
'crm_mon' status results for fail over condition:
Online: [ nfs1 nfs2 ]
Resource Group: g_rbd_share_1
  p_rbd_map_1 (ocf::ceph:rbd.in): Started nfs1
  p_fs_rbd_1 (ocf::heartbeat:Filesystem): Started nfs1 (unmanaged) FAILED
  p_export_rbd_1 (ocf::heartbeat:exportfs): Stopped
  p_vip_1 (ocf::heartbeat:IPaddr2): Stopped
Clone Set: clo_nfs [g_nfs]
  Started: [ nfs2 ]
  Stopped: [ nfs1 ]
Failed
actions: p_fs_rbd_1_stop_0 (node=nfs1, call=114, rc=1, status=Timed Out, last-rc-change=Wed May 13 16:39:10 2015, queued=60002ms, exec=1ms): unknown error
'dmesg' messages:
[ 9470.284509] nfsd: last server has exited, flushing export cache
[ 9470.322893] init: rpcbind main process (4267) terminated with status 2
[ 9600.520281] INFO: task umount:2675 blocked for more than 120 seconds.
[ 9600.520445] Not tainted 3.13.0-32-generic #57-Ubuntu
[ 9600.520570] echo 0 > /proc/sys/kernel/hung_task_timeout_secs disables this message.
[ 9600.520792] umount D 88003fc13480 0 2675 1 0x
[ 9600.520800] 88003a4f9dc0 0082 880039ece000 88003a4f9fd8
[ 9600.520805] 00013480 00013480 880039ece000 880039ece000
[ 9600.520809] 88003fc141a0 0001 88003a377928
[ 9600.520814] Call Trace:
[ 9600.520830] [817251a9] schedule+0x29/0x70
[ 9600.520882] [a043b300] _xfs_log_force+0x220/0x280 [xfs]
[ 9600.520891] [8109a9b0] ? wake_up_state+0x20/0x20
[ 9600.520922] [a043b386] xfs_log_force+0x26/0x80 [xfs]
[ 9600.520947] [a03f3b6d] xfs_fs_sync_fs+0x2d/0x50 [xfs]
[ 9600.520954]
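Not from the thread, but one knob worth a try: recent resource-agents versions give ocf:heartbeat:Filesystem a force_unmount parameter ('safe' kills the processes still holding the mount before unmounting), and the stop timeout can be raised above the XFS log-force stall seen in the trace. A sketch, assuming a resource-agents version that supports it:

crm configure edit p_fs_rbd_1
# in the editor, extend the resource, e.g.:
#   params ... force_unmount=safe
#   op stop interval=0 timeout=180s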
[ceph-users] radosgw backup
Hi everyone. I'm wondering - is there a way to back up radosgw data? What I already tried: I created a backup pool and copied .rgw.buckets to it. Then I deleted an object via an S3 client, and copied the data from the backup pool back to .rgw.buckets. I still can't see the object in the S3 client, but I can get it via HTTP by its previously known URL. Questions: where does radosgw store info about objects (i.e. how do I make a restored object visible to the S3 client)? What is the best way to back up radosgw data? Thanks for any advice. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
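For reference, a sketch of the pool-level copy described above, and why it behaves this way: rados cppool copies object data only, while the bucket listing lives in separate index objects (in .rgw.buckets.index on recent releases), so an object restored behind radosgw's back stays out of the S3 listing even though its URL still works.

# whole-pool copy into a pre-created backup pool (names are placeholders)
ceph osd pool create .rgw.buckets.backup 128
rados cppool .rgw.buckets .rgw.buckets.backup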
Re: [ceph-users] Hardware cache settings recomendation
You want write cache to disk, no write cache for SSD. I assume all of your data disks are single-drive RAID 0? Tyler Bishop Chief Executive Officer 513-299-7108 x10 tyler.bis...@beyondhosting.net If you are not the intended recipient of this transmission you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited.
From: Mateusz Skała mateusz.sk...@budikom.net To: ceph-users@lists.ceph.com Sent: Saturday, June 6, 2015 4:09:59 AM Subject: [ceph-users] Hardware cache settings recomendation
Hi, Please help me with hardware cache settings on controllers for the best Ceph RBD performance. All Ceph hosts have one SSD drive for the journal. We are using 4 different controllers, all with BBU:
· HP Smart Array P400
· HP Smart Array P410i
· Dell PERC 6/i
· Dell PERC H700
I have to set the cache policy; on Dell the settings are:
· Read Policy: Read-Ahead (current), No-Read-Ahead, Adaptive Read-Ahead
· Write Policy: Write-Back (current), Write-Through
· Cache Policy: Cache I/O, Direct I/O (current)
· Disk Cache Policy: Default (current), Enabled, Disabled
On HP controllers:
· Cache Ratio (current: 25% Read / 75% Write)
· Drive Write Cache: Enabled (current), Disabled
And there is one more setting in the LogicalDrive options:
· Caching: Enabled (current), Disabled
Please verify my settings and give me some recommendations. Best regards, Mateusz ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
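A sketch of how that advice maps onto the Dell PERCs with MegaCli (logical-drive and adapter numbers are placeholders; hpssacli/hpacucli covers the HP equivalents):

# HDD data LDs: BBU-protected controller write-back on, the drive's own cache off
MegaCli -LDSetProp WB -L0 -a0
MegaCli -LDSetProp -DisDskCache -L0 -a0
# SSD journal LD: write-through, keeping the controller cache out of the data path
MegaCli -LDSetProp WT -L1 -a0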
[ceph-users] Nginx access ceph
Hi, I am trying to set up nginx to access html files in ceph buckets. I have set up https://github.com/anomalizer/ngx_aws_auth . Below is the nginx config. When I try to access http://hostname:8080/test/b.html it shows a signature mismatch; http://hostname:8080/b.html also shows a signature mismatch. I could see the request passed from nginx to ceph in the ceph logs.
server {
    listen 8080;
    server_name localhost;
    location / {
        proxy_pass http://10.84.182.80:8080;
        aws_access_key GMO31LL1LECV1RH4T71K;
        aws_secret_key aXEf9e1Aq85VTz7Q5tkXeq4qZaEtnYP04vSTIFBB;
        s3_bucket test;
        set $url_full '$1';
        chop_prefix /test;
        proxy_set_header Authorization $s3_auth_token;
        proxy_set_header x-amz-date $aws_date;
    }
}
I have set the ceph bucket as public (not private). Request to kindly help. Regards, Ram ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] v0.94.2 Hammer released
This Hammer point release fixes a few critical bugs in RGW that can prevent objects starting with underscore from behaving properly and that prevent garbage collection of deleted objects when using the Civetweb standalone mode. All v0.94.x Hammer users are strongly encouraged to upgrade, and to make note of the repair procedure below if RGW is in use.
Upgrading from previous Hammer release
--------------------------------------
Bug #11442 introduced a change that made rgw objects that start with underscore incompatible with previous versions. The fix to that bug reverts to the previous behavior. In order to be able to access objects that start with an underscore and were created in prior Hammer releases, following the upgrade it is required to run (for each affected bucket)::
$ radosgw-admin bucket check --check-head-obj-locator \
    --bucket=bucket [--fix]
You can get a list of buckets with
$ radosgw-admin bucket list
Notable changes
---------------
* build: compilation error: No high-precision counter available (armhf, powerpc..) (#11432, James Page)
* ceph-dencoder links to libtcmalloc, and shouldn't (#10691, Boris Ranto)
* ceph-disk: disk zap sgdisk invocation (#11143, Owen Synge)
* ceph-disk: use a new disk as journal disk, ceph-disk prepare fail (#10983, Loic Dachary)
* ceph-objectstore-tool should be in the ceph server package (#11376, Ken Dreyer)
* librados: can get stuck in redirect loop if osdmap epoch == last_force_op_resend (#11026, Jianpeng Ma)
* librbd: A retransmit of proxied flatten request can result in -EINVAL (Jason Dillaman)
* librbd: ImageWatcher should cancel in-flight ops on watch error (#11363, Jason Dillaman)
* librbd: Objectcacher setting max object counts too low (#7385, Jason Dillaman)
* librbd: Periodic failure of TestLibRBD.DiffIterateStress (#11369, Jason Dillaman)
* librbd: Queued AIO reference counters not properly updated (#11478, Jason Dillaman)
* librbd: deadlock in image refresh (#5488, Jason Dillaman)
* librbd: notification race condition on snap_create (#11342, Jason Dillaman)
* mds: Hammer uclient checking (#11510, John Spray)
* mds: remove caps from revoking list when caps are voluntarily released (#11482, Yan, Zheng)
* messenger: double clear of pipe in reaper (#11381, Haomai Wang)
* mon: Total size of OSDs is a maginitude less than it is supposed to be. (#11534, Zhe Zhang)
* osd: don't check order in finish_proxy_read (#11211, Zhiqiang Wang)
* osd: handle old semi-deleted pgs after upgrade (#11429, Samuel Just)
* osd: object creation by write cannot use an offset on an erasure coded pool (#11507, Jianpeng Ma)
* rgw: Improve rgw HEAD request by avoiding read the body of the first chunk (#11001, Guang Yang)
* rgw: civetweb is hitting a limit (number of threads 1024) (#10243, Yehuda Sadeh)
* rgw: civetweb should use unique request id (#10295, Orit Wasserman)
* rgw: critical fixes for hammer (#11447, #11442, Yehuda Sadeh)
* rgw: fix swift COPY headers (#10662, #10663, #11087, #10645, Radoslaw Zarzynski)
* rgw: improve performance for large object (multiple chunks) GET (#11322, Guang Yang)
* rgw: init-radosgw: run RGW as root (#11453, Ken Dreyer)
* rgw: keystone token cache does not work correctly (#11125, Yehuda Sadeh)
* rgw: make quota/gc thread configurable for starting (#11047, Guang Yang)
* rgw: make swift responses of RGW return last-modified, content-length, x-trans-id headers (#10650, Radoslaw Zarzynski)
* rgw: merge manifests correctly when there's prefix override (#11622, Yehuda Sadeh)
* rgw: quota not respected in POST object (#11323, Sergey Arkhipov)
* rgw: restore buffer of multipart upload after EEXIST (#11604, Yehuda Sadeh)
* rgw: shouldn't need to disable rgw_socket_path if frontend is configured (#11160, Yehuda Sadeh)
* rgw: swift: Response header of GET request for container does not contain X-Container-Object-Count, X-Container-Bytes-Used and x-trans-id headers (#10666, Dmytro Iurchenko)
* rgw: swift: Response header of POST request for object does not contain content-length and x-trans-id headers (#10661, Radoslaw Zarzynski)
* rgw: swift: response for GET/HEAD on container does not contain the X-Timestamp header (#10938, Radoslaw Zarzynski)
* rgw: swift: response for PUT on /container does not contain the mandatory Content-Length header when FCGI is used (#11036, #10971, Radoslaw Zarzynski)
* rgw: swift: wrong handling of empty metadata on Swift container (#11088, Radoslaw Zarzynski)
* tests: TestFlatIndex.cc races with TestLFNIndex.cc (#11217, Xinze Chi)
* tests: ceph-helpers kill_daemons fails when kill fails (#11398, Loic Dachary)
For more detailed information, see the complete changelog at http://docs.ceph.com/docs/master/_downloads/v0.94.2.txt
Getting Ceph
------------
* Git at git://github.com/ceph/ceph.git
* Tarball at http://ceph.com/download/ceph-0.94.2.tar.gz
* For packages, see
Re: [ceph-users] ceph mount error
You probably didn't turn on an MDS, as that isn't set up by default anymore. I believe the docs tell you how to do that somewhere else. If that's not it, please provide the output of ceph -s. -Greg On Sun, Jun 7, 2015 at 8:14 AM, 张忠波 zhangzhongbo2...@163.com wrote: Hi , My ceph health is OK , And now , I want to build a Filesystem , refer to the CEPH FS QUICK START guide . http://ceph.com/docs/master/start/quick-cephfs/ however , I got a error when i use the command , mount -t ceph 192.168.1.105:6789:/ /mnt/mycephfs . error : mount error 22 = Invalid argument I refer to munual , and now , I don't know how to solve it . I am looking forward to your reply ! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] radosgw backup
You may be able to use replication. Here is a site showing a good example of how to set it up. I have not tested replicating within the same datacenter, but you should just be able to define a new zone within your existing ceph cluster and replicate to it. http://cephnotes.ksperis.com/blog/2015/03/13/radosgw-simple-replication-example From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Konstantin Ivanov Sent: Thursday, May 28, 2015 1:44 AM To: ceph-users@lists.ceph.com Subject: [ceph-users] radosgw backup Hi everyone. I'm wondering - is there way to backup radosgw data? What i already tried. create backup pool - copy .rgw.buckets to backup pool. Then i delete object via s3 client. And then i copy data from backup pool to .rgw.buckets. I still can't see object in s3 client, but can get it via http by early known url. Questions: where radosgw stores info about objects - (how to make restored object visible from s3 client)? is there best way for backup data for radosgw? Thanks for any advises. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph mount error
1) set up the MDS server
ceph-deploy mds --overwrite-conf create <hostname of mds server>
2) create the filesystem
ceph osd pool create cephfs_data 128
ceph osd pool create cephfs_metadata 16
ceph fs new cephfs cephfs_metadata cephfs_data
ceph fs ls
ceph mds stat
3) mount it!
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 张忠波 Sent: Sunday, June 07, 2015 8:15 AM To: ceph-us...@ceph.com; community Cc: xuzh@gmail.com Subject: [ceph-users] ceph mount error
Hi, My ceph health is OK, and now I want to build a Filesystem, referring to the CEPH FS QUICK START guide, http://ceph.com/docs/master/start/quick-cephfs/ . However, I got an error when I used the command mount -t ceph 192.168.1.105:6789:/ /mnt/mycephfs . error: mount error 22 = Invalid argument I referred to the manual, and I still don't know how to solve it. I am looking forward to your reply!
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
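If cephx is enabled (the default), mount error 22 is commonly the missing secret; a sketch using the monitor address from this thread:

# on a node with admin credentials
ceph auth get-key client.admin > admin.secret
mount -t ceph 192.168.1.105:6789:/ /mnt/mycephfs -o name=admin,secretfile=admin.secret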
Re: [ceph-users] Is Ceph right for me?
You might be able to accomplish that with something like dropbox or owncloud
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Trevor Robinson - Key4ce Sent: Wednesday, May 20, 2015 2:35 PM To: ceph-users@lists.ceph.com Subject: [ceph-users] Is Ceph right for me?
Hello, Could somebody please advise me if Ceph is suitable for our use? We are looking for a file system which is able to work over different locations which are connected by VPN. If one location were to go offline then the filesystem will stay online at both sites and then once connection is regained the latest file version will take priority. The main use will be for website files so the changes are most likely to be any uploaded files and cache files as a lot of the data will be stored in a SQL database which is already replicated.
With Kind Regards, Trevor Robinson CTO at Key4ce [Key4ce - IT Professionals] https://key4ce.com/ Skype: KeyMalus.Trev xmpp: t.robin...@im4ce.com Livechat: http://livechat.key4ce.com/ NL: +31 (0)40 290 3310 UK: +44 (0)1332 898 999 CN: +86 (0)7552 824 5985
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph mount error
Hi, Are you using cephx? If so, does your client have the appropriate key on it? It looks like you have an mds set up and running from your screenshot. Try mounting it like so: mount -t ceph -o name=admin,secret=[your secret] 192.168.1.105:6789:/ /mnt/mycephfs --Lincoln On Jun 7, 2015, at 10:14 AM, 张忠波 wrote: Hi , My ceph health is OK , And now , I want to build a Filesystem , refer to the CEPH FS QUICK START guide . http://ceph.com/docs/master/start/quick-cephfs/ however , I got a error when i use the command , mount -t ceph 192.168.1.105:6789:/ /mnt/mycephfs . error : mount error 22 = Invalid argument I refer to munual , and now , I don't know how to solve it . I am looking forward to your reply ! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Is Ceph right for me?
On 05/20/15 23:34, Trevor Robinson - Key4ce wrote: Hello, Could somebody please advise me if Ceph is suitable for our use? We are looking for a file system which is able to work over different locations which are connected by VPN. If one location were to go offline then the filesystem will stay online at both sites and then once connection is regained the latest file version will take priority. CephFS won't work well (or at all when the connections are lost). The only part of Ceph which would work is RGW replication, but you don't get a filesystem with it, and I'm under the impression that multi-master replication might be tricky (to be confirmed). Coda's goals seem to match your needs. I'm not sure if it's still actively developed (there is a client distributed with the Linux kernel though). http://www.coda.cs.cmu.edu/ Last time I tried it (several years ago) it worked well enough for me. The main use will be for website files so the changes are most likely to be any uploaded files and cache files as a lot of the data will be stored in a SQL database which is already replicated. If your setup is not too complex, you might simply handle this with rsync or unison. Best regards, Lionel ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
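For the simple case Lionel mentions, a sketch of the rsync route (host and paths are placeholders; run from cron or a deploy hook):

# push changed website files to the second site over the VPN
rsync -az --delete /var/www/ backup-site:/var/www/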
Re: [ceph-users] Ceph OSD with OCFS2
Hi, The Ceph journal works in a different way. It's a write-ahead journal: all the data will be persisted first in the journal and then written to the actual place. Journal data is encoded. The journal is a fixed-size partition/file and written sequentially. So, if you are placing the journal on HDDs, it will be overwritten; in the SSD case, it will be GC'd later. So, if you are measuring the amount of data written to the device, it will be double. But, if you are saying you have written a 500MB file to the cluster and you are seeing the actual file size is 10G, that should not be the case. How are you seeing this size BTW? Could you please tell us more about your configuration? What is the replication policy you are using? What interface did you use to store the data?
Regarding your other queries:
If I transfer 1GB of data, what will be the server size (OSD)? Will this be written in compressed format?
No, the actual data is not compressed. You don't want to fill up the OSD disk, and there are some limits you can set. Check the following link: http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/ It will stop working if the disk is 95% full by default.
Is it possible to take a backup of the server's compressed data and copy it to another machine as Server_Backup - then start a new client using Server_Backup?
For backup, check the following link if that works for you: https://ceph.com/community/blog/tag/backup/ Also, you can use an RGW federated config for backup.
Data removal is very slow.
How are you removing data? Are you removing an rbd image? If you are removing an entire pool, that should be fast; it deletes the data asynchronously, I guess.
Thanks Regards Somnath
From: gjprabu [mailto:gjpr...@zohocorp.com] Sent: Thursday, June 11, 2015 6:38 AM To: Somnath Roy Cc: ceph-users@lists.ceph.com; Kamala Subramani; Siva Sokkumuthu Subject: Re: RE: [ceph-users] Ceph OSD with OCFS2
Hi Team, Once the data transfer is completed the journal file should convert all in-memory data to its real place, but in our case it is showing double the size after the transfer completes, so everyone here is confused about the real file and folder size. Also, what will happen if I move the monitor from that OSD server to a separate one - might that solve the double size issue? We have the queries below as well.
1. An extra 2-3 mins is taken for hg / git repository operations like clone, pull, checkout and update.
2. If I transfer 1GB of data, what will be the server size (OSD)? Will this be written in compressed format?
3. Is it possible to take a backup of the server's compressed data and copy it to another machine as Server_Backup - then start a new client using Server_Backup?
4. Data removal is very slow.
Regards Prabu
On Fri, 05 Jun 2015 21:55:28 +0530 Somnath Roy somnath@sandisk.com wrote: Yes, Ceph will be writing twice, one for the journal and one for the actual data. Considering you configured the journal on the same device, this is what you end up seeing if you are monitoring the device BW. Thanks Regards Somnath
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of gjprabu Sent: Friday, June 05, 2015 3:07 AM To: ceph-users@lists.ceph.com Cc: Kamala Subramani; Siva Sokkumuthu Subject: [ceph-users] Ceph OSD with OCFS2
Dear Team, We are newly using ceph with two OSDs and two clients. Both clients are mounted with the OCFS2 file system. Suppose I transfer 500MB of data on the client: it is showing double the size (1GB) after the data transfer finishes. Is this behavior correct, or is there any solution for this?
Regards Prabu ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
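For the size question, the arithmetic is worth spelling out (a sketch, assuming the setup implied in this thread: replication size 2 and the journal co-located on the data disk):

500 MB written by the client
  x 2 replicas          = 1 GB stored across the cluster
  x 2 (journal + data)  = 2 GB of raw device-level writes

So a 500 MB transfer showing up as roughly 1 GB of used space across the OSDs is expected with 2x replication; the journal doubles device traffic but should not change the reported file or pool size.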
Re: [ceph-users] Is Ceph right for me?
You don't need a filesystem for that. I use csync2 with lsyncd and it works ok. Make sure if you use 2 or multi way sync and WinSCP to update files, first delete the old version, wait a second and then upload the new version. It will save you some head scratching... https://www.krystalmods.com/index.php?title=csync2-web-server-file-sync&more=1&c=1&tb=1&pb=1 Dan
On 6/11/2015 8:54 PM, Michael Kuriger wrote: You might be able to accomplish that with something like dropbox or owncloud
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Trevor Robinson - Key4ce Sent: Wednesday, May 20, 2015 2:35 PM To: ceph-users@lists.ceph.com Subject: [ceph-users] Is Ceph right for me?
Hello, Could somebody please advise me if Ceph is suitable for our use? We are looking for a file system which is able to work over different locations which are connected by VPN. If one location were to go offline then the filesystem will stay online at both sites and then once connection is regained the latest file version will take priority. The main use will be for website files so the changes are most likely to be any uploaded files and cache files as a lot of the data will be stored in a SQL database which is already replicated.
With Kind Regards, Trevor Robinson CTO at Key4ce [Key4ce - IT Professionals] https://key4ce.com/ Skype: KeyMalus.Trev xmpp: t.robin...@im4ce.com Livechat: http://livechat.key4ce.com/ NL: +31 (0)40 290 3310 UK: +44 (0)1332 898 999 CN: +86 (0)7552 824 5985
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] TR: High apply latency on OSD causes poor performance on VM
Turn off write cache on the controller. You're probably seeing the flush to disk. Tyler Bishop Chief Executive Officer 513-299-7108 x10 tyler.bis...@beyondhosting.net If you are not the intended recipient of this transmission you are notified that disclosing, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited.
From: Franck Allouis franck.allo...@stef.com To: ceph-users ceph-us...@ceph.com Sent: Friday, May 29, 2015 8:54:41 AM Subject: [ceph-users] TR: High apply latency on OSD causes poor performance on VM
Hi, Could you take a look at my problem? It's about high latency on my OSDs on HP G8 servers (ceph01, ceph02 and ceph03). When I run a rados bench for 60 sec, the results are surprising: after a few seconds there is no traffic, then it resumes, and so on. Finally, the maximum latency is high and the VMs' disks freeze a lot.
#rados bench -p pool-test-g8 60 write
Maintaining 16 concurrent writes of 4194304 bytes for up to 60 seconds or 0 objects
Object prefix: benchmark_data_ceph02_56745
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
0 0 0 0 0 0 - 0
1 16 82 66 263.959 264 0.0549584 0.171148
2 16 134 118 235.97 208 0.344873 0.232103
3 16 189 173 230.639 220 0.015583 0.24581
4 16 248 232 231.973 236 0.0704699 0.252504
5 16 306 290 231.974 232 0.0229872 0.258343
6 16 371 355 236.64 260 0.27183 0.255469
7 16 419 403 230.26 192 0.0503492 0.263304
8 16 460 444 221.975 164 0.0157241 0.261779
9 16 506 490 217.754 184 0.199418 0.271501
10 16 518 502 200.778 48 0.0472324 0.269049
11 16 518 502 182.526 0 - 0.269049
12 16 556 540 179.981 76 0.100336 0.301616
13 16 607 591 181.827 204 0.173912 0.346105
14 16 655 639 182.552 192 0.0484904 0.339879
15 16 683 667 177.848 112 0.0504184 0.349929
16 16 746 730 182.481 252 0.276635 0.347231
17 16 807 791 186.098 244 0.391491 0.339275
18 16 845 829 184.203 152 0.188608 0.342021
19 16 850 834 175.561 20 0.960175 0.342717
2015-05-28 17:09:48.397376 min lat: 0.013532 max lat: 6.28387 avg lat: 0.346987
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
20 16 859 843 168.582 36 0.0182246 0.346987
21 16 863 847 161.316 16 3.18544 0.355051
22 16 897 881 160.165 136 0.0811037 0.371209
23 16 901 885 153.897 16 0.0482124 0.370793
24 16 943 927 154.484 168 0.63064 0.397204
25 15 997 982 157.104 220 0.0933448 0.392701
26 16 1058 1042 160.291 240 0.166463 0.385943
27 16 1088 1072 158.798 120 1.63882 0.388568
28 16 1125 1109 158.412 148 0.0511479 0.38419
29 16 1155 1139 157.087 120 0.162266 0.385898
30 16 1163 1147 152.917 32 0.0682181 0.383571
31 16 1190 1174 151.468 108 0.0489185 0.386665
32 16 1196 1180 147.485 24 2.95263 0.390657
33 16 1213 1197 145.076 68 0.0467788 0.389299
34 16 1265 1249 146.926 208 0.0153085 0.420687
35 16 1332 1316 150.384 268 0.0157061 0.42259
36 16 1374 1358 150.873 168 0.251626 0.417373
37 16 1402 1386 149.822 112 0.0475302 0.413886
38 16 1444 1428 150.3 168 0.0507577 0.421055
39 16 1500 1484 152.189 224 0.0489163 0.416872
2015-05-28 17:10:08.399434 min lat: 0.013532 max lat: 9.26596 avg lat: 0.415296
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
40 16 1530 1514 151.384 120 0.951713 0.415296
41 16 1551 1535 149.741 84 0.0686787 0.416571
42 16 1606 1590 151.413 220 0.0826855 0.41684
43 16 1656 1640 152.542 200 0.0706539 0.409974
44 16 1663 1647 149.712 28 0.046672 0.408476
45 16 1685 1669 148.34 88 0.0989566 0.424918
46 16 1707 1691 147.028 88 0.0490569 0.421116
47 16 1707 1691 143.9 0 - 0.421116
48 16 1707 1691 140.902 0 - 0.421116
49 16 1720 1704 139.088 17. 0.0480335 0.428997
50 16 1752 1736 138.866 128 0.053219 0.4416
51 16 1786 1770 138.809 136 0.602946 0.440357
52 16 1810 1794 137.986 96 0.0472518 0.438376
53 16 1831 1815 136.967 84 0.0148999 0.446801
54 16 1831 1815 134.43 0 - 0.446801
55 16 1853 1837 133.586 44 0.0499486 0.455561
56 16 1898 1882 134.415 180 0.0566593 0.461019
57 16 1932 1916 134.442 136 0.0162902 0.454385
58 16 1948 1932 133.227 64 0.62188 0.464403
59 16 1966 1950 132.19 72 0.563613 0.472147
2015-05-28 17:10:28.401525 min lat: 0.013532 max lat: 12.4828 avg lat: 0.472084
sec Cur ops started finished avg MB/s cur MB/s last lat avg lat
60 16 1983 1967 131.12 68 0.030789 0.472084
61 16 1984 1968 129.036 4 0.0519125 0.471871
62 16 1984 1968 126.955 0 - 0.471871
63 16 1984 1968 124.939 0 - 0.471871
64 14 1984 1970 123.112 2.7 4.20878 0.476035
Total time run: 64.823355
Total writes made: 1984
Write size: 4194304
Bandwidth (MB/sec): 122.425
Stddev Bandwidth: 85.3816
Max bandwidth (MB/sec): 268
Min bandwidth (MB/sec): 0
Average Latency: 0.520956
Stddev Latency: 1.17678
Max latency: 12.4828
Min latency: 0.013532
I have installed a new ceph06 box which has best latencies but hardware is different (RAID card, disks,
Re: [ceph-users] Restarting OSD leads to lower CPU usage
Yeah, perf top will help you a lot. Some guesses:
1. If your block size is in the small 4-16K range, most probably you are hitting the tcmalloc issue. 'perf top' will show up with a lot of tcmalloc traces in that case.
2. fdcache should save you some cpu, but I don't see it being that significant.
Thanks Regards Somnath
-Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan Schermer Sent: Thursday, June 11, 2015 5:57 AM To: Dan van der Ster Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Restarting OSD leads to lower CPU usage
I have no experience with perf and the package is not installed. I will take a look at it, thanks. Jan
On 11 Jun 2015, at 13:48, Dan van der Ster d...@vanderster.com wrote: Hi Jan, Can you get perf top running? It should show you where the OSDs are spinning... Cheers, Dan On Thu, Jun 11, 2015 at 11:21 AM, Jan Schermer j...@schermer.cz wrote: Hi, hoping someone can point me in the right direction. Some of my OSDs have a larger CPU usage (and ops latencies) than others. If I restart the OSD everything runs nicely for some time, then it creeps up. 1) most of my OSDs have ~40% CPU (core) usage (user+sys), some are closer to 80%. Restarting means the offending OSDs only use 40% again. 2) average latencies and CPU usage on the host are the same - so it’s not caused by the host that the OSD is running on 3) I can’t say exactly when or how the issue happens. I can’t even say if it’s the same OSDs. It seems it either happens when something heavy happens in a cluster (like dropping very old snapshots, rebalancing) and then doesn’t come back, or maybe it happens slowly over time and I can’t find it in the graphs. Looking at the graphs it seems to be the former. I have just one suspicion and that is the “fd cache size” - we have it set to 16384 but the open fds suggest there are more open files for the osd process (over 17K fds) - it varies by some hundreds between the osds. Maybe some are just slightly over the limit and the misses cause this? Restarting the OSD clears them (~2K) and they increase over time. I increased it to 32768 yesterday and it consistently nice now, but it might take another few days to manifest… Could this explain it? Any other tips? Thanks Jan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Restarting OSD leads to lower CPU usage
Hi, I looked at it briefly before leaving; tcmalloc was at the top. I can provide a full listing tomorrow if it helps.
12.80% libtcmalloc.so.4.1.0 [.] tcmalloc::CentralFreeList::FetchFromSpans()
8.40% libtcmalloc.so.4.1.0 [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned long, int)
7.40% [kernel] [k] futex_wake
6.36% libtcmalloc.so.4.1.0 [.] tcmalloc::CentralFreeList::ReleaseToSpans(void*)
6.09% [kernel] [k] futex_requeue
Not much else to see. We tried setting the variable TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES, but it only got much much worse (default 16MB, tried 8MB and up to 512MB; it was unusably slow immediately after start). We haven’t tried upgrading tcmalloc, though... We only use Ceph for RBD with OpenStack; the block size is the default (4MB). I tested different block sizes previously, and I got the best results from 8MB blocks (and I was benchmarking 4K random direct/sync writes) - strange, I think… I increased fdcache to 12 (which should be enough for all objects on the OSD), and I will compare how it behaves tomorrow. Thanks a lot Jan
On 11 Jun 2015, at 20:59, Somnath Roy somnath@sandisk.com wrote: Yeah, perf top will help you a lot.. Some guess: 1. If your block size is small 4-16K range, most probably you are hitting the tcmalloc issue. 'perf top' will show up with lot of tcmalloc traces in that case. 2. fdcache should save you some cpu but I don't see it will be that significant. Thanks Regards Somnath
-Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jan Schermer Sent: Thursday, June 11, 2015 5:57 AM To: Dan van der Ster Cc: ceph-users@lists.ceph.com Subject: Re: [ceph-users] Restarting OSD leads to lower CPU usage
I have no experience with perf and the package is not installed. I will take a look at it, thanks. Jan
On 11 Jun 2015, at 13:48, Dan van der Ster d...@vanderster.com wrote: Hi Jan, Can you get perf top running? It should show you where the OSDs are spinning... Cheers, Dan On Thu, Jun 11, 2015 at 11:21 AM, Jan Schermer j...@schermer.cz wrote: Hi, hoping someone can point me in the right direction. Some of my OSDs have a larger CPU usage (and ops latencies) than others. If I restart the OSD everything runs nicely for some time, then it creeps up. 1) most of my OSDs have ~40% CPU (core) usage (user+sys), some are closer to 80%. Restarting means the offending OSDs only use 40% again. 2) average latencies and CPU usage on the host are the same - so it’s not caused by the host that the OSD is running on 3) I can’t say exactly when or how the issue happens. I can’t even say if it’s the same OSDs. It seems it either happens when something heavy happens in a cluster (like dropping very old snapshots, rebalancing) and then doesn’t come back, or maybe it happens slowly over time and I can’t find it in the graphs. Looking at the graphs it seems to be the former. I have just one suspicion and that is the “fd cache size” - we have it set to 16384 but the open fds suggest there are more open files for the osd process (over 17K fds) - it varies by some hundreds between the osds. Maybe some are just slightly over the limit and the misses cause this? Restarting the OSD clears them (~2K) and they increase over time. I increased it to 32768 yesterday and it consistently nice now, but it might take another few days to manifest… Could this explain it? Any other tips?
Thanks Jan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com PLEASE NOTE: The information contained in this electronic mail message is intended only for the use of the designated recipient(s) named above. If the reader of this message is not the intended recipient, you are hereby notified that you have received this message in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify the sender by telephone or e-mail (as shown above) immediately and destroy any and all copies of this message in your possession (whether hard copies or electronically stored copies). ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
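For anyone who wants to repeat the thread-cache experiment, a sketch of how the variable is usually applied (the sysconfig path is an assumption; your init scripts may source a different environment file):

  # /etc/sysconfig/ceph (assumed location)
  TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728   # 128MB thread cache

  # restart one OSD so the daemon picks it up
  service ceph restart osd.12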
Re: [ceph-users] Restarting OSD leads to lower CPU usage
Yeah! Then it is the tcmalloc issue. If you are using the version that comes with the OS, TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES won't do anything. Try building the latest tcmalloc, set the env variable, and see if it improves things or not. Also, you can try the latest Ceph build with jemalloc enabled if you have a test cluster.
Thanks & Regards
Somnath
-----Original Message-----
From: Jan Schermer [mailto:j...@schermer.cz]
Sent: Thursday, June 11, 2015 12:10 PM
To: Somnath Roy
Cc: Dan van der Ster; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Restarting OSD leads to lower CPU usage
[snip]
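A rough sketch of both suggestions (the install prefix, library versions and the OSD id are placeholders, not a tested recipe):

  # build a recent gperftools/tcmalloc from source
  ./configure --prefix=/opt/gperftools && make && sudo make install

  # run an OSD against it, with a bigger thread cache
  LD_PRELOAD=/opt/gperftools/lib/libtcmalloc.so.4 \
  TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728 \
  ceph-osd -i 12 -f

  # or, to test jemalloc instead
  LD_PRELOAD=/usr/lib64/libjemalloc.so.1 ceph-osd -i 12 -f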
Re: [ceph-users] Is Ceph right for me?
Alternatively you could just use git (or some other form of versioning system): host your code/files/html/whatever in git, make changes to the git tree, and then trigger a git pull from your webservers to the local filesystem. This gives you the ability to use branches/versions to control your webserver content, and you can easily roll back to a previous version if you need to. You can create a dev branch, make changes to it, and host it on a test web server; once approved, push the changes to the master branch and trigger the refresh on the web servers (see the sketch after this message).
~~shane
On 6/11/15, 11:28 AM, Lionel Bouton lionel+c...@bouton.name wrote:
On 05/20/15 23:34, Trevor Robinson - Key4ce wrote:
Hello, could somebody please advise me if Ceph is suitable for our use? We are looking for a file system which is able to work over different locations which are connected by VPN. If one location were to go offline, the filesystem would stay online at both sites, and once the connection is regained the latest file version would take priority.
CephFS won't work well (or at all when the connections are lost). The only part of Ceph which would work is RGW replication, but you don't get a filesystem with it, and I'm under the impression that multi-master replication might be tricky (to be confirmed). Coda's goals seem to match your needs. I'm not sure if it's still actively developed (there is a client distributed with the Linux kernel, though). http://www.coda.cs.cmu.edu/ Last time I tried it (several years ago) it worked well enough for me.
The main use will be for website files, so the changes are most likely to be any uploaded files and cache files, as a lot of the data will be stored in a SQL database which is already replicated.
If your setup is not too complex, you might simply handle this with rsync or unison.
Best regards, Lionel
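A sketch of the git-based flow (the repository location, remote and branch names are assumptions):

  # on the test web server: try out the dev branch
  git -C /var/www/site fetch origin
  git -C /var/www/site checkout dev

  # once approved: merge to master
  git checkout master && git merge dev && git push origin master

  # on each production web server (e.g. from a cron job or deploy hook)
  git -C /var/www/site pull origin master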
Re: [ceph-users] Restarting OSD leads to lower CPU usage
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES works, or at least seems to; it just did nothing positive. This is on a CentOS 6-ish distro. I can't really upgrade anything easily because of support, and we still run 0.67.12 in production, so that's a no-go. I know upgrading to Giant is the best way to achieve more performance, but we're not ready for that yet either (but working on it :))
I'd expect the tcmalloc issue to manifest almost immediately, though? There are thousands of threads and hundreds of connections; surely it would show up sooner? People were seeing regressions with just two clients in benchmarks, so I thought we were operating with a b0rked thread cache constantly...
For the record, preloading jemalloc ends with SIGSEGV within a few minutes, if anybody wanted to know... :)
Jan
On 11 Jun 2015, at 21:14, Somnath Roy somnath@sandisk.com wrote:
[snip]
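Since the fd-cache suspicion from the start of this thread is easy to check, here is a small sketch for comparing open fds per OSD against the configured cache size:

  # count open file descriptors for each running ceph-osd
  for pid in $(pidof ceph-osd); do
    echo "osd pid $pid: $(ls /proc/$pid/fd | wc -l) open fds"
  done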
[ceph-users] Ceph giant installation fails on rhel 7.0
I am trying to install Ceph Giant on RHEL 7.0. While installing ceph-common-0.87.2-0.el7.x86_64.rpm, I am getting the following dependency error:

$ sudo yum install ceph-common-0.87.2-0.el7.x86_64.rpm
Loaded plugins: amazon-id, priorities, rhui-lb
Examining ceph-common-0.87.2-0.el7.x86_64.rpm: 1:ceph-common-0.87.2-0.el7.x86_64
Marking ceph-common-0.87.2-0.el7.x86_64.rpm to be installed
Resolving Dependencies
--> Running transaction check
---> Package ceph-common.x86_64 1:0.87.2-0.el7 will be installed
--> Processing Dependency: libtcmalloc.so.4()(64bit) for package: 1:ceph-common-0.87.2-0.el7.x86_64
--> Finished Dependency Resolution
Error: Package: 1:ceph-common-0.87.2-0.el7.x86_64 (/ceph-common-0.87.2-0.el7.x86_64)
       Requires: libtcmalloc.so.4()(64bit)
 You could try using --skip-broken to work around the problem
 You could try running: rpm -Va --nofiles --nodigest

So I am trying to install gperftools-libs to resolve the dependency, but I am unable to get the package using yum install. Can anyone help me with the complete list of dependencies needed to install Ceph Giant on RHEL 7.0?
Thanks, Shambhu
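For what it's worth, on RHEL/CentOS 7 the gperftools-libs package that provides libtcmalloc.so.4 normally lives in EPEL, so enabling that repository is the usual fix. A sketch (how you obtain epel-release depends on your subscription setup):

  # CentOS: epel-release is in the extras repo
  sudo yum install epel-release
  # RHEL: install the epel-release RPM from dl.fedoraproject.org/pub/epel/ instead

  sudo yum install gperftools-libs
  sudo yum install ceph-common-0.87.2-0.el7.x86_64.rpm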
Re: [ceph-users] Ceph OSD with OCFS2
Hi Team, once the data transfer is complete, the journal should flush the in-memory data to its real location, but in our case the usage still shows double the size after the transfer finishes, so everyone is confused about which numbers are the real file and folder sizes. Also, what will happen if I move the monitor from that OSD server to a separate one; could that solve the double-size issue? We have the queries below as well.
1. An extra 2-3 minutes is taken for hg/git repository operations like clone, pull, checkout and update.
2. If I transfer 1GB of data, what will be the size on the server (OSD)? Will this be written in a compressed format?
3. Is it possible to take a backup of the server data and copy it to another machine as Server_Backup, then start a new client using Server_Backup?
4. Data removal is very slow.
Regards
Prabu
On Fri, 05 Jun 2015 21:55:28 +0530 Somnath Roy <somnath@sandisk.com> wrote:
Yes, Ceph will be writing twice: once for the journal and once for the actual data. Considering you configured the journal on the same device, this is what you end up seeing if you are monitoring the device bandwidth.
Thanks & Regards
Somnath
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of gjprabu
Sent: Friday, June 05, 2015 3:07 AM
To: ceph-users@lists.ceph.com
Cc: Kamala Subramani; Siva Sokkumuthu
Subject: [ceph-users] Ceph OSD with OCFS2
Dear Team, we are newly using Ceph with two OSDs and two clients. Both clients are mounted with the OCFS2 file system. Suppose I transfer 500MB of data on the client; it shows double the size (1GB) after the data transfer finishes. Is this behavior correct, or is there any solution for this?
Regards
Prabu
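Since the doubling comes from journal writes landing on the same device as the data, pointing the journal at a separate device is the common mitigation. A minimal ceph.conf sketch (the partition path is an assumption):

  [osd]
  osd journal size = 10240        # journal size in MB

  [osd.0]
  osd journal = /dev/sdb1         # dedicated journal partition (assumed)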
Re: [ceph-users] Restarting OSD leads to lower CPU usage
I have no experience with perf and the package is not installed. I will take a look at it, thanks.
Jan
On 11 Jun 2015, at 13:48, Dan van der Ster d...@vanderster.com wrote:
Hi Jan, can you get perf top running? It should show you where the OSDs are spinning...
Cheers, Dan
On Thu, Jun 11, 2015 at 11:21 AM, Jan Schermer j...@schermer.cz wrote:
[snip]
Re: [ceph-users] Is Ceph right for me?
Hi Trevor, probably csync2 could work for you (a config sketch follows below).
Best, Karsten
On 11.06.2015 7:30 PM, Trevor Robinson - Key4ce t.robin...@key4ce.com wrote:
Hello, could somebody please advise me if Ceph is suitable for our use? We are looking for a file system which is able to work over different locations which are connected by VPN. If one location were to go offline, the filesystem would stay online at both sites, and once the connection is regained the latest file version would take priority. The main use will be for website files, so the changes are most likely to be any uploaded files and cache files, as a lot of the data will be stored in a SQL database which is already replicated.
With kind regards,
Trevor Robinson, CTO at Key4ce
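If csync2 fits, a minimal configuration sketch for two web hosts might look like this (host names, key path and the synced directory are all assumptions):

  # /etc/csync2.cfg
  group web {
    host web1 web2;
    key /etc/csync2.key;
    include /var/www;
    auto younger;     # on conflict, the newer file wins
  }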
[ceph-users] MONs not forming quorum
Hi folks, I'm trying to deploy 0.94.2 (Hammer) onto CentOS 7. I used to be pretty good at this on Ubuntu, but it has been a while. Anyway, my monitors are not forming quorum, and I'm not sure why. They can definitely all ping each other and such. Any thoughts on specific problems in the output below, or just general causes for monitors not forming quorum, or where to get more debug information on what is going wrong? Thanks!!

[root@bdca151 ceph]# ceph-deploy mon create-initial bdca15{0,2,3}
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO ] Invoked (1.5.25): /bin/ceph-deploy mon create-initial bdca150 bdca152 bdca153
[ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts bdca150 bdca152 bdca153
[ceph_deploy.mon][DEBUG ] detecting platform for host bdca150 ...
[bdca150][DEBUG ] connected to host: bdca150
[bdca150][DEBUG ] detect platform information from remote host
[bdca150][DEBUG ] detect machine type
[ceph_deploy.mon][INFO ] distro info: CentOS Linux 7.1.1503 Core
[bdca150][DEBUG ] determining if provided host has same hostname in remote
[bdca150][DEBUG ] get remote short hostname
[bdca150][DEBUG ] deploying mon to bdca150
[bdca150][DEBUG ] get remote short hostname
[bdca150][DEBUG ] remote hostname: bdca150
[bdca150][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[bdca150][DEBUG ] create the mon path if it does not exist
[bdca150][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-bdca150/done
[bdca150][DEBUG ] done path does not exist: /var/lib/ceph/mon/ceph-bdca150/done
[bdca150][INFO ] creating keyring file: /var/lib/ceph/tmp/ceph-bdca150.mon.keyring
[bdca150][DEBUG ] create the monitor keyring file
[bdca150][INFO ] Running command: ceph-mon --cluster ceph --mkfs -i bdca150 --keyring /var/lib/ceph/tmp/ceph-bdca150.mon.keyring
[bdca150][DEBUG ] ceph-mon: renaming mon.noname-a 10.1.0.150:6789/0 to mon.bdca150
[bdca150][DEBUG ] ceph-mon: set fsid to 770514ba-65e6-475b-8d43-ad6ee850ead6
[bdca150][DEBUG ] ceph-mon: created monfs at /var/lib/ceph/mon/ceph-bdca150 for mon.bdca150
[bdca150][INFO ] unlinking keyring file /var/lib/ceph/tmp/ceph-bdca150.mon.keyring
[bdca150][DEBUG ] create a done file to avoid re-doing the mon deployment
[bdca150][DEBUG ] create the init path if it does not exist
[bdca150][DEBUG ] locating the `service` executable...
[bdca150][INFO ] Running command: /usr/sbin/service ceph -c /etc/ceph/ceph.conf start mon.bdca150
[bdca150][DEBUG ] === mon.bdca150 ===
[bdca150][DEBUG ] Starting Ceph mon.bdca150 on bdca150...
[bdca150][WARNIN] Running as unit run-52328.service.
[bdca150][DEBUG ] Starting ceph-create-keys on bdca150...
[bdca150][INFO ] Running command: systemctl enable ceph
[bdca150][WARNIN] ceph.service is not a native service, redirecting to /sbin/chkconfig.
[bdca150][WARNIN] Executing /sbin/chkconfig ceph on
[bdca150][WARNIN] The unit files have no [Install] section. They are not meant to be enabled
[bdca150][WARNIN] using systemctl.
[bdca150][WARNIN] Possible reasons for having this kind of units are:
[bdca150][WARNIN] 1) A unit may be statically enabled by being symlinked from another unit's
[bdca150][WARNIN]    .wants/ or .requires/ directory.
[bdca150][WARNIN] 2) A unit's purpose may be to act as a helper for some other unit which has
[bdca150][WARNIN]    a requirement dependency on it.
[bdca150][WARNIN] 3) A unit may be started when needed via activation (socket, path, timer,
[bdca150][WARNIN]    D-Bus, udev, scripted systemctl call, ...).
[bdca150][INFO ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.bdca150.asok mon_status
[bdca150][DEBUG ]
[bdca150][DEBUG ] status for monitor: mon.bdca150
[bdca150][DEBUG ] {
[bdca150][DEBUG ]   election_epoch: 0,
[bdca150][DEBUG ]   extra_probe_peers: [
[bdca150][DEBUG ]     10.1.0.152:6789/0,
[bdca150][DEBUG ]     10.1.0.153:6789/0
[bdca150][DEBUG ]   ],
[bdca150][DEBUG ]   monmap: {
[bdca150][DEBUG ]     created: 0.00,
[bdca150][DEBUG ]     epoch: 0,
[bdca150][DEBUG ]     fsid: 770514ba-65e6-475b-8d43-ad6ee850ead6,
[bdca150][DEBUG ]     modified: 0.00,
[bdca150][DEBUG ]     mons: [
[bdca150][DEBUG ]       { addr: 10.1.0.150:6789/0, name: bdca150, rank: 0 },
[bdca150][DEBUG ]       { addr: 0.0.0.0:0/1, name: bdca152, rank: 1 },
[bdca150][DEBUG ]       { addr: 0.0.0.0:0/2, name: bdca153, rank: 2 }
[bdca150][DEBUG ]     ]
[bdca150][DEBUG ]   },
[bdca150][DEBUG ]   name: bdca150,
[bdca150][DEBUG ]   outside_quorum: [
[bdca150][DEBUG ]     bdca150
[bdca150][DEBUG ]   ],
[bdca150][DEBUG ]   quorum: [],
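The addr: 0.0.0.0:0/1 entries for bdca152 and bdca153 in that monmap are the classic symptom of the initial monitors not knowing each other's addresses when the monmap is built. Declaring them explicitly in ceph.conf before running create-initial usually avoids this; a sketch using the addresses from the output above (the netmask is an assumption):

  [global]
  fsid = 770514ba-65e6-475b-8d43-ad6ee850ead6
  mon initial members = bdca150, bdca152, bdca153
  mon host = 10.1.0.150,10.1.0.152,10.1.0.153
  public network = 10.1.0.0/24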
[ceph-users] anyone using CephFS for HPC?
Wondering if anyone has done comparisons between CephFS and other parallel filesystems like Lustre typically used in HPC deployments either for scratch storage or persistent storage to support HPC workflows? thanks.
Re: [ceph-users] rbd_cache, limiting read on high iops around 40k
Hi Alexandre, I agree with your rationale of one iothread per disk; the CPU consumed in iowait is pretty high in each VM. But I am not finding a way to set this on a Nova instance. I am using OpenStack Juno with QEMU+KVM.
As per the libvirt documentation for setting iothreads, I can edit domain.xml directly and achieve the same effect. However, in an OpenStack environment the domain XML is created by Nova with some additional metadata, so editing the domain XML using 'virsh edit' does not seem to work (I agree it is not a very cloud way of doing things, but a hack). Changes made there vanish after saving them, because libvirt validation fails on the result:

#virsh dumpxml instance-00c5 > vm.xml
#virt-xml-validate vm.xml
Relax-NG validity error : Extra element cpu in interleave
vm.xml:1: element domain: Relax-NG validity error : Element domain failed to validate content
vm.xml fails to validate

The second approach I took was setting QoS in volume types, but there is no option to set iothreads per volume; the parameters are related to max read/write ops/bytes. Thirdly, editing the Nova flavor and providing extra specs like hw:cpu_socket/thread/core can change the guest CPU topology, but again there is no way to set iothreads. It does accept hw_disk_iothreads (no type check in place, I believe), but cannot pass it into domain.xml. Could you suggest a way to set this?
-Pushpesh
On Wed, Jun 10, 2015 at 12:59 PM, Alexandre DERUMIER aderum...@odiso.com wrote:
I need to try out the performance on qemu soon and may come back to you if I need some qemu setting trick :-)
Sure, no problem. (BTW, I can reach around 200k iops in 1 qemu vm with 5 virtio disks with 1 iothread per disk)
----- Original Message -----
From: Somnath Roy somnath@sandisk.com
To: aderumier aderum...@odiso.com, Irek Fasikhov malm...@gmail.com
Cc: ceph-devel ceph-de...@vger.kernel.org, pushpesh sharma pushpesh@gmail.com, ceph-users ceph-users@lists.ceph.com
Sent: Wednesday 10 June 2015 09:06:32
Subject: RE: rbd_cache, limiting read on high iops around 40k
Hi Alexandre, thanks for sharing the data. I need to try out the performance on qemu soon and may come back to you if I need some qemu setting trick :-)
Regards, Somnath
-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Alexandre DERUMIER
Sent: Tuesday, June 09, 2015 10:42 PM
To: Irek Fasikhov
Cc: ceph-devel; pushpesh sharma; ceph-users
Subject: Re: [ceph-users] rbd_cache, limiting read on high iops around 40k
Very good work! Do you have an rpm file? Thanks.
No, sorry, I compiled it manually (and I'm using Debian Jessie as the client).
----- Original Message -----
De: Irek Fasikhov malm...@gmail.com
Hi, Alexandre. Very good work! Do you have an rpm file? Thanks.
2015-06-10 7:10 GMT+03:00 Alexandre DERUMIER aderum...@odiso.com:
Hi, I have tested qemu with the latest tcmalloc 2.4, and the improvement is huge with an iothread: 50k iops (+45%)!

qemu : no-iothread : glibc           : iops=33395
qemu : no-iothread : tcmalloc (2.2.1): iops=34516 (+3%)
qemu : no-iothread : jemalloc        : iops=42226 (+26%)
qemu : no-iothread : tcmalloc (2.4)  : iops=35974 (+7%)
qemu : iothread    : glibc           : iops=34516
qemu : iothread    : tcmalloc        : iops=38676 (+12%)
qemu : iothread    : jemalloc        : iops=28023 (-19%)
qemu : iothread    : tcmalloc (2.4)  : iops=50276 (+45%)

rbd_iodepth32-test: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=32
fio-2.1.11
Starting 1 process
Jobs: 1 (f=1): [r(1)] [100.0% done] [214.7MB/0KB/0KB /s] [54.1K/0/0 iops] [eta 00m:00s]
rbd_iodepth32-test: (groupid=0, jobs=1): err= 0: pid=894: Wed Jun 10 05:54:24 2015
  read : io=5120.0MB, bw=201108KB/s, iops=50276, runt= 26070msec
  slat (usec): min=1, max=1136, avg= 3.54, stdev= 3.58
  clat (usec): min=128, max=6262, avg=631.41, stdev=197.71
  lat (usec): min=149, max=6265, avg=635.27, stdev=197.40
  clat percentiles (usec):
   |  1.00th=[  318],  5.00th=[  378], 10.00th=[  418], 20.00th=[  474],
   | 30.00th=[  516], 40.00th=[  564], 50.00th=[  612], 60.00th=[  652],
   | 70.00th=[  700], 80.00th=[  756], 90.00th=[  860], 95.00th=[  980],
   | 99.00th=[ 1272], 99.50th=[ 1384], 99.90th=[ 1688], 99.95th=[ 1896],
   | 99.99th=[ 3760]
  bw (KB /s): min=145608, max=249688, per=100.00%, avg=201108.00, stdev=21718.87
  lat (usec) : 250=0.04%, 500=25.84%, 750=53.00%, 1000=16.63%
  lat (msec) : 2=4.46%, 4=0.03%, 10=0.01%
  cpu : usr=9.73%, sys=24.93%, ctx=66417, majf=0, minf=38
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
  submit
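For reference, the relevant pieces of a libvirt domain XML with iothreads look roughly like this (requires a reasonably recent libvirt; the RBD source name is an assumption and the disk definition is abbreviated):

  <domain type='kvm'>
    ...
    <iothreads>4</iothreads>
    <devices>
      <disk type='network' device='disk'>
        <driver name='qemu' type='raw' cache='writeback' iothread='1'/>
        <source protocol='rbd' name='vms/instance-00c5-disk-1'/>
        ...
      </disk>
    </devices>
  </domain>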
Re: [ceph-users] anyone using CephFS for HPC?
On Thu, Jun 11, 2015 at 10:31 PM, Nigel Williams nigel.d.willi...@gmail.com wrote:
Wondering if anyone has done comparisons between CephFS and other parallel filesystems like Lustre typically used in HPC deployments either for scratch storage or persistent storage to support HPC workflows?
Oak Ridge had a paper at Supercomputing a couple of years ago about this from their perspective. I don't remember how many of its concerns are still up-to-date, and the test evaluation was on repurposed Lustre hardware so it was a bit odd, but it might give you some stuff to think about. Sage's thesis or some of the earlier papers will be happy to tell you all the ways in which Ceph > Lustre, of course, since creating a successor is how the project started. ;)
-Greg
[ceph-users] New to CEPH - VR@Sheeltron
Dear Sir, I am new to Ceph and have the following queries:
1. I have been using OpenNAS, OpenFiler, Gluster and Nexenta as storage OSes. How is Ceph different from Gluster and Nexenta?
2. I also use Lustre for our storage in an HPC environment. Can Ceph be substituted for Lustre?
3. What is the minimum capacity of storage (in TB) at which Ceph can be deployed? What is the typical hardware configuration required to support Ceph? Can we use 'commodity hardware' like TYAN servers and JBODs to stack up the HDDs? Do you need RAID controllers, or is the RAID/LUN built by the OS?
4. Do you have any doc that gives me comparisons with other software-based storage?
Thanks & Regards,
V.Ranganath
VP - SIS Division, Sheeltron Digital Systems Pvt. Ltd.
E-mail: ra...@sheeltron.com
Re: [ceph-users] calculating maximum number of disk and node failure that can be handled by cluster with out data loss
I wrote a script which calculates the data durability SLA depending on many factors like disk size, network speed, number of hosts, etc. It assumes a recovery time three times greater than strictly needed, to account for client IO having priority over recovery IO. For 2TB disks and a 10Gb network it shows a bright picture:

OSDs: 10   SLA: 100.00%
OSDs: 20   SLA: 100.00%
OSDs: 30   SLA: 100.00%
OSDs: 40   SLA: 100.00%
OSDs: 50   SLA: 100.00%
OSDs: 100  SLA: 100.00%
OSDs: 200  SLA: 100.00%
OSDs: 500  SLA: 99.99%

For a 1Gb network it shows 7-8 nines on every line. So if my estimations are correct, we are almost safe from triple-failure data loss. The script is in the attachment. Any good criticism is welcome.
Regards, Vasily.
On Thu, Jun 11, 2015 at 3:37 AM, Christian Balzer ch...@gol.com wrote:
Hello,
On Wed, 10 Jun 2015 23:53:48 +0300 Vasiliy Angapov wrote:
Hi, I also wrote a simple script which calculates the data loss probabilities for a triple disk failure. Here are some numbers:

OSDs: 10,  Pr: 138.89%
OSDs: 20,  Pr: 29.24%
OSDs: 30,  Pr: 12.32%
OSDs: 40,  Pr: 6.75%
OSDs: 50,  Pr: 4.25%
OSDs: 100, Pr: 1.03%
OSDs: 200, Pr: 0.25%
OSDs: 500, Pr: 0.04%

Nice, good to have some numbers.
Here I assumed we have 100 PGs per OSD. There is also a constraint that the 3 disks must not be in one host, because that would not lead to a failure. For a situation where all disks are evenly distributed between 10 hosts, this gives a correction coefficient of 83%, so for 50 OSDs it will be something like 3.53% instead of 4.25%. There is a further constraint for 2 disks in one host and 1 disk on another, but that just adds unneeded complexity; the numbers will not change significantly. And actually a triple simultaneous failure is itself not very likely to happen, so I believe that starting from 100 OSDs we can somewhat relax about data failure.
I mentioned the link below before; I found it to be one of the more believable RAID failure calculators, and they explain their shortcomings nicely to boot. I usually halve their DLO/year values (doubling the chance of data loss) to be on the safe side: https://www.memset.com/tools/raid-calculator/
If you plug in a 100-disk RAID6 (the equivalent of replica 3) with 2TB per disk and a recovery rate of 100MB/s, the odds are indeed pretty good. But note the expected disk failure rate of one per 10 days! Of course the biggest variable here is how fast your recovery speed will be. I picked 100MB/s because for some people that will be as fast as their network goes. For others the network could be 10-40 times as fast, but their cluster might not have enough OSDs (or fast enough ones) to remain usable at those speeds, so they'll opt for lower-priority recovery speeds.
Christian
BTW, this presentation has more math: http://www.slideshare.net/kioecn/build-an-highperformance-and-highdurable-block-storage-service-based-on-ceph
Regards, Vasily.
On Wed, Jun 10, 2015 at 12:38 PM, Dan van der Ster d...@vanderster.com wrote:
OK, I wrote a quick script to simulate triple failures and count how many would have caused data loss. The script gets your list of OSDs and PGs, then simulates failures and checks if any permutation of that failure matches a PG. Here's an example with 10000 simulations on our production cluster:

# ./simulate-failures.py
We have 1232 OSDs and 21056 PGs, hence 21056 combinations e.g. like this: (945, 910, 399)
Simulating 10000 failures
Simulated 1000 triple failures. Data loss incidents = 0
Data loss incident with failure (676, 451, 931)
Simulated 2000 triple failures. Data loss incidents = 1
Simulated 3000 triple failures. Data loss incidents = 1
Simulated 4000 triple failures. Data loss incidents = 1
Simulated 5000 triple failures. Data loss incidents = 1
Simulated 6000 triple failures. Data loss incidents = 1
Simulated 7000 triple failures. Data loss incidents = 1
Simulated 8000 triple failures. Data loss incidents = 1
Data loss incident with failure (1031, 1034, 806)
Data loss incident with failure (449, 644, 329)
Simulated 9000 triple failures. Data loss incidents = 3
Simulated 10000 triple failures. Data loss incidents = 3
End of simulation: Out of 10000 triple failures, 3 caused a data loss incident

The script is here: https://github.com/cernceph/ceph-scripts/blob/master/tools/durability/simulate-failures.py
Give it a try (on your test clusters!)
Cheers, Dan
On Wed, Jun 10, 2015 at 10:47 AM, Jan Schermer j...@schermer.cz wrote:
Yeah, I know, but I believe it was fixed so that a single copy is sufficient for recovery now (even with min_size=1)? Depends on what you want to achieve... The point is that even if we lost “just” 1% of data, that's too much (0%) when talking about customer
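As a sanity check on the triple-failure table quoted above: the listed probabilities all fit the closed form

  Pr ~= 100 / ((N - 1) * (N - 2))

for N OSDs at 100 PGs per OSD; e.g. N = 50 gives 100/(49*48) ~= 4.25% and N = 500 gives 100/(499*498) ~= 0.04%, matching every line. This is offered only as an observed fit to the quoted numbers, not as a verified derivation of the script's model.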
[ceph-users] S3 expiration
Hello, I need to store expirable objects in Ceph for housekeeping purposes. I understand that the developers are planning to implement this using the Amazon S3 API. Does anybody know the status of this, or is there another approach to housekeeping available? Thanks.
Re: [ceph-users] ceph mount error
2015-06-12 2:00 GMT+08:00 Lincoln Bryant linco...@uchicago.edu:
Hi, are you using cephx? If so, does your client have the appropriate key on it? It looks like you have an MDS set up and running from your screenshot. Try mounting it like so:
mount -t ceph -o name=admin,secret=[your secret] 192.168.1.105:6789:/ /mnt/mycephfs
This should be the solution; you can get the error details from the kernel log via dmesg.
--Lincoln
On Jun 7, 2015, at 10:14 AM, 张忠波 wrote:
Hi, my Ceph health is OK, and now I want to build a filesystem, following the CEPH FS QUICK START guide: http://ceph.com/docs/master/start/quick-cephfs/
However, I got an error when I used the command mount -t ceph 192.168.1.105:6789:/ /mnt/mycephfs
error: mount error 22 = Invalid argument
I have checked the manual and still don't know how to solve it. I am looking forward to your reply!
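A variant that avoids putting the key on the command line, assuming cephx and the standard admin keyring (the file path is an assumption):

  # extract the admin key into a file
  ceph auth get-key client.admin > /etc/ceph/admin.secret

  # mount using the secret file
  mount -t ceph 192.168.1.105:6789:/ /mnt/mycephfs \
        -o name=admin,secretfile=/etc/ceph/admin.secret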
Re: [ceph-users] v0.94.2 Hammer released
Hi,
On 11/06/2015 19:34, Sage Weil wrote:
Bug #11442 introduced a change that made rgw objects that start with underscore incompatible with previous versions. The fix to that bug reverts to the previous behavior. In order to be able to access objects that start with an underscore and were created in prior Hammer releases, following the upgrade it is required to run (for each affected bucket)::
$ radosgw-admin bucket check --check-head-obj-locator --bucket=<bucket> [--fix]
You can get a list of buckets with
$ radosgw-admin bucket list
After the upgrade of my radosgw, I can't fix the problem of rgw objects that start with an underscore. The command with the --fix option displays some errors which I don't understand. Here is a (truncated) paste of my shell below. Have I done something wrong? Thanks in advance for the help.
François Lafont
--
~# radosgw-admin --id=radosgw.gw2 bucket check --check-head-obj-locator --bucket=$bucket
{ bucket: moodles-poc-registry,
  check_objects: [
    { key: { name: _multipart_registry\/images\/1483a2ea4c3f5865d4d583fb484bbe11afe709a6f3d1baef102904d4d9127909\/layer.2~QorD8QaGiDc4HPUP7VVpx4LS-e_7f0u.meta, instance: },
      oid: default.763616.1___multipart_registry\/images\/1483a2ea4c3f5865d4d583fb484bbe11afe709a6f3d1baef102904d4d9127909\/layer.2~QorD8QaGiDc4HPUP7VVpx4LS-e_7f0u.meta,
      locator: default.763616.1__multipart_registry\/images\/1483a2ea4c3f5865d4d583fb484bbe11afe709a6f3d1baef102904d4d9127909\/layer.2~QorD8QaGiDc4HPUP7VVpx4LS-e_7f0u.meta,
      needs_fixing: true,
      status: needs_fixing },
[snip]
    { key: { name: _multipart_registry\/images\/fa4fd76b09ce9b87bfdc96515f9a5dd5121c01cc996cf5379050d8e13d4a864b\/layer.2~TSdIpafsfGXJ7kKMOVqJ-hn8Aog4ETF.meta, instance: },
      oid: default.763616.1___multipart_registry\/images\/fa4fd76b09ce9b87bfdc96515f9a5dd5121c01cc996cf5379050d8e13d4a864b\/layer.2~TSdIpafsfGXJ7kKMOVqJ-hn8Aog4ETF.meta,
      locator: default.763616.1__multipart_registry\/images\/fa4fd76b09ce9b87bfdc96515f9a5dd5121c01cc996cf5379050d8e13d4a864b\/layer.2~TSdIpafsfGXJ7kKMOVqJ-hn8Aog4ETF.meta,
      needs_fixing: true,
      status: needs_fixing }
  ]
}
~# radosgw-admin --id=radosgw.gw2 bucket check --check-head-obj-locator --bucket=$bucket --fix
2015-06-12 03:01:33.197984 7f3c9130d840 -1 ERROR: ioctx.operate(oid=default.763616.1___multipart_registry/images/1483a2ea4c3f5865d4d583fb484bbe11afe709a6f3d1baef102904d4d9127909/layer.2~QorD8QaGiDc4HPUP7VVpx4LS-e_7f0u.meta) returned ret=-2
ERROR: fix_head_object_locator() returned ret=-2
2015-06-12 03:01:33.200428 7f3c9130d840 -1 ERROR: ioctx.operate(oid=default.763616.1___multipart_registry/images/1483a2ea4c3f5865d4d583fb484bbe11afe709a6f3d1baef102904d4d9127909/layer.2~poMH-PQKCLstUWpMQpji7JuGaBT53Th.meta) returned ret=-2
ERROR: fix_head_object_locator() returned ret=-2
ERROR: fix_head_object_locator() returned ret=-2
2015-06-12 03:01:33.206875 7f3c9130d840 -1 ERROR: ioctx.operate(oid=default.763616.1___multipart_registry/images/c5a7fc74211188aabf3429539674275645b07717d003c390a943acc44f35c6d0/layer.2~Bg6bkbSOE8GCtV4Mxr0t56vSfTQTCx9.1) returned ret=-2
2015-06-12 03:01:33.209293 7f3c9130d840 -1 ERROR: ioctx.operate(oid=default.763616.1___multipart_registry/images/c5a7fc74211188aabf3429539674275645b07717d003c390a943acc44f35c6d0/layer.2~Bg6bkbSOE8GCtV4Mxr0t56vSfTQTCx9.2) returned ret=-2
ERROR: fix_head_object_locator() returned ret=-2
ERROR: fix_head_object_locator() returned ret=-2
[snip]
2015-06-12 03:01:33.301101 7f3c9130d840 -1 ERROR: ioctx.operate(oid=default.763616.1___multipart_registry/images/fa4fd76b09ce9b87bfdc96515f9a5dd5121c01cc996cf5379050d8e13d4a864b/layer.2~TSdIpafsfGXJ7kKMOVqJ-hn8Aog4ETF.meta) returned ret=-2
{ bucket: moodles-poc-registry,
  check_objects: [
    { key: { name: _multipart_registry\/images\/1483a2ea4c3f5865d4d583fb484bbe11afe709a6f3d1baef102904d4d9127909\/layer.2~QorD8QaGiDc4HPUP7VVpx4LS-e_7f0u.meta, instance: },
      oid: default.763616.1___multipart_registry\/images\/1483a2ea4c3f5865d4d583fb484bbe11afe709a6f3d1baef102904d4d9127909\/layer.2~QorD8QaGiDc4HPUP7VVpx4LS-e_7f0u.meta,
      locator: default.763616.1__multipart_registry\/images\/1483a2ea4c3f5865d4d583fb484bbe11afe709a6f3d1baef102904d4d9127909\/layer.2~QorD8QaGiDc4HPUP7VVpx4LS-e_7f0u.meta,
      needs_fixing: true,
      status: needs_fixing },
[snip]
    { key: { name: _multipart_registry\/images\/fa4fd76b09ce9b87bfdc96515f9a5dd5121c01cc996cf5379050d8e13d4a864b\/layer.2~TSdIpafsfGXJ7kKMOVqJ-hn8Aog4ETF.meta, instance: },
      oid:
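To run the check across every bucket rather than one at a time, a small loop along these lines should work (the sed-based parsing of the JSON bucket list is an assumption about the output format; verify it on your install first):

  for b in $(radosgw-admin bucket list | sed 's/[][",]//g'); do
    radosgw-admin bucket check --check-head-obj-locator --bucket="$b" --fix
  done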
[ceph-users] [Fwd: adding a monitor will result in cephx: verify_reply couldn't decrypt with error: error decoding block for decryption]
I'm trying to add an extra monitor to my already existing cluster. I do this with ceph-deploy, using the following command:
ceph-deploy mon add mynewhost
ceph-deploy says it all finished, but when I take a look at the logs on my new monitor host I see the following error:
cephx: verify_reply couldn't decrypt with error: error decoding block for decryption
and when I take a look in my existing monitor logs I see this error:
cephx: verify_authorizer could not decrypt ticket info: error: NSS AES final round failed: -8190
I tried gatherkeys, copying keys, and reinstalling/purging the new monitor node.
greetz
Ramon
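Those decrypt errors usually come down to either clock skew between the hosts or the new monitor running with a different mon. key than the existing quorum, so those are the first two things worth checking. A sketch (host and path names are placeholders):

  # 1) check the clock on the new host
  ntpq -p

  # 2) compare the new mon's keyring with the cluster's mon. key
  cat /var/lib/ceph/mon/ceph-mynewhost/keyring
  ceph auth get mon.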
[ceph-users] Antw: Re: clock skew detected
Hi, for us a restart of the monitor solved this.
Regards, Steffen
Andrey Korolyov and...@xdel.ru wrote on Wednesday, 10 June 2015 at 15:29:
On Wed, Jun 10, 2015 at 4:11 PM, Pavel V. Kaygorodov pa...@inasan.ru wrote:
Hi! Immediately after a reboot of the mon.3 host, its clock was unsynchronized and a "clock skew detected on mon.3" warning appeared. But now (more than 1 hour of uptime) the clock is synced and the warning is still showing. Is this OK? Or do I have to restart the monitor after clock synchronization?
Pavel.
The quorum should report OK after a five-minute interval, but there is a bug preventing the quorum from doing so, at least on the oldest supported stable versions of Ceph. I've never reported it because of its almost zero importance, but things are what they are: the theoretical behavior should be different, and the warning should disappear without a restart.
--
Klinik-Service Neubrandenburg GmbH
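In practice, then, the sequence that clears the warning looks something like this (the mon id and NTP tooling are assumptions about your setup):

  # confirm the clock really is back in sync
  ntpq -p

  # if HEALTH_WARN still shows a skew afterwards, restart the affected mon
  service ceph restart mon.3
  ceph status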
Re: [ceph-users] Load balancing RGW and Scaleout
What I've seen work well is to set multiple A records for your RGW endpoint (a DNS sketch follows below). Then, with something like corosync, you ensure that those IP addresses are always bound somewhere. You can then have as many nodes in active-active mode as you want.
--
David Moreau Simard
On 2015-06-11 11:29 AM, Florent MONTHEL wrote:
[snip]
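A sketch of the DNS side (names and addresses are made up):

  ; three A records for one RGW endpoint, low TTL
  rgw.example.com.  60  IN  A  192.0.2.11
  rgw.example.com.  60  IN  A  192.0.2.12
  rgw.example.com.  60  IN  A  192.0.2.13

Each address is a floating IP managed by corosync/pacemaker (or keepalived), so when a gateway fails its address comes up on a surviving node, and no single proxy caps your aggregate bandwidth.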
[ceph-users] Load balancing RGW and Scaleout
Hi Team, is it possible for you to share your radosgw setup, in order to use the maximum network bandwidth and have no SPOF? I have 5 servers on a 10Gb network and 3 radosgw among them. We would like to set up HAProxy on 1 node in front of the 3 RGWs, but:
- the HAProxy node becomes a SPOF
- the maximum bandwidth will be that of the HAProxy node (10Gb/s)
Thanks
[ceph-users] Can't mount btrfs volume on rbd
Hello, I'm getting an error when attempting to mount a volume on a host that was forcibly powered off:

# mount /dev/rbd4 climate-downscale-CMIP5/
mount: mount /dev/rbd4 on /mnt/climate-downscale-CMIP5 failed: Stale file handle

/var/log/messages:
Jun 10 15:31:07 node1 kernel: rbd4: unknown partition table

# parted /dev/rbd4 print
Model: Unknown (unknown)
Disk /dev/rbd4: 36.5TB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:
Number  Start  End     Size    File system  Flags
 1      0.00B  36.5TB  36.5TB  btrfs

# btrfs check --repair /dev/rbd4
enabling repair mode
Checking filesystem on /dev/rbd4
UUID: dfe6b0c8-2866-4318-abc2-e1e75c891a5e
checking extents
cmds-check.c:2274: check_owner_ref: Assertion `rec->is_root` failed.
btrfs[0x4175cc]
btrfs[0x41b873]
btrfs[0x41c3fe]
btrfs[0x41dc1d]
btrfs[0x406922]

OS: CentOS 7.1; btrfs-progs: 3.16.2; Ceph: 0.94.1 on CentOS 7.1

I haven't found any references to 'stale file handle' on btrfs. The underlying block device is a Ceph RBD, so I've posted to both lists for any feedback. Also, once I reformatted btrfs I didn't get a mount error. The btrfs volume has been reformatted, so I won't be able to do much post-mortem, but I'm wondering if anyone has some insight.
Thanks, Steve
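For anyone who hits this before reformatting: kernels of that era have a read-only btrfs recovery mount that is worth trying before any destructive repair. Whether it would have helped here is untested, so treat this as a sketch:

  # try a read-only recovery mount before 'btrfs check --repair'
  mount -o ro,recovery /dev/rbd4 /mnt/climate-downscale-CMIP5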