Re: [ceph-users] Typical 10GbE latency

2014-11-12 Thread Alexandre DERUMIER
Is this with an 8192-byte payload?
Oh, sorry, it was with 1500.
I'll try to send a report with 8192 tomorrow.

- Original message - 

From: Robert LeBlanc rob...@leblancnet.us 
To: Alexandre DERUMIER aderum...@odiso.com 
Cc: Wido den Hollander w...@42on.com, ceph-users@lists.ceph.com 
Sent: Tuesday, November 11, 2014 23:13:17 
Subject: Re: [ceph-users] Typical 10GbE latency 


Is this with an 8192-byte payload? The theoretical transfer time at 1 Gbps (you are 
only sending one packet, so LACP won't help) is 0.061 ms in one direction; double 
that and you are at 0.122 ms of bits in flight. Then there is context 
switching, switch latency (store-and-forward assumed for 1 Gbps), etc., which I'm 
not sure would fit in the remaining 0.057 ms of your min time. If it is an 8192-byte 
payload, then I'm really impressed! 
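
(As a rough back-of-the-envelope check of that wire time, ignoring Ethernet/IP/ICMP 
headers -- so the figures land a touch above the 0.061/0.122 ms quoted above: 

$ echo 'scale=3; 8192*8*1000/10^9' | bc 
.065 
$ echo 'scale=3; 2*8192*8*1000/10^9' | bc 
.131 

i.e. roughly 0.065 ms of serialization one way and 0.131 ms for the round trip 
at 1 Gbps, before any switch or host latency is added.) 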


On Tue, Nov 11, 2014 at 11:56 AM, Alexandre DERUMIER  aderum...@odiso.com  
wrote: 


I don't have 10GbE yet, but here is my result with simple LACP on 2 gigabit links with 
a Cisco 6500: 

rtt min/avg/max/mdev = 0.179/0.202/0.221/0.019 ms 


(Seems to be lower than your 10GbE Nexus.) 


- Original message - 

From: Wido den Hollander  w...@42on.com  
To: ceph-users@lists.ceph.com 
Sent: Monday, November 10, 2014 17:22:04 
Subject: Re: [ceph-users] Typical 10GbE latency 



On 08-11-14 02:42, Gary M wrote: 
 Wido, 
 
 Take the switch out of the path between nodes and remeasure.. ICMP-echo 
 requests are very low priority traffic for switches and network stacks. 
 

I tried with a direct TwinAx and fiber cable. No difference. 

 If you really want to know, place a network analyzer between the nodes 
 to measure the request packet to response packet latency.. The ICMP 
 traffic to the ping application is not accurate in the sub-millisecond 
 range. And should only be used as a rough estimate. 
 

True, I fully agree with you. But why is everybody showing a lower 
latency here? My latencies are about 40% higher than what I see in this 
setup and other setups. 

 You also may want to install the high resolution timer patch, sometimes 
 called HRT, to the kernel which may give you different results. 
 
 ICMP traffic takes a different path than the TCP traffic and should not 
 be considered an indicator of defect. 
 

Yes, I'm aware. But it still doesn't explain to me why the latency on other 
systems, which are in production, is lower than on this idle system. 

 I believe the ping app calls the sendto system call (sorry, it's been a 
 while since I last looked). System calls can take between .1 us and .2 us 
 each. However, the ping application makes several of these calls and 
 waits for a signal from the kernel. The wait for a signal means the ping 
 application must wait to be rescheduled to report the time. Rescheduling 
 will depend on a lot of other factors in the OS, e.g. timers, card 
 interrupts and other tasks with higher priorities. Reporting the time must 
 add a few more system calls, and as the ping application loops to post 
 the next ping request it again requires a few system calls, which may 
 cause a task switch while in each system call. 
 
 For the above factors, the ping application is not a good representation 
 of network performance, due to factors in the application and network 
 traffic shaping performed at the switch and the TCP stacks. 
 

I think that netperf is probably a better tool, but that also does TCP 
latencies. 

I want the real IP latency, so I assumed that ICMP would be the most 
simple one. 

The other setups I have access to are in production and do not have any 
special tuning, yet their latency is still lower than on this new 
deployment. 

That's what gets me confused. 

Wido 

 cheers, 
 gary 
 
 
 On Fri, Nov 7, 2014 at 4:32 PM, Łukasz Jagiełło 
  jagiello.luk...@gmail.com mailto: jagiello.luk...@gmail.com  wrote: 
 
 Hi, 
 
 rtt min/avg/max/mdev = 0.070/0.177/0.272/0.049 ms 
 
 04:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit 
 SFI/SFP+ Network Connection (rev 01) 
 
 at both hosts and Arista 7050S-64 between. 
 
 Both hosts were part of active ceph cluster. 
 
 
 On Thu, Nov 6, 2014 at 5:18 AM, Wido den Hollander  w...@42on.com 
 mailto: w...@42on.com  wrote: 
 
 Hello, 
 
 While working at a customer I've run into a 10GbE latency which 
 seems 
 high to me. 
 
 I have access to a couple of Ceph clusters and I ran a simple 
 ping test: 
 
 $ ping -s 8192 -c 100 -n ip 
 
 Two results I got: 
 
 rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms 
 rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms 
 
 Both these environments are running with Intel 82599ES 10Gbit 
 cards in 
 LACP. One with Extreme Networks switches, the other with Arista. 
 
 Now, in an environment with Cisco Nexus 3000 and Nexus 7000 
 switches I'm 
 seeing: 
 
 rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms 
 
 As you can see, the Cisco Nexus network has high latency 
 compared to the 
 other setup. 
 
 You would say the switches are to blame, but we also tried with 
 a direct 
 TwinAx 

[ceph-users] Help regarding Installing ceph on a single machine with ceph-deploy on ubuntu 14.04 64 bit

2014-11-12 Thread tej ak
Hi,

I am new to Ceph and desperately trying to figure out how to install
and deploy Ceph on a single machine with ceph-deploy. I have Ubuntu 14.04
64-bit installed in a virtual machine (on Windows 8.1 through VMware
Player) and have installed DevStack on Ubuntu. I am trying to install Ceph
on the same machine (Ubuntu) and interface it with OpenStack. I have tried the
following steps, but it says that mkcephfs does not exist, and I read that it
is deprecated and ceph-deploy replaces it. However, the documentation talks about
multiple nodes. I am lost as to how to use ceph-deploy to install and
set up Ceph on a single machine. Please guide me. I tried the following steps
earlier, which were given for mkcephfs.

(Reference: http://eu.ceph.com/docs/wip-6919/start/quick-start/)

(1) sudo apt-get update
    sudo apt-get install ceph

(2) Execute hostname -s on the command line to retrieve the name of your host.
Then, replace {hostname} in the sample configuration file with your host name.
Execute ifconfig on the command line to retrieve the IP address of your host.
Then, replace {ip-address} with the IP address of your host. Finally, copy the
contents of the modified configuration file and save it to /etc/ceph/ceph.conf.
This file will configure Ceph to operate a monitor, two OSD daemons and one
metadata server on your local machine:

[osd]
osd journal size = 1000
filestore xattr use omap = true
# Execute $ hostname to retrieve the name of your host,
# and replace {hostname} with the name of your host.
# For the monitor, replace {ip-address} with the IP
# address of your host.
[mon.a]
host = {hostname}
mon addr = {ip-address}:6789
[osd.0]
host = {hostname}
[osd.1]
host = {hostname}
[mds.a]
host = {hostname}

(3) sudo mkdir /var/lib/ceph/osd/ceph-0
    sudo mkdir /var/lib/ceph/osd/ceph-1
    sudo mkdir /var/lib/ceph/mon/ceph-a
    sudo mkdir /var/lib/ceph/mds/ceph-a
    cd /etc/ceph
    sudo mkcephfs -a -c /etc/ceph/ceph.conf -k ceph.keyring
    sudo service ceph start
    ceph health
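
For comparison, the ceph-deploy flow I have pieced together from the docs for a
single node looks roughly like this (hostname node1 and the OSD directories are
placeholders, and I am not sure this is correct -- corrections welcome):

ceph-deploy new node1
echo "osd crush chooseleaf type = 0" >> ceph.conf   # single host: place copies on one host
echo "osd pool default size = 2" >> ceph.conf
ceph-deploy install node1
ceph-deploy mon create-initial
sudo mkdir -p /var/local/osd0 /var/local/osd1
ceph-deploy osd prepare node1:/var/local/osd0 node1:/var/local/osd1
ceph-deploy osd activate node1:/var/local/osd0 node1:/var/local/osd1
ceph-deploy admin node1
ceph health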

Regards,
Bobby
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds isn't working anymore after osd's running full

2014-11-12 Thread Jasper Siero
Hello Greg,

The specific PG was always deep scrubbing (ceph pg dump all showed that the last 
deep scrub of this PG was in August), but now when I look at it again the deep 
scrub is finished and everything is healthy. Maybe it is solved because the mds 
is running fine now and it unlocked something.

The problem is solved now :)

Thanks!

Jasper

From: Gregory Farnum [g...@gregs42.com]
Sent: Tuesday, November 11, 2014 19:19
To: Jasper Siero
CC: ceph-users
Subject: Re: [ceph-users] mds isn't working anymore after osd's running full

On Tue, Nov 11, 2014 at 5:06 AM, Jasper Siero
jasper.si...@target-holding.nl wrote:
 No problem, thanks for helping.
 I don't want to disable the deep scrubbing process itself because it's very 
 useful, but one placement group (3.30) is continuously deep scrubbing; it 
 should finish after some time, but it won't.

Hmm, how are you determining that this one PG won't stop scrubbing?
This doesn't sound like any issues familiar to me.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Typical 10GbE latency

2014-11-12 Thread Wido den Hollander
(back to list)

On 11/10/2014 06:57 PM, Gary M wrote:
 Hi Wido,
 
 That is a bit weird.. I'd also check the Ethernet controller firmware
 version and settings between the other configurations. There must be
 something different.
 

Indeed, there must be something! But I can't figure it out yet. Same
controllers, tried the same OS, direct cables, but the latency is 40%
higher.

 I can understand wanting to do a simple latency test. But as we get closer
 to hw speeds and microsecond measurements, the measurements appear to be more
 unstable through software stacks.
 

I fully agree with you. But a basic ICMP test on an idle machine should
be a baseline from where you can start further diagnosing network
latency with better tools like netperf.
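
For example (illustrative only; netserver must already be running on the far
end), a TCP request/response test where 1 / transactions-per-second gives the
average round trip:

$ netperf -H <remote-ip> -t TCP_RR -- -r 1,1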

Wido

 
 
 -gary
 
 On Mon, Nov 10, 2014 at 9:22 AM, Wido den Hollander w...@42on.com wrote:
 
 On 08-11-14 02:42, Gary M wrote:
 Wido,

 Take the switch out of the path between nodes and remeasure.. ICMP-echo
 requests are very low priority traffic for switches and network stacks.


 I tried with a direct TwinAx and fiber cable. No difference.

 If you really want to know, place a network analyzer between the nodes
 to measure the request packet to response packet latency.. The ICMP
 traffic to the ping application is not accurate in the sub-millisecond
 range. And should only be used as a rough estimate.


 True, I fully agree with you. But why is everybody showing a lower
 latency here? My latencies are about 40% higher than what I see in this
 setup and other setups.

 You also may want to install the high resolution timer patch, sometimes
 called HRT, to the kernel which may give you different results.

 ICMP traffic takes a different path than the TCP traffic and should not
 be considered an indicator of defect.


 Yes, I'm aware. But it still doesn't explain to me why the latency on other
 systems, which are in production, is lower than on this idle system.

 I believe the ping app calls the sendto system call (sorry, it's been a
 while since I last looked). System calls can take between .1 us and .2 us
 each. However, the ping application makes several of these calls and
 waits for a signal from the kernel. The wait for a signal means the ping
 application must wait to be rescheduled to report the time. Rescheduling
 will depend on a lot of other factors in the OS, e.g. timers, card
 interrupts and other tasks with higher priorities. Reporting the time must
 add a few more system calls, and as the ping application loops to post
 the next ping request it again requires a few system calls, which may
 cause a task switch while in each system call.

 For the above factors, the ping application is not a good representation
 of network performance, due to factors in the application and network
 traffic shaping performed at the switch and the TCP stacks.


 I think that netperf is probably a better tool, but that also does TCP
 latencies.

 I want the real IP latency, so I assumed that ICMP would be the most
 simple one.

 The other setups I have access to are in production and do not have any
 special tuning, yet their latency is still lower than on this new
 deployment.

 That's what gets me confused.

 Wido

 cheers,
 gary


 On Fri, Nov 7, 2014 at 4:32 PM, Łukasz Jagiełło
 jagiello.luk...@gmail.com mailto:jagiello.luk...@gmail.com wrote:

 Hi,

 rtt min/avg/max/mdev = 0.070/0.177/0.272/0.049 ms

 04:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit
 SFI/SFP+ Network Connection (rev 01)

 at both hosts and Arista 7050S-64 between.

 Both hosts were part of active ceph cluster.


 On Thu, Nov 6, 2014 at 5:18 AM, Wido den Hollander w...@42on.com
 mailto:w...@42on.com wrote:

 Hello,

 While working at a customer I've run into a 10GbE latency which
 seems
 high to me.

 I have access to a couple of Ceph clusters and I ran a simple
 ping test:

 $ ping -s 8192 -c 100 -n ip

 Two results I got:

 rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms
 rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms

 Both these environments are running with Intel 82599ES 10Gbit
 cards in
 LACP. One with Extreme Networks switches, the other with Arista.

 Now, in an environment with Cisco Nexus 3000 and Nexus 7000
 switches I'm
 seeing:

 rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms

 As you can see, the Cisco Nexus network has high latency
 compared to the
 other setup.

 You would say the switches are to blame, but we also tried with
 a direct
 TwinAx connection, but that didn't help.

 This setup also uses the Intel 82599ES cards, so the cards don't
 seem to
 be the problem.

 The MTU is set to 9000 on all these networks and cards.

 I was wondering, others with a Ceph cluster running on 10GbE,
 could you
  

[ceph-users] jbod + SMART : how to identify failing disks ?

2014-11-12 Thread SCHAER Frederic
Hi,

I'm used to RAID software giving me the failing disks & slots, and most often 
blinking the disks on the disk bays.
I recently installed a DELL 6GB HBA SAS JBOD card, said to be an LSI 2008 
one, and I now have to identify 3 pre-failed disks (so says S.M.A.R.T.).

Since this is an LSI, I thought I'd use MegaCli to identify the disk slots, but 
MegaCli does not see the HBA card.
Then I found the LSI sas2ircu utility, but again, this one fails at giving me 
the disk slots (it finds the disks, serials and so on, but the slot is always 0).
Because of this, I'm going to head over to the disk bay and unplug the disk 
which I think corresponds to the alphabetical order in Linux, and see if it's 
the correct one. But even if this is correct this time, it might not be next 
time.

But this makes me wonder: how do you guys, Ceph users, manage your disks if 
you really have JBOD servers?
I can't imagine having to guess slots like that each time, and I can't imagine 
creating serial number stickers for every single disk I could have to 
manage either...
Is there any specific advice regarding JBOD cards people should (not) use in 
their systems?
Any magical way to blink a drive in Linux?

Thanks & regards
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] The strategy of auto-restarting crashed OSD

2014-11-12 Thread David Z
Hi Guys,

We have been experiencing some OSD crashing issues recently, like messenger crashes, 
some strange crashes (still being investigated), etc. Those crashes seem not to 
reproduce after restarting the OSD.

So we are thinking about a strategy of auto-restarting a crashed OSD 1 or 2 
times, then leaving it down if restarting doesn't work. This strategy might 
help us limit the impact of PG peering and recovery on online traffic to some extent, 
since we don't mark an OSD out automatically even if it is down, unless we are sure 
it is a disk failure.
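
In shell terms the idea is roughly something like this (just a sketch; the init 
script path, retry count and sleep interval are assumptions on our side):

#!/bin/sh
# restart a crashed OSD at most MAX times, then leave it down
OSD=$1
MAX=2
tries=0
while ! pgrep -f "ceph-osd -i ${OSD} " >/dev/null; do
    if [ "$tries" -ge "$MAX" ]; then
        echo "osd.${OSD} still down after ${MAX} restarts, leaving it down"
        break
    fi
    /etc/init.d/ceph start osd.${OSD}
    tries=$((tries + 1))
    sleep 30
done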

However, we are also aware that this strategy may bring us some problems. Since 
you guys have more experience with Ceph, we would like to hear some 
suggestions from you.

Thanks.

David Zhang  
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.87 Giant released

2014-11-12 Thread debian Only
Dear experts,

Could you provide some guidance on upgrading Ceph from Firefly to Giant?

Many thanks!

2014-10-30 15:37 GMT+07:00 Joao Eduardo Luis joao.l...@inktank.com:

 On 10/30/2014 05:54 AM, Sage Weil wrote:

 On Thu, 30 Oct 2014, Nigel Williams wrote:

 On 30/10/2014 8:56 AM, Sage Weil wrote:

 * *Degraded vs misplaced*: the Ceph health reports from 'ceph -s' and
 related commands now make a distinction between data that is
 degraded (there are fewer than the desired number of copies) and
 data that is misplaced (stored in the wrong location in the
 cluster).


 Is someone able to briefly describe how/why misplaced happens, please?
 Is it repaired eventually? I've not seen misplaced (yet).


 Sure.  An easy way to get misplaced objects is to do 'ceph osd
 out N' on an OSD.  Nothing is down, we still have as many copies
 as we had before, but Ceph now wants to move them somewhere
 else. Starting with giant, you will see the misplaced % in 'ceph -s' and
 not degraded.

leveldb_write_buffer_size = 32*1024*1024  = 33554432  // 32MB
   leveldb_cache_size= 512*1024*1204 = 536870912 // 512MB


 I noticed the typo, wondered about the code, but I'm not seeing the same
 values anyway?

 https://github.com/ceph/ceph/blob/giant/src/common/config_opts.h

 OPTION(leveldb_write_buffer_size, OPT_U64, 8 *1024*1024) // leveldb
 write
 buffer size
 OPTION(leveldb_cache_size, OPT_U64, 128 *1024*1024) // leveldb cache size


 Hmm!  Not sure where that 32MB number came from.  I'll fix it, thanks!


 Those just happen to be the values used on the monitors (in ceph_mon.cc).
 Maybe that's where the mix up came from. :)

   -Joao


 --
 Joao Eduardo Luis
 Software Engineer | http://inktank.com | http://ceph.com

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph and Compute on same hardware?

2014-11-12 Thread Pieter Koorts
Hi,

A while back on a blog I saw mentioned that Ceph should not be run on
compute nodes and in the general sense should be on dedicated hardware.
Does this really still apply?

An example, if you have nodes comprised of

16+ cores
256GB+ RAM
Dual 10GBE Network
2+8 OSD (SSD log + HDD store)

I understand that Ceph can use a lot of IO and CPU in some cases but if the
nodes are powerful enough does it not make it an option to run compute and
storage on the same hardware to either increase density of compute or save
money on additional hardware?

What are the reasons for not running Ceph on the compute nodes?

Thanks

Pieter
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stackforge Puppet Module

2014-11-12 Thread Nick Fisk
Hi David,

Many thanks for your reply.

I must admit I have only just started looking at Puppet, but a lot of what
you said makes sense to me, and I understand the reason for not having the
module auto-discover disks.

I'm currently having a problem with the ceph::repo class when trying to push
this out to a test server:-

Error: Could not retrieve catalog from remote server: Error 400 on SERVER:
Could not find class ceph::repo for ceph-puppet-test on node
ceph-puppet-test
Warning: Not using cache on failed catalog
Error: Could not retrieve catalog; skipping run

I'm a bit stuck but will hopefully work out why it's not working soon and
then I can attempt your idea of using a script to dynamically pass disks to
the puppet module.

Thanks,
Nick


-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
David Moreau Simard
Sent: 11 November 2014 12:05
To: Nick Fisk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Stackforge Puppet Module

Hi Nick,

The great thing about puppet-ceph's implementation on Stackforge is that it
is both unit and integration tested.
You can see the integration tests here:
https://github.com/ceph/puppet-ceph/tree/master/spec/system

What I'm getting at is that the tests allow you to see how you can use the
module, to a certain extent.
For example, in the OSD integration tests:
- https://github.com/ceph/puppet-ceph/blob/master/spec/system/ceph_osd_spec.rb#L24
and then:
- https://github.com/ceph/puppet-ceph/blob/master/spec/system/ceph_osd_spec.rb#L82-L110

There's no auto discovery mechanism built-in the module right now. It's kind
of dangerous, you don't want to format the wrong disks.

Now, this doesn't mean you can't discover the disks yourself and pass them
to the module from your site.pp or from a composition layer.
Here's something I have for my CI environment that uses the $::blockdevices
fact to discover all devices, split that fact into a list of the devices and
then reject the drives I don't want (such as the OS disk):

# Assume OS is installed on xvda/sda/vda.
# On an Openstack VM, vdb is ephemeral, we don't want to use vdc.
# WARNING: ALL OTHER DISKS WILL BE FORMATTED/PARTITIONED BY CEPH!
$block_devices = reject(split($::blockdevices, ','),
'(xvda|sda|vda|vdc|sr0)')
$devices = prefix($block_devices, '/dev/')

And then you can pass $devices to the module.

Let me know if you have any questions !
--
David Moreau Simard

 On Nov 11, 2014, at 6:23 AM, Nick Fisk n...@fisk.me.uk wrote:
 
 Hi,
 
 I'm just looking through the different methods of deploying Ceph and I 
 particularly liked the idea, which the stackforge puppet module 
 advertises, of using discovery to automatically add new disks. I 
 understand the principle of how it should work (using ceph-disk list 
 to find unknown disks), but I would like to see in a little more detail 
 how it's been implemented. 
 
 I've been looking through the puppet module on GitHub, but I can't see 
 anywhere where this discovery is carried out. 
 
 Could anyone confirm whether this puppet module does currently support the 
 auto discovery, and where in the code it's carried out?
 
 Many Thanks,
 Nick
 
 
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph and Compute on same hardware?

2014-11-12 Thread Mark Nelson
Technically there's no reason it shouldn't work, but it does complicate 
things.  Probably the biggest worry would be that if something bad 
happens on the compute side (say it goes nuts with network or memory 
transfers) it could slow things down enough that OSDs start failing 
heartbeat checks causing ceph to go into recovery and maybe cause a 
vicious cycle of nastiness.


You can mitigate some of this with cgroups and try to dedicate specific 
sockets and memory banks to Ceph/Compute, but we haven't done a lot of 
testing yet afaik.


Mark

On 11/12/2014 07:45 AM, Pieter Koorts wrote:

Hi,

A while back on a blog I saw mentioned that Ceph should not be run on
compute nodes and in the general sense should be on dedicated hardware.
Does this really still apply?

An example, if you have nodes comprised of

16+ cores
256GB+ RAM
Dual 10GBE Network
2+8 OSD (SSD log + HDD store)

I understand that Ceph can use a lot of IO and CPU in some cases but if
the nodes are powerful enough does it not make it an option to run
compute and storage on the same hardware to either increase density of
compute or save money on additional hardware?

What are the reasons for not running Ceph on the Compute nodes.

Thanks

Pieter


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stackforge Puppet Module

2014-11-12 Thread David Moreau Simard
What comes to mind is that you need to make sure that you've cloned the git 
repository to /etc/puppet/modules/ceph and not /etc/puppet/modules/puppet-ceph.
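
i.e., something along these lines (using the same repository referenced earlier 
in this thread):

git clone https://github.com/ceph/puppet-ceph.git /etc/puppet/modules/ceph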

Feel free to hop on IRC to discuss about puppet-ceph on freenode in 
#puppet-openstack.
You can find me there as dmsimard.

--
David Moreau Simard

 On Nov 12, 2014, at 8:58 AM, Nick Fisk n...@fisk.me.uk wrote:
 
 Hi David,
 
 Many thanks for your reply.
 
 I must admit I have only just started looking at puppet, but a lot of what
 you said makes sense to me and understand the reason for not having the
 module auto discover disks.
 
 I'm currently having a problem with the ceph::repo class when trying to push
 this out to a test server:-
 
 Error: Could not retrieve catalog from remote server: Error 400 on SERVER:
 Could not find class ceph::repo for ceph-puppet-test on node
 ceph-puppet-test
 Warning: Not using cache on failed catalog
 Error: Could not retrieve catalog; skipping run
 
 I'm a bit stuck but will hopefully work out why it's not working soon and
 then I can attempt your idea of using a script to dynamically pass disks to
 the puppet module.
 
 Thanks,
 Nick
 
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
 David Moreau Simard
 Sent: 11 November 2014 12:05
 To: Nick Fisk
 Cc: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] Stackforge Puppet Module
 
 Hi Nick,
 
 The great thing about puppet-ceph's implementation on Stackforge is that it
 is both unit and integration tested.
 You can see the integration tests here:
 https://github.com/ceph/puppet-ceph/tree/master/spec/system
 
 Where I'm getting at is that the tests allow you to see how you can use the
 module to a certain extent.
 For example, in the OSD integration tests:
 -
 https://github.com/ceph/puppet-ceph/blob/master/spec/system/ceph_osd_spec.rb
 #L24 and then:
 -
 https://github.com/ceph/puppet-ceph/blob/master/spec/system/ceph_osd_spec.rb
 #L82-L110
 
 There's no auto discovery mechanism built-in the module right now. It's kind
 of dangerous, you don't want to format the wrong disks.
 
 Now, this doesn't mean you can't discover the disks yourself and pass them
 to the module from your site.pp or from a composition layer.
 Here's something I have for my CI environment that uses the $::blockdevices
 fact to discover all devices, split that fact into a list of the devices and
 then reject the drives I don't want (such as the OS disk):
 
# Assume OS is installed on xvda/sda/vda.
# On an Openstack VM, vdb is ephemeral, we don't want to use vdc.
# WARNING: ALL OTHER DISKS WILL BE FORMATTED/PARTITIONED BY CEPH!
$block_devices = reject(split($::blockdevices, ','),
 '(xvda|sda|vda|vdc|sr0)')
$devices = prefix($block_devices, '/dev/')
 
 And then you can pass $devices to the module.
 
 Let me know if you have any questions !
 --
 David Moreau Simard
 
 On Nov 11, 2014, at 6:23 AM, Nick Fisk n...@fisk.me.uk wrote:
 
 Hi,
 
 I'm just looking through the different methods of deploying Ceph and I 
 particularly liked the idea that the stackforge puppet module 
 advertises of using discover to automatically add new disks. I 
 understand the principle of how it should work; using ceph-disk list 
 to find unknown disks, but I would like to see in a little more detail on
 how it's been implemented.
 
 I've been looking through the puppet module on Github, but I can't see 
 anyway where this discovery is carried out.
 
 Could anyone confirm if this puppet modules does currently support the 
 auto discovery and where  in the code its carried out?
 
 Many Thanks,
 Nick
 
 
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
 
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph and Compute on same hardware?

2014-11-12 Thread Haomai Wang
Actually, our production clusters (up to ten) all run ceph-osd on the
compute nodes (KVM).

The primary action is that you need to constrain CPU and memory.
For example, you can allocate a Ceph cpuset and memory cgroup and let
ceph-osd run within limited cores and memory.

The other risk is the network. Because the compute node and ceph-osd
share the same kernel network stack, there is a risk that VMs may
run out of network resources such as conntrack entries in the netfilter
framework.
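
A minimal sketch of that with the libcgroup tools (the cgroup name, core list
and memory limit here are just examples, not our exact production values):

# create a "ceph" cgroup pinned to 8 cores on socket 0 with a 64GB memory cap
cgcreate -g cpuset,memory:ceph
cgset -r cpuset.cpus=0-7 ceph
cgset -r cpuset.mems=0 ceph
cgset -r memory.limit_in_bytes=64G ceph

# start an OSD inside that cgroup
cgexec -g cpuset,memory:ceph /usr/bin/ceph-osd -i 0 -c /etc/ceph/ceph.conf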

On Wed, Nov 12, 2014 at 10:23 PM, Mark Nelson mark.nel...@inktank.com wrote:
 Technically there's no reason it shouldn't work, but it does complicate
 things.  Probably the biggest worry would be that if something bad happens
 on the compute side (say it goes nuts with network or memory transfers) it
 could slow things down enough that OSDs start failing heartbeat checks
 causing ceph to go into recovery and maybe cause a vicious cycle of
 nastiness.

 You can mitigate some of this with cgroups and try to dedicate specific
 sockets and memory banks to Ceph/Compute, but we haven't done a lot of
 testing yet afaik.

 Mark


 On 11/12/2014 07:45 AM, Pieter Koorts wrote:

 Hi,

 A while back on a blog I saw mentioned that Ceph should not be run on
 compute nodes and in the general sense should be on dedicated hardware.
 Does this really still apply?

 An example, if you have nodes comprised of

 16+ cores
 256GB+ RAM
 Dual 10GBE Network
 2+8 OSD (SSD log + HDD store)

 I understand that Ceph can use a lot of IO and CPU in some cases but if
 the nodes are powerful enough does it not make it an option to run
 compute and storage on the same hardware to either increase density of
 compute or save money on additional hardware?

 What are the reasons for not running Ceph on the Compute nodes.

 Thanks

 Pieter


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Best Regards,

Wheat
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph and Compute on same hardware?

2014-11-12 Thread Andrey Korolyov
On Wed, Nov 12, 2014 at 5:30 PM, Haomai Wang haomaiw...@gmail.com wrote:
 Actually, our production clusters (up to ten) all run ceph-osd on the
 compute nodes (KVM).

 The primary action is that you need to constrain CPU and memory.
 For example, you can allocate a Ceph cpuset and memory cgroup and let
 ceph-osd run within limited cores and memory.

 The other risk is the network. Because the compute node and ceph-osd
 share the same kernel network stack, there is a risk that VMs may
 run out of network resources such as conntrack entries in the netfilter
 framework.

 On Wed, Nov 12, 2014 at 10:23 PM, Mark Nelson mark.nel...@inktank.com wrote:
 Technically there's no reason it shouldn't work, but it does complicate
 things.  Probably the biggest worry would be that if something bad happens
 on the compute side (say it goes nuts with network or memory transfers) it
 could slow things down enough that OSDs start failing heartbeat checks
 causing ceph to go into recovery and maybe cause a vicious cycle of
 nastiness.

 You can mitigate some of this with cgroups and try to dedicate specific
 sockets and memory banks to Ceph/Compute, but we haven't done a lot of
 testing yet afaik.

 Mark


 On 11/12/2014 07:45 AM, Pieter Koorts wrote:

 Hi,

 A while back on a blog I saw mentioned that Ceph should not be run on
 compute nodes and in the general sense should be on dedicated hardware.
 Does this really still apply?

 An example, if you have nodes comprised of

 16+ cores
 256GB+ RAM
 Dual 10GBE Network
 2+8 OSD (SSD log + HDD store)

 I understand that Ceph can use a lot of IO and CPU in some cases but if
 the nodes are powerful enough does it not make it an option to run
 compute and storage on the same hardware to either increase density of
 compute or save money on additional hardware?

 What are the reasons for not running Ceph on the Compute nodes.

 Thanks

 Pieter


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



 --
 Best Regards,

 Wheat
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Yes, the essential part is resource management, which can be either
dynamic or static. In Flops we implemented dynamic resource
control, which allows packing VMs and OSDs more densely than static
cg-based jails can allow (and it requires deep orchestration
modifications for every open source cloud orchestrator,
unfortunately). As long as you are able to manage strong traffic
isolation for the storage and VM segments, there is absolutely no problem
(it can be static limits from linux-qos or tricky flow management for
OpenFlow, depending on what your orchestration allows). The possibility
of putting compute and storage roles together without significant
impact on performance characteristics was one of the key features that
led our selection to Ceph as a storage backend three years ago.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph and Compute on same hardware?

2014-11-12 Thread Robert van Leeuwen
 A while back on a blog I saw mentioned that Ceph should not be run on compute 
 nodes and in the general
 sense should be on dedicated hardware. Does this really still apply?

In my opinion storage needs to be rock-solid.
Running other (complex) software on a Ceph node increases the chances of stuff 
falling over.
Worst case, a cascading effect takes down your whole storage platform.
If your storage platform bites the dust your whole compute cloud also falls 
over (assuming you boot instances from Ceph).

Troubleshooting issues (especially those that have no obvious cause) becomes 
more complex, as you have to rule out more potential causes.

Not saying it can not work perfectly fine.
I'd rather just not take any chances with the storage system...

Cheers,
Robert van Leeuwen
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph and Compute on same hardware?

2014-11-12 Thread gaoxingxing

I think you may also want to consider risks like kernel crashes etc., since the 
storage and compute nodes are sharing the same box.
Date: Wed, 12 Nov 2014 14:51:47 +
From: pieter.koo...@me.com
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Ceph and Compute on same hardware?

Hi,
Thanks for the replies. I will likely not choose this method, but I wanted to make 
sure that it was for a good technical reason rather than just a best practice. I 
did not quite think of conntrack at the time, so this is a good one to 
consider.
Thanks
Pieter
On 12 November 2014 14:30, Haomai Wang haomaiw...@gmail.com wrote:
Actually, our production clusters (up to ten) all run ceph-osd on the
compute nodes (KVM).

The primary action is that you need to constrain CPU and memory.
For example, you can allocate a Ceph cpuset and memory cgroup and let
ceph-osd run within limited cores and memory.

The other risk is the network. Because the compute node and ceph-osd
share the same kernel network stack, there is a risk that VMs may
run out of network resources such as conntrack entries in the netfilter
framework.

On Wed, Nov 12, 2014 at 10:23 PM, Mark Nelson mark.nel...@inktank.com wrote:
 Technically there's no reason it shouldn't work, but it does complicate
 things.  Probably the biggest worry would be that if something bad happens
 on the compute side (say it goes nuts with network or memory transfers) it
 could slow things down enough that OSDs start failing heartbeat checks
 causing ceph to go into recovery and maybe cause a vicious cycle of
 nastiness.

 You can mitigate some of this with cgroups and try to dedicate specific
 sockets and memory banks to Ceph/Compute, but we haven't done a lot of
 testing yet afaik.

 Mark

 On 11/12/2014 07:45 AM, Pieter Koorts wrote:

 Hi,

 A while back on a blog I saw mentioned that Ceph should not be run on
 compute nodes and in the general sense should be on dedicated hardware.
 Does this really still apply?

 An example, if you have nodes comprised of

 16+ cores
 256GB+ RAM
 Dual 10GBE Network
 2+8 OSD (SSD log + HDD store)

 I understand that Ceph can use a lot of IO and CPU in some cases but if
 the nodes are powerful enough does it not make it an option to run
 compute and storage on the same hardware to either increase density of
 compute or save money on additional hardware?

 What are the reasons for not running Ceph on the Compute nodes.

 Thanks

 Pieter


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Best Regards,

Wheat

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com  
  ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rados -p pool cache-flush-evict-all surprisingly slow

2014-11-12 Thread Martin Millnert
Dear Cephers,

I have a lab setup with 6x dual-socket hosts, 48GB RAM and 2x 10Gbps links,
each host equipped with 2x S3700 100GB SSDs and 4x 500GB HDDs, where the HDDs
are mapped in a tree under a 'platter' root, similar to the guidance from
Seb at 
http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/
 ,
and the SSDs similarly under an 'ssd' root.  Replication is set to 3.
Journals are on tmpfs (simulating NVRAM).

I have put an SSD pool as a cache tier in front of an HDD pool (rbd), and run
fio-rbd against rbd.  In the benchmarks, at bs=32k, QD=128 from a
single separate client machine, I reached a peak throughput of around
1.2 GB/s.  So there is some capability.  IOPS-wise I see a max of around
15k IOPS currently.
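
For reference, the tiering was set up roughly as follows (pool name, PG count
and ruleset id below are mine, not canonical):

ceph osd pool create ssd-cache 512 512
ceph osd pool set ssd-cache crush_ruleset 1    # ruleset pointing at the 'ssd' root
ceph osd tier add rbd ssd-cache
ceph osd tier cache-mode ssd-cache writeback
ceph osd tier set-overlay rbd ssd-cache
ceph osd pool set ssd-cache hit_set_type bloom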

After having filled the SSD cache tier, I ran rados -p rbd
cache-flush-evict-all - and I was expecting to see the 6 SSD OSDs start
to evict all the cache-tier PGs to the underlying pool, rbd, which maps
to the HDDs.  I would have expected parallelism and high throughput,
but what I now observe is an average flush speed of ~80 MB/s.

Which leads me to the question: is rados -p pool
cache-flush-evict-all supposed to work in a parallel manner?

Cursory viewing in tcpdump suggests to me that the eviction operation is
serial, in which case the performance makes a little bit of sense,
since it is basically limited by the write speed of a single HDD.

What should I see?

If it is indeed a serial operation, is this different from the regular
cache tier eviction routines that are triggered by full_ratios, max
objects or max storage volume?

Regards,
Martin


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG's incomplete after OSD failure

2014-11-12 Thread Chad Seys
Would love to hear if you discover a way to get zapping incomplete PGs!

Perhaps this is a common enough issue to open an issue?

Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Solaris 10 VMs extremely slow in KVM on Ceph RBD Devices

2014-11-12 Thread Christoph Adomeit
Hi,

I installed a Ceph cluster with 50 OSDs on 4 hosts and finally I am really 
happy with it.

Linux and Windows VMs run really fast in KVM on the Ceph Storage.

Only my Solaris 10 guests are terribly slow on Ceph RBD storage. A Solaris guest on 
Ceph storage needs 15 minutes to boot. When I move the Solaris image to the old 
Nexenta NFS storage and start it on the same KVM host it will fly and boot in 
1.5 minutes.

I have tested Ceph Firefly and Giant, and the problem exists with both versions.

The performance problem is not only with booting. The problem continues when 
the server is up. Everything is terribly slow.

So the only difference here is Ceph vs. Nexenta NFS storage, which causes the big 
performance problems.

The solaris guests have zfs root standard installation.

Does anybody have an idea or a hint about what might be going on here and what I should 
try to make Solaris 10 guests faster on Ceph storage?

Many Thanks
  Christoph

-- 
Christoph Adomeit
GATWORKS GmbH
Reststrauch 191
41199 Moenchengladbach
Sitz: Moenchengladbach
Amtsgericht Moenchengladbach, HRB 6303
Geschaeftsfuehrer:
Christoph Adomeit, Hans Wilhelm Terstappen

christoph.adom...@gatworks.de Internetloesungen vom Feinsten
Fon. +49 2166 9149-32  Fax. +49 2166 9149-10
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] jbod + SMART : how to identify failing disks ?

2014-11-12 Thread Erik Logtenberg
I have no experience with the DELL SAS controller, but usually the
advantage of using a simple controller (instead of a RAID card) is that
you can use full SMART directly.

$ sudo smartctl -a /dev/sda

=== START OF INFORMATION SECTION ===
Device Model: INTEL SSDSA2BW300G3H
Serial Number:PEPR2381003E300EGN

Personally, I make sure that I know which serial number drive is in
which bay, so I can easily tell which drive I'm talking about.

So you can use SMART both to notice (pre)failing disks -and- to
physically identify them.

The same smartctl command also returns the health status like so:

233 Media_Wearout_Indicator 0x0032   099   099   000Old_age   Always
  -   0

This specific SSD has 99% media lifetime left, so it's in the green. But
it will continue to gradually degrade, and at some point it'll hit a
percentage where I'd like to replace it. To keep an eye on the speed of
decay, I'm graphing those SMART values in Cacti. That way I can somewhat
predict how long a disk will last, especially SSDs, which die very
gradually.
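
A quick way to build that serial-number-to-device mapping is a small loop like
this (just a sketch; it only covers /dev/sda..sdz):

for d in /dev/sd?; do
  printf '%s: ' "$d"
  sudo smartctl -i "$d" | awk -F: '/Serial Number/ {gsub(/^ +/, "", $2); print $2}'
done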

Erik.


On 12-11-14 14:43, JF Le Fillâtre wrote:
 
 Hi,
 
 May or may not work depending on your JBOD and the way it's identified
 and set up by the LSI card and the kernel:
 
 cat /sys/block/sdX/../../../../sas_device/end_device-*/bay_identifier
 
 The weird path and the wildcards are due to the way the sysfs is set up.
 
 That works with a Dell R520, 6GB HBA SAS cards and Dell MD1200s, running
 CentOS release 6.5.
 
 Note that you can make your life easier by writing an udev script that
 will create a symlink with a sane identifier for each of your external
 disks. If you match along the lines of
 
 KERNEL==sd*[a-z], KERNELS==end_device-*:*:*
 
 then you'll just have to cat /sys/class/sas_device/${1}/bay_identifier
 in a script (with $1 being the $id of udev after that match, so the
 string end_device-X:Y:Z) to obtain the bay ID.
 
 Thanks,
 JF
 
 
 
 On 12/11/14 14:05, SCHAER Frederic wrote:
 Hi,

  

 I’m used to RAID software giving me the failing disks  slots, and most
 often blinking the disks on the disk bays.

 I recently installed a  DELL “6GB HBA SAS” JBOD card, said to be an LSI
 2008 one, and I now have to identify 3 pre-failed disks (so says
 S.M.A.R.T) .

  

 Since this is an LSI, I thought I’d use MegaCli to identify the disks
 slot, but MegaCli does not see the HBA card.

 Then I found the LSI “sas2ircu” utility, but again, this one fails at
 giving me the disk slots (it finds the disks, serials and others, but
 slot is always 0)

 Because of this, I’m going to head over to the disk bay and unplug the
 disk which I think corresponds to the alphabetical order in linux, and
 see if it’s the correct one…. But even if this is correct this time, it
 might not be next time.

  

 But this makes me wonder : how do you guys, Ceph users, manage your
 disks if you really have JBOD servers ?

 I can’t imagine having to guess slots that each time, and I can’t
 imagine neither creating serial number stickers for every single disk I
 could have to manage …

 Is there any specific advice reguarding JBOD cards people should (not)
 use in their systems ?

 Any magical way to “blink” a drive in linux ?

  

 Thanks  regards



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] jbod + SMART : how to identify failing disks ?

2014-11-12 Thread Scottix
I would say it depends on your system and what the drives are connected
to. Some HBAs have a CLI tool to manage the connected drives like a
RAID card would.
One other method I found is that sometimes the kernel will expose the LEDs
for you; http://fabiobaltieri.com/2011/09/21/linux-led-subsystem/ has an
article on the /sys/class/leds subsystem, but there is no guarantee.
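
If the enclosure backplane speaks SES/SGPIO, the ledmon package can sometimes
do it too, e.g. (untested on that particular Dell HBA):

ledctl locate=/dev/sdg       # blink the locate LED of the bay holding sdg
ledctl locate_off=/dev/sdg   # turn it off again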

On my laptop I could turn on lights and stuff, but our server didn't
have anything. Seems like a feature either Linux or smartctl should
have. I have run into this problem before but used a couple of tricks to
figure it out.

I guess the best solution is just to track the drives' serial numbers. Maybe
that's a good note to add to the Ceph docs so cluster operators are aware of it.

On Wed, Nov 12, 2014 at 9:06 AM, Erik Logtenberg e...@logtenberg.eu wrote:
 I have no experience with the DELL SAS controller, but usually the
 advantage of using a simple controller (instead of a RAID card) is that
 you can use full SMART directly.

 $ sudo smartctl -a /dev/sda

 === START OF INFORMATION SECTION ===
 Device Model: INTEL SSDSA2BW300G3H
 Serial Number:PEPR2381003E300EGN

 Personally, I make sure that I know which serial number drive is in
 which bay, so I can easily tell which drive I'm talking about.

 So you can use SMART both to notice (pre)failing disks -and- to
 physically identify them.

 The same smartctl command also returns the health status like so:

 233 Media_Wearout_Indicator 0x0032   099   099   000Old_age   Always
   -   0

 This specific SSD has 99% media lifetime left, so it's in the green. But
 it will continue to gradually degrade, and at some time It'll hit a
 percentage where I like to replace it. To keep an eye on the speed of
 decay, I'm graphing those SMART values in Cacti. That way I can somewhat
 predict how long a disk will last, especially SSD's which die very
 gradually.

 Erik.


 On 12-11-14 14:43, JF Le Fillâtre wrote:

 Hi,

 May or may not work depending on your JBOD and the way it's identified
 and set up by the LSI card and the kernel:

 cat /sys/block/sdX/../../../../sas_device/end_device-*/bay_identifier

 The weird path and the wildcards are due to the way the sysfs is set up.

 That works with a Dell R520, 6GB HBA SAS cards and Dell MD1200s, running
 CentOS release 6.5.

 Note that you can make your life easier by writing an udev script that
 will create a symlink with a sane identifier for each of your external
 disks. If you match along the lines of

 KERNEL==sd*[a-z], KERNELS==end_device-*:*:*

 then you'll just have to cat /sys/class/sas_device/${1}/bay_identifier
 in a script (with $1 being the $id of udev after that match, so the
 string end_device-X:Y:Z) to obtain the bay ID.

 Thanks,
 JF



 On 12/11/14 14:05, SCHAER Frederic wrote:
 Hi,



 I’m used to RAID software giving me the failing disks  slots, and most
 often blinking the disks on the disk bays.

 I recently installed a  DELL “6GB HBA SAS” JBOD card, said to be an LSI
 2008 one, and I now have to identify 3 pre-failed disks (so says
 S.M.A.R.T) .



 Since this is an LSI, I thought I’d use MegaCli to identify the disks
 slot, but MegaCli does not see the HBA card.

 Then I found the LSI “sas2ircu” utility, but again, this one fails at
 giving me the disk slots (it finds the disks, serials and others, but
 slot is always 0)

 Because of this, I’m going to head over to the disk bay and unplug the
 disk which I think corresponds to the alphabetical order in linux, and
 see if it’s the correct one…. But even if this is correct this time, it
 might not be next time.



 But this makes me wonder : how do you guys, Ceph users, manage your
 disks if you really have JBOD servers ?

 I can’t imagine having to guess slots that each time, and I can’t
 imagine neither creating serial number stickers for every single disk I
 could have to manage …

 Is there any specific advice reguarding JBOD cards people should (not)
 use in their systems ?

 Any magical way to “blink” a drive in linux ?



 Thanks  regards



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Follow Me: @Taijutsun
scot...@gmail.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Triggering shallow scrub on OSD where scrub is already in progress

2014-11-12 Thread Gregory Farnum
Yes, this is expected behavior. You're telling the OSD to scrub every PG it
holds, and it is doing so. The list of PGs to scrub is getting reset each
time, but none of the individual scrubs are getting restarted. (I believe
that if you instruct a PG to scrub while it's already doing so, nothing
happens.)
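
For clarity, the two commands in play are along these lines (osd/pg ids taken
from the log snippet quoted below):

ceph osd scrub 10     # queue a shallow scrub of every PG on osd.10
ceph pg scrub 1.6b    # ask a single PG to scrub
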
-Greg
On Tue, Nov 11, 2014 at 9:54 PM Mallikarjun Biradar 
mallikarjuna.bira...@gmail.com wrote:

 Hi Greg,

 I am using 0.86

 I am referring to the osd logs to check scrub behaviour. Please have a look at the log
 snippet from the osd log:

 ##Triggered scrub on osd.10---
 2014-11-12 16:24:21.393135 7f5026f31700  0 log_channel(default) log [INF]
 : 0.4 scrub ok
 2014-11-12 16:24:24.393586 7f5026f31700  0 log_channel(default) log [INF]
 : 0.20 scrub ok
 2014-11-12 16:24:30.393989 7f5026f31700  0 log_channel(default) log [INF]
 : 0.21 scrub ok
 2014-11-12 16:24:33.394764 7f5026f31700  0 log_channel(default) log [INF]
 : 0.23 scrub ok
 2014-11-12 16:24:34.395293 7f5026f31700  0 log_channel(default) log [INF]
 : 0.36 scrub ok
 2014-11-12 16:24:35.941704 7f5026f31700  0 log_channel(default) log [INF]
 : 1.1 scrub ok
 2014-11-12 16:24:39.533780 7f5026f31700  0 log_channel(default) log [INF]
 : 1.d scrub ok
 2014-11-12 16:24:41.811185 7f5026f31700  0 log_channel(default) log [INF]
 : 1.44 scrub ok
 2014-11-12 16:24:54.257384 7f5026f31700  0 log_channel(default) log [INF]
 : 1.5b scrub ok
 2014-11-12 16:25:02.973101 7f5026f31700  0 log_channel(default) log [INF]
 : 1.67 scrub ok
 2014-11-12 16:25:17.597546 7f5026f31700  0 log_channel(default) log [INF]
 : 1.6b scrub ok
 ##Previous scrub is still in progress; triggered scrub on osd.10 again --
 Ceph re-started the scrub operation
 2014-11-12 16:25:19.394029 7f5026f31700  0 log_channel(default) log [INF]
 : 0.4 scrub ok
 2014-11-12 16:25:22.402630 7f5026f31700  0 log_channel(default) log [INF]
 : 0.20 scrub ok
 2014-11-12 16:25:24.695565 7f5026f31700  0 log_channel(default) log [INF]
 : 0.21 scrub ok
 2014-11-12 16:25:25.408821 7f5026f31700  0 log_channel(default) log [INF]
 : 0.23 scrub ok
 2014-11-12 16:25:29.467527 7f5026f31700  0 log_channel(default) log [INF]
 : 0.36 scrub ok
 2014-11-12 16:25:32.558838 7f5026f31700  0 log_channel(default) log [INF]
 : 1.1 scrub ok
 2014-11-12 16:25:35.763056 7f5026f31700  0 log_channel(default) log [INF]
 : 1.d scrub ok
 2014-11-12 16:25:38.166853 7f5026f31700  0 log_channel(default) log [INF]
 : 1.44 scrub ok
 2014-11-12 16:25:40.602758 7f5026f31700  0 log_channel(default) log [INF]
 : 1.5b scrub ok
 2014-11-12 16:25:42.169788 7f5026f31700  0 log_channel(default) log [INF]
 : 1.67 scrub ok
 2014-11-12 16:25:45.851419 7f5026f31700  0 log_channel(default) log [INF]
 : 1.6b scrub ok
 2014-11-12 16:25:51.259453 7f5026f31700  0 log_channel(default) log [INF]
 : 1.a8 scrub ok
 2014-11-12 16:25:53.012220 7f5026f31700  0 log_channel(default) log [INF]
 : 1.a9 scrub ok
 2014-11-12 16:25:54.009265 7f5026f31700  0 log_channel(default) log [INF]
 : 1.cb scrub ok
 2014-11-12 16:25:56.516569 7f5026f31700  0 log_channel(default) log [INF]
 : 1.e2 scrub ok


  -Thanks  regards,
 Mallikarjun Biradar

 On Tue, Nov 11, 2014 at 12:18 PM, Gregory Farnum g...@gregs42.com wrote:

 On Sun, Nov 9, 2014 at 9:29 PM, Mallikarjun Biradar
 mallikarjuna.bira...@gmail.com wrote:
  Hi all,
 
  Triggering shallow scrub on OSD where scrub is already in progress,
 restarts
  scrub from beginning on that OSD.
 
 
  Steps:
  Triggered shallow scrub on an OSD (Cluster is running heavy IO)
  While scrub is in progress, triggered shallow scrub again on that OSD.
 
  Observed behavior, is scrub restarted from beginning on that OSD.
 
  Please let me know, whether its expected behaviour?

 What version of Ceph are you seeing this on? How are you identifying
 that scrub is restarting from the beginning? It sounds sort of
 familiar to me, but I thought this was fixed so it was a no-op if you
 issue another scrub. (That's not authoritative though; I might just be
 missing a reason we want to restart it.)
 -Greg



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Deep scrub, cache pools, replica 1

2014-11-12 Thread Gregory Farnum
On Tue, Nov 11, 2014 at 2:32 PM, Christian Balzer ch...@gol.com wrote:
 On Tue, 11 Nov 2014 10:21:49 -0800 Gregory Farnum wrote:

 On Mon, Nov 10, 2014 at 10:58 PM, Christian Balzer ch...@gol.com wrote:
 
  Hello,
 
  One of my clusters has become busy enough (I'm looking at you, evil
  Window VMs that I shall banish elsewhere soon) to experience client
  noticeable performance impacts during deep scrub.
  Before this I instructed all OSDs to deep scrub in parallel at Saturday
  night and that finished before Sunday morning.
  So for now I'll fire them off one by one to reduce the load.
 
  Looking forward, that cluster doesn't need more space so instead of
  adding more hosts and OSDs I was thinking of a cache pool instead.
 
  I suppose that will keep the clients happy while the slow pool gets
  scrubbed.
  Is there anybody who tested cache pools with Firefly and compared the
  performance to Giant?
 
  For testing I'm currently playing with a single storage node and 8 SSD
  backed OSDs.
  Now what very much blew my mind is that a pool with a replication of 1
  still does quite the impressive read orgy, clearly reading all the
  data in the PGs.
  Why? And what is it comparing that data with, the cosmic background
  radiation?

 Yeah, cache pools currently do full-object promotions whenever an
 object is accessed. There are some ideas and projects to improve this
 or reduce its effects, but they're mostly just getting started.
 Thanks for confirming that, so probably not much better than Firefly
 _aside_ from the fact that SSD pools should be quite a bit faster in and
 by themselves in Giant.
 Guess there is no other way to find out than to test things, I have a
 feeling that determining the hot working set otherwise will be rather
 difficult.

 At least, I assume that's what you mean by a read orgy; perhaps you
 are seeing something else entirely?

 Indeed I did, this was just an observation that any pool with a replica of
 1 will still read ALL the data during a deep-scrub. What good would that
 do?

Oh, I see what you're saying; you mean it was reading all the data
during a scrub, not just that it was promoting things.

Anyway, reading all the data during a deep scrub verifies that we
*can* read all the data. That's one of the fundamental tasks of
scrubbing data in a storage system. It's often accompanied by other
checks or recovery behaviors to easily repair issues that are
discovered, but simply maintaining confidence that the data actually
exists is the principle goal. :)
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Log reading/how do I tell what an OSD is trying to connect to

2014-11-12 Thread Gregory Farnum
On Tue, Nov 11, 2014 at 6:28 PM, Scott Laird sc...@sigkill.org wrote:
 I'm having a problem with my cluster.  It's running 0.87 right now, but I
 saw the same behavior with 0.80.5 and 0.80.7.

 The problem is that my logs are filling up with "replacing existing (lossy)
 channel" log lines (see below), to the point where I'm filling drives to
 100% almost daily just with logs.

 It doesn't appear to be network related, because it happens even when
 talking to other OSDs on the same host.

Well, that means it's probably not physical network related, but there
can still be plenty wrong with the networking stack... ;)

 The logs pretty much all point to
 port 0 on the remote end.  Is this an indicator that it's failing to resolve
 port numbers somehow, or is this normal at this point in connection setup?

That's definitely unusual, but I'd need to see a little more to be
sure if it's bad. My guess is that these pipes are connections from
the other OSD's Objecter, which is treated as a regular client and
doesn't bind to a socket for incoming connections.

The repetitive channel replacements are concerning, though — they can
be harmless in some circumstances but this looks more like the
connection is simply failing to establish and so it's retrying over
and over again. Can you restart the OSDs with debug ms = 10 in their
config file and post the logs somewhere? (There is not really any
documentation available on what they mean, but the deeper detail ones
might also be more understandable to you.)
-Greg
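The change Greg describes is just a config bump on the OSD hosts, roughly
(a sketch; level 10 is quite verbose, so plan for the extra log space):

    # in ceph.conf on the OSD hosts, then restart the OSDs
    [osd]
        debug ms = 10

    # or raise it on the running daemons without a restart
    ceph tell osd.\* injectargs '--debug-ms 10'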


 The systems that are causing this problem are somewhat unusual; they're
 running OSDs in Docker containers, but they *should* be configured to run as
 root and have full access to the host's network stack.  They manage to work,
 mostly, but things are still really flaky.

 Also, is there documentation on what the various fields mean, short of
 digging through the source?  And how does Ceph resolve OSD numbers into
 host/port addresses?


 2014-11-12 01:50:40.802604 7f7828db8700  0 -- 10.2.0.36:6819/1 >> 10.2.0.36:0/1 pipe(0x1ce31c80 sd=135 :6819 s=0 pgs=0 cs=0 l=1 c=0x1e070580).accept replacing existing (lossy) channel (new one lossy=1)

 2014-11-12 01:50:40.802708 7f7816538700  0 -- 10.2.0.36:6830/1 >> 10.2.0.36:0/1 pipe(0x1ff61080 sd=120 :6830 s=0 pgs=0 cs=0 l=1 c=0x1f3db2e0).accept replacing existing (lossy) channel (new one lossy=1)

 2014-11-12 01:50:40.803346 7f781ba8d700  0 -- 10.2.0.36:6819/1 >> 10.2.0.36:0/1 pipe(0x1ce31180 sd=125 :6819 s=0 pgs=0 cs=0 l=1 c=0x1e070420).accept replacing existing (lossy) channel (new one lossy=1)

 2014-11-12 01:50:40.803944 7f781996c700  0 -- 10.2.0.36:6830/1 >> 10.2.0.36:0/1 pipe(0x1ff618c0 sd=107 :6830 s=0 pgs=0 cs=0 l=1 c=0x1f3d8420).accept replacing existing (lossy) channel (new one lossy=1)

 2014-11-12 01:50:40.804185 7f7816538700  0 -- 10.2.0.36:6819/1 >> 10.2.0.36:0/1 pipe(0x1ffd1e40 sd=20 :6819 s=0 pgs=0 cs=0 l=1 c=0x1e070840).accept replacing existing (lossy) channel (new one lossy=1)

 2014-11-12 01:50:40.805235 7f7813407700  0 -- 10.2.0.36:6819/1 >> 10.2.0.36:0/1 pipe(0x1ffd1340 sd=60 :6819 s=0 pgs=0 cs=0 l=1 c=0x1b2d6260).accept replacing existing (lossy) channel (new one lossy=1)

 2014-11-12 01:50:40.806364 7f781bc8f700  0 -- 10.2.0.36:6819/1 >> 10.2.0.36:0/1 pipe(0x1ffd0b00 sd=162 :6819 s=0 pgs=0 cs=0 l=1 c=0x675c580).accept replacing existing (lossy) channel (new one lossy=1)

 2014-11-12 01:50:40.806425 7f781aa7d700  0 -- 10.2.0.36:6830/1 >> 10.2.0.36:0/1 pipe(0x1db29600 sd=143 :6830 s=0 pgs=0 cs=0 l=1 c=0x1f3d9600).accept replacing existing (lossy) channel (new one lossy=1)




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Federated gateways

2014-11-12 Thread Aaron Bassett
In playing around with this a bit more, I noticed that the two users on the
secondary node can't see each other's buckets. Is this a problem?
 On Nov 11, 2014, at 6:56 PM, Craig Lewis cle...@centraldesktop.com wrote:
 
 I see you're running 0.80.5.  Are you using Apache 2.4?  There is a known 
 issue with Apache 2.4 on the primary and replication.  It's fixed, just 
 waiting for the next firefly release.  Although, that causes 40x errors with 
 Apache 2.4, not 500 errors.
 It is apache 2.4, but I’m actually running 0.80.7 so I probably have that bug 
 fix?
 
 
 No, the unreleased 0.80.8 has the fix.
  
  
 
 Have you verified that both system users can read and write to both 
 clusters?  (Just make sure you clean up the writes to the slave cluster).
 Yes I can write everywhere and radosgw-agent isn’t getting any 403s like it 
 was earlier when I had mismatched keys. The .us-nh.rgw.buckets.index pool is 
 syncing properly, as are the users. It seems like really the only thing that 
 isn’t syncing is the .zone.rgw.buckets pool.
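 For reference, a quick way to repeat that check with the system users'
 keys is a put/read/cleanup against each zone's endpoint (a rough sketch;
 the endpoints, bucket and keys are placeholders and a recent s3cmd is
 assumed):

     # write a test object to the master zone as the system user
     s3cmd --host=rgw-master.example.com --access_key=SYSKEY \
         --secret_key=SYSSECRET put testfile s3://repl-test/t1
     # do the same against the secondary, then clean it up again
     s3cmd --host=rgw-secondary.example.com --access_key=SYSKEY \
         --secret_key=SYSSECRET put testfile s3://repl-test/t2
     s3cmd --host=rgw-secondary.example.com --access_key=SYSKEY \
         --secret_key=SYSSECRET rm s3://repl-test/t2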
 
 That's pretty much the same behavior I was seeing with Apache 2.4.
 
 Try downgrading the primary cluster to Apache 2.2.  In my testing, the 
 secondary cluster could run 2.2 or 2.4.
Do you have a link to that bug#? I want to see if it gives me any clues. 

Aaron 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Log reading/how do I tell what an OSD is trying to connect to

2014-11-12 Thread Scott Laird
Here are the first 33k lines or so:
https://dl.dropboxusercontent.com/u/104949139/ceph-osd-log.txt

This is a different (but more or less identical) machine from the past set
of logs.  This system doesn't have quite as many drives in it, so I
couldn't spot a same-host error burst, but it's logging tons of the same
errors while trying to talk to 10.2.0.34.
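To pin down which daemon behind 10.2.0.34 that actually is, the osdmap can
be searched by address, or an OSD id looked up the other way around (the id
below is a placeholder):

    # which OSD(s) are bound to that address?
    ceph osd dump | grep 10.2.0.34
    # and, given an id, where does it live?
    ceph osd find 12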

On Wed Nov 12 2014 at 10:47:30 AM Gregory Farnum g...@gregs42.com wrote:

 On Tue, Nov 11, 2014 at 6:28 PM, Scott Laird sc...@sigkill.org wrote:
  I'm having a problem with my cluster.  It's running 0.87 right now, but I
  saw the same behavior with 0.80.5 and 0.80.7.
 
  The problem is that my logs are filling up with replacing existing
 (lossy)
  channel log lines (see below), to the point where I'm filling drives to
  100% almost daily just with logs.
 
  It doesn't appear to be network related, because it happens even when
  talking to other OSDs on the same host.

 Well, that means it's probably not physical network related, but there
 can still be plenty wrong with the networking stack... ;)

  The logs pretty much all point to
  port 0 on the remote end.  Is this an indicator that it's failing to
 resolve
  port numbers somehow, or is this normal at this point in connection
 setup?

 That's definitely unusual, but I'd need to see a little more to be
 sure if it's bad. My guess is that these pipes are connections from
 the other OSD's Objecter, which is treated as a regular client and
 doesn't bind to a socket for incoming connections.

 The repetitive channel replacements are concerning, though — they can
 be harmless in some circumstances but this looks more like the
 connection is simply failing to establish and so it's retrying over
 and over again. Can you restart the OSDs with debug ms = 10 in their
 config file and post the logs somewhere? (There is not really any
 documentation available on what they mean, but the deeper detail ones
 might also be more understandable to you.)
 -Greg

 
  The systems that are causing this problem are somewhat unusual; they're
  running OSDs in Docker containers, but they *should* be configured to
 run as
  root and have full access to the host's network stack.  They manage to
 work,
  mostly, but things are still really flaky.
 
  Also, is there documentation on what the various fields mean, short of
  digging through the source?  And how does Ceph resolve OSD numbers into
  host/port addresses?
 
 
  2014-11-12 01:50:40.802604 7f7828db8700  0 -- 10.2.0.36:6819/1 >> 10.2.0.36:0/1 pipe(0x1ce31c80 sd=135 :6819 s=0 pgs=0 cs=0 l=1 c=0x1e070580).accept replacing existing (lossy) channel (new one lossy=1)

  2014-11-12 01:50:40.802708 7f7816538700  0 -- 10.2.0.36:6830/1 >> 10.2.0.36:0/1 pipe(0x1ff61080 sd=120 :6830 s=0 pgs=0 cs=0 l=1 c=0x1f3db2e0).accept replacing existing (lossy) channel (new one lossy=1)

  2014-11-12 01:50:40.803346 7f781ba8d700  0 -- 10.2.0.36:6819/1 >> 10.2.0.36:0/1 pipe(0x1ce31180 sd=125 :6819 s=0 pgs=0 cs=0 l=1 c=0x1e070420).accept replacing existing (lossy) channel (new one lossy=1)

  2014-11-12 01:50:40.803944 7f781996c700  0 -- 10.2.0.36:6830/1 >> 10.2.0.36:0/1 pipe(0x1ff618c0 sd=107 :6830 s=0 pgs=0 cs=0 l=1 c=0x1f3d8420).accept replacing existing (lossy) channel (new one lossy=1)

  2014-11-12 01:50:40.804185 7f7816538700  0 -- 10.2.0.36:6819/1 >> 10.2.0.36:0/1 pipe(0x1ffd1e40 sd=20 :6819 s=0 pgs=0 cs=0 l=1 c=0x1e070840).accept replacing existing (lossy) channel (new one lossy=1)

  2014-11-12 01:50:40.805235 7f7813407700  0 -- 10.2.0.36:6819/1 >> 10.2.0.36:0/1 pipe(0x1ffd1340 sd=60 :6819 s=0 pgs=0 cs=0 l=1 c=0x1b2d6260).accept replacing existing (lossy) channel (new one lossy=1)

  2014-11-12 01:50:40.806364 7f781bc8f700  0 -- 10.2.0.36:6819/1 >> 10.2.0.36:0/1 pipe(0x1ffd0b00 sd=162 :6819 s=0 pgs=0 cs=0 l=1 c=0x675c580).accept replacing existing (lossy) channel (new one lossy=1)

  2014-11-12 01:50:40.806425 7f781aa7d700  0 -- 10.2.0.36:6830/1 >> 10.2.0.36:0/1 pipe(0x1db29600 sd=143 :6830 s=0 pgs=0 cs=0 l=1 c=0x1f3d9600).accept replacing existing (lossy) channel (new one lossy=1)
 
 
 
 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Federated gateways

2014-11-12 Thread Craig Lewis
http://tracker.ceph.com/issues/9206

My post to the ML: http://www.spinics.net/lists/ceph-users/msg12665.html


IIRC, the system users didn't see the other user's bucket in a bucket
listing, but they could read and write the objects fine.
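One way to double-check that the buckets really exist on the secondary,
even if an S3 listing hides them, is to ask the gateway directly (the
bucket name is a placeholder):

    # list every bucket the gateway knows about, regardless of owner
    radosgw-admin bucket list
    # per-bucket details, including owner and object counts
    radosgw-admin bucket stats --bucket=repl-test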



On Wed, Nov 12, 2014 at 11:16 AM, Aaron Bassett aa...@five3genomics.com
wrote:

 In playing around with this a bit more, I noticed that the two users on
 the secondary node can't see each other's buckets. Is this a problem?


IIRC, the system users couldn't see each other's buckets, but they could
read and write the objects.

 On Nov 11, 2014, at 6:56 PM, Craig Lewis cle...@centraldesktop.com
 wrote:

 I see you're running 0.80.5.  Are you using Apache 2.4?  There is a known
 issue with Apache 2.4 on the primary and replication.  It's fixed, just
 waiting for the next firefly release.  Although, that causes 40x errors
 with Apache 2.4, not 500 errors.

 It is apache 2.4, but I’m actually running 0.80.7 so I probably have that
 bug fix?


 No, the unreleased 0.80.8 has the fix.




 Have you verified that both system users can read and write to both
 clusters?  (Just make sure you clean up the writes to the slave cluster).

 Yes I can write everywhere and radosgw-agent isn’t getting any 403s like
 it was earlier when I had mismatched keys. The .us-nh.rgw.buckets.index
 pool is syncing properly, as are the users. It seems like really the only
 thing that isn’t syncing is the .zone.rgw.buckets pool.


 That's pretty much the same behavior I was seeing with Apache 2.4.

 Try downgrading the primary cluster to Apache 2.2.  In my testing, the
 secondary cluster could run 2.2 or 2.4.

 Do you have a link to that bug#? I want to see if it gives me any clues.

 Aaron


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-osd mkfs mkkey hangs on ARM

2014-11-12 Thread Harm Weites
Hi,

When trying to add a new OSD to my cluster the ceph-osd process hangs:

# ceph-osd -i $id --mkfs --mkkey
(no output)

At this point I have to explicitly kill -9 the ceph-osd since it doesn't
respond to anything. It also didn't adhere to my foreground debug log
request; the logs are empty. Stracing the ceph-osd [1] shows it's very
busy with this:

 nanosleep({0, 201}, NULL)   = 0
 gettimeofday({1415741192, 862216}, NULL) = 0
 nanosleep({0, 201}, NULL)   = 0
 gettimeofday({1415741192, 864563}, NULL) = 0

I've rebuilt python to undo a threading regression [2], though that's
unrelated to this issue. It did fix ceph not returning properly after
commands like 'ceph osd tree' though, so it is useful.

This machine is Fedora 21 on ARM with ceph-0.80.7-1.fc21.armv7hl. The
mon/mds/osd are all x86, CentOS 7. Could this be a configuration issue
on my end or is something just broken on my platform?

# lscpu
Architecture:          armv7l
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    1
Core(s) per socket:    2
Socket(s):             1
Model name:            ARMv7 Processor rev 4 (v7l)

[1] http://paste.openstack.org/show/132555/
[2] http://bugs.python.org/issue21963

Regards,
Harm
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Problem with radosgw-admin subuser rm

2014-11-12 Thread Seth Mason
Hi --

I'm trying to remove a subuser but it's not removing the S3 keys when I
pass in --purge-keys.

First I create a sub-user:
$ radosgw-admin subuser create --uid=smason --subuser='smason:test' \
--access=full --key-type=s3 --gen-secret

  "subusers": [
        { "id": "smason:test",
          "permissions": "full-control"}],
  "keys": [
        { "user": "smason",
          "access_key": "B8D062SWPB560CBA3HHX",
          "secret_key": snip},
        { "user": "smason:test",
          "access_key": "ERKTY5JJ1H2IXE9T5TY3",
          "secret_key": snip}],


Then I try to remove the user and the keys:
$ radosgw-admin subuser rm --subuser='smason:test' --purge-keys
  "subusers": [],
  "keys": [
        { "user": "smason",
          "access_key": "B8D062SWPB560CBA3HHX",
          "secret_key": snip},
        { "user": "smason:test",
          "access_key": "ERKTY5JJ1H2IXE9T5TY3",
          "secret_key": snip}],
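One possible interim workaround (a sketch, not verified on 0.80.5; flag
spellings may vary by version) is to remove the leftover key explicitly
and re-check the user:

$ radosgw-admin key rm --uid=smason --key-type=s3 \
    --access-key=ERKTY5JJ1H2IXE9T5TY3
$ radosgw-admin user info --uid=smason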

I'm running ceph version 0.80.5
(38b73c67d375a2552d8ed67843c8a65c2c0feba6). FWIW, I've observed the same
behavior when I use the admin ops REST API.

Let me know if I can provide any more information.

Thanks in advance,

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-osd mkfs mkkey hangs on ARM

2014-11-12 Thread Sage Weil
On Wed, 12 Nov 2014, Harm Weites wrote:
 Hi,
 
 When trying to add a new OSD to my cluster the ceph-osd process hangs:
 
 # ceph-osd -i $id --mkfs --mkkey
 nothing
 
 At this point I have to explicitly kill -9 the ceph-osd since it doesn't
 respond to anything. It also didn't adhere to my foreground debug log
 request; the logs are empty. Stracing the ceph-osd [1] shows it's very
 busy with this:
 
  nanosleep({0, 201}, NULL)   = 0
  gettimeofday({1415741192, 862216}, NULL) = 0
  nanosleep({0, 201}, NULL)   = 0
  gettimeofday({1415741192, 864563}, NULL) = 0

Can you gdb attach to the ceph-osd process while it is in this state and 
see what 'bt' says?

sage
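For reference, a non-interactive way to capture that (assuming gdb and the
ceph debuginfo package are installed and only one ceph-osd is running on
the box):

    # attach to the hung process, dump every thread's backtrace, detach
    gdb -p $(pidof ceph-osd) -batch -ex 'thread apply all bt' > ceph-osd-bt.txt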


 
 I've rebuilt python to undo a threading regression [2], though that's
 unrelated to this issue. It did fix ceph not returning properly after
 commands like 'ceph osd tree' though, so it is useful.
 
 This machine is Fedora 21 on ARM with ceph-0.80.7-1.fc21.armv7hl. The
 mon/mds/osd are all x86, CentOS 7. Could this be a configuration issue
 on my end or is something just broken on my platform?
 
 # lscpu
 Architecture:          armv7l
 Byte Order:            Little Endian
 CPU(s):                2
 On-line CPU(s) list:   0,1
 Thread(s) per core:    1
 Core(s) per socket:    2
 Socket(s):             1
 Model name:            ARMv7 Processor rev 4 (v7l)
 
 [1] http://paste.openstack.org/show/132555/
 [2] http://bugs.python.org/issue21963
 
 Regards,
 Harm
 
 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com