[ceph-users] PG:: recovery optimazation: recovery what is really modified by mslovy · Pull Request #3837 · ceph/ceph · GitHub
yaoning, haomai, Json, what about the "recovery what is really modified" feature? I haven't seen any updates on GitHub recently; will it be developed further? https://github.com/ceph/ceph/pull/3837 (PG:: recovery optimazation: recovery what is really modified) Thanks a lot. donglifec...@gmail.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] XFS attempt to access beyond end of device
An update on this. The "attempt to access beyond end of device" messages are caused by a kernel bug which is rectified by the following patches. - 59d43914ed7b9625 (vfs: make guard_bh_eod() more generic) - 4db96b71e3caea (vfs: guard end of device for mpage interface) An upgraded Red Hat kernel including these patches is pending. There was also discussion of the following upstream tracker http://tracker.ceph.com/issues/14842 however that has been eliminated as being in play for any of the devices analysed whilst investigating this issue since these partitions are correctly aligned. On Sun, Jul 23, 2017 at 10:49 AM, Brad Hubbard wrote: > Blair, > > I should clarify that I am *now* aware of your support case =D > > For anyone willing to run a SystemTap probe, the following should give us more > information about the problem. > > stap --all-modules -e 'probe kernel.function("handle_bad_sector"){ > printf("handle_bad_sector(): ARGS is %s\n",$$parms$$); print_backtrace()}' > > In order to run this you will need to install some non-trivial packages such > as > the kernel debuginfo package and kernel-devel. This is generally best > accomplished as follows, at least on rpm-based systems. > > (yum|dnf) install systemtap > stap-prep > > The SystemTap probe needs to be running when the error is generated, as it monitors > calls to "handle_bad_sector", which is the function generating the error > message. > Once that function is called the probe will dump all information about the bio > structure passed as a parameter to "handle_bad_sector", as well as dumping the > call stack. This would give us a good idea of the specific code involved. > > > On Sat, Jul 22, 2017 at 9:45 AM, Brad Hubbard wrote: >> On Sat, Jul 22, 2017 at 9:38 AM, Blair Bethwaite >> wrote: >>> Hi Brad, >>> >>> On 22 July 2017 at 09:04, Brad Hubbard wrote: Could you share what kernel/distro you are running and also please test whether the error message can be triggered by running the "blkid" command? 
>>> >>> I'm seeing it on RHEL7.3 (3.10.0-514.2.2.el7.x86_64). See Red Hat >>> support case #01891011 for sosreport etc. >> >> Thanks Blair, >> >> I'm aware of your case and the Bugzilla created from it and we are >> investigating further. >> >>> >>> No, blkid does not seem to trigger it. So far I haven't figured out >>> what does. It seems to be showing up roughly once for each disk every >> >> Thanks, that appears to exclude any link to an existing Bugzilla that >> was suggested as being related. >> >>> 1-2 weeks, and there is a clear time correlation across the hosts >>> experiencing it. >>> >>> -- >>> Cheers, >>> ~Blairo >> >> >> >> -- >> Cheers, >> Brad > > > > -- > Cheers, > Brad -- Cheers, Brad
Re: [ceph-users] Networking/naming doubt
The only things that are supposed to use the cluster network are the OSDs. Not even the MONs access the cluster network. I am sure that if you have a need to make this work you can find a way, but I don't know that one exists in the standard tool set. You might try temporarily setting the /etc/hosts reference for vdicnode02 and vdicnode03 to the cluster network and use the proper host names in the ceph-deploy command. Ceph cluster operations do not use DNS at all, so you could probably leave your /etc/hosts in this state. I don't know if it would work though. It's really not intended for any communication to happen on this subnet other than inter-OSD traffic. On Thu, Jul 27, 2017 at 6:31 PM Oscar Segarra wrote: > Sorry! I'd like to add that I want to use the cluster network for both > purposes: > > ceph-deploy --username vdicceph new vdicnode01 --cluster-network > 192.168.100.0/24 --public-network 192.168.100.0/24 > > Thanks a lot > > > 2017-07-28 0:29 GMT+02:00 Oscar Segarra : > >> Hi, >> >> Do you mean that for security reasons ceph-deploy can only be executed >> from the public interface? >> >> It looks strange that one cannot decide which network to use for ceph-deploy... >> I could have a dedicated network for ceph-deploy... :S >> >> Thanks a lot >> >> 2017-07-28 0:03 GMT+02:00 Roger Brown : >> >>> I could be wrong, but I think you cannot achieve this objective. If you >>> declare a cluster network, OSDs will route heartbeat, object replication >>> and recovery traffic over the cluster network. We prefer that the cluster >>> network is NOT reachable from the public network or the Internet for added >>> security. Therefore it will not work with ceph-deploy actions. 
>>> Source: >>> http://docs.ceph.com/docs/master/rados/configuration/network-config-ref/ >>> >>> >>> On Thu, Jul 27, 2017 at 3:53 PM Oscar Segarra >>> wrote: Hi, In my environment I have 3 hosts, every host has 2 network interfaces: public: 192.168.2.0/24 cluster: 192.168.100.0/24 The hostname "vdicnode01", "vdicnode02" and "vdicnode03" are resolved by public DNS through the public interface, that means the "ping vdicnode01" will resolve 192.168.2.1. In my environment the "admin" node is the first node vdicnode01 and I'd like all the deployment "ceph-deploy" and all osd traffic to go from the cluster network. 1) To begin with, I create the cluster and I want all traffic to go from the cluster network: ceph-deploy --username vdicceph new vdicnode01 --cluster-network 192.168.100.0/24 --public-network 192.168.100.0/24 2) The problem comes when I have to launch my commands to the other hosts for example, from node vdicnode01 I execute: 2.1) ceph-deploy --username vdicceph osd create vdicnode02:sdb --> Finishes Ok but communication goes through the public interface 2.2) ceph-deploy --username vdicceph osd create vdicnode02.local:sdb --> vdicnode02.local is added manually in /etc/hosts (assigned a cluster IP) --> It raises some errors/warnings because vdicnode02.local is not the real hostname. Some files are created with vdicnode02.local in the middle of the name of the file and some errors appear when starting up the osd service related to "file does not exist" 2.3) ceph-deploy --username vdicceph osd create vdicnode02-priv:sdb --> vdicnode02-priv is added manually in /etc/hosts (assigned a cluster IP) --> It raises some errors/warnings because vdicnode02-priv is not the real hostname. Some files are created with vdicnode02-priv in the middle of the name of the file and some errors appear when starting up the osd service related to "file does not exist" What would be the right way to achieve my objective? 
If there is any documentation I have not found, please redirect me... Thanks a lot for your help in advance.
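As a concrete illustration of Roger's answer above: the conventional layout keeps the two subnets separate, so that ceph-deploy, clients and monitors use the public network while only inter-OSD replication, recovery and heartbeat traffic uses the cluster network. A sketch of the corresponding ceph.conf fragment, using the subnets from this thread (illustrative only, adapt to your environment):

```ini
[global]
# Client, monitor and ceph-deploy traffic
public network = 192.168.2.0/24
# Inter-OSD replication, recovery and heartbeat traffic only
cluster network = 192.168.100.0/24
```

With this split, hostnames resolving to the public subnet are exactly what ceph-deploy expects, and no /etc/hosts tricks are needed.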
Re: [ceph-users] High iowait on OSD node
My first suspicion would be the HBA. Are you using a RAID HBA? If so, I suggest checking the status of your BBU/FBWC and cache policy.
Re: [ceph-users] Networking/naming doubt
Sorry! I'd like to add that I want to use the cluster network for both purposes: ceph-deploy --username vdicceph new vdicnode01 --cluster-network 192.168.100.0/24 --public-network 192.168.100.0/24 Thanks a lot 2017-07-28 0:29 GMT+02:00 Oscar Segarra: > Hi, > > Do you mean that for security reasons ceph-deploy can only be executed > from the public interface? > > It looks strange that one cannot decide which network to use for ceph-deploy... > I could have a dedicated network for ceph-deploy... :S > > Thanks a lot > > 2017-07-28 0:03 GMT+02:00 Roger Brown : > >> I could be wrong, but I think you cannot achieve this objective. If you >> declare a cluster network, OSDs will route heartbeat, object replication >> and recovery traffic over the cluster network. We prefer that the cluster >> network is NOT reachable from the public network or the Internet for added >> security. Therefore it will not work with ceph-deploy actions. >> Source: http://docs.ceph.com/docs/master/rados/configuration/network-config-ref/ >> >> >> On Thu, Jul 27, 2017 at 3:53 PM Oscar Segarra >> wrote: >> >>> Hi, >>> >>> In my environment I have 3 hosts, every host has 2 network interfaces: >>> >>> public: 192.168.2.0/24 >>> cluster: 192.168.100.0/24 >>> >>> The hostname "vdicnode01", "vdicnode02" and "vdicnode03" are resolved by >>> public DNS through the public interface, that means the "ping vdicnode01" >>> will resolve 192.168.2.1. >>> >>> In my environment the "admin" node is the first node vdicnode01 and I'd >>> like all the deployment "ceph-deploy" and all osd traffic to go from the >>> cluster network. 
>>> >>> 1) To begin with, I create the cluster and I want all traffic to go from >>> the cluster network: >>> ceph-deploy --username vdicceph new vdicnode01 --cluster-network >>> 192.168.100.0/24 --public-network 192.168.100.0/24 >>> >>> 2) The problem comes when I have to launch my commands to the other >>> hosts for example, from node vdicnode01 I execute: >>> >>> 2.1) ceph-deploy --username vdicceph osd create vdicnode02:sdb >>> --> Finishes Ok but communication goes through the public interface >>> >>> 2.2) ceph-deploy --username vdicceph osd create vdicnode02.local:sdb >>> --> vdicnode02.local is added manually in /etc/hosts (assigned a cluster >>> IP) >>> --> It raises some errors/warnings because vdicnode02.local is not the >>> real hostname. Some files are created with vdicnode02.local in the middle >>> of the name of the file and some errors appear when starting up the osd >>> service related to "file does not exist" >>> >>> 2.3) ceph-deploy --username vdicceph osd create vdicnode02-priv:sdb >>> --> vdicnode02-priv is added manually in /etc/hosts (assigned a cluster >>> IP) >>> --> It raises some errors/warnings because vdicnode02-priv is not the real >>> hostname. Some files are created with vdicnode02-priv in the middle of the >>> name of the file and some errors appear when starting up the osd service >>> related to "file does not exist" >>> >>> What would be the right way to achieve my objective? >>> >>> If there is any documentation I have not found, please redirect me... >>> >>> Thanks a lot for your help in advance.
Re: [ceph-users] how to troubleshoot "heartbeat_check: no reply" in OSD log
On Fri, Jul 28, 2017 at 6:06 AM, Jared Watts wrote: > I’ve got a cluster where a bunch of OSDs are down/out (only 6/21 are up/in). > ceph status and ceph osd tree output can be found at: > > https://gist.github.com/jbw976/24895f5c35ef0557421124f4b26f6a12 > > > > In osd.4 log, I see many of these: > > 2017-07-27 19:38:53.468852 7f3855c1c700 -1 osd.4 120 heartbeat_check: no > reply from 10.32.0.3:6807 osd.15 ever on either front or back, first ping > sent 2017-07-27 19:37:40.857220 (cutoff 2017-07-27 19:38:33.468850) > > 2017-07-27 19:38:53.468881 7f3855c1c700 -1 osd.4 120 heartbeat_check: no > reply from 10.32.0.3:6811 osd.16 ever on either front or back, first ping > sent 2017-07-27 19:37:40.857220 (cutoff 2017-07-27 19:38:33.468850) > > > > From osd.4, those endpoints look reachable: > > / # nc -vz 10.32.0.3 6807 > > 10.32.0.3 (10.32.0.3:6807) open > > / # nc -vz 10.32.0.3 6811 > > 10.32.0.3 (10.32.0.3:6811) open > > > > What else can I look at to determine why most of the OSDs cannot > communicate? http://tracker.ceph.com/issues/16092 indicates this behavior > is a networking or hardware issue, what else can I check there? I can turn > on extra logging as needed. Thanks! Do a packet capture on both machines at the same time and verify the packets are arriving as expected. -- Cheers, Brad
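Complementing the packet-capture suggestion: the heartbeat_check lines themselves can be summarized quickly to see which peers an OSD cannot reach. A minimal sketch (a hypothetical helper, not part of Ceph) that extracts the unreachable peer OSDs from log lines like the ones above:

```python
import re

# Matches Ceph OSD log lines of the form:
#   ... heartbeat_check: no reply from 10.32.0.3:6807 osd.15 ever ...
HEARTBEAT_RE = re.compile(
    r"heartbeat_check: no reply from (?P<addr>[\d.]+:\d+) (?P<peer>osd\.\d+)"
)

def unreachable_peers(log_lines):
    """Return sorted (peer, addr) pairs that never answered a heartbeat."""
    seen = set()
    for line in log_lines:
        m = HEARTBEAT_RE.search(line)
        if m:
            seen.add((m.group("peer"), m.group("addr")))
    return sorted(seen)

# The two example lines from osd.4's log:
log = [
    "2017-07-27 19:38:53.468852 7f3855c1c700 -1 osd.4 120 heartbeat_check: no "
    "reply from 10.32.0.3:6807 osd.15 ever on either front or back",
    "2017-07-27 19:38:53.468881 7f3855c1c700 -1 osd.4 120 heartbeat_check: no "
    "reply from 10.32.0.3:6811 osd.16 ever on either front or back",
]
print(unreachable_peers(log))
# [('osd.15', '10.32.0.3:6807'), ('osd.16', '10.32.0.3:6811')]
```

Feeding a whole OSD log through this gives a quick inventory of peer addresses to target with tcpdump on both ends.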
Re: [ceph-users] Networking/naming doubt
I could be wrong, but I think you cannot achieve this objective. If you declare a cluster network, OSDs will route heartbeat, object replication and recovery traffic over the cluster network. We prefer that the cluster network is NOT reachable from the public network or the Internet for added security. Therefore it will not work with ceph-deploy actions. Source: http://docs.ceph.com/docs/master/rados/configuration/network-config-ref/ On Thu, Jul 27, 2017 at 3:53 PM Oscar Segarra wrote: > Hi, > > In my environment I have 3 hosts, every host has 2 network interfaces: > > public: 192.168.2.0/24 > cluster: 192.168.100.0/24 > > The hostname "vdicnode01", "vdicnode02" and "vdicnode03" are resolved by > public DNS through the public interface, that means the "ping vdicnode01" > will resolve 192.168.2.1. > > In my environment the "admin" node is the first node vdicnode01 and I'd > like all the deployment "ceph-deploy" and all osd traffic to go from the > cluster network. > > 1) To begin with, I create the cluster and I want all traffic to go from > the cluster network: > ceph-deploy --username vdicceph new vdicnode01 --cluster-network > 192.168.100.0/24 --public-network 192.168.100.0/24 > > 2) The problem comes when I have to launch my commands to the other hosts > for example, from node vdicnode01 I execute: > > 2.1) ceph-deploy --username vdicceph osd create vdicnode02:sdb > --> Finishes Ok but communication goes through the public interface > > 2.2) ceph-deploy --username vdicceph osd create vdicnode02.local:sdb > --> vdicnode02.local is added manually in /etc/hosts (assigned a cluster > IP) > --> It raises some errors/warnings because vdicnode02.local is not the real > hostname. 
Some files are created with vdicnode02.local in the middle of the > name of the file and some errors appear when starting up the osd service > related to "file does not exist" > > 2.3) ceph-deploy --username vdicceph osd create vdicnode02-priv:sdb > --> vdicnode02-priv is added manually in /etc/hosts (assigned a cluster IP) > --> It raises some errors/warnings because vdicnode02-priv is not the real > hostname. Some files are created with vdicnode02-priv in the middle of the > name of the file and some errors appear when starting up the osd service > related to "file does not exist" > > What would be the right way to achieve my objective? > > If there is any documentation I have not found, please redirect me... > > Thanks a lot for your help in advance.
Re: [ceph-users] Error in boot.log - Failed to start Ceph disk activation - Luminous
Hi Roger, Thanks a lot, I will try your workaround. I have opened a bug so that the devs can review it as soon as they have availability. http://tracker.ceph.com/issues/20807 2017-07-27 23:39 GMT+02:00 Roger Brown: > I had the same issue on Luminous and worked around it by disabling ceph-disk. > The osds can start without it. > > On Thu, Jul 27, 2017 at 3:36 PM Oscar Segarra > wrote: > >> Hi, >> >> First of all, my version: >> >> [root@vdicnode01 ~]# ceph -v >> ceph version 12.1.1 (f3e663a190bf2ed12c7e3cda288b9a159572c800) luminous >> (rc) >> >> When I boot my ceph node (I have an all in one) I get the following >> message in boot.log: >> >> *[FAILED] Failed to start Ceph disk activation: /dev/sdb2.* >> *See 'systemctl status ceph-disk@dev-sdb2.service' for details.* >> *[FAILED] Failed to start Ceph disk activation: /dev/sdb1.* >> *See 'systemctl status ceph-disk@dev-sdb1.service' for details.* >> >> [root@vdicnode01 ~]# systemctl status ceph-disk@dev-sdb1.service >> ● ceph-disk@dev-sdb1.service - Ceph disk activation: /dev/sdb1 >>Loaded: loaded (/usr/lib/systemd/system/ceph-disk@.service; static; >> vendor preset: disabled) >>Active: failed (Result: exit-code) since Thu 2017-07-27 23:37:23 CEST; >> 1h 52min ago >> Process: 740 ExecStart=/bin/sh -c timeout $CEPH_DISK_TIMEOUT flock >> /var/lock/ceph-disk-$(basename %f) /usr/sbin/ceph-disk --verbose >> --log-stdout trigger --sync %f (code=exited, status=1/FAILURE) >> Main PID: 740 (code=exited, status=1/FAILURE) >> >> Jul 27 23:37:23 vdicnode01 sh[740]: main(sys.argv[1:]) >> Jul 27 23:37:23 vdicnode01 sh[740]: File >> "/usr/lib/python2.7/site-packages/ceph_disk/main.py", >> line 5682, in main >> Jul 27 23:37:23 vdicnode01 sh[740]: args.func(args) >> Jul 27 23:37:23 vdicnode01 sh[740]: File >> "/usr/lib/python2.7/site-packages/ceph_disk/main.py", >> line 4891, in main_trigger >> Jul 27 23:37:23 vdicnode01 sh[740]: raise Error('return code ' + str(ret)) >> Jul 27 23:37:23 vdicnode01 sh[740]: ceph_disk.main.Error: Error: 
return >> code 1 >> Jul 27 23:37:23 vdicnode01 systemd[1]: ceph-disk@dev-sdb1.service: main >> process exited, code=exited, status=1/FAILURE >> Jul 27 23:37:23 vdicnode01 systemd[1]: Failed to start Ceph disk >> activation: /dev/sdb1. >> Jul 27 23:37:23 vdicnode01 systemd[1]: Unit ceph-disk@dev-sdb1.service >> entered failed state. >> Jul 27 23:37:23 vdicnode01 systemd[1]: ceph-disk@dev-sdb1.service failed. >> >> >> [root@vdicnode01 ~]# systemctl status ceph-disk@dev-sdb2.service >> ● ceph-disk@dev-sdb2.service - Ceph disk activation: /dev/sdb2 >>Loaded: loaded (/usr/lib/systemd/system/ceph-disk@.service; static; >> vendor preset: disabled) >>Active: failed (Result: exit-code) since Thu 2017-07-27 23:37:23 CEST; >> 1h 52min ago >> Process: 744 ExecStart=/bin/sh -c timeout $CEPH_DISK_TIMEOUT flock >> /var/lock/ceph-disk-$(basename %f) /usr/sbin/ceph-disk --verbose >> --log-stdout trigger --sync %f (code=exited, status=1/FAILURE) >> Main PID: 744 (code=exited, status=1/FAILURE) >> >> Jul 27 23:37:23 vdicnode01 sh[744]: main(sys.argv[1:]) >> Jul 27 23:37:23 vdicnode01 sh[744]: File >> "/usr/lib/python2.7/site-packages/ceph_disk/main.py", >> line 5682, in main >> Jul 27 23:37:23 vdicnode01 sh[744]: args.func(args) >> Jul 27 23:37:23 vdicnode01 sh[744]: File >> "/usr/lib/python2.7/site-packages/ceph_disk/main.py", >> line 4891, in main_trigger >> Jul 27 23:37:23 vdicnode01 sh[744]: raise Error('return code ' + str(ret)) >> Jul 27 23:37:23 vdicnode01 sh[744]: ceph_disk.main.Error: Error: return >> code 1 >> Jul 27 23:37:23 vdicnode01 systemd[1]: ceph-disk@dev-sdb2.service: main >> process exited, code=exited, status=1/FAILURE >> Jul 27 23:37:23 vdicnode01 systemd[1]: Failed to start Ceph disk >> activation: /dev/sdb2. >> Jul 27 23:37:23 vdicnode01 systemd[1]: Unit ceph-disk@dev-sdb2.service >> entered failed state. >> Jul 27 23:37:23 vdicnode01 systemd[1]: ceph-disk@dev-sdb2.service failed. 
>> >> I have created an entry in /etc/fstab in order to mount the journal disk >> automatically: >> >> /dev/sdb1 /var/lib/ceph/osd/ceph-0 xfs defaults,noatime >> 1 2 >> >> But when I boot, I get the same error message. >> >> When I execute ceph -s, the osd looks to be working perfectly: >> >> [root@vdicnode01 ~]# ceph -s >> cluster: >> id: 61881df3-1365-4139-a586-92b5eca9cf18 >> health: HEALTH_WARN >> Degraded data redundancy: 5/10 objects degraded (50.000%), >> 128 pgs unclean, 128 pgs degraded, 128 pgs undersized >> 128 pgs not scrubbed for 86400 >> >> services: >> mon: 1 daemons, quorum vdicnode01 >> mgr: vdicnode01(active) >> osd: 1 osds: 1 up, 1 in >> >> data: >> pools: 1 pools, 128 pgs >> objects: 5 objects, 1349 bytes >> usage: 1073 MB used, 39785 MB / 40858 MB avail >> pgs: 5/10 objects degraded (50.000%) >> 128 active+undersized+degraded >> >> >> Has anybody experienced the same issue? >>
[ceph-users] Networking/naming doubt
Hi, In my environment I have 3 hosts, every host has 2 network interfaces: public: 192.168.2.0/24 cluster: 192.168.100.0/24 The hostname "vdicnode01", "vdicnode02" and "vdicnode03" are resolved by public DNS through the public interface, that means the "ping vdicnode01" will resolve 192.168.2.1. In my environment the "admin" node is the first node vdicnode01 and I'd like all the deployment "ceph-deploy" and all osd traffic to go from the cluster network. 1) To begin with, I create the cluster and I want all traffic to go from the cluster network: ceph-deploy --username vdicceph new vdicnode01 --cluster-network 192.168.100.0/24 --public-network 192.168.100.0/24 2) The problem comes when I have to launch my commands to the other hosts for example, from node vdicnode01 I execute: 2.1) ceph-deploy --username vdicceph osd create vdicnode02:sdb --> Finishes Ok but communication goes through the public interface 2.2) ceph-deploy --username vdicceph osd create vdicnode02.local:sdb --> vdicnode02.local is added manually in /etc/hosts (assigned a cluster IP) --> It raises some errors/warnings because vdicnode02.local is not the real hostname. Some files are created with vdicnode02.local in the middle of the name of the file and some errors appear when starting up the osd service related to "file does not exist" 2.3) ceph-deploy --username vdicceph osd create vdicnode02-priv:sdb --> vdicnode02-priv is added manually in /etc/hosts (assigned a cluster IP) --> It raises some errors/warnings because vdicnode02-priv is not the real hostname. Some files are created with vdicnode02-priv in the middle of the name of the file and some errors appear when starting up the osd service related to "file does not exist" What would be the right way to achieve my objective? If there is any documentation I have not found, please redirect me... Thanks a lot for your help in advance.
Re: [ceph-users] Error in boot.log - Failed to start Ceph disk activation - Luminous
I had the same issue on Luminous and worked around it by disabling ceph-disk. The osds can start without it. On Thu, Jul 27, 2017 at 3:36 PM Oscar Segarra wrote: > Hi, > > First of all, my version: > > [root@vdicnode01 ~]# ceph -v > ceph version 12.1.1 (f3e663a190bf2ed12c7e3cda288b9a159572c800) luminous > (rc) > > When I boot my ceph node (I have an all in one) I get the following > message in boot.log: > > *[FAILED] Failed to start Ceph disk activation: /dev/sdb2.* > *See 'systemctl status ceph-disk@dev-sdb2.service' for details.* > *[FAILED] Failed to start Ceph disk activation: /dev/sdb1.* > *See 'systemctl status ceph-disk@dev-sdb1.service' for details.* > > [root@vdicnode01 ~]# systemctl status ceph-disk@dev-sdb1.service > ● ceph-disk@dev-sdb1.service - Ceph disk activation: /dev/sdb1 >Loaded: loaded (/usr/lib/systemd/system/ceph-disk@.service; static; > vendor preset: disabled) >Active: failed (Result: exit-code) since Thu 2017-07-27 23:37:23 CEST; > 1h 52min ago > Process: 740 ExecStart=/bin/sh -c timeout $CEPH_DISK_TIMEOUT flock > /var/lock/ceph-disk-$(basename %f) /usr/sbin/ceph-disk --verbose > --log-stdout trigger --sync %f (code=exited, status=1/FAILURE) > Main PID: 740 (code=exited, status=1/FAILURE) > > Jul 27 23:37:23 vdicnode01 sh[740]: main(sys.argv[1:]) > Jul 27 23:37:23 vdicnode01 sh[740]: File > "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5682, in main > Jul 27 23:37:23 vdicnode01 sh[740]: args.func(args) > Jul 27 23:37:23 vdicnode01 sh[740]: File > "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4891, in > main_trigger > Jul 27 23:37:23 vdicnode01 sh[740]: raise Error('return code ' + str(ret)) > Jul 27 23:37:23 vdicnode01 sh[740]: ceph_disk.main.Error: Error: return > code 1 > Jul 27 23:37:23 vdicnode01 systemd[1]: ceph-disk@dev-sdb1.service: main > process exited, code=exited, status=1/FAILURE > Jul 27 23:37:23 vdicnode01 systemd[1]: Failed to start Ceph disk > activation: /dev/sdb1. 
> Jul 27 23:37:23 vdicnode01 systemd[1]: Unit ceph-disk@dev-sdb1.service > entered failed state. > Jul 27 23:37:23 vdicnode01 systemd[1]: ceph-disk@dev-sdb1.service failed. > > > [root@vdicnode01 ~]# systemctl status ceph-disk@dev-sdb2.service > ● ceph-disk@dev-sdb2.service - Ceph disk activation: /dev/sdb2 >Loaded: loaded (/usr/lib/systemd/system/ceph-disk@.service; static; > vendor preset: disabled) >Active: failed (Result: exit-code) since Thu 2017-07-27 23:37:23 CEST; > 1h 52min ago > Process: 744 ExecStart=/bin/sh -c timeout $CEPH_DISK_TIMEOUT flock > /var/lock/ceph-disk-$(basename %f) /usr/sbin/ceph-disk --verbose > --log-stdout trigger --sync %f (code=exited, status=1/FAILURE) > Main PID: 744 (code=exited, status=1/FAILURE) > > Jul 27 23:37:23 vdicnode01 sh[744]: main(sys.argv[1:]) > Jul 27 23:37:23 vdicnode01 sh[744]: File > "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5682, in main > Jul 27 23:37:23 vdicnode01 sh[744]: args.func(args) > Jul 27 23:37:23 vdicnode01 sh[744]: File > "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4891, in > main_trigger > Jul 27 23:37:23 vdicnode01 sh[744]: raise Error('return code ' + str(ret)) > Jul 27 23:37:23 vdicnode01 sh[744]: ceph_disk.main.Error: Error: return > code 1 > Jul 27 23:37:23 vdicnode01 systemd[1]: ceph-disk@dev-sdb2.service: main > process exited, code=exited, status=1/FAILURE > Jul 27 23:37:23 vdicnode01 systemd[1]: Failed to start Ceph disk > activation: /dev/sdb2. > Jul 27 23:37:23 vdicnode01 systemd[1]: Unit ceph-disk@dev-sdb2.service > entered failed state. > Jul 27 23:37:23 vdicnode01 systemd[1]: ceph-disk@dev-sdb2.service failed. > > I have created an entry in /etc/fstab in order to mount journal disk > automatically: > > /dev/sdb1 /var/lib/ceph/osd/ceph-0 xfs defaults,noatime > 1 2 > > But when I boot, I get the same error message. 
> > When I execute ceph -s, the osd looks to be working perfectly: > > [root@vdicnode01 ~]# ceph -s > cluster: > id: 61881df3-1365-4139-a586-92b5eca9cf18 > health: HEALTH_WARN > Degraded data redundancy: 5/10 objects degraded (50.000%), 128 > pgs unclean, 128 pgs degraded, 128 pgs undersized > 128 pgs not scrubbed for 86400 > > services: > mon: 1 daemons, quorum vdicnode01 > mgr: vdicnode01(active) > osd: 1 osds: 1 up, 1 in > > data: > pools: 1 pools, 128 pgs > objects: 5 objects, 1349 bytes > usage: 1073 MB used, 39785 MB / 40858 MB avail > pgs: 5/10 objects degraded (50.000%) > 128 active+undersized+degraded > > > Has anybody experienced the same issue? > > Thanks a lot. >
[ceph-users] Error in boot.log - Failed to start Ceph disk activation - Luminous
Hi, First of all, my version: [root@vdicnode01 ~]# ceph -v ceph version 12.1.1 (f3e663a190bf2ed12c7e3cda288b9a159572c800) luminous (rc) When I boot my ceph node (I have an all in one) I get the following message in boot.log: *[FAILED] Failed to start Ceph disk activation: /dev/sdb2.* *See 'systemctl status ceph-disk@dev-sdb2.service' for details.* *[FAILED] Failed to start Ceph disk activation: /dev/sdb1.* *See 'systemctl status ceph-disk@dev-sdb1.service' for details.* [root@vdicnode01 ~]# systemctl status ceph-disk@dev-sdb1.service ● ceph-disk@dev-sdb1.service - Ceph disk activation: /dev/sdb1 Loaded: loaded (/usr/lib/systemd/system/ceph-disk@.service; static; vendor preset: disabled) Active: failed (Result: exit-code) since Thu 2017-07-27 23:37:23 CEST; 1h 52min ago Process: 740 ExecStart=/bin/sh -c timeout $CEPH_DISK_TIMEOUT flock /var/lock/ceph-disk-$(basename %f) /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync %f (code=exited, status=1/FAILURE) Main PID: 740 (code=exited, status=1/FAILURE) Jul 27 23:37:23 vdicnode01 sh[740]: main(sys.argv[1:]) Jul 27 23:37:23 vdicnode01 sh[740]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5682, in main Jul 27 23:37:23 vdicnode01 sh[740]: args.func(args) Jul 27 23:37:23 vdicnode01 sh[740]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4891, in main_trigger Jul 27 23:37:23 vdicnode01 sh[740]: raise Error('return code ' + str(ret)) Jul 27 23:37:23 vdicnode01 sh[740]: ceph_disk.main.Error: Error: return code 1 Jul 27 23:37:23 vdicnode01 systemd[1]: ceph-disk@dev-sdb1.service: main process exited, code=exited, status=1/FAILURE Jul 27 23:37:23 vdicnode01 systemd[1]: Failed to start Ceph disk activation: /dev/sdb1. Jul 27 23:37:23 vdicnode01 systemd[1]: Unit ceph-disk@dev-sdb1.service entered failed state. Jul 27 23:37:23 vdicnode01 systemd[1]: ceph-disk@dev-sdb1.service failed. 
[root@vdicnode01 ~]# systemctl status ceph-disk@dev-sdb2.service ● ceph-disk@dev-sdb2.service - Ceph disk activation: /dev/sdb2 Loaded: loaded (/usr/lib/systemd/system/ceph-disk@.service; static; vendor preset: disabled) Active: failed (Result: exit-code) since Thu 2017-07-27 23:37:23 CEST; 1h 52min ago Process: 744 ExecStart=/bin/sh -c timeout $CEPH_DISK_TIMEOUT flock /var/lock/ceph-disk-$(basename %f) /usr/sbin/ceph-disk --verbose --log-stdout trigger --sync %f (code=exited, status=1/FAILURE) Main PID: 744 (code=exited, status=1/FAILURE) Jul 27 23:37:23 vdicnode01 sh[744]: main(sys.argv[1:]) Jul 27 23:37:23 vdicnode01 sh[744]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5682, in main Jul 27 23:37:23 vdicnode01 sh[744]: args.func(args) Jul 27 23:37:23 vdicnode01 sh[744]: File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4891, in main_trigger Jul 27 23:37:23 vdicnode01 sh[744]: raise Error('return code ' + str(ret)) Jul 27 23:37:23 vdicnode01 sh[744]: ceph_disk.main.Error: Error: return code 1 Jul 27 23:37:23 vdicnode01 systemd[1]: ceph-disk@dev-sdb2.service: main process exited, code=exited, status=1/FAILURE Jul 27 23:37:23 vdicnode01 systemd[1]: Failed to start Ceph disk activation: /dev/sdb2. Jul 27 23:37:23 vdicnode01 systemd[1]: Unit ceph-disk@dev-sdb2.service entered failed state. Jul 27 23:37:23 vdicnode01 systemd[1]: ceph-disk@dev-sdb2.service failed. I have created an entry in /etc/fstab in order to mount journal disk automatically: /dev/sdb1 /var/lib/ceph/osd/ceph-0 xfs defaults,noatime 1 2 But when I boot, I get the same error message. 
When I execute ceph -s, the osd looks to be working perfectly: [root@vdicnode01 ~]# ceph -s cluster: id: 61881df3-1365-4139-a586-92b5eca9cf18 health: HEALTH_WARN Degraded data redundancy: 5/10 objects degraded (50.000%), 128 pgs unclean, 128 pgs degraded, 128 pgs undersized 128 pgs not scrubbed for 86400 services: mon: 1 daemons, quorum vdicnode01 mgr: vdicnode01(active) osd: 1 osds: 1 up, 1 in data: pools: 1 pools, 128 pgs objects: 5 objects, 1349 bytes usage: 1073 MB used, 39785 MB / 40858 MB avail pgs: 5/10 objects degraded (50.000%) 128 active+undersized+degraded Has anybody experienced the same issue? Thanks a lot.
[ceph-users] how to troubleshoot "heartbeat_check: no reply" in OSD log
I’ve got a cluster where a bunch of OSDs are down/out (only 6/21 are up/in). ceph status and ceph osd tree output can be found at: https://gist.github.com/jbw976/24895f5c35ef0557421124f4b26f6a12 In osd.4 log, I see many of these: 2017-07-27 19:38:53.468852 7f3855c1c700 -1 osd.4 120 heartbeat_check: no reply from 10.32.0.3:6807 osd.15 ever on either front or back, first ping sent 2017-07-27 19:37:40.857220 (cutoff 2017-07-27 19:38:33.468850) 2017-07-27 19:38:53.468881 7f3855c1c700 -1 osd.4 120 heartbeat_check: no reply from 10.32.0.3:6811 osd.16 ever on either front or back, first ping sent 2017-07-27 19:37:40.857220 (cutoff 2017-07-27 19:38:33.468850) From osd.4, those endpoints look reachable: / # nc -vz 10.32.0.3 6807 10.32.0.3 (10.32.0.3:6807) open / # nc -vz 10.32.0.3 6811 10.32.0.3 (10.32.0.3:6811) open What else can I look at to determine why most of the OSDs cannot communicate? http://tracker.ceph.com/issues/16092 indicates this behavior is a networking or hardware issue, what else can I check there? I can turn on extra logging as needed. Thanks!
Re: [ceph-users] Client behavior when OSD is unreachable
The clients receive up-to-date versions of the osd map, which includes which osds are down. So yes, when an osd is marked down in the cluster the clients know about it. If an osd is unreachable but isn't marked down in the cluster, the result is blocked requests.

On Thu, Jul 27, 2017, 1:21 PM Daniel K wrote:
> Does the client track which OSDs are reachable? How does it behave if some
> are not reachable?
>
> For example:
>
> Cluster network with all OSD hosts on a switch.
> Public network with OSD hosts split between two switches, failure domain
> is switch.
>
> copies=3, so with a failure of the public switch, 1 copy would be reachable
> by the client. Will the client know that it can't reach the OSDs on the
> failed switch?
>
> Well... thinking through this:
> The mons communicate on the public network -- correct? So an unreachable
> public network for some of the OSDs would cause them to be marked down,
> which then the clients would know about.
>
> Correct?
[ceph-users] Client behavior when OSD is unreachable
Does the client track which OSDs are reachable? How does it behave if some are not reachable?

For example:

Cluster network with all OSD hosts on a switch.
Public network with OSD hosts split between two switches, failure domain is switch.

copies=3, so with a failure of the public switch, 1 copy would be reachable by the client. Will the client know that it can't reach the OSDs on the failed switch?

Well... thinking through this:
The mons communicate on the public network -- correct? So an unreachable public network for some of the OSDs would cause them to be marked down, which then the clients would know about.

Correct?
Re: [ceph-users] High iowait on OSD node
I'm using bcache (starting around the middle of December... before that you can see way higher await) for all the 12 hdds on the 2 SSDs, and NVMe for journals. (And some months ago I changed all the 2TB disks to 6TB and added ceph4,5.)

Here's my iostat in ganglia:

just raw per disk await
http://www.brockmann-consult.de/ganglia/graph_all_periods.php?title[]=ceph.*[]=sd[a-z]_await=line=show=1

per host max await
http://www.brockmann-consult.de/ganglia/graph_all_periods.php?title[]=ceph.*[]=max_await=line=show=1

strangely aggregated data (my max metric is the max disk, but ganglia averages out across disk/host or something, so it's not really a max)
http://www.brockmann-consult.de/ganglia/graph_all_periods.php?c=ceph=network_report=week=by%20name=4=2=1501155678=disk_wait_report=large

or to explore and make your own graphs, start from here:
http://www.brockmann-consult.de/ganglia/

I didn't find any ganglia plugins for that, so I wrote some that take 30s averages every minute from iostat and store them. So when you see numbers like 400 in my data, it could have been a steady 400 for 30 seconds, or 4000 for 3 seconds and then 0 for 27 seconds averaged together, and 30s of every minute is missing from the data.

In my data, sda,b,c on ceph1,2,3 are probably always the SSDs, and sdm,n on ceph4,5 are currently the SSDs and possibly were sda,b once; sometimes rebooting changes it (yeah, not ideal, but I'm not sure how to change it... maybe a udev rule to name SSDs differently).

And also note that I found the deadline scheduler instead of CFQ has way lower iowait and latency, but not necessarily more throughput or iops... you could test that; but not using CFQ might disable some ceph priority settings (or maybe that's not relevant since Jewel?).

PS:
Use fixed width on your iostat and it's more readable in HTML-supporting email clients... see below where I changed it.

On 07/27/17 05:48, John Petrini wrote:
> Hello list,
>
> Just curious if anyone has ever seen this behavior and might have some
> ideas on how to troubleshoot it.
>
> We're seeing very high iowait in iostat across all OSDs on a
> single OSD host. It's very spiky - dropping to zero and then shooting
> up to as high as 400 in some cases. Despite this it does not seem to
> be having a major impact on the cluster performance as a whole.
>
> Some more details:
> 3x OSD Nodes - Dell R730's: 24 cores @2.6GHz, 256GB RAM, 20x 1.2TB 10K
> SAS OSDs per node.
>
> We're running ceph hammer.
>
> Here's the output of iostat. Note that this is from a period when the
> cluster is not very busy but you can still see high spikes on a few
> OSDs. It's much worse during high load.
>
> Device:  rrqm/s  wrqm/s     r/s     w/s     rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> sda        0.00    0.00    0.00    0.50      0.00     6.00    24.00     0.00    8.00    0.00    8.00   8.00   0.40
> sdb        0.00    0.00    0.00   60.00      0.00   808.00    26.93     0.00    0.07    0.00    0.07   0.03   0.20
> sdc        0.00    0.00    0.00    0.00      0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
> sdd        0.00    0.00    0.00   67.00      0.00  1010.00    30.15     0.01    0.09    0.00    0.09   0.09   0.60
> sde        0.00    0.00    0.00   93.00      0.00   868.00    18.67     0.00    0.04    0.00    0.04   0.04   0.40
> sdf        0.00    0.00    0.00   57.50      0.00   572.00    19.90     0.00    0.03    0.00    0.03   0.03   0.20
> sdg        0.00    1.00    0.00    3.50      0.00    22.00    12.57     0.75   16.00    0.00   16.00   2.86   1.00
> sdh        0.00    0.00    1.50   25.50      6.00   458.50    34.41     2.03   75.26    0.00   79.69   3.04   8.20
> sdi        0.00    0.00    0.00   30.50      0.00   384.50    25.21     2.36   77.51    0.00   77.51   3.28  10.00
> sdj        0.00    1.00    1.50  105.00      6.00   925.75    17.50    10.85  101.84    8.00  103.18   2.35  25.00
> sdl        0.00    0.00    2.00    0.00    320.00     0.00   320.00     0.01    3.00    3.00    0.00   2.00   0.40
> sdk        0.00    1.00    0.00   55.00      0.00   334.50    12.16     7.92  136.91    0.00  136.91   2.51  13.80
> sdm        0.00    0.00    0.00    0.00      0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
> sdn        0.00    0.00    0.00    0.00      0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
> sdo        0.00    0.00    1.00    0.00      4.00     0.00     8.00     0.00    4.00    4.00    0.00   4.00   0.40
> sdp        0.00    0.00    0.00    0.00      0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
> sdq        0.50    0.00  756.00    0.00  93288.00     0.00   246.79     1.47    1.95    1.95    0.00   1.17  88.60
> sdr        0.00    0.00    1.00    0.00      4.00     0.00     8.00     0.00    4.00    4.00    0.00   4.00   0.40
Re: [ceph-users] Fwd: [lca-announce] Call for Proposals for linux.conf.au 2018 in Sydney are open!
On 07/03/2017 02:36 PM, Tim Serong wrote:
> It's that time of year again, folks! Please everyone go submit talks,
> or at least plan to attend this most excellent of F/OSS conferences.

CFP closes in a bit over a week (August 6). Get into it if you didn't already :-)

> (I thought I might put in a proposal to run a ceph miniconf, unless
> anyone else was already thinking of doing that? If accepted, that would
> give us a whole day of cephy goodness in addition to whatever lands in
> the main conference programme.)

I *did* put in a proposal for a Ceph miniconf, BTW. Fingers crossed...

Tim

> Forwarded Message
> Subject: [lca-announce] Call for Proposals for linux.conf.au 2018 in Sydney are open!
> Date: Mon, 03 Jul 2017 11:04:27 +1000
> From: linux.conf.au Announcements
> Reply-To: lca-annou...@lists.linux.org.au
> To: lca-annou...@lists.linux.org.au, annou...@lists.linux.org.au
>
> On behalf of the LCA2018 team we are pleased to announce that the Call
> for Proposals for linux.conf.au 2018 is now open! This Call for
> Proposals will close on August 6 with no extensions expected.
>
> linux.conf.au is one of the best-known community driven Free and Open
> Source Software conferences in the world. In 2018 we welcome you to join
> us in Sydney, New South Wales on Monday 22 January through to Friday 26
> January.
>
> For full details including those not covered by this announcement visit
> https://linux.conf.au/proposals
>
> == IMPORTANT DATES ==
>
> * Call for Proposals Opens: 3 July 2017
> * Call for Proposals Closes: 6 August 2017 (no extensions)
> * Notifications from the programme committee: mid-September 2017
> * Conference Opens: 22nd January 2018
>
> == HOW TO SUBMIT ==
>
> Create an account at https://login.linux.conf.au/manage/public/newuser
> Visit https://linux.conf.au/proposals and click the link to submit your
> proposal
>
> == ABOUT LINUX.CONF.AU ==
>
> linux.conf.au is a conference where people gather to learn about the
> entire world of Free and Open Source Software, directly from the people
> who shape the projects and topics that they’re presenting on.
>
> Our aim is to create a deeply technical conference made up of industry
> leaders and experts on a wide range of subjects.
>
> linux.conf.au welcomes submissions from first-time and seasoned speakers
> from all free and open technology communities and all walks of life. We
> respect and encourage diversity at our conference.
>
> == CONFERENCE THEME ==
>
> The theme for linux.conf.au 2018 is “A Little Bit Of History Repeating”.
> Building on last year’s theme of “The Future of Open Source”, we intend
> to examine the future through the lens of the past.
>
> For some suggestions to get you started with your proposal ideas please
> visit the linux.conf.au website.
>
> == PROPOSAL TYPES ==
>
> We’re accepting submissions for three different types of proposal:
>
> * Presentation (45 minutes): These are generally presented in lecture
> format and form the bulk of the available conference slots.
> * Tutorial (100 minutes): These are generally presented in a classroom
> format. They should be interactive or hands-on in nature. Tutorials are
> expected to have a specific learning outcome for attendees.
> * Miniconf (full-day): Single-track mini-conferences that run for the
> duration of a day on either Monday or Tuesday. We provide the room, and
> you provide the speakers. Together, you can explore a field in Free and
> Open Source software in depth.
>
> == PROPOSER RECOGNITION ==
>
> In recognition of the value that presenters and organisers bring to our
> conference, once a proposal is accepted, one presenter or organiser per
> proposal is entitled to:
>
> * Free registration, which holds all of the benefits of a Professional
> Delegate Ticket
> * A complimentary ticket to the Speakers' Dinner for the speaker, with
> additional tickets for significant others and children of the speaker
> available for purchase.
> * Optionally, recognition as a Fairy Penguin Sponsor, available at 50%
> off the advertised price
>
> If your proposal includes more than one presenter or organiser, these
> additional people will be entitled to:
>
> * Professional or hobbyist registration at the Early Bird rate,
> regardless of whether the Early Bird rate is generally available
> * Speakers’ dinner tickets available for purchase at cost
>
> Important Note for miniconf organisers: These discounts apply to the
> organisers only. All participants in your miniconf must arrange or
> purchase tickets for themselves via the regular ticket sales process or
> they may not be able to attend!
>
> As a volunteer-run non-profit conference, linux.conf.au does not pay
> speakers to present at the conference; but you may be eligible for
> financial assistance.
>
> == FINANCIAL ASSISTANCE ==
>
> linux.conf.au is able to provide limited financial assistance for some
> speakers.
>
> Financial assistance may
[ceph-users] Ceph Developers Monthly - August
Hey Cephers,

This is just a friendly reminder that the next Ceph Developer Monthly meeting is coming up: https://wiki.ceph.com/Planning

If you have work that you're doing that is feature work, significant backports, or anything you would like to discuss with the core team, please add it to the following page: https://wiki.ceph.com/CDM_02-AUG-2017

If you have questions or comments, please let us know.

Kindest regards,

Leo

--
Leonardo Vaz
Ceph Community Manager
Open Source and Standards Team
[ceph-users] High iowait on OSD node
Hello list,

Just curious if anyone has ever seen this behavior and might have some ideas on how to troubleshoot it.

We're seeing very high iowait in iostat across all OSDs on a single OSD host. It's very spiky - dropping to zero and then shooting up to as high as 400 in some cases. Despite this it does not seem to be having a major impact on the cluster performance as a whole.

Some more details:
3x OSD Nodes - Dell R730's: 24 cores @2.6GHz, 256GB RAM, 20x 1.2TB 10K SAS OSDs per node.

We're running ceph hammer.

Here's the output of iostat. Note that this is from a period when the cluster is not very busy but you can still see high spikes on a few OSDs. It's much worse during high load.

Device:  rrqm/s  wrqm/s     r/s     w/s     rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda        0.00    0.00    0.00    0.50      0.00     6.00    24.00     0.00    8.00    0.00    8.00   8.00   0.40
sdb        0.00    0.00    0.00   60.00      0.00   808.00    26.93     0.00    0.07    0.00    0.07   0.03   0.20
sdc        0.00    0.00    0.00    0.00      0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdd        0.00    0.00    0.00   67.00      0.00  1010.00    30.15     0.01    0.09    0.00    0.09   0.09   0.60
sde        0.00    0.00    0.00   93.00      0.00   868.00    18.67     0.00    0.04    0.00    0.04   0.04   0.40
sdf        0.00    0.00    0.00   57.50      0.00   572.00    19.90     0.00    0.03    0.00    0.03   0.03   0.20
sdg        0.00    1.00    0.00    3.50      0.00    22.00    12.57     0.75   16.00    0.00   16.00   2.86   1.00
sdh        0.00    0.00    1.50   25.50      6.00   458.50    34.41     2.03   75.26    0.00   79.69   3.04   8.20
sdi        0.00    0.00    0.00   30.50      0.00   384.50    25.21     2.36   77.51    0.00   77.51   3.28  10.00
sdj        0.00    1.00    1.50  105.00      6.00   925.75    17.50    10.85  101.84    8.00  103.18   2.35  25.00
sdl        0.00    0.00    2.00    0.00    320.00     0.00   320.00     0.01    3.00    3.00    0.00   2.00   0.40
sdk        0.00    1.00    0.00   55.00      0.00   334.50    12.16     7.92  136.91    0.00  136.91   2.51  13.80
sdm        0.00    0.00    0.00    0.00      0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdn        0.00    0.00    0.00    0.00      0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdo        0.00    0.00    1.00    0.00      4.00     0.00     8.00     0.00    4.00    4.00    0.00   4.00   0.40
sdp        0.00    0.00    0.00    0.00      0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdq        0.50    0.00  756.00    0.00  93288.00     0.00   246.79     1.47    1.95    1.95    0.00   1.17  88.60
sdr        0.00    0.00    1.00    0.00      4.00     0.00     8.00     0.00    4.00    4.00    0.00   4.00   0.40
sds        0.00    0.00    0.00    0.00      0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdt        0.00    0.00    0.00   36.50      0.00   643.50    35.26     3.49   95.73    0.00   95.73   2.63   9.60
sdu        0.00    0.00    0.00   21.00      0.00   323.25    30.79     0.78   37.24    0.00   37.24   2.95   6.20
sdv        0.00    0.00    0.00    0.00      0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
sdw        0.00    0.00    0.00   31.00      0.00   689.50    44.48     2.48   80.06    0.00   80.06   3.29  10.20
sdx        0.00    0.00    0.00    0.00      0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-0       0.00    0.00    0.00    0.50      0.00     6.00    24.00     0.00    8.00    0.00    8.00   8.00   0.40
dm-1       0.00    0.00    0.00    0.00      0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
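[Editorial note: with 20+ devices it is easier to spot the spiky disks by filtering on the await column than by eyeballing the whole table. A minimal sketch, with a few of the rows above inlined as sample data and an arbitrary 50 ms threshold; on a real host you would feed it live `iostat -x` output instead:]

```shell
# Print devices whose await exceeds 50 ms, worst first.
cat > /tmp/iostat.txt <<'EOF'
Device:  rrqm/s  wrqm/s     r/s     w/s     rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdh        0.00    0.00    1.50   25.50      6.00   458.50    34.41     2.03   75.26    0.00   79.69   3.04   8.20
sdj        0.00    1.00    1.50  105.00      6.00   925.75    17.50    10.85  101.84    8.00  103.18   2.35  25.00
sdq        0.50    0.00  756.00    0.00  93288.00     0.00   246.79     1.47    1.95    1.95    0.00   1.17  88.60
EOF
# Column 10 is await in this extended-statistics layout.
awk 'NR > 1 && $10 > 50 { print $1, $10 }' /tmp/iostat.txt | sort -k2 -rn
# → sdj 101.84
#   sdh 75.26
```

Note how sdq (the busy SSD/journal device at 88% util) is correctly excluded: high throughput with ~2 ms await is healthy, while sdh/sdj/sdk-style rows with small queues and 75-136 ms await are the ones worth chasing.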
Re: [ceph-users] Ceph object recovery
So I'm not sure if this was the best or right way to do this, but --

using rados I confirmed the unfound object was in the cephfs_data pool:

# rados -p cephfs_data ls | grep 001c0ed4

using the osdmaptool I found the pg/osd the unfound object was in (osdmap previously exported to file "osdmap"):

# osdmaptool --test-map-object 162.001c0ed4 osdmap
 object '162.001c0ed4' -> 1.21 -> [4]

then told ceph to just delete the unfound object:

ceph pg 1.21 mark_unfound_lost delete

and then used rados to put the object back (from the file I had extracted previously):

# rados -p cephfs_data put 162.001c0ed4 162.001c0ed4.obj

Still have more recovery to do but this seems to have fixed my unfound object problem.

On Tue, Jul 25, 2017 at 12:54 PM, Daniel K wrote:
> I did some bad things to my cluster, broke 5 OSDs and wound up with 1
> unfound object.
>
> I mounted one of the OSD drives and used ceph-objectstore-tool to find and
> export the object:
>
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-10 162.001c0ed4 get-bytes filename.obj
>
> What's the best way to bring this object back into the active cluster?
>
> Do I need to bring an OSD offline, mount it and do the reverse of the
> above command?
>
> Something like:
> ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-22 162.001c0ed4 set-bytes filename.obj
>
> Is there some way to do this without bringing down an osd?