Re: [ceph-users] Adding disks -> getting unfound objects [Luminous]

2018-01-22 Thread Nico Schottelius

Good morning,

the osd.61 actually just crashed and the disk is still intact. However,
after 8 hours of rebuilding, the unfound objects are still missing:

root@server1:~# ceph -s
  cluster:
id: 26c0c5a8-d7ce-49ac-b5a7-bfd9d0ba81ab
health: HEALTH_WARN
noscrub,nodeep-scrub flag(s) set
111436/3017766 objects misplaced (3.693%)
9377/1005922 objects unfound (0.932%)
Reduced data availability: 84 pgs inactive
Degraded data redundancy: 277034/3017766 objects degraded (9.180%), 
84 pgs unclean, 84 pgs degraded, 84 pgs undersized
mon server2 is low on available space

  services:
mon: 3 daemons, quorum server5,server3,server2
mgr: server5(active), standbys: server2, 2, 0, server3
osd: 54 osds: 54 up, 54 in; 84 remapped pgs
 flags noscrub,nodeep-scrub

  data:
pools:   3 pools, 1344 pgs
objects: 982k objects, 3837 GB
usage:   10618 GB used, 39030 GB / 49648 GB avail
pgs: 6.250% pgs not active
 277034/3017766 objects degraded (9.180%)
 111436/3017766 objects misplaced (3.693%)
 9377/1005922 objects unfound (0.932%)
 1260 active+clean
 84   recovery_wait+undersized+degraded+remapped+peered

  io:
client:   68960 B/s rd, 20722 kB/s wr, 12 op/s rd, 77 op/s wr

We tried restarting osd.61, but ceph health detail does not change
anymore:

HEALTH_WARN noscrub,nodeep-scrub flag(s) set; 111436/3017886 objects misplaced (3.693%);
9377/1005962 objects unfound (0.932%); Reduced data availability: 84 pgs inactive;
Degraded data redundancy: 277034/3017886 objects degraded (9.180%), 84 pgs unclean,
84 pgs degraded, 84 pgs undersized; mon server2 is low on available space
OSDMAP_FLAGS noscrub,nodeep-scrub flag(s) set
OBJECT_MISPLACED 111436/3017886 objects misplaced (3.693%)
OBJECT_UNFOUND 9377/1005962 objects unfound (0.932%)
pg 4.fa has 117 unfound objects
pg 4.ff has 107 unfound objects
pg 4.fd has 113 unfound objects
...
pg 4.2a has 108 unfound objects

PG_AVAILABILITY Reduced data availability: 84 pgs inactive
pg 4.2a is stuck inactive for 64117.189552, current state recovery_wait+undersized+degraded+remapped+peered, last acting [61]
pg 4.31 is stuck inactive for 64117.147636, current state recovery_wait+undersized+degraded+remapped+peered, last acting [61]
pg 4.32 is stuck inactive for 64117.178461, current state recovery_wait+undersized+degraded+remapped+peered, last acting [61]
pg 4.34 is stuck inactive for 64117.150475, current state recovery_wait+undersized+degraded+remapped+peered, last acting [61]
...


PG_DEGRADED Degraded data redundancy: 277034/3017886 objects degraded (9.180%), 
84 pgs unclean, 84 pgs degraded, 84 pgs undersized
pg 4.2a is stuck unclean for 131612.984555, current state recovery_wait+undersized+degraded+remapped+peered, last acting [61]
pg 4.31 is stuck undersized for 221.568468, current state recovery_wait+undersized+degraded+remapped+peered, last acting [61]


Is there any chance to recover those pgs or did we actually lose data
with a 2 disk failure?

And is there any way out of this besides going with

ceph pg {pg-id} mark_unfound_lost revert|delete

?
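
For completeness, a minimal way to inspect a single pg's unfound objects before deciding (using 4.2a as the example; the last command is destructive) would be:

    ceph pg 4.2a list_missing                 # list the unfound objects
    ceph pg 4.2a query                        # check "might_have_unfound" in the recovery state
    ceph pg 4.2a mark_unfound_lost revert     # last resort: roll objects back to a previous version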

Best,

Nico

p.s.: the "ceph pg 4.2a query" output:

{
"state": "recovery_wait+undersized+degraded+remapped+peered",
"snap_trimq": "[]",
"epoch": 17879,
"up": [
17,
13,
25
],
"acting": [
61
],
"backfill_targets": [
"13",
"17",
"25"
],
"actingbackfill": [
"13",
"17",
"25",
"61"
],
"info": {
"pgid": "4.2a",
"last_update": "17529'53875",
"last_complete": "17217'45447",
"log_tail": "17090'43812",
"last_user_version": 53875,
"last_backfill": "MAX",
"last_backfill_bitwise": 0,
"purged_snaps": [
{
"start": "1",
"length": "3"
},
{
"start": "6",
"length": "8"
},
{
"start": "10",
"length": "2"
}
],
"history": {
"epoch_created": 9134,
"epoch_pool_created": 9134,
"last_epoch_started": 17528,
"last_interval_started": 17527,
"last_epoch_clean": 17079,
"last_interval_clean": 17078,
"last_epoch_split": 0,
"last_epoch_marked_full": 0,
"same_up_since": 17143,
"same_interval_since": 17878,
"same_primary_since": 17878,
"last_scrub": "17090'44622",
"last_scrub_stamp": "2018-01-21 09:37:09.888508",
"last_deep_scrub": "17090'42219",
"last_deep_scrub_stamp": "2018-01-20 05:05:45.372052",
"last_clean_scrub_stamp": "2018-01-21 09:37:09.888508"
},
"stats": {
 

Re: [ceph-users] What is the should be the expected latency of 10Gbit network connections

2018-01-22 Thread Maged Mokhtar
On 2018-01-23 08:27, Blair Bethwaite wrote:

> Firstly, the OP's premise in asking, "Or should there be a differnce
> of 10x", is fundamentally incorrect. Greater bandwidth does not mean
> lower latency, though the latter almost always results in the former.
> Unfortunately, changing the speed of light remains a difficult
> engineering challenge :-). However, you can do things like: add
> multiple links, overlap signals on the wire, and tweak error
> correction encodings; all to get more bits on the wire without making
> the wire itself any faster. Take Mellanox 100Gb ethernet, 1 lane is
> 25Gb, to get 50Gb they mash 2 lanes together, to get 100Gb they mash 4
> lanes - the latency of single bit transmission is more-or-less
> unchanged. Also note that with UDP/TCP pings or actual Ceph traffic
> we're going via the kernel stack running on the CPU and as such the
> speed & power-management of the CPU can make quite a difference.
> 
> Example 25GE on a dual-port CX-4 card in LACP bond, RHEL7 host.
> 
> $ cat /etc/redhat-release
> Red Hat Enterprise Linux Server release 7.3 (Maipo)
> $ ofed_info | head -1
> MLNX_OFED_LINUX-4.0-1.0.1.0 (OFED-4.0-1.0.1):
> $ grep 'model name' /proc/cpuinfo | uniq
> model name  : Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
> $ ibv_devinfo
> hca_id: mlx5_1
> transport:  InfiniBand (0)
> fw_ver: 14.18.1000
> node_guid:  ...
> sys_image_guid: ...
> vendor_id:  0x02c9
> vendor_part_id: 4117
> hw_ver: 0x0
> board_id:   MT_2420110034
> ...
> 
> $ sudo ping -M do -s 8972 -c 10 -f ...
> 10 packets transmitted, 10 received, 0% packet loss, time 4652ms
> rtt min/avg/max/mdev = 0.029/0.031/2.711/0.015 ms, ipg/ewma 0.046/0.031 ms
> 
> $ sudo ping -M do -s 3972 -c 10 -f ...
> 10 packets transmitted, 10 received, 0% packet loss, time 3321ms
> rtt min/avg/max/mdev = 0.019/0.022/0.364/0.003 ms, ipg/ewma 0.033/0.022 ms
> 
> $ sudo ping -M do -s 1972 -c 10 -f ...
> 10 packets transmitted, 10 received, 0% packet loss, time 2818ms
> rtt min/avg/max/mdev = 0.017/0.018/0.086/0.005 ms, ipg/ewma 0.028/0.021 ms
> 
> $ sudo ping -M do -s 472 -c 10 -f ...
> 10 packets transmitted, 10 received, 0% packet loss, time 2498ms
> rtt min/avg/max/mdev = 0.014/0.016/0.305/0.005 ms, ipg/ewma 0.024/0.017 ms
> 
> $ sudo ping -M do -c 10 -f ...
> 10 packets transmitted, 10 received, 0% packet loss, time 2363ms
> rtt min/avg/max/mdev = 0.014/0.015/0.322/0.006 ms, ipg/ewma 0.023/0.016 ms
> 
> On 22 January 2018 at 22:37, Nick Fisk  wrote: 
> 
>> Anyone with 25G ethernet willing to do the test? Would love to see what the
>> latency figures are for that.
>> 
>> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>> Maged Mokhtar
>> Sent: 22 January 2018 11:28
>> To: ceph-users@lists.ceph.com
>> Subject: Re: [ceph-users] What is the should be the expected latency of
>> 10Gbit network connections
>> 
>> On 2018-01-22 08:39, Wido den Hollander wrote:
>> 
>> On 01/20/2018 02:02 PM, Marc Roos wrote:
>> 
>> If I test my connections with sockperf via a 1Gbit switch I get around
>> 25usec, when I test the 10Gbit connection via the switch I have around
>> 12usec is that normal? Or should there be a differnce of 10x.
>> 
>> No, that's normal.
>> 
>> Tests with 8k ping packets over different links I did:
>> 
>> 1GbE:  0.800ms
>> 10GbE: 0.200ms
>> 40GbE: 0.150ms
>> 
>> Wido
>> 
>> sockperf ping-pong
>> 
>> sockperf: Warmup stage (sending a few dummy messages)...
>> sockperf: Starting test...
>> sockperf: Test end (interrupted by timer)
>> sockperf: Test ended
>> sockperf: [Total Run] RunTime=10.100 sec; SentMessages=432875;
>> ReceivedMessages=432874
>> sockperf: = Printing statistics for Server No: 0
>> sockperf: [Valid Duration] RunTime=10.000 sec; SentMessages=428640;
>> ReceivedMessages=428640
>> sockperf: > avg-lat= 11.609 (std-dev=1.684)
>> sockperf: # dropped messages = 0; # duplicated messages = 0; #
>> out-of-order messages = 0
>> sockperf: Summary: Latency is 11.609 usec
>> sockperf: Total 428640 observations; each percentile contains 4286.40
>> observations
>> sockperf: --->  observation =  856.944
>> sockperf: ---> percentile  99.99 =   39.789
>> sockperf: ---> percentile  99.90 =   20.550
>> sockperf: ---> percentile  99.50 =   17.094
>> sockperf: ---> percentile  99.00 =   15.578
>> sockperf: ---> percentile  95.00 =   12.838
>> sockperf: ---> percentile  90.00 =   12.299
>> sockperf: ---> percentile  75.00 =   11.844
>> sockperf: ---> percentile  50.00 =   11.409
>> sockperf: ---> percentile  25.00 =   11.124
>> sockperf: --->  observation =8.888
>> 
>> sockperf: Warmup stage (sending a few dummy messages)...
>> sockperf: Starting test...
>> sockperf: Test end (interrupted by timer)
>> sockperf: Test ended
>> sockperf: [Total Run] RunTime=1.100 sec; 

[ceph-users] Replication count - demo

2018-01-22 Thread M Ranga Swami Reddy
Hello,
What is the best and simplest way to showcase that each ceph image is
replicated 3 times (i.e. size=3)?

A few ideas:
 - Use "ceph osd map <pool-name> image_id"

If any simple way to showcase the above, please share.
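
One possible sequence, assuming a pool named "rbd" and an image named "demo" (the names are only examples):

    ceph osd pool get rbd size                        # prints "size: 3"
    rbd -p rbd info demo | grep block_name_prefix     # find the image's object prefix
    rados -p rbd ls | grep <block_name_prefix> | head -1
    ceph osd map rbd <one_object_name>                # up/acting sets list the 3 OSDs holding that object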

Thanks
Swami
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What is the should be the expected latency of 10Gbit network connections

2018-01-22 Thread Blair Bethwaite
Firstly, the OP's premise in asking, "Or should there be a differnce
of 10x", is fundamentally incorrect. Greater bandwidth does not mean
lower latency, though the latter almost always results in the former.
Unfortunately, changing the speed of light remains a difficult
engineering challenge :-). However, you can do things like: add
multiple links, overlap signals on the wire, and tweak error
correction encodings; all to get more bits on the wire without making
the wire itself any faster. Take Mellanox 100Gb ethernet, 1 lane is
25Gb, to get 50Gb they mash 2 lanes together, to get 100Gb they mash 4
lanes - the latency of single bit transmission is more-or-less
unchanged. Also note that with UDP/TCP pings or actual Ceph traffic
we're going via the kernel stack running on the CPU and as such the
speed & power-management of the CPU can make quite a difference.

Example 25GE on a dual-port CX-4 card in LACP bond, RHEL7 host.

$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.3 (Maipo)
$ ofed_info | head -1
MLNX_OFED_LINUX-4.0-1.0.1.0 (OFED-4.0-1.0.1):
$ grep 'model name' /proc/cpuinfo | uniq
model name  : Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
$ ibv_devinfo
hca_id: mlx5_1
transport:  InfiniBand (0)
fw_ver: 14.18.1000
node_guid:  ...
sys_image_guid: ...
vendor_id:  0x02c9
vendor_part_id: 4117
hw_ver: 0x0
board_id:   MT_2420110034
...


$ sudo ping -M do -s 8972 -c 10 -f ...
10 packets transmitted, 10 received, 0% packet loss, time 4652ms
rtt min/avg/max/mdev = 0.029/0.031/2.711/0.015 ms, ipg/ewma 0.046/0.031 ms

$ sudo ping -M do -s 3972 -c 10 -f ...
10 packets transmitted, 10 received, 0% packet loss, time 3321ms
rtt min/avg/max/mdev = 0.019/0.022/0.364/0.003 ms, ipg/ewma 0.033/0.022 ms

$ sudo ping -M do -s 1972 -c 10 -f ...
10 packets transmitted, 10 received, 0% packet loss, time 2818ms
rtt min/avg/max/mdev = 0.017/0.018/0.086/0.005 ms, ipg/ewma 0.028/0.021 ms

$ sudo ping -M do -s 472 -c 10 -f ...
10 packets transmitted, 10 received, 0% packet loss, time 2498ms
rtt min/avg/max/mdev = 0.014/0.016/0.305/0.005 ms, ipg/ewma 0.024/0.017 ms

$ sudo ping -M do -c 10 -f ...
10 packets transmitted, 10 received, 0% packet loss, time 2363ms
rtt min/avg/max/mdev = 0.014/0.015/0.322/0.006 ms, ipg/ewma 0.023/0.016 ms

On 22 January 2018 at 22:37, Nick Fisk  wrote:
> Anyone with 25G ethernet willing to do the test? Would love to see what the
> latency figures are for that.
>
>
>
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Maged Mokhtar
> Sent: 22 January 2018 11:28
> To: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] What is the should be the expected latency of
> 10Gbit network connections
>
>
>
> On 2018-01-22 08:39, Wido den Hollander wrote:
>
>
>
> On 01/20/2018 02:02 PM, Marc Roos wrote:
>
>   If I test my connections with sockperf via a 1Gbit switch I get around
> 25usec, when I test the 10Gbit connection via the switch I have around
> 12usec is that normal? Or should there be a differnce of 10x.
>
>
> No, that's normal.
>
> Tests with 8k ping packets over different links I did:
>
> 1GbE:  0.800ms
> 10GbE: 0.200ms
> 40GbE: 0.150ms
>
> Wido
>
>
> sockperf ping-pong
>
> sockperf: Warmup stage (sending a few dummy messages)...
> sockperf: Starting test...
> sockperf: Test end (interrupted by timer)
> sockperf: Test ended
> sockperf: [Total Run] RunTime=10.100 sec; SentMessages=432875;
> ReceivedMessages=432874
> sockperf: = Printing statistics for Server No: 0
> sockperf: [Valid Duration] RunTime=10.000 sec; SentMessages=428640;
> ReceivedMessages=428640
> sockperf: > avg-lat= 11.609 (std-dev=1.684)
> sockperf: # dropped messages = 0; # duplicated messages = 0; #
> out-of-order messages = 0
> sockperf: Summary: Latency is 11.609 usec
> sockperf: Total 428640 observations; each percentile contains 4286.40
> observations
> sockperf: --->  observation =  856.944
> sockperf: ---> percentile  99.99 =   39.789
> sockperf: ---> percentile  99.90 =   20.550
> sockperf: ---> percentile  99.50 =   17.094
> sockperf: ---> percentile  99.00 =   15.578
> sockperf: ---> percentile  95.00 =   12.838
> sockperf: ---> percentile  90.00 =   12.299
> sockperf: ---> percentile  75.00 =   11.844
> sockperf: ---> percentile  50.00 =   11.409
> sockperf: ---> percentile  25.00 =   11.124
> sockperf: --->  observation =8.888
>
> sockperf: Warmup stage (sending a few dummy messages)...
> sockperf: Starting test...
> sockperf: Test end (interrupted by timer)
> sockperf: Test ended
> sockperf: [Total Run] RunTime=1.100 sec; SentMessages=22065;
> ReceivedMessages=22064
> sockperf: = Printing statistics for Server No: 0
> sockperf: [Valid Duration] RunTime=1.000 

Re: [ceph-users] OSD doesn't start - fresh installation

2018-01-22 Thread Hüseyin Atatür YILDIRIM
Hello,

Sorry for wasting your time; the mistakes below originated from improperly removing
OSDs (specifically, forgetting to run “ceph auth del osd.#”) from the cluster.
Thank you for your attention, though.
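
For anyone hitting the same thing, a typical clean-removal sequence (Luminous, with a hypothetical osd.1) looks roughly like:

    ceph osd out osd.1
    systemctl stop ceph-osd@1        # on the host carrying the OSD
    ceph osd crush remove osd.1
    ceph auth del osd.1
    ceph osd rm osd.1

Skipping the "ceph auth del" step leaves a stale key behind, which is what caused the errors here.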

Best regards,
Atatür


From: Brad Hubbard [mailto:bhubb...@redhat.com]
Sent: Tuesday, January 23, 2018 2:43 AM
To: Hüseyin Atatür YILDIRIM 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] OSD doesn't start - fresh installation



On Mon, Jan 22, 2018 at 10:37 PM, Hüseyin Atatür YILDIRIM wrote:

Hi again,

In the “journalctl –xe”  output:

Jan 22 15:29:18 mon02 ceph-osd-prestart.sh[1526]: OSD data directory 
/var/lib/ceph/osd/ceph-1 does not exist; bailing out.

Also, in my previous post I forgot to say that the “ceph-deploy osd create”
command doesn’t fail and appears to be successful, as you can see from the logs.
But the daemons on the nodes don’t start.

Regards,
Atatur







From: Hüseyin Atatür YILDIRIM
Sent: Monday, January 22, 2018 3:19 PM
To: ceph-users@lists.ceph.com
Subject: OSD doesn't start - fresh installation

Hi all,

Fresh installation but already used disks. I zapped all the disks and ran
“ceph-deploy osd create” again but got the same results.
Log is attached. Can you please help?

Did you mean "sdb1" rather than "sdb" perhaps?



Thank you,
Atatur

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] PG inactive, peering

2018-01-22 Thread Karun Josy
Hi,

We added a new host to the cluster and it was rebalancing.
One PG then became "inactive, peering" for a very long time, which created a lot
of slow requests and poor performance for the whole cluster.

When I queried that PG, it showed this :

"recovery_state": [
{
"name": "Started/Primary/Peering/GetMissing",
"enter_time": "2018-01-22 18:40:04.777654",
"peer_missing_requested": [
{
"osd": "77(7)",

So I assumed it was stuck getting information from osd.77, and so I marked
osd.77 down.
The status of the PG changed to "active+undersized+degraded" and the PG became
active again.
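
For reference, a sketch of the commands involved (the exact pg id is omitted here):

    ceph pg dump_stuck inactive      # find the stuck pg
    ceph pg <pgid> query             # inspect recovery_state / peer_missing_requested
    ceph osd down 77                 # mark the suspect OSD down to force re-peering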

Does anyone know why this happened?
If I start osd.77 again, the PG goes back to the inactive, peering state.


Is it because osd.77 is bad? Or will the same happen when the PG tries to
peer again with another disk?


Any help is really appreciated

Karun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What is the should be the expected latency of 10Gbit network connections

2018-01-22 Thread Konstantin Shalygin

ping  -c 10 -f 10.0.1.12


Intel X710-DA2 -> Switch -> Intel X710-DA2:


--- 172.16.16.3 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 1932ms
rtt min/avg/max/mdev = 0.013/0.014/0.131/0.004 ms, ipg/ewma 0.019/0.014 ms




k

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ghost degraded objects

2018-01-22 Thread David Zafman


Yes, the pending backport for what we have so far is in 
https://github.com/ceph/ceph/pull/20055


With these changes, a backfill caused by marking an OSD out has the
results shown below:



    health: HEALTH_WARN
    115/600 objects misplaced (19.167%)

...
  data:
    pools:   1 pools, 1 pgs
    objects: 200 objects, 310 kB
    usage:   173 GB used, 126 GB / 299 GB avail
    pgs: 115/600 objects misplaced (19.167%)
 1 active+remapped+backfilling

David


On 1/19/18 5:14 AM, Sage Weil wrote:

On Fri, 19 Jan 2018, Ugis wrote:

Running Luminous 12.2.2, we noticed strange behavior lately.
When, for example, setting "ceph osd out X", "degraded" objects still show up
close to the end of the rebalancing, but in the "pgs:" section of ceph -s
no degraded pgs are still recovering, just remapped ones, and no degraded
pgs can be found in "ceph pg dump"

   health: HEALTH_WARN
 355767/30286841 objects misplaced (1.175%)
 Degraded data redundancy: 28/30286841 objects degraded
(0.000%), 96 pgs unclean

   services:
 ...
 osd: 38 osds: 38 up, 37 in; 96 remapped pgs

   data:
 pools:   19 pools, 4176 pgs
 objects: 9859k objects, 39358 GB
 usage:   114 TB used, 120 TB / 234 TB avail
 pgs: 28/30286841 objects degraded (0.000%)
  355767/30286841 objects misplaced (1.175%)
  4080 active+clean
  81   active+remapped+backfilling
  15   active+remapped+backfill_wait


Where those 28 degraded objects come from?

There aren't actually any degraded objects; in this case it's just
misreporting that there are.

This is a known issue in luminous.  Shortly after release we noticed the
problem and David has been working on several changes to the stats
calculation to improve the reporting, but those changes have not been
backported (and aren't quite complete, either--getting a truly accurate
number there is nontrivial in some cases it turns out).


In such cases the degraded objects usually disappear once backfilling is
done, but normally degraded objects should be fixed before remapped
ones, by priority.

Yes.

It's unfortunately a scary warning (there shouldn't be degraded
objects... and generally speaking aren't) that understandably alarms
users.  We hope to have this sorted out soon!

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What is the should be the expected latency of 10Gbit network connections

2018-01-22 Thread Warren Wang
25GbE network. Servers have ConnectX-4 Pro, across a router, since L2 is
terminated at the ToR:

10 packets transmitted, 10 received, 0% packet loss, time 1926ms
rtt min/avg/max/mdev = 0.013/0.013/0.205/0.004 ms, ipg/ewma 0.019/0.014 ms

Warren Wang
 
On 1/22/18, 4:06 PM, "ceph-users on behalf of Marc Roos" 
 wrote:


ping -c 10 -f 
ping -M do -s 8972 
 
10Gb ConnectX-3 Pro, DAC + Vlan
rtt min/avg/max/mdev = 0.010/0.013/0.200/0.003 ms, ipg/ewma 0.025/0.014 ms

8980 bytes from 10.0.0.11: icmp_seq=3 ttl=64 time=0.144 ms
8980 bytes from 10.0.0.11: icmp_seq=4 ttl=64 time=0.205 ms
8980 bytes from 10.0.0.11: icmp_seq=5 ttl=64 time=0.248 ms
8980 bytes from 10.0.0.11: icmp_seq=6 ttl=64 time=0.281 ms
8980 bytes from 10.0.0.11: icmp_seq=7 ttl=64 time=0.187 ms
8980 bytes from 10.0.0.11: icmp_seq=8 ttl=64 time=0.121 ms

I350 Gigabit + bond
rtt min/avg/max/mdev = 0.027/0.038/0.211/0.006 ms, ipg/ewma 0.050/0.041 ms

8980 bytes from 192.168.0.11: icmp_seq=1 ttl=64 time=0.555 ms
8980 bytes from 192.168.0.11: icmp_seq=2 ttl=64 time=0.508 ms
8980 bytes from 192.168.0.11: icmp_seq=3 ttl=64 time=0.514 ms
8980 bytes from 192.168.0.11: icmp_seq=4 ttl=64 time=0.555 ms



-Original Message-
From: Nick Fisk [mailto:n...@fisk.me.uk] 
Sent: maandag 22 januari 2018 12:38
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] What is the should be the expected latency of 
10Gbit network connections

Anyone with 25G ethernet willing to do the test? Would love to see what 
the latency figures are for that.

 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Maged Mokhtar
Sent: 22 January 2018 11:28
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] What is the should be the expected latency of 
10Gbit network connections

 

On 2018-01-22 08:39, Wido den Hollander wrote:



On 01/20/2018 02:02 PM, Marc Roos wrote: 

  If I test my connections with sockperf via a 1Gbit switch I 
get around
25usec, when I test the 10Gbit connection via the switch I 
have around
12usec is that normal? Or should there be a differnce of 10x.


No, that's normal.

Tests with 8k ping packets over different links I did:

1GbE:  0.800ms
10GbE: 0.200ms
40GbE: 0.150ms

Wido




sockperf ping-pong

sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=10.100 sec; SentMessages=432875;
ReceivedMessages=432874
sockperf: = Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=10.000 sec; 
SentMessages=428640;
ReceivedMessages=428640
sockperf: > avg-lat= 11.609 (std-dev=1.684)
sockperf: # dropped messages = 0; # duplicated messages = 0; #
out-of-order messages = 0
sockperf: Summary: Latency is 11.609 usec
sockperf: Total 428640 observations; each percentile contains 
4286.40
observations
sockperf: --->  observation =  856.944
sockperf: ---> percentile  99.99 =   39.789
sockperf: ---> percentile  99.90 =   20.550
sockperf: ---> percentile  99.50 =   17.094
sockperf: ---> percentile  99.00 =   15.578
sockperf: ---> percentile  95.00 =   12.838
sockperf: ---> percentile  90.00 =   12.299
sockperf: ---> percentile  75.00 =   11.844
sockperf: ---> percentile  50.00 =   11.409
sockperf: ---> percentile  25.00 =   11.124
sockperf: --->  observation =8.888

sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=1.100 sec; SentMessages=22065;
ReceivedMessages=22064
sockperf: = Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=1.000 sec; 
SentMessages=20056;
ReceivedMessages=20056
sockperf: > avg-lat= 24.861 (std-dev=1.774)
sockperf: # dropped messages = 0; # duplicated messages = 0; #
out-of-order messages 

Re: [ceph-users] OSD doesn't start - fresh installation

2018-01-22 Thread Brad Hubbard
On Mon, Jan 22, 2018 at 10:37 PM, Hüseyin Atatür YILDIRIM <
hyildi...@havelsan.com.tr> wrote:

>
> Hi again,
>
>
>
> In the “journalctl –xe”  output:
>
>
>
> Jan 22 15:29:18 mon02 ceph-osd-prestart.sh[1526]: OSD data directory
> /var/lib/ceph/osd/ceph-1 does not exist; bailing out.
>
>
>
> Also in my previous post, I forgot to say that “ceph-deploy osd create”
> command  doesn’t fail and appears to be successful, you can see from the
> logs.
>
> But dameons on nodes don’t start.
>
>
>
> Regards,
>
> Atatur
>
>
>
>
>
>
> 
>
> *From:* Hüseyin Atatür YILDIRIM
> *Sent:* Monday, January 22, 2018 3:19 PM
> *To:* ceph-users@lists.ceph.com
> *Subject:* OSD doesn't start - fresh installation
>
>
>
> Hi all,
>
>
>
> Fresh installation but already used disks. I zapped all the disks and ran
> “ceph-deploy osd create” again but got the same results.
>
> Log is attached. Can you please help?
>

Did you mean "sdb1" rather than "sdb" perhaps?


>
>
>
> Thank you,
>
> Atatur
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] After Luminous upgrade: ceph-fuse clients failing to respond to cache pressure

2018-01-22 Thread Andras Pataki
Just to close this thread up - it looks like all the problems were 
related to setting the "mds cache size" option in Luminous instead of 
using "mds cache memory limit".  The "mds cache size" option 
documentation says that "it is recommended to use mds_cache_memory_limit 
...", but it looks more like "mds cache size" simply does not work in 
Luminous like it used to in Jewel (or does not work period).  As a 
result the MDS was trying to aggressively reduce caches in our setup.  
Since we switched all MDS's over to 'mds cache memory limit' of 16GB and 
bounced them, we have had no performance or cache pressure issues, and 
as expected they hover around 22-23GB of RSS.
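
For anyone else making the same switch, the relevant ceph.conf stanza (16 GiB, the value we use) and, as a rough sketch, a way to apply it at runtime (a failover/restart is still the clean way):

    [mds]
    mds_cache_memory_limit = 17179869184    # 16 GiB
    mds_cache_reservation = 0.10

    ceph tell mds.* injectargs '--mds_cache_memory_limit 17179869184'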


Thanks everyone for the help,

Andras


On 01/18/2018 12:34 PM, Patrick Donnelly wrote:

Hi Andras,

On Thu, Jan 18, 2018 at 3:38 AM, Andras Pataki
 wrote:

Hi John,

Some other symptoms of the problem:  when the MDS has been running for a few
days, it starts looking really busy.  At this time, listing directories
becomes really slow.  An "ls -l" on a directory with about 250 entries takes
about 2.5 seconds.  All the metadata is on OSDs with NVMe backing stores.
Interestingly enough the memory usage seems pretty low (compared to the
allowed cache limit).


    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
1604408 ceph  20   0 3710304 2.387g  18360 S 100.0  0.9 757:06.92
/usr/bin/ceph-mds -f --cluster ceph --id cephmon00 --setuser ceph --setgroup
ceph

Once I bounce it (fail it over), the CPU usage goes down to the 10-25%
range.  The same ls -l after the bounce takes about 0.5 seconds.  I
remounted the filesystem before each test to ensure there isn't anything
cached.

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
   00 ceph  20   0 6537052 5.864g  18500 S  17.6  2.3   9:23.55
/usr/bin/ceph-mds -f --cluster ceph --id cephmon02 --setuser ceph --setgroup
ceph

Also, I have a crawler that crawls the file system periodically.  Normally
the full crawl runs for about 24 hours, but with the slowing down MDS, now
it has been running for more than 2 days and isn't close to finishing.

The MDS related settings we are running with are:

mds_cache_memory_limit = 17179869184
mds_cache_reservation = 0.10

Debug logs from the MDS at that time would be helpful with `debug mds
= 20` and `debug ms = 1`. Feel free to create a tracker ticket and use
ceph-post-file [1] to share logs.

[1] http://docs.ceph.com/docs/hammer/man/8/ceph-post-file/
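
A sketch of how that capture could look (the MDS name cephmon00 is taken from the top output above; the log path is the default and may differ):

    ceph tell mds.cephmon00 injectargs '--debug_mds 20 --debug_ms 1'
    # ... reproduce the slow "ls -l" ...
    ceph tell mds.cephmon00 injectargs '--debug_mds 1 --debug_ms 0'
    ceph-post-file /var/log/ceph/ceph-mds.cephmon00.log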



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous upgrade with existing EC pools

2018-01-22 Thread John Spray
On Mon, Jan 22, 2018 at 9:23 PM, David Turner  wrote:
> I ran into a problem removing the cache tier.  I tried everything I could to
> get past it, but I ended up having to re-enable it.  I'm running on 12.2.2
> with all bluestore OSDs.
>
> I successfully set allow_ec_overwrites to true, I set the cache-mode to
> forward, I flushed/evicted the entire cache, and then went to remove-overlay
> on the data pool and received the error "Error EBUSY: pool 'cephfs_data' is
> in use by CephFS via its tier".  I made sure that no client had cephfs
> mounted, I even stopped the MDS daemons, and the same error came up every
> time.  I also tested setting the cache-mode to none (from forward) after I
> ensured that the cache was empty and it told me "set cache-mode for pool
> 'cephfs_cache' to none (WARNING: pool is still configured as read or write
> tier)" and still had the same error for removing the overlay.  I ultimately
> had to concede defeat and set the cache-mode back to writeback to get things
> working again.
>
> Does anyone have any ideas for how to remove this cache tier?  Having full
> write on reads is not something I really want to keep around if I can get
> rid of it.

Oops, this is a case that we didn't think about in the mon's logic for
checking whether it's okay to remove a tier.  I've opened a ticket:
http://tracker.ceph.com/issues/22754

You'll need to either wait for a patch (or hack out that buggy check
yourself from the OSDMonitor::_check_remove_tier function), or leave
those pools in place and add a second, new EC pool to your filesystem
and use layouts to move some files onto it.
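
A sketch of that second approach (pool, profile, filesystem and directory names are made up):

    ceph osd pool create cephfs_data_ec 64 64 erasure myprofile
    ceph osd pool set cephfs_data_ec allow_ec_overwrites true
    ceph fs add_data_pool <fsname> cephfs_data_ec
    # on a mounted client, point new data at the EC pool via a directory layout
    setfattr -n ceph.dir.layout.pool -v cephfs_data_ec /mnt/cephfs/some_dir

New files written under that directory then land in the EC pool while the old pools stay in place.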

John

> On Mon, Jan 22, 2018 at 7:25 AM David Turner  wrote:
>>
>> I've already migrated all osds to bluestore and changed my pools to use a
>> crush rule specifying them to use an HDD class (forced about half of my data
>> to move). This week I'm planning to add in some new SSDs to move the
>> metadata pool to.
>>
>> I have experience with adding and removing cache tiers without losing data
>> in the underlying pool. The documentation on this in the upgrade procedure
>> and from the EC documentation had me very leary. Seeing the information
>> about EC pools from the CephFS documentation helps me to feel much more
>> confident. Thank you.
>>
>> On Mon, Jan 22, 2018, 5:53 AM John Spray  wrote:
>>>
>>> On Sat, Jan 20, 2018 at 6:26 PM, David Turner 
>>> wrote:
>>> > I am not able to find documentation for how to convert an existing
>>> > cephfs
>>> > filesystem to use allow_ec_overwrites. The documentation says that the
>>> > metadata pool needs to be replicated, but that the data pool can be EC.
>>> > But
>>> > it says, "For Cephfs, using an erasure coded pool means setting that
>>> > pool in
>>> > a file layout." Is that really necessary if your metadata pool is
>>> > replicated
>>> > and you have an existing EC pool for the data? Could I just enable ec
>>> > overwrites and start flushing/removing the cache tier and be on my way
>>> > to
>>> > just using an EC pool?
>>>
>>> That snippet in the RADOS docs is a bit misleading: you only need to
>>> use a file layout if you're adding an EC pool as an additional pool
>>> rather than using it during creation of a filesystem.
>>>
>>> The CephFS version of events is here:
>>>
>>> http://docs.ceph.com/docs/master/cephfs/createfs/#using-erasure-coded-pools-with-cephfs
>>>
>>> As for migrating from a cache tiered configuration, I haven't tried
>>> it, but there's nothing CephFS-specific about it.  If the underlying
>>> pool that's set as the cephfs data pool is EC and has
>>> allow_ec_overwrites then CephFS won't care -- but I'm personally not
>>> an expert on what knobs and buttons to use to migrate away from a
>>> cache tiered config.
>>>
>>> Do bear in mind that your OSDs need to be using bluestore (which may
>>> not be the case since you're talking about migrating an existing
>>> system?)
>>>
>>> John
>>>
>>> >
>>> > ___
>>> > ceph-users mailing list
>>> > ceph-users@lists.ceph.com
>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stuck pgs (activating+remapped) and slow requests after adding OSD node via ceph-ansible

2018-01-22 Thread Peter Linder
Did you find out anything about this? We are also getting pgs stuck 
"activating+remapped". To fix the problem I have to manually alter bucket 
weights so that they are basically the same everywhere, even if the disks 
aren't the same size, but it is a real hassle every time we add a new 
node or disk.
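
The kind of manual adjustment meant above would be something like (osd id and weight are made-up examples):

    ceph osd df tree                     # compare CRUSH weights per bucket
    ceph osd crush reweight osd.12 1.7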


See my email subject "Weird issues related to (large/small) weights in 
mixed nvme/hdd pool" from 2018-01-20 and see if there are some similarities?



Regards,
Peter

Den 2018-01-07 kl. 12:17, skrev Tzachi Strul:

Hi all,
We have 5 node ceph cluster (Luminous 12.2.1) installed via ceph-ansible.
All servers have 16X1.5TB SSD disks.
3 of these servers are also acting as MON+MGRs.
We don't have separate networks for cluster and public; each node has 
4 NICs bonded together (40G) serving both cluster and public communication 
(we know it's not ideal and are planning to change it).


Last week we added another node to cluster (another 16*1.5TB ssd).
We used ceph-ansible latest stable release.
After OSD activation, the cluster started rebalancing and problems began:
1. Cluster entered HEALTH_ERROR state
2. 67 pgs stuck at activating+remapped
3. A lot of blocked slow requests.

This cluster serves OpenStack volumes; almost all OpenStack 
instances hit 100% disk utilization and hung, and eventually 
cinder-volume crashed.


Eventually, after restarting several OSDs, the problem was resolved and the 
cluster got back to HEALTH_OK


Our configuration already has:
osd max backfills = 1
osd max scrubs = 1
osd recovery max active = 1
osd recovery op priority = 1

In addition, we see a lot of bad mappings:
for example: bad mapping rule 0 x 52 num_rep 8 result 
[32,5,78,25,96,59,80]
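
Those "bad mapping" lines match the output of crushtool's mapping test; a sketch of how to reproduce the check offline against the current map:

    ceph osd getcrushmap -o crushmap.bin
    crushtool -i crushmap.bin --test --show-bad-mappings --rule 0 --num-rep 8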


What could be the cause, and what can I do to avoid this 
situation? We need to add another 9 OSD servers and can't afford downtime.


Any help would be appreciated. Thank you very much


Our ceph configuration:

[mgr]
mgr_modules = dashboard zabbix

[global]
cluster network = *removed for security reasons*
fsid =  *removed for security reasons*
mon host =  *removed for security reasons*
mon initial members =  *removed for security reasons*
mon osd down out interval = 900
osd pool default size = 3
public network =  *removed for security reasons*

[client.libvirt]
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok # 
must be writable by QEMU and allowed by SELinux or AppArmor
log file = /var/log/ceph/qemu-guest-$pid.log # must be writable by 
QEMU and allowed by SELinux or AppArmor


[osd]
osd backfill scan max = 16
osd backfill scan min = 4
osd bluestore cache size = 104857600  **Due to 12.2.1 bluestore memory 
leak bug**

osd max backfills = 1
osd max scrubs = 1
osd recovery max active = 1
osd recovery max single start = 1
osd recovery op priority = 1
osd recovery threads = 1


--

*Tzachi Strul*

*Storage DevOps *// *Kenshoo*





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous upgrade with existing EC pools

2018-01-22 Thread David Turner
I ran into a problem removing the cache tier.  I tried everything I could
to get past it, but I ended up having to re-enable it.  I'm running on
12.2.2 with all bluestore OSDs.

I successfully set allow_ec_overwrites to true, I set the cache-mode to
forward, I flushed/evicted the entire cache, and then went to
remove-overlay on the data pool and received the error "Error EBUSY: pool
'cephfs_data' is in use by CephFS via its tier".  I made sure that no
client had cephfs mounted, I even stopped the MDS daemons, and the same
error came up every time.  I also tested setting the cache-mode to none
(from forward) after I ensured that the cache was empty and it told me "set
cache-mode for pool 'cephfs_cache' to none (WARNING: pool is still
configured as read or write tier)" and still had the same error for
removing the overlay.  I ultimately had to concede defeat and set the
cache-mode back to writeback to get things working again.

Does anyone have any ideas for how to remove this cache tier?  Having full
write on reads is not something I really want to keep around if I can get
rid of it.
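
For context, a sketch of the removal sequence attempted (reconstructed from the description above, with the pool names as given):

    ceph osd tier cache-mode cephfs_cache forward --yes-i-really-mean-it
    rados -p cephfs_cache cache-flush-evict-all
    ceph osd tier remove-overlay cephfs_data      # fails with "in use by CephFS via its tier"
    ceph osd tier remove cephfs_data cephfs_cache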

On Mon, Jan 22, 2018 at 7:25 AM David Turner  wrote:

> I've already migrated all osds to bluestore and changed my pools to use a
> crush rule specifying them to use an HDD class (forced about half of my
> data to move). This week I'm planning to add in some new SSDs to move the
> metadata pool to.
>
> I have experience with adding and removing cache tiers without losing data
> in the underlying pool. The documentation on this in the upgrade procedure
> and from the EC documentation had me very leary. Seeing the information
> about EC pools from the CephFS documentation helps me to feel much more
> confident. Thank you.
>
> On Mon, Jan 22, 2018, 5:53 AM John Spray  wrote:
>
>> On Sat, Jan 20, 2018 at 6:26 PM, David Turner 
>> wrote:
>> > I am not able to find documentation for how to convert an existing
>> cephfs
>> > filesystem to use allow_ec_overwrites. The documentation says that the
>> > metadata pool needs to be replicated, but that the data pool can be EC.
>> But
>> > it says, "For Cephfs, using an erasure coded pool means setting that
>> pool in
>> > a file layout." Is that really necessary if your metadata pool is
>> replicated
>> > and you have an existing EC pool for the data? Could I just enable ec
>> > overwrites and start flushing/removing the cache tier and be on my way
>> to
>> > just using an EC pool?
>>
>> That snippet in the RADOS docs is a bit misleading: you only need to
>> use a file layout if you're adding an EC pool as an additional pool
>> rather than using it during creation of a filesystem.
>>
>> The CephFS version of events is here:
>>
>> http://docs.ceph.com/docs/master/cephfs/createfs/#using-erasure-coded-pools-with-cephfs
>>
>> As for migrating from a cache tiered configuration, I haven't tried
>> it, but there's nothing CephFS-specific about it.  If the underlying
>> pool that's set as the cephfs data pool is EC and has
>> allow_ec_overwrites then CephFS won't care -- but I'm personally not
>> an expert on what knobs and buttons to use to migrate away from a
>> cache tiered config.
>>
>> Do bear in mind that your OSDs need to be using bluestore (which may
>> not be the case since you're talking about migrating an existing
>> system?)
>>
>> John
>>
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What is the should be the expected latency of 10Gbit network connections

2018-01-22 Thread Marc Roos

ping -c 10 -f 
ping -M do -s 8972 
 
10Gb ConnectX-3 Pro, DAC + Vlan
rtt min/avg/max/mdev = 0.010/0.013/0.200/0.003 ms, ipg/ewma 0.025/0.014 ms

8980 bytes from 10.0.0.11: icmp_seq=3 ttl=64 time=0.144 ms
8980 bytes from 10.0.0.11: icmp_seq=4 ttl=64 time=0.205 ms
8980 bytes from 10.0.0.11: icmp_seq=5 ttl=64 time=0.248 ms
8980 bytes from 10.0.0.11: icmp_seq=6 ttl=64 time=0.281 ms
8980 bytes from 10.0.0.11: icmp_seq=7 ttl=64 time=0.187 ms
8980 bytes from 10.0.0.11: icmp_seq=8 ttl=64 time=0.121 ms

I350 Gigabit + bond
rtt min/avg/max/mdev = 0.027/0.038/0.211/0.006 ms, ipg/ewma 0.050/0.041 ms

8980 bytes from 192.168.0.11: icmp_seq=1 ttl=64 time=0.555 ms
8980 bytes from 192.168.0.11: icmp_seq=2 ttl=64 time=0.508 ms
8980 bytes from 192.168.0.11: icmp_seq=3 ttl=64 time=0.514 ms
8980 bytes from 192.168.0.11: icmp_seq=4 ttl=64 time=0.555 ms



-Original Message-
From: Nick Fisk [mailto:n...@fisk.me.uk] 
Sent: maandag 22 januari 2018 12:38
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] What is the should be the expected latency of 
10Gbit network connections

Anyone with 25G ethernet willing to do the test? Would love to see what 
the latency figures are for that.

 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Maged Mokhtar
Sent: 22 January 2018 11:28
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] What is the should be the expected latency of 
10Gbit network connections

 

On 2018-01-22 08:39, Wido den Hollander wrote:



On 01/20/2018 02:02 PM, Marc Roos wrote: 

  If I test my connections with sockperf via a 1Gbit switch I 
get around
25usec, when I test the 10Gbit connection via the switch I 
have around
12usec is that normal? Or should there be a differnce of 10x.


No, that's normal.

Tests with 8k ping packets over different links I did:

1GbE:  0.800ms
10GbE: 0.200ms
40GbE: 0.150ms

Wido




sockperf ping-pong

sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=10.100 sec; SentMessages=432875;
ReceivedMessages=432874
sockperf: = Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=10.000 sec; 
SentMessages=428640;
ReceivedMessages=428640
sockperf: > avg-lat= 11.609 (std-dev=1.684)
sockperf: # dropped messages = 0; # duplicated messages = 0; #
out-of-order messages = 0
sockperf: Summary: Latency is 11.609 usec
sockperf: Total 428640 observations; each percentile contains 
4286.40
observations
sockperf: --->  observation =  856.944
sockperf: ---> percentile  99.99 =   39.789
sockperf: ---> percentile  99.90 =   20.550
sockperf: ---> percentile  99.50 =   17.094
sockperf: ---> percentile  99.00 =   15.578
sockperf: ---> percentile  95.00 =   12.838
sockperf: ---> percentile  90.00 =   12.299
sockperf: ---> percentile  75.00 =   11.844
sockperf: ---> percentile  50.00 =   11.409
sockperf: ---> percentile  25.00 =   11.124
sockperf: --->  observation =8.888

sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=1.100 sec; SentMessages=22065;
ReceivedMessages=22064
sockperf: = Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=1.000 sec; 
SentMessages=20056;
ReceivedMessages=20056
sockperf: > avg-lat= 24.861 (std-dev=1.774)
sockperf: # dropped messages = 0; # duplicated messages = 0; #
out-of-order messages = 0
sockperf: Summary: Latency is 24.861 usec
sockperf: Total 20056 observations; each percentile contains 
200.56
observations
sockperf: --->  observation =   77.158
sockperf: ---> percentile  99.99 =   54.285
sockperf: ---> percentile  99.90 =   37.864
sockperf: ---> percentile  99.50 =   34.406
sockperf: ---> percentile  99.00 =   33.337
sockperf: ---> percentile  95.00 =   27.497
sockperf: ---> percentile  90.00 =   26.072
sockperf: ---> 

Re: [ceph-users] Ceph Future

2018-01-22 Thread Jack
On 01/22/2018 08:38 PM, Massimiliano Cuttini wrote:
> The web interface is needed because:*cmd-lines are prune to typos.*
And you never misclick, indeed;

> SMART is widely used.
SMART has never been, and will never be, of any use for failure prediction.

> My opinion is pretty simple: the more complex a piece of software is, the
> more prone to errors you'll be.
As you said, "from great power comes great costs".
Ceph is not for dummies (albeit installing and maintaining a running
cluster is pretty straightforward).

> A web interface can just make the basic checks before submitting a new
> command to the pool.
And the command line does exactly the same. Try removing a pool, you
will see.
MMIs are the same: if errors can be prevented, they shall be.
To prevent all errors, you must remove all functionality.

> To say "ceph is not for rookies, it's better having a threshold" can
> be said only by a person that doesn't really love their own data (keeping
> management as error-free as possible), but instead just wants to be the
> only one allowed to manage it.
Yeah, well, whatever, most system engineers know how to handle Ceph.
Most non-system engineers do not.
It's a task, a job; I don't master others' jobs, hence it feels natural that
others do not master mine.

Sorry if this sounds so strange to you.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous: example of a single down osd taking out a cluster

2018-01-22 Thread Dan van der Ster
Here's a bit more info as I read the logs. Firstly, these are in fact
Filestore OSDs... I was confused, but I don't think it makes a big
difference.

Next, all the other OSDs had indeed noticed that osd.2 had failed:

2018-01-22 18:37:20.456535 7f831728e700 -1 osd.0 598 heartbeat_check:
no reply from 137.138.121.224:6803 osd.2 since back 2018-01-22
18:36:59.514902 front 2018-01-22 18:36:59.514902 (cutoff 2018-01-22
18:37:00.456532)

2018-01-22 18:37:21.085178 7fc911169700 -1 osd.1 598 heartbeat_check:
no reply from 137.138.121.224:6803 osd.2 since back 2018-01-22
18:37:00.518067 front 2018-01-22 18:37:00.518067 (cutoff 2018-01-22
18:37:01.085175)

2018-01-22 18:37:21.408881 7f78b8ea4700 -1 osd.4 598 heartbeat_check:
no reply from 137.138.121.224:6803 osd.2 since back 2018-01-22
18:37:00.873298 front 2018-01-22 18:37:00.873298 (cutoff 2018-01-22
18:37:01.408880)

2018-01-22 18:37:21.117301 7f4ac8138700 -1 osd.3 598 heartbeat_check:
no reply from 137.138.121.224:6803 osd.2 since back 2018-01-22
18:37:01.092182 front 2018-01-22 18:37:01.092182 (cutoff 2018-01-22
18:37:01.117298)



The only "reported failed" came from osd.0, who BTW was the only OSD
who hadn't been marked down for not sending beacons:

2018-01-22 18:37:20.457400 7fc1b51ce700  1
mon.cephcta-mon-658cb618c9@0(leader).osd e598 prepare_failure osd.2
137.138.121.224:6800/1377 from osd.0 137.138.156.51:6800/1286 is
reporting failure:1
2018-01-22 18:37:20.457457 7fc1b51ce700  0 log_channel(cluster) log
[DBG] : osd.2 137.138.121.224:6800/1377 reported failed by osd.0
137.138.156.51:6800/1286


So presumably it's because only 1 reporter showed up that osd.2 was
never marked down. (1 being less than "mon_osd_min_down_reporters":
"2")


And BTW, I didn't mention before that the cluster came fully back to
HEALTH_OK after I hard rebooted the osd.2 machine -- the other OSDs
were unblocked and recovery healed everything:

2018-01-22 19:31:12.381762 7fc907956700  0 log_channel(cluster) log
[WRN] : Monitor daemon marked osd.1 down, but it is still running
2018-01-22 19:31:12.381774 7fc907956700  0 log_channel(cluster) log
[DBG] : map e602 wrongly marked me down at e601

2018-01-22 19:31:12.515178 7f78af691700  0 log_channel(cluster) log
[WRN] : Monitor daemon marked osd.4 down, but it is still running
2018-01-22 19:31:12.515186 7f78af691700  0 log_channel(cluster) log
[DBG] : map e602 wrongly marked me down at e601

2018-01-22 19:31:12.586532 7f4abe925700  0 log_channel(cluster) log
[WRN] : Monitor daemon marked osd.3 down, but it is still running
2018-01-22 19:31:12.586544 7f4abe925700  0 log_channel(cluster) log
[DBG] : map e602 wrongly marked me down at e601


Thanks for the help solving this puzzle,

Dan


On Mon, Jan 22, 2018 at 8:07 PM, Dan van der Ster  wrote:
> Hi all,
>
> We just saw an example of one single down OSD taking down a whole
> (small) luminous 12.2.2 cluster.
>
> The cluster has only 5 OSDs, on 5 different servers. Three of those
> servers also run a mon/mgr combo.
>
> First, we had one server (mon+osd) go down legitimately [1] -- I can
> tell when it went down because the mon quorum broke:
>
> 2018-01-22 18:26:31.521695 mon.cephcta-mon-658cb618c9 mon.0
> 137.138.62.69:6789/0 121277 : cluster [WRN] Health check failed: 1/3
> mons down, quorum cephcta-mon-658cb618c9,cephcta-mon-3e0d524825
> (MON_DOWN)
>
> Then there's a long pileup of slow requests until the OSD is finally
> marked down due to no beacon:
>
> 2018-01-22 18:47:31.549791 mon.cephcta-mon-658cb618c9 mon.0
> 137.138.62.69:6789/0 121447 : cluster [WRN] Health check update: 372
> slow requests are blocked > 32 sec (REQUEST_SLOW)
> 2018-01-22 18:47:56.671360 mon.cephcta-mon-658cb618c9 mon.0
> 137.138.62.69:6789/0 121448 : cluster [INF] osd.2 marked down after no
> beacon for 903.538932 seconds
> 2018-01-22 18:47:56.672315 mon.cephcta-mon-658cb618c9 mon.0
> 137.138.62.69:6789/0 121449 : cluster [WRN] Health check failed: 1
> osds down (OSD_DOWN)
>
>
> So, first question is: why didn't that OSD get detected as failing much 
> earlier?
>
>
> The slow requests continue until almost 10 minutes later ceph marks 3
> of the other 4 OSDs down after seeing no beacons:
>
> 2018-01-22 18:56:31.727970 mon.cephcta-mon-658cb618c9 mon.0
> 137.138.62.69:6789/0 121539 : cluster [INF] osd.1 marked down after no
> beacon for 900.091770 seconds
> 2018-01-22 18:56:31.728105 mon.cephcta-mon-658cb618c9 mon.0
> 137.138.62.69:6789/0 121540 : cluster [INF] osd.3 marked down after no
> beacon for 900.091770 seconds
> 2018-01-22 18:56:31.728197 mon.cephcta-mon-658cb618c9 mon.0
> 137.138.62.69:6789/0 121541 : cluster [INF] osd.4 marked down after no
> beacon for 900.091770 seconds
> 2018-01-22 18:56:31.730108 mon.cephcta-mon-658cb618c9 mon.0
> 137.138.62.69:6789/0 121542 : cluster [WRN] Health check update: 4
> osds down (OSD_DOWN)
>
>
> 900 is the default mon_osd_report_timeout -- why are these OSDs all
> stuck not sending beacons? Why haven't they noticed that the osd.2 had
> failed, then 

Re: [ceph-users] Ceph Future

2018-01-22 Thread Massimiliano Cuttini

I have to disagree with you Marc,



Hmmm, I have to disagree with

'too many services'
What do you mean, there is a process for each osd, mon, mgr and mds.
There are fewer processes running than on a default Windows fileserver.
What is the complaint here?


I wrote: "Ceph is amazing, but is just too big to have everything 
under control (too many services)" under the point "MANAGEMENT 
COMPLICATIONS".
I found this sentence pretty CLEAR. Here I did NOT complain about the 
fact that there are too many services needed to run the software.
Instead I was talking about the management complexity of easily finding 
out whether something is wrong.

There is no clear view of whether everything is running right or not.

Is this a complaint about Ceph having fewer or more processes than a 
Windows fileserver? Of course that was not the point.
Please read carefully before answering with a nonsensical comparison with 
other services.



'manage everything by your command-line'
What is so bad about this? Even microsoft is seeing the advantages and
introduced power shell etc.

I'm saying that there is nothing else EXCEPT the command line.
Again, I said something different; please read again what I wrote.
Why call a service a "manager" when it just acts as a "performance monitor"?



I would recommend hiring a ceph admin, then
you don't even need to use the web interface. You will have voice
control on ceph, how cool is that! ;)
(actually maybe we can do feature request to integrate apple siri (not
forgetting of course google/amazon talk?))


Wow, you are so funny and cool!
Probably you love having all your colleagues think about how great 
you are at managing such a complex system without any kind of error.


Instead I need to delegate to others.
The shell is powerful. But from great power comes great responsibility.
If you don't see the issue with giving the shell to lower-level techs, even 
when all they need to do is set up a new RBD image, that's your business.


The web interface is needed because command lines are prone to typos.
Moreover, Ceph's command lines (due to the complexity of the software) are 
very long and therefore even more prone to typos.
Wrapping inputs before sending them to the command line is just a 
safer way to handle a solution that holds TB/PB of customers' delicate 
data.
If you don't agree with such a simple statement, it's probably because 
you are the only one who can brag about never having typed a wrong command 
in your life.



'iscsi'
Afaik this is not even a default install with ceph or a ceph package. I
am also not complaining to ceph, that my nespresso machine does not have
triple redundancy.

Nice metaphor. However, you are wrong.
I'm not complaining about the lack of triple redundancy in your 
Nespresso machine or anything else that has nothing to do with CEPH.


CEPH is storage software. iSCSI is a storage connectivity technology.
If you miss the connection between them, I'm so, so sorry for you.

Many people have asked for this feature since the beginning of this 
project.
Ceph has already integrated iSCSI in the latest release: 
http://docs.ceph.com/docs/master/rbd/iscsi-overview/#
However, it's not clear whether the support for this technology is optimized, 
or just an early feature.


iSCSI is still widely used by old systems as the only way to connect to remote 
storage.
XenServer simply has no way to connect to CEPH without a 
proper iSCSI connection.
If you don't feel the need for iSCSI while others do... that's ok, but 
please also don't feel the need to comment on everything with silly metaphors.

Thanks.


'check hardware below the hood'
Why waste development on this when there are already enough solutions
out there? As if it is even possible to make a one size fits all
solution.

SMART is widely used.
I don't think it's stupid to read some data that can forecast your next 
disk failure.



Afaiac, I think the ceph team has done a great job. I was pleasantly
surprised by how easy it is to install, just by installing the rpms
(not using ceph-deploy).

Installation is easy.
Management is out of view.


Next to this, I think it is good to have some
sort of 'threshold' to keep the wordpress admins at a distance. Ceph
solutions are holding TB/PB of other people's data, and we don’t want
rookies destroying that, nor ceph blamed for it.

You are completely missing the point.
Everybody already knows that no "wordpress" admin can manage 
enterprise storage.
Ceph is run and built by people who also know how to set up the OS, 
LAN, bonds and so on.

So your talk about dangerous rookies is about nothing.

My opinion is pretty simple: the more complex a piece of software is, the 
more prone to errors you'll be.
It's not a matter of how good you are with the command line (you should 
never overestimate yourself).

It's just a matter of time before it happens someday.
Maybe because you are bored of running the same command every time, or maybe 
because, after tons of working hours, you fail to run one command properly 
before another.


A web interface can just make 

[ceph-users] Luminous: example of a single down osd taking out a cluster

2018-01-22 Thread Dan van der Ster
Hi all,

We just saw an example of one single down OSD taking down a whole
(small) luminous 12.2.2 cluster.

The cluster has only 5 OSDs, on 5 different servers. Three of those
servers also run a mon/mgr combo.

First, we had one server (mon+osd) go down legitimately [1] -- I can
tell when it went down because the mon quorum broke:

2018-01-22 18:26:31.521695 mon.cephcta-mon-658cb618c9 mon.0
137.138.62.69:6789/0 121277 : cluster [WRN] Health check failed: 1/3
mons down, quorum cephcta-mon-658cb618c9,cephcta-mon-3e0d524825
(MON_DOWN)

Then there's a long pileup of slow requests until the OSD is finally
marked down due to no beacon:

2018-01-22 18:47:31.549791 mon.cephcta-mon-658cb618c9 mon.0
137.138.62.69:6789/0 121447 : cluster [WRN] Health check update: 372
slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-01-22 18:47:56.671360 mon.cephcta-mon-658cb618c9 mon.0
137.138.62.69:6789/0 121448 : cluster [INF] osd.2 marked down after no
beacon for 903.538932 seconds
2018-01-22 18:47:56.672315 mon.cephcta-mon-658cb618c9 mon.0
137.138.62.69:6789/0 121449 : cluster [WRN] Health check failed: 1
osds down (OSD_DOWN)


So, first question is: why didn't that OSD get detected as failing much earlier?


The slow requests continue until, almost 10 minutes later, ceph marks 3
of the other 4 OSDs down after seeing no beacons:

2018-01-22 18:56:31.727970 mon.cephcta-mon-658cb618c9 mon.0
137.138.62.69:6789/0 121539 : cluster [INF] osd.1 marked down after no
beacon for 900.091770 seconds
2018-01-22 18:56:31.728105 mon.cephcta-mon-658cb618c9 mon.0
137.138.62.69:6789/0 121540 : cluster [INF] osd.3 marked down after no
beacon for 900.091770 seconds
2018-01-22 18:56:31.728197 mon.cephcta-mon-658cb618c9 mon.0
137.138.62.69:6789/0 121541 : cluster [INF] osd.4 marked down after no
beacon for 900.091770 seconds
2018-01-22 18:56:31.730108 mon.cephcta-mon-658cb618c9 mon.0
137.138.62.69:6789/0 121542 : cluster [WRN] Health check update: 4
osds down (OSD_DOWN)


900 is the default mon_osd_report_timeout -- why are these OSDs all
stuck not sending beacons? Why didn't they notice that osd.2 had
failed and then recover onto the remaining OSDs?
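
(For anyone reading along, a quick way to sanity-check the settings
involved on a live cluster -- the daemon names here are just examples:)

  # on a mon host: how long the mons tolerate missing OSD beacons
  ceph daemon mon.cephcta-mon-658cb618c9 config get mon_osd_report_timeout
  # on an osd host: how often an OSD is supposed to send its beacon
  ceph daemon osd.1 config get osd_beacon_report_interval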

The config [2] is pretty standard, save for one perhaps culprit:

   osd op thread suicide timeout = 1800

That's part of our standard config, mostly to prevent OSDs from
suiciding during FileStore splitting. (This particular cluster is 100%
bluestore, so admittedly we could revert that here).
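
(If we do revert it, I assume it's either dropping the line from
ceph.conf and restarting the OSDs, or at runtime something along these
lines -- 150 being, I believe, the shipped default:)

  ceph tell osd.* injectargs '--osd_op_thread_suicide_timeout 150'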

Any idea what went wrong here?

I can create a tracker and post logs if this is interesting.

Best Regards,

Dan

[1] The failure mode of this OSD appears like its block device just
froze. It runs inside a VM and the console showed several of the
typical 120s block dev timeouts. The machine remained pingable, but
wasn't doing any IO.

[2] https://gist.github.com/dvanders/7eca771b6a8d1164bae8ea1fe45cf9f2


Re: [ceph-users] Adding disks -> getting unfound objects [Luminous]

2018-01-22 Thread David Turner
Weight the remaining disks you added to 0.0; they seem to be a bad batch.
This will start moving their data off of them and back onto the rest of the
cluster.  I generally suggest not adding more storage than you can afford
to lose, unless you trust your burn-in process.  So if you have a host
failure domain and size=3, I wouldn't add storage to more than 2 nodes at a
time in case the disks die.  That way you are much less likely to have
scares.

I assume this disk was in a third node leaving you with 3 failed disks
across 3 hosts?  It doesn't seem like these drives are going to work out
and I would immediately weight all newly added disks to 0.0 and get back to
a point where you are no longer backfilling/recovering PGs and see where
things are at from there.
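
(Concretely, that would be something along these lines -- the osd ids are
placeholders for whichever OSDs were newly added:)

  # pull the suspect OSDs out of data placement; they stay up and in
  ceph osd crush reweight osd.65 0.0
  ceph osd crush reweight osd.66 0.0
  # then watch the data drain off them
  ceph -w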

On Mon, Jan 22, 2018 at 1:33 PM Nico Schottelius <
nico.schottel...@ungleich.ch> wrote:

>
> While writing, yet another disk (osd.61 now) died and now we have
> 172 pgs down:
>
> [19:32:35] server2:~# ceph -s
>   cluster:
> id: 26c0c5a8-d7ce-49ac-b5a7-bfd9d0ba81ab
> health: HEALTH_WARN
> noscrub,nodeep-scrub flag(s) set
> 21033/2263701 objects misplaced (0.929%)
> Reduced data availability: 186 pgs inactive, 172 pgs down
> Degraded data redundancy: 67370/2263701 objects degraded
> (2.976%), 219 pgs unclean, 46 pgs degraded, 46 pgs undersized
> mon server2 is low on available space
>
>   services:
> mon: 3 daemons, quorum server5,server3,server2
> mgr: server5(active), standbys: server2, 2, 0, server3
> osd: 54 osds: 53 up, 53 in; 47 remapped pgs
>  flags noscrub,nodeep-scrub
>
>   data:
> pools:   3 pools, 1344 pgs
> objects: 736k objects, 2889 GB
> usage:   8517 GB used, 36474 GB / 44991 GB avail
> pgs: 13.839% pgs not active
>  67370/2263701 objects degraded (2.976%)
>  21033/2263701 objects misplaced (0.929%)
>  1125 active+clean
>  172  down
>  26   active+undersized+degraded+remapped+backfilling
>  14   undersized+degraded+remapped+backfilling+peered
>  6active+undersized+degraded+remapped+backfill_wait
>  1active+remapped+backfill_wait
>
>   io:
> client:   835 kB/s rd, 262 kB/s wr, 16 op/s rd, 25 op/s wr
> recovery: 102 MB/s, 26 objects/s
>
> What is the most sensible way to get out of this situation?
>
>
>
>
>
> David Turner  writes:
>
> > I do remember seeing that exactly. As the number of recovery_wait pgs
> > decreased, the number of unfound objects decreased until they were all
> > found.  Unfortunately it blocked some IO from happening during the
> > recovery, but in the long run we ended up with full data integrity again.
> >
> > On Mon, Jan 22, 2018 at 1:03 PM Nico Schottelius <
> > nico.schottel...@ungleich.ch> wrote:
> >
> >>
> >> Hey David,
> >>
> >> thanks for the fast answer. All our pools are running with size=3,
> >> min_size=2 and the two disks were in 2 different hosts.
> >>
> >> What I am a bit worried about is the output of "ceph pg 4.fa query" (see
> >> below) that indicates that ceph already queried all other hosts and did
> >> not find the data anywhere.
> >>
> >> Do you remember having seen something similar?
> >>
> >> Best,
> >>
> >> Nico
> >>
> >> David Turner  writes:
> >>
> >> > I have had the same problem before with unfound objects that happened
> >> while
> >> > backfilling after losing a drive. We didn't lose drives outside of the
> >> > failure domains and ultimately didn't lose any data, but we did have
> to
> >> > wait until after all of the PGs in recovery_wait state were caught up.
> >> So
> >> > if the 2 disks you lost were in the same host and your CRUSH rules are
> >> set
> >> > so that you can lose a host without losing data, then the cluster will
> >> > likely find all of the objects by the time it's done backfilling.
> With
> >> > only losing 2 disks, I wouldn't worry about the missing objects not
> >> > becoming found unless you're pool size=2.
> >> >
> >> > On Mon, Jan 22, 2018 at 11:47 AM Nico Schottelius <
> >> > nico.schottel...@ungleich.ch> wrote:
> >> >
> >> >>
> >> >> Hello,
> >> >>
> >> >> we added about 7 new disks yesterday/today and our cluster became
> very
> >> >> slow. While the rebalancing took place, 2 of the 7 new added disks
> >> >> died.
> >> >>
> >> >> Our cluster is still recovering, however we spotted that there are a
> lot
> >> >> of unfound objects.
> >> >>
> >> >> We lost osd.63 and osd.64, which seem not to be involved into the
> sample
> >> >> pg that has unfound objects.
> >> >>
> >> >> We were wondering why there are unfound objects, where they are
> coming
> >> >> from and if there is a way to recover them?
> >> >>
> >> >> Any help appreciated,
> >> >>
> >> >> Best,
> >> >>
> >> >> Nico
> >> >>
> >> >>
> >> >> Our status is:
> >> >>
> >> >>   cluster:
> >> >> id: 

Re: [ceph-users] Adding disks -> getting unfound objects [Luminous]

2018-01-22 Thread Nico Schottelius

While writing, yet another disk (osd.61 now) died and now we have
172 pgs down:

[19:32:35] server2:~# ceph -s
  cluster:
id: 26c0c5a8-d7ce-49ac-b5a7-bfd9d0ba81ab
health: HEALTH_WARN
noscrub,nodeep-scrub flag(s) set
21033/2263701 objects misplaced (0.929%)
Reduced data availability: 186 pgs inactive, 172 pgs down
Degraded data redundancy: 67370/2263701 objects degraded (2.976%), 
219 pgs unclean, 46 pgs degraded, 46 pgs undersized
mon server2 is low on available space

  services:
mon: 3 daemons, quorum server5,server3,server2
mgr: server5(active), standbys: server2, 2, 0, server3
osd: 54 osds: 53 up, 53 in; 47 remapped pgs
 flags noscrub,nodeep-scrub

  data:
pools:   3 pools, 1344 pgs
objects: 736k objects, 2889 GB
usage:   8517 GB used, 36474 GB / 44991 GB avail
pgs: 13.839% pgs not active
 67370/2263701 objects degraded (2.976%)
 21033/2263701 objects misplaced (0.929%)
 1125 active+clean
 172  down
 26   active+undersized+degraded+remapped+backfilling
 14   undersized+degraded+remapped+backfilling+peered
 6active+undersized+degraded+remapped+backfill_wait
 1active+remapped+backfill_wait

  io:
client:   835 kB/s rd, 262 kB/s wr, 16 op/s rd, 25 op/s wr
recovery: 102 MB/s, 26 objects/s

What is the most sensible way to get out of this situation?





David Turner  writes:

> I do remember seeing that exactly. As the number of recovery_wait pgs
> decreased, the number of unfound objects decreased until they were all
> found.  Unfortunately it blocked some IO from happening during the
> recovery, but in the long run we ended up with full data integrity again.
>
> On Mon, Jan 22, 2018 at 1:03 PM Nico Schottelius <
> nico.schottel...@ungleich.ch> wrote:
>
>>
>> Hey David,
>>
>> thanks for the fast answer. All our pools are running with size=3,
>> min_size=2 and the two disks were in 2 different hosts.
>>
>> What I am a bit worried about is the output of "ceph pg 4.fa query" (see
>> below) that indicates that ceph already queried all other hosts and did
>> not find the data anywhere.
>>
>> Do you remember having seen something similar?
>>
>> Best,
>>
>> Nico
>>
>> David Turner  writes:
>>
>> > I have had the same problem before with unfound objects that happened
>> while
>> > backfilling after losing a drive. We didn't lose drives outside of the
>> > failure domains and ultimately didn't lose any data, but we did have to
>> > wait until after all of the PGs in recovery_wait state were caught up.
>> So
>> > if the 2 disks you lost were in the same host and your CRUSH rules are
>> set
>> > so that you can lose a host without losing data, then the cluster will
>> > likely find all of the objects by the time it's done backfilling.  With
>> > only losing 2 disks, I wouldn't worry about the missing objects not
>> > becoming found unless you're pool size=2.
>> >
>> > On Mon, Jan 22, 2018 at 11:47 AM Nico Schottelius <
>> > nico.schottel...@ungleich.ch> wrote:
>> >
>> >>
>> >> Hello,
>> >>
>> >> we added about 7 new disks yesterday/today and our cluster became very
>> >> slow. While the rebalancing took place, 2 of the 7 new added disks
>> >> died.
>> >>
>> >> Our cluster is still recovering, however we spotted that there are a lot
>> >> of unfound objects.
>> >>
>> >> We lost osd.63 and osd.64, which seem not to be involved into the sample
>> >> pg that has unfound objects.
>> >>
>> >> We were wondering why there are unfound objects, where they are coming
>> >> from and if there is a way to recover them?
>> >>
>> >> Any help appreciated,
>> >>
>> >> Best,
>> >>
>> >> Nico
>> >>
>> >>
>> >> Our status is:
>> >>
>> >>   cluster:
>> >> id: 26c0c5a8-d7ce-49ac-b5a7-bfd9d0ba81ab
>> >> health: HEALTH_WARN
>> >> 261953/3006663 objects misplaced (8.712%)
>> >> 9377/1002221 objects unfound (0.936%)
>> >> Reduced data availability: 176 pgs inactive
>> >> Degraded data redundancy: 609338/3006663 objects degraded
>> >> (20.266%), 243 pgs unclea
>> >> n, 222 pgs degraded, 213 pgs undersized
>> >> mon server2 is low on available space
>> >>
>> >>   services:
>> >> mon: 3 daemons, quorum server5,server3,server2
>> >> mgr: server5(active), standbys: 2, server2, 0, server3
>> >> osd: 54 osds: 54 up, 54 in; 234 remapped pgs
>> >>
>> >>   data:
>> >> pools:   3 pools, 1344 pgs
>> >> objects: 978k objects, 3823 GB
>> >> usage:   9350 GB used, 40298 GB / 49648 GB avail
>> >> pgs: 13.095% pgs not active
>> >>  609338/3006663 objects degraded (20.266%)
>> >>  261953/3006663 objects misplaced (8.712%)
>> >>  9377/1002221 objects unfound (0.936%)
>> >>  1101 active+clean
>> >>  84   

[ceph-users] Ideal Bluestore setup

2018-01-22 Thread Ean Price
Hi folks,

I’m not sure of the ideal setup for bluestore given the set of hardware I have
to work with, so I figured I would ask the collective wisdom of the ceph
community. It is a small deployment, so the hardware is not all that
impressive, but I’d still like to get some feedback on what would be the
preferred and most maintainable setup.

We have 5 ceph OSD hosts with the following setup:

16 GB RAM
1 PCI-E NVRAM 128GB
1 SSD 250 GB
2 HDD 1 TB each

I was thinking to put:

OS on NVRAM with 2x20 GB partitions for bluestore’s WAL and rocksdb
And either use bcache with the SSD to cache the 2x HDDs or possibly use Ceph’s 
built in cache tiering. 
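
Roughly, creating one such OSD could look like this (the device and
partition names are only illustrative, not a recommendation):

  ceph-volume lvm create --bluestore \
      --data /dev/sdb \
      --block.db /dev/nvme0n1p3 \
      --block.wal /dev/nvme0n1p4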

My questions are:

1) is a 20GB logical volume adequate for the WAL and db with a 1TB HDD or 
should it be larger?

2) or - should I put the rocksdb on the SSD and just leave the WAL on the NVRAM 
device?

3) Lastly, what are the downsides of bcache vs Ceph’s cache tiering? I see both 
are used in production so I’m not sure which is the better choice for us. 

Performance is, of course, important but maintainability and stability are 
definitely more important.

Thanks in advance for your advice!

Best,
Ean







Re: [ceph-users] Adding disks -> getting unfound objects [Luminous]

2018-01-22 Thread David Turner
I do remember seeing that exactly. As the number of recovery_wait pgs
decreased, the number of unfound objects decreased until they were all
found.  Unfortunately it blocked some IO from happening during the
recovery, but in the long run we ended up with full data integrity again.

On Mon, Jan 22, 2018 at 1:03 PM Nico Schottelius <
nico.schottel...@ungleich.ch> wrote:

>
> Hey David,
>
> thanks for the fast answer. All our pools are running with size=3,
> min_size=2 and the two disks were in 2 different hosts.
>
> What I am a bit worried about is the output of "ceph pg 4.fa query" (see
> below) that indicates that ceph already queried all other hosts and did
> not find the data anywhere.
>
> Do you remember having seen something similar?
>
> Best,
>
> Nico
>
> David Turner  writes:
>
> > I have had the same problem before with unfound objects that happened
> while
> > backfilling after losing a drive. We didn't lose drives outside of the
> > failure domains and ultimately didn't lose any data, but we did have to
> > wait until after all of the PGs in recovery_wait state were caught up.
> So
> > if the 2 disks you lost were in the same host and your CRUSH rules are
> set
> > so that you can lose a host without losing data, then the cluster will
> > likely find all of the objects by the time it's done backfilling.  With
> > only losing 2 disks, I wouldn't worry about the missing objects not
> > becoming found unless you're pool size=2.
> >
> > On Mon, Jan 22, 2018 at 11:47 AM Nico Schottelius <
> > nico.schottel...@ungleich.ch> wrote:
> >
> >>
> >> Hello,
> >>
> >> we added about 7 new disks yesterday/today and our cluster became very
> >> slow. While the rebalancing took place, 2 of the 7 new added disks
> >> died.
> >>
> >> Our cluster is still recovering, however we spotted that there are a lot
> >> of unfound objects.
> >>
> >> We lost osd.63 and osd.64, which seem not to be involved into the sample
> >> pg that has unfound objects.
> >>
> >> We were wondering why there are unfound objects, where they are coming
> >> from and if there is a way to recover them?
> >>
> >> Any help appreciated,
> >>
> >> Best,
> >>
> >> Nico
> >>
> >>
> >> Our status is:
> >>
> >>   cluster:
> >> id: 26c0c5a8-d7ce-49ac-b5a7-bfd9d0ba81ab
> >> health: HEALTH_WARN
> >> 261953/3006663 objects misplaced (8.712%)
> >> 9377/1002221 objects unfound (0.936%)
> >> Reduced data availability: 176 pgs inactive
> >> Degraded data redundancy: 609338/3006663 objects degraded
> >> (20.266%), 243 pgs unclea
> >> n, 222 pgs degraded, 213 pgs undersized
> >> mon server2 is low on available space
> >>
> >>   services:
> >> mon: 3 daemons, quorum server5,server3,server2
> >> mgr: server5(active), standbys: 2, server2, 0, server3
> >> osd: 54 osds: 54 up, 54 in; 234 remapped pgs
> >>
> >>   data:
> >> pools:   3 pools, 1344 pgs
> >> objects: 978k objects, 3823 GB
> >> usage:   9350 GB used, 40298 GB / 49648 GB avail
> >> pgs: 13.095% pgs not active
> >>  609338/3006663 objects degraded (20.266%)
> >>  261953/3006663 objects misplaced (8.712%)
> >>  9377/1002221 objects unfound (0.936%)
> >>  1101 active+clean
> >>  84   recovery_wait+undersized+degraded+remapped+peered
> >>  82   undersized+degraded+remapped+backfill_wait+peered
> >>  23   active+undersized+degraded+remapped+backfill_wait
> >>  18   active+remapped+backfill_wait
> >>  14   active+undersized+degraded+remapped+backfilling
> >>  10   undersized+degraded+remapped+backfilling+peered
> >>  9active+recovery_wait+degraded
> >>  3active+remapped+backfilling
> >>
> >>   io:
> >> client:   624 kB/s rd, 3255 kB/s wr, 22 op/s rd, 66 op/s wr
> >> recovery: 90148 kB/s, 22 objects/s
> >>
> >> Looking at the unfound objects:
> >>
> >> [17:32:17] server1:~# ceph health detail
> >> HEALTH_WARN 263745/3006663 objects misplaced (8.772%); 9377/1002221
> >> objects unfound (0.936%); Reduced data availability: 176 pgs inactive;
> >> Degraded data redundancy: 612398/3006663 objects degraded (20.368%), 244
> >> pgs unclean, 223 pgs degraded, 214 pgs undersized; mon server2 is low on
> >> available space
> >> OBJECT_MISPLACED 263745/3006663 objects misplaced (8.772%)
> >> OBJECT_UNFOUND 9377/1002221 objects unfound (0.936%)
> >> pg 4.fa has 117 unfound objects
> >> pg 4.ff has 107 unfound objects
> >> pg 4.fd has 113 unfound objects
> >> pg 4.f0 has 120 unfound objects
> >> 
> >>
> >>
> >> Output from ceph pg 4.fa query:
> >>
> >> {
> >> "state": "recovery_wait+undersized+degraded+remapped+peered",
> >> "snap_trimq": "[]",
> >> "epoch": 17561,
> >> "up": [
> >> 8,
> >> 17,
> >> 25
> >> ],
> >> "acting": [
> >> 61
> >> ],
> >> 

Re: [ceph-users] Adding disks -> getting unfound objects [Luminous]

2018-01-22 Thread Nico Schottelius

Hey David,

thanks for the fast answer. All our pools are running with size=3,
min_size=2 and the two disks were in 2 different hosts.

What I am a bit worried about is the output of "ceph pg 4.fa query" (see
below) that indicates that ceph already queried all other hosts and did
not find the data anywhere.

Do you remember having seen something similar?

Best,

Nico

David Turner  writes:

> I have had the same problem before with unfound objects that happened while
> backfilling after losing a drive. We didn't lose drives outside of the
> failure domains and ultimately didn't lose any data, but we did have to
> wait until after all of the PGs in recovery_wait state were caught up.  So
> if the 2 disks you lost were in the same host and your CRUSH rules are set
> so that you can lose a host without losing data, then the cluster will
> likely find all of the objects by the time it's done backfilling.  With
> only losing 2 disks, I wouldn't worry about the missing objects not
> becoming found unless you're pool size=2.
>
> On Mon, Jan 22, 2018 at 11:47 AM Nico Schottelius <
> nico.schottel...@ungleich.ch> wrote:
>
>>
>> Hello,
>>
>> we added about 7 new disks yesterday/today and our cluster became very
>> slow. While the rebalancing took place, 2 of the 7 new added disks
>> died.
>>
>> Our cluster is still recovering, however we spotted that there are a lot
>> of unfound objects.
>>
>> We lost osd.63 and osd.64, which seem not to be involved into the sample
>> pg that has unfound objects.
>>
>> We were wondering why there are unfound objects, where they are coming
>> from and if there is a way to recover them?
>>
>> Any help appreciated,
>>
>> Best,
>>
>> Nico
>>
>>
>> Our status is:
>>
>>   cluster:
>> id: 26c0c5a8-d7ce-49ac-b5a7-bfd9d0ba81ab
>> health: HEALTH_WARN
>> 261953/3006663 objects misplaced (8.712%)
>> 9377/1002221 objects unfound (0.936%)
>> Reduced data availability: 176 pgs inactive
>> Degraded data redundancy: 609338/3006663 objects degraded
>> (20.266%), 243 pgs unclea
>> n, 222 pgs degraded, 213 pgs undersized
>> mon server2 is low on available space
>>
>>   services:
>> mon: 3 daemons, quorum server5,server3,server2
>> mgr: server5(active), standbys: 2, server2, 0, server3
>> osd: 54 osds: 54 up, 54 in; 234 remapped pgs
>>
>>   data:
>> pools:   3 pools, 1344 pgs
>> objects: 978k objects, 3823 GB
>> usage:   9350 GB used, 40298 GB / 49648 GB avail
>> pgs: 13.095% pgs not active
>>  609338/3006663 objects degraded (20.266%)
>>  261953/3006663 objects misplaced (8.712%)
>>  9377/1002221 objects unfound (0.936%)
>>  1101 active+clean
>>  84   recovery_wait+undersized+degraded+remapped+peered
>>  82   undersized+degraded+remapped+backfill_wait+peered
>>  23   active+undersized+degraded+remapped+backfill_wait
>>  18   active+remapped+backfill_wait
>>  14   active+undersized+degraded+remapped+backfilling
>>  10   undersized+degraded+remapped+backfilling+peered
>>  9active+recovery_wait+degraded
>>  3active+remapped+backfilling
>>
>>   io:
>> client:   624 kB/s rd, 3255 kB/s wr, 22 op/s rd, 66 op/s wr
>> recovery: 90148 kB/s, 22 objects/s
>>
>> Looking at the unfound objects:
>>
>> [17:32:17] server1:~# ceph health detail
>> HEALTH_WARN 263745/3006663 objects misplaced (8.772%); 9377/1002221
>> objects unfound (0.936%); Reduced data availability: 176 pgs inactive;
>> Degraded data redundancy: 612398/3006663 objects degraded (20.368%), 244
>> pgs unclean, 223 pgs degraded, 214 pgs undersized; mon server2 is low on
>> available space
>> OBJECT_MISPLACED 263745/3006663 objects misplaced (8.772%)
>> OBJECT_UNFOUND 9377/1002221 objects unfound (0.936%)
>> pg 4.fa has 117 unfound objects
>> pg 4.ff has 107 unfound objects
>> pg 4.fd has 113 unfound objects
>> pg 4.f0 has 120 unfound objects
>> 
>>
>>
>> Output from ceph pg 4.fa query:
>>
>> {
>> "state": "recovery_wait+undersized+degraded+remapped+peered",
>> "snap_trimq": "[]",
>> "epoch": 17561,
>> "up": [
>> 8,
>> 17,
>> 25
>> ],
>> "acting": [
>> 61
>> ],
>> "backfill_targets": [
>> "8",
>> "17",
>> "25"
>> ],
>> "actingbackfill": [
>> "8",
>> "17",
>> "25",
>> "61"
>> ],
>> "info": {
>> "pgid": "4.fa",
>> "last_update": "17529'85051",
>> "last_complete": "17217'77468",
>> "log_tail": "17091'75034",
>> "last_user_version": 85051,
>> "last_backfill": "MAX",
>> "last_backfill_bitwise": 0,
>> "purged_snaps": [
>> {
>> "start": "1",
>> "length": "3"
>> },
>> {
>> 

Re: [ceph-users] Adding disks -> getting unfound objects [Luminous]

2018-01-22 Thread David Turner
I have had the same problem before with unfound objects that happened while
backfilling after losing a drive. We didn't lose drives outside of the
failure domains and ultimately didn't lose any data, but we did have to
wait until after all of the PGs in recovery_wait state were caught up.  So
if the 2 disks you lost were in the same host and your CRUSH rules are set
so that you can lose a host without losing data, then the cluster will
likely find all of the objects by the time it's done backfilling.  With
only losing 2 disks, I wouldn't worry about the missing objects not
becoming found unless you're pool size=2.
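
(For completeness, the relevant pool settings and the failure domain of
the CRUSH rule can be checked like this -- the pool and rule names below
are just examples, substitute your own:)

  ceph osd pool get rbd size
  ceph osd pool get rbd min_size
  # look for the "chooseleaf ... type host" step in the output
  ceph osd crush rule dump replicated_rule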

On Mon, Jan 22, 2018 at 11:47 AM Nico Schottelius <
nico.schottel...@ungleich.ch> wrote:

>
> Hello,
>
> we added about 7 new disks yesterday/today and our cluster became very
> slow. While the rebalancing took place, 2 of the 7 new added disks
> died.
>
> Our cluster is still recovering, however we spotted that there are a lot
> of unfound objects.
>
> We lost osd.63 and osd.64, which seem not to be involved into the sample
> pg that has unfound objects.
>
> We were wondering why there are unfound objects, where they are coming
> from and if there is a way to recover them?
>
> Any help appreciated,
>
> Best,
>
> Nico
>
>
> Our status is:
>
>   cluster:
> id: 26c0c5a8-d7ce-49ac-b5a7-bfd9d0ba81ab
> health: HEALTH_WARN
> 261953/3006663 objects misplaced (8.712%)
> 9377/1002221 objects unfound (0.936%)
> Reduced data availability: 176 pgs inactive
> Degraded data redundancy: 609338/3006663 objects degraded
> (20.266%), 243 pgs unclea
> n, 222 pgs degraded, 213 pgs undersized
> mon server2 is low on available space
>
>   services:
> mon: 3 daemons, quorum server5,server3,server2
> mgr: server5(active), standbys: 2, server2, 0, server3
> osd: 54 osds: 54 up, 54 in; 234 remapped pgs
>
>   data:
> pools:   3 pools, 1344 pgs
> objects: 978k objects, 3823 GB
> usage:   9350 GB used, 40298 GB / 49648 GB avail
> pgs: 13.095% pgs not active
>  609338/3006663 objects degraded (20.266%)
>  261953/3006663 objects misplaced (8.712%)
>  9377/1002221 objects unfound (0.936%)
>  1101 active+clean
>  84   recovery_wait+undersized+degraded+remapped+peered
>  82   undersized+degraded+remapped+backfill_wait+peered
>  23   active+undersized+degraded+remapped+backfill_wait
>  18   active+remapped+backfill_wait
>  14   active+undersized+degraded+remapped+backfilling
>  10   undersized+degraded+remapped+backfilling+peered
>  9active+recovery_wait+degraded
>  3active+remapped+backfilling
>
>   io:
> client:   624 kB/s rd, 3255 kB/s wr, 22 op/s rd, 66 op/s wr
> recovery: 90148 kB/s, 22 objects/s
>
> Looking at the unfound objects:
>
> [17:32:17] server1:~# ceph health detail
> HEALTH_WARN 263745/3006663 objects misplaced (8.772%); 9377/1002221
> objects unfound (0.936%); Reduced data availability: 176 pgs inactive;
> Degraded data redundancy: 612398/3006663 objects degraded (20.368%), 244
> pgs unclean, 223 pgs degraded, 214 pgs undersized; mon server2 is low on
> available space
> OBJECT_MISPLACED 263745/3006663 objects misplaced (8.772%)
> OBJECT_UNFOUND 9377/1002221 objects unfound (0.936%)
> pg 4.fa has 117 unfound objects
> pg 4.ff has 107 unfound objects
> pg 4.fd has 113 unfound objects
> pg 4.f0 has 120 unfound objects
> 
>
>
> Output from ceph pg 4.fa query:
>
> {
> "state": "recovery_wait+undersized+degraded+remapped+peered",
> "snap_trimq": "[]",
> "epoch": 17561,
> "up": [
> 8,
> 17,
> 25
> ],
> "acting": [
> 61
> ],
> "backfill_targets": [
> "8",
> "17",
> "25"
> ],
> "actingbackfill": [
> "8",
> "17",
> "25",
> "61"
> ],
> "info": {
> "pgid": "4.fa",
> "last_update": "17529'85051",
> "last_complete": "17217'77468",
> "log_tail": "17091'75034",
> "last_user_version": 85051,
> "last_backfill": "MAX",
> "last_backfill_bitwise": 0,
> "purged_snaps": [
> {
> "start": "1",
> "length": "3"
> },
> {
> "start": "6",
> "length": "8"
> },
> {
> "start": "10",
> "length": "2"
> }
> ],
> "history": {
> "epoch_created": 9134,
> "epoch_pool_created": 9134,
> "last_epoch_started": 17528,
> "last_interval_started": 17527,
> "last_epoch_clean": 17079,
> "last_interval_clean": 17078,
> "last_epoch_split": 0,
> "last_epoch_marked_full": 0,
> "same_up_since": 

Re: [ceph-users] Luminous - bad performance

2018-01-22 Thread Steven Vacaroaia
Hi David,

I noticed the public interface of the server I am running the test from is
heavily used, so I will bond that one too.

I doubt though that this explains the poor performance

Thanks for your advice

Steven



On 22 January 2018 at 12:02, David Turner  wrote:

> I'm not speaking to anything other than your configuration.
>
> "I am using 2 x 10 GB bonded ( BONDING_OPTS="mode=4 miimon=100
> xmit_hash_policy=1 lacp_rate=1")  for cluster and 1 x 1GB for public"
> It might not be a bad idea for you to forgo the public network on the 1Gb
> interfaces and either put everything on one network or use VLANs on the
> 10Gb connections.  I lean more towards that in particular because your
> public network doesn't have a bond on it.  Just as a note, communication
> between the OSDs and the MONs are all done on the public network.  If that
> interface goes down, then the OSDs are likely to be marked down/out from
> your cluster.  I'm a fan of VLANs, but if you don't have the equipment or
> expertise to go that route, then just using the same subnet for public and
> private is a decent way to go.
>
> On Mon, Jan 22, 2018 at 11:37 AM Steven Vacaroaia 
> wrote:
>
>> I did test with rados bench ..here are the results
>>
>> rados bench -p ssdpool 300 -t 12 write --no-cleanup && rados bench -p
>> ssdpool 300 -t 12  seq
>>
>> Total time run: 300.322608
>> Total writes made:  10632
>> Write size: 4194304
>> Object size:4194304
>> Bandwidth (MB/sec): 141.608
>> Stddev Bandwidth:   74.1065
>> Max bandwidth (MB/sec): 264
>> Min bandwidth (MB/sec): 0
>> Average IOPS:   35
>> Stddev IOPS:18
>> Max IOPS:   66
>> Min IOPS:   0
>> Average Latency(s): 0.33887
>> Stddev Latency(s):  0.701947
>> Max latency(s): 9.80161
>> Min latency(s): 0.015171
>>
>> Total time run:   300.829945
>> Total reads made: 10070
>> Read size:4194304
>> Object size:  4194304
>> Bandwidth (MB/sec):   133.896
>> Average IOPS: 33
>> Stddev IOPS:  14
>> Max IOPS: 68
>> Min IOPS: 3
>> Average Latency(s):   0.35791
>> Max latency(s):   4.68213
>> Min latency(s):   0.0107572
>>
>>
>> rados bench -p scbench256 300 -t 12 write --no-cleanup && rados bench -p
>> scbench256 300 -t 12  seq
>>
>> Total time run: 300.747004
>> Total writes made:  10239
>> Write size: 4194304
>> Object size:4194304
>> Bandwidth (MB/sec): 136.181
>> Stddev Bandwidth:   75.5
>> Max bandwidth (MB/sec): 272
>> Min bandwidth (MB/sec): 0
>> Average IOPS:   34
>> Stddev IOPS:18
>> Max IOPS:   68
>> Min IOPS:   0
>> Average Latency(s): 0.352339
>> Stddev Latency(s):  0.72211
>> Max latency(s): 9.62304
>> Min latency(s): 0.00936316
>> hints = 1
>>
>>
>> Total time run:   300.610761
>> Total reads made: 7628
>> Read size:4194304
>> Object size:  4194304
>> Bandwidth (MB/sec):   101.5
>> Average IOPS: 25
>> Stddev IOPS:  11
>> Max IOPS: 61
>> Min IOPS: 0
>> Average Latency(s):   0.472321
>> Max latency(s):   15.636
>> Min latency(s):   0.0188098
>>
>>
>> On 22 January 2018 at 11:34, Steven Vacaroaia  wrote:
>>
>>> sorry ..send the message too soon
>>> Here is more info
>>> Vendor Id  : SEAGATE
>>> Product Id : ST600MM0006
>>> State  : Online
>>> Disk Type  : SAS,Hard Disk Device
>>> Capacity   : 558.375 GB
>>> Power State: Active
>>>
>>> ( SSD is in slot 0)
>>>
>>>  megacli -LDGetProp  -Cache -LALL -a0
>>>
>>> Adapter 0-VD 0(target id: 0): Cache Policy:WriteThrough, ReadAheadNone,
>>> Direct, No Write Cache if bad BBU
>>> Adapter 0-VD 1(target id: 1): Cache Policy:WriteBack, ReadAdaptive,
>>> Direct, No Write Cache if bad BBU
>>> Adapter 0-VD 2(target id: 2): Cache Policy:WriteBack, ReadAdaptive,
>>> Direct, No Write Cache if bad BBU
>>> Adapter 0-VD 3(target id: 3): Cache Policy:WriteBack, ReadAdaptive,
>>> Direct, No Write Cache if bad BBU
>>> Adapter 0-VD 4(target id: 4): Cache Policy:WriteBack, ReadAdaptive,
>>> Direct, No Write Cache if bad BBU
>>> Adapter 0-VD 5(target id: 5): Cache Policy:WriteBack, ReadAdaptive,
>>> Direct, No Write Cache if bad BBU
>>>
>>> [root@osd01 ~]#  megacli -LDGetProp  -DskCache -LALL -a0
>>>
>>> Adapter 0-VD 0(target id: 0): Disk Write Cache : Disabled
>>> Adapter 0-VD 1(target id: 1): Disk Write Cache : Disk's Default
>>> Adapter 0-VD 2(target id: 2): Disk Write Cache : Disk's Default
>>> Adapter 0-VD 3(target id: 3): Disk Write Cache : Disk's Default
>>> Adapter 0-VD 4(target id: 4): Disk Write Cache : Disk's Default
>>> Adapter 0-VD 5(target id: 5): Disk Write Cache : Disk's Default
>>>
>>>
>>> CPU
>>> 

Re: [ceph-users] Luminous - bad performance

2018-01-22 Thread David Turner
I'm not speaking to anything other than your configuration.

"I am using 2 x 10 GB bonded ( BONDING_OPTS="mode=4 miimon=100
xmit_hash_policy=1 lacp_rate=1")  for cluster and 1 x 1GB for public"
It might not be a bad idea for you to forgo the public network on the 1Gb
interfaces and either put everything on one network or use VLANs on the
10Gb connections.  I lean more towards that in particular because your
public network doesn't have a bond on it.  Just as a note, communication
between the OSDs and the MONs are all done on the public network.  If that
interface goes down, then the OSDs are likely to be marked down/out from
your cluster.  I'm a fan of VLANs, but if you don't have the equipment or
expertise to go that route, then just using the same subnet for public and
private is a decent way to go.
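
(If you do collapse everything onto the bonded 10Gb links, the ceph.conf
side of it is small -- the subnet below is only an example:)

  # point both at the bonded subnet, or drop cluster_network entirely so
  # everything defaults to the public network
  public_network  = 192.168.0.0/24
  cluster_network = 192.168.0.0/24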

On Mon, Jan 22, 2018 at 11:37 AM Steven Vacaroaia  wrote:

> I did test with rados bench ..here are the results
>
> rados bench -p ssdpool 300 -t 12 write --no-cleanup && rados bench -p
> ssdpool 300 -t 12  seq
>
> Total time run: 300.322608
> Total writes made:  10632
> Write size: 4194304
> Object size:4194304
> Bandwidth (MB/sec): 141.608
> Stddev Bandwidth:   74.1065
> Max bandwidth (MB/sec): 264
> Min bandwidth (MB/sec): 0
> Average IOPS:   35
> Stddev IOPS:18
> Max IOPS:   66
> Min IOPS:   0
> Average Latency(s): 0.33887
> Stddev Latency(s):  0.701947
> Max latency(s): 9.80161
> Min latency(s): 0.015171
>
> Total time run:   300.829945
> Total reads made: 10070
> Read size:4194304
> Object size:  4194304
> Bandwidth (MB/sec):   133.896
> Average IOPS: 33
> Stddev IOPS:  14
> Max IOPS: 68
> Min IOPS: 3
> Average Latency(s):   0.35791
> Max latency(s):   4.68213
> Min latency(s):   0.0107572
>
>
> rados bench -p scbench256 300 -t 12 write --no-cleanup && rados bench -p
> scbench256 300 -t 12  seq
>
> Total time run: 300.747004
> Total writes made:  10239
> Write size: 4194304
> Object size:4194304
> Bandwidth (MB/sec): 136.181
> Stddev Bandwidth:   75.5
> Max bandwidth (MB/sec): 272
> Min bandwidth (MB/sec): 0
> Average IOPS:   34
> Stddev IOPS:18
> Max IOPS:   68
> Min IOPS:   0
> Average Latency(s): 0.352339
> Stddev Latency(s):  0.72211
> Max latency(s): 9.62304
> Min latency(s): 0.00936316
> hints = 1
>
>
> Total time run:   300.610761
> Total reads made: 7628
> Read size:4194304
> Object size:  4194304
> Bandwidth (MB/sec):   101.5
> Average IOPS: 25
> Stddev IOPS:  11
> Max IOPS: 61
> Min IOPS: 0
> Average Latency(s):   0.472321
> Max latency(s):   15.636
> Min latency(s):   0.0188098
>
>
> On 22 January 2018 at 11:34, Steven Vacaroaia  wrote:
>
>> sorry ..send the message too soon
>> Here is more info
>> Vendor Id  : SEAGATE
>> Product Id : ST600MM0006
>> State  : Online
>> Disk Type  : SAS,Hard Disk Device
>> Capacity   : 558.375 GB
>> Power State: Active
>>
>> ( SSD is in slot 0)
>>
>>  megacli -LDGetProp  -Cache -LALL -a0
>>
>> Adapter 0-VD 0(target id: 0): Cache Policy:WriteThrough, ReadAheadNone,
>> Direct, No Write Cache if bad BBU
>> Adapter 0-VD 1(target id: 1): Cache Policy:WriteBack, ReadAdaptive,
>> Direct, No Write Cache if bad BBU
>> Adapter 0-VD 2(target id: 2): Cache Policy:WriteBack, ReadAdaptive,
>> Direct, No Write Cache if bad BBU
>> Adapter 0-VD 3(target id: 3): Cache Policy:WriteBack, ReadAdaptive,
>> Direct, No Write Cache if bad BBU
>> Adapter 0-VD 4(target id: 4): Cache Policy:WriteBack, ReadAdaptive,
>> Direct, No Write Cache if bad BBU
>> Adapter 0-VD 5(target id: 5): Cache Policy:WriteBack, ReadAdaptive,
>> Direct, No Write Cache if bad BBU
>>
>> [root@osd01 ~]#  megacli -LDGetProp  -DskCache -LALL -a0
>>
>> Adapter 0-VD 0(target id: 0): Disk Write Cache : Disabled
>> Adapter 0-VD 1(target id: 1): Disk Write Cache : Disk's Default
>> Adapter 0-VD 2(target id: 2): Disk Write Cache : Disk's Default
>> Adapter 0-VD 3(target id: 3): Disk Write Cache : Disk's Default
>> Adapter 0-VD 4(target id: 4): Disk Write Cache : Disk's Default
>> Adapter 0-VD 5(target id: 5): Disk Write Cache : Disk's Default
>>
>>
>> CPU
>> Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
>>
>> Centos 7 kernel 3.10.0-693.11.6.el7.x86_64
>>
>> sysctl -p
>> net.ipv4.tcp_sack = 0
>> net.core.netdev_budget = 600
>> net.ipv4.tcp_window_scaling = 1
>> net.core.rmem_max = 16777216
>> net.core.wmem_max = 16777216
>> net.core.rmem_default = 16777216
>> net.core.wmem_default = 16777216
>> net.core.optmem_max = 40960
>> net.ipv4.tcp_rmem = 4096 87380 16777216
>> net.ipv4.tcp_wmem = 4096 

[ceph-users] Adding disks -> getting unfound objects [Luminous]

2018-01-22 Thread Nico Schottelius

Hello,

we added about 7 new disks yesterday/today and our cluster became very
slow. While the rebalancing took place, 2 of the 7 newly added disks
died.

Our cluster is still recovering; however, we spotted that there are a lot
of unfound objects.

We lost osd.63 and osd.64, which do not seem to be involved in the sample
pg that has unfound objects.

We were wondering why there are unfound objects, where they are coming
from and if there is a way to recover them?

Any help appreciated,

Best,

Nico


Our status is:

  cluster:
id: 26c0c5a8-d7ce-49ac-b5a7-bfd9d0ba81ab
health: HEALTH_WARN
261953/3006663 objects misplaced (8.712%)
9377/1002221 objects unfound (0.936%)
Reduced data availability: 176 pgs inactive
Degraded data redundancy: 609338/3006663 objects degraded 
(20.266%), 243 pgs unclea
n, 222 pgs degraded, 213 pgs undersized
mon server2 is low on available space

  services:
mon: 3 daemons, quorum server5,server3,server2
mgr: server5(active), standbys: 2, server2, 0, server3
osd: 54 osds: 54 up, 54 in; 234 remapped pgs

  data:
pools:   3 pools, 1344 pgs
objects: 978k objects, 3823 GB
usage:   9350 GB used, 40298 GB / 49648 GB avail
pgs: 13.095% pgs not active
 609338/3006663 objects degraded (20.266%)
 261953/3006663 objects misplaced (8.712%)
 9377/1002221 objects unfound (0.936%)
 1101 active+clean
 84   recovery_wait+undersized+degraded+remapped+peered
 82   undersized+degraded+remapped+backfill_wait+peered
 23   active+undersized+degraded+remapped+backfill_wait
 18   active+remapped+backfill_wait
 14   active+undersized+degraded+remapped+backfilling
 10   undersized+degraded+remapped+backfilling+peered
 9active+recovery_wait+degraded
 3active+remapped+backfilling

  io:
client:   624 kB/s rd, 3255 kB/s wr, 22 op/s rd, 66 op/s wr
recovery: 90148 kB/s, 22 objects/s

Looking at the unfound objects:

[17:32:17] server1:~# ceph health detail
HEALTH_WARN 263745/3006663 objects misplaced (8.772%); 9377/1002221 objects 
unfound (0.936%); Reduced data availability: 176 pgs inactive; Degraded data 
redundancy: 612398/3006663 objects degraded (20.368%), 244 pgs unclean, 223 pgs 
degraded, 214 pgs undersized; mon server2 is low on available space
OBJECT_MISPLACED 263745/3006663 objects misplaced (8.772%)
OBJECT_UNFOUND 9377/1002221 objects unfound (0.936%)
pg 4.fa has 117 unfound objects
pg 4.ff has 107 unfound objects
pg 4.fd has 113 unfound objects
pg 4.f0 has 120 unfound objects



Output from ceph pg 4.fa query:

{
"state": "recovery_wait+undersized+degraded+remapped+peered",
"snap_trimq": "[]",
"epoch": 17561,
"up": [
8,
17,
25
],
"acting": [
61
],
"backfill_targets": [
"8",
"17",
"25"
],
"actingbackfill": [
"8",
"17",
"25",
"61"
],
"info": {
"pgid": "4.fa",
"last_update": "17529'85051",
"last_complete": "17217'77468",
"log_tail": "17091'75034",
"last_user_version": 85051,
"last_backfill": "MAX",
"last_backfill_bitwise": 0,
"purged_snaps": [
{
"start": "1",
"length": "3"
},
{
"start": "6",
"length": "8"
},
{
"start": "10",
"length": "2"
}
],
"history": {
"epoch_created": 9134,
"epoch_pool_created": 9134,
"last_epoch_started": 17528,
"last_interval_started": 17527,
"last_epoch_clean": 17079,
"last_interval_clean": 17078,
"last_epoch_split": 0,
"last_epoch_marked_full": 0,
"same_up_since": 17143,
"same_interval_since": 17530,
"same_primary_since": 17515,
"last_scrub": "17090'57357",
"last_scrub_stamp": "2018-01-20 20:45:32.616142",
"last_deep_scrub": "17082'54734",
"last_deep_scrub_stamp": "2018-01-15 21:09:34.121488",
"last_clean_scrub_stamp": "2018-01-20 20:45:32.616142"
},
"stats": {
"version": "17529'85051",
"reported_seq": "218453",
"reported_epoch": "17561",
"state": "recovery_wait+undersized+degraded+remapped+peered",
"last_fresh": "2018-01-22 17:42:28.196701",
"last_change": "2018-01-22 15:00:46.507189",
"last_active": "2018-01-22 15:00:44.635399",
"last_peered": "2018-01-22 17:42:28.196701",
"last_clean": "2018-01-21 20:15:48.267209",
"last_became_active": "2018-01-22 14:53:07.918893",

Re: [ceph-users] Luminous - bad performance

2018-01-22 Thread Steven Vacaroaia
I did test with rados bench ..here are the results

rados bench -p ssdpool 300 -t 12 write --no-cleanup && rados bench -p
ssdpool 300 -t 12  seq

Total time run: 300.322608
Total writes made:  10632
Write size: 4194304
Object size:4194304
Bandwidth (MB/sec): 141.608
Stddev Bandwidth:   74.1065
Max bandwidth (MB/sec): 264
Min bandwidth (MB/sec): 0
Average IOPS:   35
Stddev IOPS:18
Max IOPS:   66
Min IOPS:   0
Average Latency(s): 0.33887
Stddev Latency(s):  0.701947
Max latency(s): 9.80161
Min latency(s): 0.015171

Total time run:   300.829945
Total reads made: 10070
Read size:4194304
Object size:  4194304
Bandwidth (MB/sec):   133.896
Average IOPS: 33
Stddev IOPS:  14
Max IOPS: 68
Min IOPS: 3
Average Latency(s):   0.35791
Max latency(s):   4.68213
Min latency(s):   0.0107572


rados bench -p scbench256 300 -t 12 write --no-cleanup && rados bench -p
scbench256 300 -t 12  seq

Total time run: 300.747004
Total writes made:  10239
Write size: 4194304
Object size:4194304
Bandwidth (MB/sec): 136.181
Stddev Bandwidth:   75.5
Max bandwidth (MB/sec): 272
Min bandwidth (MB/sec): 0
Average IOPS:   34
Stddev IOPS:18
Max IOPS:   68
Min IOPS:   0
Average Latency(s): 0.352339
Stddev Latency(s):  0.72211
Max latency(s): 9.62304
Min latency(s): 0.00936316
hints = 1


Total time run:   300.610761
Total reads made: 7628
Read size:4194304
Object size:  4194304
Bandwidth (MB/sec):   101.5
Average IOPS: 25
Stddev IOPS:  11
Max IOPS: 61
Min IOPS: 0
Average Latency(s):   0.472321
Max latency(s):   15.636
Min latency(s):   0.0188098


On 22 January 2018 at 11:34, Steven Vacaroaia  wrote:

> sorry ..send the message too soon
> Here is more info
> Vendor Id  : SEAGATE
> Product Id : ST600MM0006
> State  : Online
> Disk Type  : SAS,Hard Disk Device
> Capacity   : 558.375 GB
> Power State: Active
>
> ( SSD is in slot 0)
>
>  megacli -LDGetProp  -Cache -LALL -a0
>
> Adapter 0-VD 0(target id: 0): Cache Policy:WriteThrough, ReadAheadNone,
> Direct, No Write Cache if bad BBU
> Adapter 0-VD 1(target id: 1): Cache Policy:WriteBack, ReadAdaptive,
> Direct, No Write Cache if bad BBU
> Adapter 0-VD 2(target id: 2): Cache Policy:WriteBack, ReadAdaptive,
> Direct, No Write Cache if bad BBU
> Adapter 0-VD 3(target id: 3): Cache Policy:WriteBack, ReadAdaptive,
> Direct, No Write Cache if bad BBU
> Adapter 0-VD 4(target id: 4): Cache Policy:WriteBack, ReadAdaptive,
> Direct, No Write Cache if bad BBU
> Adapter 0-VD 5(target id: 5): Cache Policy:WriteBack, ReadAdaptive,
> Direct, No Write Cache if bad BBU
>
> [root@osd01 ~]#  megacli -LDGetProp  -DskCache -LALL -a0
>
> Adapter 0-VD 0(target id: 0): Disk Write Cache : Disabled
> Adapter 0-VD 1(target id: 1): Disk Write Cache : Disk's Default
> Adapter 0-VD 2(target id: 2): Disk Write Cache : Disk's Default
> Adapter 0-VD 3(target id: 3): Disk Write Cache : Disk's Default
> Adapter 0-VD 4(target id: 4): Disk Write Cache : Disk's Default
> Adapter 0-VD 5(target id: 5): Disk Write Cache : Disk's Default
>
>
> CPU
> Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
>
> Centos 7 kernel 3.10.0-693.11.6.el7.x86_64
>
> sysctl -p
> net.ipv4.tcp_sack = 0
> net.core.netdev_budget = 600
> net.ipv4.tcp_window_scaling = 1
> net.core.rmem_max = 16777216
> net.core.wmem_max = 16777216
> net.core.rmem_default = 16777216
> net.core.wmem_default = 16777216
> net.core.optmem_max = 40960
> net.ipv4.tcp_rmem = 4096 87380 16777216
> net.ipv4.tcp_wmem = 4096 65536 16777216
> net.ipv4.tcp_syncookies = 0
> net.core.somaxconn = 1024
> net.core.netdev_max_backlog = 2
> net.ipv4.tcp_max_syn_backlog = 3
> net.ipv4.tcp_max_tw_buckets = 200
> net.ipv4.tcp_tw_reuse = 1
> net.ipv4.tcp_slow_start_after_idle = 0
> net.ipv4.conf.all.send_redirects = 0
> net.ipv4.conf.all.accept_redirects = 0
> net.ipv4.conf.all.accept_source_route = 0
> vm.min_free_kbytes = 262144
> vm.swappiness = 0
> vm.vfs_cache_pressure = 100
> fs.suid_dumpable = 0
> kernel.core_uses_pid = 1
> kernel.msgmax = 65536
> kernel.msgmnb = 65536
> kernel.randomize_va_space = 1
> kernel.sysrq = 0
> kernel.pid_max = 4194304
> fs.file-max = 10
>
>
> ceph.conf
>
>
> public_network = 10.10.30.0/24
> cluster_network = 192.168.0.0/24
>
>
> osd_op_num_threads_per_shard = 2
> osd_op_num_shards = 25
> osd_pool_default_size = 2
> osd_pool_default_min_size = 1 # Allow writing 1 copy in a degraded state
> osd_pool_default_pg_num = 256
> osd_pool_default_pgp_num = 256
> osd_crush_chooseleaf_type = 1
> osd_scrub_load_threshold = 0.01
> osd_scrub_min_interval = 

Re: [ceph-users] Luminous - bad performance

2018-01-22 Thread Steven Vacaroaia
sorry ..send the message too soon
Here is more info
Vendor Id  : SEAGATE
Product Id : ST600MM0006
State  : Online
Disk Type  : SAS,Hard Disk Device
Capacity   : 558.375 GB
Power State: Active

( SSD is in slot 0)

 megacli -LDGetProp  -Cache -LALL -a0

Adapter 0-VD 0(target id: 0): Cache Policy:WriteThrough, ReadAheadNone,
Direct, No Write Cache if bad BBU
Adapter 0-VD 1(target id: 1): Cache Policy:WriteBack, ReadAdaptive, Direct,
No Write Cache if bad BBU
Adapter 0-VD 2(target id: 2): Cache Policy:WriteBack, ReadAdaptive, Direct,
No Write Cache if bad BBU
Adapter 0-VD 3(target id: 3): Cache Policy:WriteBack, ReadAdaptive, Direct,
No Write Cache if bad BBU
Adapter 0-VD 4(target id: 4): Cache Policy:WriteBack, ReadAdaptive, Direct,
No Write Cache if bad BBU
Adapter 0-VD 5(target id: 5): Cache Policy:WriteBack, ReadAdaptive, Direct,
No Write Cache if bad BBU

[root@osd01 ~]#  megacli -LDGetProp  -DskCache -LALL -a0

Adapter 0-VD 0(target id: 0): Disk Write Cache : Disabled
Adapter 0-VD 1(target id: 1): Disk Write Cache : Disk's Default
Adapter 0-VD 2(target id: 2): Disk Write Cache : Disk's Default
Adapter 0-VD 3(target id: 3): Disk Write Cache : Disk's Default
Adapter 0-VD 4(target id: 4): Disk Write Cache : Disk's Default
Adapter 0-VD 5(target id: 5): Disk Write Cache : Disk's Default


CPU
Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz

Centos 7 kernel 3.10.0-693.11.6.el7.x86_64

sysctl -p
net.ipv4.tcp_sack = 0
net.core.netdev_budget = 600
net.ipv4.tcp_window_scaling = 1
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.optmem_max = 40960
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_syncookies = 0
net.core.somaxconn = 1024
net.core.netdev_max_backlog = 2
net.ipv4.tcp_max_syn_backlog = 3
net.ipv4.tcp_max_tw_buckets = 200
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.accept_source_route = 0
vm.min_free_kbytes = 262144
vm.swappiness = 0
vm.vfs_cache_pressure = 100
fs.suid_dumpable = 0
kernel.core_uses_pid = 1
kernel.msgmax = 65536
kernel.msgmnb = 65536
kernel.randomize_va_space = 1
kernel.sysrq = 0
kernel.pid_max = 4194304
fs.file-max = 10


ceph.conf


public_network = 10.10.30.0/24
cluster_network = 192.168.0.0/24


osd_op_num_threads_per_shard = 2
osd_op_num_shards = 25
osd_pool_default_size = 2
osd_pool_default_min_size = 1 # Allow writing 1 copy in a degraded state
osd_pool_default_pg_num = 256
osd_pool_default_pgp_num = 256
osd_crush_chooseleaf_type = 1
osd_scrub_load_threshold = 0.01
osd_scrub_min_interval = 137438953472
osd_scrub_max_interval = 137438953472
osd_deep_scrub_interval = 137438953472
osd_max_scrubs = 16
osd_op_threads = 8
osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_op_priority = 1




debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcatcher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0


[mon]
mon_allow_pool_delete = true

[osd]
osd_heartbeat_grace = 20
osd_heartbeat_interval = 5
bluestore_block_db_size = 16106127360
bluestore_block_wal_size = 1073741824

[osd.6]
host = osd01
osd_journal =
/dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.1d58775a-5019-42ea-8149-a126f51a2501
crush_location = root=ssds host=osd01-ssd

[osd.7]
host = osd02
osd_journal =
/dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.683dc52d-5d69-4ff0-b5d9-b17056a55681
crush_location = root=ssds host=osd02-ssd

[osd.8]
host = osd04
osd_journal =
/dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.bd7c0088-b724-441e-9b88-9457305c541d
crush_location = root=ssds host=osd04-ssd


On 22 January 2018 at 11:29, Steven Vacaroaia  wrote:

> Hi David,
>
> Yes, I meant no separate partitions for WAL and DB
>
> I am using 2 x 10 GB bonded ( BONDING_OPTS="mode=4 miimon=100
> xmit_hash_policy=1 lacp_rate=1")  for cluster and 1 x 1GB for public
> Disks are
> Vendor Id  : TOSHIBA
> Product Id : PX05SMB040Y
> State  : Online
> Disk Type  : SAS,Solid State Device
> Capacity   : 372.0 GB
>
>
> On 22 January 2018 at 11:24, David Turner  wrote:
>
>> Disk models, other hardware information including CPU, network 

Re: [ceph-users] Luminous - bad performance

2018-01-22 Thread Steven Vacaroaia
Hi David,

Yes, I meant no separate partitions for WAL and DB

I am using 2 x 10 GB bonded ( BONDING_OPTS="mode=4 miimon=100
xmit_hash_policy=1 lacp_rate=1")  for cluster and 1 x 1GB for public
Disks are
Vendor Id  : TOSHIBA
Product Id : PX05SMB040Y
State  : Online
Disk Type  : SAS,Solid State Device
Capacity   : 372.0 GB


On 22 January 2018 at 11:24, David Turner  wrote:

> Disk models, other hardware information including CPU, network config?
> You say you're using Luminous, but then say journal on same device.  I'm
> assuming you mean that you just have the bluestore OSD configured without a
> separate WAL or DB partition?  Any more specifics you can give will be
> helpful.
>
> On Mon, Jan 22, 2018 at 11:20 AM Steven Vacaroaia 
> wrote:
>
>> Hi,
>>
>> I'll appreciate if you can provide some guidance / suggestions regarding
>> perfomance issues on a test cluster ( 3 x DELL R620, 1 Entreprise SSD, 3 x
>> 600 GB ,Entreprise HDD, 8 cores, 64 GB RAM)
>>
>> I created 2 pools ( replication factor 2) one with only SSD and the other
>> with only HDD
>> ( journal on same disk for both)
>>
>> The perfomance is quite similar although I was expecting to be at least 5
>> times better
>> No issues noticed using atop
>>
>> What  should I check / tune ?
>>
>> Many thanks
>> Steven
>>
>>
>>
>> HDD based pool ( journal on the same disk)
>>
>> ceph osd pool get scbench256 all
>>
>> size: 2
>> min_size: 1
>> crash_replay_interval: 0
>> pg_num: 256
>> pgp_num: 256
>> crush_rule: replicated_rule
>> hashpspool: true
>> nodelete: false
>> nopgchange: false
>> nosizechange: false
>> write_fadvise_dontneed: false
>> noscrub: false
>> nodeep-scrub: false
>> use_gmt_hitset: 1
>> auid: 0
>> fast_read: 0
>>
>>
>> rbd bench --io-type write  image1 --pool=scbench256
>> bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern
>> sequential
>>   SEC   OPS   OPS/SEC   BYTES/SEC
>> 1 46816  46836.46  191842139.78
>> 2 90658  45339.11  185709011.80
>> 3133671  44540.80  182439126.08
>> 4177341  44340.36  181618100.14
>> 5217300  43464.04  178028704.54
>> 6259595  42555.85  174308767.05
>> elapsed: 6  ops:   262144  ops/sec: 42694.50  bytes/sec: 174876688.23
>>
>> fio /home/cephuser/write_256.fio
>> write-4M: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd,
>> iodepth=32
>> fio-2.2.8
>> Starting 1 process
>> rbd engine: RBD version: 1.12.0
>> Jobs: 1 (f=1): [r(1)] [100.0% done] [66284KB/0KB/0KB /s] [16.6K/0/0 iops]
>> [eta 00m:00s]
>>
>>
>> fio /home/cephuser/write_256.fio
>> write-4M: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
>> fio-2.2.8
>> Starting 1 process
>> rbd engine: RBD version: 1.12.0
>> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/14464KB/0KB /s] [0/3616/0 iops]
>> [eta 00m:00s]
>>
>>
>> SSD based pool
>>
>>
>> ceph osd pool get ssdpool all
>>
>> size: 2
>> min_size: 1
>> crash_replay_interval: 0
>> pg_num: 128
>> pgp_num: 128
>> crush_rule: ssdpool
>> hashpspool: true
>> nodelete: false
>> nopgchange: false
>> nosizechange: false
>> write_fadvise_dontneed: false
>> noscrub: false
>> nodeep-scrub: false
>> use_gmt_hitset: 1
>> auid: 0
>> fast_read: 0
>>
>>  rbd -p ssdpool create --size 52100 image2
>>
>> rbd bench --io-type write  image2 --pool=ssdpool
>> bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern
>> sequential
>>   SEC   OPS   OPS/SEC   BYTES/SEC
>> 1 42412  41867.57  171489557.93
>> 2 78343  39180.86  160484805.88
>> 3118082  39076.48  160057256.16
>> 4155164  38683.98  158449572.38
>> 5192825  38307.59  156907885.84
>> 6230701  37716.95  154488608.16
>> elapsed: 7  ops:   262144  ops/sec: 36862.89  bytes/sec: 150990387.29
>>
>>
>> [root@osd01 ~]# fio /home/cephuser/write_256.fio
>> write-4M: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
>> fio-2.2.8
>> Starting 1 process
>> rbd engine: RBD version: 1.12.0
>> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/20224KB/0KB /s] [0/5056/0 iops]
>> [eta 00m:00s]
>>
>>
>> fio /home/cephuser/write_256.fio
>> write-4M: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd,
>> iodepth=32
>> fio-2.2.8
>> Starting 1 process
>> rbd engine: RBD version: 1.12.0
>> Jobs: 1 (f=1): [r(1)] [100.0% done] [76096KB/0KB/0KB /s] [19.3K/0/0 iops]
>> [eta 00m:00s]


Re: [ceph-users] Luminous - bad performance

2018-01-22 Thread Sage Weil
On Mon, 22 Jan 2018, Steven Vacaroaia wrote:
> Hi,
> 
> I'll appreciate if you can provide some guidance / suggestions regarding
> perfomance issues on a test cluster ( 3 x DELL R620, 1 Entreprise SSD, 3 x
> 600 GB ,Entreprise HDD, 8 cores, 64 GB RAM)
> 
> I created 2 pools ( replication factor 2) one with only SSD and the other
> with only HDD
> ( journal on same disk for both)
> 
> The perfomance is quite similar although I was expecting to be at least 5
> times better
> No issues noticed using atop
> 
> What  should I check / tune ?

Can you test both pools with 'rados bench' instead of 'rbd bench'?  The 
bottleneck might be in RBD (e.g., in the object map updates).
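
For example (pool names taken from the message quoted below; the runtime
is arbitrary):

  rados bench -p ssdpool 60 write --no-cleanup
  rados bench -p ssdpool 60 seq
  rados bench -p scbench256 60 write --no-cleanup
  rados bench -p scbench256 60 seq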

sage


> 
> Many thanks
> Steven
> 
> 
> 
> HDD based pool ( journal on the same disk)
> 
> ceph osd pool get scbench256 all
> 
> size: 2
> min_size: 1
> crash_replay_interval: 0
> pg_num: 256
> pgp_num: 256
> crush_rule: replicated_rule
> hashpspool: true
> nodelete: false
> nopgchange: false
> nosizechange: false
> write_fadvise_dontneed: false
> noscrub: false
> nodeep-scrub: false
> use_gmt_hitset: 1
> auid: 0
> fast_read: 0
> 
> 
> rbd bench --io-type write  image1 --pool=scbench256
> bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern
> sequential
>   SEC   OPS   OPS/SEC   BYTES/SEC
> 1 46816  46836.46  191842139.78
> 2 90658  45339.11  185709011.80
> 3133671  44540.80  182439126.08
> 4177341  44340.36  181618100.14
> 5217300  43464.04  178028704.54
> 6259595  42555.85  174308767.05
> elapsed: 6  ops:   262144  ops/sec: 42694.50  bytes/sec: 174876688.23
> 
> fio /home/cephuser/write_256.fio
> write-4M: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
> fio-2.2.8
> Starting 1 process
> rbd engine: RBD version: 1.12.0
> Jobs: 1 (f=1): [r(1)] [100.0% done] [66284KB/0KB/0KB /s] [16.6K/0/0 iops]
> [eta 00m:00s]
> 
> 
> fio /home/cephuser/write_256.fio
> write-4M: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
> fio-2.2.8
> Starting 1 process
> rbd engine: RBD version: 1.12.0
> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/14464KB/0KB /s] [0/3616/0 iops]
> [eta 00m:00s]
> 
> 
> SSD based pool
> 
> 
> ceph osd pool get ssdpool all
> 
> size: 2
> min_size: 1
> crash_replay_interval: 0
> pg_num: 128
> pgp_num: 128
> crush_rule: ssdpool
> hashpspool: true
> nodelete: false
> nopgchange: false
> nosizechange: false
> write_fadvise_dontneed: false
> noscrub: false
> nodeep-scrub: false
> use_gmt_hitset: 1
> auid: 0
> fast_read: 0
> 
>  rbd -p ssdpool create --size 52100 image2
> 
> rbd bench --io-type write  image2 --pool=ssdpool
> bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern
> sequential
>   SEC   OPS   OPS/SEC   BYTES/SEC
> 1 42412  41867.57  171489557.93
> 2 78343  39180.86  160484805.88
> 3118082  39076.48  160057256.16
> 4155164  38683.98  158449572.38
> 5192825  38307.59  156907885.84
> 6230701  37716.95  154488608.16
> elapsed: 7  ops:   262144  ops/sec: 36862.89  bytes/sec: 150990387.29
> 
> 
> [root@osd01 ~]# fio /home/cephuser/write_256.fio
> write-4M: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
> fio-2.2.8
> Starting 1 process
> rbd engine: RBD version: 1.12.0
> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/20224KB/0KB /s] [0/5056/0 iops]
> [eta 00m:00s]
> 
> 
> fio /home/cephuser/write_256.fio
> write-4M: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
> fio-2.2.8
> Starting 1 process
> rbd engine: RBD version: 1.12.0
> Jobs: 1 (f=1): [r(1)] [100.0% done] [76096KB/0KB/0KB /s] [19.3K/0/0 iops]
> [eta 00m:00s]
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous - bad performance

2018-01-22 Thread David Turner
Disk models, other hardware information including CPU, network config?  You
say you're using Luminous, but then say journal on same device.  I'm
assuming you mean that you just have the bluestore OSD configured without a
separate WAL or DB partition?  Any more specifics you can give will be
helpful.
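
If you're not sure how the OSDs are laid out, something like this will show it 
(a rough sketch, assuming a Luminous ceph-volume/ceph-disk deployment):

  # from a mon: objectstore type and any separate DB/WAL devices per OSD
  ceph osd metadata 0 | grep -E 'osd_objectstore|bluefs'
  # on the OSD host, if deployed with ceph-volume:
  ceph-volume lvm list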

On Mon, Jan 22, 2018 at 11:20 AM Steven Vacaroaia  wrote:

> Hi,
>
> I'll appreciate if you can provide some guidance / suggestions regarding
> perfomance issues on a test cluster ( 3 x DELL R620, 1 Entreprise SSD, 3 x
> 600 GB ,Entreprise HDD, 8 cores, 64 GB RAM)
>
> I created 2 pools ( replication factor 2) one with only SSD and the other
> with only HDD
> ( journal on same disk for both)
>
> The perfomance is quite similar although I was expecting to be at least 5
> times better
> No issues noticed using atop
>
> What  should I check / tune ?
>
> Many thanks
> Steven
>
>
>
> HDD based pool ( journal on the same disk)
>
> ceph osd pool get scbench256 all
>
> size: 2
> min_size: 1
> crash_replay_interval: 0
> pg_num: 256
> pgp_num: 256
> crush_rule: replicated_rule
> hashpspool: true
> nodelete: false
> nopgchange: false
> nosizechange: false
> write_fadvise_dontneed: false
> noscrub: false
> nodeep-scrub: false
> use_gmt_hitset: 1
> auid: 0
> fast_read: 0
>
>
> rbd bench --io-type write  image1 --pool=scbench256
> bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern
> sequential
>   SEC   OPS   OPS/SEC   BYTES/SEC
> 1 46816  46836.46  191842139.78
> 2 90658  45339.11  185709011.80
> 3133671  44540.80  182439126.08
> 4177341  44340.36  181618100.14
> 5217300  43464.04  178028704.54
> 6259595  42555.85  174308767.05
> elapsed: 6  ops:   262144  ops/sec: 42694.50  bytes/sec: 174876688.23
>
> fio /home/cephuser/write_256.fio
> write-4M: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd,
> iodepth=32
> fio-2.2.8
> Starting 1 process
> rbd engine: RBD version: 1.12.0
> Jobs: 1 (f=1): [r(1)] [100.0% done] [66284KB/0KB/0KB /s] [16.6K/0/0 iops]
> [eta 00m:00s]
>
>
> fio /home/cephuser/write_256.fio
> write-4M: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
> fio-2.2.8
> Starting 1 process
> rbd engine: RBD version: 1.12.0
> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/14464KB/0KB /s] [0/3616/0 iops]
> [eta 00m:00s]
>
>
> SSD based pool
>
>
> ceph osd pool get ssdpool all
>
> size: 2
> min_size: 1
> crash_replay_interval: 0
> pg_num: 128
> pgp_num: 128
> crush_rule: ssdpool
> hashpspool: true
> nodelete: false
> nopgchange: false
> nosizechange: false
> write_fadvise_dontneed: false
> noscrub: false
> nodeep-scrub: false
> use_gmt_hitset: 1
> auid: 0
> fast_read: 0
>
>  rbd -p ssdpool create --size 52100 image2
>
> rbd bench --io-type write  image2 --pool=ssdpool
> bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern
> sequential
>   SEC   OPS   OPS/SEC   BYTES/SEC
> 1 42412  41867.57  171489557.93
> 2 78343  39180.86  160484805.88
> 3118082  39076.48  160057256.16
> 4155164  38683.98  158449572.38
> 5192825  38307.59  156907885.84
> 6230701  37716.95  154488608.16
> elapsed: 7  ops:   262144  ops/sec: 36862.89  bytes/sec: 150990387.29
>
>
> [root@osd01 ~]# fio /home/cephuser/write_256.fio
> write-4M: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
> fio-2.2.8
> Starting 1 process
> rbd engine: RBD version: 1.12.0
> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/20224KB/0KB /s] [0/5056/0 iops]
> [eta 00m:00s]
>
>
> fio /home/cephuser/write_256.fio
> write-4M: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd,
> iodepth=32
> fio-2.2.8
> Starting 1 process
> rbd engine: RBD version: 1.12.0
> Jobs: 1 (f=1): [r(1)] [100.0% done] [76096KB/0KB/0KB /s] [19.3K/0/0 iops]
> [eta 00m:00s]
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Luminous - bad performance

2018-01-22 Thread Steven Vacaroaia
Hi,

I'd appreciate it if you could provide some guidance / suggestions regarding
performance issues on a test cluster (3 x DELL R620, 1 Enterprise SSD, 3 x
600 GB Enterprise HDD, 8 cores, 64 GB RAM).

I created 2 pools (replication factor 2), one with only SSD and the other
with only HDD
(journal on same disk for both).

The performance is quite similar, although I was expecting it to be at least 5
times better.
No issues noticed using atop.

What should I check / tune?

Many thanks
Steven



HDD based pool ( journal on the same disk)

ceph osd pool get scbench256 all

size: 2
min_size: 1
crash_replay_interval: 0
pg_num: 256
pgp_num: 256
crush_rule: replicated_rule
hashpspool: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
auid: 0
fast_read: 0


rbd bench --io-type write  image1 --pool=scbench256
bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern
sequential
  SEC   OPS   OPS/SEC   BYTES/SEC
1 46816  46836.46  191842139.78
2 90658  45339.11  185709011.80
3133671  44540.80  182439126.08
4177341  44340.36  181618100.14
5217300  43464.04  178028704.54
6259595  42555.85  174308767.05
elapsed: 6  ops:   262144  ops/sec: 42694.50  bytes/sec: 174876688.23

fio /home/cephuser/write_256.fio
write-4M: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
fio-2.2.8
Starting 1 process
rbd engine: RBD version: 1.12.0
Jobs: 1 (f=1): [r(1)] [100.0% done] [66284KB/0KB/0KB /s] [16.6K/0/0 iops]
[eta 00m:00s]


fio /home/cephuser/write_256.fio
write-4M: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
fio-2.2.8
Starting 1 process
rbd engine: RBD version: 1.12.0
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/14464KB/0KB /s] [0/3616/0 iops]
[eta 00m:00s]


SSD based pool


ceph osd pool get ssdpool all

size: 2
min_size: 1
crash_replay_interval: 0
pg_num: 128
pgp_num: 128
crush_rule: ssdpool
hashpspool: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
auid: 0
fast_read: 0

 rbd -p ssdpool create --size 52100 image2

rbd bench --io-type write  image2 --pool=ssdpool
bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern
sequential
  SEC   OPS   OPS/SEC   BYTES/SEC
1 42412  41867.57  171489557.93
2 78343  39180.86  160484805.88
3118082  39076.48  160057256.16
4155164  38683.98  158449572.38
5192825  38307.59  156907885.84
6230701  37716.95  154488608.16
elapsed: 7  ops:   262144  ops/sec: 36862.89  bytes/sec: 150990387.29


[root@osd01 ~]# fio /home/cephuser/write_256.fio
write-4M: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
fio-2.2.8
Starting 1 process
rbd engine: RBD version: 1.12.0
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/20224KB/0KB /s] [0/5056/0 iops]
[eta 00m:00s]


fio /home/cephuser/write_256.fio
write-4M: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=rbd, iodepth=32
fio-2.2.8
Starting 1 process
rbd engine: RBD version: 1.12.0
Jobs: 1 (f=1): [r(1)] [100.0% done] [76096KB/0KB/0KB /s] [19.3K/0/0 iops]
[eta 00m:00s]
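
(The job file itself is tiny -- judging from the fio banner lines above it is 
roughly the sketch below; the client/pool/image names are placeholders rather 
than the exact contents, and rw= was evidently flipped between write and 
randread between runs.)

  [global]
  ioengine=rbd
  clientname=admin          ; placeholder
  pool=scbench256           ; placeholder
  rbdname=image1            ; placeholder
  bs=4k
  iodepth=32
  direct=1

  [write-4M]
  rw=write                  ; the second run above shows rw=randread
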
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] iSCSI over RBD

2018-01-22 Thread Steven Vacaroaia
Phew .. right ... thanks for your patience.
Again, my apologies for wasting your time

Steven

On 22 January 2018 at 09:53, Jason Dillaman  wrote:

> Point yum at it -- those "repodata" files are for yum/dnf not you. The
> packages are in the x86_64 / noarch directories as per the standard
> layout of repos.
>
> On Mon, Jan 22, 2018 at 9:52 AM, Steven Vacaroaia 
> wrote:
> > Clicking on the link provided , I get this
> >
> > ../
> > SRPMS/ 22-Jan-2018 13:51
> > -
> > aarch64/   22-Jan-2018 13:51
> > -
> > noarch/22-Jan-2018 13:51
> > -
> > x86_64/22-Jan-2018 13:51
> >
> >
> > Inside of the x86_64/repodata  from above , there is this
> >
> > ../
> > 2f241a8387cf35372fd709be4ef6ec83b8a00cc744bb90f..> 22-Jan-2018 13:51
> > 573
> > 401dc19bda88c82c403423fb835844d64345f7e95f5b983..> 22-Jan-2018 13:51
> > 123
> > 6bf9672d0862e8ef8b8ff05a2fd0208a922b1f5978e6589..> 22-Jan-2018 13:51
> > 123
> > 99a710d0cadf6e62be6c49c455ac355f1c65c1740da1bd8..> 22-Jan-2018 13:51
> > 593
> > dabe2ce5481d23de1f4f52bdcfee0f9af98316c9e0de2ce..> 22-Jan-2018 13:51
> > 134
> > fd013707e27f8251b52dc4b6aea4c07c81c0b06cff0c0c2..> 22-Jan-2018 13:51
> > 1156
> > repomd.xml 22-Jan-2018 13:51
> > 2962
> >
> >
> > How would i use them ?
> >
> >
> > Steven
> >
> >
> > On 22 January 2018 at 09:34, Jason Dillaman  wrote:
> >>
> >> Which URL isn't working for you? You should follow the links in a web
> >> browser, select the most recent build, and then click the "Repo URL"
> >> button to get the URL to provide yum.
> >>
> >> On Mon, Jan 22, 2018 at 9:30 AM, Steven Vacaroaia 
> >> wrote:
> >> >
> >> > Thanks again for your prompt response
> >> >
> >> > My apologies for wasting your time with trivial question
> >> > but
> >> > the repo provided does not contain rpms but a bunch of compressed
> files
> >> > (
> >> > like
> >> >
> >> > 2f241a8387cf35372fd709be4ef6ec83b8a00cc744bb90f31d82bb27bdd8
> 0531-other.sqlite.bz2
> >> > )
> >> > and a repomd.xml
> >> >
> >> > How/what exactly should I download/install   ?
> >> >
> >> > Steven
> >> >
> >> > On 22 January 2018 at 09:18, Jason Dillaman 
> wrote:
> >> >>
> >> >> You can use these repos [1][2][3]
> >> >>
> >> >> [1] https://shaman.ceph.com/repos/python-rtslib/master/
> >> >> [2] https://shaman.ceph.com/repos/ceph-iscsi-config/master/
> >> >> [3] https://shaman.ceph.com/repos/ceph-iscsi-cli/master/
> >> >>
> >> >> targetcli isn't used for iSCSI over RBD (gwcli from ceph-iscsi-cli
> >> >> replaces it), so you hopefully shouldn't need to update it.
> >> >>
> >> >> On Mon, Jan 22, 2018 at 9:06 AM, Steven Vacaroaia 
> >> >> wrote:
> >> >> > Excellent news
> >> >> > Many thanks for all your efforts
> >> >> >
> >> >> > If you do not mind, please confirm the following steps ( for centos
> >> >> > 7,
> >> >> > kernel version 3.10.0-693.11.6.el7.x86_64)
> >> >> >
> >> >> > - download and Install the RPMs from x86_64 repositories you
> provided
> >> >> > - do a git clone and, if new version available "pip install .
> >> >> > --upgrade"
> >> >> > for
> >> >> >  ceph-iscsi-cli
> >> >> >  ceph-iscsi-config
> >> >> >  rtslib-fb
> >> >> > targetcli-fb
> >> >> > - reboot
> >> >> >
> >> >> > Steven
> >> >> >
> >> >> >
> >> >> > On 22 January 2018 at 08:53, Jason Dillaman 
> >> >> > wrote:
> >> >> >>
> >> >> >> The v4.13-based kernel with the necessary bug fixes and TCMU
> changes
> >> >> >> is available here [1] and tcmu-runner v1.3.0 is available here
> [2].
> >> >> >>
> >> >> >> [1] https://shaman.ceph.com/repos/kernel/ceph-iscsi-test/
> >> >> >> [2] https://shaman.ceph.com/repos/tcmu-runner/master/
> >> >> >>
> >> >> >> On Sat, Jan 20, 2018 at 7:33 AM, Marc Roos
> >> >> >> 
> >> >> >> wrote:
> >> >> >> >
> >> >> >> >
> >> >> >> > Sorry for me asking maybe the obvious but is this the kernel
> >> >> >> > available
> >> >> >> > in elrepo? Or a different one?
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> > -Original Message-
> >> >> >> > From: Mike Christie [mailto:mchri...@redhat.com]
> >> >> >> > Sent: zaterdag 20 januari 2018 1:19
> >> >> >> > To: Steven Vacaroaia; Joshua Chen
> >> >> >> > Cc: ceph-users
> >> >> >> > Subject: Re: [ceph-users] iSCSI over RBD
> >> >> >> >
> >> >> >> > On 01/19/2018 02:12 PM, Steven Vacaroaia wrote:
> >> >> >> >> Hi Joshua,
> >> >> >> >>
> >> >> >> >> I was under the impression that kernel  3.10.0-693 will work
> with
> >> >> >> > iscsi
> >> >> >> >>
> >> >> >> >
> >> >> >> > That kernel works with RHCS 2.5 and below. You need the rpms
> from
> >> >> >> > that
> >> >> >> > or the matching upstream releases. Besides trying to dig out the
> >> >> >> > versions and 

Re: [ceph-users] iSCSI over RBD

2018-01-22 Thread Jason Dillaman
Point yum at it -- those "repodata" files are for yum/dnf not you. The
packages are in the x86_64 / noarch directories as per the standard
layout of repos.
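
i.e. drop a repo file in place and let yum resolve it -- a rough sketch, with 
the baseurl being whatever the "Repo URL" button gave you (the one below is a 
placeholder):

  # /etc/yum.repos.d/ceph-iscsi.repo
  [ceph-iscsi-cli]
  name=ceph-iscsi-cli (shaman)
  baseurl=https://example.com/REPLACE-WITH-REPO-URL/
  enabled=1
  gpgcheck=0

  # then:
  yum install ceph-iscsi-cli ceph-iscsi-config python-rtslib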

On Mon, Jan 22, 2018 at 9:52 AM, Steven Vacaroaia  wrote:
> Clicking on the link provided , I get this
>
> ../
> SRPMS/ 22-Jan-2018 13:51
> -
> aarch64/   22-Jan-2018 13:51
> -
> noarch/22-Jan-2018 13:51
> -
> x86_64/22-Jan-2018 13:51
>
>
> Inside of the x86_64/repodata  from above , there is this
>
> ../
> 2f241a8387cf35372fd709be4ef6ec83b8a00cc744bb90f..> 22-Jan-2018 13:51
> 573
> 401dc19bda88c82c403423fb835844d64345f7e95f5b983..> 22-Jan-2018 13:51
> 123
> 6bf9672d0862e8ef8b8ff05a2fd0208a922b1f5978e6589..> 22-Jan-2018 13:51
> 123
> 99a710d0cadf6e62be6c49c455ac355f1c65c1740da1bd8..> 22-Jan-2018 13:51
> 593
> dabe2ce5481d23de1f4f52bdcfee0f9af98316c9e0de2ce..> 22-Jan-2018 13:51
> 134
> fd013707e27f8251b52dc4b6aea4c07c81c0b06cff0c0c2..> 22-Jan-2018 13:51
> 1156
> repomd.xml 22-Jan-2018 13:51
> 2962
>
>
> How would i use them ?
>
>
> Steven
>
>
> On 22 January 2018 at 09:34, Jason Dillaman  wrote:
>>
>> Which URL isn't working for you? You should follow the links in a web
>> browser, select the most recent build, and then click the "Repo URL"
>> button to get the URL to provide yum.
>>
>> On Mon, Jan 22, 2018 at 9:30 AM, Steven Vacaroaia 
>> wrote:
>> >
>> > Thanks again for your prompt response
>> >
>> > My apologies for wasting your time with trivial question
>> > but
>> > the repo provided does not contain rpms but a bunch of compressed files
>> > (
>> > like
>> >
>> > 2f241a8387cf35372fd709be4ef6ec83b8a00cc744bb90f31d82bb27bdd80531-other.sqlite.bz2
>> > )
>> > and a repomd.xml
>> >
>> > How/what exactly should I download/install   ?
>> >
>> > Steven
>> >
>> > On 22 January 2018 at 09:18, Jason Dillaman  wrote:
>> >>
>> >> You can use these repos [1][2][3]
>> >>
>> >> [1] https://shaman.ceph.com/repos/python-rtslib/master/
>> >> [2] https://shaman.ceph.com/repos/ceph-iscsi-config/master/
>> >> [3] https://shaman.ceph.com/repos/ceph-iscsi-cli/master/
>> >>
>> >> targetcli isn't used for iSCSI over RBD (gwcli from ceph-iscsi-cli
>> >> replaces it), so you hopefully shouldn't need to update it.
>> >>
>> >> On Mon, Jan 22, 2018 at 9:06 AM, Steven Vacaroaia 
>> >> wrote:
>> >> > Excellent news
>> >> > Many thanks for all your efforts
>> >> >
>> >> > If you do not mind, please confirm the following steps ( for centos
>> >> > 7,
>> >> > kernel version 3.10.0-693.11.6.el7.x86_64)
>> >> >
>> >> > - download and Install the RPMs from x86_64 repositories you provided
>> >> > - do a git clone and, if new version available "pip install .
>> >> > --upgrade"
>> >> > for
>> >> >  ceph-iscsi-cli
>> >> >  ceph-iscsi-config
>> >> >  rtslib-fb
>> >> > targetcli-fb
>> >> > - reboot
>> >> >
>> >> > Steven
>> >> >
>> >> >
>> >> > On 22 January 2018 at 08:53, Jason Dillaman 
>> >> > wrote:
>> >> >>
>> >> >> The v4.13-based kernel with the necessary bug fixes and TCMU changes
>> >> >> is available here [1] and tcmu-runner v1.3.0 is available here [2].
>> >> >>
>> >> >> [1] https://shaman.ceph.com/repos/kernel/ceph-iscsi-test/
>> >> >> [2] https://shaman.ceph.com/repos/tcmu-runner/master/
>> >> >>
>> >> >> On Sat, Jan 20, 2018 at 7:33 AM, Marc Roos
>> >> >> 
>> >> >> wrote:
>> >> >> >
>> >> >> >
>> >> >> > Sorry for me asking maybe the obvious but is this the kernel
>> >> >> > available
>> >> >> > in elrepo? Or a different one?
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > -Original Message-
>> >> >> > From: Mike Christie [mailto:mchri...@redhat.com]
>> >> >> > Sent: zaterdag 20 januari 2018 1:19
>> >> >> > To: Steven Vacaroaia; Joshua Chen
>> >> >> > Cc: ceph-users
>> >> >> > Subject: Re: [ceph-users] iSCSI over RBD
>> >> >> >
>> >> >> > On 01/19/2018 02:12 PM, Steven Vacaroaia wrote:
>> >> >> >> Hi Joshua,
>> >> >> >>
>> >> >> >> I was under the impression that kernel  3.10.0-693 will work with
>> >> >> > iscsi
>> >> >> >>
>> >> >> >
>> >> >> > That kernel works with RHCS 2.5 and below. You need the rpms from
>> >> >> > that
>> >> >> > or the matching upstream releases. Besides trying to dig out the
>> >> >> > versions and matching things up, the problem with those releases
>> >> >> > is
>> >> >> > that
>> >> >> > they were tech previewed or only supports linux initiators.
>> >> >> >
>> >> >> > It looks like you are using the newer upstream tools or RHCS 3.0
>> >> >> > tools.
>> >> >> > For them you need the RHEL 7.5 beta or newer kernel or an upstream
>> >> >> > one.
>> >> >> > For upstream all the patches got merged into the target layer
>> >> >> > maintainer's tree yesterday. A 

Re: [ceph-users] iSCSI over RBD

2018-01-22 Thread Steven Vacaroaia
Clicking on the link provided , I get this

../
SRPMS/                                              22-Jan-2018 13:51      -
aarch64/                                            22-Jan-2018 13:51      -
noarch/                                             22-Jan-2018 13:51      -
x86_64/                                             22-Jan-2018 13:51      -


Inside of the x86_64/repodata from above, there is this

../
2f241a8387cf35372fd709be4ef6ec83b8a00cc744bb90f..>  22-Jan-2018 13:51    573
401dc19bda88c82c403423fb835844d64345f7e95f5b983..>  22-Jan-2018 13:51    123
6bf9672d0862e8ef8b8ff05a2fd0208a922b1f5978e6589..>  22-Jan-2018 13:51    123
99a710d0cadf6e62be6c49c455ac355f1c65c1740da1bd8..>  22-Jan-2018 13:51    593
dabe2ce5481d23de1f4f52bdcfee0f9af98316c9e0de2ce..>  22-Jan-2018 13:51    134
fd013707e27f8251b52dc4b6aea4c07c81c0b06cff0c0c2..>  22-Jan-2018 13:51   1156
repomd.xml                                          22-Jan-2018 13:51   2962


How would i use them ?


Steven


On 22 January 2018 at 09:34, Jason Dillaman  wrote:

> Which URL isn't working for you? You should follow the links in a web
> browser, select the most recent build, and then click the "Repo URL"
> button to get the URL to provide yum.
>
> On Mon, Jan 22, 2018 at 9:30 AM, Steven Vacaroaia 
> wrote:
> >
> > Thanks again for your prompt response
> >
> > My apologies for wasting your time with trivial question
> > but
> > the repo provided does not contain rpms but a bunch of compressed files (
> > like
> > 2f241a8387cf35372fd709be4ef6ec83b8a00cc744bb90f31d82bb27bdd8
> 0531-other.sqlite.bz2
> > )
> > and a repomd.xml
> >
> > How/what exactly should I download/install   ?
> >
> > Steven
> >
> > On 22 January 2018 at 09:18, Jason Dillaman  wrote:
> >>
> >> You can use these repos [1][2][3]
> >>
> >> [1] https://shaman.ceph.com/repos/python-rtslib/master/
> >> [2] https://shaman.ceph.com/repos/ceph-iscsi-config/master/
> >> [3] https://shaman.ceph.com/repos/ceph-iscsi-cli/master/
> >>
> >> targetcli isn't used for iSCSI over RBD (gwcli from ceph-iscsi-cli
> >> replaces it), so you hopefully shouldn't need to update it.
> >>
> >> On Mon, Jan 22, 2018 at 9:06 AM, Steven Vacaroaia 
> >> wrote:
> >> > Excellent news
> >> > Many thanks for all your efforts
> >> >
> >> > If you do not mind, please confirm the following steps ( for centos 7,
> >> > kernel version 3.10.0-693.11.6.el7.x86_64)
> >> >
> >> > - download and Install the RPMs from x86_64 repositories you provided
> >> > - do a git clone and, if new version available "pip install .
> --upgrade"
> >> > for
> >> >  ceph-iscsi-cli
> >> >  ceph-iscsi-config
> >> >  rtslib-fb
> >> > targetcli-fb
> >> > - reboot
> >> >
> >> > Steven
> >> >
> >> >
> >> > On 22 January 2018 at 08:53, Jason 

Re: [ceph-users] iSCSI over RBD

2018-01-22 Thread Jason Dillaman
Which URL isn't working for you? You should follow the links in a web
browser, select the most recent build, and then click the "Repo URL"
button to get the URL to provide yum.

On Mon, Jan 22, 2018 at 9:30 AM, Steven Vacaroaia  wrote:
>
> Thanks again for your prompt response
>
> My apologies for wasting your time with trivial question
> but
> the repo provided does not contain rpms but a bunch of compressed files (
> like
> 2f241a8387cf35372fd709be4ef6ec83b8a00cc744bb90f31d82bb27bdd80531-other.sqlite.bz2
> )
> and a repomd.xml
>
> How/what exactly should I download/install   ?
>
> Steven
>
> On 22 January 2018 at 09:18, Jason Dillaman  wrote:
>>
>> You can use these repos [1][2][3]
>>
>> [1] https://shaman.ceph.com/repos/python-rtslib/master/
>> [2] https://shaman.ceph.com/repos/ceph-iscsi-config/master/
>> [3] https://shaman.ceph.com/repos/ceph-iscsi-cli/master/
>>
>> targetcli isn't used for iSCSI over RBD (gwcli from ceph-iscsi-cli
>> replaces it), so you hopefully shouldn't need to update it.
>>
>> On Mon, Jan 22, 2018 at 9:06 AM, Steven Vacaroaia 
>> wrote:
>> > Excellent news
>> > Many thanks for all your efforts
>> >
>> > If you do not mind, please confirm the following steps ( for centos 7,
>> > kernel version 3.10.0-693.11.6.el7.x86_64)
>> >
>> > - download and Install the RPMs from x86_64 repositories you provided
>> > - do a git clone and, if new version available "pip install . --upgrade"
>> > for
>> >  ceph-iscsi-cli
>> >  ceph-iscsi-config
>> >  rtslib-fb
>> > targetcli-fb
>> > - reboot
>> >
>> > Steven
>> >
>> >
>> > On 22 January 2018 at 08:53, Jason Dillaman  wrote:
>> >>
>> >> The v4.13-based kernel with the necessary bug fixes and TCMU changes
>> >> is available here [1] and tcmu-runner v1.3.0 is available here [2].
>> >>
>> >> [1] https://shaman.ceph.com/repos/kernel/ceph-iscsi-test/
>> >> [2] https://shaman.ceph.com/repos/tcmu-runner/master/
>> >>
>> >> On Sat, Jan 20, 2018 at 7:33 AM, Marc Roos 
>> >> wrote:
>> >> >
>> >> >
>> >> > Sorry for me asking maybe the obvious but is this the kernel
>> >> > available
>> >> > in elrepo? Or a different one?
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > -Original Message-
>> >> > From: Mike Christie [mailto:mchri...@redhat.com]
>> >> > Sent: zaterdag 20 januari 2018 1:19
>> >> > To: Steven Vacaroaia; Joshua Chen
>> >> > Cc: ceph-users
>> >> > Subject: Re: [ceph-users] iSCSI over RBD
>> >> >
>> >> > On 01/19/2018 02:12 PM, Steven Vacaroaia wrote:
>> >> >> Hi Joshua,
>> >> >>
>> >> >> I was under the impression that kernel  3.10.0-693 will work with
>> >> > iscsi
>> >> >>
>> >> >
>> >> > That kernel works with RHCS 2.5 and below. You need the rpms from
>> >> > that
>> >> > or the matching upstream releases. Besides trying to dig out the
>> >> > versions and matching things up, the problem with those releases is
>> >> > that
>> >> > they were tech previewed or only supports linux initiators.
>> >> >
>> >> > It looks like you are using the newer upstream tools or RHCS 3.0
>> >> > tools.
>> >> > For them you need the RHEL 7.5 beta or newer kernel or an upstream
>> >> > one.
>> >> > For upstream all the patches got merged into the target layer
>> >> > maintainer's tree yesterday. A new tcmu-runner release has been made.
>> >> > And I just pushed a test kernel with all the patches based on 4.13
>> >> > (4.14
>> >> > had a bug in the login code which is being fixed still) to github, so
>> >> > people do not have to wait for the next-next kernel release to come
>> >> > out.
>> >> >
>> >> > Just give us a couple days for the kernel build to be done, to make
>> >> > the
>> >> > needed ceph-iscsi-* release (current version will fail to create rbd
>> >> > images with the current tcmu-runner release) and get the
>> >> > documentation
>> >> > updated because some links are incorrect and some version info needs
>> >> > to
>> >> > be updated.
>> >> >
>> >> >
>> >> >> Unfortunately  I still cannot create a disk because qfull_time_out
>> >> >> is
>> >> >> not supported
>> >> >>
>> >> >> What am I missing / do it wrong ?
>> >> >>
>> >> >> 2018-01-19 15:06:45,216 INFO [lun.py:601:add_dev_to_lio()] -
>> >> >> (LUN.add_dev_to_lio) Adding image 'rbd.disk2' to LIO
>> >> >> 2018-01-19 15:06:45,295ERROR [lun.py:634:add_dev_to_lio()] -
>> >> >> Could
>> >> >> not set LIO device attribute cmd_time_out/qfull_time_out for device:
>> >> >> rbd.disk2. Kernel not supported. - error(Cannot find attribute:
>> >> >> qfull_time_out)
>> >> >> 2018-01-19 15:06:45,300ERROR [rbd-target-api:731:_disk()] - LUN
>> >> >> alloc problem - Could not set LIO device attribute
>> >> >> cmd_time_out/qfull_time_out for device: rbd.disk2. Kernel not
>> >> > supported.
>> >> >> - error(Cannot find attribute: qfull_time_out)
>> >> >>
>> >> >>
>> >> >> Many thanks
>> >> >>
>> >> >> Steven
>> >> >>
>> >> >> On 4 January 2018 at 22:40, Joshua Chen 

Re: [ceph-users] iSCSI over RBD

2018-01-22 Thread Steven Vacaroaia
Thanks again for your prompt response

My apologies for wasting your time with a trivial question,
but
the repo provided does not contain rpms but a bunch of compressed files (
like
2f241a8387cf35372fd709be4ef6ec83b8a00cc744bb90f31d82bb27bdd80531-other.sqlite.bz2
)
and a repomd.xml

How/what exactly should I download/install   ?

Steven

On 22 January 2018 at 09:18, Jason Dillaman  wrote:

> You can use these repos [1][2][3]
>
> [1] https://shaman.ceph.com/repos/python-rtslib/master/
> [2] https://shaman.ceph.com/repos/ceph-iscsi-config/master/
> [3] https://shaman.ceph.com/repos/ceph-iscsi-cli/master/
>
> targetcli isn't used for iSCSI over RBD (gwcli from ceph-iscsi-cli
> replaces it), so you hopefully shouldn't need to update it.
>
> On Mon, Jan 22, 2018 at 9:06 AM, Steven Vacaroaia 
> wrote:
> > Excellent news
> > Many thanks for all your efforts
> >
> > If you do not mind, please confirm the following steps ( for centos 7,
> > kernel version 3.10.0-693.11.6.el7.x86_64)
> >
> > - download and Install the RPMs from x86_64 repositories you provided
> > - do a git clone and, if new version available "pip install . --upgrade"
> for
> >  ceph-iscsi-cli
> >  ceph-iscsi-config
> >  rtslib-fb
> > targetcli-fb
> > - reboot
> >
> > Steven
> >
> >
> > On 22 January 2018 at 08:53, Jason Dillaman  wrote:
> >>
> >> The v4.13-based kernel with the necessary bug fixes and TCMU changes
> >> is available here [1] and tcmu-runner v1.3.0 is available here [2].
> >>
> >> [1] https://shaman.ceph.com/repos/kernel/ceph-iscsi-test/
> >> [2] https://shaman.ceph.com/repos/tcmu-runner/master/
> >>
> >> On Sat, Jan 20, 2018 at 7:33 AM, Marc Roos 
> >> wrote:
> >> >
> >> >
> >> > Sorry for me asking maybe the obvious but is this the kernel available
> >> > in elrepo? Or a different one?
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > -Original Message-
> >> > From: Mike Christie [mailto:mchri...@redhat.com]
> >> > Sent: zaterdag 20 januari 2018 1:19
> >> > To: Steven Vacaroaia; Joshua Chen
> >> > Cc: ceph-users
> >> > Subject: Re: [ceph-users] iSCSI over RBD
> >> >
> >> > On 01/19/2018 02:12 PM, Steven Vacaroaia wrote:
> >> >> Hi Joshua,
> >> >>
> >> >> I was under the impression that kernel  3.10.0-693 will work with
> >> > iscsi
> >> >>
> >> >
> >> > That kernel works with RHCS 2.5 and below. You need the rpms from that
> >> > or the matching upstream releases. Besides trying to dig out the
> >> > versions and matching things up, the problem with those releases is
> that
> >> > they were tech previewed or only supports linux initiators.
> >> >
> >> > It looks like you are using the newer upstream tools or RHCS 3.0
> tools.
> >> > For them you need the RHEL 7.5 beta or newer kernel or an upstream
> one.
> >> > For upstream all the patches got merged into the target layer
> >> > maintainer's tree yesterday. A new tcmu-runner release has been made.
> >> > And I just pushed a test kernel with all the patches based on 4.13
> (4.14
> >> > had a bug in the login code which is being fixed still) to github, so
> >> > people do not have to wait for the next-next kernel release to come
> out.
> >> >
> >> > Just give us a couple days for the kernel build to be done, to make
> the
> >> > needed ceph-iscsi-* release (current version will fail to create rbd
> >> > images with the current tcmu-runner release) and get the documentation
> >> > updated because some links are incorrect and some version info needs
> to
> >> > be updated.
> >> >
> >> >
> >> >> Unfortunately  I still cannot create a disk because qfull_time_out is
> >> >> not supported
> >> >>
> >> >> What am I missing / do it wrong ?
> >> >>
> >> >> 2018-01-19 15:06:45,216 INFO [lun.py:601:add_dev_to_lio()] -
> >> >> (LUN.add_dev_to_lio) Adding image 'rbd.disk2' to LIO
> >> >> 2018-01-19 15:06:45,295ERROR [lun.py:634:add_dev_to_lio()] -
> Could
> >> >> not set LIO device attribute cmd_time_out/qfull_time_out for device:
> >> >> rbd.disk2. Kernel not supported. - error(Cannot find attribute:
> >> >> qfull_time_out)
> >> >> 2018-01-19 15:06:45,300ERROR [rbd-target-api:731:_disk()] - LUN
> >> >> alloc problem - Could not set LIO device attribute
> >> >> cmd_time_out/qfull_time_out for device: rbd.disk2. Kernel not
> >> > supported.
> >> >> - error(Cannot find attribute: qfull_time_out)
> >> >>
> >> >>
> >> >> Many thanks
> >> >>
> >> >> Steven
> >> >>
> >> >> On 4 January 2018 at 22:40, Joshua Chen  >> >> > wrote:
> >> >>
> >> >> Hello Steven,
> >> >>   I am using CentOS 7.4.1708 with kernel 3.10.0-693.el7.x86_64
> >> >>   and the following packages:
> >> >>
> >> >> ceph-iscsi-cli-2.5-9.el7.centos.noarch.rpm
> >> >> ceph-iscsi-config-2.3-12.el7.centos.noarch.rpm
> >> >> libtcmu-1.3.0-0.4.el7.centos.x86_64.rpm
> >> >> libtcmu-devel-1.3.0-0.4.el7.centos.x86_64.rpm
> >> >> 

Re: [ceph-users] iSCSI over RBD

2018-01-22 Thread Jason Dillaman
You can use these repos [1][2][3]

[1] https://shaman.ceph.com/repos/python-rtslib/master/
[2] https://shaman.ceph.com/repos/ceph-iscsi-config/master/
[3] https://shaman.ceph.com/repos/ceph-iscsi-cli/master/

targetcli isn't used for iSCSI over RBD (gwcli from ceph-iscsi-cli
replaces it), so you hopefully shouldn't need to update it.
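
Once everything is installed, the disks and targets are defined through gwcli 
instead -- very roughly like this (the names are illustrative only; check the 
ceph-iscsi-cli README for the exact syntax):

  gwcli
  > /iscsi-target create iqn.2003-01.com.example.iscsi-gw:ceph-igw
  > /iscsi-target/iqn.2003-01.com.example.iscsi-gw:ceph-igw/gateways create gw01 192.168.0.11
  > /disks create pool=rbd image=disk_1 size=50G
  > /iscsi-target/iqn.2003-01.com.example.iscsi-gw:ceph-igw/hosts create iqn.1994-05.com.redhat:client1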

On Mon, Jan 22, 2018 at 9:06 AM, Steven Vacaroaia  wrote:
> Excellent news
> Many thanks for all your efforts
>
> If you do not mind, please confirm the following steps ( for centos 7,
> kernel version 3.10.0-693.11.6.el7.x86_64)
>
> - download and Install the RPMs from x86_64 repositories you provided
> - do a git clone and, if new version available "pip install . --upgrade" for
>  ceph-iscsi-cli
>  ceph-iscsi-config
>  rtslib-fb
> targetcli-fb
> - reboot
>
> Steven
>
>
> On 22 January 2018 at 08:53, Jason Dillaman  wrote:
>>
>> The v4.13-based kernel with the necessary bug fixes and TCMU changes
>> is available here [1] and tcmu-runner v1.3.0 is available here [2].
>>
>> [1] https://shaman.ceph.com/repos/kernel/ceph-iscsi-test/
>> [2] https://shaman.ceph.com/repos/tcmu-runner/master/
>>
>> On Sat, Jan 20, 2018 at 7:33 AM, Marc Roos 
>> wrote:
>> >
>> >
>> > Sorry for me asking maybe the obvious but is this the kernel available
>> > in elrepo? Or a different one?
>> >
>> >
>> >
>> >
>> >
>> > -Original Message-
>> > From: Mike Christie [mailto:mchri...@redhat.com]
>> > Sent: zaterdag 20 januari 2018 1:19
>> > To: Steven Vacaroaia; Joshua Chen
>> > Cc: ceph-users
>> > Subject: Re: [ceph-users] iSCSI over RBD
>> >
>> > On 01/19/2018 02:12 PM, Steven Vacaroaia wrote:
>> >> Hi Joshua,
>> >>
>> >> I was under the impression that kernel  3.10.0-693 will work with
>> > iscsi
>> >>
>> >
>> > That kernel works with RHCS 2.5 and below. You need the rpms from that
>> > or the matching upstream releases. Besides trying to dig out the
>> > versions and matching things up, the problem with those releases is that
>> > they were tech previewed or only supports linux initiators.
>> >
>> > It looks like you are using the newer upstream tools or RHCS 3.0 tools.
>> > For them you need the RHEL 7.5 beta or newer kernel or an upstream one.
>> > For upstream all the patches got merged into the target layer
>> > maintainer's tree yesterday. A new tcmu-runner release has been made.
>> > And I just pushed a test kernel with all the patches based on 4.13 (4.14
>> > had a bug in the login code which is being fixed still) to github, so
>> > people do not have to wait for the next-next kernel release to come out.
>> >
>> > Just give us a couple days for the kernel build to be done, to make the
>> > needed ceph-iscsi-* release (current version will fail to create rbd
>> > images with the current tcmu-runner release) and get the documentation
>> > updated because some links are incorrect and some version info needs to
>> > be updated.
>> >
>> >
>> >> Unfortunately  I still cannot create a disk because qfull_time_out is
>> >> not supported
>> >>
>> >> What am I missing / do it wrong ?
>> >>
>> >> 2018-01-19 15:06:45,216 INFO [lun.py:601:add_dev_to_lio()] -
>> >> (LUN.add_dev_to_lio) Adding image 'rbd.disk2' to LIO
>> >> 2018-01-19 15:06:45,295ERROR [lun.py:634:add_dev_to_lio()] - Could
>> >> not set LIO device attribute cmd_time_out/qfull_time_out for device:
>> >> rbd.disk2. Kernel not supported. - error(Cannot find attribute:
>> >> qfull_time_out)
>> >> 2018-01-19 15:06:45,300ERROR [rbd-target-api:731:_disk()] - LUN
>> >> alloc problem - Could not set LIO device attribute
>> >> cmd_time_out/qfull_time_out for device: rbd.disk2. Kernel not
>> > supported.
>> >> - error(Cannot find attribute: qfull_time_out)
>> >>
>> >>
>> >> Many thanks
>> >>
>> >> Steven
>> >>
>> >> On 4 January 2018 at 22:40, Joshua Chen > >> > wrote:
>> >>
>> >> Hello Steven,
>> >>   I am using CentOS 7.4.1708 with kernel 3.10.0-693.el7.x86_64
>> >>   and the following packages:
>> >>
>> >> ceph-iscsi-cli-2.5-9.el7.centos.noarch.rpm
>> >> ceph-iscsi-config-2.3-12.el7.centos.noarch.rpm
>> >> libtcmu-1.3.0-0.4.el7.centos.x86_64.rpm
>> >> libtcmu-devel-1.3.0-0.4.el7.centos.x86_64.rpm
>> >> python-rtslib-2.1.fb64-2.el7.centos.noarch.rpm
>> >> python-rtslib-doc-2.1.fb64-2.el7.centos.noarch.rpm
>> >> targetcli-2.1.fb47-0.1.20170815.git5bf3517.el7.centos.noarch.rpm
>> >> tcmu-runner-1.3.0-0.4.el7.centos.x86_64.rpm
>> >> tcmu-runner-debuginfo-1.3.0-0.4.el7.centos.x86_64.rpm
>> >>
>> >>
>> >> Cheers
>> >> Joshua
>> >>
>> >>
>> >> On Fri, Jan 5, 2018 at 2:14 AM, Steven Vacaroaia > >> > wrote:
>> >>
>> >> Hi Joshua,
>> >>
>> >> How did you manage to use iSCSI gateway ?
>> >> I would like to do that but still waiting for a patched kernel
>> >
>> >>
>> >>   

Re: [ceph-users] iSCSI over RBD

2018-01-22 Thread Steven Vacaroaia
Excellent news
Many thanks for all your efforts

If you do not mind, please confirm the following steps ( for centos 7,
kernel version 3.10.0-693.11.6.el7.x86_64)

- download and Install the RPMs from x86_64 repositories you provided
- do a git clone and, if new version available "pip install . --upgrade" for
 ceph-iscsi-cli
 ceph-iscsi-config
 rtslib-fb
targetcli-fb
- reboot

Steven


On 22 January 2018 at 08:53, Jason Dillaman  wrote:

> The v4.13-based kernel with the necessary bug fixes and TCMU changes
> is available here [1] and tcmu-runner v1.3.0 is available here [2].
>
> [1] https://shaman.ceph.com/repos/kernel/ceph-iscsi-test/
> [2] https://shaman.ceph.com/repos/tcmu-runner/master/
>
> On Sat, Jan 20, 2018 at 7:33 AM, Marc Roos 
> wrote:
> >
> >
> > Sorry for me asking maybe the obvious but is this the kernel available
> > in elrepo? Or a different one?
> >
> >
> >
> >
> >
> > -Original Message-
> > From: Mike Christie [mailto:mchri...@redhat.com]
> > Sent: zaterdag 20 januari 2018 1:19
> > To: Steven Vacaroaia; Joshua Chen
> > Cc: ceph-users
> > Subject: Re: [ceph-users] iSCSI over RBD
> >
> > On 01/19/2018 02:12 PM, Steven Vacaroaia wrote:
> >> Hi Joshua,
> >>
> >> I was under the impression that kernel  3.10.0-693 will work with
> > iscsi
> >>
> >
> > That kernel works with RHCS 2.5 and below. You need the rpms from that
> > or the matching upstream releases. Besides trying to dig out the
> > versions and matching things up, the problem with those releases is that
> > they were tech previewed or only supports linux initiators.
> >
> > It looks like you are using the newer upstream tools or RHCS 3.0 tools.
> > For them you need the RHEL 7.5 beta or newer kernel or an upstream one.
> > For upstream all the patches got merged into the target layer
> > maintainer's tree yesterday. A new tcmu-runner release has been made.
> > And I just pushed a test kernel with all the patches based on 4.13 (4.14
> > had a bug in the login code which is being fixed still) to github, so
> > people do not have to wait for the next-next kernel release to come out.
> >
> > Just give us a couple days for the kernel build to be done, to make the
> > needed ceph-iscsi-* release (current version will fail to create rbd
> > images with the current tcmu-runner release) and get the documentation
> > updated because some links are incorrect and some version info needs to
> > be updated.
> >
> >
> >> Unfortunately  I still cannot create a disk because qfull_time_out is
> >> not supported
> >>
> >> What am I missing / do it wrong ?
> >>
> >> 2018-01-19 15:06:45,216 INFO [lun.py:601:add_dev_to_lio()] -
> >> (LUN.add_dev_to_lio) Adding image 'rbd.disk2' to LIO
> >> 2018-01-19 15:06:45,295ERROR [lun.py:634:add_dev_to_lio()] - Could
> >> not set LIO device attribute cmd_time_out/qfull_time_out for device:
> >> rbd.disk2. Kernel not supported. - error(Cannot find attribute:
> >> qfull_time_out)
> >> 2018-01-19 15:06:45,300ERROR [rbd-target-api:731:_disk()] - LUN
> >> alloc problem - Could not set LIO device attribute
> >> cmd_time_out/qfull_time_out for device: rbd.disk2. Kernel not
> > supported.
> >> - error(Cannot find attribute: qfull_time_out)
> >>
> >>
> >> Many thanks
> >>
> >> Steven
> >>
> >> On 4 January 2018 at 22:40, Joshua Chen  >> > wrote:
> >>
> >> Hello Steven,
> >>   I am using CentOS 7.4.1708 with kernel 3.10.0-693.el7.x86_64
> >>   and the following packages:
> >>
> >> ceph-iscsi-cli-2.5-9.el7.centos.noarch.rpm
> >> ceph-iscsi-config-2.3-12.el7.centos.noarch.rpm
> >> libtcmu-1.3.0-0.4.el7.centos.x86_64.rpm
> >> libtcmu-devel-1.3.0-0.4.el7.centos.x86_64.rpm
> >> python-rtslib-2.1.fb64-2.el7.centos.noarch.rpm
> >> python-rtslib-doc-2.1.fb64-2.el7.centos.noarch.rpm
> >> targetcli-2.1.fb47-0.1.20170815.git5bf3517.el7.centos.noarch.rpm
> >> tcmu-runner-1.3.0-0.4.el7.centos.x86_64.rpm
> >> tcmu-runner-debuginfo-1.3.0-0.4.el7.centos.x86_64.rpm
> >>
> >>
> >> Cheers
> >> Joshua
> >>
> >>
> >> On Fri, Jan 5, 2018 at 2:14 AM, Steven Vacaroaia  >> > wrote:
> >>
> >> Hi Joshua,
> >>
> >> How did you manage to use iSCSI gateway ?
> >> I would like to do that but still waiting for a patched kernel
> >
> >>
> >> What kernel/OS did you use and/or how did you patch it ?
> >>
> >> Tahnsk
> >> Steven
> >>
> >> On 4 January 2018 at 04:50, Joshua Chen
> >>  > >
> >> wrote:
> >>
> >> Dear all,
> >>   Although I managed to run gwcli and created some iqns,
> > or
> >> luns,
> >> but I do need some working config example so that my
> >> initiator could connect and get the lun.
> >>
> >>  

Re: [ceph-users] iSCSI over RBD

2018-01-22 Thread Jason Dillaman
The v4.13-based kernel with the necessary bug fixes and TCMU changes
is available here [1] and tcmu-runner v1.3.0 is available here [2].

[1] https://shaman.ceph.com/repos/kernel/ceph-iscsi-test/
[2] https://shaman.ceph.com/repos/tcmu-runner/master/

On Sat, Jan 20, 2018 at 7:33 AM, Marc Roos  wrote:
>
>
> Sorry for me asking maybe the obvious but is this the kernel available
> in elrepo? Or a different one?
>
>
>
>
>
> -Original Message-
> From: Mike Christie [mailto:mchri...@redhat.com]
> Sent: zaterdag 20 januari 2018 1:19
> To: Steven Vacaroaia; Joshua Chen
> Cc: ceph-users
> Subject: Re: [ceph-users] iSCSI over RBD
>
> On 01/19/2018 02:12 PM, Steven Vacaroaia wrote:
>> Hi Joshua,
>>
>> I was under the impression that kernel  3.10.0-693 will work with
> iscsi
>>
>
> That kernel works with RHCS 2.5 and below. You need the rpms from that
> or the matching upstream releases. Besides trying to dig out the
> versions and matching things up, the problem with those releases is that
> they were tech previewed or only supports linux initiators.
>
> It looks like you are using the newer upstream tools or RHCS 3.0 tools.
> For them you need the RHEL 7.5 beta or newer kernel or an upstream one.
> For upstream all the patches got merged into the target layer
> maintainer's tree yesterday. A new tcmu-runner release has been made.
> And I just pushed a test kernel with all the patches based on 4.13 (4.14
> had a bug in the login code which is being fixed still) to github, so
> people do not have to wait for the next-next kernel release to come out.
>
> Just give us a couple days for the kernel build to be done, to make the
> needed ceph-iscsi-* release (current version will fail to create rbd
> images with the current tcmu-runner release) and get the documentation
> updated because some links are incorrect and some version info needs to
> be updated.
>
>
>> Unfortunately  I still cannot create a disk because qfull_time_out is
>> not supported
>>
>> What am I missing / do it wrong ?
>>
>> 2018-01-19 15:06:45,216 INFO [lun.py:601:add_dev_to_lio()] -
>> (LUN.add_dev_to_lio) Adding image 'rbd.disk2' to LIO
>> 2018-01-19 15:06:45,295ERROR [lun.py:634:add_dev_to_lio()] - Could
>> not set LIO device attribute cmd_time_out/qfull_time_out for device:
>> rbd.disk2. Kernel not supported. - error(Cannot find attribute:
>> qfull_time_out)
>> 2018-01-19 15:06:45,300ERROR [rbd-target-api:731:_disk()] - LUN
>> alloc problem - Could not set LIO device attribute
>> cmd_time_out/qfull_time_out for device: rbd.disk2. Kernel not
> supported.
>> - error(Cannot find attribute: qfull_time_out)
>>
>>
>> Many thanks
>>
>> Steven
>>
>> On 4 January 2018 at 22:40, Joshua Chen > > wrote:
>>
>> Hello Steven,
>>   I am using CentOS 7.4.1708 with kernel 3.10.0-693.el7.x86_64
>>   and the following packages:
>>
>> ceph-iscsi-cli-2.5-9.el7.centos.noarch.rpm
>> ceph-iscsi-config-2.3-12.el7.centos.noarch.rpm
>> libtcmu-1.3.0-0.4.el7.centos.x86_64.rpm
>> libtcmu-devel-1.3.0-0.4.el7.centos.x86_64.rpm
>> python-rtslib-2.1.fb64-2.el7.centos.noarch.rpm
>> python-rtslib-doc-2.1.fb64-2.el7.centos.noarch.rpm
>> targetcli-2.1.fb47-0.1.20170815.git5bf3517.el7.centos.noarch.rpm
>> tcmu-runner-1.3.0-0.4.el7.centos.x86_64.rpm
>> tcmu-runner-debuginfo-1.3.0-0.4.el7.centos.x86_64.rpm
>>
>>
>> Cheers
>> Joshua
>>
>>
>> On Fri, Jan 5, 2018 at 2:14 AM, Steven Vacaroaia > > wrote:
>>
>> Hi Joshua,
>>
>> How did you manage to use iSCSI gateway ?
>> I would like to do that but still waiting for a patched kernel
>
>>
>> What kernel/OS did you use and/or how did you patch it ?
>>
>> Tahnsk
>> Steven
>>
>> On 4 January 2018 at 04:50, Joshua Chen
>>  >
>> wrote:
>>
>> Dear all,
>>   Although I managed to run gwcli and created some iqns,
> or
>> luns,
>> but I do need some working config example so that my
>> initiator could connect and get the lun.
>>
>>   I am familiar with targetcli and I used to do the
>> following ACL style connection rather than password,
>> the targetcli setting tree is here:
>>
>> (or see this page
>> )
>>
>> #targetcli ls
>> o- /
>>
> 
> .
>> [...]
>>   o- backstores
>>
> 
> ..
>> [...]
>>   | o- block
>>
> 

Re: [ceph-users] udev rule or script to auto add bcache devices?

2018-01-22 Thread Alfredo Deza
On Mon, Jan 22, 2018 at 1:37 AM, Wido den Hollander  wrote:
>
>
> On 01/20/2018 07:56 PM, Stefan Priebe - Profihost AG wrote:
>>
>> Hello,
>>
>> bcache didn't supported partitions on the past so that a lot of our osds
>> have their data directly on:
>> /dev/bcache[0-9]
>>
>> But that means i can't give them the needed part type of
>> 4fbd7e29-9d25-41b8-afd0-062c0ceff05d and that means that the activation
>> with udev und ceph-disk does not work.
>>
>> Had anybody already fixed this or hacked something together?

Like Wido mentioned, if you are using Luminous, you can do this easily
with ceph-volume. There is no need to force partitions on anything or
set labels
to get recognized by udev.

Note that ceph-volume doesn't support encryption yet, although that
work is almost complete and should be available soon.
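
Something like this should be all that's needed (a sketch only -- substitute
your actual bcache device):

  ceph-volume lvm create --bluestore --data /dev/bcache0
  # or, split into two steps:
  ceph-volume lvm prepare --bluestore --data /dev/bcache0
  ceph-volume lvm activate --all
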
>
>
> Not really. But with ceph-volume around the corner, isn't that something
> that might work? It doesn't use udev anymore.
>
> You need to run Luminous though.
>
> Wido
>
>
>>
>> Greets,
>> Stefan
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD doesn't start - fresh installation

2018-01-22 Thread Hüseyin Atatür YILDIRIM

Hi again,

In the “journalctl -xe” output:

Jan 22 15:29:18 mon02 ceph-osd-prestart.sh[1526]: OSD data directory 
/var/lib/ceph/osd/ceph-1 does not exist; bailing out.

Also, in my previous post I forgot to say that the “ceph-deploy osd create” 
command doesn’t fail and appears to be successful, as you can see from the logs.
But the daemons on the nodes don’t start.

Regards,
Atatur






From: Hüseyin Atatür YILDIRIM
Sent: Monday, January 22, 2018 3:19 PM
To: ceph-users@lists.ceph.com
Subject: OSD doesn't start - fresh installation

Hi all,

Fresh installation, but on already used disks. I zapped all the disks and ran 
“ceph-deploy osd create” again but got the same results.
Log is attached. Can you please help?


Thank you,
Atatur
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous upgrade with existing EC pools

2018-01-22 Thread David Turner
I've already migrated all osds to bluestore and changed my pools to use a
crush rule specifying them to use an HDD class (forced about half of my
data to move). This week I'm planning to add in some new SSDs to move the
metadata pool to.

I have experience with adding and removing cache tiers without losing data
in the underlying pool. The documentation on this in the upgrade procedure
and in the EC documentation had me very leery. Seeing the information
about EC pools from the CephFS documentation helps me to feel much more
confident. Thank you.
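
For reference, the device-class part was just along these lines (the pool and
rule names are mine, so treat this as a sketch):

  ceph osd crush rule create-replicated replicated_hdd default host hdd
  ceph osd pool set cephfs_data crush_rule replicated_hdd
  # and, once the new SSDs are in, the same for the metadata pool:
  ceph osd crush rule create-replicated replicated_ssd default host ssd
  ceph osd pool set cephfs_metadata crush_rule replicated_ssd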

On Mon, Jan 22, 2018, 5:53 AM John Spray  wrote:

> On Sat, Jan 20, 2018 at 6:26 PM, David Turner 
> wrote:
> > I am not able to find documentation for how to convert an existing cephfs
> > filesystem to use allow_ec_overwrites. The documentation says that the
> > metadata pool needs to be replicated, but that the data pool can be EC.
> But
> > it says, "For Cephfs, using an erasure coded pool means setting that
> pool in
> > a file layout." Is that really necessary if your metadata pool is
> replicated
> > and you have an existing EC pool for the data? Could I just enable ec
> > overwrites and start flushing/removing the cache tier and be on my way to
> > just using an EC pool?
>
> That snipped in the RADOS docs is a bit misleading: you only need to
> use a file layout if you're adding an EC pool as an addition pool
> rather than using it during creation of a filesystem.
>
> The CephFS version of events is here:
>
> http://docs.ceph.com/docs/master/cephfs/createfs/#using-erasure-coded-pools-with-cephfs
>
> As for migrating from a cache tiered configuration, I haven't tried
> it, but there's nothing CephFS-specific about it.  If the underlying
> pool that's set as the cephfs data pool is EC and has
> allow_ec_overwrites then CephFS won't care -- but I'm personally not
> an expert on what knobs and buttons to use to migrate away from a
> cache tiered config.
>
> Do bear in mind that your OSDs need to be using bluestore (which may
> not be the case since you're talking about migrating an existing
> system?)
>
> John
>
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD doesn't start - fresh installation

2018-01-22 Thread Hüseyin Atatür YILDIRIM

Hi all,

Fresh installation, but on already used disks. I zapped all the disks and ran 
“ceph-deploy osd create” again but got the same results.
Log is attached. Can you please help?


Thank you,
Atatur






Osd-create.log
Description: Osd-create.log
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What is the should be the expected latency of 10Gbit network connections

2018-01-22 Thread Nick Fisk
Anyone with 25G ethernet willing to do the test? Would love to see what the 
latency figures are for that.
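
The same invocation Marc used would be ideal for comparison, i.e. roughly this
(the addresses are placeholders):

  # on one 25GbE host:
  sockperf server -p 5001
  # on the other:
  sockperf ping-pong -i <peer-25GbE-ip> -p 5001 -t 10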

 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Maged 
Mokhtar
Sent: 22 January 2018 11:28
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] What is the should be the expected latency of 10Gbit 
network connections

 

On 2018-01-22 08:39, Wido den Hollander wrote:



On 01/20/2018 02:02 PM, Marc Roos wrote: 

  If I test my connections with sockperf via a 1Gbit switch I get around
25usec, when I test the 10Gbit connection via the switch I have around
12usec is that normal? Or should there be a differnce of 10x.


No, that's normal.

Tests with 8k ping packets over different links I did:

1GbE:  0.800ms
10GbE: 0.200ms
40GbE: 0.150ms

Wido




sockperf ping-pong

sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=10.100 sec; SentMessages=432875;
ReceivedMessages=432874
sockperf: = Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=10.000 sec; SentMessages=428640;
ReceivedMessages=428640
sockperf: > avg-lat= 11.609 (std-dev=1.684)
sockperf: # dropped messages = 0; # duplicated messages = 0; #
out-of-order messages = 0
sockperf: Summary: Latency is 11.609 usec
sockperf: Total 428640 observations; each percentile contains 4286.40
observations
sockperf: --->  observation =  856.944
sockperf: ---> percentile  99.99 =   39.789
sockperf: ---> percentile  99.90 =   20.550
sockperf: ---> percentile  99.50 =   17.094
sockperf: ---> percentile  99.00 =   15.578
sockperf: ---> percentile  95.00 =   12.838
sockperf: ---> percentile  90.00 =   12.299
sockperf: ---> percentile  75.00 =   11.844
sockperf: ---> percentile  50.00 =   11.409
sockperf: ---> percentile  25.00 =   11.124
sockperf: --->  observation =8.888

sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=1.100 sec; SentMessages=22065;
ReceivedMessages=22064
sockperf: = Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=1.000 sec; SentMessages=20056;
ReceivedMessages=20056
sockperf: > avg-lat= 24.861 (std-dev=1.774)
sockperf: # dropped messages = 0; # duplicated messages = 0; #
out-of-order messages = 0
sockperf: Summary: Latency is 24.861 usec
sockperf: Total 20056 observations; each percentile contains 200.56
observations
sockperf: --->  observation =   77.158
sockperf: ---> percentile  99.99 =   54.285
sockperf: ---> percentile  99.90 =   37.864
sockperf: ---> percentile  99.50 =   34.406
sockperf: ---> percentile  99.00 =   33.337
sockperf: ---> percentile  95.00 =   27.497
sockperf: ---> percentile  90.00 =   26.072
sockperf: ---> percentile  75.00 =   24.618
sockperf: ---> percentile  50.00 =   24.443
sockperf: ---> percentile  25.00 =   24.361
sockperf: --->  observation =   16.746
[root@c01 sbin]# sockperf ping-pong -i 192.168.0.12 -p 5001 -t 10
sockperf: == version #2.6 ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on
socket(s)








___
ceph-users mailing list
ceph-users@lists.ceph.com  
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com  
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

I find the ping command with flood option handy to measure latency, gives stats 
min/max/average/std deviation

example:

ping  -c 10 -f 10.0.1.12

Maged

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What is the should be the expected latency of 10Gbit network connections

2018-01-22 Thread Maged Mokhtar
On 2018-01-22 08:39, Wido den Hollander wrote:

> On 01/20/2018 02:02 PM, Marc Roos wrote: 
> 
>> If I test my connections with sockperf via a 1Gbit switch I get around
>> 25usec, when I test the 10Gbit connection via the switch I have around
>> 12usec is that normal? Or should there be a differnce of 10x.
> 
> No, that's normal.
> 
> Tests with 8k ping packets over different links I did:
> 
> 1GbE:  0.800ms
> 10GbE: 0.200ms
> 40GbE: 0.150ms
> 
> Wido
> 
>> sockperf ping-pong
>> 
>> sockperf: Warmup stage (sending a few dummy messages)...
>> sockperf: Starting test...
>> sockperf: Test end (interrupted by timer)
>> sockperf: Test ended
>> sockperf: [Total Run] RunTime=10.100 sec; SentMessages=432875;
>> ReceivedMessages=432874
>> sockperf: = Printing statistics for Server No: 0
>> sockperf: [Valid Duration] RunTime=10.000 sec; SentMessages=428640;
>> ReceivedMessages=428640
>> sockperf: > avg-lat= 11.609 (std-dev=1.684)
>> sockperf: # dropped messages = 0; # duplicated messages = 0; #
>> out-of-order messages = 0
>> sockperf: Summary: Latency is 11.609 usec
>> sockperf: Total 428640 observations; each percentile contains 4286.40
>> observations
>> sockperf: --->  observation =  856.944
>> sockperf: ---> percentile  99.99 =   39.789
>> sockperf: ---> percentile  99.90 =   20.550
>> sockperf: ---> percentile  99.50 =   17.094
>> sockperf: ---> percentile  99.00 =   15.578
>> sockperf: ---> percentile  95.00 =   12.838
>> sockperf: ---> percentile  90.00 =   12.299
>> sockperf: ---> percentile  75.00 =   11.844
>> sockperf: ---> percentile  50.00 =   11.409
>> sockperf: ---> percentile  25.00 =   11.124
>> sockperf: --->  observation =8.888
>> 
>> sockperf: Warmup stage (sending a few dummy messages)...
>> sockperf: Starting test...
>> sockperf: Test end (interrupted by timer)
>> sockperf: Test ended
>> sockperf: [Total Run] RunTime=1.100 sec; SentMessages=22065;
>> ReceivedMessages=22064
>> sockperf: = Printing statistics for Server No: 0
>> sockperf: [Valid Duration] RunTime=1.000 sec; SentMessages=20056;
>> ReceivedMessages=20056
>> sockperf: > avg-lat= 24.861 (std-dev=1.774)
>> sockperf: # dropped messages = 0; # duplicated messages = 0; #
>> out-of-order messages = 0
>> sockperf: Summary: Latency is 24.861 usec
>> sockperf: Total 20056 observations; each percentile contains 200.56
>> observations
>> sockperf: ---> <MAX> observation =   77.158
>> sockperf: ---> percentile  99.99 =   54.285
>> sockperf: ---> percentile  99.90 =   37.864
>> sockperf: ---> percentile  99.50 =   34.406
>> sockperf: ---> percentile  99.00 =   33.337
>> sockperf: ---> percentile  95.00 =   27.497
>> sockperf: ---> percentile  90.00 =   26.072
>> sockperf: ---> percentile  75.00 =   24.618
>> sockperf: ---> percentile  50.00 =   24.443
>> sockperf: ---> percentile  25.00 =   24.361
>> sockperf: ---> <MIN> observation =   16.746
>> [root@c01 sbin]# sockperf ping-pong -i 192.168.0.12 -p 5001 -t 10
>> sockperf: == version #2.6 ==
>> sockperf[CLIENT] send on:sockperf: using recvfrom() to block on
>> socket(s)
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

I find the ping command with flood option handy to measure latency,
gives stats min/max/average/std deviation 

example: 

ping -c 10 -f 10.0.1.12
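
For reference, the flood run ends with a summary along these lines (the
numbers below are purely illustrative, not measured on any host in this
thread):

--- 10.0.1.12 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 2ms
rtt min/avg/max/mdev = 0.045/0.061/0.210/0.048 ms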

Maged
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous upgrade with existing EC pools

2018-01-22 Thread John Spray
On Sat, Jan 20, 2018 at 6:26 PM, David Turner  wrote:
> I am not able to find documentation for how to convert an existing cephfs
> filesystem to use allow_ec_overwrites. The documentation says that the
> metadata pool needs to be replicated, but that the data pool can be EC. But
> it says, "For Cephfs, using an erasure coded pool means setting that pool in
> a file layout." Is that really necessary if your metadata pool is replicated
> and you have an existing EC pool for the data? Could I just enable ec
> overwrites and start flushing/removing the cache tier and be on my way to
> just using an EC pool?

That snippet in the RADOS docs is a bit misleading: you only need to
use a file layout if you're adding an EC pool as an additional pool
rather than using it during creation of a filesystem.

The CephFS version of events is here:
http://docs.ceph.com/docs/master/cephfs/createfs/#using-erasure-coded-pools-with-cephfs
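
As a rough sketch of the two paths (the pool, filesystem and directory
names below are placeholders, not anything from David's cluster):

# enable overwrites on the EC pool first (BlueStore OSDs only)
ceph osd pool set ecpool allow_ec_overwrites true

# either use it as the data pool when the filesystem is created ...
ceph fs new cephfs cephfs_metadata ecpool

# ... or add it to an existing filesystem and point a directory at it
# via a file layout
ceph fs add_data_pool cephfs ecpool
setfattr -n ceph.dir.layout.pool -v ecpool /mnt/cephfs/ecdir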

As for migrating from a cache tiered configuration, I haven't tried
it, but there's nothing CephFS-specific about it.  If the underlying
pool that's set as the cephfs data pool is EC and has
allow_ec_overwrites then CephFS won't care -- but I'm personally not
an expert on what knobs and buttons to use to migrate away from a
cache tiered config.

Do bear in mind that your OSDs need to be using bluestore (which may
not be the case since you're talking about migrating an existing
system?)
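
If you're unsure what a given OSD is running, one quick check (the OSD id
here is just an example) is:

ceph osd metadata 0 | grep osd_objectstore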

John

>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph df shows 100% used

2018-01-22 Thread Webert de Souza Lima
Hi,

On Fri, Jan 19, 2018 at 8:31 PM, zhangbingyin 
 wrote:

> 'MAX AVAIL' in the 'ceph df' output represents the amount of data that can
> be used before the first OSD becomes full, and not the sum of all free
> space across a set of OSDs.
>

Thank you very much. I figured this out by the end of the day. That is the
answer. I'm not sure this is in ceph.com docs though.
Now I know the problem is indeed solved (by doing proper reweight).
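
The reweight itself isn't shown in this thread; for anyone hitting the same
symptom, a typical sequence (the threshold is only an example) looks like:

# see how unevenly the OSDs are filled; the fullest OSD caps MAX AVAIL
ceph osd df

# nudge over-full OSDs down, here anything above 110% of the average
ceph osd reweight-by-utilization 110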

Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Missing udev rule for FC disks (Re: mkjournal error creating journal ... : (13) Permission denied)

2018-01-22 Thread tom.byrne
I believe I've recently spent some time with this issue, so I hope this is 
helpful. Apologies if it's an unrelated dm/udev/ceph-disk problem.

https://lists.freedesktop.org/archives/systemd-devel/2017-July/039222.html

The above email from last July explains the situation somewhat, with the 
outcome (as I understand it) being future versions of lvm/dm will have rules to 
create the necessary partuuid symlinks for dm devices.

I'm unsure when that will make its way into various distribution lvm packages 
(I haven't checked up on this for a month or two actually). For now I've tested 
running with the new dm-disk.rules on the storage nodes that need it, which 
allowed ceph-disk to work as expected.
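
For context, the upstream change boils down to udev rules along these lines
(quoted from memory, so treat it as a sketch rather than the exact patch):

# 13-dm-disk.rules: recreate per-partition symlinks for dm devices
ENV{ID_PART_ENTRY_UUID}=="?*", SYMLINK+="disk/by-partuuid/$env{ID_PART_ENTRY_UUID}"
ENV{ID_PART_ENTRY_NAME}=="?*", SYMLINK+="disk/by-partlabel/$env{ID_PART_ENTRY_NAME}"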

Cheers
Tom

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Fulvio 
Galeazzi
Sent: 19 January 2018 15:46
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Missing udev rule for FC disks (Re: mkjournal error 
creating journal ... : (13) Permission denied)

Hallo,
  apologies for reviving an old thread, but I just wasted another full
day because I had forgotten about this issue...

To recap, udev rules nowadays do not (at least in my case, I am using disks 
served via FiberChannel) create the links /dev/disk/by-partuuid that ceph-disk 
expects.

I see the "culprit" is this line (I am on CentOS, but Ubuntu has the same
issue) in /usr/lib/udev/rules.d/60-persistent-storage.rules:

.
# skip rules for inappropriate block devices 
KERNEL=="fd*|mtd*|nbd*|gnbd*|btibm*|dm-*|md*|zram*|mmcblk[0-9]*rpmb",
GOTO="persistent_storage_end"
.

stating that multipath'ed devices (called dm-*) should be skipped.
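
A quick way to confirm you are hitting this (the device name is just an
example) is to check whether udev knows the partition UUID but never
created the corresponding symlink:

udevadm info --query=property /dev/dm-2 | grep ID_PART_ENTRY_UUID
ls -l /dev/disk/by-partuuid/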


I can happily live with the file mentioned below, but was wondering:

- is there any hope that newer kernels may handle multipath devices
   properly?

- as an alternative, could it be possible to update ceph-disk
   such that symlinks for journal use some other
   /dev/disk/by-?

Thanks!

Fulvio

On 3/16/2017 5:59 AM, Gunwoo Gim wrote:
>   Thank you so much Peter. The 'udevadm trigger' after 'partprobe' 
> triggered the udev rules and I've found out that even before the udev 
> ruleset triggers the owner is already ceph:ceph.
> 
>   I've dug into ceph-disk a little more and found out that there is a 
> symbolic link of
> /dev/disk/by-partuuid/120c536d-cb30-4cea-b607-dd347022a497 at 
> [/dev/mapper/vg--hdd1-lv--hdd1p1(the_filestore_osd)]/journal and the 
> source doesn't exist. though it exists in /dev/disk/by-parttypeuuid 
> which has been populated by 
> /lib/udev/rules.d/60-ceph-by-parttypeuuid.rules
> 
>   So I added this in /lib/udev/rules.d/60-ceph-by-parttypeuuid.rules:
> # when ceph-disk prepares a filestore osd it makes a symbolic link by 
> disk/by-partuuid but LVM2 doesn't seem to populate /dev/disk/by-partuuid.
> ENV{ID_PART_ENTRY_SCHEME}=="gpt", ENV{ID_PART_ENTRY_TYPE}=="?*", ENV{ID_PART_ENTRY_UUID}=="?*", SYMLINK+="disk/by-partuuid/$env{ID_PART_ENTRY_UUID}"
>   And finally got the osds all up and in. :D
> 
>   Yeah, It wasn't actually a permission problem, but the link just 
> wasn't existing.
> 
> 
> ~ # ceph-disk -v activate /dev/mapper/vg--hdd1-lv--hdd1p1 ...
> mount: Mounting /dev/mapper/vg--hdd1-lv--hdd1p1 on 
> /var/lib/ceph/tmp/mnt.ECAifr with options 
> noatime,largeio,inode64,swalloc
> command_check_call: Running command: /bin/mount -t xfs -o 
> noatime,largeio,inode64,swalloc -- /dev/mapper/vg--hdd1-lv--hdd1p1 
> /var/lib/ceph/tmp/mnt.ECAifr
> mount: DIGGIN ls -al /var/lib/ceph/tmp/mnt.ECAifr
> mount: DIGGIN total 36
> drwxr-xr-x 3 ceph ceph  174 Mar 14 11:51 .
> drwxr-xr-x 6 ceph ceph 4096 Mar 16 11:30 ..
> -rw-r--r-- 1 root root  202 Mar 16 11:19 activate.monmap
> -rw-r--r-- 1 ceph ceph   37 Mar 14 11:45 ceph_fsid
> drwxr-xr-x 3 ceph ceph   39 Mar 14 11:51 current
> -rw-r--r-- 1 ceph ceph   37 Mar 14 11:45 fsid
> lrwxrwxrwx 1 ceph ceph   58 Mar 14 11:45 journal -> /dev/disk/by-partuuid/120c536d-cb30-4cea-b607-dd347022a497
> -rw-r--r-- 1 ceph ceph   37 Mar 14 11:45 journal_uuid
> -rw-r--r-- 1 ceph ceph   21 Mar 14 11:45 magic
> -rw-r--r-- 1 ceph ceph    4 Mar 14 11:51 store_version
> -rw-r--r-- 1 ceph ceph   53 Mar 14 11:51 superblock
> -rw-r--r-- 1 ceph ceph    2 Mar 14 11:51 whoami ...
> ceph_disk.main.Error: Error: ['ceph-osd', '--cluster', 'ceph', 
> '--mkfs', '--mkkey', '-i', u'0', '--monmap', 
> '/var/lib/ceph/tmp/mnt.ECAifr/activate.monmap', '--osd-data',
> '/var/lib/ceph/tmp/mnt.ECAifr', '--osd-journal', 
> '/var/lib/ceph/tmp/mnt.ECAifr/journal', '--osd-uuid', 
> u'377c336b-278d-4caf-b2f5-592ac72cd9b6', '--keyring',
> '/var/lib/ceph/tmp/mnt.ECAifr/keyring', '--setuser', 'ceph', 
> '--setgroup', 'ceph'] failed : 2017-03-16 11:30:05.238725 7f918fbc0a40
> -1 filestore(/var/lib/ceph/tmp/mnt.ECAifr) mkjournal error creating journal on
> /var/lib/ceph/tmp/mnt.ECAifr/journal: (13) Permission denied
> 2017-03-16 11:30:05.238756 7f918fbc0a40 -1 OSD::mkfs: 
> ObjectStore::mkfs failed with