Re: [ceph-users] ceph block - volume with RAID#0

2019-01-30 Thread M Ranga Swami Reddy
My thought was Ceph block volumes with RAID#0 (meaning I mount Ceph block
volumes to an instance/VM, where I would like to configure them as RAID0).

Just to know: is anyone doing the same as above? If yes, what are the
constraints?
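
For concreteness, a minimal sketch of the guest-side setup (device names
/dev/vdb and /dev/vdc are assumptions for two attached RBD-backed volumes):

    # inside the VM: stripe two RBD-backed virtual disks into one md device
    mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/vdb /dev/vdc
    mkfs.ext4 /dev/md0
    mount /dev/md0 /mnt/striped

Keep in mind the caveat in the reply below: RBD already stripes objects
across OSDs, so the extra md layer may not deliver the gains RAID0 gives
on raw disks.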

Thanks
Swami

On Wed, Jan 30, 2019 at 7:56 PM Janne Johansson  wrote:
>
> On Wed, 30 Jan 2019 at 14:47, M Ranga Swami Reddy wrote:
>>
>> Hello - Can I use the ceph block volume with RAID#0? Are there any
>> issues with this?
>
>
> Hard to tell if you mean raid0 over a block volume or a block volume over 
> raid0. Still, it is seldom a good idea to stack redundancies on top of each 
> other.
> It will work, but may not give the gains you might expect from it.
>
> --
> May the most significant bit of your life be positive.


Re: [ceph-users] Simple API to have cluster healthcheck ?

2019-01-30 Thread Lenz Grimmer


On 30 January 2019 19:33:14 CET, PHARABOT Vincent wrote:

>Thanks for the info
>But, nope, on Mimic (13.2.4) /api/health ends in 404 (/api/health/full,
>/api/health/minimal also...)

On which node did you try to access the API? Did you enable the Dashboard 
module in Ceph manager?
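
For reference, the usual steps look roughly like this (the port is an
assumption; "ceph mgr services" prints the actual URL, and depending on
the release the endpoint may first require an auth token):

    ceph mgr module enable dashboard
    ceph mgr services                        # shows the active dashboard URL
    curl -k https://<active-mgr>:8443/api/health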

Lenz

-- 
This message was sent from my Android device with K-9 Mail.


Re: [ceph-users] block storage over provisioning

2019-01-30 Thread Void Star Nill
Thanks, Wido. Appreciate the quick response.

On Wed, 30 Jan 2019 at 12:27, Wido den Hollander  wrote:

>
>
> On 1/30/19 9:12 PM, Void Star Nill wrote:
> > Hello,
> >
> > When a Ceph block device is created with a given size, does Ceph
> > allocate all that space right away or is that allocated as the user
> > starts storing the data?
> >
> > I want to know if we can over provision the Ceph cluster. For example,
> > if we have a cluster with 10G available space, am I allowed to create
> > only 10,  1G volumes? Or can I over provision and assume that each
> > volume is going to be used only up to 50% and allow creation of 20  1G
> > volumes?
> >
>
> Data is not allocated. You can overprovision with RBD as much as you like.
>
> You can provision 1PB of storage on a cluster with just 100TB of usable
> capacity. Ceph/RBD won't prevent you from doing so.
>
> Wido
>
> > Thanks,
> > Shridhar
> >
> >
> >
>
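
(A quick way to see the thin provisioning in action; pool and image names
are made up, and size-suffix syntax may vary slightly by release:)

    rbd create mypool/bigvol --size 10T      # succeeds even on a much smaller cluster
    rbd du mypool/bigvol                     # PROVISIONED 10 TiB, USED 0 B until data is written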


[ceph-users] Ceph mimic issue with snaptrimming.

2019-01-30 Thread Darius Kasparavičius
Hello,


I have recently updated a cluster to mimic. After the upgrade I have
started converting nodes to bluestore one by one. While ceph was
rebalancing I slapped a "nosnaptrim" on the cluster to save a bit of
IO. After the rebalancing was done I enabled the snaptrim and my osds
started flapping like crazy. I immediately slapped back "nosnaptrim"
on the cluster and let the osds come back online.
After everything calmed down I'm left with 28/31196788 objects unfound
(0.000%) and still can't enable the snaptrim. All the osds start
flapping with messages similar to this:
   -10> 2019-01-30 21:12:24.970 7f9c773f1700  5 write_log_and_missing
with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615,
writeout_from: 395796'95371018, trimmed: , trimmed_dups: ,
clear_divergent_priors: 0
-9> 2019-01-30 21:12:24.970 7f9c773f1700  5 write_log_and_missing
with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615,
writeout_from: 395796'95371019, trimmed: , trimmed_dups: ,
clear_divergent_priors: 0
-8> 2019-01-30 21:12:24.970 7f9c773f1700  5 write_log_and_missing
with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615,
writeout_from: 395796'95371020, trimmed: , trimmed_dups: ,
clear_divergent_priors: 0
-7> 2019-01-30 21:12:24.971 7f9c733e9700  5 write_log_and_missing
with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615,
writeout_from: 395796'95371021, trimmed: , trimmed_dups: ,
clear_divergent_priors: 0
-6> 2019-01-30 21:12:24.975 7f9c773f1700  5 write_log_and_missing
with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615,
writeout_from: 395796'95371022, trimmed: , trimmed_dups: ,
clear_divergent_priors: 0
-5> 2019-01-30 21:12:24.975 7f9c831f3700  3 osd.45 395796
handle_osd_map epochs [395796,395796], i have 395796, src has
[373186,395796]
-4> 2019-01-30 21:12:24.977 7f9c723e7700  5 osd.45 pg_epoch:
395796 pg[11.2eb( v 395794'141488204
(393296'141485192,395794'141488204] local-lis/les=395790/395791 n=9355
ec=448/448 lis/c 395790/395709 les/c/f 395791/395733/0
395795/395795/395795) [98,45
] r=1 lpr=395795 pi=[395709,395795)/1 crt=395794'141488204 lcod
395782'141488203 unknown NOTIFY mbc={}
ps=[190e4~1,19131~1,19181~1,191f1~1,19248~1,1928f~1,192e0~1,19328~1,19370~1,193c0~1,19431~1]]
exit Started/Stray 0.979289 7 0.000253
-3> 2019-01-30 21:12:24.977 7f9c723e7700  5 osd.45 pg_epoch:
395796 pg[11.2eb( v 395794'141488204
(393296'141485192,395794'141488204] local-lis/les=395790/395791 n=9355
ec=448/448 lis/c 395790/395709 les/c/f 395791/395733/0
395795/395795/395795) [98,45
] r=1 lpr=395795 pi=[395709,395795)/1 crt=395794'141488204 lcod
395782'141488203 unknown NOTIFY mbc={}
ps=[190e4~1,19131~1,19181~1,191f1~1,19248~1,1928f~1,192e0~1,19328~1,19370~1,193c0~1,19431~1]]
enter Started/ReplicaActive
-2> 2019-01-30 21:12:24.977 7f9c723e7700  5 osd.45 pg_epoch:
395796 pg[11.2eb( v 395794'141488204
(393296'141485192,395794'141488204] local-lis/les=395790/395791 n=9355
ec=448/448 lis/c 395790/395709 les/c/f 395791/395733/0
395795/395795/395795) [98,45
] r=1 lpr=395795 pi=[395709,395795)/1 crt=395794'141488204 lcod
395782'141488203 unknown NOTIFY mbc={}
ps=[190e4~1,19131~1,19181~1,191f1~1,19248~1,1928f~1,192e0~1,19328~1,19370~1,193c0~1,19431~1]]
enter Started/ReplicaActive/RepNotRecovering
-1> 2019-01-30 21:12:24.983 7f9c733e9700  5 write_log_and_missing
with: dirty_to: 0'0, dirty_from: 4294967295'18446744073709551615,
writeout_from: 395796'95371023, trimmed: , trimmed_dups: ,
clear_divergent_priors: 0
 0> 2019-01-30 21:12:24.990 7f9c7b3f9700 -1
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.4/rpm/el7/BUILD/ceph-13.2.4/src/osd/PrimaryLogPG.h:
In function 'PrimaryLogPG::Trimming::Trimming(boost::statechart::state::my_context)' thread 7f9c7b3f9700 time 2019-01-30 21:12:24.987263
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.4/rpm/el7/BUILD/ceph-13.2.4/src/osd/PrimaryLogPG.h:
1571: FAILED assert(context< SnapTrimmer >().can_trim())

 ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0xff) [0x7f9ca0f5c16f]
 2: (()+0x25a337) [0x7f9ca0f5c337]
 3: (PrimaryLogPG::NotTrimming::react(PrimaryLogPG::KickTrim
const&)+0x783) [0x56351d32abc3]
 4: (boost::statechart::simple_state,
(boost::statechart::history_mode)0>::react_impl(boost::statechart::event_base
const&, void const*)+0xa9) [0x56351d376629]
 5: (boost::statechart::state_machine,
boost::statechart::null_exception_translator>::process_queued_events()+0xb3)
[0x56351d350f23]
 6: (boost::statechart::state_machine,
boost::statechart::null_exception_translator>::process_event(boost::statechart::event_base
const&)+0x87) [0x56351d351187]
 7: (PrimaryLogPG::WaitReservation::ReservationCB::finish(int)+0xbb)

Re: [ceph-users] CephFS performance vs. underlying storage

2019-01-30 Thread Marc Roos

I was wondering the same. From a 'default' setup I get the performance
below; no idea if this is bad, good or normal.

(BW = bandwidth; unit per row in parentheses)

                        Cephfs ssd rep. 3     Cephfs ssd rep. 1     Samsung MZK7KM480 480GB
                        lat    iops   BW      lat    iops   BW      lat    iops   BW
4k r ran.    (kB/s)     2.78   1781   7297    0.54   1809   7412    0.09   10.2k  41600
4k w ran.    (kB/s)     1.42   700    2871    0.8    1238   5071    0.05   17.9k  73200
4k r seq.    (MB/s)     0.29   3314   13.6    0.29   3325   13.6    0.05   18k    77.6
4k w seq.    (MB/s)     0.04   889    3.64    0.56   1761   7.21    0.05   18.3k  75.1
1024k r ran. (MB/s)     4.3    231    243     4.27   233    245     2.06   482    506
1024k w ran. (MB/s)     0.08   132    139     4.34   229    241     2.16   460    483
1024k r seq. (MB/s)     4.23   235    247     4.21   236    248     1.98   502    527
1024k w seq. (MB/s)     6.99   142    150     4.34   229    241     2.13   466    489


(4 nodes, CentOS7, luminous) 

PS: not sure why you test with one node. If you expand to a 2nd node, you
might get an unpleasant surprise with a drop in performance, because you
will be adding network latency that decreases your IOPS.



-Original Message-
From: Hector Martin [mailto:hec...@marcansoft.com]
Sent: 30 January 2019 19:43
To: ceph-users@lists.ceph.com
Subject: [ceph-users] CephFS performance vs. underlying storage

Hi list,

I'm experimentally running single-host CephFS as as replacement for
"traditional" filesystems.

My setup is 8×8TB HDDs using dm-crypt, with CephFS on a 5+2 EC pool. All
of the components are running on the same host (mon/osd/mds/kernel
CephFS client). I've set the stripe_unit/object_size to a relatively
high 80MB (up from the default 4MB). I figure I want individual reads on
the disks to be several megabytes per object for good sequential
performance, and since this is an EC pool 4MB objects would be split
into 800kB chunks, which is clearly not ideal. With 80MB objects, chunks
are 16MB, which sounds more like a healthy read size for sequential
access (e.g. something like 10 IOPS per disk during seq reads).

With this config, I get about 270MB/s sequential from CephFS. On the
same disks, an ext4 on dm-crypt on dm-raid6 yields ~680MB/s. So it seems
Ceph achieves less than half of the raw performance that the underlying
storage is capable of (with similar RAID redundancy). *

Obviously there will be some overhead with a stack as deep as Ceph
compared to more traditional setups, but I'm wondering if there are
improvements to be had here. While reading from CephFS I do not have
significant CPU usage, so I don't think I'm CPU limited. Could the issue
perhaps be latency through the stack / lack of read-ahead? Reading two
files in parallel doesn't really get me more than 300MB/s in total, so
parallelism doesn't seem to help much.

I'm curious as to whether there are any knobs I can play with to try to
improve performance, or whether this level of overhead is pretty much
inherent to Ceph. Even though this is an unusual single-host setup, I
imagine proper clusters might also have similar results when comparing
raw storage performance.

* Ceph has a slight disadvantage here because its chunk of the drives is
logically after the traditional RAID, and HDDs get slower towards higher
logical addresses, but this should be on the order of a 15-20% hit at
most.

--
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub






Re: [ceph-users] Rezising an online mounted ext4 on a rbd - failed

2019-01-30 Thread Brian Godette
Did you mkfs with -O 64bit, or have it in the [defaults] section of 
/etc/mke2fs.conf, before creating the filesystem? If you didn't, 4TB is as big as 
it goes and that can't be changed after the fact. If the device is already larger 
than 4TB when you create the filesystem, mkfs does the right thing and silently 
enables 64bit.


man ext4
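
A minimal sketch of creating the filesystem with 64-bit support up front
(device name assumed; on e2fsprogs 1.43+ an unmounted filesystem can also
be converted with "resize2fs -b"):

    mkfs.ext4 -O 64bit /dev/rbd0
    dumpe2fs -h /dev/rbd0 | grep -i features    # look for '64bit' in the feature list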



From: ceph-users  on behalf of Götz Reinicke 

Sent: Saturday, January 26, 2019 8:10 AM
To: Ceph Users
Subject: Re: [ceph-users] Rezising an online mounted ext4 on a rbd - failed



On 26.01.2019 at 14:16, Kevin Olbrich wrote:

On Sat, 26 Jan 2019 at 13:43, Götz Reinicke wrote:

Hi,

I have a fileserver which mounted a 4TB rbd, which is ext4 formatted.

I grow that rbd and ext4 starting with an 2TB rbd that way:

rbd resize testpool/disk01 --size 4194304

resize2fs /dev/rbd0

Today I wanted to extend that ext4 to 8 TB and did:

rbd resize testpool/disk01 --size 8388608

resize2fs /dev/rbd0

=> which gives an error: The filesystem is already 1073741824 blocks. Nothing 
to do.


   I bet I missed something very simple. Any hint? Thanks and regards, Götz

Try "partprobe" to read device metrics again.

Did not change anything and did not give any output/log messages.

/Götz




Re: [ceph-users] block storage over provisioning

2019-01-30 Thread Wido den Hollander


On 1/30/19 9:12 PM, Void Star Nill wrote:
> Hello,
> 
> When a Ceph block device is created with a given size, does Ceph
> allocate all that space right away or is that allocated as the user
> starts storing the data?
> 
> I want to know if we can over provision the Ceph cluster. For example,
> if we have a cluster with 10G available space, am I allowed to create
> only 10,  1G volumes? Or can I over provision and assume that each
> volume is going to be used only up to 50% and allow creation of 20  1G
> volumes?
> 

Data is not allocated. You can overprovision with RBD as much as you like.

You can provision 1PB of storage on a cluster with just 100TB of usable
capacity. Ceph/RBD won't prevent you from doing so.

Wido

> Thanks,
> Shridhar
> 
> 
> 


Re: [ceph-users] backfill_toofull while OSDs are not full

2019-01-30 Thread Wido den Hollander


On 1/30/19 9:08 PM, David Zafman wrote:
> 
> Strange, I can't reproduce this with v13.2.4.  I tried the following
> scenarios:
> 
> pg acting 1, 0, 2 -> up 1, 0, 4 (osd.2 marked out).  The df on osd.2
> shows 0 space, but only osd.4 (backfill target) checks full space.
> 
> pg acting 1, 0, 2 -> up 4,3,5 (osd.1,0,2 all marked out).  The df for
> 1,0,2 show 0 space but osd.4,3,5 (backfill targets) check full space.
> 
> FYI, in a later release, even when a backfill target is below
> backfillfull_ratio, if there isn't enough room for the PG to fit then
> backfill_toofull occurs.
> 
> 
> The question in your case is: were any of OSDs 999, 1900, or 145 above
> 90% (backfillfull_ratio) usage?

I triple-checked and this was not the case. I've had two instances of
Mimic 13.2.4 where I ran into this, and I have had somebody else report it to me as well.

In a few weeks I'll be performing an expansion with a customer where I'm
expecting this to show up again.

I'll check again and note the use on all OSDs and report back.

Wido

> 
> David
> 
> On 1/27/19 11:34 PM, Wido den Hollander wrote:
>>
>> On 1/25/19 8:33 AM, Gregory Farnum wrote:
>>> This doesn’t look familiar to me. Is the cluster still doing recovery so
>>> we can at least expect them to make progress when the “out” OSDs get
>>> removed from the set?
>> The recovery has already finished. It resolves itself, but in the
>> meantime I saw many PGs in the backfill_toofull state for a long time.
>>
>> This is new since Mimic.
>>
>> Wido
>>
>>> On Tue, Jan 22, 2019 at 2:44 PM Wido den Hollander wrote:
>>>
>>>  Hi,
>>>
>>>  I've got a couple of PGs which are stuck in backfill_toofull,
>>> but none
>>>  of them are actually full.
>>>
>>>    "up": [
>>>      999,
>>>      1900,
>>>      145
>>>    ],
>>>    "acting": [
>>>      701,
>>>      1146,
>>>      1880
>>>    ],
>>>    "backfill_targets": [
>>>      "145",
>>>      "999",
>>>      "1900"
>>>    ],
>>>    "acting_recovery_backfill": [
>>>      "145",
>>>      "701",
>>>      "999",
>>>      "1146",
>>>      "1880",
>>>      "1900"
>>>    ],
>>>
>>>  I checked all these OSDs, but they are all <75% utilization.
>>>
>>>  full_ratio 0.95
>>>  backfillfull_ratio 0.9
>>>  nearfull_ratio 0.9
>>>
>>>  So I started checking all the PGs and I've noticed that each of
>>> these
>>>  PGs has one OSD in the 'acting_recovery_backfill' which is
>>> marked as
>>>  out.
>>>
>>>  In this case osd.1880 is marked as out and thus its capacity is
>>>  shown as zero.
>>>
>>>  [ceph@ceph-mgr ~]$ ceph osd df|grep 1880
>>>  1880   hdd 4.54599        0     0 B      0 B      0 B     0    0  27
>>>  [ceph@ceph-mgr ~]$
>>>
>>>  This is on a Mimic 13.2.4 cluster. Is this expected or is this a
>>> unknown
>>>  side-effect of one of the OSDs being marked as out?
>>>
>>>  Thanks,
>>>
>>>  Wido
>>>


[ceph-users] block storage over provisioning

2019-01-30 Thread Void Star Nill
Hello,

When a Ceph block device is created with a given size, does Ceph allocate
all that space right away or is that allocated as the user starts storing
the data?

I want to know if we can over provision the Ceph cluster. For example, if
we have a cluster with 10G available space, am I allowed to create only 10,
 1G volumes? Or can I over provision and assume that each volume is going
to be used only up to 50% and allow creation of 20  1G volumes?

Thanks,
Shridhar


Re: [ceph-users] backfill_toofull while OSDs are not full

2019-01-30 Thread David Zafman


Strange, I can't reproduce this with v13.2.4.  I tried the following 
scenarios:


pg acting 1, 0, 2 -> up 1, 0, 4 (osd.2 marked out).  The df on osd.2 
shows 0 space, but only osd.4 (backfill target) checks full space.


pg acting 1, 0, 2 -> up 4,3,5 (osd.1,0,2 all marked out).  The df for 
1,0,2 show 0 space but osd.4,3,5 (backfill targets) check full space.


FYI, in a later release, even when a backfill target is below 
backfillfull_ratio, if there isn't enough room for the PG to fit then 
backfill_toofull occurs.



The question in your case is: were any of OSDs 999, 1900, or 145 above 
90% (backfillfull_ratio) usage?
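
Checking that is quick (a sketch; the same ratios are printed by "ceph osd dump"):

    ceph osd df | awk '$1==145 || $1==999 || $1==1900'    # compare %USE against 90
    ceph osd dump | grep -E 'full_ratio|backfillfull'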


David

On 1/27/19 11:34 PM, Wido den Hollander wrote:


On 1/25/19 8:33 AM, Gregory Farnum wrote:

This doesn’t look familiar to me. Is the cluster still doing recovery so
we can at least expect them to make progress when the “out” OSDs get
removed from the set?

The recovery has already finished. It resolves itself, but in the
meantime I saw many PGs in the backfill_toofull state for a long time.

This is new since Mimic.

Wido


On Tue, Jan 22, 2019 at 2:44 PM Wido den Hollander wrote:

 Hi,

 I've got a couple of PGs which are stuck in backfill_toofull, but none
 of them are actually full.

   "up": [
     999,
     1900,
     145
   ],
   "acting": [
     701,
     1146,
     1880
   ],
   "backfill_targets": [
     "145",
     "999",
     "1900"
   ],
   "acting_recovery_backfill": [
     "145",
     "701",
     "999",
     "1146",
     "1880",
     "1900"
   ],

 I checked all these OSDs, but they are all <75% utilization.

 full_ratio 0.95
 backfillfull_ratio 0.9
 nearfull_ratio 0.9

 So I started checking all the PGs and I've noticed that each of these
 PGs has one OSD in the 'acting_recovery_backfill' which is marked as
 out.

 In this case osd.1880 is marked as out and thus its capacity is shown
 as zero.

 [ceph@ceph-mgr ~]$ ceph osd df|grep 1880
 1880   hdd 4.54599        0     0 B      0 B      0 B     0    0  27
 [ceph@ceph-mgr ~]$

 This is on a Mimic 13.2.4 cluster. Is this expected or is this a unknown
 side-effect of one of the OSDs being marked as out?

 Thanks,

 Wido





Re: [ceph-users] moving a new hardware to cluster

2019-01-30 Thread Fabio Abreu
Hi Martin, thanks for your reply!

Yes, I am using "osd recovery op priority", "osd max backfills", "osd
recovery max active" and "osd client op priority" to try to minimize the
impact of the cluster expansion.

My Ceph version is 10.2.7 Jewel, and I am moving one OSD at a time, waiting
for the recovery before moving on to the next, because in the past moving an
entire storage node at once created slow requests and blocked PGs.
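
For reference, these can be applied at runtime; a sketch with illustrative
values (the numbers are assumptions, not recommendations):

    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
    ceph tell osd.* injectargs '--osd-recovery-op-priority 1'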

I am asking this question early to understand the experience of other
Ceph admins with this scenario.

Thanks and best Regards,
Fabio Abreu


On Wed, Jan 30, 2019 at 5:37 PM Martin Verges wrote:

> Hello Fabio,
>
> you can use the "osd recovery sleep" option to prevent trouble while
> recovery/rebalancing happens. Other than that, options like "osd recovery
> op priority", "osd max backfills", "osd recovery max active", "osd client
> op priority" and other might help you depending on your cluster version,
> configuration and hardware.
> As we believe a good Ceph solution should make your life easier, we have
> build a option slider within the maintenance view in our software. You can
> see this in the attached screenshot. Maybe you give it a try!
>
> Please feel free to contact us if you need assistance with your cluster.
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io
> Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
>
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
>
>
> On Wed, 30 Jan 2019 at 12:38, Fabio Abreu <fabioabreur...@gmail.com> wrote:
>
>> Hi everybody,
>>
>> I have a doubt about moving new SATA storage (new hardware too) into a
>> production rack with a huge amount of data.
>>
>> I think this move creates new PGs and can reduce my performance if I do
>> it wrong, and we don't have a lot of experience moving new hardware into
>> the cluster.
>>
>> Can someone recommend what I should review before the new hardware
>> move?
>>
>> If I move an OSD into the cluster, what precautions should I take in
>> this scenario?
>>
>> Regards,
>>
>> Fabio Abreu Reis
>> http://fajlinux.com.br
>> *Tel : *+55 21 98244-0161
>> *Skype : *fabioabreureis
>>
>

-- 
Atenciosamente,
Fabio Abreu Reis
http://fajlinux.com.br
*Tel : *+55 21 98244-0161
*Skype : *fabioabreureis


Re: [ceph-users] CEPH_FSAL Nfs-ganesha

2019-01-30 Thread solarflow99
Can you do HA on the NFS shares?

On Wed, Jan 30, 2019 at 9:10 AM David C  wrote:

> Hi Patrick
>
> Thanks for the info. If I did multiple exports, how does that work in
> terms of the cache settings defined in ceph.conf, are those settings per
> CephFS client or a shared cache? I.e. if I've defined client_oc_size, would
> that be per export?
>
> Cheers,
>
> On Tue, Jan 15, 2019 at 6:47 PM Patrick Donnelly wrote:
>
>> On Mon, Jan 14, 2019 at 7:11 AM Daniel Gryniewicz 
>> wrote:
>> >
>> > Hi.  Welcome to the community.
>> >
>> > On 01/14/2019 07:56 AM, David C wrote:
>> > > Hi All
>> > >
>> > > I've been playing around with the nfs-ganesha 2.7 exporting a cephfs
>> > > filesystem, it seems to be working pretty well so far. A few
>> questions:
>> > >
>> > > 1) The docs say " For each NFS-Ganesha export, FSAL_CEPH uses a
>> > > libcephfs client,..." [1]. For arguments sake, if I have ten top level
>> > > dirs in my Cephfs namespace, is there any value in creating a separate
>> > > export for each directory? Will that potentially give me better
>> > > performance than a single export of the entire namespace?
>> >
>> > I don't believe there are any advantages from the Ceph side.  From the
>> > Ganesha side, you configure permissions, client ACLs, squashing, and so
>> > on on a per-export basis, so you'll need different exports if you need
>> > different settings for each top level directory.  If they can all use
>> > the same settings, one export is probably better.
>>
>> There may be performance impact (good or bad) with having separate
>> exports for CephFS. Each export instantiates a separate instance of
>> the CephFS client which has its own bookkeeping and set of
>> capabilities issued by the MDS. Also, each client instance has a
>> separate big lock (potentially a big deal for performance). If the
>> data for each export is disjoint (no hard links or shared inodes) and
>> the NFS server is expected to have a lot of load, breaking out the
>> exports can have a positive impact on performance. If there are hard
>> links, then the clients associated with the exports will potentially
>> fight over capabilities which will add to request latency.)
>>
>> --
>> Patrick Donnelly
>>
>


Re: [ceph-users] Simple API to have cluster healthcheck ?

2019-01-30 Thread Martin Verges
Hello Vincent,

when you install or migrate to croit, you can get a large number of REST
APIs (see https://croit.io/docs/v1809/cluster#get-cluster-status) and we
support read-only users that you can create in our GUI.
If you want to use our APIs from the CLI, you can use our httpie-auth
plugin from https://github.com/croit/httpie-auth-croit to simplify the auth.

You can try it out our Ceph management solution with our demo from
https://croit.io/croit-virtual-demo on your computer or by just importing
your existing cluster using the https://croit.io/croit-production-guide.

Everything you see in our GUI can be reached through APIs. To get a
glimpse of the possibilities, look at https://croit.io/screenshots.

--
Martin Verges
Managing director

Mobile: +49 174 9335695
E-Mail: martin.ver...@croit.io
Chat: https://t.me/MartinVerges

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263

Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx


On Wed, 30 Jan 2019 at 14:04, PHARABOT Vincent <vincent.phara...@3ds.com> wrote:

> Hello,
>
>
>
> I have my cluster set up correctly now (thank you again for the help)
>
>
>
> I am seeking now a way to get cluster health thru API (REST) with curl
> command.
>
> I had a look at manager / RESTful and Dashboard but none seems to provide
> simple way to get cluster health
>
> The RESTful module does a lot of things, but I didn’t find the simple health
> check result – moreover, I don’t want the monitoring user to be able to run
> all the commands in this module.
>
> Dashboard is a dashboard so could not get health thru curl
>
>
>
> It seems it was possible with “ceph-rest-api”, but it looks like this tool
> is no longer available in ceph-common…
>
>
>
> Is there a simple way to get this? (without writing a Python mgr module,
> which would take a lot of time)
>
>
>
> Thank you
>
> Vincent
>
>
>
>


Re: [ceph-users] ceph osd commit latency increase over time, until restart

2019-01-30 Thread Alexandre DERUMIER
>>Thanks. Is there any reason you monitor op_w_latency but not 
>>op_r_latency but instead op_latency? 
>>
>>Also why do you monitor op_w_process_latency? but not op_r_process_latency? 

I monitor reads too (I have all metrics from the OSD admin sockets, and a lot of graphs).

I just don't see a latency difference on reads (or it is very, very small vs.
the write latency increase).



- Original message -
From: "Stefan Priebe, Profihost AG"
To: "aderumier"
Cc: "Sage Weil", "ceph-users", "ceph-devel"
Sent: Wednesday 30 January 2019 19:50:20
Subject: Re: [ceph-users] ceph osd commit latency increase over time, until restart

Hi, 

On 30.01.19 at 14:59, Alexandre DERUMIER wrote: 
> Hi Stefan, 
> 
>>> currently i'm in the process of switching back from jemalloc to tcmalloc 
>>> like suggested. This report makes me a little nervous about my change. 
> Well, I'm really not sure that it's a tcmalloc bug. 
> maybe bluestore related (don't have filestore anymore to compare) 
> I need to compare with bigger latencies 
> 
> here an example, when all osd at 20-50ms before restart, then after restart 
> (at 21:15), 1ms 
> http://odisoweb1.odiso.net/latencybad.png 
> 
> I observe the latency in my guest vm too, on disks iowait. 
> 
> http://odisoweb1.odiso.net/latencybadvm.png 
> 
>>> Also i'm currently only monitoring latency for filestore osds. Which 
>>> exact values out of the daemon do you use for bluestore? 
> 
> here my influxdb queries: 
> 
> It takes op_latency.sum/op_latency.avgcount over the last second. 
> 
> 
> SELECT non_negative_derivative(first("op_latency.sum"), 
> 1s)/non_negative_derivative(first("op_latency.avgcount"),1s) FROM "ceph" 
> WHERE "host" =~ /^([[host]])$/ AND "id" =~ /^([[osd]])$/ AND $timeFilter 
> GROUP BY time($interval), "host", "id" fill(previous) 
> 
> 
> SELECT non_negative_derivative(first("op_w_latency.sum"), 
> 1s)/non_negative_derivative(first("op_w_latency.avgcount"),1s) FROM "ceph" 
> WHERE "host" =~ /^([[host]])$/ AND collection='osd' AND "id" =~ /^([[osd]])$/ 
> AND $timeFilter GROUP BY time($interval), "host", "id" fill(previous) 
> 
> 
> SELECT non_negative_derivative(first("op_w_process_latency.sum"), 
> 1s)/non_negative_derivative(first("op_w_process_latency.avgcount"),1s) FROM 
> "ceph" WHERE "host" =~ /^([[host]])$/ AND collection='osd' AND "id" =~ 
> /^([[osd]])$/ AND $timeFilter GROUP BY time($interval), "host", "id" 
> fill(previous) 
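
(In other words: since sum and avgcount are lifetime counters,
derivative(sum)/derivative(avgcount) over an interval is the mean latency
of the ops completed in that interval, i.e. avg latency = Δsum/Δcount.)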

Thanks. Is there any reason you monitor op_w_latency but not 
op_r_latency but instead op_latency? 

Also why do you monitor op_w_process_latency? but not op_r_process_latency? 

greets, 
Stefan 

> 
> 
> 
> 
> 
> - Original message -
> From: "Stefan Priebe, Profihost AG"
> To: "aderumier", "Sage Weil"
> Cc: "ceph-users", "ceph-devel"
> Sent: Wednesday 30 January 2019 08:45:33
> Subject: Re: [ceph-users] ceph osd commit latency increase over time, until restart
> 
> Hi, 
> 
> On 30.01.19 at 08:33, Alexandre DERUMIER wrote: 
>> Hi, 
>> 
>> here some new results, 
>> different osd/ different cluster 
>> 
>> before osd restart latency was between 2-5ms 
>> after osd restart is around 1-1.5ms 
>> 
>> http://odisoweb1.odiso.net/cephperf2/bad.txt (2-5ms) 
>> http://odisoweb1.odiso.net/cephperf2/ok.txt (1-1.5ms) 
>> http://odisoweb1.odiso.net/cephperf2/diff.txt 
>> 
>> From what I see in diff, the biggest difference is in tcmalloc, but maybe 
>> I'm wrong. 
>> (I'm using tcmalloc 2.5-2.2) 
> 
> currently i'm in the process of switching back from jemalloc to tcmalloc 
> like suggested. This report makes me a little nervous about my change. 
> 
> Also i'm currently only monitoring latency for filestore osds. Which 
> exact values out of the daemon do you use for bluestore? 
> 
> I would like to check if i see the same behaviour. 
> 
> Greets, 
> Stefan 
> 
>> 
>> - Original message -
>> From: "Sage Weil"
>> To: "aderumier"
>> Cc: "ceph-users", "ceph-devel"
>> Sent: Friday 25 January 2019 10:49:02
>> Subject: Re: ceph osd commit latency increase over time, until restart
>> 
>> Can you capture a perf top or perf record to see where the CPU time is
>> going on one of the OSDs with a high latency?
>> 
>> Thanks! 
>> sage 
>> 
>> 
>> On Fri, 25 Jan 2019, Alexandre DERUMIER wrote: 
>> 
>>> 
>>> Hi, 
>>> 
>>> I have a strange behaviour of my osd, on multiple clusters, 
>>> 
>>> All cluster are running mimic 13.2.1,bluestore, with ssd or nvme drivers, 
>>> workload is rbd only, with qemu-kvm vms running with librbd + snapshot/rbd 
>>> export-diff/snapshotdelete each day for backup 
>>> 
>>> When the osd are refreshly started, the commit latency is between 0,5-1ms. 
>>> 
>>> But overtime, this latency increase slowly (maybe around 1ms by day), until 
>>> reaching crazy 
>>> values like 20-200ms. 
>>> 
>>> Some example graphs: 
>>> 
>>> http://odisoweb1.odiso.net/osdlatency1.png 
>>> http://odisoweb1.odiso.net/osdlatency2.png 
>>> 
>>> All osds have this behaviour, in all clusters. 
>>> 
>>> The latency of physical disks 

Re: [ceph-users] ceph osd commit latency increase over time, until restart

2019-01-30 Thread Alexandre DERUMIER
>>If it does, probably only by accident. :) The autotuner in master is 
>>pretty dumb and mostly just grows/shrinks the caches based on the 
>>default ratios but accounts for the memory needed for rocksdb 
>>indexes/filters. It will try to keep the total OSD memory consumption 
>>below the specified limit. It doesn't do anything smart like monitor 
>>whether or not large caches may introduce more latency than small 
>>caches. It actually adds a small amount of additional overhead in the 
>>mempool thread to perform the calculations. If you had a static 
>>workload and tuned the bluestore cache size and ratios perfectly it 
>>would only add extra (albeit fairly minimal with the default settings) 
>>computational cost.

Ok, thanks for the explain !



>>If perf isn't showing anything conclusive, you might try my wallclock 
>>profiler: http://github.com/markhpc/gdbpmp 

I'll try, thanks


>>Some other things to watch out for are CPUs switching C states 

for the CPU: C-states are disabled and the CPU always runs at max frequency
(intel_pstate=disable intel_idle.max_cstate=0 processor.max_cstate=1)


>>and the effect of having transparent huge pages enabled (though I'd be more 
>>concerned about this in terms of memory usage). 

cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
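
(If one wanted to rule THP out, it can be switched off at runtime for a
test; a sketch, not something verified in this thread:)

    echo never > /sys/kernel/mm/transparent_hugepage/enabled
    echo never > /sys/kernel/mm/transparent_hugepage/defrag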


(also, the server has only 1 socket, so no NUMA problem)

- Original message -
From: "Mark Nelson"
To: "ceph-users"
Sent: Wednesday 30 January 2019 18:08:08
Subject: Re: [ceph-users] ceph osd commit latency increase over time, until restart

On 1/30/19 7:45 AM, Alexandre DERUMIER wrote: 
>>> I don't see any smoking gun here... :/ 
> I need to test to compare when latency are going very high, but I need to 
> wait more days/weeks. 
> 
> 
>>> The main difference between a warm OSD and a cold one is that on startup 
>>> the bluestore cache is empty. You might try setting the bluestore cache 
>>> size to something much smaller and see if that has an effect on the CPU 
>>> utilization? 
> I will try to test. I also wonder if the new auto memory tuning from Mark 
> could help too ? 
> (I'm still on mimic 13.2.1, planning to update to 13.2.5 next month) 
> 
> also, could check some bluestore related counters ? (onodes, 
> rocksdb,bluestore cache) 


If it does, probably only by accident. :) The autotuner in master is 
pretty dumb and mostly just grows/shrinks the caches based on the 
default ratios but accounts for the memory needed for rocksdb 
indexes/filters. It will try to keep the total OSD memory consumption 
below the specified limit. It doesn't do anything smart like monitor 
whether or not large caches may introduce more latency than small 
caches. It actually adds a small amount of additional overhead in the 
mempool thread to perform the calculations. If you had a static 
workload and tuned the bluestore cache size and ratios perfectly it 
would only add extra (albeit fairly minimal with the default settings) 
computational cost. 


If perf isn't showing anything conclusive, you might try my wallclock 
profiler: http://github.com/markhpc/gdbpmp 


Some other things to watch out for are CPUs switching C states and the 
effect of having transparent huge pages enabled (though I'd be more 
concerned about this in terms of memory usage). 


Mark 


> 
>>> Note that this doesn't necessarily mean that's what you want. Maybe the 
>>> reason why the CPU utilization is higher is because the cache is warm and 
>>> the OSD is serving more requests per second... 
> Well, currently, the server is really quiet 
> 
> Device:  rrqm/s  wrqm/s  r/s    w/s      rkB/s   wkB/s     avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
> nvme0n1  2,00    515,00  48,00  1182,00  304,00  11216,00  18,73     0,01      0,00   0,00     0,00     0,01   1,20
> 
> %Cpu(s): 1,5 us, 1,0 sy, 0,0 ni, 97,2 id, 0,2 wa, 0,0 hi, 0,1 si, 0,0 st 
> 
> And this is only with writes, not reads 
> 
> 
> 
> - Original message -
> From: "Sage Weil"
> To: "aderumier"
> Cc: "ceph-users", "ceph-devel"
> Sent: Wednesday 30 January 2019 14:33:23
> Subject: Re: ceph osd commit latency increase over time, until restart
> 
> On Wed, 30 Jan 2019, Alexandre DERUMIER wrote: 
>> Hi, 
>> 
>> here some new results, 
>> different osd/ different cluster 
>> 
>> before osd restart latency was between 2-5ms 
>> after osd restart is around 1-1.5ms 
>> 
>> http://odisoweb1.odiso.net/cephperf2/bad.txt (2-5ms) 
>> http://odisoweb1.odiso.net/cephperf2/ok.txt (1-1.5ms) 
>> http://odisoweb1.odiso.net/cephperf2/diff.txt 
> I don't see any smoking gun here... :/ 
> 
> The main difference between a warm OSD and a cold one is that on startup 
> the bluestore cache is empty. You might try setting the bluestore cache 
> size to something much smaller and see if that has an effect on the CPU 
> utilization? 
> 
> Note that this doesn't necessarily mean that's what you want. Maybe the 
> reason why the CPU utilization is higher is because the cache is warm and 
> the OSD is serving more requests per second... 
> 
> 

Re: [ceph-users] ceph osd commit latency increase over time, until restart

2019-01-30 Thread Stefan Priebe - Profihost AG
Hi,

On 30.01.19 at 14:59, Alexandre DERUMIER wrote:
> Hi Stefan,
> 
>>> currently i'm in the process of switching back from jemalloc to tcmalloc 
>>> like suggested. This report makes me a little nervous about my change. 
> Well, I'm really not sure that it's a tcmalloc bug. 
> maybe bluestore related (don't have filestore anymore to compare)
> I need to compare with bigger latencies
> 
> here an example, when all osd at 20-50ms before restart, then after restart 
> (at 21:15), 1ms
> http://odisoweb1.odiso.net/latencybad.png
> 
> I observe the latency in my guest vm too, on disks iowait.
> 
> http://odisoweb1.odiso.net/latencybadvm.png
> 
>>> Also i'm currently only monitoring latency for filestore osds. Which
>>> exact values out of the daemon do you use for bluestore?
> 
> here my influxdb queries:
> 
> It takes op_latency.sum/op_latency.avgcount over the last second.
> 
> 
> SELECT non_negative_derivative(first("op_latency.sum"), 
> 1s)/non_negative_derivative(first("op_latency.avgcount"),1s)   FROM "ceph" 
> WHERE "host" =~  /^([[host]])$/  AND "id" =~ /^([[osd]])$/ AND $timeFilter 
> GROUP BY time($interval), "host", "id" fill(previous)
> 
> 
> SELECT non_negative_derivative(first("op_w_latency.sum"), 
> 1s)/non_negative_derivative(first("op_w_latency.avgcount"),1s)   FROM "ceph" 
> WHERE "host" =~ /^([[host]])$/  AND collection='osd'  AND  "id" =~ 
> /^([[osd]])$/ AND $timeFilter GROUP BY time($interval), "host", "id" 
> fill(previous)
> 
> 
> SELECT non_negative_derivative(first("op_w_process_latency.sum"), 
> 1s)/non_negative_derivative(first("op_w_process_latency.avgcount"),1s)   FROM 
> "ceph" WHERE "host" =~ /^([[host]])$/  AND collection='osd'  AND  "id" =~ 
> /^([[osd]])$/ AND $timeFilter GROUP BY time($interval), "host", "id" 
> fill(previous)

Thanks. Is there any reason you monitor op_w_latency but not
op_r_latency but instead op_latency?

Also why do you monitor op_w_process_latency? but not op_r_process_latency?

greets,
Stefan

> 
> 
> 
> 
> 
- Original message -
From: "Stefan Priebe, Profihost AG"
To: "aderumier", "Sage Weil"
Cc: "ceph-users", "ceph-devel"
Sent: Wednesday 30 January 2019 08:45:33
Subject: Re: [ceph-users] ceph osd commit latency increase over time, until restart
> 
> Hi, 
> 
> On 30.01.19 at 08:33, Alexandre DERUMIER wrote: 
>> Hi, 
>>
>> here some new results, 
>> different osd/ different cluster 
>>
>> before osd restart latency was between 2-5ms 
>> after osd restart is around 1-1.5ms 
>>
>> http://odisoweb1.odiso.net/cephperf2/bad.txt (2-5ms) 
>> http://odisoweb1.odiso.net/cephperf2/ok.txt (1-1.5ms) 
>> http://odisoweb1.odiso.net/cephperf2/diff.txt 
>>
>> From what I see in diff, the biggest difference is in tcmalloc, but maybe 
>> I'm wrong. 
>> (I'm using tcmalloc 2.5-2.2) 
> 
> currently i'm in the process of switching back from jemalloc to tcmalloc 
> like suggested. This report makes me a little nervous about my change. 
> 
> Also i'm currently only monitoring latency for filestore osds. Which 
> exact values out of the daemon do you use for bluestore? 
> 
> I would like to check if i see the same behaviour. 
> 
> Greets, 
> Stefan 
> 
>>
>> - Original message -
>> From: "Sage Weil"
>> To: "aderumier"
>> Cc: "ceph-users", "ceph-devel"
>> Sent: Friday 25 January 2019 10:49:02
>> Subject: Re: ceph osd commit latency increase over time, until restart
>>
>> Can you capture a perf top or perf record to see where the CPU time is
>> going on one of the OSDs with a high latency?
>>
>> Thanks! 
>> sage 
>>
>>
>> On Fri, 25 Jan 2019, Alexandre DERUMIER wrote: 
>>
>>>
>>> Hi, 
>>>
>>> I have a strange behaviour of my osd, on multiple clusters, 
>>>
>>> All cluster are running mimic 13.2.1,bluestore, with ssd or nvme drivers, 
>>> workload is rbd only, with qemu-kvm vms running with librbd + snapshot/rbd 
>>> export-diff/snapshotdelete each day for backup 
>>>
>>> When the osd are refreshly started, the commit latency is between 0,5-1ms. 
>>>
>>> But overtime, this latency increase slowly (maybe around 1ms by day), until 
>>> reaching crazy 
>>> values like 20-200ms. 
>>>
>>> Some example graphs: 
>>>
>>> http://odisoweb1.odiso.net/osdlatency1.png 
>>> http://odisoweb1.odiso.net/osdlatency2.png 
>>>
>>> All osds have this behaviour, in all clusters. 
>>>
>>> The latency of physical disks is ok. (Clusters are far to be full loaded) 
>>>
>>> And if I restart the osd, the latency come back to 0,5-1ms. 
>>>
>>> That's remember me old tcmalloc bug, but maybe could it be a bluestore 
>>> memory bug ? 
>>>
>>> Any Hints for counters/logs to check ? 
>>>
>>>
>>> Regards, 
>>>
>>> Alexandre 
>>>
>>>
>>
>>
> 


[ceph-users] CephFS performance vs. underlying storage

2019-01-30 Thread Hector Martin
Hi list,

I'm experimentally running single-host CephFS as as replacement for
"traditional" filesystems.

My setup is 8×8TB HDDs using dm-crypt, with CephFS on a 5+2 EC pool. All
of the components are running on the same host (mon/osd/mds/kernel
CephFS client). I've set the stripe_unit/object_size to a relatively
high 80MB (up from the default 4MB). I figure I want individual reads on
the disks to be several megabytes per object for good sequential
performance, and since this is an EC pool 4MB objects would be split
into 800kB chunks, which is clearly not ideal. With 80MB objects, chunks
are 16MB, which sounds more like a healthy read size for sequential
access (e.g. something like 10 IOPS per disk during seq reads).
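
(Spelling out the chunk arithmetic: on a k=5, m=2 pool each object is split
into k data chunks, so chunk size = object_size / k; 4 MB / 5 = 800 kB,
while 80 MB / 5 = 16 MB.)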

With this config, I get about 270MB/s sequential from CephFS. On the
same disks, an ext4 on dm-crypt on dm-raid6 yields ~680MB/s. So it seems
Ceph achieves less than half of the raw performance that the underlying
storage is capable of (with similar RAID redundancy). *

Obviously there will be some overhead with a stack as deep as Ceph
compared to more traditional setups, but I'm wondering if there are
improvements to be had here. While reading from CephFS I do not have
significant CPU usage, so I don't think I'm CPU limited. Could the issue
perhaps be latency through the stack / lack of read-ahead? Reading two
files in parallel doesn't really get me more than 300MB/s in total, so
parallelism doesn't seem to help much.

I'm curious as to whether there are any knobs I can play with to try to
improve performance, or whether this level of overhead is pretty much
inherent to Ceph. Even though this is an unusual single-host setup, I
imagine proper clusters might also have similar results when comparing
raw storage performance.
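
One knob that may be worth trying (an untested suggestion, not something
from this thread) is the kernel client's readahead window, set via the
rasize mount option in bytes:

    mount -t ceph <mon-host>:6789:/ /mnt/cephfs -o name=admin,rasize=134217728   # 128 MiB readahead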

* Ceph has a slight disadvantage here because its chunk of the drives is
logically after the traditional RAID, and HDDs get slower towards higher
logical addresses, but this should be on the order of a 15-20% hit at most.

-- 
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub


Re: [ceph-users] Simple API to have cluster healthcheck ?

2019-01-30 Thread PHARABOT Vincent
Hello

Thanks for the info
But, nope, on Mimic (13.2.4) /api/health ends in 404 (/api/health/full, 
/api/health/minimal also...)

Vincent

-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On behalf of Lenz Grimmer
Sent: Wednesday, 30 January 2019 16:26
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Simple API to have cluster healthcheck ?

Hi,

On 1/30/19 2:02 PM, PHARABOT Vincent wrote:

> I have my cluster set up correctly now (thank you again for the help)

What version of Ceph is this?

> I am seeking now a way to get cluster health thru API (REST) with curl
> command.
>
> I had a look at manager / RESTful and Dashboard but none seems to
> provide simple way to get cluster health
>
> RESTful module do a lot of things but I didn’t find the simple health
> check result – moreover I don’t want monitoring user to be able to do
> all the command in this module.
>
> Dashboard is a dashboard so could not get health thru curl

Hmm, the Mimic dashboard's REST API should expose an "/api/health"
endpoint. Have you tried that one?

For Nautilus, this seems to have been split into /api/health/full and 
/api/health/minimal, to reduce the overhead.

Lenz

--
SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany) GF:Felix 
Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)



Re: [ceph-users] Right way to delete OSD from cluster?

2019-01-30 Thread Scottix
I generally have gone the crush reweight 0 route. This way the drive can
participate in the rebalance, and the rebalance only happens once. Then
you can take it out and purge.

If I am not mistaken this is the safest.

ceph osd crush reweight osd.<id> 0
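
Spelled out as a sketch (the osd id is a placeholder, and on Luminous and
later purge asks for a confirmation flag):

    ceph osd crush reweight osd.<id> 0           # data drains off in a single rebalance
    # wait until all PGs are active+clean again
    ceph osd out <id>
    ceph osd purge <id> --yes-i-really-mean-it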

On Wed, Jan 30, 2019 at 7:45 AM Fyodor Ustinov  wrote:
>
> Hi!
>
> But won't I get undersized objects after "ceph osd crush remove"? That is,
> isn't this the same thing as simply turning off the OSD and waiting for the
> cluster to recover?
>
> - Original Message -
> From: "Wido den Hollander" 
> To: "Fyodor Ustinov" , "ceph-users" 
> Sent: Wednesday, 30 January, 2019 15:05:35
> Subject: Re: [ceph-users] Right way to delete OSD from cluster?
>
> On 1/30/19 2:00 PM, Fyodor Ustinov wrote:
> > Hi!
> >
> > I thought I should first do "ceph osd out", wait for the relocation of
> > the misplaced objects to finish, and after that do "ceph osd purge".
> > But after "purge" the cluster starts relocating again.
> >
> > Maybe I'm doing something wrong? Then what is the correct way to delete the 
> > OSD from the cluster?
> >
>
> You are not doing anything wrong, this is the expected behavior. There
> are two CRUSH changes:
>
> - Marking it out
> - Purging it
>
> You could do:
>
> $ ceph osd crush remove osd.X
>
> Wait for all good
>
> $ ceph osd purge X
>
> The last step should then not initiate any data movement.
>
> Wido
>
> > WBR,
> > Fyodor.
> >



-- 
T: @Thaumion
IG: Thaumion
scot...@gmail.com


Re: [ceph-users] Multisite Ceph setup sync issue

2019-01-30 Thread Krishna Verma
Hi Amit,

Still the same; please see the output below. Is there anything else I can try?

[cephuser@zabbix-client ~]$ radosgw-admin period update --commit 2>/dev/null
Sending period to new master zone 71931e0e-1be6-449f-af34-edb4166c4e4a
[cephuser@zabbix-client ~]$ sudo systemctl start ceph-radosgw@rgw.`hostname -s`
[cephuser@zabbix-client ~]$ sudo systemctl restart ceph-radosgw@rgw.`hostname 
-s`
[cephuser@zabbix-client ~]$ sudo systemctl status ceph-radosgw@rgw.`hostname -s`
● ceph-radosgw@rgw.zabbix-client.service - Ceph rados gateway
   Loaded: loaded (/usr/lib/systemd/system/ceph-radosgw@.service; enabled; 
vendor preset: disabled)
   Active: active (running) since Wed 2019-01-30 22:38:15 IST; 7s ago
Main PID: 8234 (radosgw)
   CGroup: 
/system.slice/system-ceph\x2dradosgw.slice/ceph-radosgw@rgw.zabbix-client.service
   └─8234 /usr/bin/radosgw -f --cluster ceph --name 
client.rgw.zabbix-client --setuser ceph --setgroup ceph

Jan 30 22:38:15 zabbix-client systemd[1]: Started Ceph rados gateway.
[cephuser@zabbix-client ~]$ radosgw-admin sync status --source-zone  noida1 
2>/dev/null
  realm 1102c891-d81c-480e-9487-c9f874287d13 (georep)
  zonegroup 74ad391b-fbca-4c05-b9e7-c90fd4851223 (noida)
   zone 45c690a8-f39c-4b1d-9faf-e0e991ceaaac (san-jose)
  metadata sync failed to read sync status: (2) No such file or directory
  data sync source: 71931e0e-1be6-449f-af34-edb4166c4e4a (noida1)
failed to retrieve sync info: (5) Input/output error
[cephuser@zabbix-client ~]$

/Krishna

From: Amit Ghadge 
Sent: Wednesday, January 30, 2019 9:17 PM
To: Krishna Verma 
Cc: Casey Bodley ; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Multisite Ceph setup sync issue

EXTERNAL MAIL
Have you committed your changes on the slave gateway?
First run the commit command on the slave gateway, then try again.
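
(For reference, the usual sequence on the secondary zone looks roughly like
this; the URL and keys are placeholders:)

    radosgw-admin period pull --url=http://<master-gw>:7480 --access-key=<key> --secret=<secret>
    radosgw-admin period update --commit
    systemctl restart ceph-radosgw@rgw.$(hostname -s)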

-AmitG
On Wed, 30 Jan 2019, 21:06 Krishna Verma wrote:
Hi Casey,

Thanks for your reply. However, I tried the "--source-zone" option with the
sync command but I get the errors below:

Sync status from the slave gateway to master zone "noida1":

[cephuser@zabbix-client ~]$ radosgw-admin sync status --source-zone  noida1 
2>/dev/null
  realm 1102c891-d81c-480e-9487-c9f874287d13 (georep)
  zonegroup 74ad391b-fbca-4c05-b9e7-c90fd4851223 (noida)
   zone 45c690a8-f39c-4b1d-9faf-e0e991ceaaac (san-jose)
  metadata sync failed to read sync status: (2) No such file or directory
  data sync source: 71931e0e-1be6-449f-af34-edb4166c4e4a (noida1)
failed to retrieve sync info: (5) Input/output error
[cephuser@zabbix-client ~]$

Sync status from the master gateway to slave zone "san-jose":

[cephuser@zabbix-server ~]$  radosgw-admin sync status --source-zone  san-jose 
2>/dev/null
  realm 1102c891-d81c-480e-9487-c9f874287d13 (georep)
  zonegroup 74ad391b-fbca-4c05-b9e7-c90fd4851223 (noida)
   zone 71931e0e-1be6-449f-af34-edb4166c4e4a (noida1)
  metadata sync no sync (zone is master)
[cephuser@zabbix-server ~]$

Zone detail from master gateway:

[cephuser@zabbix-server ~]$ radosgw-admin zonegroup get  2>/dev/null
{
"id": "74ad391b-fbca-4c05-b9e7-c90fd4851223",
"name": "noida",
"api_name": "noida",
"is_master": "true",
"endpoints": [
"http:\/\/zabbix-server:7480"
],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "71931e0e-1be6-449f-af34-edb4166c4e4a",
"zones": [
{
"id": "71931e0e-1be6-449f-af34-edb4166c4e4a",
"name": "noida1",
"endpoints": [
"http:\/\/vlno-ceph01:7480"
],
"log_meta": "false",
"log_data": "false",
"bucket_index_max_shards": 0,
"read_only": "false"
}
],
"placement_targets": [
{
"name": "default-placement",
"tags": []
}
],
"default_placement": "default-placement",
"realm_id": "1102c891-d81c-480e-9487-c9f874287d13"
}

[cephuser@zabbix-server ~]$


Zone detail from slave gateway:

[cephuser@zabbix-client ~]$ radosgw-admin zonegroup get  2>/dev/null
{
"id": "74ad391b-fbca-4c05-b9e7-c90fd4851223",
"name": "noida",
"api_name": "noida",
"is_master": "true",
"endpoints": [
"http:\/\/zabbix-server:7480"
],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "71931e0e-1be6-449f-af34-edb4166c4e4a",
"zones": [
{
"id": "45c690a8-f39c-4b1d-9faf-e0e991ceaaac",
"name": "san-jose",
"endpoints": [
"http:\/\/zabbix-client:7480"
],
"log_meta": "false",
"log_data": "true",
"bucket_index_max_shards": 0,
"read_only": "false"
},
{
"id": "71931e0e-1be6-449f-af34-edb4166c4e4a",
"name": "noida1",
"endpoints": [
"http:\/\/vlno-ceph01:7480"
  

Re: [ceph-users] CEPH_FSAL Nfs-ganesha

2019-01-30 Thread David C
Hi Patrick

Thanks for the info. If I did multiple exports, how does that work in terms
of the cache settings defined in ceph.conf? Are those settings per CephFS
client or a shared cache? I.e. if I've defined client_oc_size, would that
be per export?
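
(For anyone following along, a minimal two-export sketch of what "one
libcephfs client per export" means in ganesha.conf; paths and IDs are made
up:)

    EXPORT {
        Export_Id = 1;
        Path = /dir1;
        Pseudo = /dir1;
        Access_Type = RW;
        FSAL { Name = CEPH; }    # instantiates its own libcephfs client
    }
    EXPORT {
        Export_Id = 2;
        Path = /dir2;
        Pseudo = /dir2;
        Access_Type = RW;
        FSAL { Name = CEPH; }    # a second, independent client instance
    }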

Cheers,

On Tue, Jan 15, 2019 at 6:47 PM Patrick Donnelly wrote:

> On Mon, Jan 14, 2019 at 7:11 AM Daniel Gryniewicz  wrote:
> >
> > Hi.  Welcome to the community.
> >
> > On 01/14/2019 07:56 AM, David C wrote:
> > > Hi All
> > >
> > > I've been playing around with the nfs-ganesha 2.7 exporting a cephfs
> > > filesystem, it seems to be working pretty well so far. A few questions:
> > >
> > > 1) The docs say " For each NFS-Ganesha export, FSAL_CEPH uses a
> > > libcephfs client,..." [1]. For arguments sake, if I have ten top level
> > > dirs in my Cephfs namespace, is there any value in creating a separate
> > > export for each directory? Will that potentially give me better
> > > performance than a single export of the entire namespace?
> >
> > I don't believe there are any advantages from the Ceph side.  From the
> > Ganesha side, you configure permissions, client ACLs, squashing, and so
> > on on a per-export basis, so you'll need different exports if you need
> > different settings for each top level directory.  If they can all use
> > the same settings, one export is probably better.
>
> There may be performance impact (good or bad) with having separate
> exports for CephFS. Each export instantiates a separate instance of
> the CephFS client which has its own bookkeeping and set of
> capabilities issued by the MDS. Also, each client instance has a
> separate big lock (potentially a big deal for performance). If the
> data for each export is disjoint (no hard links or shared inodes) and
> the NFS server is expected to have a lot of load, breaking out the
> exports can have a positive impact on performance. If there are hard
> links, then the clients associated with the exports will potentially
> fight over capabilities, which will add to request latency.
>
> --
> Patrick Donnelly
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd commit latency increase over time, until restart

2019-01-30 Thread Mark Nelson


On 1/30/19 7:45 AM, Alexandre DERUMIER wrote:

I don't see any smoking gun here... :/

I need to test and compare when the latencies go very high, but I will need to
wait more days/weeks.



The main difference between a warm OSD and a cold one is that on startup
the bluestore cache is empty. You might try setting the bluestore cache
size to something much smaller and see if that has an effect on the CPU
utilization?

I will try to test. I also wonder if the new auto memory tuning from Mark could
help too?
(I'm still on mimic 13.2.1, planning to update to 13.2.5 next month)

Also, could I check some bluestore-related counters? (onodes, rocksdb, bluestore
cache)



If it does, probably only by accident. :)  The autotuner in master is 
pretty dumb and mostly just grows/shrinks the caches based on the 
default ratios but accounts for the memory needed for rocksdb 
indexes/filters.  It will try to keep the total OSD memory consumption 
below the specified limit.  It doesn't do anything smart like monitor 
whether or not large caches may introduce more latency than small 
caches.  It actually adds a small amount of additional overhead in the 
mempool thread to perform the calculations.  If you had a static 
workload and tuned the bluestore cache size and ratios perfectly it 
would only add extra (albeit fairly minimal with the default settings) 
computational cost.
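
(For reference, a hedged sketch of what turning this on looks like, assuming a
version that ships the autotuner; the 4 GiB target is purely illustrative:)

[osd]
osd_memory_target = 4294967296       # total OSD memory the tuner aims to stay under
bluestore_cache_autotune = true      # let the OSD grow/shrink its caches itself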



If perf isn't showing anything conclusive, you might try my wallclock 
profiler: http://github.com/markhpc/gdbpmp



Some other things to watch out for are CPUs switching C states and the 
effect of having transparent huge pages enabled (though I'd be more 
concerned about this in terms of memory usage).



Mark





Note that this doesn't necessarily mean that's what you want. Maybe the
reason why the CPU utilization is higher is because the cache is warm and
the OSD is serving more requests per second...

Well, currently, the server is really quiet

Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
avgqu-sz   await r_await w_await  svctm  %util
nvme0n1   2,00   515,00   48,00 1182,00   304,00 11216,0018,73 
0,010,000,000,00   0,01   1,20

%Cpu(s):  1,5 us,  1,0 sy,  0,0 ni, 97,2 id,  0,2 wa,  0,0 hi,  0,1 si,  0,0 st

And this is only with writes, not reads



- Original Message -
From: "Sage Weil" 
To: "aderumier" 
Cc: "ceph-users" , "ceph-devel" 

Sent: Wednesday 30 January 2019 14:33:23
Subject: Re: ceph osd commit latency increase over time, until restart

On Wed, 30 Jan 2019, Alexandre DERUMIER wrote:

Hi,

here some new results,
different osd/ different cluster

before osd restart latency was between 2-5ms
after osd restart is around 1-1.5ms

http://odisoweb1.odiso.net/cephperf2/bad.txt (2-5ms)
http://odisoweb1.odiso.net/cephperf2/ok.txt (1-1.5ms)
http://odisoweb1.odiso.net/cephperf2/diff.txt

I don't see any smoking gun here... :/

The main difference between a warm OSD and a cold one is that on startup
the bluestore cache is empty. You might try setting the bluestore cache
size to something much smaller and see if that has an effect on the CPU
utilization?

Note that this doesn't necessarily mean that's what you want. Maybe the
reason why the CPU utilization is higher is because the cache is warm and
the OSD is serving more requests per second...

sage



From what I see in the diff, the biggest difference is in tcmalloc, but maybe I'm wrong.


(I'm using tcmalloc 2.5-2.2)


- Original Message -
From: "Sage Weil" 
To: "aderumier" 
Cc: "ceph-users" , "ceph-devel" 

Sent: Friday 25 January 2019 10:49:02
Subject: Re: ceph osd commit latency increase over time, until restart

Can you capture a perf top or perf record to see where the CPU time is
going on one of the OSDs with a high latency?
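
(For example, a hedged sketch against a single OSD process; the pid lookup is
illustrative:)

$ perf top -p $(pidof -s ceph-osd)                    # live view of hot symbols
$ perf record -g -p $(pidof -s ceph-osd) -- sleep 30  # 30s call-graph sample
$ perf report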

Thanks!
sage


On Fri, 25 Jan 2019, Alexandre DERUMIER wrote:


Hi,

I have a strange behaviour of my OSDs, on multiple clusters.

All clusters are running mimic 13.2.1, bluestore, with SSD or NVMe drives;
the workload is rbd only, with qemu-kvm VMs running with librbd + snapshot/rbd
export-diff/snapshot delete each day for backup.

When the OSDs are freshly started, the commit latency is between 0.5-1ms.

But over time, this latency increases slowly (maybe around 1ms per day), until
it reaches crazy values like 20-200ms.

Some example graphs:

http://odisoweb1.odiso.net/osdlatency1.png
http://odisoweb1.odiso.net/osdlatency2.png

All OSDs show this behaviour, in all clusters.

The latency of the physical disks is fine. (The clusters are far from fully
loaded.)

And if I restart the OSD, the latency comes back to 0.5-1ms.

That reminds me of the old tcmalloc bug, but maybe it could be a bluestore
memory bug?

Any hints for counters/logs to check?


Regards,

Alexandre






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list

Re: [ceph-users] Cluster Status:HEALTH_ERR for Full OSD

2019-01-30 Thread Amit Ghadge
A better way is to increase "osd set-full-ratio" slightly (to 0.97) and then
remove the buckets.
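
(A hedged sketch of that sequence; the ratio values and bucket name are
illustrative:)

$ ceph osd dump | grep full_ratio               # check the current ratios first
$ ceph osd set-full-ratio 0.97                  # lift the hard limit slightly
$ radosgw-admin bucket rm --bucket=big-bucket --purge-objects   # hypothetical bucket
$ ceph osd set-full-ratio 0.95                  # restore the default once space is freed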

-AmitG

On Wed, 30 Jan 2019, 21:30 Paul Emmerich,  wrote:

> Quick and dirty solution: take the full OSD down to issue the deletion
> command ;)
>
> Better solutions: temporarily increase the full limit (ceph osd
> set-full-ratio) or reduce the OSD's reweight (ceph osd reweight)
>
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Wed, Jan 30, 2019 at 11:56 AM Fabio - NS3 srl  wrote:
> >
> > Hello guys,
> > I have a Ceph cluster with a full S3 store:
> >
> > ~# ceph health detail
> > HEALTH_ERR 1 full osd(s); 1 near full osd(s)
> > osd.2 is full at 95%
> > osd.5 is near full at 85%
> >
> >
> > I want to delete some buckets, but when I tried to list them
> >
> >
> > ~# radosgw-admin bucket list
> > 2019-01-30 11:41:47.933621 7f467a9d0780  0 client.3967227.objecter
> FULL, paused modify 0x2aaf410 tid 8
> >
> > the command remains blocked... no prompt returns.
> >
> > Are there solutions other than adding an OSD?
> >
> > Many thank
> > --
> > Fabio
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cluster Status:HEALTH_ERR for Full OSD

2019-01-30 Thread Paul Emmerich
Quick and dirty solution: take the full OSD down to issue the deletion
command ;)

Better solutions: temporarily incrase the full limit (ceph osd
set-full-ratio) or reduce the OSD's reweight (ceph osd reweight)


Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Wed, Jan 30, 2019 at 11:56 AM Fabio - NS3 srl  wrote:
>
> Hello guys,
> I have a Ceph cluster with a full S3 store:
>
> ~# ceph health detail
> HEALTH_ERR 1 full osd(s); 1 near full osd(s)
> osd.2 is full at 95%
> osd.5 is near full at 85%
>
>
> I want to delete some buckets, but when I tried to list them
>
>
> ~# radosgw-admin bucket list
> 2019-01-30 11:41:47.933621 7f467a9d0780  0 client.3967227.objecter  FULL, 
> paused modify 0x2aaf410 tid 8
>
> the command remains blocked... no prompt returns.
>
> Are there solutions other than adding an OSD?
>
> Many thank
> --
> Fabio
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multisite Ceph setup sync issue

2019-01-30 Thread Amit Ghadge
Have you committed your changes on the slave gateway?
First, run the commit command on the slave gateway and then try again.
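
(A hedged sketch of that sequence, run on the secondary zone's gateway host;
the systemd unit name is a placeholder for your rgw instance:)

$ radosgw-admin period update --commit
$ sudo systemctl restart ceph-radosgw@rgw.zabbix-client   # placeholder unit name
$ radosgw-admin sync status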

-AmitG

On Wed, 30 Jan 2019, 21:06 Krishna Verma,  wrote:

> Hi Casey,
>
> Thanks for your reply. However, I tried the "--source-zone" option with the
> sync command but am getting the error below:
>
> Sync status From slave gateway to master zone "noida1"
>
> [cephuser@zabbix-client ~]$ radosgw-admin sync status --source-zone
> noida1 2>/dev/null
>   realm 1102c891-d81c-480e-9487-c9f874287d13 (georep)
>   zonegroup 74ad391b-fbca-4c05-b9e7-c90fd4851223 (noida)
>zone 45c690a8-f39c-4b1d-9faf-e0e991ceaaac (san-jose)
>   metadata sync failed to read sync status: (2) No such file or directory
>   data sync source: 71931e0e-1be6-449f-af34-edb4166c4e4a (noida1)
> failed to retrieve sync info: (5) Input/output
> error
> [cephuser@zabbix-client ~]$
>
> Sync status From Master Gateway to slave zone " san-jose":
>
> [cephuser@zabbix-server ~]$  radosgw-admin sync status --source-zone
> san-jose 2>/dev/null
>   realm 1102c891-d81c-480e-9487-c9f874287d13 (georep)
>   zonegroup 74ad391b-fbca-4c05-b9e7-c90fd4851223 (noida)
>zone 71931e0e-1be6-449f-af34-edb4166c4e4a (noida1)
>   metadata sync no sync (zone is master)
> [cephuser@zabbix-server ~]$
>
> Zone detail from master gateway :
>
> [cephuser@zabbix-server ~]$ radosgw-admin zonegroup get  2>/dev/null
> {
> "id": "74ad391b-fbca-4c05-b9e7-c90fd4851223",
> "name": "noida",
> "api_name": "noida",
> "is_master": "true",
> "endpoints": [
> "http:\/\/zabbix-server:7480"
> ],
> "hostnames": [],
> "hostnames_s3website": [],
> "master_zone": "71931e0e-1be6-449f-af34-edb4166c4e4a",
> "zones": [
> {
> "id": "71931e0e-1be6-449f-af34-edb4166c4e4a",
> "name": "noida1",
> "endpoints": [
> "http:\/\/vlno-ceph01:7480"
> ],
> "log_meta": "false",
> "log_data": "false",
> "bucket_index_max_shards": 0,
> "read_only": "false"
> }
> ],
> "placement_targets": [
> {
> "name": "default-placement",
> "tags": []
> }
> ],
> "default_placement": "default-placement",
> "realm_id": "1102c891-d81c-480e-9487-c9f874287d13"
> }
>
> [cephuser@zabbix-server ~]$
>
>
> Zone detail from slave gateway:
>
> [cephuser@zabbix-client ~]$ radosgw-admin zonegroup get  2>/dev/null
> {
> "id": "74ad391b-fbca-4c05-b9e7-c90fd4851223",
> "name": "noida",
> "api_name": "noida",
> "is_master": "true",
> "endpoints": [
> "http:\/\/zabbix-server:7480"
> ],
> "hostnames": [],
> "hostnames_s3website": [],
> "master_zone": "71931e0e-1be6-449f-af34-edb4166c4e4a",
> "zones": [
> {
> "id": "45c690a8-f39c-4b1d-9faf-e0e991ceaaac",
> "name": "san-jose",
> "endpoints": [
> "http:\/\/zabbix-client:7480"
> ],
> "log_meta": "false",
> "log_data": "true",
> "bucket_index_max_shards": 0,
> "read_only": "false"
> },
> {
> "id": "71931e0e-1be6-449f-af34-edb4166c4e4a",
> "name": "noida1",
> "endpoints": [
> "http:\/\/vlno-ceph01:7480"
> ],
> "log_meta": "false",
> "log_data": "true",
> "bucket_index_max_shards": 0,
> "read_only": "false"
> }
> ],
> "placement_targets": [
> {
> "name": "default-placement",
> "tags": []
> }
> ],
> "default_placement": "default-placement",
> "realm_id": "1102c891-d81c-480e-9487-c9f874287d13"
> }
>
> [cephuser@zabbix-client ~]
>
> I need your expert advice.
>
> /Krishna
>
> -Original Message-
> From: Casey Bodley 
> Sent: Wednesday, January 30, 2019 1:54 AM
> To: Krishna Verma 
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] Multisite Ceph setup sync issue
>
>
> On Tue, Jan 29, 2019 at 12:24 PM Krishna Verma  wrote:
> >
> > Hi Ceph Users,
> >
> >
> >
> > I need your help to fix a sync issue in a multisite setup.
> >
> >
> >
> > I have 2 clusters in different datacenters that we want to use for
> > bidirectional data replication. Following the documentation at
> > http://docs.ceph.com/docs/master/radosgw/multisite/
> > I have set up the gateway on each site, but when I check the sync
> > status it fails as below:
> >
> >
> >
> > Admin node at master :
> >
> > [cephuser@vlno-ceph01 cluster]$ radosgw-admin data sync status
> >
> > ERROR: source zone not specified
> >
> > [cephuser@vlno-ceph01 cluster]$ 

Re: [ceph-users] Multisite Ceph setup sync issue

2019-01-30 Thread Krishna Verma
Hi Casey,

Thanks for your reply. However, I tried the "--source-zone" option with the
sync command but am getting the error below:

Sync status from the slave gateway to master zone "noida1":

[cephuser@zabbix-client ~]$ radosgw-admin sync status --source-zone  noida1 
2>/dev/null
  realm 1102c891-d81c-480e-9487-c9f874287d13 (georep)
  zonegroup 74ad391b-fbca-4c05-b9e7-c90fd4851223 (noida)
   zone 45c690a8-f39c-4b1d-9faf-e0e991ceaaac (san-jose)
  metadata sync failed to read sync status: (2) No such file or directory
  data sync source: 71931e0e-1be6-449f-af34-edb4166c4e4a (noida1)
failed to retrieve sync info: (5) Input/output error
[cephuser@zabbix-client ~]$

Sync status from the master gateway to slave zone "san-jose":

[cephuser@zabbix-server ~]$  radosgw-admin sync status --source-zone  san-jose 
2>/dev/null
  realm 1102c891-d81c-480e-9487-c9f874287d13 (georep)
  zonegroup 74ad391b-fbca-4c05-b9e7-c90fd4851223 (noida)
   zone 71931e0e-1be6-449f-af34-edb4166c4e4a (noida1)
  metadata sync no sync (zone is master)
[cephuser@zabbix-server ~]$

Zone detail from the master gateway:

[cephuser@zabbix-server ~]$ radosgw-admin zonegroup get  2>/dev/null
{
"id": "74ad391b-fbca-4c05-b9e7-c90fd4851223",
"name": "noida",
"api_name": "noida",
"is_master": "true",
"endpoints": [
"http:\/\/zabbix-server:7480"
],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "71931e0e-1be6-449f-af34-edb4166c4e4a",
"zones": [
{
"id": "71931e0e-1be6-449f-af34-edb4166c4e4a",
"name": "noida1",
"endpoints": [
"http:\/\/vlno-ceph01:7480"
],
"log_meta": "false",
"log_data": "false",
"bucket_index_max_shards": 0,
"read_only": "false"
}
],
"placement_targets": [
{
"name": "default-placement",
"tags": []
}
],
"default_placement": "default-placement",
"realm_id": "1102c891-d81c-480e-9487-c9f874287d13"
}

[cephuser@zabbix-server ~]$


Zone detail from slave gateway: 

[cephuser@zabbix-client ~]$ radosgw-admin zonegroup get  2>/dev/null
{
"id": "74ad391b-fbca-4c05-b9e7-c90fd4851223",
"name": "noida",
"api_name": "noida",
"is_master": "true",
"endpoints": [
"http:\/\/zabbix-server:7480"
],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "71931e0e-1be6-449f-af34-edb4166c4e4a",
"zones": [
{
"id": "45c690a8-f39c-4b1d-9faf-e0e991ceaaac",
"name": "san-jose",
"endpoints": [
"http:\/\/zabbix-client:7480"
],
"log_meta": "false",
"log_data": "true",
"bucket_index_max_shards": 0,
"read_only": "false"
},
{
"id": "71931e0e-1be6-449f-af34-edb4166c4e4a",
"name": "noida1",
"endpoints": [
"http:\/\/vlno-ceph01:7480"
],
"log_meta": "false",
"log_data": "true",
"bucket_index_max_shards": 0,
"read_only": "false"
}
],
"placement_targets": [
{
"name": "default-placement",
"tags": []
}
],
"default_placement": "default-placement",
"realm_id": "1102c891-d81c-480e-9487-c9f874287d13"
}

[cephuser@zabbix-client ~]

I need your expert advice. 

/Krishna

-Original Message-
From: Casey Bodley  
Sent: Wednesday, January 30, 2019 1:54 AM
To: Krishna Verma 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Multisite Ceph setup sync issue



On Tue, Jan 29, 2019 at 12:24 PM Krishna Verma  wrote:
>
> Hi Ceph Users,
>
>
>
> I need your help to fix a sync issue in a multisite setup.
>
>
>
> I have 2 clusters in different datacenters that we want to use for
> bidirectional data replication. Following the documentation at
> http://docs.ceph.com/docs/master/radosgw/multisite/
> I have set up the gateway on each site, but when I check the sync status
> it fails as below:
>
>
>
> Admin node at master :
>
> [cephuser@vlno-ceph01 cluster]$ radosgw-admin data sync status
>
> ERROR: source zone not specified
>
> [cephuser@vlno-ceph01 cluster]$ radosgw-admin realm list
>
> {
>
> "default_info": "1102c891-d81c-480e-9487-c9f874287d13",
>
> "realms": [
>
> "georep",
>
> "geodata"
>
> ]
>
> }
>
>
>
> [cephuser@vlno-ceph01 cluster]$ radosgw-admin zonegroup list
>
> read_default_id : 0
>
> {
>
> "default_info": "74ad391b-fbca-4c05-b9e7-c90fd4851223",
>
> "zonegroups": [
>
> "noida"
>
> ]
>
> }
>
>
>
> [cephuser@vlno-ceph01 cluster]$ 

Re: [ceph-users] Simple API to have cluster healthcheck ?

2019-01-30 Thread Lenz Grimmer
Hi,

On 1/30/19 2:02 PM, PHARABOT Vincent wrote:

> I have my cluster set up correctly now (thank you again for the help)

What version of Ceph is this?

> I am seeking now a way to get cluster health thru API (REST) with curl
> command.
> 
> I had a look at manager / RESTful and Dashboard but none seems to
> provide simple way to get cluster health
> 
> RESTful module do a lot of things but I didn’t find the simple health
> check result – moreover I don’t want monitoring user to be able to do
> all the command in this module.
> 
> Dashboard is a dashboard so could not get health thru curl

Hmm, the Mimic dashboard's REST API should expose an "/api/health"
endpoint. Have you tried that one?

For Nautilus, this seems to have been split into /api/health/full and
/api/health/minimal, to reduce the overhead.
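
(A hedged sketch, assuming the dashboard module is enabled on the active mgr
and listening on the default SSL port 8443; the hostname is a placeholder,
and depending on the version you may have to authenticate via /api/auth
first:)

$ ceph mgr module enable dashboard
$ curl -k -s https://ceph-mgr-node:8443/api/health    # /api/health/minimal on Nautilus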

Lenz

-- 
SUSE Linux GmbH - Maxfeldstr. 5 - 90409 Nuernberg (Germany)
GF:Felix Imendörffer,Jane Smithard,Graham Norton,HRB 21284 (AG Nürnberg)



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph block - volume with RAID#0

2019-01-30 Thread Janne Johansson
On Wed 30 Jan 2019 at 14:47, M Ranga Swami Reddy <
swamire...@gmail.com> wrote:

> Hello - Can I use the ceph block volume with RAID#0? Are there any
> issues with this?
>

Hard to tell if you mean raid0 over a block volume or a block volume over
raid0. Still, it is seldom a good idea to stack redundancies on top of each
other.
It will work, but may not give the gains you might expect from it.
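
If striping for throughput is the goal, note that RBD can also stripe natively
across objects, without a RAID layer in the guest (a hedged sketch; the pool,
image name and stripe parameters are placeholders):

$ rbd create rbd/striped-vol --size 100G --stripe-unit 4M --stripe-count 4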

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Simple API to have cluster healthcheck ?

2019-01-30 Thread PHARABOT Vincent
Hi

Yes it could do the job in the meantime

Thank you !
Vincent

-Original Message-
From: Alexandru Cucu [mailto:m...@alexcucu.ro]
Sent: Wednesday 30 January 2019 14:31
To: PHARABOT Vincent 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Simple API to have cluster healthcheck ?

Hello,

Not exactly what you were looking for, but you could use the Prometheus plugin 
for ceph-mgr and get the health status from the metrics.

curl -s http://ceph-mgr-node:9283/metrics | grep ^ceph_health_status


On Wed, Jan 30, 2019 at 3:04 PM PHARABOT Vincent  
wrote:
>
> Hello,
>
>
>
> I have my cluster set up correctly now (thank you again for the help)
>
>
>
> I am seeking now a way to get cluster health thru API (REST) with curl 
> command.
>
> I had a look at manager / RESTful and Dashboard but none seems to
> provide simple way to get cluster health
>
> RESTful module do a lot of things but I didn’t find the simple health check 
> result – moreover I don’t want monitoring user to be able to do all the 
> command in this module.
>
> Dashboard is a dashboard so could not get health thru curl
>
>
>
> It seems it was possible with “ceph-rest-api” but it looks like this
> tools is no more available in ceph-common…
>
>
>
> Is there a simple way to have this ? (without writing python mgr
> module which will take a lot of time for this)
>
>
>
> Thank you
>
> Vincent
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd commit latency increase over time, until restart

2019-01-30 Thread Alexandre DERUMIER
Hi Stefan,

>>Currently I'm in the process of switching back from jemalloc to tcmalloc
>>as suggested. This report makes me a little nervous about my change.
Well, I'm really not sure that it's a tcmalloc bug.
Maybe it's bluestore related (I don't have filestore anymore to compare).
I need to compare when the latencies are bigger.

Here is an example: all OSDs were at 20-50ms before the restart, then after
the restart (at 21:15), 1ms:
http://odisoweb1.odiso.net/latencybad.png

I observe the latency in my guest VMs too, as disk iowait.

http://odisoweb1.odiso.net/latencybadvm.png

>>Also i'm currently only monitoring latency for filestore osds. Which
>>exact values out of the daemon do you use for bluestore?

Here are my influxdb queries.

They take op_latency.sum/op_latency.avgcount over the last second.


SELECT non_negative_derivative(first("op_latency.sum"), 
1s)/non_negative_derivative(first("op_latency.avgcount"),1s)   FROM "ceph" 
WHERE "host" =~  /^([[host]])$/  AND "id" =~ /^([[osd]])$/ AND $timeFilter 
GROUP BY time($interval), "host", "id" fill(previous)


SELECT non_negative_derivative(first("op_w_latency.sum"), 
1s)/non_negative_derivative(first("op_w_latency.avgcount"),1s)   FROM "ceph" 
WHERE "host" =~ /^([[host]])$/  AND collection='osd'  AND  "id" =~ 
/^([[osd]])$/ AND $timeFilter GROUP BY time($interval), "host", "id" 
fill(previous)


SELECT non_negative_derivative(first("op_w_process_latency.sum"), 
1s)/non_negative_derivative(first("op_w_process_latency.avgcount"),1s)   FROM 
"ceph" WHERE "host" =~ /^([[host]])$/  AND collection='osd'  AND  "id" =~ 
/^([[osd]])$/ AND $timeFilter GROUP BY time($interval), "host", "id" 
fill(previous)





- Original Message -
From: "Stefan Priebe, Profihost AG" 
To: "aderumier" , "Sage Weil" 
Cc: "ceph-users" , "ceph-devel" 

Sent: Wednesday 30 January 2019 08:45:33
Subject: Re: [ceph-users] ceph osd commit latency increase over time, until
restart

Hi, 

On 30.01.19 at 08:33, Alexandre DERUMIER wrote: 
> Hi, 
> 
> here some new results, 
> different osd/ different cluster 
> 
> before osd restart latency was between 2-5ms 
> after osd restart is around 1-1.5ms 
> 
> http://odisoweb1.odiso.net/cephperf2/bad.txt (2-5ms) 
> http://odisoweb1.odiso.net/cephperf2/ok.txt (1-1.5ms) 
> http://odisoweb1.odiso.net/cephperf2/diff.txt 
> 
> From what I see in the diff, the biggest difference is in tcmalloc, but maybe I'm 
> wrong. 
> (I'm using tcmalloc 2.5-2.2) 

Currently I'm in the process of switching back from jemalloc to tcmalloc 
as suggested. This report makes me a little nervous about my change. 

Also, I'm currently only monitoring latency for filestore OSDs. Which 
exact values out of the daemon do you use for bluestore? 

I would like to check if I see the same behaviour. 

Greets, 
Stefan 

> 
> - Mail original - 
> De: "Sage Weil"  
> À: "aderumier"  
> Cc: "ceph-users" , "ceph-devel" 
>  
> Envoyé: Vendredi 25 Janvier 2019 10:49:02 
> Objet: Re: ceph osd commit latency increase over time, until restart 
> 
> Can you capture a perf top or perf record to see where the CPU time is 
> going on one of the OSDs with a high latency? 
> 
> Thanks! 
> sage 
> 
> 
> On Fri, 25 Jan 2019, Alexandre DERUMIER wrote: 
> 
>> 
>> Hi, 
>> 
>> I have a strange behaviour of my OSDs, on multiple clusters. 
>> 
>> All clusters are running mimic 13.2.1, bluestore, with SSD or NVMe drives; 
>> the workload is rbd only, with qemu-kvm VMs running with librbd + snapshot/rbd 
>> export-diff/snapshot delete each day for backup. 
>> 
>> When the OSDs are freshly started, the commit latency is between 0.5-1ms. 
>> 
>> But over time, this latency increases slowly (maybe around 1ms per day), until 
>> it reaches crazy values like 20-200ms. 
>> 
>> Some example graphs: 
>> 
>> http://odisoweb1.odiso.net/osdlatency1.png 
>> http://odisoweb1.odiso.net/osdlatency2.png 
>> 
>> All OSDs show this behaviour, in all clusters. 
>> 
>> The latency of the physical disks is fine. (The clusters are far from fully loaded.) 
>> 
>> And if I restart the OSD, the latency comes back to 0.5-1ms. 
>> 
>> That reminds me of the old tcmalloc bug, but maybe it could be a bluestore 
>> memory bug? 
>> 
>> Any hints for counters/logs to check? 
>> 
>> 
>> Regards, 
>> 
>> Alexandre 
>> 
>> 
> 
> ___ 
> ceph-users mailing list 
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph block - volume with RAID#0

2019-01-30 Thread M Ranga Swami Reddy
Hello - Can I use the ceph block volume with RAID#0? Are there any
issues with this?

Thanks
Swami
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd commit latency increase over time, until restart

2019-01-30 Thread Alexandre DERUMIER
>>I don't see any smoking gun here... :/ 

I need to test and compare when the latencies go very high, but I will need to
wait more days/weeks.


>>The main difference between a warm OSD and a cold one is that on startup 
>>the bluestore cache is empty. You might try setting the bluestore cache 
>>size to something much smaller and see if that has an effect on the CPU 
>>utilization? 

I will try to test. I also wonder if the new auto memory tuning from Mark could
help too?
(I'm still on mimic 13.2.1, planning to update to 13.2.5 next month)

Also, could I check some bluestore-related counters? (onodes, rocksdb, bluestore
cache)

>>Note that this doesn't necessarily mean that's what you want. Maybe the 
>>reason why the CPU utilization is higher is because the cache is warm and 
>>the OSD is serving more requests per second... 

Well, currently, the server is really quiet

Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s avgrq-sz 
avgqu-sz   await r_await w_await  svctm  %util
nvme0n1   2,00   515,00   48,00 1182,00   304,00 11216,0018,73 
0,010,000,000,00   0,01   1,20

%Cpu(s):  1,5 us,  1,0 sy,  0,0 ni, 97,2 id,  0,2 wa,  0,0 hi,  0,1 si,  0,0 st

And this is only with writes, not reads



- Original Message -
From: "Sage Weil" 
To: "aderumier" 
Cc: "ceph-users" , "ceph-devel" 

Sent: Wednesday 30 January 2019 14:33:23
Subject: Re: ceph osd commit latency increase over time, until restart

On Wed, 30 Jan 2019, Alexandre DERUMIER wrote: 
> Hi, 
> 
> here some new results, 
> different osd/ different cluster 
> 
> before osd restart latency was between 2-5ms 
> after osd restart is around 1-1.5ms 
> 
> http://odisoweb1.odiso.net/cephperf2/bad.txt (2-5ms) 
> http://odisoweb1.odiso.net/cephperf2/ok.txt (1-1.5ms) 
> http://odisoweb1.odiso.net/cephperf2/diff.txt 

I don't see any smoking gun here... :/ 

The main difference between a warm OSD and a cold one is that on startup 
the bluestore cache is empty. You might try setting the bluestore cache 
size to something much smaller and see if that has an effect on the CPU 
utilization? 

Note that this doesn't necessarily mean that's what you want. Maybe the 
reason why the CPU utilization is higher is because the cache is warm and 
the OSD is serving more requests per second... 

sage 



> 
> From what I see in the diff, the biggest difference is in tcmalloc, but maybe 
> I'm wrong. 
> 
> (I'm using tcmalloc 2.5-2.2) 
> 
> 
> - Original Message - 
> From: "Sage Weil"  
> To: "aderumier"  
> Cc: "ceph-users" , "ceph-devel" 
>  
> Sent: Friday 25 January 2019 10:49:02 
> Subject: Re: ceph osd commit latency increase over time, until restart 
> 
> Can you capture a perf top or perf record to see where the CPU time is 
> going on one of the OSDs with a high latency? 
> 
> Thanks! 
> sage 
> 
> 
> On Fri, 25 Jan 2019, Alexandre DERUMIER wrote: 
> 
> > 
> > Hi, 
> > 
> > I have a strange behaviour of my OSDs, on multiple clusters. 
> > 
> > All clusters are running mimic 13.2.1, bluestore, with SSD or NVMe drives; 
> > the workload is rbd only, with qemu-kvm VMs running with librbd + snapshot/rbd 
> > export-diff/snapshot delete each day for backup. 
> > 
> > When the OSDs are freshly started, the commit latency is between 0.5-1ms. 
> > 
> > But over time, this latency increases slowly (maybe around 1ms per day), until 
> > it reaches crazy values like 20-200ms. 
> > 
> > Some example graphs: 
> > 
> > http://odisoweb1.odiso.net/osdlatency1.png 
> > http://odisoweb1.odiso.net/osdlatency2.png 
> > 
> > All OSDs show this behaviour, in all clusters. 
> > 
> > The latency of the physical disks is fine. (The clusters are far from fully loaded.) 
> > 
> > And if I restart the OSD, the latency comes back to 0.5-1ms. 
> > 
> > That reminds me of the old tcmalloc bug, but maybe it could be a bluestore 
> > memory bug? 
> > 
> > Any hints for counters/logs to check? 
> > 
> > 
> > Regards, 
> > 
> > Alexandre 
> > 
> > 
> 
> 
> 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph osd commit latency increase over time, until restart

2019-01-30 Thread Sage Weil
On Wed, 30 Jan 2019, Alexandre DERUMIER wrote:
> Hi,
> 
> here some new results,
> different osd/ different cluster
> 
> before osd restart latency was between 2-5ms
> after osd restart is around 1-1.5ms
> 
> http://odisoweb1.odiso.net/cephperf2/bad.txt  (2-5ms)
> http://odisoweb1.odiso.net/cephperf2/ok.txt (1-1.5ms)
> http://odisoweb1.odiso.net/cephperf2/diff.txt

I don't see any smoking gun here... :/

The main difference between a warm OSD and a cold one is that on startup 
the bluestore cache is empty.  You might try setting the bluestore cache 
size to something much smaller and see if that has an effect on the CPU 
utilization?
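
(A hedged sketch of such a test; the value is illustrative, the option is read
at OSD startup so a restart is needed, and the OSD id is a placeholder:)

# in ceph.conf on the OSD host
[osd]
bluestore_cache_size = 536870912    # 512 MiB, well below the usual 1-3 GiB defaults

$ systemctl restart ceph-osd@0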

Note that this doesn't necessarily mean that's what you want.  Maybe the 
reason why the CPU utilization is higher is because the cache is warm and 
the OSD is serving more requests per second...

sage



> 
> From what I see in the diff, the biggest difference is in tcmalloc, but maybe 
> I'm wrong.
> 
> (I'm using tcmalloc 2.5-2.2)
> 
> 
> - Original Message -
> From: "Sage Weil" 
> To: "aderumier" 
> Cc: "ceph-users" , "ceph-devel" 
> 
> Sent: Friday 25 January 2019 10:49:02
> Subject: Re: ceph osd commit latency increase over time, until restart
> 
> Can you capture a perf top or perf record to see where the CPU time is 
> going on one of the OSDs with a high latency? 
> 
> Thanks! 
> sage 
> 
> 
> On Fri, 25 Jan 2019, Alexandre DERUMIER wrote: 
> 
> > 
> > Hi, 
> > 
> > I have a strange behaviour of my OSDs, on multiple clusters. 
> > 
> > All clusters are running mimic 13.2.1, bluestore, with SSD or NVMe drives; 
> > the workload is rbd only, with qemu-kvm VMs running with librbd + snapshot/rbd 
> > export-diff/snapshot delete each day for backup. 
> > 
> > When the OSDs are freshly started, the commit latency is between 0.5-1ms. 
> > 
> > But over time, this latency increases slowly (maybe around 1ms per day), until 
> > it reaches crazy values like 20-200ms. 
> > 
> > Some example graphs: 
> > 
> > http://odisoweb1.odiso.net/osdlatency1.png 
> > http://odisoweb1.odiso.net/osdlatency2.png 
> > 
> > All OSDs show this behaviour, in all clusters. 
> > 
> > The latency of the physical disks is fine. (The clusters are far from fully loaded.) 
> > 
> > And if I restart the OSD, the latency comes back to 0.5-1ms. 
> > 
> > That reminds me of the old tcmalloc bug, but maybe it could be a bluestore 
> > memory bug? 
> > 
> > Any hints for counters/logs to check? 
> > 
> > 
> > Regards, 
> > 
> > Alexandre 
> > 
> > 
> 
> 
> ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Simple API to have cluster healthcheck ?

2019-01-30 Thread Alexandru Cucu
Hello,

Not exactly what you were looking for, but you could use the
Prometheus plugin for ceph-mgr and get the health status from the
metrics.

curl -s http://ceph-mgr-node:9283/metrics | grep ^ceph_health_status
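
(A hedged sketch of the full flow; the sample value is illustrative, with
0/1/2 mapping to HEALTH_OK/HEALTH_WARN/HEALTH_ERR:)

$ ceph mgr module enable prometheus      # if not already enabled
$ curl -s http://ceph-mgr-node:9283/metrics | grep ^ceph_health_status
ceph_health_status 0.0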


On Wed, Jan 30, 2019 at 3:04 PM PHARABOT Vincent
 wrote:
>
> Hello,
>
>
>
> I have my cluster set up correctly now (thank you again for the help)
>
>
>
> I am seeking now a way to get cluster health thru API (REST) with curl 
> command.
>
> I had a look at manager / RESTful and Dashboard but none seems to provide 
> simple way to get cluster health
>
> RESTful module do a lot of things but I didn’t find the simple health check 
> result – moreover I don’t want monitoring user to be able to do all the 
> command in this module.
>
> Dashboard is a dashboard so could not get health thru curl
>
>
>
> It seems it was possible with “ceph-rest-api” but it looks like this tools is 
> no more available in ceph-common…
>
>
>
> Is there a simple way to have this ? (without writing python mgr module which 
> will take a lot of time for this)
>
>
>
> Thank you
>
> Vincent
>
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Right way to delete OSD from cluster?

2019-01-30 Thread Fyodor Ustinov
Hi!

But after "ceph osd crush remove", won't I get undersized objects? That is,
isn't that the same thing as simply turning off the OSD and waiting for the
cluster to recover?

- Original Message -
From: "Wido den Hollander" 
To: "Fyodor Ustinov" , "ceph-users" 
Sent: Wednesday, 30 January, 2019 15:05:35
Subject: Re: [ceph-users] Right way to delete OSD from cluster?

On 1/30/19 2:00 PM, Fyodor Ustinov wrote:
> Hi!
> 
> I thought I should first do "ceph osd out", wait for the relocation of
> the misplaced objects to finish, and after that do "ceph osd purge".
> But after "purge" the cluster starts relocation again.
> 
> Maybe I'm doing something wrong? Then what is the correct way to delete the 
> OSD from the cluster?
> 

You are not doing anything wrong, this is the expected behavior. There
are two CRUSH changes:

- Marking it out
- Purging it

You could do:

$ ceph osd crush remove osd.X

Wait for all good

$ ceph osd purge X

The last step should then not initiate any data movement.

Wido

> WBR,
> Fyodor.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Simple API to have cluster healthcheck ?

2019-01-30 Thread Wido den Hollander


On 1/30/19 2:02 PM, PHARABOT Vincent wrote:
> Hello,
> 
>  
> 
> I have my cluster set up correctly now (thank you again for the help)
> 
>  
> 
> I am seeking now a way to get cluster health thru API (REST) with curl
> command.
> 
> I had a look at manager / RESTful and Dashboard but none seems to
> provide simple way to get cluster health
> 
> RESTful module do a lot of things but I didn’t find the simple health
> check result – moreover I don’t want monitoring user to be able to do
> all the command in this module.
> 
> Dashboard is a dashboard so could not get health thru curl
> 
>  
> 
> It seems it was possible with “ceph-rest-api” but it looks like this
> tools is no more available in ceph-common…
> 
>  
> 
> Is there a simple way to have this ? (without writing python mgr module
> which will take a lot of time for this)
> 

Not at this time, but I do agree with you. A very simple JSON API which
is read-only would be very welcome.

I've been playing with the idea to create a Mgr Module called
'status-api' or something which just allows you to query certain
elements like:

- health
- data usage
- performance (?)

But as time is limited I haven't gotten to this yet.
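
As a stopgap, the CLI itself can already emit machine-readable health, albeit
not over HTTP (the sample output is abridged):

$ ceph health --format json
{"checks":{},"status":"HEALTH_OK"}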

Wido

>  
> 
> Thank you
> 
> Vincent
> 
>  
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Right way to delete OSD from cluster?

2019-01-30 Thread Wido den Hollander



On 1/30/19 2:00 PM, Fyodor Ustinov wrote:
> Hi!
> 
> I thought I should first do "ceph osd out", wait for the relocation of 
> the misplaced objects to finish, and after that do "ceph osd purge".
> But after "purge" the cluster starts relocation again.
> 
> Maybe I'm doing something wrong? Then what is the correct way to delete the 
> OSD from the cluster?
> 

You are not doing anything wrong, this is the expected behavior. There
are two CRUSH changes:

- Marking it out
- Purging it

You could do:

$ ceph osd crush remove osd.X

Wait for all good

$ ceph osd purge X

The last step should then not initiate any data movement.
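
(Concretely, a hedged sketch for a hypothetical osd.5; on Luminous and later
the purge step asks for an explicit confirmation flag:)

$ ceph osd crush remove osd.5
# wait until `ceph -s` reports all PGs active+clean again
$ ceph osd purge 5 --yes-i-really-mean-it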

Wido

> WBR,
> Fyodor.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Simple API to have cluster healthcheck ?

2019-01-30 Thread PHARABOT Vincent
Hello,

I have my cluster set up correctly now (thank you again for the help)

I am seeking now a way to get cluster health thru API (REST) with curl command.
I had a look at manager / RESTful and Dashboard but none seems to provide 
simple way to get cluster health
RESTful module do a lot of things but I didn’t find the simple health check 
result – moreover I don’t want monitoring user to be able to do all the command 
in this module.
Dashboard is a dashboard so could not get health thru curl

It seems it was possible with “ceph-rest-api” but it looks like this tools is 
no more available in ceph-common…

Is there a simple way to have this ? (without writing python mgr module which 
will take a lot of time for this)

Thank you
Vincent


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Right way to delete OSD from cluster?

2019-01-30 Thread Fyodor Ustinov
Hi!

I thought I should first do "ceph osd out", wait for the relocation of the
misplaced objects to finish, and after that do "ceph osd purge".
But after "purge" the cluster starts relocation again.

Maybe I'm doing something wrong? Then what is the correct way to delete the OSD 
from the cluster?

WBR,
Fyodor.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] moving a new hardware to cluster

2019-01-30 Thread Fabio Abreu
Hi everybody,

I have a question about moving new SATA storage (new hardware as well) into a
production rack with a huge amount of data.

I think this move creates new PGs and can reduce my performance if I do it
wrong, and we don't have a lot of experience moving new hardware into a
cluster.

Can someone recommend what I should review before the new hardware move?

If I move OSDs into the cluster, what precautions should I take in this
scenario?

Regards,

Fabio Abreu Reis
http://fajlinux.com.br
*Tel : *+55 21 98244-0161
*Skype : *fabioabreureis
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cluster Status:HEALTH_ERR for Full OSD

2019-01-30 Thread Fabio - NS3 srl

Hello guys,
I have a Ceph cluster with a full S3 store:

~# ceph health detail
HEALTH_ERR 1 full osd(s); 1 near full osd(s)
osd.2 is full at 95%
osd.5 is near full at 85%


I want to delete some buckets, but when I tried to list them


~# radosgw-admin bucket list
2019-01-30 11:41:47.933621 7f467a9d0780  0 client.3967227.objecter FULL, 
paused modify 0x2aaf410 tid 8


the command remains blocked... no prompt returns.

Are there solutions other than adding an OSD?

Many thank
--
*Fabio *

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Question regarding client-network

2019-01-30 Thread Robert Sander
On 30.01.19 08:55, Buchberger, Carsten wrote:

> So as long as there is IP connectivity between the client and the
> client-network IP addresses of our Ceph cluster, everything is fine?

Yes, client traffic is routable.

Even inter-OSD traffic is routable, there are reports from people
running routing protocols inside their Ceph clusters.

Regards
-- 
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 93818 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bionic Upgrade 12.2.10

2019-01-30 Thread ceph
Hello Scott,

I've seen a solution from the croit guys.
Perhaps this is related?
https://croit.io/2018/09/23/2018-09-23-debian-mirror

Greetz
Mehmet

On 14 January 2019 20:33:59 CET, Scottix  wrote:
>Wow OK.
>I wish there was some official stance on this.
>
>Now I've got to remove those OSDs, downgrade to 16.04, and re-add them;
>this is going to take a while.
>
>--Scott
>
>On Mon, Jan 14, 2019 at 10:53 AM Reed Dier 
>wrote:
>>
>> This is because Luminous is not being built for Bionic for whatever
>reason.
>> There are some other mailing list entries detailing this.
>>
>> Right now you have ceph installed from the Ubuntu bionic-updates
>repo, which has 12.2.8, but does not get regular release updates.
>>
>> This is what I ended up having to do for my ceph nodes that were
>upgraded from Xenial to Bionic, as well as new ceph nodes that
>installed straight to Bionic, due to the repo issues. Even if you try
>to use the xenial packages, you will run into issues with libcurl4 and
>libcurl3 I imagine.
>>
>> Reed
>>
>> On Jan 14, 2019, at 12:21 PM, Scottix  wrote:
>>
>> https://download.ceph.com/debian-luminous/
>>
>>
>
>
>-- 
>T: @Thaumion
>IG: Thaumion
>scot...@gmail.com
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Best practice for increasing number of pg and pgp

2019-01-30 Thread Janne Johansson
On Wed 30 Jan 2019 at 05:24, Linh Vu wrote:
>
> We use https://github.com/cernceph/ceph-scripts  ceph-gentle-split script to 
> slowly increase by 16 pgs at a time until we hit the target.

>
> Somebody recommends that this adjustment should be done in multiple stages, 
> e.g. increase 1024 pg each time. Is this a good practice? or should we 
> increase it to 8192 in one time. Thanks!

We also do a few at a time, mostly 8 I think.
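
(A hedged sketch of such a staged split; the pool name and step size are
placeholders, and the settle check is deliberately crude:)

for pg in 4104 4112 4120; do                    # +8 per step, illustrative values
    ceph osd pool set rbd pg_num  $pg
    ceph osd pool set rbd pgp_num $pg
    while ceph -s | grep -Eq 'creating|peering'; do sleep 30; done
done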
-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Best practice for increasing number of pg and pgp

2019-01-30 Thread Matthew Vernon
Hi,

On 30/01/2019 02:39, Albert Yue wrote:

> As the number of OSDs increase in our cluster, we reach a point where
> pg/osd is lower than recommend value and we want to increase it from
> 4096 to 8192. 

For an increase that small, I'd just do it in one go (and have done so
on our production clusters without issue); I'd only think about doing it
in stages for a larger increase.

Regards,

Matthew



-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: Planning all flash cluster

2019-01-30 Thread Félix Barbeira
> Is there anything that obviously stands out as severely unbalanced? The
R720XD comes with a H710 - instead of putting them in RAID0, I'm thinking a
different HBA might be a better idea, any recommendations please?
> Don't know that HBA. Does it support pass through mode or HBA mode?

The H710 card does not support pass-through. With an R720 I would recommend a
JBOD card, for example the LSI 9207-8i.
Dell's next-generation servers (R730XD) carry the H730, which already
has pass-through.

On Wed, 20 Jun 2018 at 15:00, Luis Periquito ()
wrote:

> adding back in the list :)
>
> -- Forwarded message -
> From: Luis Periquito 
> Date: Wed, Jun 20, 2018 at 1:54 PM
> Subject: Re: [ceph-users] Planning all flash cluster
> To: 
>
>
> On Wed, Jun 20, 2018 at 1:35 PM Nick A  wrote:
> >
> > Thank you, I was under the impression that 4GB RAM per 1TB was quite
> generous, or is that not the case with all flash clusters? What's the
> recommended RAM per OSD currently? Happy to throw more at it for a
> performance boost. The important thing is that I'd like all nodes to be
> absolutely identical.
> I'm doing 8G per OSD, though I use 1.9T SSDs.
>
> >
> > Based on replies so far, it looks like 5 nodes might be a better idea,
> maybe each with 14 OSD's (960GB SSD's)? Plenty of 16 slot 2U chassis around
> to make it a no brainer if that's what you'd recommend!
> I tend to add more nodes: 1U with 4-8 SSDs per chassis to start with,
> and using a single CPU with high frequency. For IOPS/latency cpu
> frequency is really important.
> I have started a cluster that only has 2 SSDs (which I share with the
> OS) for data, but has 8 nodes. Those servers can take up to 10 drives.
>
> I'm using the Fujitsu RX1330, believe Dell would be the R330, with a
> Intel E3-1230v6 cpu and 64G of ram, dual 10G and PSAS (passthrough
> controller).
>
> >
> > The H710 doesn't do JBOD or passthrough, hence looking for an
> alternative HBA. It would be nice to do the boot drives as hardware RAID 1
> though, so a card that can do both at the same time (like the H730 found
> R630's etc) would be ideal.
> >
> > Regards,
> > Nick
> >
> > On 20 June 2018 at 13:18, Luis Periquito  wrote:
> >>
> >> Adding more nodes from the beginning would probably be a good idea.
> >>
> >> On Wed, Jun 20, 2018 at 12:58 PM Nick A  wrote:
> >> >
> >> > Hello Everyone,
> >> >
> >> > We're planning a small cluster on a budget, and I'd like to request
> any feedback or tips.
> >> >
> >> > 3x Dell R720XD with:
> >> > 2x Xeon E5-2680v2 or very similar
> >> The CPUs look good and sufficiently fast for IOPS.
> >>
> >> > 96GB RAM
> >> 4GB per OSD looks a bit on the short side. Probably 192G would help.
> >>
> >> > 2x Samsung SM863 240GB boot/OS drives
> >> > 4x Samsung SM863 960GB OSD drives
> >> > Dual 40/56Gbit Infiniband using IPoIB.
> >> >
> >> > 3 replica, MON on OSD nodes, RBD only (no object or CephFS).
> >> >
> >> > We'll probably add another 2 OSD drives per month per node until full
> (24 SSD's per node), at which point, more nodes. We've got a few SM863's in
> production on other system and are seriously impressed with them, so would
> like to use them for Ceph too.
> >> >
> >> > We're hoping this is going to provide a decent amount of IOPS, 20k
> would be ideal. I'd like to avoid NVMe Journals unless it's going to make a
> truly massive difference. Same with carving up the SSD's, would rather not,
> and just keep it as simple as possible.
> >> I agree: those SSDs shouldn't really require a journal device. Not
> >> sure about the 20k IOPS, especially without any further information.
> >> Doing 20k IOPS at a 1kB block size is totally different from 1MB blocks...
> >> >
> >> > Is there anything that obviously stands out as severely unbalanced?
> The R720XD comes with a H710 - instead of putting them in RAID0, I'm
> thinking a different HBA might be a better idea, any recommendations please?
> >> Don't know that HBA. Does it support pass through mode or HBA mode?
> >> >
> >> > Regards,
> >> > Nick
> >> > ___
> >> > ceph-users mailing list
> >> > ceph-users@lists.ceph.com
> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


-- 
Félix Barbeira.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com