Re: [ceph-users] Upgrading ceph and mapped rbds

2018-04-03 Thread Konstantin Shalygin

The VMs are XenServer VMs with their virtual disks stored on the NFS server which has
the RBD mounted … so there is no migration from my POV, as there is no second
storage to migrate to ...




All your pain is self-inflicted.

Just FYI, clients are not interrupted when you upgrade Ceph. Clients are
interrupted only when you change something they have to support, for example
if you (suddenly) change CRUSH tunables or the minimum required client
version (for this reason clients must be upgraded before the cluster).
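
For example (just a sketch, not a checklist; the release name here is only an
example, check what your clients actually support first):

$ ceph osd crush show-tunables                     # the current CRUSH tunables profile
$ ceph osd set-require-min-compat-client luminous  # only safe once every client runs luminous or newer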





k
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how the files in /var/lib/ceph/osd/ceph-0 are generated

2018-04-03 Thread Jeffrey Zhang
Btw, I am using ceph-volume.

I just tested ceph-disk. In that case, the ceph-0 folder is mounted from
/dev/sdb1.

So the tmpfs mount only happens when using ceph-volume? How does it work?

On Wed, Apr 4, 2018 at 9:29 AM, Jeffrey Zhang <
zhang.lei.fly+ceph-us...@gmail.com> wrote:

> I am testing Ceph Luminous; the environment is
>
> - CentOS 7.4
> - Ceph Luminous (ceph official repo)
> - ceph-deploy 2.0
> - bluestore + separate wal and db
>
> I found the ceph osd folder `/var/lib/ceph/osd/ceph-0` is mounted
> from tmpfs. But where do the files in that folder come from? Like `keyring`
> and `whoami`?
>
> $ ls -alh /var/lib/ceph/osd/ceph-0/
> lrwxrwxrwx.  1 ceph ceph   24 Apr  3 16:49 block -> /dev/ceph-pool/osd0.data
> lrwxrwxrwx.  1 root root   22 Apr  3 16:49 block.db -> /dev/ceph-pool/osd0-db
> lrwxrwxrwx.  1 root root   23 Apr  3 16:49 block.wal -> /dev/ceph-pool/osd0-wal
> -rw-------.  1 ceph ceph   37 Apr  3 16:49 ceph_fsid
> -rw-------.  1 ceph ceph   37 Apr  3 16:49 fsid
> -rw-------.  1 ceph ceph   55 Apr  3 16:49 keyring
> -rw-------.  1 ceph ceph    6 Apr  3 16:49 ready
> -rw-------.  1 ceph ceph   10 Apr  3 16:49 type
> -rw-------.  1 ceph ceph    2 Apr  3 16:49 whoami
>
> I guess they may be loaded from bluestore, but I cannot find any clue about
> this.
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] how the files in /var/lib/ceph/osd/ceph-0 are generated

2018-04-03 Thread Jeffrey Zhang
I am testing Ceph Luminous; the environment is

- CentOS 7.4
- Ceph Luminous (ceph official repo)
- ceph-deploy 2.0
- bluestore + separate wal and db

I found the ceph osd folder `/var/lib/ceph/osd/ceph-0` is mounted
from tmpfs. But where do the files in that folder come from? Like `keyring`
and `whoami`?

$ ls -alh /var/lib/ceph/osd/ceph-0/
lrwxrwxrwx.  1 ceph ceph   24 Apr  3 16:49 block -> /dev/ceph-pool/osd0.data
lrwxrwxrwx.  1 root root   22 Apr  3 16:49 block.db -> /dev/ceph-pool/osd0-db
lrwxrwxrwx.  1 root root   23 Apr  3 16:49 block.wal -> /dev/ceph-pool/osd0-wal
-rw-------.  1 ceph ceph   37 Apr  3 16:49 ceph_fsid
-rw-------.  1 ceph ceph   37 Apr  3 16:49 fsid
-rw-------.  1 ceph ceph   55 Apr  3 16:49 keyring
-rw-------.  1 ceph ceph    6 Apr  3 16:49 ready
-rw-------.  1 ceph ceph   10 Apr  3 16:49 type
-rw-------.  1 ceph ceph    2 Apr  3 16:49 whoami

I guess they may be loaded from bluestore, but I cannot find any clue about
this.
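
If that guess is right, the bluestore label on the block device is probably
where to look; a quick sketch (the device path is the one from the ls output
above, and I have not verified every field name):

$ ceph-bluestore-tool show-label --dev /dev/ceph-pool/osd0.data
# ceph-volume's activate step seems to populate the tmpfs dir roughly like this:
$ ceph-bluestore-tool prime-osd-dir --dev /dev/ceph-pool/osd0.data --path /var/lib/ceph/osd/ceph-0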
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Instrumenting RBD IO

2018-04-03 Thread Jason Dillaman
You might want to take a look at the Zipkin tracing hooks that are
(semi)integrated into Ceph [1]. The hooks are disabled by default in
release builds so you would need to rebuild Ceph yourself and then
enable tracing via the 'rbd_blkin_trace_all = true' configuration
option [2].

[1] http://victoraraujo.me/babeltrace-zipkin/
[2] https://github.com/ceph/ceph/blob/master/src/common/options.cc#L6275

On Tue, Apr 3, 2018 at 1:19 PM, Alex Gorbachev  wrote:
> I was wondering if there is a mechanism to instrument an RBD workload to
> elucidate what takes place on OSDs to troubleshoot performance issues
> better.
>
> Currently, we can issue RBD IO, for example via fio, and observe just the
> overall performance. One needs to guess which OSDs the workload hits and try
> to find the bottleneck from the dump_historic_ops output.
>
> It seems that integrating the timings into some sort of a debug flag for rbd
> bench or fio would help a lot of us locate bottlenecks faster.
>
> Thanks,
> Alex
>
>
> --
> --
> Alex Gorbachev
> Storcium
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Developer Monthly - April 2018

2018-04-03 Thread Leonardo Vaz
Hey Cephers,

This is just a friendly reminder that the next Ceph Developer Monthly
meeting is coming up:

 http://wiki.ceph.com/Planning

If you are working on a feature, significant backports, or anything else
you would like to discuss with the core team, please add it to the
following page:

 http://wiki.ceph.com/CDM_04-APR-2018

This edition happens during EMEA-friendly hours (12:30 EST) and we will
use the following Bluejeans URL for the video conference:

 https://bluejeans.com/376400604/

If you have questions or comments, please let us know.

Kindest regards,

Leo

-- 
Leonardo Vaz
Ceph Community Manager
Open Source and Standards Team
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] librados python pool alignment size write failures

2018-04-03 Thread Kevin Hrpcek
Thanks for the input, Greg. We've submitted the patch to the Ceph GitHub
repo: https://github.com/ceph/ceph/pull/21222


Kevin

On 04/02/2018 01:10 PM, Gregory Farnum wrote:
On Mon, Apr 2, 2018 at 8:21 AM Kevin Hrpcek 
<kevin.hrp...@ssec.wisc.edu> wrote:


Hello,

We use python librados bindings for object operations on our
cluster. For a long time we've been using 2 EC pools with k=4 m=1
and a fixed 4MB read/write size with the Python bindings. During
preparations for migrating all of our data to a k=6 m=2 pool we've
discovered that the EC pool alignment size is dynamic and that the librados
bindings for Python and Go fail to write objects because they are
not aware of the pool alignment size and therefore cannot
adjust the write block size to be a multiple of it. The EC pool
alignment size seems to be (k value * 4K) on new pools, but is
only 4K on old pools from the hammer days. We haven't been able to
find much useful documentation for this pool alignment setting
other than the librados docs
(http://docs.ceph.com/docs/master/rados/api/librados)
rados_ioctx_pool_requires_alignment,
rados_ioctx_pool_requires_alignment2,
rados_ioctx_pool_required_alignment,
rados_ioctx_pool_required_alignment2. After going through the
rados binary source we found that the binary is rounding the write
op size for an EC pool to a multiple of the pool alignment size
(line ~1945
https://github.com/ceph/ceph/blob/master/src/tools/rados/rados.cc#L1945).
The minimum write op size can be figured out by writing to an EC pool
like this to get the binary to round up and print it out: `rados -b
1k -p $pool put .`. All of the support for being alignment
aware is clearly there, but it simply isn't exposed in the
bindings; we've only tested the Python and Go ones.
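
For what it's worth, a rough shell-level sketch of working out the op size
the bindings would need to use, until alignment is exposed (it assumes the
stripe_width printed by "ceph osd pool ls detail" equals the required write
alignment for an EC pool; the pool name is a placeholder):

pool=ecpool
align=$(ceph osd pool ls detail | grep "'$pool'" | sed 's/.*stripe_width \([0-9]*\).*/\1/')
want=$((4 * 1024 * 1024))                      # the 4MB write size we used before
bs=$(( (want + align - 1) / align * align ))   # round up to a multiple of the alignment
echo "write block size for $pool should be $bs bytes"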

We've gone ahead and submitted a patch and pull request to the
pycradox project which seems to be what was merged into the ceph
project for python bindings
https://github.com/sileht/pycradox/pull/4. It adds retrieval of the
alignment size of the pool to the Python bindings so that we
can then calculate the proper op sizes for writing to a pool.

We find it hard to believe that we're the only ones to have run
into this problem when using the bindings. Have we missed
something obvious in our cluster configuration? Or maybe we're just
doing things differently compared to most users... Any insight would
be appreciated as we'd prefer to use an official solution rather
than our bindings fix for long term use.


It's not impossible you're the only user both using the python 
bindings and targeting EC pools. Even now with overwrites they're 
limited in terms of object class and omap support, and I think all the 
direct-access users I've heard about required at least one of omap or 
overwrites.


Just submit the patch to the Ceph github repo and it'll get fixed up! :)
-Greg


Tested on Luminous 12.2.2 and 12.2.4.

Thanks,
Kevin

-- 
Kevin Hrpcek

Linux Systems Administrator
NASA SNPP Atmospheric SIPS
Space Science & Engineering Center
University of Wisconsin-Madison

___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Instrumenting RBD IO

2018-04-03 Thread Alex Gorbachev
I was wondering if there is a mechanism to instrument an RBD workload to
elucidate what takes place on OSDs to troubleshoot performance issues
better.

Currently, we can issue RBD IO, for example via fio, and observe just the
overall performance. One needs to guess which OSDs the workload hits and try
to find the bottleneck from the dump_historic_ops output.
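
For concreteness, the manual workflow today looks roughly like this (just a
sketch; image, pool, object and OSD names are placeholders):

$ rbd info rbd/myimage | grep block_name_prefix            # e.g. rbd_data.123456789abc
$ ceph osd map rbd rbd_data.123456789abc.0000000000000000  # which PG / acting OSDs one object lands on
$ ceph daemon osd.12 dump_historic_ops                     # on that OSD's host, via the admin socket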

It seems that integrating the timings into some sort of a debug flag for
rbd bench or fio would help a lot of us locate bottlenecks faster.

Thanks,
Alex


-- 
--
Alex Gorbachev
Storcium
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrading ceph and mapped rbds

2018-04-03 Thread Götz Reinicke


> On 03.04.2018 at 13:31, Konstantin Shalygin wrote:
> 
>> and true the VMs have to be shut down/server rebooted
> 
> 
> It is not necessary. Just migrate the VM.

Hi,

The VMs are XenServer VMs with their virtual disks stored on the NFS server which has
the RBD mounted … so there is no migration from my POV, as there is no second
storage to migrate to ...

Regards . Götz

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What do you use to benchmark your rgw?

2018-04-03 Thread Mohamad Gebai

On 03/28/2018 11:11 AM, Mark Nelson wrote:
> Personally I usually use a modified version of Mark Seger's getput
> tool here:
>
> https://github.com/markhpc/getput/tree/wip-fix-timing
>
> The difference between this version and upstream is primarily to make
> getput more accurate/useful when using something like CBT for
> orchestration instead of the included orchestration wrapper (gpsuite).
>
> CBT can use this version of getput and run relatively accurate
> multi-client tests without requiring quite as much setup as cosbench.
> Having said that, many folks have used cosbench effectively and I
> suspect that might be a good option for many people.  I'm not sure how
> much development is happening these days; I think the primary author
> may no longer be working on the project.
>

AFAIK the project is still alive. Adding Mark.

Mohamad


> Mark
>
> On 03/28/2018 09:21 AM, David Byte wrote:
>> I use cosbench (the last rc works well enough). I can get multiple
>> GB/s from my 6 node cluster with 2 RGWs.
>>
>> David Byte
>> Sr. Technical Strategist
>> IHV Alliances and Embedded
>> SUSE
>>
>> Sent from my iPhone. Typos are Apple's fault.
>>
>> On Mar 28, 2018, at 5:26 AM, Janne Johansson > > wrote:
>>
>>> s3cmd and cli version of cyberduck to test it end-to-end using
>>> parallelism if possible.
>>>
>>> Getting some 100MB/s at most, from 500km distance over https against
>>> 5*radosgw behind HAProxy.
>>>
>>>
>>> 2018-03-28 11:17 GMT+02:00 Matthew Vernon >> >:
>>>
>>>     Hi,
>>>
>>>     What are people here using to benchmark their S3 service (i.e.
>>>     the rgw)?
>>>     rados bench is great for some things, but doesn't tell me about
>>> what
>>>     performance I can get from my rgws.
>>>
>>>     It seems that there used to be rest-bench, but that isn't in Jewel
>>>     AFAICT; I had a bit of a look at cosbench but it looks fiddly to
>>>     set up
>>>     and a bit under-maintained (the most recent version doesn't work
>>>     out of
>>>     the box, and the PR to fix that has been languishing for a while).
>>>
>>>     This doesn't seem like an unusual thing to want to do, so I'd
>>> like to
>>>     know what other ceph folk are using (and, if you like, the
>>>     numbers you
>>>     get from the benchmarkers)...?
>>>
>>>     Thanks,
>>>
>>>     Matthew
>>>
>>>
>>>     --
>>>      The Wellcome Sanger Institute is operated by Genome Research
>>>      Limited, a charity registered in England with number 1021457 and a
>>>      company registered in England with number 2742969, whose
>>> registered
>>>      office is 215 Euston Road, London, NW1 2BE.
>>>     ___
>>>     ceph-users mailing list
>>>     ceph-users@lists.ceph.com 
>>>     http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>     
>>>
>>>
>>>
>>>
>>> -- 
>>> May the most significant bit of your life be positive.
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com 
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrading ceph and mapped rbds

2018-04-03 Thread Konstantin Shalygin

and true the VMs have to be shut down/server rebooted



It is not necessary. Just migrate the VM.




k

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrading ceph and mapped rbds

2018-04-03 Thread Götz Reinicke
Hi Robert,

> On 29.03.2018 at 10:27, Robert Sander wrote:
> 
> On 28.03.2018 11:36, Götz Reinicke wrote:
> 
>> My question is: How to proceed with the serves which map the rbds?
> 
> Do you intend to upgrade the kernels on these RBD clients acting as NFS
> servers?
> 
> If so you have to plan a reboot anyway. If not, nothing changes.

Not in that step, but soon; and true, the VMs have to be shut down and the
servers rebooted. I have that in mind.

Do I understand you correctly that I don't have to update the NFS servers'
Ceph installation? So the OSDs/MONs can run 12.2 and a client can still be on 10.2?
(I remember a flag can be used to set the cluster compatibility.)
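
I guess I could double-check that on the cluster side with something like the
following (not sure the field is named exactly like this on 12.2, so treat it
as a sketch):

$ ceph features                           # which release/features each connected client reports
$ ceph osd dump | grep min_compat_client  # the compatibility flag I was thinking of, if it is set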

Thanks for feedback and regards . Götz

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com