[ceph-users] object lifecycle and updating from jewel

2018-01-02 Thread Robert Stanford
 I would like to use the new object lifecycle feature of kraken /
luminous.  I have jewel, with buckets that have lots and lots of objects.
It won't be practical to move them, then move them back after upgrading.

 In order to use the object lifecycle feature of radosgw in
kraken/luminous, do I need to have buckets configured for this, before
installing data?  In the scenario above, am I out of luck?  Or is object
lifecycle functionality available as soon as radosgw is upgraded?
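 To be concrete, what I'd want to do once on kraken/luminous is attach a
policy to the existing buckets over the S3 API, something like (hypothetical
bucket name, assuming an s3cmd-style client):

  s3cmd setlifecycle lifecycle.xml s3://mybucket
  s3cmd getlifecycle s3://mybucket

where lifecycle.xml holds an ordinary <LifecycleConfiguration> expiration rule.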

 Thank you
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Developer Monthly - January 2018

2018-01-02 Thread Leonardo Vaz
Hey Cephers,

This is just a friendly reminder that the next Ceph Developer Monthly
meeting is coming up:

 http://wiki.ceph.com/Planning

If you have work that you're doing that is feature work, significant
backports, or anything you would like to discuss with the core team,
please add it to the following page:

 https://wiki.ceph.com/CDM_03-JAN-2018

This edition happens at APAC-friendly hours (21:00 EST) and we will
use the following Bluejeans URL for the video conference:

 https://bluejeans.com/9290089010/

If you have questions or comments, please let us know.

Kindest regards,

Leo

-- 
Leonardo Vaz
Ceph Community Manager
Open Source and Standards Team
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to evict a client in rbd

2018-01-02 Thread Jason Dillaman
I tried to reproduce this for over an hour today using the specified
versions w/o any success. Is this something that you can repeat
on-demand or was this a one-time occurrence?
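(For what it's worth, the usual way to kick out a stale watcher at the RADOS
level is to blacklist its address -- sketch only, using the address from the
output quoted below:

  ceph osd blacklist add 10.255.0.17:0/3495340192
  ceph osd blacklist ls
  ceph osd blacklist rm 10.255.0.17:0/3495340192   # once the image is deleted

The entry also expires on its own, after an hour by default.)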

On Sat, Dec 23, 2017 at 3:48 PM, Karun Josy  wrote:
> Hello,
>
> The image is not mapped.
>
> # ceph --version
> ceph version 12.2.1  luminous (stable)
> # uname -r
> 4.14.0-1.el7.elrepo.x86_64
>
>
> Karun Josy
>
> On Sat, Dec 23, 2017 at 6:51 PM, Jason Dillaman  wrote:
>>
>> What Ceph and what kernel version are you using? Are you positive that
>> the image has been unmapped from 10.255.0.17?
>>
>> On Fri, Dec 22, 2017 at 7:14 PM, Karun Josy  wrote:
>> > Hello,
>> >
>> > I am unable to delete this abandoned image. Rbd info shows a watcher IP.
>> > Image is not mapped
>> > Image has no snapshots
>> >
>> >
>> > rbd status cvm/image  --id clientuser
>> > Watchers:
>> > watcher=10.255.0.17:0/3495340192 client.390908
>> > cookie=18446462598732841114
>> >
>> > How can I evict or blacklist a watcher client so that the image can be
>> > deleted?
>> > http://docs.ceph.com/docs/master/cephfs/eviction/
>> > I see this is possible in Cephfs
>> >
>> >
>> >
>> > Karun
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>>
>>
>> --
>> Jason
>
>



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] formatting bytes and object counts in ceph status output

2018-01-02 Thread Brady Deetz
I'd implement byte counters in base 2 (KB, MB, etc). MiB is annoying to us
old grumpy folk, but I'd live with it.

But, I absolutely hate that object count is in base 2. 1 kg is not 1024
grams. We have a reason for bytes to be in base 2. Very few other
things are expected to be in base 2. A normal person looking at ceph status
would interpret 1M objects as one million.


On Jan 2, 2018 4:43 AM, "Jan Fajerski"  wrote:

Hi lists,
Currently the ceph status output formats all numbers with binary unit
prefixes, i.e. 1MB equals 1048576 bytes and an object count of 1M equals
1048576 objects.  I received a bug report from a user that printing object
counts with a base 2 multiplier is confusing (I agree) so I opened a bug
and https://github.com/ceph/ceph/pull/19117.
In the PR discussion a couple of questions arose that I'd like to get some
opinions on:
- Should we print binary unit prefixes (MiB, GiB, ...) since that would be
technically correct?
- Should counters (like object counts) be formatted with a base 10
multiplier or a multiplier with base 2?

My proposal would be to both use binary unit prefixes and use base 10
multipliers for counters. I think this aligns with user expectations as
well as the relevant standard(s?).
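As a quick illustration of the two conventions (using GNU coreutils numfmt,
purely for demonstration):

  $ numfmt --to=iec 1048576    # binary prefix -> 1.0M (i.e. MiB)
  $ numfmt --to=si 1000000     # decimal prefix -> 1.0M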

Best,
Jan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] formatting bytes and object counts in ceph status output

2018-01-02 Thread Dan Mick
On 01/02/2018 08:54 AM, John Spray wrote:
> On Tue, Jan 2, 2018 at 10:43 AM, Jan Fajerski  wrote:
>> Hi lists,
>> Currently the ceph status output formats all numbers with binary unit
>> prefixes, i.e. 1MB equals 1048576 bytes and an object count of 1M equals
>> 1048576 objects.  I received a bug report from a user that printing object
>> counts with a base 2 multiplier is confusing (I agree) so I opened a bug and
>> https://github.com/ceph/ceph/pull/19117.
>> In the PR discussion a couple of questions arose that I'd like to get some
>> opinions on:
> 
>> - Should we print binary unit prefixes (MiB, GiB, ...) since that would be
>> technically correct?
> 
> I'm not a fan of the technically correct base 2 units -- they're still
> relatively rarely used, and I've spent most of my life using kB to
> mean 1024, not 1000.
> 
>> - Should counters (like object counts) be formatted with a base 10
>> multiplier or a multiplier with base 2?
> 
> I prefer base 2 for any dimensionless quantities (or rates thereof) in
> computing.  Metres and kilograms go in base 10, bytes go in base 2.
> 
> It's all very subjective and a matter of opinion of course, and my
> feelings aren't particularly strong :-)
> 
> John

100% agreed.  "iB" is an affectation IMO.  But I'm grumpy and old.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] slow 4k writes, Luminous with bluestore backend

2018-01-02 Thread Christian Wuerdig
The main difference is that rados bench uses 4MB objects while your dd
test uses a 4k block size.
rados bench shows an average of 283 IOPS, which at 4k block size would
be around 1.1MB/s, so it's somewhat consistent with the dd result.
Monitor your CPU usage and network latency with something like atop on
the OSD nodes and check what might be causing the problem.
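Back-of-the-envelope: 283 IOPS * 4 KiB is roughly 1.1 MiB/s, so dd is really
measuring single-threaded 4k write latency rather than bandwidth. To compare
like with like you could try something along these lines (illustrative only,
adjust pool/device names):

  rados bench -p volumes 10 write -b 4096 -t 1   # 4k writes, queue depth 1
  dd if=/dev/zero of=/dev/vdc bs=4M count=100 oflag=direct   # 4M blocks, like rados bench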

On Wed, Dec 27, 2017 at 7:31 AM, kevin parrikar
 wrote:
> Hi All,
> I upgraded my cluster from Hammer to Jewel and then to Luminous , changed
> from filestore to bluestore backend.
>
> on a KVM vm with 4 cpu /2 Gb RAM i have attached a 20gb rbd volume as vdc
> and performed following test.
>
> dd if=/dev/zero of=/dev/vdc bs=4k count=1000 oflag=direct
> 1000+0 records in
> 1000+0 records out
> 4096000 bytes (4.1 MB) copied, 3.08965 s, 1.3 MB/s
>
> and it's consistently giving 1.3MB/s, which I feel is too low. I have 3 ceph
> osd nodes, each with 24 x 15k RPM drives, with a replication of 2, connected via
> 2x10G LACP bonded NICs with an MTU of 9100.
>
> Rados Bench results:
>
> rados bench -p volumes 4 write
> hints = 1
> Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304
> for up to 4 seconds or 0 objects
> Object prefix: benchmark_data_ceph3.sapiennetworks.com_820994
>   sec  Cur ops  started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
>     0        0        0         0         0         0            -           0
>     1       16      276       260   1039.98      1040    0.0165053   0.0381299
>     2       16      545       529   1057.92      1076     0.043151   0.0580376
>     3       16      847       831   1107.91      1208    0.0394811   0.0567684
>     4       16     1160      1144    1143.9      1252      0.63265   0.0541888
> Total time run: 4.099801
> Total writes made:  1161
> Write size: 4194304
> Object size:4194304
> Bandwidth (MB/sec): 1132.74
> Stddev Bandwidth:   101.98
> Max bandwidth (MB/sec): 1252
> Min bandwidth (MB/sec): 1040
> Average IOPS:   283
> Stddev IOPS:25
> Max IOPS:   313
> Min IOPS:   260
> Average Latency(s): 0.0560897
> Stddev Latency(s):  0.107352
> Max latency(s): 1.02123
> Min latency(s): 0.00920514
> Cleaning up (deleting benchmark objects)
> Removed 1161 objects
> Clean up completed and total clean up time :0.079850
>
>
> After upgrading to Luminous i have executed
>
> ceph osd crush tunables optimal
>
> ceph.conf
>
> [global]
> fsid = 06c5c906-fc43-499f-8a6f-6c8e21807acf
> mon_initial_members = node-16 node-30 node-31
> mon_host = 172.16.1.9 172.16.1.3 172.16.1.11
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> log_to_syslog_level = info
> log_to_syslog = True
> osd_pool_default_size = 2
> osd_pool_default_min_size = 1
> osd_pool_default_pg_num = 64
> public_network = 172.16.1.0/24
> log_to_syslog_facility = LOG_LOCAL0
> osd_journal_size = 2048
> auth_supported = cephx
> osd_pool_default_pgp_num = 64
> osd_mkfs_type = xfs
> cluster_network = 172.16.1.0/24
> osd_recovery_max_active = 1
> osd_max_backfills = 1
> max_open_files = 131072
> debug_default = False
>
>
> [client]
> rbd_cache_writethrough_until_flush = True
> rbd_cache = True
>
> [client.radosgw.gateway]
> rgw_keystone_accepted_roles = _member_, Member, admin, swiftoperator
> keyring = /etc/ceph/keyring.radosgw.gateway
> rgw_frontends = fastcgi socket_port=9000 socket_host=127.0.0.1
> rgw_socket_path = /tmp/radosgw.sock
> rgw_keystone_revocation_interval = 100
> rgw_keystone_url = http://192.168.1.3:35357
> rgw_keystone_admin_token = jaJSmlTNxgsFp1ttq5SuAT1R
> rgw_init_timeout = 36
> host = controller2
> rgw_dns_name = *.sapiennetworks.com
> rgw_print_continue = True
> rgw_keystone_token_cache_size = 10
> rgw_data = /var/lib/ceph/radosgw
> user = www-data
>
> [osd]
> journal_queue_max_ops = 3000
> objecter_inflight_ops = 10240
> journal_queue_max_bytes = 1048576000
> filestore_queue_max_ops = 500
> osd_mkfs_type = xfs
> osd_mount_options_xfs = rw,relatime,inode64,logbsize=256k,allocsize=4M
> osd_op_threads = 20
> filestore_queue_committing_max_ops = 5000
> journal_max_write_entries = 1000
> objecter_infilght_op_bytes = 1048576000
> filestore_queue_max_bytes = 1048576000
> filestore_max_sync_interval = 10
> journal_max_write_bytes = 1048576000
> filestore_queue_committing_max_bytes = 1048576000
> ms_dispatch_throttle_bytes = 1048576000
>
>  ceph -s
>   cluster:
> id: 06c5c906-fc43-499f-8a6f-6c8e21807acf
> health: HEALTH_WARN
> application not enabled on 2 pool(s)
>
>   services:
> mon: 3 daemons, quorum controller3,controller2,controller1
> mgr: controller1(active)
> osd: 72 osds: 72 up, 72 in
> rgw: 1 daemon active
>
>   data:
> pools:   5 pools, 6240 pgs
> objects: 12732 objects, 72319 MB
> usage:   229 GB used, 39965 GB / 40195 GB avail
> pgs: 6240 active+clean
>
> 

Re: [ceph-users] Questions about pg num setting

2018-01-02 Thread Christian Wuerdig
Have you had a look at http://ceph.com/pgcalc/?

Generally if you have too many PGs per OSD you can get yourself into
trouble during recovery and backfilling operations consuming a lot
more RAM than you have and eventually making your cluster unusable
(some more info can be found here for example:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-October/013614.html
but there are other threads on the ML).
Also, currently you cannot reduce the number of PGs for a pool, so you
are much better off starting with a lower value and then gradually
increasing it.
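As a rough starting point there is the rule of thumb behind pgcalc (aim for
~100 PGs per OSD across all pools); purely as a sketch, with made-up numbers:

  osds=10; size=3
  echo $(( osds * 100 / size ))   # ~333 -> round to a power of two, e.g. 256 or 512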

The fact that the ceph developers introduced a config option which
prevents users from increasing the number of PGs if it exceeds the
configured limit should be a tell-tale sign that having too many PGs
per OSD is considered a problem (see also
https://bugzilla.redhat.com/show_bug.cgi?id=1489064 and linked PRs)
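If I remember correctly the relevant knob in luminous is mon_max_pg_per_osd
(default 200), so something like the following should show or temporarily
loosen it -- treat this as a sketch, not gospel:

  ceph daemon mon.$(hostname -s) config get mon_max_pg_per_osd
  ceph tell mon.* injectargs '--mon_max_pg_per_osd 300'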

On Wed, Dec 27, 2017 at 3:15 PM, 于相洋  wrote:
> Hi cephers,
>
> I have two questions about pg number setting.
>
> First :
> My storage information is shown below:
> HDD: 10 * 8TB
> CPU: Intel(R) Xeon(R) CPU E5645 @ 2.40GHz (24 cores)
> Memory: 64GB
>
> As my HDD capacity and memory are large, I want to set as many as
> 300 PGs per OSD, although 100 PGs per OSD is preferred. I want
> to know what the disadvantage of setting too many PGs is?
>
>
> Second:
> At the beginning, I cannot judge the capacity proportion of my workloads, so
> I cannot set accurate PG numbers for the different pools. How many PGs
> should I set for each pool at first?
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph scrub logs: _scan_snaps no head for $object?

2018-01-02 Thread Sage Weil
Hi Stefan, Mehmet,

Are these clusters that were upgraded from prior versions, or fresh 
luminous installs?

This message indicates that there is a stray clone object with no 
associated head or snapdir object.  That normally should never 
happen--it's presumably the result of a (hopefully old) bug.  The scrub 
process doesn't even clean them up, which maybe says something about how 
common it is/was...

sage


On Sun, 24 Dec 2017, c...@elchaka.de wrote:

> Hi Stefan,
> 
> Am 14. Dezember 2017 09:48:36 MEZ schrieb Stefan Kooman :
> >Hi,
> >
> >We see the following in the logs after we start a scrub for some osds:
> >
> >ceph-osd.2.log:2017-12-14 06:50:47.180344 7f0f47db2700  0
> >log_channel(cluster) log [DBG] : 1.2d8 scrub starts
> >ceph-osd.2.log:2017-12-14 06:50:47.180915 7f0f47db2700 -1 osd.2
> >pg_epoch: 11897 pg[1.2d8( v 11890'165209 (3221'163647,11890'165209]
> >local-lis/les=11733/11734 n=67 ec=132/132 lis/c 11733/11733 les/c/f
> >11734/11734/0 11733/11733/11733) [2,45,31] r=0 lpr=11733
> >crt=11890'165209 lcod 11890'165208 mlcod 11890'165208
> >active+clean+scrubbing] _scan_snaps no head for
> >1:1b518155:::rbd_data.620652ae8944a.0126:29 (have MIN)
> >ceph-osd.2.log:2017-12-14 06:50:47.180929 7f0f47db2700 -1 osd.2
> >pg_epoch: 11897 pg[1.2d8( v 11890'165209 (3221'163647,11890'165209]
> >local-lis/les=11733/11734 n=67 ec=132/132 lis/c 11733/11733 les/c/f
> >11734/11734/0 11733/11733/11733) [2,45,31] r=0 lpr=11733
> >crt=11890'165209 lcod 11890'165208 mlcod 11890'165208
> >active+clean+scrubbing] _scan_snaps no head for
> >1:1b518155:::rbd_data.620652ae8944a.0126:14 (have MIN)
> >ceph-osd.2.log:2017-12-14 06:50:47.180941 7f0f47db2700 -1 osd.2
> >pg_epoch: 11897 pg[1.2d8( v 11890'165209 (3221'163647,11890'165209]
> >local-lis/les=11733/11734 n=67 ec=132/132 lis/c 11733/11733 les/c/f
> >11734/11734/0 11733/11733/11733) [2,45,31] r=0 lpr=11733
> >crt=11890'165209 lcod 11890'165208 mlcod 11890'165208
> >active+clean+scrubbing] _scan_snaps no head for
> >1:1b518155:::rbd_data.620652ae8944a.0126:a (have MIN)
> >ceph-osd.2.log:2017-12-14 06:50:47.214198 7f0f43daa700  0
> >log_channel(cluster) log [DBG] : 1.2d8 scrub ok
> >
> >So finally it logs "scrub ok", but what does " _scan_snaps no head for
> >..." mean?
> 
> I also see these lines in our logfiles and wonder what this means.
> 
> >Does this indicate a problem?
> 
> I guess not, because we actually do not have any issues.
>  
> >
> >Ceph 12.2.2 with bluestore on lvm
> 
> We using 12.2.2 with filestore on xfs.
> 
> - Mehmet
> >
> >Gr. Stefan
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] formatting bytes and object counts in ceph status output

2018-01-02 Thread John Spray
On Tue, Jan 2, 2018 at 10:43 AM, Jan Fajerski  wrote:
> Hi lists,
> Currently the ceph status output formats all numbers with binary unit
> prefixes, i.e. 1MB equals 1048576 bytes and an object count of 1M equals
> 1048576 objects.  I received a bug report from a user that printing object
> counts with a base 2 multiplier is confusing (I agree) so I opened a bug and
> https://github.com/ceph/ceph/pull/19117.
> In the PR discussion a couple of questions arose that I'd like to get some
> opinions on:

> - Should we print binary unit prefixes (MiB, GiB, ...) since that would be
> technically correct?

I'm not a fan of the technically correct base 2 units -- they're still
relatively rarely used, and I've spent most of my life using kB to
mean 1024, not 1000.

> - Should counters (like object counts) be formatted with a base 10
> multiplier or a multiplier with base 2?

I prefer base 2 for any dimensionless quantities (or rates thereof) in
computing.  Metres and kilograms go in base 10, bytes go in base 2.

It's all very subjective and a matter of opinion of course, and my
feelings aren't particularly strong :-)

John

> My proposal would be to both use binary unit prefixes and use base 10
> multipliers for counters. I think this aligns with user expectations as well
> as the relevant standard(s?).
>
> Best,
> Jan
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] in the same ceph cluster, why are some objects in the same osd 8M and some 4M?

2018-01-02 Thread Richard Hesketh
On 02/01/18 02:36, linghucongsong wrote:
> Hi, all!
> 
> I just use ceph rbd for openstack.
> 
> my ceph version is 10.2.7.
> 
> I find a surprising thing about the objects saved on the osd: in some pgs the
> objects are 8M, and in some pgs the objects are 4M. Can someone tell me why?
> Thanks!
> 
> root@node04:/var/lib/ceph/osd/ceph-3/current/1.6e_head/DIR_E/DIR_6# ll -h
> -rw-r--r-- 1 ceph ceph 8.0M Dec 14 14:36 
> rbd\udata.0f5c1a238e1f29.012a__head_6967626E__1
> 
> root@node04:/var/lib/ceph/osd/ceph-3/current/3.13_head/DIR_3/DIR_1/DIR_3/DIR_6#
>  ll -h
> -rw-r--r--  1 ceph ceph 4.0M Oct 24 17:39 
> rbd\udata.106f835ba64e8d.04dc__head_5B646313__3
By default, rbds are striped across 4M objects, but that is a configurable 
value - you can make it larger or smaller if you like. I note that the PGs you 
are looking at are from different pools (1.xx vs 3.xx) - so I'm guessing you 
have multiple storage pools configured in your openstack cluster. Is it 
possible that for the larger ones, the rbd_store_chunk_size parameter is being 
overridden?
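An easy way to confirm would be to compare the image headers; sketch only,
pool/image names are placeholders:

  rbd info <pool>/<image> | grep order
  # e.g. "order 22 (4096 kB objects)" vs "order 23 (8192 kB objects)"

and to check whether cinder.conf / glance-api.conf sets something like
rbd_store_chunk_size = 8 (the value is in MB) for one of the backends.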

Rich



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Increasing PG number

2018-01-02 Thread Karun Josy
https://access.redhat.com/solutions/2457321

It says it is a very intensive process and can affect cluster performance.

Our Version is Luminous 12.2.2
And we are using erasure coding profile for a pool 'ecpool' with k=5 and m=3
Current PG number is 256 and it has about 20 TB of data.

Should I increase it gradually, or set pg_num to 512 in one step?




Karun Josy

On Tue, Jan 2, 2018 at 9:26 PM, Hans van den Bogert 
wrote:

> Please refer to standard documentation as much as possible,
>
> http://docs.ceph.com/docs/jewel/rados/operations/
> placement-groups/#set-the-number-of-placement-groups
>
> Han’s blog post is also incomplete, since you also need to change the ‘pgp_num’ as
> well.
>
> Regards,
>
> Hans
>
> On Jan 2, 2018, at 4:41 PM, Vladimir Prokofev  wrote:
>
> Increased number of PGs in multiple pools in a production cluster on
> 12.2.2 recently - zero issues.
> CEPH claims that increasing pg_num and pgp_num are safe operations, which
> are essential for it's ability to scale, and this sounds pretty reasonable
> to me. [1]
>
>
> [1] https://www.sebastien-han.fr/blog/2013/03/12/ceph-change
> -pg-number-on-the-fly/
>
> 2018-01-02 18:21 GMT+03:00 Karun Josy :
>
>> Hi,
>>
>>  Initial PG count was not properly planned while setting up the cluster,
>> so now there are only less than 50 PGs per OSDs.
>>
>> What are the best practises to increase PG number of a pool ?
>> We have replicated pools as well as EC pools.
>>
>> Or is it better to create a new pool with higher PG numbers?
>>
>>
>> Karun
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Increasing PG number

2018-01-02 Thread Hans van den Bogert
Please refer to standard documentation as much as possible, 


http://docs.ceph.com/docs/jewel/rados/operations/placement-groups/#set-the-number-of-placement-groups
 


Han’s blog post is also incomplete, since you also need to change the ‘pgp_num’ as well.
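In other words, something along these lines per pool (a sketch; substitute your
pool name and target value, and let the cluster settle between steps):

  ceph osd pool set <pool> pg_num 512
  ceph osd pool set <pool> pgp_num 512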

Regards,

Hans

> On Jan 2, 2018, at 4:41 PM, Vladimir Prokofev  wrote:
> 
> Increased number of PGs in multiple pools in a production cluster on 12.2.2 
> recently - zero issues.
> CEPH claims that increasing pg_num and pgp_num are safe operations, which are 
> essential for it's ability to scale, and this sounds pretty reasonable to me. 
> [1]
> 
> 
> [1] 
> https://www.sebastien-han.fr/blog/2013/03/12/ceph-change-pg-number-on-the-fly/
>  
> 
> 
> 2018-01-02 18:21 GMT+03:00 Karun Josy  >:
> Hi,
> 
>  Initial PG count was not properly planned while setting up the cluster, so 
> now there are only less than 50 PGs per OSDs.
> 
> What are the best practises to increase PG number of a pool ?
> We have replicated pools as well as EC pools.
> 
> Or is it better to create a new pool with higher PG numbers?
> 
> 
> Karun 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Increasing PG number

2018-01-02 Thread Vladimir Prokofev
Increased number of PGs in multiple pools in a production cluster on 12.2.2
recently - zero issues.
CEPH claims that increasing pg_num and pgp_num are safe operations, which
are essential for its ability to scale, and this sounds pretty reasonable
to me. [1]


[1] https://www.sebastien-han.fr/blog/2013/03/12/ceph-
change-pg-number-on-the-fly/

2018-01-02 18:21 GMT+03:00 Karun Josy :

> Hi,
>
>  Initial PG count was not properly planned while setting up the cluster,
> so now there are only less than 50 PGs per OSDs.
>
> What are the best practises to increase PG number of a pool ?
> We have replicated pools as well as EC pools.
>
> Or is it better to create a new pool with higher PG numbers?
>
>
> Karun
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Increasing PG number

2018-01-02 Thread Karun Josy
Hi,

 Initial PG count was not properly planned while setting up the cluster, so
now there are fewer than 50 PGs per OSD.

What are the best practices to increase the PG number of a pool?
We have replicated pools as well as EC pools.

Or is it better to create a new pool with higher PG numbers?


Karun
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PG active+clean+remapped status

2018-01-02 Thread Karun Josy
Hi,

We added some more osds to the cluster and it was fixed.

Karun Josy

On Tue, Jan 2, 2018 at 6:21 AM, 한승진  wrote:

> Are all osds the same version?
> I recently experienced a similar situation.
>
> I upgraded all osds to the exact same version and re-set the pool configuration
> like below
>
> ceph osd pool set  min_size 5
>
> I have a 5+2 erasure code; the important thing is not the value of min_size
> but the re-configuration, I think.
> I hope this help you.
>
> On Dec 19, 2017 at 5:25 AM, "Karun Josy" wrote:
>
> I think what happened is this :
>>
>> http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/
>>
>>
>> Note
>>
>>
>> Sometimes, typically in a “small” cluster with few hosts (for instance
>> with a small testing cluster), the fact to take out the OSD can spawn a
>> CRUSH corner case where some PGs remain stuck in the active+remapped
>>  state
>>
>> Its a small cluster with unequal number of osds and one of the OSD disk
>> failed and I had taken it out.
>> I have already purged it, so I cannot use the reweight option mentioned
>> in that link.
>>
>>
>> So any other workarounds ?
>> Will adding more disks will clear it ?
>>
>> Karun Josy
>>
>> On Mon, Dec 18, 2017 at 9:06 AM, David Turner 
>> wrote:
>>
>>> Maybe try outing the disk that should have a copy of the PG, but
>>> doesn't. Then mark it back in. It might check that it has everything
>>> properly and pull a copy of the data it's missing. I dunno.
>>>
>>> On Sun, Dec 17, 2017, 10:00 PM Karun Josy  wrote:
>>>
 Tried restarting all osds. Still no luck.

 Will adding a new disk to any of the servers force a rebalance and fix
 it?

 Karun Josy

 On Sun, Dec 17, 2017 at 12:22 PM, Cary  wrote:

> Karun,
>
>  Could you paste in the output from "ceph health detail"? Which OSD
> was just added?
>
> Cary
> -Dynamic
>
> On Sun, Dec 17, 2017 at 4:59 AM, Karun Josy 
> wrote:
> > Any help would be appreciated!
> >
> > Karun Josy
> >
> > On Sat, Dec 16, 2017 at 11:04 PM, Karun Josy 
> wrote:
> >>
> >> Hi,
> >>
> >> Repair didnt fix the issue.
> >>
> >> In the pg dump details, I notice this None. Seems pg is missing
> from one
> >> of the OSD
> >>
> >> [0,2,NONE,4,12,10,5,1]
> >> [0,2,1,4,12,10,5,1]
> >>
> >> There is no way Ceph corrects this automatically ? I have to edit/
> >> troubleshoot it manually ?
> >>
> >> Karun
> >>
> >> On Sat, Dec 16, 2017 at 10:44 PM, Cary 
> wrote:
> >>>
> >>> Karun,
> >>>
> >>>  Running ceph pg repair should not cause any problems. It may not
> fix
> >>> the issue though. If that does not help, there is more information
> at
> >>> the link below.
> >>> http://ceph.com/geen-categorie/ceph-manually-repair-object/
> >>>
> >>> I recommend not rebooting, or restarting while Ceph is repairing or
> >>> recovering. If possible, wait until the cluster is in a healthy
> state
> >>> first.
> >>>
> >>> Cary
> >>> -Dynamic
> >>>
> >>> On Sat, Dec 16, 2017 at 2:05 PM, Karun Josy 
> wrote:
> >>> > Hi Cary,
> >>> >
> >>> > No, I didnt try to repair it.
> >>> > I am comparatively new in ceph. Is it okay to try to repair it ?
> >>> > Or should I take any precautions while doing it ?
> >>> >
> >>> > Karun Josy
> >>> >
> >>> > On Sat, Dec 16, 2017 at 2:08 PM, Cary 
> wrote:
> >>> >>
> >>> >> Karun,
> >>> >>
> >>> >>  Did you attempt a "ceph pg repair "? Replace  with
> the pg
> >>> >> ID that needs repaired, 3.4.
> >>> >>
> >>> >> Cary
> >>> >> -D123
> >>> >>
> >>> >> On Sat, Dec 16, 2017 at 8:24 AM, Karun Josy <
> karunjo...@gmail.com>
> >>> >> wrote:
> >>> >> > Hello,
> >>> >> >
> >>> >> > I added 1 disk to the cluster and after rebalancing, it shows
> 1 PG
> >>> >> > is in
> >>> >> > remapped state. How can I correct it ?
> >>> >> >
> >>> >> > (I had to restart some osds during the rebalancing as there
> were
> >>> >> > some
> >>> >> > slow
> >>> >> > requests)
> >>> >> >
> >>> >> > $ ceph pg dump | grep remapped
> >>> >> > dumped all
> >>> >> > 3.4 981  00 0   0
> >>> >> > 2655009792
> >>> >> > 1535 1535 active+clean+remapped 2017-12-15 22:07:21.663964
> >>> >> > 2824'785115
> >>> >> > 2824:2297888 [0,2,NONE,4,12,10,5,1]  0
>  [0,2,1,4,12,10,5,1]
> >>> >> > 0  2288'767367 2017-12-14 11:00:15.576741  417'518549
> 2017-12-08
> >>> >> > 03:56:14.006982
> >>> >> >
> >>> >> > That PG belongs to an erasure pool with k=5, m =3 

Re: [ceph-users] formatting bytes and object counts in ceph status output

2018-01-02 Thread Piotr Dałek

On 18-01-02 11:43 AM, Jan Fajerski wrote:

Hi lists,
Currently the ceph status output formats all numbers with binary unit 
prefixes, i.e. 1MB equals 1048576 bytes and an object count of 1M equals 
1048576 objects. I received a bug report from a user that printing object 
counts with a base 2 multiplier is confusing (I agree) so I opened a bug and 
https://github.com/ceph/ceph/pull/19117.
In the PR discussion a couple of questions arose that I'd like to get some 
opinions on:
- Should we print binary unit prefixes (MiB, GiB, ...) since that would be 
  technically correct?


+1

- Should counters (like object counts) be formatted with a base 10 
multiplier or a multiplier with base 2?


+1

My proposal would be to both use binary unit prefixes and use base 10 
multipliers for counters. I think this aligns with user expectations as well 
as the relevant standard(s?).


Most users expect that non-size counters - like object counts - use base-10, 
and size counters use base-2 units. Ceph's "standard" of using base-2 
everywhere was confusing for me as well initially, but I got used to that... 
Still, I wouldn't mind if that got sorted out once and for all.


--
Piotr Dałek
piotr.da...@corp.ovh.com
https://www.ovh.com/us/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] formatting bytes and object counts in ceph status output

2018-01-02 Thread Jan Fajerski

Hi lists,
Currently the ceph status output formats all numbers with binary unit prefixes, 
i.e. 1MB equals 1048576 bytes and an object count of 1M equals 1048576 objects.  
I received a bug report from a user that printing object counts with a base 2 
multiplier is confusing (I agree) so I opened a bug and 
https://github.com/ceph/ceph/pull/19117.
In the PR discussion a couple of questions arose that I'd like to get some 
opinions on:
- Should we print binary unit prefixes (MiB, GiB, ...) since that would be 
 technically correct?
- Should counters (like object counts) be formatted with a base 10 multiplier or 
 a multiplier with base 2?


My proposal would be to both use binary unit prefixes and use base 10 
multipliers for counters. I think this aligns with user expectations as well as 
the relevant standard(s?).


Best,
Jan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Question about librbd with qemu-kvm

2018-01-02 Thread Alexandre DERUMIER
It's not possible to use multiple threads per disk in qemu currently. (It's on
the qemu roadmap.)

But you can create multiple disks/rbd images and use multiple qemu iothreads (1
per disk).


(BTW, I'm able to reach around 70k iops max with 4k reads, with a 3.1GHz cpu,
rbd_cache=none, disabling debug and cephx in ceph.conf)
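(Rough sketch of what the qemu side can look like with one iothread per
virtio-blk disk; names and paths are placeholders:

  -object iothread,id=iothread1 \
  -drive file=rbd:pool/image1,format=raw,if=none,id=drive1,cache=none \
  -device virtio-blk-pci,drive=drive1,iothread=iothread1 \
  -object iothread,id=iothread2 \
  -drive file=rbd:pool/image2,format=raw,if=none,id=drive2,cache=none \
  -device virtio-blk-pci,drive=drive2,iothread=iothread2

or the equivalent <iothreads>/<driver ... iothread='N'> settings in libvirt.)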


- Original Message -
From: "冷镇宇" 
To: "ceph-users" 
Sent: Tuesday, January 2, 2018 04:01:39
Subject: [ceph-users] Question about librbd with qemu-kvm



Hi all, 

I am using librbd from Ceph 10.2.0 with qemu-kvm. When the virtual machine booted,
I found that there is only one tp_librbd thread per rbd image, and the
iops of 4KB reads for one rbd image is only 20,000. I'm wondering if there are
some configuration options for librbd in qemu which can add librbd threads for one rbd
image. Can someone help me? Thank you very much.
___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Running Jewel and Luminous mixed for a longer period

2018-01-02 Thread Luis Periquito
On Tue, Dec 5, 2017 at 1:20 PM, Wido den Hollander  wrote:
> Hi,
>
> I haven't tried this before but I expect it to work, but I wanted to check 
> before proceeding.
>
> I have a Ceph cluster which is running with manually formatted FileStore XFS 
> disks, Jewel, sysvinit and Ubuntu 14.04.
>
> I would like to upgrade this system to Luminous, but since I have to 
> re-install all servers and re-format all disks I'd like to move it to 
> BlueStore at the same time.
>
> This system however has 768 3TB disks and has a utilization of about 60%. You 
> can guess, it will take a long time before all the backfills complete.
>
> The idea is to take a machine down, wipe all disks, re-install it with Ubuntu 
> 16.04 and Luminous and re-format the disks with BlueStore.
>
> The OSDs get back, start to backfill and we wait.
Are you OUT'ing the OSDs or removing them altogether (ceph osd crush
remove + ceph osd rm)?

I've noticed that when you remove them completely the data movement is
much bigger.
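i.e. the difference between just

  ceph osd out <id>

and a full removal (sketch, <id> is a placeholder):

  ceph osd crush remove osd.<id>
  ceph auth del osd.<id>
  ceph osd rm <id>

since the latter also changes the weight of the host bucket in CRUSH and
triggers a second round of data movement.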

>
> My estimation is that we can do one machine per day, but we have 48 machines 
> to do. Realistically this will take ~60 days to complete.

That seems a bit optimistic to me, but it depends on how aggressive
you are and how busy those spindles are.

>
> Afaik running Jewel (10.2.10) mixed with Luminous (12.2.2) should work just 
> fine, but I wanted to check if there are any caveats I don't know about.
>
> I'll upgrade the MONs to Luminous first before starting to upgrade the OSDs. 
> Between each machine I'll wait for a HEALTH_OK before proceeding allowing the 
> MONs to trim their datastore.

You have to: As far as I've seen after upgrading one of the MONs to
Luminous, the new OSDs running Luminous refuse to start until you have
*ALL* MONs running Luminous.
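A quick way to keep an eye on what is actually running during those ~60 days
(once the mons are on luminous) is:

  ceph versions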

>
> The question is: Does it hurt to run Jewel and Luminous mixed for ~60 days?
>
> I think it won't, but I wanted to double-check.

I thought the same. I was running 10.2.3 and doing about the same to
upgrade to 10.2.7, so keeping Jewel. The process was pretty much the
same, but had to pause for a month half way through (because of
unrelated issues) and every so often the cluster would just stop. At
least one of the OSDs would stop responding and piling up slow
requests, even though it was idle. It was random OSDs and happened
both on HDD and SSD (this is a cache tiered s3 storage cluster) and
either version. I tried the injectargs but no output - it just printed
and if it was idle. Restart the OSD and it would spring back to
life...

So not sure if you get similar issues, but I'm now avoiding mixed
versions as much as I can.

>
> Wido
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph as an Alternative to HDFS for Hadoop

2018-01-02 Thread Traiano Welcome
Hi Serkan

On Fri, Dec 22, 2017 at 12:04 PM, Serkan Çoban 
wrote:

> >Also, are there any benchmark comparisons between hdfs and ceph
> specifically around performance of apps benefiting from data locality ?
> There will be no data locality in ceph, because all the data is
> accessed through network.
>


Ok, so in "slow" networks this will be an issue.




>
> On Fri, Dec 22, 2017 at 4:52 AM, Traiano Welcome 
> wrote:
> > Hi List
> >
> > I'm researching the possibility of using ceph as a drop-in replacement for
> > hdfs for applications using spark and hadoop.
> >
> > I note that the jewel documentation states that it requires hadoop 1.1.x,
> > which seems a little dated and would be of concern for people:
> >
> > http://docs.ceph.com/docs/jewel/cephfs/hadoop/
> >
> > What about the 2.x series?
> >
>


Is there any information on whether Ceph will support the 2.x series of Hadoop?





> > Also, are there any benchmark comparisons between hdfs and ceph
> specifically
> > around performance of apps benefiting from data locality ?
> >
> > Many thanks in advance for any feedback!
> >
> > Regards,
> > Traiano
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph as an Alternative to HDFS for Hadoop

2018-01-02 Thread Traiano Welcome
On Wed, Dec 27, 2017 at 1:52 AM, Aristeu Gil Alves Jr 
wrote:

> In a recent thread on the list, I received various important answers to my
> questions on hadoop plugin. Maybe this thread will help you.
> https://www.spinics.net/lists/ceph-users/msg40790.html
>
> One of the most important answers is about data locality. The last message
> led me to this article.
> https://www.bluedata.com/blog/2015/05/data-locality-is-
> irrelevant-for-hadoop/
>


Thanks, this was informative!




>
>
> Regards,
> --
> Aristeu
>
> 2017-12-22 2:04 GMT-02:00 Serkan Çoban :
>
>> >Also, are there any benchmark comparisons between hdfs and ceph
>> specifically around performance of apps benefiting from data locality ?
>> There will be no data locality in ceph, because all the data is
>> accessed through network.
>>
>> On Fri, Dec 22, 2017 at 4:52 AM, Traiano Welcome 
>> wrote:
>> > Hi List
>> >
>> > I'm researching the possibility of using ceph as a drop-in replacement for
>> > hdfs for applications using spark and hadoop.
>> >
>> > I note that the jewel documentation states that it requires hadoop
>> 1.1.x,
>> > which seems a little dated and would be of concern for people:
>> >
>> > http://docs.ceph.com/docs/jewel/cephfs/hadoop/
>> >
>> > What about the 2.x series?
>> >
>> > Also, are there any benchmark comparisons between hdfs and ceph
>> specifically
>> > around performance of apps benefiting from data locality ?
>> >
>> > Many thanks in advance for any feedback!
>> >
>> > Regards,
>> > Traiano
>> >
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com