[ceph-users] Scrub start-time and end-time

2019-08-12 Thread Torben Hørup

Hi

I have a few questions regarding the options for limiting scrubbing 
to a certain time frame: "osd scrub begin hour" and "osd scrub end 
hour".


Is it allowed to have the scrub period cross midnight? E.g., start 
time at 22:00 and end time at 07:00 the next morning.


I assume that if you configure only one of them, it still behaves 
as if it were unconfigured?
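For reference, the two options look like this in ceph.conf; the values are the 22:00-07:00 example from the question (whether the window may wrap past midnight like this is exactly what is being asked):

```ini
[osd]
# Restrict scheduled scrubs to a nightly window (hours, OSD-local time).
osd scrub begin hour = 22
osd scrub end hour = 7
```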



/Torben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Understanding incomplete PGs

2019-07-06 Thread Torben Hørup
Hi 

The "ec unable to recover when below min size" issue has very recently
been fixed for Octopus. 

See https://tracker.ceph.com/issues/18749 and
https://github.com/ceph/ceph/pull/17619 

The docs have been updated with a section on this issue:
http://docs.ceph.com/docs/master/rados/operations/erasure-code/#erasure-coded-pool-recovery
[2] 
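The min_size interaction discussed in this thread can be illustrated with a small calculation. This is a sketch; the assumption is Ceph's default min_size of k+1 for EC pools:

```python
# For an erasure-coded pool, size = k + m and min_size defaults to k + 1.
# With m = 1 there is no gap between min_size and size, so losing a single
# shard drops the PG below min_size and recovery stalls (the bug tracked
# above made this unrecoverable without lowering min_size).
def ec_pool_state(k, m, failed_shards):
    size = k + m
    min_size = k + 1          # assumed default for EC pools
    alive = size - failed_shards
    return "active" if alive >= min_size else "below min_size"

print(ec_pool_state(2, 1, 0))  # healthy pool
print(ec_pool_state(2, 1, 1))  # one OSD lost -> below min_size
print(ec_pool_state(4, 2, 1))  # m=2 survives one loss while staying active
```

This is also why m=1 gives no room between availability and durability, as noted in the quoted reply.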

/Torben 

On 05.07.2019 11:50, Paul Emmerich wrote:

> * There are virtually no use cases for ec pools with m=1, this is a bad 
> configuration as you can't have both availability and durability 
> 
> * Due to weird internal restrictions ec pools below their min size can't 
> recover, you'll probably have to reduce min_size temporarily to recover it 
> 
> * Depending on your version it might be necessary to restart some of the OSDs 
> due to a bug (fixed by now) that caused it to mark some objects as degraded 
> if you remove or restart an OSD while you have remapped objects 
> * run "ceph osd safe-to-destroy X" to check if it's safe to destroy a given 
> OSD
> 
> -- 
> Paul Emmerich
> 
> Looking for help with your Ceph cluster? Contact us at https://croit.io
> 
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io [1]
> Tel: +49 89 1896585 90 
> 
> On Fri, Jul 5, 2019 at 1:17 AM Kyle  wrote: 
> 
>> Hello,
>> 
>> I'm working with a small ceph cluster (about 10TB, 7-9 OSDs, all Bluestore
>> on lvm) and recently ran into a problem with 17 pgs marked as incomplete
>> after adding/removing OSDs.
>> 
>> Here's the sequence of events:
>> 1. 7 osds in the cluster, health is OK, all pgs are active+clean
>> 2. 3 new osds on a new host are added, lots of backfilling in progress
>> 3. osd 6 needs to be removed, so we do "ceph osd crush reweight osd.6 0"
>> 4. after a few hours we see "min osd.6 with 0 pgs" from "ceph osd 
>> utilization"
>> 5. ceph osd out 6
>> 6. systemctl stop ceph-osd@6
>> 7. the drive backing osd 6 is pulled and wiped
>> 8. backfilling has now finished all pgs are active+clean except for 17 
>> incomplete pgs
>> 
>> From reading the docs, it sounds like there has been unrecoverable data loss 
>> in those 17 pgs. That raises some questions for me:
>> 
>> Was "ceph osd utilization" only showing a goal of 0 pgs allocated instead of 
>> the current actual allocation?
>> 
>> Why is there data loss from a single osd being removed? Shouldn't that be 
>> recoverable?
>> All pools in the cluster are either replicated 3 or erasure-coded k=2,m=1
>> with default "host" failure domain. They shouldn't suffer data loss with a
>> single osd being removed even if there were no reweighting beforehand. Does
>> the backfilling temporarily reduce data durability in some way?
>> 
>> Is there a way to see which pgs actually have data on a given osd?
>> 
>> I attached an example of one of the incomplete pgs.
>> 
>> Thanks for any help,
>> 
>> Kyle
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

 

Links:
--
[1] http://www.croit.io
[2]
http://docs.ceph.com/docs/master/rados/operations/erasure-code/#erasure-coded-pool-recovery
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw-admin list bucket based on "last modified"

2019-06-25 Thread Torben Hørup

Hi

You could look into the radosgw elasticsearch sync module, and use that 
to find the objects' last-modified times.


http://docs.ceph.com/docs/master/radosgw/elastic-sync-module/

/Torben

On 25.06.2019 08:19, M Ranga Swami Reddy wrote:


Thanks for the reply.
Btw, one of my customers wants to get the objects based on the 
last-modified date field. How can we achieve this?


On Thu, Jun 13, 2019 at 7:09 PM Paul Emmerich  
wrote:


There's no (useful) internal ordering of these entries, so there isn't 
a more efficient way than getting everything and sorting it :(
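"Getting everything and sorting it" can be sketched client-side in a few lines. The listing excerpt below is hypothetical (field names modelled on `radosgw-admin bucket list` output, which carries an mtime per entry; treat the exact shape as an assumption):

```python
import json
from datetime import datetime, timezone

# Hypothetical excerpt of a bucket listing; real entries would come from
# `radosgw-admin bucket list --bucket=<name>`.
listing = json.loads("""
[
  {"name": "a.bin", "meta": {"mtime": "2019-06-01T10:00:00.000000Z"}},
  {"name": "b.bin", "meta": {"mtime": "2019-05-30T08:00:00.000000Z"}},
  {"name": "c.bin", "meta": {"mtime": "2019-06-01T23:59:00.000000Z"}}
]
""")

def mtime(entry):
    return datetime.strptime(entry["meta"]["mtime"],
                             "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)

# "Objects modified on 01 Jun 2019": filter client-side, then sort by mtime.
day = datetime(2019, 6, 1, tzinfo=timezone.utc).date()
hits = sorted((e for e in listing if mtime(e).date() == day), key=mtime)
print([e["name"] for e in hits])
```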


Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Thu, Jun 13, 2019 at 3:33 PM M Ranga Swami Reddy 
 wrote:

Hello - can we list the objects in RGW by last-modified date?

For example, I want to list all the objects which were modified on 01 
Jun 2019.


Thanks
Swami
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Thoughts on rocksdb and erasurecode

2019-06-24 Thread Torben Hørup

Hi

Have been thinking a bit about rocksdb and EC pools:

Since a RADOS object written to an EC(k+m) pool is split into several 
smaller pieces, each OSD will receive many more, smaller objects than it 
would in a replicated setup.


This must mean that rocksdb also needs to hold correspondingly more 
entries, and will grow faster. This has an impact when using bluestore 
on slow HDDs with the DB on SSD, where the faster-growing rocksdb might 
spill over to the slow store if not taken into account when designing 
the disk layout.


Are my thoughts on the right track or am I missing something?

Has somebody done any measurements of rocksdb growth, comparing 
replicated vs. EC pools?
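The argument above can be put as a back-of-envelope sketch. The numbers are illustrative assumptions (4 MiB logical objects, metadata dominated by one entry per stored piece); real rocksdb growth also depends on onode size, omap, and so on:

```python
# How many stored pieces (and hence metadata entries) one OSD ends up
# tracking for a fixed amount of data on that OSD.
def pieces_per_osd(data_per_osd_mib, object_size_mib, k=1):
    """For a replicated pool each piece is a whole object (k=1); for an
    EC pool each piece is a shard of roughly object_size / k."""
    piece_size = object_size_mib / k
    return int(data_per_osd_mib / piece_size)

data_per_osd_mib = 4 * 1024 * 1024   # 4 TiB of data on one OSD (assumption)
object_mib = 4                       # 4 MiB logical objects (assumption)

replicated = pieces_per_osd(data_per_osd_mib, object_mib)     # whole copies
ec_k2m1 = pieces_per_osd(data_per_osd_mib, object_mib, k=2)   # 2 MiB shards

# The EC OSD tracks k times as many pieces for the same bytes stored,
# which is the mechanism behind the faster rocksdb growth above.
print(replicated, ec_k2m1)
```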


/Torben
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] radosgw dying

2019-06-09 Thread Torben Hørup

For just the core RGW services it will need these 4 pools:

.rgw.root
default.rgw.control
default.rgw.meta
default.rgw.log

When creating buckets and uploading data, RGW will need 3 additional pools:

default.rgw.buckets.index
default.rgw.buckets.non-ec
default.rgw.buckets.data

/Torben


On 09.06.2019 19:34, Paul Emmerich wrote:


rgw uses more than one pool. (5 or 6 IIRC)

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Sun, Jun 9, 2019 at 7:00 PM  wrote:

Huan;

I get that, but the pool already exists, so why is radosgw trying to 
create one?


Dominic Hilsbos

Get Outlook for Android

On Sat, Jun 8, 2019 at 2:55 AM -0700, "huang jun"  
wrote:


From the error message, I'm inclined to think that 'mon_max_pg_per_osd' 
was exceeded.

You can check its value; the default is 250, so you can have at most 
1500 PG instances (250 * 6 OSDs). For replicated pools with size=3 that 
means 500 PGs across all pools; you already have 448 PGs, so the next 
pool can create at most 500 - 448 = 52 PGs.
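That arithmetic, spelled out (values taken from this thread; the mon_max_pg_per_osd default of 250 is the stated assumption):

```python
mon_max_pg_per_osd = 250
num_osds = 6
replica_size = 3

# Total PG *instances* (PG x replica placements) the monitors will allow.
max_pg_instances = mon_max_pg_per_osd * num_osds   # 1500

# Each size=3 replicated PG consumes 3 instances, so the cluster-wide budget:
max_pgs = max_pg_instances // replica_size         # 500

existing_pgs = 448
headroom = max_pgs - existing_pgs                  # PGs left for a new pool

print(max_pg_instances, max_pgs, headroom)
```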


On Saturday, Jun 8, 2019 at 2:41 PM,  wrote:


All;

I have a test and demonstration cluster running (3 hosts, MON, MGR, 2x 
OSD per host), and I'm trying to add a 4th host for gateway purposes.


The radosgw process keeps dying with:
2019-06-07 15:59:50.700 7fc4ef273780  0 ceph version 14.2.1 
(d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable), process 
radosgw, pid 17588
2019-06-07 15:59:51.358 7fc4ef273780  0 rgw_init_ioctx ERROR: 
librados::Rados::pool_create returned (34) Numerical result out of 
range (this can be due to a pool or placement group misconfiguration, 
e.g. pg_num < pgp_num or mon_max_pg_per_osd exceeded)
2019-06-07 15:59:51.396 7fc4ef273780 -1 Couldn't init storage provider 
(RADOS)


The .rgw.root pool already exists.

ceph status returns:
cluster:
id: 1a8a1693-fa54-4cb3-89d2-7951d4cee6a3
health: HEALTH_OK

services:
mon: 3 daemons, quorum S700028,S700029,S700030 (age 30m)
mgr: S700028(active, since 47h), standbys: S700030, S700029
osd: 6 osds: 6 up (since 2d), 6 in (since 3d)

data:
pools:   5 pools, 448 pgs
objects: 12 objects, 1.2 KiB
usage:   722 GiB used, 65 TiB / 66 TiB avail
pgs: 448 active+clean

and ceph osd tree returns:
ID CLASS WEIGHT   TYPE NAMESTATUS REWEIGHT PRI-AFF
-1   66.17697 root default
-5   22.05899 host S700029
2   hdd 11.02950 osd.2up  1.0 1.0
3   hdd 11.02950 osd.3up  1.0 1.0
-7   22.05899 host S700030
4   hdd 11.02950 osd.4up  1.0 1.0
5   hdd 11.02950 osd.5up  1.0 1.0
-3   22.05899 host s700028
0   hdd 11.02950 osd.0up  1.0 1.0
1   hdd 11.02950 osd.1up  1.0 1.0

Any thoughts on what I'm missing?

Thank you,

Dominic L. Hilsbos, MBA
Director - Information Technology
Perform Air International Inc.
dhils...@performair.com
www.PerformAir.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
Thank you!
HuangJun

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Massive TCP connection on radosgw

2019-05-22 Thread Torben Hørup
Which states are all these connections in? 

ss -tn | awk '{print $1}' | sort | uniq -c 
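The same tally, run here against a captured sample so it can be tried anywhere (the addresses and states below are made up; on a live host, pipe in real `ss -tn` output instead, with `tail -n +2` to drop the header line):

```shell
# Stand-in for real `ss -tn` output; column 1 is the TCP state.
sample='ESTAB      0  0  10.0.0.1:6800  10.0.0.2:51234
ESTAB      0  0  10.0.0.1:6800  10.0.0.3:51235
CLOSE-WAIT 0  0  10.0.0.1:7480  10.0.0.4:40000
TIME-WAIT  0  0  10.0.0.1:7480  10.0.0.5:40001'

# Count connections per state, most common first.
# Live version:  ss -tn | tail -n +2 | awk '{print $1}' | sort | uniq -c | sort -rn
printf '%s\n' "$sample" | awk '{print $1}' | sort | uniq -c | sort -rn
```

A pile-up of CLOSE-WAIT here would point at the application not closing sockets; a pile-up of ESTAB toward OSDs matches the behaviour described in this thread.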

/Torben 

On 22.05.2019 15:19, Li Wang wrote:

> Hi guys,  
> 
> Any help here?
> 
> Sent from my iPhone 
> 
> On 20 May 2019, at 2:48 PM, John Hearns  wrote:
> 
> I found similar behaviour on a Nautilus cluster on Friday. Around 300 000 
> open connections which I think were the result of a benchmarking run which 
> was terminated. I restarted the radosgw service to get rid of them. 
> 
> On Mon, 20 May 2019 at 06:56, Li Wang  wrote:
> 
> Dear ceph community members,
> 
> We have a ceph cluster (mimic 13.2.4) with 7 nodes and 130+ OSDs. However, we 
> observed over 70 million active TCP connections on the radosgw host, which 
> makes the radosgw very unstable. 
> 
> After further investigation, we found most of the TCP connections on the 
> radosgw are connected to OSDs.
> 
> May I ask what might be the possible reason causing the massive amount of 
> TCP connections? And is there any configuration or tuning work that I can 
> do to solve this issue?
> 
> Any suggestion is highly appreciated.
> 
> Regards,
> Li Wang
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ansible 2.8 for Nautilus

2019-05-21 Thread Torben Hørup
epel-testing has an ansible 2.8 package 

/Torben 

On 21.05.2019 03:14, solarflow99 wrote:

> Does anyone know the necessary steps to install ansible 2.8 on RHEL 7? I'm 
> assuming most people are doing it with pip? 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Noob question - ceph-mgr crash on arm

2019-05-20 Thread Torben Hørup
Hi

tcmalloc on armv7 is problematic. You need to compile your own Ceph with 
either jemalloc or plain libc malloc.

/Torben

On 20 May 2019 at 17.48.40 CEST, "Jesper Taxbøl"  wrote:
>I am trying to set up a Ceph cluster on 4 odroid-hc2 instances on top of
>Ubuntu 18.04.
>
>My ceph-mgr daemon keeps crashing on me.
>
>Any advice on how to proceed?
>
>Log on mgr node says something about ms_dispatch:
>
>2019-05-20 15:34:43.070424 b6714230  0 set uid:gid to 64045:64045 (ceph:ceph)
>2019-05-20 15:34:43.070455 b6714230  0 ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable), process ceph-mgr, pid 1169
>2019-05-20 15:34:43.070799 b6714230  0 pidfile_write: ignore empty --pid-file
>2019-05-20 15:34:43.101162 b6714230  1 mgr send_beacon standby
>2019-05-20 15:34:43.124462 b06f8c30 -1 *** Caught signal (Segmentation fault) **
>in thread b06f8c30 thread_name:ms_dispatch
>
>ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable)
>1: (()+0x30133c) [0x77033c]
>2: (()+0x25750) [0xb688a750]
>3: (_ULarm_step()+0x55) [0xb6816ce6]
>4: (()+0x255e8) [0xb6cd85e8]
>5: (GetStackTrace(void**, int, int)+0x25) [0xb6cd8a3e]
>6: (tcmalloc::PageHeap::GrowHeap(unsigned int)+0xb9) [0xb6ccd36a]
>7: (tcmalloc::PageHeap::New(unsigned int)+0x79) [0xb6ccd5e6]
>8: (tcmalloc::CentralFreeList::Populate()+0x71) [0xb6ccc5ce]
>9: (tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, void**, void**)+0x1b) [0xb6ccc760]
>10: (tcmalloc::CentralFreeList::RemoveRange(void**, void**, int)+0x6d) [0xb6ccc7de]
>11: (tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, unsigned int)+0x51) [0xb6ccea56]
>12: (malloc()+0x22d) [0xb6cd9a8e]
>NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.
>
>--- begin dump of recent events ---
>  -90> 2019-05-20 15:34:43.053293 b6714230  5 asok(0x55b5320) register_command perfcounters_dump hook 0x554c088
>  -89> 2019-05-20 15:34:43.053322 b6714230  5 asok(0x55b5320) register_command 1 hook 0x554c088
>  -88> 2019-05-20 15:34:43.053330 b6714230  5 asok(0x55b5320) register_command perf dump hook 0x554c088
>  -87> 2019-05-20 15:34:43.053341 b6714230  5 asok(0x55b5320) register_command perfcounters_schema hook 0x554c088
>  -86> 2019-05-20 15:34:43.053360 b6714230  5 asok(0x55b5320) register_command perf histogram dump hook 0x554c088
>  -85> 2019-05-20 15:34:43.053374 b6714230  5 asok(0x55b5320) register_command 2 hook 0x554c088
>  -84> 2019-05-20 15:34:43.053381 b6714230  5 asok(0x55b5320) register_command perf schema hook 0x554c088
>  -83> 2019-05-20 15:34:43.053389 b6714230  5 asok(0x55b5320) register_command perf histogram schema hook 0x554c088
>  -82> 2019-05-20 15:34:43.053410 b6714230  5 asok(0x55b5320) register_command perf reset hook 0x554c088
>  -81> 2019-05-20 15:34:43.053418 b6714230  5 asok(0x55b5320) register_command config show hook 0x554c088
>  -80> 2019-05-20 15:34:43.053425 b6714230  5 asok(0x55b5320) register_command config help hook 0x554c088
>  -79> 2019-05-20 15:34:43.053436 b6714230  5 asok(0x55b5320) register_command config set hook 0x554c088
>  -78> 2019-05-20 15:34:43.053444 b6714230  5 asok(0x55b5320) register_command config get hook 0x554c088
>  -77> 2019-05-20 15:34:43.053459 b6714230  5 asok(0x55b5320) register_command config diff hook 0x554c088
>  -76> 2019-05-20 15:34:43.053467 b6714230  5 asok(0x55b5320) register_command config diff get hook 0x554c088
>  -75> 2019-05-20 15:34:43.053475 b6714230  5 asok(0x55b5320) register_command log flush hook 0x554c088
>  -74> 2019-05-20 15:34:43.053482 b6714230  5 asok(0x55b5320) register_command log dump hook 0x554c088
>  -73> 2019-05-20 15:34:43.053490 b6714230  5 asok(0x55b5320) register_command log reopen hook 0x554c088
>  -72> 2019-05-20 15:34:43.053513 b6714230  5 asok(0x55b5320) register_command dump_mempools hook 0x56e3504
>  -71> 2019-05-20 15:34:43.070424 b6714230  0 set uid:gid to 64045:64045 (ceph:ceph)
>  -70> 2019-05-20 15:34:43.070455 b6714230  0 ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable), process ceph-mgr, pid 1169
>  -69> 2019-05-20 15:34:43.070799 b6714230  0 pidfile_write: ignore empty --pid-file
>  -68> 2019-05-20 15:34:43.074441 b6714230  5 asok(0x55b5320) init /var/run/ceph/ceph-mgr.odroid-c.asok
>  -67> 2019-05-20 15:34:43.074473 b6714230  5 asok(0x55b5320) bind_and_listen /var/run/ceph/ceph-mgr.odroid-c.asok
>  -66> 2019-05-20 15:34:43.074615 b6714230  5 asok(0x55b5320) register_command 0 hook 0x554c1d0
>  -65> 2019-05-20 15:34:43.074633 b6714230  5 asok(0x55b5320) register_command version hook 0x554c1d0
>  -64> 2019-05-20 15:34:43.074654 b6714230  5 asok(0x55b5320) register_command git_version hook 0x554c1d0
>  -63> 2019-05-20 15:34:43.074674 b6714230  5 asok(0x55b5320) register_command help hook 0x554c1d8
>  -62> 2019-05-20 15:34:43.074694 b6714230  5 asok(0x55b5320) register_command get_command_descriptions hook 0x554c1e0
>-61>