Re: [ceph-users] how the files in /var/lib/ceph/osd/ceph-0 are generated

2018-04-06 Thread Jeffrey Zhang
Yes, I am using ceph-volume.

And I found where the keyring comes from.

BlueStore saves this information at the start of the
disk (BDEV_LABEL_BLOCK_SIZE=4096);
this area is used for storing labels, including the keyring, whoami, etc.

These can be read with ceph-bluestore-tool show-label:

$ ceph-bluestore-tool  show-label --path /var/lib/ceph/osd/ceph-0
{
"/var/lib/ceph/osd/ceph-0/block": {
"osd_uuid": "c349b2ba-690f-4a36-b6f6-2cc0d0839f29",
"size": 2147483648,
"btime": "2018-04-04 10:22:25.216117",
"description": "main",
"bluefs": "1",
"ceph_fsid": "14941be9-c327-4a17-8b86-be50ee2f962e",
"kv_backend": "rocksdb",
"magic": "ceph osd volume v026",
"mkfs_done": "yes",
"osd_key": "AQDgNsRaVtsRIBAA6pmOf7y2GBufyE83nHwVvg==",
"ready": "ready",
"whoami": "0"
}
}

So when /var/lib/ceph/osd/ceph-0 is mounted, Ceph dumps this content
into the tmpfs folder.
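For reference, this can be reproduced by hand with something like the
following sketch (assuming a Luminous ceph-bluestore-tool that supports
prime-osd-dir, and the LVM device path from the listing below; ceph-volume
does the equivalent during activation):

$ mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-0
$ ceph-bluestore-tool prime-osd-dir --dev /dev/ceph-pool/osd0.data \
      --path /var/lib/ceph/osd/ceph-0
$ ls /var/lib/ceph/osd/ceph-0    # keyring, whoami, etc. re-created from the label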

On Fri, Apr 6, 2018 at 10:21 PM, David Turner  wrote:

> Likely the differences you're seeing of /dev/sdb1 and tmpfs have to do
> with how ceph-disk vs ceph-volume manage the OSDs and what their defaults
> are.  ceph-disk will create partitions on devices while ceph-volume
> configures LVM on the block device.  Also with bluestore you do not have a
> standard filesystem, so ceph-volume creates a mock folder to place the
> necessary information into /var/lib/ceph/osd/ceph-0 to track the
> information for the OSD and how to start it.
>
> On Wed, Apr 4, 2018 at 6:20 PM Gregory Farnum  wrote:
>
>> On Tue, Apr 3, 2018 at 6:30 PM Jeffrey Zhang wrote:
>>
>>> I am testing ceph Luminous, the environment is
>>>
>>> - centos 7.4
>>> - ceph luminous ( ceph offical repo)
>>> - ceph-deploy 2.0
>>> - bluestore + separate wal and db
>>>
>>> I found the ceph osd folder `/var/lib/ceph/osd/ceph-0` is mounted
>>> from tmpfs. But where the files in that folder come from? like `keyring`,
>>> `whoami`?
>>>
>>
>> These are generated as part of the initialization process. I don't know
>> the exact commands involved, but the keyring for instance will draw from
>> the results of "ceph osd new" (which is invoked by one of the ceph-volume
>> setup commands). That and whoami are part of the basic information an OSD
>> needs to communicate with a monitor.
>> -Greg
>>
>>
>>>
>>> $ ls -alh /var/lib/ceph/osd/ceph-0/
>>> lrwxrwxrwx.  1 ceph ceph   24 Apr  3 16:49 block ->
>>> /dev/ceph-pool/osd0.data
>>> lrwxrwxrwx.  1 root root   22 Apr  3 16:49 block.db ->
>>> /dev/ceph-pool/osd0-db
>>> lrwxrwxrwx.  1 root root   23 Apr  3 16:49 block.wal ->
>>> /dev/ceph-pool/osd0-wal
>>> -rw---.  1 ceph ceph   37 Apr  3 16:49 ceph_fsid
>>> -rw---.  1 ceph ceph   37 Apr  3 16:49 fsid
>>> -rw---.  1 ceph ceph   55 Apr  3 16:49 keyring
>>> -rw---.  1 ceph ceph6 Apr  3 16:49 ready
>>> -rw---.  1 ceph ceph   10 Apr  3 16:49 type
>>> -rw---.  1 ceph ceph2 Apr  3 16:49 whoami
>>>
>>> I guess they may be loaded from bluestore. But I can not find any clue
>>> for this.
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] jewel ceph has PG mapped always to the same OSD's

2018-04-06 Thread Konstantin Danilov
David,

> What happens when you deep-scrub this PG?
we haven't tried to deep-scrub it yet; we will try.

> What do the OSD logs show for any lines involving the problem PGs?
Nothing special was logged about this particular osd, except that it's
degraded.
Yet the osd spends quite a large portion of its CPU time in the
snappy/leveldb/jemalloc libs.
In the logs there are a lot of messages from leveldb about moving data
between levels.
Needless to say, this PG is from the RGW index pool, so it's metadata only
and gets a relatively high load. Yet now we have 3 PGs with the same
behavior from the rgw data pool (the cluster has almost all of its data in RGW).

> Was anything happening on your cluster just before this started happening
> at first?
The cluster got many updates in the week before the issue, but nothing
particularly noticeable.
The SSD OSDs were split in two, about 10% of the OSDs were removed, and some
networking issues appeared.

Thanks

On Fri, Apr 6, 2018 at 10:07 PM, David Turner  wrote:

> What happens when you deep-scrub this PG?  What do the OSD logs show for
> any lines involving the problem PGs?  Was anything happening on your
> cluster just before this started happening at first?
>
> On Fri, Apr 6, 2018 at 2:29 PM Konstantin Danilov 
> wrote:
>
>> Hi all, we have a strange issue on one cluster.
>>
>> One PG is mapped to the particular set of OSD, say X,Y and Z doesn't
>> matter what how
>> we change crush map.
>> The whole picture is next:
>>
>> * This is 10.2.7 ceph version, all monitors and osd's have the same
>> version
>> * One  PG eventually get into 'active+degraded+incomplete' state. It
>> was active+clean for a long time
>> and already has some data. We can't detect the event, which leads it
>> to this state. Probably it's
>> happened after some osd was removed from the cluster
>> * This PG has all 3 required OSD up and running, and all of them
>> online (pool_sz=3, min_pool_sz=2)
>> * All requests to pg stack forever, historic_ops shows that it waiting
>> on "waiting_for_degraded_pg"
>> * ceph pg query hangs forever
>> * We can't copy data from another pool as well - copying process hangs
>> and that fails with
>> (34) Numerical result out of range
>>  * We was trying to restart osd's, nodes, mon's with no effects
>> * Eventually we found that shutting down osd Z(not primary) does solve
>> the issue, but
>> only before ceph set this osd out. If we trying to change the weight
>> of this osd or remove it from cluster problem appears again. Cluster
>> is working only while osd Z is down and not out and has the default
>> weight
>> * Then we have found that doesn't matter what we are doing with crushmap -
>> osdmaptool --test-map-pgs-dump always put this PG to the same set of
>> osd - [X, Y] (in this osdmap Z is already down). We updating crush map
>> to remove nodes with OSD X,Y and Z completely out of it, compile it,
>> import it back to osdmap and run osdmaptool and always get the same
>> results
>> * After several nodes restart and setting osd Z down, but no out we
>> are now have 3 more PG with the same behaviour, but 'pined' to another
>> osd's
>> * We have run osdmaptool from luminous ceph to check if upmap
>> extension is somehow getting into this osd map - it is not.
>>
>> So this is where we are now. Have anyone seen something like this? Any
>> ideas are welcome. Thanks
>>
>>
>> --
>> Kostiantyn Danilov
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>


-- 
Kostiantyn Danilov aka koder.ua
Principal software engineer, Mirantis

skype:koder.ua
http://koder-ua.blogspot.com/
http://mirantis.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] "unable to connect to cluster" after monitor IP change

2018-04-06 Thread Nathan Dehnel
gentooserver ~ # ceph-mon -i mon0 --extract-monmap /tmp/monmap
2018-04-06 15:38:10.863444 7f8aa2b72f80 -1 wrote monmap to /tmp/monmap
gentooserver ~ # monmaptool --print /tmp/monmap
monmaptool: monmap file /tmp/monmap
epoch 3
fsid a736559a-92d1-483e-9289-d2c7feed510f
last_changed 2018-04-06 14:53:12.892574
created 2018-04-06 14:52:18.190509
0: [2605:6000:1020:2056:7d79:3f08:ee64:2aa3]:6789/0 mon.mon0

This is the monmap I injected into my monitor.
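(For context, the usual sequence for rewriting a monitor's address is
roughly the following sketch - assuming a single monitor named mon0 and
that the mon daemon is stopped while the map is injected:)

$ systemctl stop ceph-mon@mon0
$ ceph-mon -i mon0 --extract-monmap /tmp/monmap
$ monmaptool --rm mon0 /tmp/monmap
$ monmaptool --add mon0 [2605:6000:1020:2056:7d79:3f08:ee64:2aa3]:6789 /tmp/monmap
$ ceph-mon -i mon0 --inject-monmap /tmp/monmap
$ systemctl start ceph-mon@mon0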


gentooserver ~ # systemctl status ceph-mon@mon0
● ceph-mon@mon0.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; indirect; vendor
preset: disabled)
   Active: active (running) since Fri 2018-04-06 15:47:51 CDT; 16s ago
 Main PID: 4362 (ceph-mon)
   CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@mon0.service
   └─4362 /usr/bin/ceph-mon -f --cluster ceph --id mon0 --setuser
ceph --setgroup ceph

Apr 06 15:47:51 gentooserver systemd[1]: Started Ceph cluster monitor
daemon.
Apr 06 15:47:51 gentooserver ceph-mon[4362]: 2018-04-06 15:47:51.841218
7f5824d44f80 -1 distro_detect - can't detect distro_version

gentooserver ~ # systemctl status ceph-mgr@mgr0
● ceph-mgr@mgr0.service - Ceph cluster manager daemon
   Loaded: loaded (/lib/systemd/system/ceph-mgr@.service; indirect; vendor
preset: disabled)
   Active: active (running) since Fri 2018-04-06 15:10:01 CDT; 38min ago
 Main PID: 3807 (ceph-mgr)
   CGroup: /system.slice/system-ceph\x2dmgr.slice/ceph-mgr@mgr0.service
   └─3807 /usr/bin/ceph-mgr -f --cluster ceph --id mgr0 --setuser
ceph --setgroup ceph

Apr 06 15:10:01 gentooserver systemd[1]: Started Ceph cluster manager
daemon.

gentooserver ~ # systemctl status ceph-mds@mds0
● ceph-mds@mds0.service - Ceph metadata server daemon
   Loaded: loaded (/lib/systemd/system/ceph-mds@.service; disabled; vendor
preset: disabled)
  Drop-In: /etc/systemd/system/ceph-mds@.service.d
   └─00gentoo.conf
   Active: active (running) since Fri 2018-04-06 15:10:25 CDT; 38min ago
 Main PID: 3827 (ceph-mds)
   CGroup: /system.slice/system-ceph\x2dmds.slice/ceph-mds@mds0.service
   └─3827 /usr/bin/ceph-mds -f --cluster ceph --id mds0 --setuser
ceph --setgroup ceph

Apr 06 15:10:25 gentooserver systemd[1]: Started Ceph metadata server
daemon.
Apr 06 15:10:25 gentooserver ceph-mds[3827]: starting mds.mds0 at -

All the daemons report they are running.


gentooserver ~ # ceph daemon mon.mon0 mon_status
{
"name": "mon0",
"rank": 0,
"state": "leader",
"election_epoch": 39,
"quorum": [
0
],
"features": {
"required_con": "153140804152475648",
"required_mon": [
"kraken",
"luminous"
],
"quorum_con": "2305244844532236283",
"quorum_mon": [
"kraken",
"luminous"
]
},
"outside_quorum": [],
"extra_probe_peers": [],
"sync_provider": [],
"monmap": {
"epoch": 3,
"fsid": "a736559a-92d1-483e-9289-d2c7feed510f",
"modified": "2018-04-06 14:53:12.892574",
"created": "2018-04-06 14:52:18.190509",
"features": {
"persistent": [
"kraken",
"luminous"
],
"optional": []
},
"mons": [
{
"rank": 0,
"name": "mon0",
"addr": "[2605:6000:1020:2056:7d79:3f08:ee64:2aa3]:6789/0",
"public_addr":
"[2605:6000:1020:2056:7d79:3f08:ee64:2aa3]:6789/0"
}
]
},
"feature_map": {
"mon": {
"group": {
"features": "0x1ffddff8eea4fffb",
"release": "luminous",
"num": 1
}
}
}
}


gentooserver ~ # cat /etc/ceph/ceph.conf
[global]
cluster = ceph
fsid = a736559a-92d1-483e-9289-d2c7feed510f
ms bind ipv6 = true
auth cluster required = none
auth service required = none
auth client required = none
mon host = gentooserver
mon addr = [2605:6000:1020:2056:7d79:3f08:ee64:2aa3]:6789
mon pg warn max per osd = 300

[mon]
mon initial members = mon0
mon host = gentooserver
mon addr = [2605:6000:1020:2056:7d79:3f08:ee64:2aa3]:6789
mon pg warn max per osd = 300
mon allow pool delete = true

[mon.mon0]
host = gentooserver
mon addr = [2605:6000:1020:2056:7d79:3f08:ee64:2aa3]:6789

[osd]
osd journal size = 1
osd crush chooseleaf type = 0
host = gentooserver
osd pool default size = 3
osd pool default min size = 2

[mds.mds0]
host = gentooserver


ceph -s times out:

gentooserver ~ # ceph -s
2018-04-06 15:58:29.861647 7f5e7f891700  0 monclient(hunting): authenticate
timed out after 300
2018-04-06 15:58:29.861672 7f5e7f891700  0 librados: client.admin
authentication error (110) Connection timed out
[errno 110] error connecting to the cluster


The cluster was working before the IP address changed. HELP
___
ceph-users mailing list
ceph-users@lists.ceph.com

Re: [ceph-users] bluestore OSD did not start at system-boot

2018-04-06 Thread Oliver Freyermuth
Hi together,

this sounds a lot like my issue and quick solution here:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-February/024858.html

It seems http://tracker.ceph.com/issues/23067 is already under review, so maybe 
that will be in a future release,
shortening the bash-script and the ungleich-implementation to a one-liner. 
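(If that lands, the one-liner would presumably be something along the lines
of the following - assuming a ceph-volume version that supports the --all
flag:)

$ ceph-volume lvm activate --all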

Cheers,
Oliver

Am 06.04.2018 um 21:24 schrieb David Turner:
> `systemctl list-dependencies ceph.target`
> 
> Do you have ceph-osd.target listed underneath it with all of your OSDs under 
> that?  My guess is that you just need to enable them in systemctl to manage 
> them.  `systemctl enable ceph-osd@${osd}.service` where $osd is the osd 
> number to be enabled.  For example for osd.12 you would run `systemctl enable 
> ceph-osd@12.service`.
> 
> On Thu, Apr 5, 2018 at 7:09 AM Nico Schottelius wrote:
> 
> 
> Hey Ansgar,
> 
> we have a similar "problem": in our case all servers are wiped on
> reboot, as they boot their operating system from the network into
> initramfs.
> 
> While the OS configuration is done with cdist [0], we consider ceph osds
> more dynamic data and just re-initialise all osds on boot using the
> ungleich-tools [1] suite, which we created to work with ceph clusters
> mostly.
> 
> Especially [2] might be of interest for you.
> 
> HTH,
> 
> Nico
> 
> [0] https://www.nico.schottelius.org/software/cdist/
> [1] https://github.com/ungleich/ungleich-tools
> [2] 
> https://github.com/ungleich/ungleich-tools/blob/master/ceph-osd-activate-all
> 
> 
> 
> Ansgar Jazdzewski writes:
> 
> > hi folks,
> >
> > I just figured out that my OSDs did not start because the filesystem
> > is not mounted.
> >
> > So i wrote a script to Hack my way around it
> > #
> > #! /usr/bin/env bash
> >
> > DATA=( $(ceph-volume lvm list | grep -e 'osd id\|osd fsid' | awk
> > '{print $3}' | tr '\n' ' ') )
> >
> > OSDS=$(( ${#DATA[@]}/2 ))
> >
> > for OSD in $(seq 0 $(($OSDS-1))); do
> >  ceph-volume lvm activate "${DATA[( $OSD*2 )]}" "${DATA[( $OSD*2+1 )]}"
> > done
> > #
> >
> > I'm sure that this is not the way it should be!? So any help is
> > welcome to figure out why my BlueStore OSDs are not mounted at
> > boot-time.
> >
> > Thanks,
> > Ansgar
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com 
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> --
> Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 



smime.p7s
Description: S/MIME Cryptographic Signature
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy: recommended?

2018-04-06 Thread Anthony D'Atri
> "I read a couple of versions ago that ceph-deploy was not recommended
> for production clusters."

InkTank had sort of discouraged the use of ceph-deploy; in 2014 we used it only 
to deploy OSDs.

Some time later the message changed.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Does jewel 10.2.10 support filestore_split_rand_factor?

2018-04-06 Thread David Turner
You could randomize your ceph.conf settings for filestore_merge_threshold
and filestore_split_multiple.  It's not pretty, but it would spread things
out.  You could even do this as granularly as you'd like down to the
individual OSDs while only having a single ceph.conf file to maintain.
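For example, a sketch of what that could look like (the per-OSD values here
are hypothetical, chosen only to stagger when each OSD hits its split point):

[osd.0]
filestore_merge_threshold = 40
filestore_split_multiple = 8

[osd.1]
filestore_merge_threshold = 40
filestore_split_multiple = 11

(The split point scales roughly with abs(merge_threshold) x split_multiple x
16 objects per subfolder, so varying either value staggers when splits
happen.)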

I would probably go the route of manually splitting your subfolders,
though.  I've been using this [1] script for some time to do just that.  I
tried to make it fairly environment agnostic so people would have an easier
time implementing it for their needs.

[1] https://gist.github.com/drakonstein/cb76c7696e65522ab0e699b7ea1ab1c4

On Sun, Apr 1, 2018 at 10:42 AM shadow_lin  wrote:

> Thanks.
> Is there any workaround for 10.2.10 to avoid all osds starting to split at
> the same time?
>
> 2018-04-01
> --
> shadowlin
>
> --
>
> *From:* Pavan Rallabhandi 
> *Sent:* 2018-04-01 22:39
> *Subject:* Re: [ceph-users] Does jewel 10.2.10 support
> filestore_split_rand_factor?
> *To:* "shadow_lin","ceph-users"<
> ceph-users@lists.ceph.com>
> *Cc:*
>
>
>
> No, it is supported in the next version of Jewel
> http://tracker.ceph.com/issues/22658
>
>
>
> *From: *ceph-users  on behalf of
> shadow_lin 
> *Date: *Sunday, April 1, 2018 at 3:53 AM
> *To: *ceph-users 
> *Subject: *EXT: [ceph-users] Does jewel 10.2.10 support
> filestore_split_rand_factor?
>
>
>
> Hi list,
>
> The jewel documentation page has the filestore_split_rand_factor config, but
> I can't find the config using 'ceph daemon osd.x config'.
>
>
>
> ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
>
> ceph daemon osd.0 config show|grep split
> "mon_osd_max_split_count": "32",
> "journaler_allow_split_entries": "true",
> "mds_bal_split_size": "1",
> "mds_bal_split_rd": "25000",
> "mds_bal_split_wr": "1",
> "mds_bal_split_bits": "3",
> "filestore_split_multiple": "4",
> "filestore_debug_verify_split": "false",
>
>
>
> 2018-04-01
> --
>
> shadow_lin
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bluestore OSD did not start at system-boot

2018-04-06 Thread David Turner
`systemctl list-dependencies ceph.target`

Do you have ceph-osd.target listed underneath it with all of your OSDs
under that?  My guess is that you just need to enable them in systemctl to
manage them.  `systemctl enable ceph-osd@${osd}.service` where $osd is the
osd number to be enabled.  For example for osd.12 you would run `systemctl
enable ceph-osd@12.service`.
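If many OSDs need enabling at once, a rough one-liner (reusing the same
`ceph-volume lvm list` parsing as the script quoted below; adjust to taste)
could be:

for id in $(ceph-volume lvm list | awk '/osd id/ {print $3}'); do
    systemctl enable ceph-osd@${id}.service
done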

On Thu, Apr 5, 2018 at 7:09 AM Nico Schottelius <
nico.schottel...@ungleich.ch> wrote:

>
> Hey Ansgar,
>
> we have a similar "problem": in our case all servers are wiped on
> reboot, as they boot their operating system from the network into
> initramfs.
>
> While the OS configuration is done with cdist [0], we consider ceph osds
> more dynamic data and just re-initialise all osds on boot using the
> ungleich-tools [1] suite, which we created to work with ceph clusters
> mostly.
>
> Especially [2] might be of interest for you.
>
> HTH,
>
> Nico
>
> [0] https://www.nico.schottelius.org/software/cdist/
> [1] https://github.com/ungleich/ungleich-tools
> [2]
> https://github.com/ungleich/ungleich-tools/blob/master/ceph-osd-activate-all
>
>
>
> Ansgar Jazdzewski  writes:
>
> > hi folks,
> >
> > I just figured out that my OSDs did not start because the filesystem
> > is not mounted.
> >
> > So i wrote a script to Hack my way around it
> > #
> > #! /usr/bin/env bash
> >
> > DATA=( $(ceph-volume lvm list | grep -e 'osd id\|osd fsid' | awk
> > '{print $3}' | tr '\n' ' ') )
> >
> > OSDS=$(( ${#DATA[@]}/2 ))
> >
> > for OSD in $(seq 0 $(($OSDS-1))); do
> >  ceph-volume lvm activate "${DATA[( $OSD*2 )]}" "${DATA[( $OSD*2+1 )]}"
> > done
> > #
> >
> > I'm sure that this is not the way it should be!? So any help is
> > welcome to figure out why my BlueStore OSDs are not mounted at
> > boot-time.
> >
> > Thanks,
> > Ansgar
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> --
> Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph luminous - troubleshooting performance issues overall DSK 100%, busy 1%

2018-04-06 Thread David Turner
First and foremost, have you checked your disk controller?  Of most import
would be your cache battery.  Any time I have a single node acting up, the
controller is Suspect #1.

On Thu, Apr 5, 2018 at 11:23 AM Steven Vacaroaia  wrote:

> Hi,
>
> I have a strange issue - OSDs from a specific server are introducing huge
> performance issue
>
> This is a brand new installation on 3 identical servers -
>  DELL R620 with PERC H710 , bluestore  DB and WAL on SSD, 10GB dedicated
> private/public networks
>
>
> When I add the OSD I see gaps like below and huge latency
>
> atop provides no  clear culprit EXCEPT very low network and specific disk
> utilization BUT 100% DSK for ceph-osd process  which stay like that ( 100%)
> for the duration of the test
> ( see below)
>
> Not sure why ceph-osd process  DSK stays at 100% while all the specific
> DSK ( for sdb, sde ..etc) are 1% busy ?
>
> Any help/ instructions for how to troubleshooting this will be appreciated
>
> (apologies if the format is not being kept)
>
>
> CPU | sys   4%  | user  1%  |   | irq   1%  |
>  | idle794%  | wait  0%  |  |   |
> steal 0% |  guest 0% |  curf 2.20GHz |   |  curscal
>  ?% |
> CPL | avg10.00  |   | avg50.00  | avg15   0.00  |
>  |   |   | csw547/s |   |
> intr   832/s |   |   |  numcpu 8 |
>  |
> MEM | tot62.9G  | free   61.4G  | cache 520.6M  | dirty   0.0M  |
> buff7.5M  | slab   98.9M  | slrec  64.8M  | shmem   8.8M |  shrss
>  0.0M |  shswp   0.0M |  vmbal   0.0M |   |  hptot   0.0M |
> hpuse   0.0M |
> SWP | tot 6.0G  | free6.0G  |   |   |
>  |   |   |  |   |
>  |   |  vmcom   1.5G |   |  vmlim
> 37.4G |
> LVM | dm-0  | busy  1%  |   | read 0/s  |
> write   54/s  |   | KiB/r  0  | KiB/w455 |  MBr/s
> 0.0 |   |  MBw/s   24.0 |  avq 3.69 |   |  avio
> 0.14 ms |
> DSK |  sdb  | busy  1%  |   | read 0/s  |
> write  102/s  |   | KiB/r  0  | KiB/w240 |  MBr/s
> 0.0 |   |  MBw/s   24.0 |  avq 6.69 |   |  avio
> 0.08 ms |
> DSK |  sda  | busy  0%  |   | read 0/s  |
> write   12/s  |   | KiB/r  0  | KiB/w  4 |  MBr/s
> 0.0 |   |  MBw/s0.1 |  avq 1.00 |   |  avio
> 0.05 ms |
> DSK |  sde  | busy  0%  |   | read 0/s  |
> write0/s  |   | KiB/r  0  | KiB/w  0 |  MBr/s
> 0.0 |   |  MBw/s0.0 |  avq 1.00 |   |  avio
> 2.50 ms |
> NET | transport | tcpi   718/s  | tcpo   972/s  | udpi 0/s  |
>  | udpo 0/s  | tcpao0/s  | tcppo0/s |  tcprs   21/s |
> tcpie0/s |  tcpor0/s |   |  udpnp0/s |  udpie
> 0/s |
> NET | network   | ipi719/s  |   | ipo399/s  |
> ipfrw0/s  |   | deliv  719/s  |  |
>  |   |   |  icmpi0/s |   |  icmpo
>   0/s |
> NET | eth5  1%  | pcki  2214/s  | pcko   939/s  |   | sp
>  10 Gbps  | si  154 Mbps  | so   52 Mbps  |  |  coll 0/s |
> mlti 0/s |  erri 0/s |  erro 0/s |  drpi 0/s |  drpo
>  0/s |
> NET | eth4  0%  | pcki   712/s  | pcko54/s  |   | sp
>  10 Gbps  | si   50 Mbps  | so   90 Kbps  |  |  coll 0/s |
> mlti 0/s |  erri 0/s |  erro 0/s |  drpi 0/s |  drpo
>  0/s |
>
> PID   TID   RDDSK   WRDSK    WCANCL   DSK    CMD        1/21
> 2067  -     0K/s    0.0G/s   0K/s     100%   ceph-osd
>
>
>
>
>
> 2018-04-05 10:55:24.316549 min lat: 0.0203278 max lat: 10.7501 avg lat:
> 0.496822
>   sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg
> lat(s)
>40  16  1096  1080   107.988 0   -
> 0.496822
>41  16  1096  1080   105.354 0   -
> 0.496822
>42  16  1096  1080   102.846 0   -
> 0.496822
>43  16  1096  1080   100.454 0   -
> 0.496822
>44  16  1205  1189   108.079   48.   0.0430396
> 0.588127
>45  16  1234  1218   108.255   116   0.0318717
> 0.575485
>46  16  1234  1218   105.901 0   -
> 0.575485
>47  16  1234  1218   103.648 0   -
> 0.575485
>48  16  1234  1218   101.489 0   

Re: [ceph-users] ceph-deploy: recommended?

2018-04-06 Thread David Turner
I looked through the backlog for ceph-deploy.  It has some pretty intense
stuff including bugs for random environments that aren't ubuntu or
redhat/centos.  Not really something I could manage in my off time.

On Thu, Apr 5, 2018 at 2:15 PM  wrote:

> ... we use (only!) ceph-deploy in all our environments, tools and scripts.
>
> If I look at the effort that went into ceph-volume and all the related
> issues, the "manual LVM" overhead and/or still-missing features, PLUS the
> recommendations mentioned in the same discussions to use something like
> ceph-ansible in parallel for the missing stuff, I can only hope we will find
> a (full time?!) maintainer for ceph-deploy and keep it alive. PLEASE ;)
>
>
>
> Gesendet: Donnerstag, 05. April 2018 um 08:53 Uhr
> Von: "Wido den Hollander" 
> An: ceph-users@lists.ceph.com
> Betreff: Re: [ceph-users] ceph-deploy: recommended?
>
> On 04/04/2018 08:58 PM, Robert Stanford wrote:
> >
> >  I read a couple of versions ago that ceph-deploy was not recommended
> > for production clusters.  Why was that?  Is this still the case?  We
> > have a lot of problems automating deployment without ceph-deploy.
> >
> >
>
> In the end it is just a Python tool which deploys the daemons. It is not
> active in any way. Stability of the cluster is not determined by the use
> of ceph-deploy, but by the runnings daemons.
>
> I use ceph-deploy sometimes in very large deployments to make my life a
> bit easier.
>
> Wido
>
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
>
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] jewel ceph has PG mapped always to the same OSD's

2018-04-06 Thread David Turner
What happens when you deep-scrub this PG?  What do the OSD logs show for
any lines involving the problem PGs?  Was anything happening on your
cluster just before this started happening at first?
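(For reference, assuming the problem PG's id and one of its acting OSDs are
known, that would be along the lines of:)

ceph pg deep-scrub <pgid>
grep <pgid> /var/log/ceph/ceph-osd.<id>.log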

On Fri, Apr 6, 2018 at 2:29 PM Konstantin Danilov 
wrote:

> Hi all, we have a strange issue on one cluster.
>
> One PG is mapped to the particular set of OSD, say X,Y and Z doesn't
> matter what how
> we change crush map.
> The whole picture is next:
>
> * This is 10.2.7 ceph version, all monitors and osd's have the same version
> * One  PG eventually get into 'active+degraded+incomplete' state. It
> was active+clean for a long time
> and already has some data. We can't detect the event, which leads it
> to this state. Probably it's
> happened after some osd was removed from the cluster
> * This PG has all 3 required OSD up and running, and all of them
> online (pool_sz=3, min_pool_sz=2)
> * All requests to pg stack forever, historic_ops shows that it waiting
> on "waiting_for_degraded_pg"
> * ceph pg query hangs forever
> * We can't copy data from another pool as well - copying process hangs
> and that fails with
> (34) Numerical result out of range
>  * We was trying to restart osd's, nodes, mon's with no effects
> * Eventually we found that shutting down osd Z(not primary) does solve
> the issue, but
> only before ceph set this osd out. If we trying to change the weight
> of this osd or remove it from cluster problem appears again. Cluster
> is working only while osd Z is down and not out and has the default
> weight
> * Then we have found that doesn't matter what we are doing with crushmap -
> osdmaptool --test-map-pgs-dump always put this PG to the same set of
> osd - [X, Y] (in this osdmap Z is already down). We updating crush map
> to remove nodes with OSD X,Y and Z completely out of it, compile it,
> import it back to osdmap and run osdmaptool and always get the same
> results
> * After several nodes restart and setting osd Z down, but no out we
> are now have 3 more PG with the same behaviour, but 'pined' to another
> osd's
> * We have run osdmaptool from luminous ceph to check if upmap
> extension is somehow getting into this osd map - it is not.
>
> So this is where we are now. Have anyone seen something like this? Any
> ideas are welcome. Thanks
>
>
> --
> Kostiantyn Danilov
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] EC related osd crashes (luminous 12.2.4)

2018-04-06 Thread Adam Tygart
I set this about 15 minutes ago, with the following:
ceph tell osd.* injectargs '--osd-recovery-max-single-start 1
--osd-recovery-max-active 1'
ceph osd unset noout
ceph osd unset norecover

I also set those settings in ceph.conf just in case the "not observed"
response was true.
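(For the record, these map to ceph.conf entries roughly like the following,
in the [osd] section:)

[osd]
osd recovery max single start = 1
osd recovery max active = 1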

Things have been stable, no segfaults at all, and recovery is
happening. Thanks for your hard work on this. I'll follow-up if
anything else crops up.

--
Adam

On Fri, Apr 6, 2018 at 11:26 AM, Josh Durgin  wrote:
> You should be able to avoid the crash by setting:
>
> osd recovery max single start = 1
> osd recovery max active = 1
>
> With that, you can unset norecover to let recovery start again.
>
> A fix so you don't need those settings is here:
> https://github.com/ceph/ceph/pull/21273
>
> If you see any other backtraces let me know - especially the
> complete_read_op one from http://tracker.ceph.com/issues/21931
>
> Josh
>
>
> On 04/05/2018 08:25 PM, Adam Tygart wrote:
>>
>> Thank you! Setting norecover has seemed to work in terms of keeping
>> the osds up. I am glad my logs were of use to tracking this down. I am
>> looking forward to future updates.
>>
>> Let me know if you need anything else.
>>
>> --
>> Adam
>>
>> On Thu, Apr 5, 2018 at 10:13 PM, Josh Durgin  wrote:
>>>
>>> On 04/05/2018 08:11 PM, Josh Durgin wrote:


 On 04/05/2018 06:15 PM, Adam Tygart wrote:
>
>
> Well, the cascading crashes are getting worse. I'm routinely seeing
> 8-10 of my 518 osds crash. I cannot start 2 of them without triggering
> 14 or so of them to crash repeatedly for more than an hour.
>
> I've ran another one of them with more logging, debug osd = 20; debug
> ms = 1 (definitely more than one crash in there):
> http://people.cs.ksu.edu/~mozes/ceph-osd.422.log
>
> Anyone have any thoughts? My cluster feels like it is getting more and
> more unstable by the hour...



 Thanks to your logs, I think I've found the root cause. It looks like a
 bug in the EC recovery code that's triggered by EC overwrites. I'm
 working
 on a fix.

 For now I'd suggest setting the noout and norecover flags to avoid
 hitting this bug any more by avoiding recovery. Backfilling with no
 client
 I/O would also avoid the bug.
>>>
>>>
>>>
>>> I forgot to mention the tracker ticket for this bug is:
>>> http://tracker.ceph.com/issues/23195
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] jewel ceph has PG mapped always to the same OSD's

2018-04-06 Thread Konstantin Danilov
Hi all, we have a strange issue on one cluster.

One PG is mapped to a particular set of OSDs, say X, Y and Z, no matter how
we change the crush map.
The whole picture is as follows:

* This is ceph version 10.2.7; all monitors and osds have the same version.
* One PG eventually got into the 'active+degraded+incomplete' state. It
was active+clean for a long time and already has some data. We can't
identify the event that led it to this state. It probably happened after
some osd was removed from the cluster.
* This PG has all 3 required OSDs up and running, and all of them are
online (pool_sz=3, min_pool_sz=2).
* All requests to the pg are stuck forever; historic_ops shows they are
waiting on "waiting_for_degraded_pg".
* ceph pg query hangs forever.
* We can't copy the data from another pool either - the copying process
hangs and then fails with (34) Numerical result out of range.
* We tried restarting osds, nodes and mons with no effect.
* Eventually we found that shutting down osd Z (not the primary) does solve
the issue, but only until ceph marks this osd out. If we try to change the
weight of this osd or remove it from the cluster, the problem appears again.
The cluster only works while osd Z is down but not out and has its default
weight.
* Then we found that no matter what we do with the crushmap,
osdmaptool --test-map-pgs-dump always puts this PG on the same set of
osds - [X, Y] (in this osdmap Z is already down). We updated the crush map
to remove the nodes with OSDs X, Y and Z completely, compiled it,
imported it back into the osdmap and ran osdmaptool, and we always get the
same results.
* After several node restarts and setting osd Z down but not out, we
now have 3 more PGs with the same behaviour, but 'pinned' to other osds.
* We have run osdmaptool from luminous ceph to check whether the upmap
extension is somehow getting into this osdmap - it is not.

So this is where we are now. Has anyone seen something like this? Any
ideas are welcome. Thanks


-- 
Kostiantyn Danilov
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Have an inconsistent PG, repair not working

2018-04-06 Thread David Turner
I'm using filestore.  I think the root cause is something getting stuck in
the code.  As such I went ahead and created a [1] bug tracker for this.
Hopefully it gets some traction as I'm not particularly looking forward to
messing with deleting PGs with the ceph-objectstore-tool in production.

[1] http://tracker.ceph.com/issues/23577

On Fri, Apr 6, 2018 at 11:40 AM Michael Sudnick 
wrote:

> I've tried a few more things to get a deep-scrub going on my PG. I tried
> instructing the involved osds to scrub all their PGs and it looks like that
> didn't do it.
>
> Do you have any documentation on the object-store-tool? What I've found
> online talks about filestore and not bluestore.
>
> On 6 April 2018 at 09:27, David Turner  wrote:
>
>> I'm running into this exact same situation.  I'm running 12.2.2 and I
>> have an EC PG with a scrub error.  It has the same output for [1] rados
>> list-inconsistent-obj as mentioned before.  This is the [2] full health
>> detail.  This is the [3] excerpt from the log from the deep-scrub that
>> marked the PG inconsistent.  The scrub happened when the PG was starting up
>> after using ceph-objectstore-tool to split its filestore subfolders.  This
>> is using a script that I've used for months without any side effects.
>>
>> I have tried quite a few things to get this PG to deep-scrub or repair,
>> but to no avail.  It will not do anything.  I have set every osd's
>> osd_max_scrubs to 0 in the cluster, waited for all scrubbing and deep
>> scrubbing to finish, then increased the 11 OSDs for this PG to 1 before
>> issuing a deep-scrub.  And it will sit there for over an hour without
>> deep-scrubbing.  My current testing of this is to set all osds to 1,
>> increase all of the osds for this PG to 4, and then issue the repair... but
>> similarly nothing happens.  Each time I issue the deep-scrub or repair, the
>> output correctly says 'instructing pg 145.2e3 on osd.234 to repair', but
>> nothing shows up in the log for the OSD and the PG state stays
>> 'active+clean+inconsistent'.
>>
>> My next step, unless anyone has a better idea, is to find the exact copy
>> of the PG with the missing object, use object-store-tool to back up that
>> copy of the PG and remove it.  Then starting the OSD back up should
>> backfill the full copy of the PG and be healthy again.
>>
>>
>>
>> [1] $ rados list-inconsistent-obj 145.2e3
>> No scrub information available for pg 145.2e3
>> error 2: (2) No such file or directory
>>
>> [2] $ ceph health detail
>> HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
>> OSD_SCRUB_ERRORS 1 scrub errors
>> PG_DAMAGED Possible data damage: 1 pg inconsistent
>> pg 145.2e3 is active+clean+inconsistent, acting
>> [234,132,33,331,278,217,55,358,79,3,24]
>>
>> [3] 2018-04-04 15:24:53.603380 7f54d1820700  0 log_channel(cluster) log
>> [DBG] : 145.2e3 deep-scrub starts
>> 2018-04-04 17:32:37.916853 7f54d1820700 -1 log_channel(cluster) log [ERR]
>> : 145.2e3s0 deep-scrub 1 missing, 0 inconsistent objects
>> 2018-04-04 17:32:37.916865 7f54d1820700 -1 log_channel(cluster) log [ERR]
>> : 145.2e3 deep-scrub 1 errors
>>
>> On Mon, Apr 2, 2018 at 4:51 PM Michael Sudnick 
>> wrote:
>>
>>> Hi Kjetil,
>>>
>>> I've tried to get the pg scrubbing/deep scrubbing and nothing seems to
>>> be happening. I've tried it a few times over the last few days. My cluster
>>> is recovering from a failed disk (which was probably the reason for the
>>> inconsistency), do I need to wait for the cluster to heal before
>>> repair/deep scrub works?
>>>
>>> -Michael
>>>
>>> On 2 April 2018 at 14:13, Kjetil Joergensen  wrote:
>>>
 Hi,

 scrub or deep-scrub the pg, that should in theory get you back to
 list-inconsistent-obj spitting out what's wrong, then mail that info to the
 list.

 -KJ

 On Sun, Apr 1, 2018 at 9:17 AM, Michael Sudnick <
 michael.sudn...@gmail.com> wrote:

> Hello,
>
> I have a small cluster with an inconsistent pg. I've tried ceph pg
> repair multiple times to no luck. rados list-inconsistent-obj 49.11c
> returns:
>
> # rados list-inconsistent-obj 49.11c
> No scrub information available for pg 49.11c
> error 2: (2) No such file or directory
>
> I'm a bit at a loss here as what to do to recover. That pg is part of
> a cephfs_data pool with compression set to force/snappy.
>
> Does anyone have an suggestions?
>
> -Michael
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


 --
 Kjetil Joergensen 
 SRE, Medallia Inc

>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> 

Re: [ceph-users] EC related osd crashes (luminous 12.2.4)

2018-04-06 Thread Josh Durgin

You should be able to avoid the crash by setting:

osd recovery max single start = 1
osd recovery max active = 1

With that, you can unset norecover to let recovery start again.
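For example, these can be applied at runtime without a restart (the same
injectargs invocation used in the follow-up above):

ceph tell osd.* injectargs '--osd-recovery-max-single-start 1 --osd-recovery-max-active 1'
ceph osd unset norecover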

A fix so you don't need those settings is here: 
https://github.com/ceph/ceph/pull/21273


If you see any other backtraces let me know - especially the
complete_read_op one from http://tracker.ceph.com/issues/21931

Josh

On 04/05/2018 08:25 PM, Adam Tygart wrote:

Thank you! Setting norecover has seemed to work in terms of keeping
the osds up. I am glad my logs were of use to tracking this down. I am
looking forward to future updates.

Let me know if you need anything else.

--
Adam

On Thu, Apr 5, 2018 at 10:13 PM, Josh Durgin  wrote:

On 04/05/2018 08:11 PM, Josh Durgin wrote:


On 04/05/2018 06:15 PM, Adam Tygart wrote:


Well, the cascading crashes are getting worse. I'm routinely seeing
8-10 of my 518 osds crash. I cannot start 2 of them without triggering
14 or so of them to crash repeatedly for more than an hour.

I've ran another one of them with more logging, debug osd = 20; debug
ms = 1 (definitely more than one crash in there):
http://people.cs.ksu.edu/~mozes/ceph-osd.422.log

Anyone have any thoughts? My cluster feels like it is getting more and
more unstable by the hour...



Thanks to your logs, I think I've found the root cause. It looks like a
bug in the EC recovery code that's triggered by EC overwrites. I'm working
on a fix.

For now I'd suggest setting the noout and norecover flags to avoid
hitting this bug any more by avoiding recovery. Backfilling with no client
I/O would also avoid the bug.



I forgot to mention the tracker ticket for this bug is:
http://tracker.ceph.com/issues/23195


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Have an inconsistent PG, repair not working

2018-04-06 Thread Michael Sudnick
I've tried a few more things to get a deep-scrub going on my PG. I tried
instructing the involved osds to scrub all their PGs and it looks like that
didn't do it.

Do you have any documentation on the object-store-tool? What I've found
online talks about filestore and not bluestore.

On 6 April 2018 at 09:27, David Turner  wrote:

> I'm running into this exact same situation.  I'm running 12.2.2 and I have
> an EC PG with a scrub error.  It has the same output for [1] rados
> list-inconsistent-obj as mentioned before.  This is the [2] full health
> detail.  This is the [3] excerpt from the log from the deep-scrub that
> marked the PG inconsistent.  The scrub happened when the PG was starting up
> after using ceph-objectstore-tool to split its filestore subfolders.  This
> is using a script that I've used for months without any side effects.
>
> I have tried quite a few things to get this PG to deep-scrub or repair,
> but to no avail.  It will not do anything.  I have set every osd's
> osd_max_scrubs to 0 in the cluster, waited for all scrubbing and deep
> scrubbing to finish, then increased the 11 OSDs for this PG to 1 before
> issuing a deep-scrub.  And it will sit there for over an hour without
> deep-scrubbing.  My current testing of this is to set all osds to 1,
> increase all of the osds for this PG to 4, and then issue the repair... but
> similarly nothing happens.  Each time I issue the deep-scrub or repair, the
> output correctly says 'instructing pg 145.2e3 on osd.234 to repair', but
> nothing shows up in the log for the OSD and the PG state stays
> 'active+clean+inconsistent'.
>
> My next step, unless anyone has a better idea, is to find the exact copy
> of the PG with the missing object, use object-store-tool to back up that
> copy of the PG and remove it.  Then starting the OSD back up should
> backfill the full copy of the PG and be healthy again.
>
>
>
> [1] $ rados list-inconsistent-obj 145.2e3
> No scrub information available for pg 145.2e3
> error 2: (2) No such file or directory
>
> [2] $ ceph health detail
> HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
> OSD_SCRUB_ERRORS 1 scrub errors
> PG_DAMAGED Possible data damage: 1 pg inconsistent
> pg 145.2e3 is active+clean+inconsistent, acting
> [234,132,33,331,278,217,55,358,79,3,24]
>
> [3] 2018-04-04 15:24:53.603380 7f54d1820700  0 log_channel(cluster) log
> [DBG] : 145.2e3 deep-scrub starts
> 2018-04-04 17:32:37.916853 7f54d1820700 -1 log_channel(cluster) log [ERR]
> : 145.2e3s0 deep-scrub 1 missing, 0 inconsistent objects
> 2018-04-04 17:32:37.916865 7f54d1820700 -1 log_channel(cluster) log [ERR]
> : 145.2e3 deep-scrub 1 errors
>
> On Mon, Apr 2, 2018 at 4:51 PM Michael Sudnick 
> wrote:
>
>> Hi Kjetil,
>>
>> I've tried to get the pg scrubbing/deep scrubbing and nothing seems to be
>> happening. I've tried it a few times over the last few days. My cluster is
>> recovering from a failed disk (which was probably the reason for the
>> inconsistency), do I need to wait for the cluster to heal before
>> repair/deep scrub works?
>>
>> -Michael
>>
>> On 2 April 2018 at 14:13, Kjetil Joergensen  wrote:
>>
>>> Hi,
>>>
>>> scrub or deep-scrub the pg, that should in theory get you back to
>>> list-inconsistent-obj spitting out what's wrong, then mail that info to the
>>> list.
>>>
>>> -KJ
>>>
>>> On Sun, Apr 1, 2018 at 9:17 AM, Michael Sudnick <
>>> michael.sudn...@gmail.com> wrote:
>>>
 Hello,

 I have a small cluster with an inconsistent pg. I've tried ceph pg
 repair multiple times to no luck. rados list-inconsistent-obj 49.11c
 returns:

 # rados list-inconsistent-obj 49.11c
 No scrub information available for pg 49.11c
 error 2: (2) No such file or directory

 I'm a bit at a loss here as what to do to recover. That pg is part of a
 cephfs_data pool with compression set to force/snappy.

 Does anyone have an suggestions?

 -Michael

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


>>>
>>>
>>> --
>>> Kjetil Joergensen 
>>> SRE, Medallia Inc
>>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW multisite sync issues

2018-04-06 Thread Casey Bodley


On 04/06/2018 10:57 AM, Josef Zelenka wrote:

Hi everyone,

i'm currently setting up RGW multisite(one cluster is jewel(primary), 
the other is luminous - this is only for testing, on prod we will have 
the same version - jewel on both), but i can't get bucket 
synchronization to work. Data gets synchronized fine when i upload it, 
but when i delete it from the primary cluster, it only deletes the 
metadata of the file on the secondary one, the files are still 
there(can see it in rados df - pool states the same). Also, none of 
the older buckets start synchronizing to the secondary cluster. It's 
been quite a headache so far. Anyone who knows what might be wrong? I 
can supply any needed info. THanks


Josef Zelenka

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Your issue may be related to http://tracker.ceph.com/issues/22062 (fixed 
in luminous for 12.2.3)? If not, it's probably something similar. In 
general, I wouldn't recommend mixing releases in a multisite 
configuration, as it's not something we do any testing for.
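If it is something similar, comparing the per-zone sync state is a reasonable 
next step - a rough sketch (the zone name is a placeholder):

radosgw-admin sync status
radosgw-admin data sync status --source-zone=<zone>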


Casey
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RGW multisite sync issues

2018-04-06 Thread Josef Zelenka

Hi everyone,

I'm currently setting up RGW multisite (one cluster is jewel (primary), 
the other is luminous - this is only for testing; on prod we will have 
the same version - jewel on both), but I can't get bucket 
synchronization to work. Data gets synchronized fine when I upload it, 
but when I delete it from the primary cluster, only the metadata of the 
file is deleted on the secondary one; the files are still there (I can 
see it in rados df - the pool stays the same size). Also, none of the 
older buckets start synchronizing to the secondary cluster. It's been 
quite a headache so far. Does anyone know what might be wrong? I can 
supply any needed info. Thanks


Josef Zelenka

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how the files in /var/lib/ceph/osd/ceph-0 are generated

2018-04-06 Thread David Turner
Likely the differences you're seeing of /dev/sdb1 and tmpfs have to do with
how ceph-disk vs ceph-volume manage the OSDs and what their defaults are.
ceph-disk will create partitions on devices while ceph-volume configures
LVM on the block device.  Also with bluestore you do not have a standard
filesystem, so ceph-volume creates a mock folder to place the necessary
information into /var/lib/ceph/osd/ceph-0 to track the information for the
OSD and how to start it.

On Wed, Apr 4, 2018 at 6:20 PM Gregory Farnum  wrote:

> On Tue, Apr 3, 2018 at 6:30 PM Jeffrey Zhang <
> zhang.lei.fly+ceph-us...@gmail.com> wrote:
>
>> I am testing ceph Luminous, the environment is
>>
>> - centos 7.4
>> - ceph luminous ( ceph offical repo)
>> - ceph-deploy 2.0
>> - bluestore + separate wal and db
>>
>> I found the ceph osd folder `/var/lib/ceph/osd/ceph-0` is mounted
>> from tmpfs. But where the files in that folder come from? like `keyring`,
>> `whoami`?
>>
>
> These are generated as part of the initialization process. I don't know
> the exact commands involved, but the keyring for instance will draw from
> the results of "ceph osd new" (which is invoked by one of the ceph-volume
> setup commands). That and whoami are part of the basic information an OSD
> needs to communicate with a monitor.
> -Greg
>
>
>>
>> $ ls -alh /var/lib/ceph/osd/ceph-0/
>> lrwxrwxrwx.  1 ceph ceph   24 Apr  3 16:49 block ->
>> /dev/ceph-pool/osd0.data
>> lrwxrwxrwx.  1 root root   22 Apr  3 16:49 block.db ->
>> /dev/ceph-pool/osd0-db
>> lrwxrwxrwx.  1 root root   23 Apr  3 16:49 block.wal ->
>> /dev/ceph-pool/osd0-wal
>> -rw---.  1 ceph ceph   37 Apr  3 16:49 ceph_fsid
>> -rw---.  1 ceph ceph   37 Apr  3 16:49 fsid
>> -rw---.  1 ceph ceph   55 Apr  3 16:49 keyring
>> -rw---.  1 ceph ceph6 Apr  3 16:49 ready
>> -rw---.  1 ceph ceph   10 Apr  3 16:49 type
>> -rw---.  1 ceph ceph2 Apr  3 16:49 whoami
>>
>> I guess they may be loaded from bluestore. But I can not find any clue
>> for this.
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Have an inconsistent PG, repair not working

2018-04-06 Thread David Turner
I'm running into this exact same situation.  I'm running 12.2.2 and I have
an EC PG with a scrub error.  It has the same output for [1] rados
list-inconsistent-obj as mentioned before.  This is the [2] full health
detail.  This is the [3] excerpt from the log from the deep-scrub that
marked the PG inconsistent.  The scrub happened when the PG was starting up
after using ceph-objectstore-tool to split its filestore subfolders.  This
is using a script that I've used for months without any side effects.

I have tried quite a few things to get this PG to deep-scrub or repair, but
to no avail.  It will not do anything.  I have set osd_max_scrubs to 0 on
every osd in the cluster, waited for all scrubbing and deep scrubbing to
finish, then increased it to 1 on the 11 OSDs for this PG before issuing a
deep-scrub.  And it will sit there for over an hour without deep-scrubbing.
My current test is to set osd_max_scrubs to 1 on all osds, increase it to 4
on all of the osds for this PG, and then issue the repair... but similarly
nothing happens.  Each time I issue the deep-scrub or repair, the output
correctly says 'instructing pg 145.2e3 on osd.234 to repair', but nothing
shows up in the log for the OSD and the PG state stays
'active+clean+inconsistent'.

My next step, unless anyone has a better idea, is to find the exact copy of
the PG with the missing object, use object-store-tool to back up that copy
of the PG and remove it.  Then starting the OSD back up should backfill the
full copy of the PG and be healthy again.
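Roughly, that plan maps to something like the following sketch (the osd id
and paths are placeholders, the shard id (s0) is the one from the deep-scrub
log in [3], the OSD has to be stopped first, and for filestore a
--journal-path may also be needed):

systemctl stop ceph-osd@<id>
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
    --pgid 145.2e3s0 --op export --file /root/145.2e3s0.export
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
    --pgid 145.2e3s0 --op remove --force
systemctl start ceph-osd@<id>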



[1] $ rados list-inconsistent-obj 145.2e3
No scrub information available for pg 145.2e3
error 2: (2) No such file or directory

[2] $ ceph health detail
HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
pg 145.2e3 is active+clean+inconsistent, acting
[234,132,33,331,278,217,55,358,79,3,24]

[3] 2018-04-04 15:24:53.603380 7f54d1820700  0 log_channel(cluster) log
[DBG] : 145.2e3 deep-scrub starts
2018-04-04 17:32:37.916853 7f54d1820700 -1 log_channel(cluster) log [ERR] :
145.2e3s0 deep-scrub 1 missing, 0 inconsistent objects
2018-04-04 17:32:37.916865 7f54d1820700 -1 log_channel(cluster) log [ERR] :
145.2e3 deep-scrub 1 errors

On Mon, Apr 2, 2018 at 4:51 PM Michael Sudnick 
wrote:

> Hi Kjetil,
>
> I've tried to get the pg scrubbing/deep scrubbing and nothing seems to be
> happening. I've tried it a few times over the last few days. My cluster is
> recovering from a failed disk (which was probably the reason for the
> inconsistency), do I need to wait for the cluster to heal before
> repair/deep scrub works?
>
> -Michael
>
> On 2 April 2018 at 14:13, Kjetil Joergensen  wrote:
>
>> Hi,
>>
>> scrub or deep-scrub the pg, that should in theory get you back to
>> list-inconsistent-obj spitting out what's wrong, then mail that info to the
>> list.
>>
>> -KJ
>>
>> On Sun, Apr 1, 2018 at 9:17 AM, Michael Sudnick <
>> michael.sudn...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I have a small cluster with an inconsistent pg. I've tried ceph pg
>>> repair multiple times to no luck. rados list-inconsistent-obj 49.11c
>>> returns:
>>>
>>> # rados list-inconsistent-obj 49.11c
>>> No scrub information available for pg 49.11c
>>> error 2: (2) No such file or directory
>>>
>>> I'm a bit at a loss here as what to do to recover. That pg is part of a
>>> cephfs_data pool with compression set to force/snappy.
>>>
>>> Does anyone have an suggestions?
>>>
>>> -Michael
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>
>>
>> --
>> Kjetil Joergensen 
>> SRE, Medallia Inc
>>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com