[ceph-users] Re: Ceph Squid released?

2024-04-29 Thread Alwin Antreich
On Mon, 29 Apr 2024 at 09:06, Robert Sander 
wrote:

> On 4/29/24 08:50, Alwin Antreich wrote:
>
> > well it says it in the article.
> >
> > The upcoming Squid release serves as a testament to how the Ceph
> > project continues to deliver innovative features to users without
> > compromising on quality.
> >
> >
> > I believe it is more a statement of having new members and tiers and to
> > sound the marketing drums a bit. :)
>
> The Ubuntu 24.04 release notes also claim that this release comes with
> Ceph Squid:
>
> https://discourse.ubuntu.com/t/noble-numbat-release-notes/39890
>
> Who knows. I don't see any packages on download.ceph.com for Squid.

Cheers,
Alwin

--
croit GmbH, Web <https://croit.io/> | LinkedIn
<http://linkedin.com/company/croit> | Youtube
<https://www.youtube.com/channel/UCIJJSKVdcSLGLBtwSFx_epw> | Twitter
<https://twitter.com/croit_io>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Squid released?

2024-04-29 Thread Alwin Antreich
Hi Robert,

well it says it in the article.

> The upcoming Squid release serves as a testament to how the Ceph project
> continues to deliver innovative features to users without compromising on
> quality.


I believe it is more a statement of having new members and tiers and to
sound the marketing drums a bit. :)

Cheers,
Alwin

On Mon, 29 Apr 2024 at 08:43, Robert Sander 
wrote:

> Hi,
>
>
> https://www.linuxfoundation.org/press/introducing-ceph-squid-the-future-of-storage-today
>
> Does the LF know more than the mailing list?
>
> Regards
> --
> Robert Sander
> Heinlein Consulting GmbH
> Schwedter Str. 8/9b, 10119 Berlin
>
> https://www.heinlein-support.de
>
> Tel: 030 / 405051-43
> Fax: 030 / 405051-19
>
> Amtsgericht Berlin-Charlottenburg - HRB 220009 B
> Geschäftsführer: Peer Heinlein - Sitz: Berlin


-- 
Alwin Antreich
Head of Training and Infrastructure

Want to meet: https://calendar.app.google/MuA2isCGnh8xBb657

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges, Andy Muthmann - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web <https://croit.io/> | LinkedIn <http://linkedin.com/company/croit> |
Youtube <https://www.youtube.com/channel/UCIJJSKVdcSLGLBtwSFx_epw> | Twitter
<https://twitter.com/croit_io>

TOP 100 Innovator Award
<https://croit.io/blog/croit-receives-top-100-seal> Winner
by compamedia
Technology Fast50 Award
<https://croit.io/blog/deloitte-technology-fast-50-award> Winner by Deloitte


[ceph-users] Re: Working ceph cluster reports large amount of pgs in state unknown/undersized and objects degraded

2024-04-21 Thread Alwin Antreich
Hi Tobias,

April 18, 2024 at 10:43 PM, "Tobias Langner"  wrote:


> While trying to dig up a bit more information, I noticed that the mgr web UI 
> was down, which is why we failed the active mgr to have one of the standbys 
> to take over, without thinking much...
> 
> Lo and behold, this completely resolved the issue from one moment to the 
> other. Now `ceph -s` return 338 active+clean pgs, as expected and desired...
> 
> While we are naturally pretty happy that the problem resolved itself, it 
> would still be good to understand
Thank you, that confirms my thought.

> 
> 1. what caused this weird state in which `ceph -s` output did not match
The MGR provides the stats for it.
 
> 
> 2. how a mgr failover could cause changes in `ceph -s` output, thereby
See above.
 
> 
> 3. why `ceph osd df tree` reported a weird split state with only few
Likely the same.

You'd need to go through the MGR log and see what caused the MGR to hang.
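A sketch of where one might look (unit names differ between plain and cephadm deployments, so treat these as illustrative):

```shell
# Hypothetical unit name -- adjust to your deployment
journalctl -u ceph-mgr@$(hostname -s) --since "-2 hours" | grep -iE 'error|timeout|hang'

# Which mgr is active and which are standby
ceph mgr dump | grep -E '"active_name"|"standbys"'
```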


Cheers,
Alwin


[ceph-users] Re: Working ceph cluster reports large amount of pgs in state unknown/undersized and objects degraded

2024-04-21 Thread Alwin Antreich
Hi Tobias,

April 18, 2024 at 8:08 PM, "Tobias Langner"  wrote:



> 
> We operate a tiny ceph cluster (v16.2.7) across three machines, each
> running two OSDs and one of each mds, mgr, and mon. The cluster serves
> one main erasure-coded (2+1) storage pool and a few other
I'd assume (without seeing the pool config) that the EC 2+1 profile is what puts PGs
inactive, because with EC you need n-2 chunks for redundancy and n-1 for availability.
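Put as a small sketch (plain arithmetic, not Ceph code; the k=2, m=1 values match the pool above):

```python
# Erasure coding with k data and m coding chunks: n = k + m chunks per object.
def ec_limits(k: int, m: int) -> dict:
    n = k + m
    return {
        "total_chunks": n,
        "chunks_needed_for_io": n - m,   # = k, the availability threshold
        "failures_tolerated": m,         # the redundancy
    }

# For the 2+1 pool: 3 chunks total, 2 needed to serve I/O,
# and only a single failure tolerated.
print(ec_limits(2, 1))
```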

The output got a bit mangled. Could you please provide it via a pastebin?

Can you please post the crush rule and pool settings, to better understand the
data distribution? And what do the logs show on one of the affected OSDs?

Cheers,
Alwin


[ceph-users] Re: 1x port from bond down causes all osd down in a single machine

2024-03-28 Thread Alwin Antreich
On March 26, 2024 5:02:16 PM GMT+01:00, "Szabo, Istvan (Agoda)" 
 wrote:
>Hi,
>
>Wonder what we are missing from the netplan configuration on ubuntu which ceph 
>needs to tolerate properly.
>We are using this bond configuration on ubuntu 20.04 with octopus ceph:
>
>
>bond1:
>  macaddress: x.x.x.x.x.50
>  dhcp4: no
>  dhcp6: no
>  addresses:
>- 192.168.199.7/24
>  interfaces:
>- ens2f0np0
>- ens2f1np1
>  mtu: 9000
>  parameters:
>mii-monitor-interval: 100
>mode: 802.3ad
>lacp-rate: fast
>transmit-hash-policy: layer3+4
>
>
>
>ens2f1np1 failed and caused slow ops, all osd down ... = disaster
>
>Any idea what is wrong with this bond config?
Two things come to mind. Is LACP correctly configured on the switch side? 
And maybe it is some STP-type problem, hence the switch again. Or is only one 
interface up/connected?

How does the current state of the bond look?
cat /proc/net/bonding/bond1
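For reference, a healthy 802.3ad bond roughly looks like the sample below; both slaves must report MII up and share one aggregator ID, otherwise LACP did not negotiate with the switch (the sample values are illustrative, not taken from your host):

```shell
# Illustrative excerpt of /proc/net/bonding/bond1 on a healthy bond
cat > /tmp/bond1.sample <<'EOF'
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up
Slave Interface: ens2f0np0
MII Status: up
Aggregator ID: 1
Slave Interface: ens2f1np1
MII Status: up
Aggregator ID: 1
EOF

# Bond plus two slaves -> three "up" lines; differing Aggregator IDs
# would indicate a switch-side LACP problem.
grep -c 'MII Status: up' /tmp/bond1.sample   # prints 3
```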

Cheers,
Alwin 


[ceph-users] Re: Mounting A RBD Via Kernal Modules

2024-03-25 Thread Alwin Antreich
Hi,


March 24, 2024 at 8:19 AM, "duluxoz"  wrote:
> 
> Hi,
> 
> Yeah, I've been testing various configurations since I sent my last
> email - all to no avail.
> 
> So I'm back to the start with a brand new 4T image which is rbdmapped to
> /dev/rbd0.
> 
> It's not formatted (yet) and so not mounted.
> 
> Every time I attempt a mkfs.xfs /dev/rbd0 (or mkfs.xfs
> /dev/rbd/my_pool/my_image) I get the errors I previously mentioned and the
> resulting image then becomes unusable (in every sense of the word).
> 
> If I run an fdisk -l (before trying the mkfs.xfs) the rbd image shows up
> in the list - no, I don't actually do a full fdisk on the image.
> 
> An rbd info my_pool:my_image shows the same expected values on both the
> host and ceph cluster.
> 
> I've tried this with a whole bunch of different sized images from 100G
> to 4T and all fail in exactly the same way. (My previous successful 100G
> test I haven't been able to reproduce.)
> 
> I've also tried all of the above using an "admin" CephX account - I
> can always connect via rbdmap, but as soon as I try an mkfs.xfs it
> fails. This failure also occurs with mkfs.ext4 as well (all size drives).
> 
> The Ceph cluster is good (self-reported, and there are other hosts
> happily connected via CephFS) and this host also has a CephFS mapping
> which is working.
> 
> Between running experiments I've gone over the Ceph doco (again) and I
> can't work out what's going wrong.
> 
> There's also nothing obvious/helpful jumping out at me from the
> logs/journal (sample below):
> 
> ~~~
> Mar 24 17:38:29 my_host.my_net.local kernel: rbd: rbd0: write at objno
> 524773 0~65536 result -1
> Mar 24 17:38:29 my_host.my_net.local kernel: rbd: rbd0: write at objno
> 524772 65536~4128768 result -1
> Mar 24 17:38:29 my_host.my_net.local kernel: rbd: rbd0: write result -1
> Mar 24 17:38:29 my_host.my_net.local kernel: blk_print_req_error: 119
> callbacks suppressed
> Mar 24 17:38:29 my_host.my_net.local kernel: I/O error, dev rbd0, sector
> 4298932352 op 0x1:(WRITE) flags 0x4000 phys_seg 1024 prio class 2
> Mar 24 17:38:29 my_host.my_net.local kernel: rbd: rbd0: write at objno
> 524774 0~65536 result -1
> Mar 24 17:38:29 my_host.my_net.local kernel: rbd: rbd0: write at objno
> 524773 65536~4128768 result -1
> Mar 24 17:38:29 my_host.my_net.local kernel: rbd: rbd0: write result -1
> Mar 24 17:38:29 my_host.my_net.local kernel: I/O error, dev rbd0, sector
> 4298940544 op 0x1:(WRITE) flags 0x4000 phys_seg 1024 prio class 2
> ~~~
> 
> Any ideas what I should be looking at?

Could you please share the command you've used to create the RBD?
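For comparison, a typical create-and-map sequence looks roughly like this (pool/image names taken from the thread; the feature-disable step is an assumption, only needed when the kernel client rejects newer image features):

```shell
rbd create my_pool/my_image --size 4T
# older kernel clients cannot handle some of the default image features:
rbd feature disable my_pool/my_image object-map fast-diff deep-flatten
rbd map my_pool/my_image          # -> /dev/rbd0
mkfs.xfs /dev/rbd0
```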

Cheers,
Alwin


[ceph-users] Re: Does ceph permit the definition of new classes?

2023-07-24 Thread Alwin Antreich
Hi,

July 24, 2023 3:02 PM, "wodel youchi"  wrote:

> Hi,
> 
> Can I define new device classes in ceph, I know that there are hdd, ssd and
> nvme, but can I define other classes?
Certainly. We often use dedicated device classes (e.g. nvme-meta) to separate 
workloads.
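A sketch of how such a class can be wired up (the OSD id, class name and pool name are examples, not from a real cluster):

```shell
# remove the autodetected class, then assign the custom one
# (a fresh class name is created implicitly on first use)
ceph osd crush rm-device-class osd.12
ceph osd crush set-device-class nvme-meta osd.12

# route a pool to that class via a CRUSH rule
ceph osd crush rule create-replicated meta-rule default host nvme-meta
ceph osd pool set cephfs_metadata crush_rule meta-rule
```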

Cheers,
Alwin

PS: this time replying to the list as well. :)


[ceph-users] Re: Connect ceph to proxmox

2021-06-07 Thread Alwin Antreich
Hi Istvan,

June 7, 2021 11:54 AM, "Szabo, Istvan (Agoda)"  wrote:

> So the client is on 14.2.20 the cluster is on 14.2.21. Seems like the Debian 
> buster repo is missing
> the 21 update?
Best ask the Proxmox devs about a 14.2.21 build. Or you could build it 
yourself; everything needed for it is in the repo.
https://git.proxmox.com/?p=ceph.git;a=shortlog;h=refs/heads/nautilus-stable-6

The above aside, it's best to upgrade Proxmox to Ceph Octopus, as Nautilus is soon 
EoL anyway.

--
Cheers,
Alwin


[ceph-users] Re: Proxmox+Ceph Benchmark 2020

2020-10-14 Thread Alwin Antreich
On Wed, Oct 14, 2020 at 02:09:22PM +0200, Andreas John wrote:
> Hello Alwin,
> 
> do you know if it makes difference to disable "all green computing" in
> the BIOS vs. settings the governor to "performance" in the OS?
Well, for one, the governor will not be able to influence all BIOS
settings (e.g. Infinity Fabric). And the defaults of the governor may
change, or it may behave differently. Since a BIOS usually receives fewer
updates than the kernel, the likelihood of change is lower.

My recommendation is to set it in the BIOS.
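For completeness, this is roughly how the governor side looks from the OS (standard sysfs paths on a stock Linux; BIOS-level settings such as Infinity Fabric remain out of its reach):

```shell
# current governor of the first core
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

# switch all cores to the performance governor
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
```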

--
Cheers,
Alwin


[ceph-users] Re: Proxmox+Ceph Benchmark 2020

2020-10-14 Thread Alwin Antreich
On Tue, Oct 13, 2020 at 09:09:27PM +0200, Maged Mokhtar wrote:
> 
> Very nice and useful document. One thing is not clear for me, the fio
> parameters in appendix 5:
> --numjobs=<1|4> --iodepths=<1|32>
> it is not clear if/when the iodepth was set to 32, was it used with all
> tests with numjobs=4 ? or was it:
> --numjobs=<1|4> --iodepths=1
We have a script that permutes the values and runs fio.

But the iodepth results are not shown in the paper. They were very often
close together or, in the case of Windows, showed lazy writing in action.
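The permutation itself can be sketched like this (a hypothetical reconstruction; the real script and its fio job options are not published):

```shell
# iterate over all numjobs/iodepth combinations and print the fio call
for numjobs in 1 4; do
  for iodepth in 1 32; do
    echo "fio --numjobs=${numjobs} --iodepth=${iodepth} ..."
  done
done
# prints four fio command lines, one per combination
```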

--
Cheers,
Alwin


[ceph-users] Re: Proxmox+Ceph Benchmark 2020

2020-10-14 Thread Alwin Antreich
On Tue, Oct 13, 2020 at 11:19:33AM -0500, Mark Nelson wrote:
> Thanks for the link Alwin!
> 
> 
> On intel platforms disabling C/P state transitions can have a really big
> impact on IOPS (on RHEL for instance using the network or performance
> latency tuned profile).  It would be very interesting to know if AMD EPYC
> platforms see similar benefits.  I don't have any in house, but if you
> happen to have a chance it would be an interesting addendum to your report.
Thanks for the suggestion. I indeed did a run before disabling the C/P
states in the BIOS. But unfortunately I didn't keep the results. :/

As far as I remember though, there was a visible improvement after
disabling them.

I will have a look, once I have some time to do some more benchmarks.

--
Cheers,
Alwin


[ceph-users] Proxmox+Ceph Benchmark 2020

2020-10-13 Thread Alwin Antreich
Hello fellow Ceph users,

we have released our new Ceph benchmark paper [0]. The used platform and
Hardware is Proxmox VE 6.2 with Ceph Octopus on a new AMD Epyc Zen2 CPU
with U.2 SSDs (details in the paper).

The paper should illustrate the performance that is possible with a 3x
node cluster without significant tuning.

I welcome everyone to share their experience and add to the discussion,
preferably in our forum [1] thread with our fellow Proxmox VE users.

--
Cheers,
Alwin

[0] https://proxmox.com/en/downloads/item/proxmox-ve-ceph-benchmark-2020-09
[1] 
https://forum.proxmox.com/threads/proxmox-ve-ceph-benchmark-2020-09-hyper-converged-with-nvme.76516/


[ceph-users] Re: ERROR: osd init failed: (1) Operation not permitted

2020-02-11 Thread Alwin Antreich
Hi Mario,

On Mon, Feb 10, 2020 at 07:50:15PM +0100, Ml Ml wrote:
> Hello List,
> 
> first of all: Yes - i made mistakes. Now i am trying to recover :-/
> 
> I had a healthy 3 node cluster which i wanted to convert to a single one.
> My goal was to reinstall a fresh 3 Node cluster and start with 2 nodes.
> 
> I was able to healthy turn it from a 3 Node Cluster to a 2 Node cluster.
> Then the problems began.
> 
> I started to change size=1 and min_size=1. (i know, i know, i will
> never ever to that again!)
> Health was okay until here. Then over sudden both nodes got
> fenced...one node refused to boot, mons where missing, etc...to make
> long story short, here is where i am right now:
First off, you'd better have a backup. ;)

> 
> 
> root@node03:~ # ceph -s
> cluster b3be313f-d0ef-42d5-80c8-6b41380a47e3
>  health HEALTH_WARN
> 53 pgs stale
> 53 pgs stuck stale
>  monmap e4: 2 mons at {0=10.15.15.3:6789/0,1=10.15.15.2:6789/0}
> election epoch 298, quorum 0,1 1,0
>  osdmap e6097: 14 osds: 9 up, 9 in
>   pgmap v93644673: 512 pgs, 1 pools, 1193 GB data, 304 kobjects
> 1088 GB used, 32277 GB / 33366 GB avail
>  459 active+clean
>   53 stale+active+clean
> 
> root@node03:~ # ceph osd tree
> ID WEIGHT   TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 32.56990 root default
> -2 25.35992     host node03
>  0  3.57999         osd.0        up  1.0  1.0
>  5  3.62999         osd.5        up  1.0  1.0
>  6  3.62999         osd.6        up  1.0  1.0
>  7  3.62999         osd.7        up  1.0  1.0
>  8  3.62999         osd.8        up  1.0  1.0
> 19  3.62999         osd.19       up  1.0  1.0
> 20  3.62999         osd.20       up  1.0  1.0
> -3  7.20998     host node02
>  3  3.62999         osd.3        up  1.0  1.0
>  4  3.57999         osd.4        up  1.0  1.0
>  1  0               osd.1      down  0    1.0
>  9  0               osd.9      down  0    1.0
> 10  0               osd.10     down  0    1.0
> 17  0               osd.17     down  0    1.0
> 18  0               osd.18     down  0    1.0
> 
> 
> 
> my main mistakes seemd to be:
> 
> ceph osd out osd.1
> ceph auth del osd.1
> systemctl stop ceph-osd@1
> ceph osd rm 1
> umount /var/lib/ceph/osd/ceph-1
> ceph osd crush remove osd.1
> 
> As far as i can tell, ceph waits and needs data from that OSD.1 (which
> i removed)
> 
> 
> 
> root@node03:~ # ceph health detail
> HEALTH_WARN 53 pgs stale; 53 pgs stuck stale
> pg 0.1a6 is stuck stale for 5086.552795, current state
> stale+active+clean, last acting [1]
> pg 0.142 is stuck stale for 5086.552784, current state
> stale+active+clean, last acting [1]
> pg 0.1e is stuck stale for 5086.552820, current state
> stale+active+clean, last acting [1]
> pg 0.e0 is stuck stale for 5086.552855, current state
> stale+active+clean, last acting [1]
> pg 0.1d is stuck stale for 5086.552822, current state
> stale+active+clean, last acting [1]
> pg 0.13c is stuck stale for 5086.552791, current state
> stale+active+clean, last acting [1]
> [...] SNIP [...]
> pg 0.e9 is stuck stale for 5086.552955, current state
> stale+active+clean, last acting [1]
> pg 0.87 is stuck stale for 5086.552939, current state
> stale+active+clean, last acting [1]
> 
> 
> When i try to start ODS.1 manually, i get:
> 
> 2020-02-10 18:48:26.107444 7f9ce31dd880  0 ceph version 0.94.10
> (b1e0532418e4631af01acbc0cedd426f1905f4af), process ceph-osd, pid
> 10210
> 2020-02-10 18:48:26.134417 7f9ce31dd880  0
> filestore(/var/lib/ceph/osd/ceph-1) backend xfs (magic 0x58465342)
> 2020-02-10 18:48:26.184202 7f9ce31dd880  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features:
> FIEMAP ioctl is supported and appears to work
> 2020-02-10 18:48:26.184209 7f9ce31dd880  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features:
> FIEMAP ioctl is disabled via 'filestore fiemap' config option
> 2020-02-10 18:48:26.184526 7f9ce31dd880  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features:
> syncfs(2) syscall fully supported (by glibc and kernel)
> 2020-02-10 18:48:26.184585 7f9ce31dd880  0
> xfsfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_feature: extsize
> is disabled by conf
> 2020-02-10 18:48:26.309755 7f9ce31dd880  0
> filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD journal
> mode: checkpoint is not enabled
> 2020-02-10 18:48:26.633926 7f9ce31dd880  1 journal _open
> /var/lib/ceph/osd/ceph-1/journal fd 20: 5367660544 bytes, block size
> 4096 bytes, directio = 1, aio = 1
> 2020-02-10 18:48:26.642185 7f9ce31dd880  1 journal _open
> /var/lib/ceph/osd/ceph-1/journal fd 20: 5367660544 bytes, block size
> 4096 bytes, 

[ceph-users] Re: FileStore OSD, journal direct symlinked, permission troubles.

2019-09-02 Thread Alwin Antreich
On Fri, Aug 30, 2019 at 04:39:39PM +0200, Marco Gaiarin wrote:
> 
> > But, the 'code' that identify (and change permission) for journal dev
> > are PVE specific? or Ceph generic? I suppose the latter...
> 
> OK, trying to identify how OSDs get initialized. If i understood well:
> 
> 0) systemd unit for every OSD get created following a template:
>   /lib/systemd/system/ceph-osd@.service
> 
> 1) every unit call a 'prestart' script:
>   ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} 
> --id %i
> 
> 2) The prestart script, run udev:
> 
>   udevadm settle --timeout=5
> 
>   that simply force the processing of udev queue, only to be sure
>   there's some 'unhandled' device in the queue.
> 
> 3) udev (rules in /lib/udev/rules.d/95-ceph-osd.rules), looking for
>   GPT ID_PART_ENTRY_TYPE do two things:
> 
>   a)
>   ceph-disk --log-stdout -v trigger /dev/$name
>   (that AFAIK trigger a disk mount, for filestore)
> 
>   b)
>   chown ceph:ceph /dev/$name; chmod 660 /dev/$name
> 
> 
> So, seems to me that a decent method to solve/circumvent my trouble is
> to:
> 
> i) write a 'static' udev rule that chown ceph:ceph the partition. Very
>  dirty.
> 
> ii) modify the systemd unit and add an ExecStartPost= script that chown
>  the partition. Dirty but probably effective.
> 
> iii) modify /usr/lib/ceph/ceph-osd-prestart.sh to add the condition,
>  something like (untested):
> 
>   if [ -L "$journal" -a -e "$journal" ]; then
>   dev_journal=`readlink -f $journal`
>   owner=`stat -c %U $dev_journal`
>   if [ $owner != 'ceph' ]; then
>   echo "ceph-osd(${cluster:-ceph}-$id): journal probably 
> manually symlinked, fixing permission." 1>&2
>   chown ceph: $dev_journal
>   fi
>   fi
> 
> 
> I'm not a ceph expert, but solution iii) seems decent for me, with a
> little overhead (a readlinkk and a stat for every osd start).
However you like it. But note that in Ceph Nautilus the udev rules
aren't shipped anymore.
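For option (i) above, such a static rule could look like the following (the file name, device and partition are assumptions and must match the actual journal partitions):

```
# /etc/udev/rules.d/99-ceph-journal.rules -- hypothetical static ownership rule
ACTION=="add", KERNEL=="sda6", OWNER="ceph", GROUP="ceph", MODE="0660"
```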

> 
> 
> 
> But still i don't understood why, if i have:
> 
>   root@capitanmarvel:~# LANG=C id ceph
>   uid=64045(ceph) gid=64045(ceph) groups=64045(ceph),6(disk)
> 
> and:
>   brw-rw 1 root disk 8, 6 ago 28 14:38 /dev/sda6
> 
> (so, journal partition group-owned by 'disk' and 'ceph' user in group
> 'disk'), still i have permission access.
> 
> The ceph-osd process reset group ownership on runtime?
In Luminous udev is handling all of that, see 95-ceph-osd.rules.

--
Cheers,
Alwin


[ceph-users] Re: FileStore OSD, journal direct symlinked, permission troubles.

2019-08-30 Thread Alwin Antreich
On Thu, Aug 29, 2019 at 05:02:11PM +0200, Marco Gaiarin wrote:
> Riprendo quanto scritto nel suo messaggio del 29/08/2019...
> 
> > Another possibilty is to convert the MBR to GPT (sgdisk --mbrtogpt) and
> > give the partition its UID (also sgdisk). Then it could be linked by
> > its uuid.
> and, in another email:
> > And I forgot that you can also re-create the journal by itself. I can't
> > recall the command ATM though.
> 
> Ahem, i stated the jornal disk are also the OS disks, and i'm using old
> server, so i think that converting to GPT will lead to an unbootable
> node...
> 
> 
> But, the 'code' that identify (and change permission) for journal dev
> are PVE specific? or Ceph generic? I suppose the latter...
IIRC, in Jewel, Ceph went from running services as root to its own user
'ceph'. The permissions had to be changed once, during upgrade. And then
udev takes care of it by checking the GUID [0].

 :~# ls /lib/udev/rules.d/60-ceph-by-parttypeuuid.rules
 :~# ls /lib/udev/rules.d/95-ceph-osd.rules

Udev takes care of the correct device node owner. Changing these rules to
work with the msdos partition scheme will not be very practical, if only
because you would have to adapt the rule with each Ceph update.

> 
> Also, i've done:
> adduser ceph disk
> and partition devices are '660 root:disk': why still i get 'permission
> denied'?
The user and group are ceph on my PVE 5.x and later installs. They
should have been created during the upgrade. And as said above, udev
should take care of the device node.

As it is possible to re-create the journal for an OSD, you may consider
adding an extra SSD and creating the journals there. Then you can free up
the OS disks of that burden and have a GPT partition table with the
proper GUID. Less risky than my approach below.

I tested a conversion from MBR to GPT on the boot disk in a VM. To make
it work, I had to run the following.

  ## Caution: If it fails the system will be not booting

  # converts to GPT
  sgdisk -g /dev/sdX

  # the conversion left 1MB space
  # create a new partition with type ef02 on the boot disk uses the 1MB
  gdisk /dev/sda

  # install grub, if the disk doesn't contain a bios boot partition
  # (ef02) it will not install
  grub-install /dev/sda

After the conversion it should be no problem to go through the
permission fix in our upgrade guide [0] to get the journals recognised
by Ceph.

Well, I hope this helps.

--
Cheers,
Alwin

[0] https://pve.proxmox.com/wiki/Ceph_Hammer_to_Jewel#Set_permission


[ceph-users] Re: FileStore OSD, journal direct symlinked, permission troubles.

2019-08-29 Thread Alwin Antreich
On Thu, Aug 29, 2019 at 03:01:22PM +0200, Alwin Antreich wrote:
> On Thu, Aug 29, 2019 at 02:42:42PM +0200, Marco Gaiarin wrote:
> > Mandi! Alwin Antreich
> >   In chel di` si favelave...
> > 
> > > > There's something i can do? Thanks.
> > > Did you go through our upgrade guide(s)?
> > 
> > Sure!
> > 
> > 
> > > See the link [0] below, for the
> > > permission changes. They are needed when an upgrade from Hammer to Jewel
> > > is done.
> > 
> > Sure! The problem arise in the 'Set partition type' section, because:
> > 
> >  root@deadpool:~# for l in $(readlink -f /var/lib/ceph/osd/ceph-*/journal); 
> > do echo $l; blkid -o udev -p $l; echo ""; done
> >  /dev/sda5
> >  ID_PART_ENTRY_SCHEME=dos
> >  ID_PART_ENTRY_UUID=9c277a97-05
> >  ID_PART_ENTRY_TYPE=0xfd
> >  ID_PART_ENTRY_NUMBER=5
> >  ID_PART_ENTRY_OFFSET=546877440
> >  ID_PART_ENTRY_SIZE=97654784
> >  ID_PART_ENTRY_DISK=8:0
> >  
> >  /dev/sda6
> >  ID_PART_ENTRY_SCHEME=dos
> >  ID_PART_ENTRY_UUID=9c277a97-06
> >  ID_PART_ENTRY_TYPE=0xfd
> >  ID_PART_ENTRY_NUMBER=6
> >  ID_PART_ENTRY_OFFSET=644534272
> >  ID_PART_ENTRY_SIZE=97654784
> >  ID_PART_ENTRY_DISK=8:0
> >  
> >  /dev/sdb7
> >  ID_PART_ENTRY_SCHEME=dos
> >  ID_PART_ENTRY_UUID=802474ca-07
> >  ID_PART_ENTRY_TYPE=0xfd
> >  ID_PART_ENTRY_NUMBER=7
> >  ID_PART_ENTRY_OFFSET=742191104
> >  ID_PART_ENTRY_SIZE=97654784
> >  ID_PART_ENTRY_DISK=8:16
> >  
> >  /dev/sdb8
> >  ID_PART_ENTRY_SCHEME=dos
> >  ID_PART_ENTRY_UUID=802474ca-08
> >  ID_PART_ENTRY_TYPE=0xfd
> >  ID_PART_ENTRY_NUMBER=8
> >  ID_PART_ENTRY_OFFSET=839847936
> >  ID_PART_ENTRY_SIZE=97853440
> >  ID_PART_ENTRY_DISK=8:16
> > 
> > 
> > As stated, partitions are 'DOS'...
> Yes, the journal is not linked through uuid, path, ... For one, you could
> change the link by hand. But this might not be very future-proof.
> 
> Another possibility is to convert the MBR to GPT (sgdisk --mbrtogpt) and
> give the partition its UID (also sgdisk). Then it could be linked by
> its uuid.
> 
> Or if you are not in need of filestore OSDs, re-create them as bluestore
> ones. AFAICS, Ceph has laid more focus on bluestore and it might be
> better to do a conversion sooner than later. (my opinion)
And I forgot that you can also re-create the journal by itself. I can't
recall the command ATM though.

--
Cheers,
Alwin


[ceph-users] Re: FileStore OSD, journal direct symlinked, permission troubles.

2019-08-29 Thread Alwin Antreich
On Thu, Aug 29, 2019 at 02:42:42PM +0200, Marco Gaiarin wrote:
> Mandi! Alwin Antreich
>   In chel di` si favelave...
> 
> > > There's something i can do? Thanks.
> > Did you go through our upgrade guide(s)?
> 
> Sure!
> 
> 
> > See the link [0] below, for the
> > permission changes. They are needed when an upgrade from Hammer to Jewel
> > is done.
> 
> Sure! The problem arise in the 'Set partition type' section, because:
> 
>  root@deadpool:~# for l in $(readlink -f /var/lib/ceph/osd/ceph-*/journal); 
> do echo $l; blkid -o udev -p $l; echo ""; done
>  /dev/sda5
>  ID_PART_ENTRY_SCHEME=dos
>  ID_PART_ENTRY_UUID=9c277a97-05
>  ID_PART_ENTRY_TYPE=0xfd
>  ID_PART_ENTRY_NUMBER=5
>  ID_PART_ENTRY_OFFSET=546877440
>  ID_PART_ENTRY_SIZE=97654784
>  ID_PART_ENTRY_DISK=8:0
>  
>  /dev/sda6
>  ID_PART_ENTRY_SCHEME=dos
>  ID_PART_ENTRY_UUID=9c277a97-06
>  ID_PART_ENTRY_TYPE=0xfd
>  ID_PART_ENTRY_NUMBER=6
>  ID_PART_ENTRY_OFFSET=644534272
>  ID_PART_ENTRY_SIZE=97654784
>  ID_PART_ENTRY_DISK=8:0
>  
>  /dev/sdb7
>  ID_PART_ENTRY_SCHEME=dos
>  ID_PART_ENTRY_UUID=802474ca-07
>  ID_PART_ENTRY_TYPE=0xfd
>  ID_PART_ENTRY_NUMBER=7
>  ID_PART_ENTRY_OFFSET=742191104
>  ID_PART_ENTRY_SIZE=97654784
>  ID_PART_ENTRY_DISK=8:16
>  
>  /dev/sdb8
>  ID_PART_ENTRY_SCHEME=dos
>  ID_PART_ENTRY_UUID=802474ca-08
>  ID_PART_ENTRY_TYPE=0xfd
>  ID_PART_ENTRY_NUMBER=8
>  ID_PART_ENTRY_OFFSET=839847936
>  ID_PART_ENTRY_SIZE=97853440
>  ID_PART_ENTRY_DISK=8:16
> 
> 
> As stated, partitions are 'DOS'...
Yes, the journal is not linked through uuid, path, ... For one, you could
change the link by hand. But this might not be very future-proof.

Another possibility is to convert the MBR to GPT (sgdisk --mbrtogpt) and
give the partition its UID (also sgdisk). Then it could be linked by
its uuid.

Or if you are not in need of filestore OSDs, re-create them as bluestore
ones. AFAICS, Ceph has put more focus on bluestore and it might be
better to do the conversion sooner rather than later. (my opinion)


--
Cheers,
Alwin


[ceph-users] Re: FileStore OSD, journal direct symlinked, permission troubles.

2019-08-29 Thread Alwin Antreich
Hello Marco,

On Thu, Aug 29, 2019 at 12:55:56PM +0200, Marco Gaiarin wrote:
> 
> I've just finished a double upgrade on my ceph (PVE-based) from hammer
> to jewel and from jewel to luminous.
> 
> All went well, apart that... OSD does not restart automatically,
> because permission troubles on the journal:
> 
>  Aug 28 14:41:55 capitanmarvel ceph-osd[6645]: starting osd.2 at - osd_data 
> /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
>  Aug 28 14:41:55 capitanmarvel ceph-osd[6645]: 2019-08-28 14:41:55.449886 
> 7fa505a43e00 -1 filestore(/var/lib/ceph/osd/ceph-2) mount(1822): failed to 
> open journal /var/lib/ceph/osd/ceph-2/journal: (13) Permission denied
>  Aug 28 14:41:55 capitanmarvel ceph-osd[6645]: 2019-08-28 14:41:55.453524 
> 7fa505a43e00 -1 osd.2 0 OSD:init: unable to mount object store
>  Aug 28 14:41:55 capitanmarvel ceph-osd[6645]: 2019-08-28 14:41:55.453535 
> 7fa505a43e00 -1 #033[0;31m ** ERROR: osd init failed: (13) Permission 
> denied#033[0m
> 
> 
> A little fast rewind: when i've setup the cluster i've used some 'old'
> servers, using a couple of SSD disks as SO and as journal.
> Because servers was old, i was forced to partition the boot disk in
> DOS, not GPT mode.
> 
> While creating the OSD, i've received some warnings:
> 
>   WARNING:ceph-disk:Journal /dev/sdaX was not prepared with ceph-disk. 
> Symlinking directly.
> 
> 
> Looking at the cluster now, seems to me that osd init scripts try to
> idetify journal based on GPT partition label/info, and clearly fail.
> 
> 
> Not that if i do, on servers that hold OSD:
> 
>   for l in $(readlink -f /var/lib/ceph/osd/ceph-*/journal); do chown 
> ceph: $l; done
> 
> OSD start flawlessy.
> 
> 
> There's something i can do? Thanks.
Did you go through our upgrade guide(s)? See the link [0] below, for the
permission changes. They are needed when an upgrade from Hammer to Jewel
is done.
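The core of that "Set permission" step boils down to something like the following (the journal device name is an example; stop the OSDs first):

```shell
# hand the Ceph directories to the ceph user introduced in Jewel
chown -R ceph:ceph /var/lib/ceph/osd /var/lib/ceph/mon

# raw journal partitions need the owner as well, e.g.:
chown ceph:ceph /dev/sda5
```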

On the wiki you can also find the upgrade guides for PVE 5.x -> 6.x and
Luminous -> Nautilus.

--
Cheers,
Alwin

[0] https://pve.proxmox.com/wiki/Ceph_Hammer_to_Jewel#Set_permission
