[ceph-users] Re: osd crash: Caught signal (Aborted) thread_name:tp_osd_tp

2020-11-24 Thread Igor Fedotov
Hi Milan, please DO NOT delete the object for all the EC shards (i.e. at all three OSDs). Sorry, I missed that you have three shards crashing... Removing that many object shards will cause data loss. Theoretically, removing just a single object replica and then doing a scrub might help

[ceph-users] Re: Many ceph commands hang. broken mgr?

2020-11-24 Thread Rafael Lopez
Interesting. There is a more forceful way to disable progress, which I had to do as we have an older version. Basically, you stop the mgrs and then move the progress module files: systemctl stop ceph-mgr.target; mv /usr/share/ceph/mgr/progress {some backup location}; systemctl start
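
A minimal sketch of that sequence, assuming a standard package layout (the backup destination is just an example):

    systemctl stop ceph-mgr.target
    mv /usr/share/ceph/mgr/progress /root/mgr-progress.bak
    systemctl start ceph-mgr.target

With the module directory gone, the mgr simply cannot load progress on startup; moving the directory back and restarting reverses it.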

[ceph-users] Re: Many ceph commands hang. broken mgr?

2020-11-24 Thread Rafael Lopez
Hi Paul, We had a similar experience with Red Hat Ceph, and it turned out to be the mgr progress module. I think there is some work to fix this, though the one I thought would impact you seems to be in 14.2.11. https://github.com/ceph/ceph/pull/36076 If you have 14.2.15, you can try turning off

[ceph-users] Re: osd crash: Caught signal (Aborted) thread_name:tp_osd_tp

2020-11-24 Thread Milan Kupcevic
Hi Igor, Thank you for the quick and useful answer. We are looking at our options. Milan On 2020-11-24 06:49, Igor Fedotov wrote: > Another workaround would be to delete the object in question using > ceph-objectstore-tool and then do a scrub on the corresponding PG to fix > the absent object.

[ceph-users] Re: Many ceph commands hang. broken mgr?

2020-11-24 Thread Paul Mezzanini
While the "progress off" was hung, I did a systemctl restart of the active ceph-mgr. The progress toggle command completed and reported that progress disabled. All commands that were hanging before are still unresponsive. That was worth a shot. Thanks -- Paul Mezzanini Sr Systems

[ceph-users] Re: Many ceph commands hang. broken mgr?

2020-11-24 Thread Paul Mezzanini
"ceph progress off" is just hanging like the others. I'll fiddle with it later tonight to see if I can get it to stick when I bounce a daemon. -- Paul Mezzanini Sr Systems Administrator / Engineer, Research Computing Information & Technology Services Finance & Administration Rochester Institute

[ceph-users] Re: [Suspicious newsletter] Re: Unable to reshard bucket

2020-11-24 Thread Eric Ivancich
Can you clarify, Istvan, what you plan on setting to 64K? If it’s the number of shards for a bucket, that would be a mistake. > On Nov 21, 2020, at 2:09 AM, Szabo, Istvan (Agoda) > wrote: > > Seems like this sharding needs to be planned carefully from the beginning. > I'm thinking to set the
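
For reference, a hedged way to inspect a bucket's current shard count before resharding (the bucket name and shard count below are placeholders):

    radosgw-admin bucket stats --bucket=mybucket | grep num_shards
    radosgw-admin bucket reshard --bucket=mybucket --num-shards=101

Shard counts in the low hundreds are a common choice; tens of thousands of shards on a single bucket is, as Eric notes, generally a mistake.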

[ceph-users] RGW Data Loss Bug in Octopus 15.2.0 through 15.2.6

2020-11-24 Thread Eric Ivancich
Starting in stable release Octopus 15.2.0 and continuing through Octopus 15.2.6, there is a bug in RGW that could result in data loss. There is both an immediate configuration work-around and a fix intended for Octopus 15.2.7. [Note: the bug was first merged in a pre-stable release — Octopus

[ceph-users] Re: smartctl UNRECOGNIZED OPTION: json=o

2020-11-24 Thread Anthony D'Atri
Context: JSON output was added to smartmontools 7 explicitly for Ceph use. > > I had to roll an upstream version of the smartmon tools because everything > with Red Hat 7/8 was too old to support the json option.
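
A quick hedged check, assuming the /dev/sde device from the related thread:

    smartctl --version              # JSON output needs smartmontools >= 7.0
    smartctl --json=o -a /dev/sde

On the 6.x releases shipped with older distributions, --json is an unrecognized option, which matches the error in the subject line.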

[ceph-users] Re: smartctl UNRECOGNIZED OPTION: json=o

2020-11-24 Thread Paul Mezzanini
I had to roll an upstream version of the smartmon tools because everything with Red Hat 7/8 was too old to support the json option.

[ceph-users] replace osd with Octopus

2020-11-24 Thread Tony Liu
Hi, I did some searching about replacing an OSD and found several different procedures, probably for different releases. Is there a recommended process to replace an OSD with Octopus? Two cases here: 1) replace an HDD whose WAL and DB are on an SSD. 1-1) the failed disk is replaced by the same model. 1-2) working disk
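
A hedged sketch of the cephadm-era flow in Octopus (OSD id, host, and device are placeholders; verify against the Octopus docs before running):

    ceph orch osd rm 12 --replace     # drain, then mark the OSD 'destroyed', preserving its id
    # ...physically swap the failed disk...
    ceph orch daemon add osd host1:/dev/sdX

With --replace the OSD id is preserved, so the new disk can take over the old id; how the WAL/DB slots on the shared SSD are reused depends on your OSD spec.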

[ceph-users] smartctl UNRECOGNIZED OPTION: json=o

2020-11-24 Thread Tony Liu
Hi, With Ceph Octopus 15.2.5, here is the output of the command "ceph device get-health-metrics SEAGATE_DL2400MM0159_WBM2WP2S":

    "20201123-000939": {
        "dev": "/dev/sde",
        "error": "smartctl failed",

[ceph-users] Re: Ceph on ARM ?

2020-11-24 Thread Anthony D'Atri
I had hoped to stay out of this, but here I go. > 4) SATA controller and PCIe throughput SoftIron claims “wire speed” with their custom hardware FWIW. > Unfortunately these are the kinds of things that you can't easily generalize > between ARM vs x86. Some ARM processors are going to do

[ceph-users] Many ceph commands hang. broken mgr?

2020-11-24 Thread Paul Mezzanini
Ever since we jumped from 14.2.9 to .12 (and beyond) a lot of the ceph commands just hang. The mgr daemon also just stops responding to our Prometheus scrapes occasionally. A daemon restart and it wakes back up. I have nothing pointing to these being related but it feels that way. I also

[ceph-users] Planning: Ceph User Survey 2020

2020-11-24 Thread Mike Perez
Hi everyone, The Ceph User Survey 2020 is being planned by our working group. Please review the draft survey PDF, and let's discuss any changes. You may also join us in the next meeting on November 25th at 12pm PT. https://tracker.ceph.com/projects/ceph/wiki/User_Survey_Working_Group

[ceph-users] Re: Ceph on ARM ?

2020-11-24 Thread Darren Soothill
So yes, you can get the servers for a considerably lower price than Intel. It's not just about the CPU cost: many ARM servers are based on an SoC that includes networking, so the overall cost of the motherboard/processor/networking is a lot lower. It doesn't reduce the price of the storage or

[ceph-users] Certificate for Dashboard / Grafana

2020-11-24 Thread E Taka
Hello, I'm having difficulties with setting up the web certificates for the Dashboard on hostnames ceph*01..n*.domain.tld. I set the keys and crt with ceph config-key. ceph config-key get mgr/dashboard/crt shows the correct certificate, and the same applies to mgr/dashboard/key and mgr/cephadm/grafana_key
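
For comparison, a hedged sketch of the per-host certificate keys the dashboard can read, to my understanding (hostname and file names are placeholders):

    ceph config-key set mgr/dashboard/ceph01/crt -i ceph01.crt
    ceph config-key set mgr/dashboard/ceph01/key -i ceph01.key
    ceph mgr module disable dashboard
    ceph mgr module enable dashboard    # reload so the new certificate is picked up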

[ceph-users] Re: Unable to find further optimization, or distribution is already perfect

2020-11-24 Thread Toby Darling
Hi Nathan, thanks for the reply.

root@ceph1 16:30 [~]: ceph osd pool autoscale-status
    POOL      SIZE   TARGET SIZE  RATE  RAW CAPACITY  RATIO   TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE
    ec82pool  2886T               1.25   4732T        0.7625

[ceph-users] Re: Ceph on ARM ?

2020-11-24 Thread Peter Woodman
I've been running ceph on a heterogeneous mix of rock64 and rpi4 SBCs. I've had to do my own builds, as the upstream ones started off with thunked-out checksumming due to (AFAICT) different ARM feature sets between upstream's build targets and my SBCs, but other than that one, I haven't run into any

[ceph-users] Re: Ceph on ARM ?

2020-11-24 Thread DHilsbos
Adrian; I've always considered the advantage of ARM to be the reduction in the failure domain. Instead of one server with 2 processors and 2 power supplies in 1 case running 48 disks, you can do 4 cases containing 8 power supplies and 32 processors running 32 (or 64...) disks. The

[ceph-users] Ceph on ARM ?

2020-11-24 Thread Adrian Nicolae
Hi guys, I was looking at some Huawei ARM-based servers and the datasheets are very interesting. The high CPU core counts and the SoC architecture should be ideal for distributed storage like Ceph, at least in theory. I'm planning to build a new Ceph cluster in the future and my best

[ceph-users] Re: Documentation of older Ceph version not accessible anymore on docs.ceph.com

2020-11-24 Thread Marc Roos
2nd that. Why even remove old documentation before it is migrated to the new environment? It should be left online until the migration has successfully completed.

[ceph-users] Re: Cephfs snapshots and previous version

2020-11-24 Thread DHilsbos
Oliver; You might consider asking this question of the CentOS folks. Possibly at cen...@centos.org. Thank you, Dominic L. Hilsbos

[ceph-users] Re: 14.2.15: Question to collection_list_legacy osd bug fixed in 14.2.15

2020-11-24 Thread Konstantin Shalygin
No, you are not affected. Only clusters with mixed versions are affected. k > On 24 Nov 2020, at 18:25, Rainer Krienke wrote: > > Hello, > > thanks for your answer. If I understand you correctly then only if I > upgrade from 14.2.11 to 14.2.(12|14) this could lead to

[ceph-users] Re: Documentation of older Ceph version not accessible anymore on docs.ceph.com

2020-11-24 Thread Steven Pine
I want to just echo this sentiment. I thought the lack of older docs would be a very temporary issue, but they are still not available. It is especially frustrating when half the Google searches also return a page-not-found error. The migration has been very badly done. Sincerely,

[ceph-users] Prometheus monitoring

2020-11-24 Thread Michael Thomas
I am gathering Prometheus metrics from my (unhealthy) Octopus (15.2.4) cluster and notice a discrepancy (or misunderstanding) with the Ceph dashboard. In the dashboard, and with ceph -s, it reports 807 million objects:

    pgs: 169747/807333195 objects degraded (0.021%)
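
For cross-checking, a hedged PromQL query over the mgr prometheus module's per-pool object counts (metric name as I believe recent releases export it; confirm against your /metrics endpoint):

    sum(ceph_pool_objects)

If that total disagrees with ceph -s, the difference may lie in how degraded or misplaced copies are counted rather than in missing data.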

[ceph-users] Re: Tracing in ceph

2020-11-24 Thread Abhinav Singh
Hi Seena, sorry for the late reply. I have used Jaeger to trace the RGW requests. The PR is still not merged into the official repo, but you can give it a try: https://github.com/suab321321/ceph/tree/wip-jaegerTracer-noNamespace. 1. The cmake option to build Jaeger is on by default, so you don't need to give

[ceph-users] Re: Ceph on ARM ?

2020-11-24 Thread Martin Verges
Hello, > I'm curious however if the ARM servers are better or not for this use case > (object-storage only). For example, instead of using 2xSilver/Gold server, I > can use a Taishan 5280 server with 2x Kungpen 920 ARM CPUs with up to 128 > cores in total . So I can have twice as many CPU

[ceph-users] Re: Ceph on ARM ?

2020-11-24 Thread Robert Sander
On 24.11.20 at 13:12, Adrian Nicolae wrote: > Has anyone tested Ceph in such a scenario? Is the Ceph software > really optimised for the ARM architecture? Personally I have not run Ceph on ARM, but there are companies selling such setups: https://softiron.com/ https://www.ambedded.com.tw/

[ceph-users] Re: Ceph on ARM ?

2020-11-24 Thread Mark Nelson
At least in the past, there have been a couple of things you really want to focus on regarding ARM and performance (beyond the obvious core count/clockspeed/ipc/etc): 1) HW acceleration for things like CRC32, MD5, etc 2) context switching overhead 3) memory throughput 4) SATA controller

[ceph-users] Re: Documentation of older Ceph version not accessible anymore on docs.ceph.com

2020-11-24 Thread Frank Schilder
Older versions are available here: https://web.archive.org/web/20191226012841/https://docs.ceph.com/docs/mimic/ I'm actually also a bit unhappy about older versions missing. Mimic is not end-of-life and a lot of people still use Luminous. Since there are such dramatic differences between

[ceph-users] Re: Cephfs snapshots and previous version

2020-11-24 Thread Frank Schilder
We made the same observation and found out that for CentOS 8 there are extra modules for Samba that provide vfs modules for certain storage systems (search for all available package names containing samba and they show up in the list). One is available that supports GlusterFS. The corresponding
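
For reference, a hedged sketch of a CephFS-backed share using Samba's vfs_ceph module (paths and the CephX user are placeholders, and the matching samba vfs package must be installed):

    [cephfs]
        path = /
        vfs objects = ceph
        ceph:config_file = /etc/ceph/ceph.conf
        ceph:user_id = samba
        read only = no

Exposing CephFS snapshots as Windows "previous versions" additionally needs a shadow-copy-style vfs module layered on top of this.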

[ceph-users] Re: osd crash: Caught signal (Aborted) thread_name:tp_osd_tp

2020-11-24 Thread Igor Fedotov
Another workaround would be to delete the object in question using ceph-objectstore-tool and then do a scrub on the corresponding PG to fix the absent object. But I would greatly appreciate it if we could dissect this case a bit. On 11/24/2020 9:55 AM, Milan Kupcevic wrote: Hello, Three OSD
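
A hedged sketch of that workaround, with the OSD id, PG id, and object name as placeholders (stop the OSD first, and keep a copy of the shard before removing it):

    systemctl stop ceph-osd@42
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-42 --pgid 7.1a '<object>' remove
    systemctl start ceph-osd@42
    ceph pg deep-scrub 7.1a

The scrub should then restore the missing copy from the surviving shards; per the warning elsewhere in this thread, only ever do this on a single shard.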

[ceph-users] Re: osd crash: Caught signal (Aborted) thread_name:tp_osd_tp

2020-11-24 Thread Igor Fedotov
Hi Milan, given the log output mentioning 32768 spanning blobs, I believe you're facing https://tracker.ceph.com/issues/48216 The root cause for this case is still unknown, but the PR attached to the ticket allows fixing the issue using the objectstore's fsck/repair. Hence if you're able to deploy a
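
For context, a hedged sketch of running that fsck/repair offline on one OSD (the OSD id is a placeholder; this presumably only helps once a build containing the PR is deployed):

    systemctl stop ceph-osd@42
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-42
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-42
    systemctl start ceph-osd@42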

[ceph-users] Re: Manual bucket resharding problem

2020-11-24 Thread Konstantin Shalygin
Try the `radosgw-admin reshard stale-instances list` command. If the list is not empty, just remove the stale reshard instances and then start the reshard process again. k > On 22 Nov 2020, at 15:35, Mateusz Skała wrote: > > Thank you for the response, how can I upload this to
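
A hedged sketch of that sequence, before re-running the bucket reshard as before:

    radosgw-admin reshard stale-instances list
    radosgw-admin reshard stale-instances rm

The stale-instances subcommands are relatively recent; check radosgw-admin --help on your release before relying on them.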

[ceph-users] Re: 14.2.15: Question to collection_list_legacy osd bug fixed in 14.2.15

2020-11-24 Thread Konstantin Shalygin
This bug may affect you when you upgrade from 14.2.11 to 14.2.(12|14) slowly (e.g. one node at a time). If you already upgraded from 14.2.11, you just jumped over this bug. k > On 24 Nov 2020, at 10:43, Rainer Krienke wrote: > > Hello, > > I am running a productive
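
A quick way to confirm whether daemons are running mixed versions mid-upgrade:

    ceph versions

The output counts daemons of each type per release, so a mixed 14.2.11/14.2.12 OSD set stands out immediately.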

[ceph-users] Re: Cephfs snapshots and previous version

2020-11-24 Thread Konstantin Shalygin
Just rpmbuild the latest Samba version with the vfs features enabled. These modules are stable. k > On 24 Nov 2020, at 10:51, Frank Schilder wrote: > > We made the same observation and found out that for CentOS8 there are extra > modules for samba that provide vfs modules for

[ceph-users] Re: OSD Memory usage

2020-11-24 Thread Seena Fallah
I added one OSD node to the cluster and got 500 MB/s throughput on my disks, 2 or 3 times better than before! But my latency rose 5 times!!! When I enable bluefs_buffered_io the disk throughput drops to 200 MB/s and my latency goes down! Is there any kernel config/tuning that should be
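
For reference, a hedged sketch of toggling that option cluster-wide (it is an OSD-level setting whose default has changed between releases, and some releases need an OSD restart for it to take effect):

    ceph config set osd bluefs_buffered_io true
    ceph config get osd.0 bluefs_buffered_io    # spot-check one OSD

Buffered mode routes BlueFS IO through the page cache, which plausibly explains the trade observed above: lower raw disk throughput, but better read latency.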