8: (__libc_start_main()+0xf5) [0x7f09397c6505]
9: (()+0x24ad40) [0x55e5a02fad40]
-261> 2020-01-15 16:36:46.086 7f0946674a00 -1 *** Caught signal (Aborted) **
in thread 7f0946674a00 thread_name:ceph-mon
2.) Enter Meeting ID: 908675367
3.) Press #
Want to test your video connection?
https://bluejeans.com/111
Kevin
--
Kevin Hrpcek
NASA VIIRS Atmosphere SIPS
Space Science & Engineering Center
University of Wisconsin-Madison
Wednesday of each
month.
Here's the pad to collect agenda/notes:
https://pad.ceph.com/p/Ceph_Science_User_Group_Index
--
Mike Perez (thingee)
On Tue, Jul 23, 2019 at 10:40 AM Kevin Hrpcek
<kevin.hrp...@ssec.wisc.edu> wrote:
? Or the reweight? (I guess you change the crush weight, am I right?)
Thanks!
On 24 Jul 2019, at 19:17, Kevin Hrpcek <kevin.hrp...@ssec.wisc.edu> wrote:
I often add 50+ OSDs at a time and my cluster is all NLSAS. Here is what I do,
you can obviously change the weight increase steps to what you are comfortable
with. This has worked well for me and my workloads. I've sometimes seen peering
take longer if I do steps too quickly but I don't run any
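For concreteness, here is a minimal sketch of that incremental weight-in as a script. The OSD ids, target crush weight, step size, and poll interval are all assumed values for illustration, not the exact procedure from the original mail:

#!/usr/bin/env python3
# Illustrative sketch only: walk a batch of new OSDs up to full crush
# weight in fixed steps, waiting for backfill to settle between steps.
# OSD ids, target weight, and step size are assumptions.
import subprocess
import time

NEW_OSDS = range(871, 921)  # hypothetical batch of 50 new OSDs
TARGET = 9.0                # hypothetical full crush weight for a 10TB drive
STEP = 1.0

def ceph(*args):
    return subprocess.run(["ceph", *args], capture_output=True,
                          text=True, check=True).stdout

weight = 0.0
while weight < TARGET:
    weight = min(weight + STEP, TARGET)
    for osd in NEW_OSDS:
        ceph("osd", "crush", "reweight", f"osd.{osd}", str(weight))
    # Let peering and backfill settle before the next increment.
    while "backfill" in ceph("health", "detail"):
        time.sleep(60)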
Update
We're going to hold off until August for this so we can promote it on the Ceph
twitter with more notice. Sorry for the inconvenience if you were planning on
the meeting tomorrow. Keep a watch on the list, twitter, or ceph calendar for
updates.
Kevin
On 7/5/19 11:15 PM, Kevin Hrpcek wrote:
a
topic for meetings. I will be brainstorming some conversation starters but it
would also be interesting to have people give a deep dive into their use of
ceph and what they have built around it to support the science being done at
their facility.
Kevin
On 6/17/19 10:43 AM, Kevin Hrpcek wrote:
Hey all,
At cephalocon some of us who work in scientific computing got together for a
BoF and had a good conversation. There was some interest in finding a way to
continue the conversation focused on ceph in scientific computing and htc/hpc
environments. We are considering putting together
for an osd that reported a failure and seeing what error code it's coming up
with on the failed ping connection? That might provide a useful hint (e.g.,
ECONNREFUSED vs EMFILE or something).
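In that spirit, a rough sketch of such a log check; the log path and match patterns are assumptions, adjust for your cluster:

#!/usr/bin/env python3
# Rough sketch: scan an OSD log for heartbeat/connection failures and
# print the lines so any errno (ECONNREFUSED, EMFILE, ...) is visible.
# The default log path is an assumption.
import re
import sys

PATTERN = re.compile(r"heartbeat_check|failed to connect|ECONNREFUSED|EMFILE")

path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/ceph/ceph-osd.0.log"
with open(path) as log:
    for line in log:
        if PATTERN.search(line):
            print(line.rstrip())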
I'd also confirm that with nodown set the mon quorum stabilizes...
sage
On Mon, 10 Sep 2018, Kevin Hrpcek wrote:
Update for the list archive.
I went ahead and finished the mimic upgrade with the osds in a fluctuating
state of up and down. The cluster did start to normalize a lot easier after
everything was on mimic; the mix of luminous and mimic did not play well
together for some reason. Maybe it has to do with the scale of my cluster,
871 OSDs, or maybe I've missed some tuning as my cluster has scaled to this
size.
Kevin
On 09/09/2018 12:49 PM, Kevin Hrpcek wrote:
Nothing too crazy for non-default settings.
Given how bad things are, setting pause on the cluster to just finish the
upgrade faster might not be a bad idea either.
This should be a simple question, have you confirmed that there are no
networking problems between the MONs while the elections are happening?
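One quick way to sanity-check that is a connectivity probe between the mon hosts; a rough sketch, where the hostnames are placeholders and the default v1 mon port 6789 is assumed:

#!/usr/bin/env python3
# Probe TCP reachability of each mon's v1 port from this host.
# Hostnames are hypothetical; substitute your mon addresses.
import socket

MONS = ["mon1", "mon2", "mon3"]

for host in MONS:
    try:
        with socket.create_connection((host, 6789), timeout=5):
            print(f"{host}: reachable")
    except OSError as exc:
        print(f"{host}: FAILED ({exc})")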
On Sat, Sep 8, 2018, 7:52 PM Kevin Hrpcek wrote:
Hello,
I've had a Luminous -> Mimic upgrade go very poorly and my cluster is
stuck with almost all pgs down. One problem is that the mons have
started to re-elect a new quorum leader almost every minute. This is
making it difficult to monitor the cluster and even run any commands on
it since
I use icinga2 as well with a check_ceph.py that I wrote a couple years
ago. The method I use is that icinga2 runs the check from the icinga2
host itself. ceph-common is installed on the icinga2 host since the
check_ceph script is a wrapper and parser for the ceph command output
using python's subprocess module.
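A minimal sketch of that pattern, not the actual check_ceph.py; the status mapping follows the standard Nagios/Icinga exit-code convention:

#!/usr/bin/env python3
# Minimal check_ceph-style plugin sketch: run `ceph health` via
# subprocess and map the result onto Nagios/Icinga exit codes.
import subprocess
import sys

out = subprocess.run(["ceph", "health"], capture_output=True,
                     text=True).stdout.strip()
if out.startswith("HEALTH_OK"):
    status, code = "OK", 0
elif out.startswith("HEALTH_WARN"):
    status, code = "WARNING", 1
else:
    status, code = "CRITICAL", 2
print(f"{status} - {out}")
sys.exit(code)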
Hello,
I'm seeing something that seems to be odd behavior when reweighting
OSDs. I've just upgraded to 12.2.5 and am adding in a new osd server to
the cluster. I gradually weight the 10TB OSDs into the cluster by doing
a +1, letting things backfill for a while, then +1 until I reach my
Thanks for the input Greg, we've submitted the patch to the ceph github
repo https://github.com/ceph/ceph/pull/21222
Kevin
On 04/02/2018 01:10 PM, Gregory Farnum wrote:
On Mon, Apr 2, 2018 at 8:21 AM Kevin Hrpcek
<kevin.hrp...@ssec.wisc.edu> wrote:
to most users... Any insight would be appreciated as we'd
prefer to use an official solution rather than our bindings fix for
long-term use.
Tested on Luminous 12.2.2 and 12.2.4.
Thanks,
Kevin
--
Kevin Hrpcek
Linux Systems Administrator
NASA SNPP Atmospheric SIPS
Space Science & Engineering Center
University of Wisconsin-Madison
Steven,
I've recently done some performance testing on dell hardware. Here are
some of my messy results. I was mainly testing the effects of the R0
stripe sizing on the perc card. Each disk has its own R0 so that write
back is enabled. VDs were created like this but with different
Marc,
If you're running luminous you may need to increase osd_max_object_size.
This snippet is from the Luminous change log.
"The default maximum size for a single RADOS object has been reduced
from 100GB to 128MB. The 100GB limit was completely impractical in
practice while the 128MB limit
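If a workload really does need larger objects, the cap can be raised in ceph.conf; a sketch, where the value shown is only an example, not a recommendation:

[osd]
# Example only: raise the per-object cap from the 128MB default
# back toward 1GB. Pick a value appropriate for the workload.
osd max object size = 1073741824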
by quickly setting nodown,noout,noup when
everything is already down will help as well.
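As a concrete sketch of that flag dance (an assumed helper, not the poster's actual tooling):

#!/usr/bin/env python3
# Sketch: set nodown/noout/noup before mass restarts so the monitors
# stop churning the osdmap, then unset them once daemons are back.
import subprocess

FLAGS = ["nodown", "noout", "noup"]

def toggle(action):
    for flag in FLAGS:
        subprocess.run(["ceph", "osd", action, flag], check=True)

toggle("set")     # before the restarts/upgrade
# ... restart or upgrade the OSDs ...
toggle("unset")   # let peering settle afterwards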
Sage, thanks again for your input and advice.
Kevin
On 11/04/2017 11:54 PM, Sage Weil wrote:
On Sat, 4 Nov 2017, Kevin Hrpcek wrote:
Hey Sage,
Thanks for getting back to me this late on a weekend.
Do you know why the OSDs were going down? Are there any crash dumps in the
osd logs, or is the OOM killer getting them?
That's a part I can't nail down yet. The OSDs didn't crash; after the
reweight-by-utilization, OSDs on some of our
1 stale+active+clean+scrubbing
1 active+recovering+undersized+degraded
1 stale+active+remapped+backfilling
1 inactive
1 active+clean+scrubbing
1 stale+active+clean+scrubbing+deep