I don't think “usually” is good enough in a production setup.
Thursday, 19 December 2019, 12.09 +0100 from Виталий Филиппов:
>Usually it doesn't, it only harms performance and probably SSD lifetime
>too
>
>> I would not be running ceph on ssds without powerloss
I would not be running Ceph on SSDs without power-loss protection. It delivers a
potential data loss scenario.
Jesper
Thursday, 19 December 2019, 08.32 +0100 from Виталий Филиппов:
>https://yourcmc.ru/wiki/Ceph_performance
>
>https://docs.google.com/sprea
Hi Nathan
Is that true?
The time it takes to reallocate the primary PG delivers “downtime” by design,
right? Seen from a writing client's perspective.
Jesper
Friday, 29 November 2019, 06.24 +0100 from pen...@portsip.com:
>Hi Nathan,
>
>Thanks for
to each host - we have a 9-HDD + 3-SSD scenario here.
Jesper
Monday, 18 November 2019, 07.49 +0100 from kristof.cou...@gmail.com:
>Hi all,
>
>Thanks for the feedback.
>Though, just to be sure:
>
>1. There is no 30GB limit if I understand correctly fo
ts from the MDS servers via
>the admin socket
Is lazy IO supported by the kernel client? If so, which kernel version?
Jesper
imilar concepts in enterprise environments?
Jesper
>
(thereby 909 IOPS)
Really nice writeup and very true - should be a must-read for anyone
starting out with Ceph.
--
Jesper
Makes sense - makes the case for EC pools smaller though.
Jesper
Sunday, 9 June 2019, 17.48 +0200 from paul.emmer...@croit.io:
>Caching is handled in BlueStore itself, erasure coding happens on a higher
>layer.
>
>
>Paul
>
>--
>Paul Emme
ld that be a feature/performance request?
Jesper
hich is 1.5TB'ish.
If your "hot" dataset is smaller, then less will do as well.
Jesper
min/avg/max/mdev = 0.047/0.072/0.110/0.017 ms
>
> 10 packets transmitted, 10 received, 0% packet loss, time 9219ms
> rtt min/avg/max/mdev = 0.061/0.078/0.099/0.011 ms
What NIC / switching components are in play here? I simply cannot get
latencies this far down.
Jesper
Hi.
This is more an inquiry to figure out how our current setup compares
to other setups. I have a 3 x replicated SSD pool with RBD images.
When running fio on /tmp I'm interested in seeing how much IOPS a
single thread can get - as Ceph scales up very nicely with concurrency.
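For reference, a minimal single-threaded random-read job of the kind being discussed could look like the sketch below (the filename, size and runtime are illustrative only, not the actual job files attached later in the thread):

  # queue depth 1, one job, direct I/O so the page cache does not hide the RBD latency
  fio --name=singlethread-randr --filename=/tmp/fio-test --rw=randread \
      --bs=4k --iodepth=1 --numjobs=1 --direct=1 --size=1G --runtime=60 --time_based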
Currently 34 OSD
k
1/ 1 throttle
0/ 0 refs
1/ 5 xio
1/ 5 compressor
1/ 5 bluestore
1/ 5 bluefs
1/ 3 bdev
1/ 5 kstore
4/ 5 rocksdb
4/ 5 leveldb
4/ 5 memdb
1/ 5 kinetic
1/ 5 fuse
1/ 5 mgr
1/ 5 mgrc
1/ 5 dpdk
1/ 5 eventtrace
-2/-2 (syslog threshold)
-1/-1 (stderr threshold)
max_recent 1
max_new 1000
log_file /var/log/ceph/ceph-mgr.odroid-c.log
--- end dump of recent events ---
Kind regards
Jesper
Does that work together nicely? Anyone using it?
With NVMe drives fairly cheap it could stack pretty nicely.
Jesper
hit.
Does anyone else have something similar in their setup - how do you deal
with it?
KVM-based virtualization, Ceph Luminous.
Any suggestions/hints welcome.
Jesper
> I've seen this with many customers and in those cases we offloaded the
> WAL+DB to an SSD.
Guess the SSD needs to be pretty durable to handle that?
Is there a "migration path" to offload this, or is it necessary to destroy
and re-create the OSD?
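For what it's worth, newer Ceph releases ship ceph-bluestore-tool commands that can attach a separate DB device to an existing OSD without rebuilding it. A rough sketch (osd.12 and /dev/nvme0n1p1 are placeholders; availability depends on the exact release, so check before relying on it):

  systemctl stop ceph-osd@12
  ceph-bluestore-tool bluefs-bdev-new-db --path /var/lib/ceph/osd/ceph-12 --dev-target /dev/nvme0n1p1
  # bluefs-bdev-migrate can then move existing BlueFS/RocksDB data onto the new device
  systemctl start ceph-osd@12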
Thanks.
Jesper
Hi. Knowing this is a bit off-topic, but seeking recommendations
and advice anyway.
We're seeking a "management" solution for VMs - currently in the 40-50
VM range - but would like better tooling for managing them and potentially
migrating them across multiple hosts, setting up block devices, etc.
erything else is eaten by the OSD
> itself.
Thanks for the insight - that means that the SSD numbers for read/write
performance are roughly OK - I guess.
It still puzzles me why the BlueStore caching does not benefit
the read side.
Is the cache not an LRU cache on the block device, or is
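For context on the caching question: the BlueStore cache lives inside the OSD process (it is not a cache on the block device) and is sized per OSD with options along these lines (an illustrative ceph.conf sketch, not a tuning recommendation - the values are made up; Luminous defaults are roughly 1 GB for HDD-backed and 3 GB for SSD-backed OSDs):

  [osd]
  bluestore_cache_size_hdd = 2147483648
  bluestore_cache_size_ssd = 4294967296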
quot; on OSD nodes in a loop and observe the core
> frequencies.
>
Thanks for the suggestion. They all seem to be powered up .. other
suggestions/reflections are truly welcome. Thanks.
Jesper
s it is 16GB x 84
OSDs .. with a 1GB test file.
Any thoughts - suggestions - insights?
Jesper
fio-single-thread-randr.ini
Description: Binary data
fio-single-thread-randw.ini
Description: Binary data
oxmox
>>>
>>>We had multiple drives fail(about 30%) within a few days of each other,
>>>likely faster than the cluster could recover.
>>
>>How did so many drives break?
>>
>>Jesper
Saturday, 2 March 2019, 04.20 +0100 from satha...@gmail.com:
>56 OSD, 6-node 12.2.5 cluster on Proxmox
>
>We had multiple drives fail(about 30%) within a few days of each other, likely
>faster than the cluster could recover.
How did so many drives br
An EC pool requires, as a minimum, IO requests to k shards before the first
bytes can be returned; with fast_read the reads are issued to all k+m shards,
but the read returns as soon as k have responded.
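fast_read is a per-pool flag, so it can be toggled and inspected with something like this (a sketch; "ecpool" is a placeholder pool name):

  ceph osd pool set ecpool fast_read true
  ceph osd pool get ecpool fast_read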
Any experience with inlining data on the MDS - that would obviously help
here I guess.
Thanks.
--
Jesper
expect from the HW+network side:
    N    Min       Max       Median    Avg         Stddev
x   100  0.015687  0.221538  0.025253  0.03259606  0.028827849
25ms as a median, 32ms average is still on the high side,
but way, way better.
Thanks.
> I'm trying to understand the nuts and bolts of EC / CephFS
> We're running an EC4+2 pool on top of 72 x 7.2K rpm 10TB drives. Pretty
> slow bulk / archive storage.
Ok, did some more searching and found this:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021642.html.
Which to
'll always suffer a 1:4 ratio on IOPS in
a reading scenario on a 4+2 EC pool, compared to a 3x replication.
Side note: I'm trying to get Bacula (tape backup) to read off my archive
to tape at a "reasonable time/speed".
Thanks in advance.
--
Jesper
st block until time has passed and they get through or?
Which means that I'll get 72 x 6 seconds of unavailability when doing
a rolling restart of my OSDs during upgrades and such? Or is a
controlled restart different from a crash?
--
Jesper.
Yesterday I saw this one.. it puzzles me:
2019-02-15 21:00:00.000126 mon.torsk1 mon.0 10.194.132.88:6789/0 604164 :
cluster [INF] overall HEALTH_OK
2019-02-15 21:39:55.793934 mon.torsk1 mon.0 10.194.132.88:6789/0 604304 :
cluster [WRN] Health check failed: 2 slow requests are blocked > 32 sec.
al" read averages at 65.8ms - which - if the
filesize is say 1MB and
the rest of the time is 0 - caps read performance mostly 20MB/s .. at that
pace, the journey
through double digit TB is long even with 72 OSD's backing.
Spec: Ceph Luminous 12.2.5 - Bluestore
6 OSD nodes, 10TB HDDs, 4+2 EC pool, 10GbitE
Locally the drives deliver latencies of approximately 6-8ms for a random
read. Any suggestion on where to find out where the remaining 50ms is being
spent would be truly helpful.
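One way to see where the time goes is to look at the OSD-side op tracking (a sketch, assuming access to the admin socket on an OSD node; osd.0 is a placeholder):

  ceph osd perf                         # per-OSD commit/apply latency overview
  ceph daemon osd.0 dump_historic_ops   # slowest recent ops with per-stage timestamps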
Large files "just works" as read-ahead does a nice job in getting
performance up.
--
Jesper
> That's a useful conclusion to take back.
Last question - we have our SSD pool set to 3x replication; Micron states
that NVMe is good at 2x - is this "taste and safety", or are there any
general thoughts about SSD robustness in a Ceph se
e about 3x better off than
where I am today. (compared to the Micron paper)
That's a useful conclusion to take back.
--
Jesper
I'll have a look at that. Thanks for the suggestion.
Jesper
Thanks for the confirmation, Marc.
Can you put in a bit more hardware/network detail?
Jesper
om the SSDs on your RBD block devices.
I'm fully aware of that - Ceph / RBD / etc. comes with an awesome feature
package, and that flexibility delivers overhead and eats into it.
But it helps to deliver "upper bounds" and work my way to good from there.
Thanks.
Jesper
/thread / 0.15ms per IO
=> ~6,666 IOPS per thread * 16 threads => ~106,666 IOPS/s
OK, that's at least an upper bound on expectations in this scenario, and I
am at 28,207 - thus roughly 4x away - and I have still not accounted for any
OSD or RBD userspace time in the equation.
Can I directly get service time out of the
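Regarding service time: the OSD perf counters do expose average op latencies via the admin socket, so a hedged starting point could be (osd.0 is a placeholder):

  ceph daemon osd.0 perf dump | grep -A 3 '"op_r_latency"'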
Hi List
We are in the process of moving to the next use case for our Ceph cluster.
(Bulk, cheap, slow, erasure-coded CephFS) storage was the first - and
that works fine.
We're currently on luminous / bluestore, if upgrading is deemed to
change what we're seeing then please let us know.
We have 6
> : We're currently co-locating our mons with the head node of our Hadoop
> : installation. That may be giving us some problems, we dont know yet, but
> : thus I'm speculation about moving them to dedicated hardware.
Would it be OK to run them on KVM VMs - of course not backed by Ceph
s where we usually virtualize our way out of it .. which seems very
wrong here.
Are other people just co-locating it with something random, or what are
others typically using in a small Ceph cluster (< 100 OSDs .. 7 OSD hosts)?
Thanks.
Jesper
users to
not create those ridiculous amounts of small files.
--
Jesper
0m0.000s
Giving a ~20ms overhead on a single file.
This is about 3x higher than on our local filesystems (XFS) based on the
same spindles.
CephFS metadata is on SSD - everything else on big-slow HDD's (in both
cases).
Is this what everyone else sees?
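For comparison, the kind of single-file timing being described could be reproduced with something along these lines (an assumed test, not the original commands; paths are placeholders):

  sync; echo 3 | sudo tee /proc/sys/vm/drop_caches   # make the first read actually hit the cluster / disks
  time cat /mnt/cephfs/dir/smallfile > /dev/null
  time cat /data/xfs/dir/smallfile > /dev/null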
Thanks
--
Jesper
Hi All.
I was reading up, and especially the thread on upgrading to Mimic and
stable releases caused me to reflect a bit on our Ceph journey so far.
We started approximately 6 months ago - with CephFS as the dominant
use case in our HPC setup - starting at 400TB useable capacity and
as is
b [43,71,41,29]
pg_upmap_items 6.c [22,13]
..
But .. I don't have any PGs that should only have 2 replicas.. nor any
with 4 .. how should this be interpreted?
Thanks.
--
Jesper
131944 2.0746672
It took about 24 hours to rebalance -- and moved quite a few TBs around.
I would still like to have a log somewhere to grep and inspect what the
balancer/upmap actually does - and when - in my cluster. Or some ceph
commands that deliver so
> Have a look at this thread on the mailing list:
> https://www.mail-archive.com/ceph-users@lists.ceph.com/msg46506.html
OK, done.. how do I see that it actually works?
Second - should the reweights be set back to 1 then?
Jesper
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
>
> On mik, 2018-12-26 at 13:14 +0100, jes...@krogh.cc wrote:
>> Thanks for the insight and links.
>>
>> > As I can see you are on Luminous. Since Luminous Balancer plugin is
>> > available [1], you should use it instead reweight's in place,
>>
Thanks for the insight and links.
> As I can see you are on Luminous. Since Luminous Balancer plugin is
> available [1], you should use it instead reweight's in place, especially
> in upmap mode [2]
I'll try it out again - last time I tried, it complained about older clients -
it should be better
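For reference, the usual sequence for switching to the balancer in upmap mode is roughly the sketch below (it requires all clients to be Luminous-capable, which is what the older-clients complaint was about):

  ceph osd set-require-min-compat-client luminous
  ceph balancer mode upmap
  ceph balancer on
  ceph balancer status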
> Please, paste your `ceph osd df tree` and `ceph osd dump | head -n 12`.
$ sudo ceph osd df tree
ID  CLASS WEIGHT     REWEIGHT SIZE USE  AVAIL %USE  VAR  PGS TYPE NAME
-8        639.98883  -        639T 327T 312T  51.24 1.00 -   root default
-10       111.73999  -        111T
done the diff
to upstream (yet), and I don't intend to run our production cluster
disk-full anywhere in the near future to test it out.
Jesper
Hi.
In our Ceph cluster we hit one OSD at 95% full while others in the same pool
only hit 40% .. (total usage is ~55%). Thus I ran:
sudo ceph osd reweight-by-utilization 110 0.05 12
Which initiated some data movement.. but right after, ceph status reported:
jk@bison:~/adm-git$ sudo ceph
, but I may just be missing the fine grained details.
> Hope this helps.
Definitely - thanks.
--
Jesper
ed to be disabled to ensure that flushes are persistent, and disabling the
cache on an SSD is either not honoured by the firmware or plummets the write
performance.
Which is why enterprise disks have power-loss protection in the form of capacitors.
Again, any links/info telling otherwise is ve
pdated with recent development. Is it a solved problem today in
consumer-grade SSDs?
.. any links to insight/testing/etc would be welcome.
https://arstechnica.com/civis/viewtopic.php?f=11=1383499
- at least does not support that viewpoint.
Jesper
> On 24 Nov 2018, at 18.09, Anton Aleksandrov wrote
> We plan to have data on dedicate disk in each node and my question is about
> WAL/DB for Bluestore. How bad would it be to place it on system-consumer-SSD?
> How big risk is it, that everything will get "slower than using spinning HDD
>
eally-mean-it to do it anyway
OK, so a 4.15 kernel connects as a "hammer" (<1.0) client? Is there a
huge gap in upstreaming kernel clients to kernel.org, or what am I
misreading here?
Hammer is 2015-ish - 4.15 is January 2018-ish?
Is kernel client development l
> I suspect that mds asked client to trim its cache. Please run
> following commands on an idle client.
In the meantime - we migrated to the RH Ceph version and delivered the MDS
both SSDs and more memory, and the problem went away.
It still puzzles me a bit - why is there a connection
ions:
Did you "sleep 900" in-between the execution?
Are you using the kernel client or the fuse client?
If I run them "right after each other" .. then I get the same behaviour.
--
Jesper
; (or similar) .. and report back
if you can reproduce it?
Thoughts/comments/suggestions are highly appreciated. Should I try with
the fuse client?
--
Jesper
most of them mount the catalog.
--
Jesper
he first as it also shows available and "cached" memory.
--
Jesper
ong?
--
Jesper
On 14 Oct 2018, at 15.26, John Hearns wrote:
>
> This is a general question for the ceph list.
> Should Jesper be looking at these vm tunables?
> vm.dirty_ratio
> vm.dirty_centisecs
>
> What effect do they have when using Cephfs?
This situation is read-only, thus no dir
at root cause on the different behaviour could be?
Clients are using a 4.15 kernel.. Anyone aware of newer patches in this area
that could have an impact?
Jesper
dd up to more than above.. right?
This is the only current load being put on the cluster - + 100MB/s
recovery traffic.
Thanks.
Jesper
an I:
1) Add disks
2) Create pool
3) Stop all MDSs
4) rados cppool
5) Start MDS
.. Yes, that's a cluster-down on CephFS, but it shouldn't take long. Or is
there a better guide?
--
Jesper
when we saturate a few disks in the setup - and they are sharing. Thus
we'll move
the metadata as per recommendations to SSD.
--
Jesper
che works very well, but in our scenario we have 30+
hosts pulling the same data over NFS.
Is bluestore just a "bad fit" .. Filestore "should" do the right thing? Is
the recommendation to make an SSD "overlay" on the slow drives?
Thoughts?
Jesper
to get some deeper insight into Ceph recovery.
I have been through:
http://docs.ceph.com/docs/jewel/rados/troubleshooting/troubleshooting-osd/
without any luck.
What would be the next steps to try?
Thanks!
//Jesper
2016-09-30 08:51:23.464389 7f17985528c0 -1 WARNING: the following dangerous
e might decide to use old raid disks for their ceph setup ;-)
Thank you very much for your kind help!
Cheers,
Jesper
*****
On 18/12/2015 22:09, Jesper Thorhauge wrote:
> Hi Loic,
>
> Getting closer!
>
> lrwxrwxrwx 1 root root 10 Dec 18 19:43 1e9d527f-0866-4284-b77c-
s /
external devices. /dev/sdc sits on external eSATA. So...
https://rhn.redhat.com/errata/RHBA-2015-1382.html
will reboot tonight and get back :-)
/jesper
***'
I guess that's the problem you need to solve: why /dev/sdc does not generate
udev events (different dri
:-)
But "HDIO_DRIVE_CMD(identify) failed: Inappropriate ioctl for device" seems to
be the underlying issue.
Any thoughts?
/Jesper
*
Hi Loic,
searched around for possible udev bugs, and then tried to run "yum update".
Udev did have a fresh update with t
retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/ceph', '--cluster', 'ceph',
'--name', 'client.bootstrap-osd', '--keyring',
'/var/lib/ceph/bootstrap-osd/ceph.keyring', 'auth', 'add', 'osd.6', '-i',
'/var/lib/ceph/tmp/mnt.A99cDp/keyring', 'osd', 'allow *', 'mon', 'allow profile
"gpt", \
RUN+="/usr/sbin/ceph-disk-udev $number $name $parent"
On 17/12/2015 08:29, Jesper Thorhauge wrote:
> Hi Loic,
>
> osd's are on /dev/sda and /dev/sdb, journal's is on /dev/sdc (sdc3 / sdc4).
>
> sgdisk for sda shows;
>
> Partition GUID code:
Hi Loic,
Sounds like something does go wrong when /dev/sdc3 shows up. Is there any way I
can debug this further? Log files? Modify the .rules file...?
/Jesper
The non-symlink files in /dev/disk/by-partuuid come to existence because of:
* system boots
* udev rule calls
Nope, the previous post contained all that was in the boot.log :-(
/Jesper
**
- Den 17. dec 2015, kl. 11:53, Loic Dachary <l...@dachary.org> skrev:
On 17/12/2015 11:33, Jesper Thorhauge wrote:
> Hi Loic,
>
> Sounds like something does go wrong when /de
-zero exit status 1
ceph-disk: Error: One or more partitions failed to activate
Maybe related to the "(22) Invalid argument" part..?
/Jesper
*
-25ca0266fb7f ->
../../sdb1
lrwxrwxrwx 1 root root 10 Dec 16 07:35 e85f4d92-c8f1-4591-bd2a-aa43b80f58f6 ->
../../sda1
I don't know how to verify the symlink of the journal file - can you guide me on
that one?
Thanks :-) !
/Jesper
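A quick way to check the journal symlinks on a Filestore OSD is something like this (a sketch; the partuuid is whatever sgdisk reports for the journal partition):

  ls -l /var/lib/ceph/osd/ceph-*/journal   # should point at /dev/disk/by-partuuid/<journal partuuid>
  ls -l /dev/disk/by-partuuid/
  ceph-disk list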
**
Hi,
On 17/12/2015 07:53, Jesper Thorhauge
Hi,
I have done several reboots, and it did not lead to healthy symlinks :-(
/Jesper
Hi,
On 16/12/2015 07:39, Jesper Thorhauge wrote:
> Hi,
>
> A fresh server install on one of my nodes (and yum update) left me with
> CentOS 6.7 / Ceph 0.94.5. All the
a5-fe77-42f6-9415-25ca0266fb7f ->
../../sdb1
lrwxrwxrwx 1 root root 10 Dec 16 07:35 e85f4d92-c8f1-4591-bd2a-aa43b80f58f6 ->
../../sda1
Re-creating them manually won't survive a reboot. Is this a problem with the
udev rules in Ceph 0.94.3+?
Hope that somebody can help me :