Hi,
I am missing one block device in the rbd list output. However, I can still
mount and use it normally.
root@asterix:~# rados -p fast ls | grep rbd
rbd_directory
prusa_backup.rbd
root@asterix:~# rbd -p fast ls
root@asterix:~#
When I create a new one it shows up correctly, but the old one is still missing.
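For what it's worth, one way to peek at what rbd ls actually reads (the
rbd_directory object) is something like the following; the second command is
only relevant for old format 1 images, and both are just for inspecting which
names are registered:
rados -p fast listomapvals rbd_directory      # format 2 images are omap keys here
rados -p fast get rbd_directory - | strings   # format 1 images live in the object's tmap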
Hi, the current OSD systemd unit files start the OSD daemons correctly
and ceph is HEALTH_OK. However, there are process-tracking issues
and systemd thinks the service has failed.
systemctl stop ceph-osd@0
cannot stop the OSDs.
[Service]
EnvironmentFile=-/etc/sysconfig/ceph
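For context, a minimal sketch of a [Service] section that systemd can track
properly (this assumes the stock ceph-osd binary, which can stay in the
foreground with -f, and a CLUSTER variable coming from the EnvironmentFile;
it is not the unit file actually shipped):
[Service]
EnvironmentFile=-/etc/sysconfig/ceph
Type=simple
# -f keeps ceph-osd in the foreground so systemd tracks the right PID
ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i
Restart=on-failure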
Hi Mark,
The results are below. These numbers look good but I'm not really sure what
to conclude now.
# rados -p performance_test bench 120 write -b 4194304 -t 100 --no-cleanup
Total time run: 120.133251
Total writes made: 17529
Write size: 4194304
Bandwidth
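As a rough sanity check derived from the figures above: 17529 writes x 4 MB /
120.13 s works out to roughly 584 MB/s of aggregate write bandwidth.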
Dear Ilya,
It works for me, thanks a lot!
Jian Li
At 2014-11-20 16:19:44, Ilya Dryomov ilya.dryo...@inktank.com wrote:
On Thu, Nov 20, 2014 at 6:47 AM, lijian blacker1...@163.com wrote:
Hi,
I want to collect the Linux kernel rbd log. I know it uses the dout() method
for debugging, after I read the
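In case it helps, a rough sketch of turning those dout() messages on via
dynamic debug (this assumes a kernel built with CONFIG_DYNAMIC_DEBUG; the
messages then land in dmesg):
mount -t debugfs none /sys/kernel/debug                            # if not already mounted
echo 'module rbd +p' > /sys/kernel/debug/dynamic_debug/control
echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control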
Hi,
We have a five node cluster that has been running for a long
time (over a year). A few weeks ago we upgraded to 0.87 (giant) and
things continued to work well.
Last week a drive failed on one of the nodes. We replaced the
drive and things were working well again.
Hi Jay,
The -b parameter to rados bench controls the size of the object being
written. previously you were writing out 8KB objects which behind the
scenes translates into writing out lots of small files on the OSDs
behind the scenes. Your DD tests were doing 1MB writes which are much
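For comparison, the two cases could be exercised roughly like this (pool name
and thread count are placeholders, not the exact commands used):
rados -p test bench 60 write -b 8192 -t 16 --no-cleanup      # 8KB objects, as before
rados -p test bench 60 write -b 1048576 -t 16 --no-cleanup   # 1MB objects, closer to the dd runs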
Hi Jeff,
it would probably be wise to first check what these slow requests are:
1) ceph health detail - this will tell you which OSDs are experiencing the
slow requests
2) ceph daemon osd.{id} dump_ops_in_flight - to be issued on one of the above
OSDs; it will tell you what these ops are waiting for.
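For example (the OSD id is just a placeholder; the second command has to be
run on the host carrying that OSD):
ceph health detail
ceph daemon osd.12 dump_ops_in_flight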
Hi Anthony,
There is a discussion going on ceph-maintainers about the systemd unit
files. Dmitry Smirnov has posted his version at
http://anonscm.debian.org/cgit/pkg-ceph/ceph.git/commit/?h=experimental&id=3c22e192d964789365e8dc21c168c5fd8985f7d8
though we've made some progress since then.
Hello Mark,
sorry for barging in here, but are you sure this is correct? In my tests
the -b parameter in rados bench does exactly one thing: it uses the value
in its output to calculate IO bandwidth, taking the OPS value and
multiplying it by the -b value for display. However it
Thanks. I should have mentioned that the errors are pretty well
distributed across the cluster:
ceph1: /var/log/ceph/ceph-osd.0.log 71
ceph1: /var/log/ceph/ceph-osd.1.log 112
ceph1: /var/log/ceph/ceph-osd.2.log 38
ceph2: /var/log/ceph/ceph-osd.3.log 88
ceph2:
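If it helps reproduce the tally, counts like these can be gathered with
something along the lines of the following (the grep pattern assumes the
messages being counted are the slow request warnings):
for h in ceph1 ceph2; do ssh $h 'grep -c "slow request" /var/log/ceph/ceph-osd.*.log'; done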
Sam,
further to your email I have done the following:
1. Upgraded both osd servers with the latest updates and restarted each server
in turn
2. Fired up the nping utility to generate TCP connections (3-way handshake) from
each of the servers as well as from the host servers. In total I've run 5
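For reference, a typical nping invocation for this kind of test looks roughly
like the following (port, count and target host are illustrative, not the
exact command used):
nping --tcp -p 6789 -c 1000 osd-server-1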
Hi Rene,
The easiest way to check is to create a fresh pool and look at the files
that are created under an OSD for a PG associated with that pool.
Here's an example using firefly:
perf@magna003:/$ ceph-osd --version
ceph version 0.80.7-129-gc069bce (c069bce4e8180da3c0ca4951365032a45df76468)
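A rough sketch of the rest of the check (pool name, pool id 42 and osd.0 are
all illustrative, and the directory layout assumes a filestore OSD under
/var/lib/ceph):
ceph osd pool create testpool 8 8
ceph osd dump | grep testpool                 # note the pool id, e.g. 42
rados -p testpool put obj1 /etc/hosts         # write one object so a PG holds data
ls /var/lib/ceph/osd/ceph-0/current/ | grep '^42\.'    # PG directories for that pool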
Thanks, I will try that.
Andrei
- Original Message -
From: Samuel Just sam.j...@inktank.com
To: Andrei Mikhailovsky and...@arhont.com
Cc: ceph-users@lists.ceph.com
Sent: Thursday, 20 November, 2014 4:26:00 PM
Subject: Re: [ceph-users] Giant upgrade - stability issues
You can try
Hi all,
What is the compatibility/incompatibility of dumpling clients talking to
firefly and giant clusters? I know that tunables=firefly will prevent
dumpling clients from talking to a firefly cluster, but how about the
existence or not of erasure pools? Can a dumpling client talk to a
Firefly/Giant
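As an aside, the tunables currently in force (and a way to relax them for
older clients) can be checked with something like the following; the profile
name is only an example, pick whatever your oldest client supports:
ceph osd crush show-tunables
ceph osd crush tunables bobtail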
Hi David,
I've just finished running the 75GB fio test you posted a few days back on
my new test cluster.
The cluster is as follows:-
Single server with 3x hdd and 1 ssd
Ubuntu 14.04 with 3.16.7 kernel
2+1 EC pool on hdds below a 10G ssd cache pool. SSD is also partitioned to
provide journals
You need to create two system users, in both zones. They should have the
same name, access key, and secret in both zones. By convention, these
system users are named the same as the zones.
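A sketch of creating them (the access key and secret are placeholders; the
same commands, with the same credentials, have to be repeated against the
other zone):
radosgw-admin user create --uid=us-east --display-name="us-east system user" --access-key=XXXXXXXX --secret=YYYYYYYY --system
radosgw-admin user create --uid=us-west --display-name="us-west system user" --access-key=AAAAAAAA --secret=BBBBBBBB --system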
You shouldn't use those system users for anything other than replication.
You should create a non-system
Yes, it was a healthy cluster and I had to rebuild because the OSDs got
accidentally created on the root disk. Out of 4 OSDs I had to rebuild 3 of
them.
[jshah@Lab-cephmon001 ~]$ ceph osd tree
# id    weight  type name       up/down reweight
-1      0.5     root default
-2      0.0
Just to be clear, this is from a cluster that was healthy, had a disk
replaced, and hasn't returned to healthy? It's not a new cluster that has
never been healthy, right?
Assuming it's an existing cluster, how many OSDs did you replace? It
almost looks like you replaced multiple OSDs at the
Nick,
Can you share more details on the configuration you are using? I'll try and
duplicate those configurations in my environment and see what happens.
I'm mostly interested in:
- Erasure code profile (k, m, plugin, ruleset-failure-domain)
- Cache tiering pool configuration (ex: hit_set_type,
Are cache tiers reliable in firefly if you *aren't* using erasure pools?
Secondary to that - do they give a big boost with regard to read/write
performance for VM images? Any real-world feedback?
thanks,
--
Lindsay
Personally I'd suggest a lot of testing first. Not sure if there are
any lingering stability issues, but as far as performance goes in
firefly you'll only likely see speed ups with very skewed hot/cold
distributions and potentially slow downs in the general case unless you
have an extremely
Here you go:-
Erasure Profile
k=2
m=1
plugin=jerasure
ruleset-failure-domain=osd
ruleset-root=hdd
technique=reed_sol_van
Cache Settings
hit_set_type: bloom
hit_set_period: 3600
hit_set_count: 1
target_max_objects: 0
target_max_bytes: 10
cache_target_dirty_ratio: 0.4
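For reference, settings like these would typically be applied with commands
along these lines (profile and pool names are placeholders, and the
target_max_bytes value is just an illustrative 10 GB):
ceph osd erasure-code-profile set hdd-ec k=2 m=1 plugin=jerasure ruleset-failure-domain=osd ruleset-root=hdd technique=reed_sol_van
ceph osd pool set hot-pool hit_set_type bloom
ceph osd pool set hot-pool hit_set_period 3600
ceph osd pool set hot-pool hit_set_count 1
ceph osd pool set hot-pool target_max_bytes 10737418240
ceph osd pool set hot-pool cache_target_dirty_ratio 0.4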
On Thu, 20 Nov 2014 03:12:44 PM Mark Nelson wrote:
Personally I'd suggest a lot of testing first. Not sure if there are
any lingering stability issues, but as far as performance goes in
firefly you'll only likely see speed ups with very skewed hot/cold
distributions and potentially slow
On 11/20/2014 03:17 PM, Lindsay Mathieson wrote:
On Thu, 20 Nov 2014 03:12:44 PM Mark Nelson wrote:
Personally I'd suggest a lot of testing first. Not sure if there are
any lingering stability issues, but as far as performance goes in
firefly you'll only likely see speed ups with very skewed
So you have your crushmap set to choose osd instead of choose host?
Did you wait for the cluster to recover between each OSD rebuild? If you
rebuilt all 3 OSDs at the same time (or without waiting for a complete
recovery between them), that would cause this problem.
On Thu, Nov 20, 2014 at
Thanks for your help.
I was using puppet to install the OSDs, where it chooses a path over a device
name. Hence it created the OSDs in a path within the root volume, since the
path specified was incorrect.
And all 3 of the OSDs were rebuilt at the same time because it was unused and
we had
Hi Sage, Cephers
(I'm not on ceph-devel at the moment, will switch in a moment.)
Thanks. I am testing on RHEL7/CentOS 7. As a quick workaround, setting
the .service file to
[Service]
Type=forking
ExecStart= ceph-osd -i      # without --foreground
ExecStartPre=
works for the moment. Is there
Brief outline:
6 Node production cluster. Each node Dell R610, 8x1.4TB SAS Disks, Samsung M.2
PCIe SSD for journals, 32GB RAM, Broadcom 10G interfaces.
Ceph 0.80.7-0.el7.centos from the ceph repositories.
About 10 times per day, each node will oops with the following message:
An example:
If there's no data to lose, tell Ceph to re-create all the missing PGs.
ceph pg force_create_pg 2.33
Repeat for each of the missing PGs. If that doesn't do anything, you might
need to tell Ceph that you lost the OSDs. For each OSD you moved, run ceph
osd lost OSDID, then try the
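For example, the OSD-lost step would look roughly like this (the OSD id is
illustrative):
ceph osd lost 3 --yes-i-really-mean-it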
On Thu, 20 Nov 2014 06:56:43 Sage Weil wrote:
Moving this thread to ceph-devel. I don't have time to work on this at
the moment but would be very happy to hear feedback on Dmitry's latest
(and review pull requests :).
Hi guys, please do not expect pull requests from me -- I'm leaving systemd
On Thu, 20 Nov 2014 22:10:02 + Bond, Darryl wrote:
Brief outline:
6 Node production cluster. Each node Dell R610, 8x1.4TB SAS Disks,
Samsung M.2 PCIe SSD for journals, 32GB RAM, Broadcom 10G interfaces.
Ceph 0.80.7-0.el7.centos from the ceph repositories.
Which kernel?
Anyways,
Ok. Thanks.
—Jiten
On Nov 20, 2014, at 2:14 PM, Craig Lewis cle...@centraldesktop.com wrote:
If there's no data to lose, tell Ceph to re-create all the missing PGs.
ceph pg force_create_pg 2.33
Repeat for each of the missing PGs. If that doesn't do anything, you might
need to tell
Andrey,
The patches seem to be against infiniband drivers.
Would I get any value from trying the elrepo 3.17.3 kernel to hopefully pick up
the compaction changes?
Regards
Darryl
From: Andrey Korolyov and...@xdel.ru
Sent: Friday, 21 November 2014 8:27 AM
Hi Craig,
Recreating the missing PG’s fixed it. Thanks for your help.
But when I tried to mount the filesystem, it gave me "mount error 5". I
tried to restart the MDS server but it won't work. It tells me that it's
laggy/unresponsive.
BTW, all these machines are VMs.
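(Mount error 5 is plain EIO.) A couple of read-only status checks that can
show whether the MDS ever becomes active:
ceph -s
ceph mds stat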
Maybe delete the pool and start over?
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of JIten
Shah
Sent: Thursday, November 20, 2014 5:46 PM
To: Craig Lewis
Cc: ceph-users
Subject: Re: [ceph-users] pg's degraded
Hi Craig,
Recreating the missing PG's fixed it. Thanks
Hi,
I am following
http://docs.ceph.com/docs/master/radosgw/federated-config/ with giant
(0.88-340-g5bb65b3). I figured I'd do the simple case first:
- 1 region
- 2 zones (us-east, us-west) master us-east
- 2 radosgw instances (client.radosgw.us-east-1, client.radosgw.us-west-1)
- 1 ceph
This page is marked for removal:
http://ceph.com/docs/firefly/dev/differences-from-posix/
Is the bug in the above webpage still in the code? If not, in which
version was it fixed?
On 21/11/14 14:49, Mark Kirkwood wrote:
The only things that look odd in the destination zone logs are 383
requests getting 404 rather than 200:
$ grep http_status=404 ceph-client.radosgw.us-west-1.log
...
2014-11-21 13:48:58.435201 7ffc4bf7f700 1 == req done
req=0x7ffca002df00
hi,
I upgraded Ceph to 0.87 for rbd readahead, but can't see any performance
improvement in 4K sequential reads in the VM.
How can I tell if the readahead is taking effect?
thanks.
ceph.conf
[client]
rbd_cache = true
rbd_cache_size = 335544320
rbd_cache_max_dirty = 251658240
rbd_cache_target_dirty =
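For what it's worth, readahead in giant has its own client options, separate
from the cache settings above; the values below are, as far as I recall, the
shipped defaults, so they mainly illustrate which knobs exist:
rbd readahead trigger requests = 10
rbd readahead max bytes = 524288
rbd readahead disable after bytes = 52428800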
On 21/11/14 15:52, Mark Kirkwood wrote:
On 21/11/14 14:49, Mark Kirkwood wrote:
The only things that look odd in the destination zone logs are 383
requests getting 404 rather than 200:
$ grep http_status=404 ceph-client.radosgw.us-west-1.log
...
2014-11-21 13:48:58.435201 7ffc4bf7f700 1
Using the standard Centos 3.10.0-123.9.3.el7.x86_64 kernel. The NIC is a 10G
Ethernet broadcom so not infiniband.
Tried swappiness = 0 without any effect on this kernel.
I booted 3.17.3-1.el7.elrepo.x86_64 on one node about 3 hrs ago and copied a
lot of data onto the cluster. No sign of an oops
Hello,
On Fri, 21 Nov 2014 04:31:18 + Bond, Darryl wrote:
Using the standard Centos 3.10.0-123.9.3.el7.x86_64 kernel. The NIC is a
10G Ethernet broadcom so not infiniband. Tried swappiness = 0 without
any effect on this kernel.
I know, I read your original mail, that's why I suggested
Interestingly, on the 3.17.3 kernel RAM was freed once the test activity died
down.
             total       used       free     shared    buffers     cached
Mem:      32897528    8513140   24384388       9284       3924    1827652
-/+ buffers/cache:    6681564   26215964
Swap:     31249404
Hi,
I haven't tested rbd readahead yet,
but maybe you are hitting a qemu limit (by default qemu can use only one
thread/one core to manage I/Os; check your qemu CPU usage).
Do you have some performance results? How many IOPS?
I have had a 4x improvement in qemu-kvm with virtio-scsi + num_queues +