[ceph-users] missing rbd in list

2014-11-20 Thread houmles
Hi, I am missing one block device in the rbd list. However, I can normally mount it and use it.
root@asterix:~# rados -p fast ls | grep rbd
rbd_directory
prusa_backup.rbd
root@asterix:~# rbd -p fast ls
root@asterix:~#
When I create a new one it shows up correctly, but the old one is still missing.
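
A couple of read-only checks can help narrow this down (a sketch, not from the thread; pool and image names are taken from the post above). `rbd ls` reads the rbd_directory object, so an image whose directory entry is missing can still be fully usable while not being listed:

  # does the image header still exist and report sane metadata?
  rbd -p fast info prusa_backup

  # a *.rbd header object means a format 1 image; format 2 images are
  # registered in rbd_directory via omap instead
  rados -p fast stat prusa_backup.rbd
  rados -p fast listomapvals rbd_directory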

[ceph-users] OSD systemd unit files makes it look failed

2014-11-20 Thread Anthony Alba
Hi, the current OSD systemd unit files start the OSD daemons correctly and ceph is HEALTH_OK. However, there are some process-tracking issues and systemd thinks the service has failed; systemctl stop ceph-osd@0 cannot stop the OSDs. [Service] EnvironmentFile=-/etc/sysconfig/ceph

Re: [ceph-users] Ceph performance - 10 times slower

2014-11-20 Thread Jay Janardhan
Hi Mark, The results are below. These numbers look good but I'm not really sure what to conclude now.
# rados -p performance_test bench 120 write -b 4194304 -t 100 --no-cleanup
Total time run:     120.133251
Total writes made:  17529
Write size:         4194304
Bandwidth

Re: [ceph-users] How to collect ceph linux rbd log

2014-11-20 Thread lijian
Dear Ilya, It works for me, thanks a lot! Jian Li At 2014-11-20 16:19:44, Ilya Dryomov ilya.dryo...@inktank.com wrote: On Thu, Nov 20, 2014 at 6:47 AM, lijian blacker1...@163.com wrote: Hi, I want to collect the Linux kernel rbd log. I know it uses the dout() method to debug after I read the
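
For reference, kernel-side dout() output is normally switched on through dynamic debug rather than ceph.conf; a minimal sketch, assuming a kernel built with CONFIG_DYNAMIC_DEBUG and debugfs available:

  # enable debug output for the rbd and libceph modules
  mount -t debugfs none /sys/kernel/debug 2>/dev/null || true
  echo 'module rbd +p'     > /sys/kernel/debug/dynamic_debug/control
  echo 'module libceph +p' > /sys/kernel/debug/dynamic_debug/control

  # the messages land in the kernel ring buffer / syslog
  dmesg | tail

  # turn it off again when done
  echo 'module rbd -p'     > /sys/kernel/debug/dynamic_debug/control
  echo 'module libceph -p' > /sys/kernel/debug/dynamic_debug/control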

[ceph-users] slow requests/blocked

2014-11-20 Thread Jeff
Hi, We have a five node cluster that has been running for a long time (over a year). A few weeks ago we upgraded to 0.87 (giant) and things continued to work well. Last week a drive failed on one of the nodes. We replaced the drive and things were working well again.

Re: [ceph-users] Ceph performance - 10 times slower

2014-11-20 Thread Mark Nelson
Hi Jay, The -b parameter to rados bench controls the size of the object being written. Previously you were writing out 8KB objects, which behind the scenes translates into writing out lots of small files on the OSDs. Your dd tests were doing 1MB writes which are much
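
A quick way to see the effect Mark describes is to run the same bench with a small and a large object size and compare (a sketch; the pool name comes from the thread, the durations and thread counts are arbitrary):

  # small objects: many small files on the OSD filestores
  rados -p performance_test bench 60 write -b 8192 -t 16 --no-cleanup

  # large objects: the default 4MB, closer to the dd test
  rados -p performance_test bench 60 write -b 4194304 -t 16 --no-cleanup

  # read the data back, then remove the benchmark objects
  rados -p performance_test bench 60 seq -t 16
  rados -p performance_test cleanup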

Re: [ceph-users] slow requests/blocked

2014-11-20 Thread Jean-Charles LOPEZ
Hi Jeff, it would probably be wise to first check what these slow requests are:
1) ceph health detail - this will tell you which OSDs are experiencing the slow requests.
2) ceph daemon osd.{id} dump_ops_in_flight - issued on one of the above OSDs, this will tell you what these ops are waiting for.
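
Put together, the triage might look like this (a sketch; osd.3 is a placeholder for whichever OSD shows up in the health output, and the daemon commands must be run on the node hosting that OSD's admin socket):

  # which OSDs currently have slow/blocked requests?
  ceph health detail | grep -Ei 'slow|blocked'

  # ask the daemon what its in-flight ops are stuck on
  ceph daemon osd.3 dump_ops_in_flight

  # recently completed slow ops can be just as informative
  ceph daemon osd.3 dump_historic_ops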

Re: [ceph-users] OSD systemd unit files makes it look failed

2014-11-20 Thread Sage Weil
Hi Anthony, There is a discussion going on on the ceph-maintainers list about the systemd unit files. Dmitry Smirnov has posted his version at http://anonscm.debian.org/cgit/pkg-ceph/ceph.git/commit/?h=experimental&id=3c22e192d964789365e8dc21c168c5fd8985f7d8 though we've made some progress since then.

Re: [ceph-users] Ceph performance - 10 times slower

2014-11-20 Thread René Gallati
Hello Mark, sorry for barging in here, but are you sure this is correct? In my tests the -b parameter in rados bench does exactly one thing, and that is it uses the value in its output to calculate IO bandwidth: taking the OPS value and multiplying it by the -b value for display. However it

Re: [ceph-users] slow requests/blocked

2014-11-20 Thread Jeff
Thanks. I should have mentioned that the errors are pretty well distributed across the cluster:
ceph1: /var/log/ceph/ceph-osd.0.log 71
ceph1: /var/log/ceph/ceph-osd.1.log 112
ceph1: /var/log/ceph/ceph-osd.2.log 38
ceph2: /var/log/ceph/ceph-osd.3.log 88
ceph2:
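
For anyone wanting to reproduce this kind of tally, something along these lines works (a sketch; only ceph1 and ceph2 appear above, the rest of the host list is assumed):

  # count slow-request warnings per OSD log across the cluster nodes
  for host in ceph1 ceph2 ceph3 ceph4 ceph5; do
      ssh "$host" 'grep -c "slow request" /var/log/ceph/ceph-osd.*.log' \
          | sed "s/^/$host: /"
  done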

Re: [ceph-users] Giant upgrade - stability issues

2014-11-20 Thread Andrei Mikhailovsky
Sam, further to your email I have done the following: 1. Upgraded both OSD servers with the latest updates and restarted each server in turn. 2. Fired up the nping utility to generate TCP connections (3-way handshake) from each of the servers as well as from the host servers. In total I've run 5

Re: [ceph-users] Ceph performance - 10 times slower

2014-11-20 Thread Mark Nelson
Hi Rene, The easiest way to check is to create a fresh pool and look at the files that are created under an OSD for a PG associated with that pool. Here's an example using firefly:
perf@magna003:/$ ceph-osd --version
ceph version 0.80.7-129-gc069bce (c069bce4e8180da3c0ca4951365032a45df76468)
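
The procedure Mark is describing can be spelled out roughly like this (a sketch; the pool name, PG id and OSD id are placeholders that depend on your cluster's output):

  # create a throwaway pool and write one object into it
  ceph osd pool create testpool 32 32
  rados -p testpool put testobj /etc/hosts

  # find which PG and OSDs the object maps to, e.g. "pg 5.3d ... acting [2,0,1]"
  ceph osd map testpool testobj

  # then inspect the files backing that PG on the primary OSD
  ls -l /var/lib/ceph/osd/ceph-2/current/5.3d_head/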

Re: [ceph-users] Giant upgrade - stability issues

2014-11-20 Thread Andrei Mikhailovsky
Thanks, I will try that. Andrei - Original Message - From: Samuel Just sam.j...@inktank.com To: Andrei Mikhailovsky and...@arhont.com Cc: ceph-users@lists.ceph.com Sent: Thursday, 20 November, 2014 4:26:00 PM Subject: Re: [ceph-users] Giant upgrade - stability issues You can try

[ceph-users] Client forward compatibility

2014-11-20 Thread Dan van der Ster
Hi all, What is the compatibility/incompatibility of dumpling clients talking to firefly and giant clusters? I know that tunables=firefly will prevent dumpling clients from talking to a firefly cluster, but how about the existence or not of erasure pools? Can a dumpling client talk to a Firefly/Giant

Re: [ceph-users] Poor RBD performance as LIO iSCSI target

2014-11-20 Thread Nick Fisk
Hi David, I've just finished running the 75GB fio test you posted a few days back on my new test cluster. The cluster is as follows:
- Single server with 3x hdd and 1 ssd
- Ubuntu 14.04 with 3.16.7 kernel
- 2+1 EC pool on hdds below a 10G ssd cache pool. SSD is also partitioned to provide journals

Re: [ceph-users] Regarding Federated Gateways - Zone Sync Issues

2014-11-20 Thread Craig Lewis
You need to create two system users, in both zones. They should have the same name, access key, and secret in both zones. By convention, these system users are named the same as the zones. You shouldn't use those system users for anything other than replication. You should create a non-system
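
In command form, that advice boils down to something like the following (a sketch based on the federated-gateway docs; the zone names, cephx ids, display names and the key variables are placeholders, and the same two system users must be created with identical credentials in both zones):

  # two system users, conventionally named after the zones; run these against
  # the us-east gateway, then repeat the exact same commands against the
  # us-west gateway (e.g. with --name client.radosgw.us-west-1)
  radosgw-admin --name client.radosgw.us-east-1 user create --uid="us-east" \
      --display-name="us-east" --access-key="$EAST_ACCESS" --secret="$EAST_SECRET" --system
  radosgw-admin --name client.radosgw.us-east-1 user create --uid="us-west" \
      --display-name="us-west" --access-key="$WEST_ACCESS" --secret="$WEST_SECRET" --system

  # a separate non-system user for normal client traffic
  radosgw-admin --name client.radosgw.us-east-1 user create --uid="myapp" --display-name="application user"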

Re: [ceph-users] pg's degraded

2014-11-20 Thread JIten Shah
Yes, it was a healthy cluster and I had to rebuild because the OSDs got accidentally created on the root disk. Out of 4 OSDs I had to rebuild 3 of them.
[jshah@Lab-cephmon001 ~]$ ceph osd tree
# id    weight  type name       up/down reweight
-1      0.5     root default
-2      0.0

Re: [ceph-users] pg's degraded

2014-11-20 Thread Craig Lewis
Just to be clear, this is from a cluster that was healthy, had a disk replaced, and hasn't returned to healthy? It's not a new cluster that has never been healthy, right? Assuming it's an existing cluster, how many OSDs did you replace? It almost looks like you replaced multiple OSDs at the

Re: [ceph-users] Poor RBD performance as LIO iSCSI target

2014-11-20 Thread David Moreau Simard
Nick, Can you share more details on the configuration you are using? I'll try and duplicate those configurations in my environment and see what happens. I'm mostly interested in:
- Erasure code profile (k, m, plugin, ruleset-failure-domain)
- Cache tiering pool configuration (ex: hit_set_type,

[ceph-users] firefly and cache tiers

2014-11-20 Thread Lindsay Mathieson
Are cache tiers reliable in firefly if you *aren't* using erasure pools? Secondary to that - do they give a big boost with regard to read/write performance for VM images? Any real-world feedback? thanks, -- Lindsay

Re: [ceph-users] firefly and cache tiers

2014-11-20 Thread Mark Nelson
Personally I'd suggest a lot of testing first. Not sure if there are any lingering stability issues, but as far as performance goes in firefly you'll only likely see speed ups with very skewed hot/cold distributions and potentially slow downs in the general case unless you have an extremely

Re: [ceph-users] Poor RBD performance as LIO iSCSI target

2014-11-20 Thread Nick Fisk
Here you go:
Erasure Profile
k=2
m=1
plugin=jerasure
ruleset-failure-domain=osd
ruleset-root=hdd
technique=reed_sol_van
Cache Settings
hit_set_type: bloom
hit_set_period: 3600
hit_set_count: 1
target_max_objects: 0
target_max_bytes: 10
cache_target_dirty_ratio: 0.4
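
For anyone wanting to reproduce a similar setup, the pools behind those settings could be created roughly like this (a sketch only; pool names, PG counts and the cache sizing are placeholders, and ruleset-root=hdd assumes a matching CRUSH hierarchy already exists):

  # 2+1 erasure-coded base pool on the hdd root
  ceph osd erasure-code-profile set ec21 k=2 m=1 plugin=jerasure \
      technique=reed_sol_van ruleset-failure-domain=osd ruleset-root=hdd
  ceph osd pool create ecpool 128 128 erasure ec21

  # replicated cache pool layered on top in writeback mode
  ceph osd pool create cachepool 128 128
  ceph osd tier add ecpool cachepool
  ceph osd tier cache-mode cachepool writeback
  ceph osd tier set-overlay ecpool cachepool

  # hit-set / flushing knobs matching the values quoted above
  ceph osd pool set cachepool hit_set_type bloom
  ceph osd pool set cachepool hit_set_period 3600
  ceph osd pool set cachepool hit_set_count 1
  ceph osd pool set cachepool target_max_bytes 10737418240   # ~10G cache pool (assumed size)
  ceph osd pool set cachepool cache_target_dirty_ratio 0.4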

Re: [ceph-users] firefly and cache tiers

2014-11-20 Thread Lindsay Mathieson
On Thu, 20 Nov 2014 03:12:44 PM Mark Nelson wrote: Personally I'd suggest a lot of testing first. Not sure if there are any lingering stability issues, but as far as performance goes in firefly you'll only likely see speed ups with very skewed hot/cold distributions and potentially slow

Re: [ceph-users] firefly and cache tiers

2014-11-20 Thread Mark Nelson
On 11/20/2014 03:17 PM, Lindsay Mathieson wrote: On Thu, 20 Nov 2014 03:12:44 PM Mark Nelson wrote: Personally I'd suggest a lot of testing first. Not sure if there are any lingering stability issues, but as far as performance goes in firefly you'll only likely see speed ups with very skewed

Re: [ceph-users] pg's degraded

2014-11-20 Thread Craig Lewis
So you have your crushmap set to choose osd instead of choose host? Did you wait for the cluster to recover between each OSD rebuild? If you rebuilt all 3 OSDs at the same time (or without waiting for a complete recovery between them), that would cause this problem. On Thu, Nov 20, 2014 at
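
For context, the difference Craig is referring to lives in the CRUSH rule's chooseleaf step; a sketch of the two variants (rule name and numbers are just the firefly defaults):

  # default placement rule: replicas land on different *hosts*
  rule replicated_ruleset {
          ruleset 0
          type replicated
          min_size 1
          max_size 10
          step take default
          step chooseleaf firstn 0 type host
          step emit
  }

  # variant sometimes used on tiny test clusters: replicas only need to land
  # on different *osds*, possibly on the same host
  #       step chooseleaf firstn 0 type osd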

Re: [ceph-users] pg's degraded

2014-11-20 Thread JIten Shah
Thanks for your help. I was using Puppet to install the OSDs, where it chooses a path over a device name. Hence it created the OSDs in a path within the root volume, since the path specified was incorrect. And all 3 of the OSDs were rebuilt at the same time because it was unused and we had

Re: [ceph-users] OSD systemd unit files makes it look failed

2014-11-20 Thread Anthony Alba
Hi Sage, Cephers (I'm not on ceph-devel at the moment, will switch in a moment.) Thanks. I am testing on RHEL7/CentOS 7. As a quick workaround, setting the .service file to
[Service]
Type=forking
ExecStart= ceph-osd -i   # without --foreground
ExecPreStart =
works for the moment. Is there
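
For comparison, a drop-in override that keeps the daemon in the foreground so systemd can track it directly might look like this (a sketch only, not the packaged unit; the path and the --cluster handling vary between packagings):

  # /etc/systemd/system/ceph-osd@.service.d/override.conf  (hypothetical path)
  [Service]
  EnvironmentFile=-/etc/sysconfig/ceph
  Type=simple
  ExecStart=
  ExecStart=/usr/bin/ceph-osd -f -i %i

After adding the drop-in, a systemctl daemon-reload followed by restarting the ceph-osd@N instance should let systemctl stop/status behave as expected.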

[ceph-users] Kernel memory allocation oops Centos 7

2014-11-20 Thread Bond, Darryl
Brief outline: 6 Node production cluster. Each node Dell R610, 8x1.4TB SAS Disks, Samsung M.2 PCIe SSD for journals, 32GB RAM, Broadcom 10G interfaces. Ceph 0.80.7-0.el7.centos from the ceph repositories. About 10 times per day, each node will oops with the following message: An example:

Re: [ceph-users] pg's degraded

2014-11-20 Thread Craig Lewis
If there's no data to lose, tell Ceph to re-create all the missing PGs. ceph pg force_create_pg 2.33 Repeat for each of the missing PGs. If that doesn't do anything, you might need to tell Ceph that you lost the OSDs. For each OSD you moved, run ceph osd lost OSDID, then try the
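
Scripted over all the stuck PGs, that might look like the following (a sketch; the dump_stuck parsing is an assumption worth double-checking, osd id 3 is a placeholder, and both commands give up any data still on those PGs/OSDs):

  # recreate every PG that is stuck inactive
  for pg in $(ceph pg dump_stuck inactive 2>/dev/null | awk '$1 ~ /^[0-9]+\./ {print $1}'); do
      ceph pg force_create_pg "$pg"
  done

  # if the PGs stay in "creating", mark the removed OSDs as lost first
  ceph osd lost 3 --yes-i-really-mean-it   # repeat per rebuilt OSD id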

Re: [ceph-users] OSD systemd unit files makes it look failed

2014-11-20 Thread Dmitry Smirnov
On Thu, 20 Nov 2014 06:56:43 Sage Weil wrote: Moving this thread to ceph-devel. I don't have time to work on this at the moment but would be very happy to hear feedback on Dmitry's latest (and review pull requests :). Hi guys, please do not expect pull requests from me -- I'm leaving systemd

Re: [ceph-users] Kernel memory allocation oops Centos 7

2014-11-20 Thread Christian Balzer
On Thu, 20 Nov 2014 22:10:02 + Bond, Darryl wrote: Brief outline: 6 Node production cluster. Each node Dell R610, 8x1.4TB SAS Disks, Samsung M.2 PCIe SSD for journals, 32GB RAM, Broadcom 10G interfaces. Ceph 0.80.7-0.el7.centos from the ceph repositories. Which kernel? Anyways,

Re: [ceph-users] pg's degraded

2014-11-20 Thread JIten Shah
Ok. Thanks. —Jiten On Nov 20, 2014, at 2:14 PM, Craig Lewis cle...@centraldesktop.com wrote: If there's no data to lose, tell Ceph to re-create all the missing PGs. ceph pg force_create_pg 2.33 Repeat for each of the missing PGs. If that doesn't do anything, you might need to tell

Re: [ceph-users] Kernel memory allocation oops Centos 7

2014-11-20 Thread Bond, Darryl
Andrey, The patches seem to be against infiniband drivers. Would I get any value from trying the elrepo 3.17.3 kernel to hopefully pick up the compaction changes? Regards Darryl From: Andrey Korolyov and...@xdel.ru Sent: Friday, 21 November 2014 8:27 AM

Re: [ceph-users] pg's degraded

2014-11-20 Thread JIten Shah
Hi Craig, Recreating the missing PGs fixed it. Thanks for your help. But when I tried to mount the filesystem, it gave me “mount error 5”. I tried to restart the MDS server but it won’t work. It tells me that it’s laggy/unresponsive. BTW, all these machines are VMs.

Re: [ceph-users] pg's degraded

2014-11-20 Thread Michael Kuriger
Maybe delete the pool and start over? From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of JIten Shah Sent: Thursday, November 20, 2014 5:46 PM To: Craig Lewis Cc: ceph-users Subject: Re: [ceph-users] pg's degraded Hi Craig, Recreating the missing PG's fixed it. Thanks

[ceph-users] Radosgw agent only syncing metadata

2014-11-20 Thread Mark Kirkwood
Hi, I am following http://docs.ceph.com/docs/master/radosgw/federated-config/ with giant (0.88-340-g5bb65b3). I figured I'd do the simple case first:
- 1 region
- 2 zones (us-east, us-west), master us-east
- 2 radosgw instances (client.radosgw.us-east-1, wclient.radosgw.us-west-1)
- 1 ceph
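
For reference, the sync-agent side of that document boils down to a small config file plus one command; a sketch under the same zone names (the key names are as I recall them from the federated-config doc and should be checked against it, and keys, endpoints and paths are placeholders):

  # /etc/ceph/cluster-data-sync.conf  (hypothetical path)
  src_access_key: EAST_SYSTEM_ACCESS_KEY
  src_secret_key: EAST_SYSTEM_SECRET_KEY
  destination: http://us-west.example.com:80
  dest_access_key: WEST_SYSTEM_ACCESS_KEY
  dest_secret_key: WEST_SYSTEM_SECRET_KEY
  log_file: /var/log/radosgw/radosgw-sync-us-east-west.log

The agent is then started with radosgw-agent -c /etc/ceph/cluster-data-sync.conf (plus the source zone/endpoint as the doc specifies); note that passing --metadata-only restricts it to metadata, so data sync requires running it without that flag.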

[ceph-users] non-posix cephfs page deprecated

2014-11-20 Thread Shawn Edwards
This page is marked for removal: http://ceph.com/docs/firefly/dev/differences-from-posix/ Is the bug in the above webpage still in the code? If not, in which version was it fixed?

Re: [ceph-users] Radosgw agent only syncing metadata

2014-11-20 Thread Mark Kirkwood
On 21/11/14 14:49, Mark Kirkwood wrote: The only things that look odd in the destination zone logs are 383 requests getting 404 rather than 200:
$ grep http_status=404 ceph-client.radosgw.us-west-1.log
...
2014-11-21 13:48:58.435201 7ffc4bf7f700 1 ====== req done req=0x7ffca002df00

[ceph-users] RBD read-ahead didn't improve 4K read performance

2014-11-20 Thread duan . xufeng
Hi, I upgraded Ceph to 0.87 for rbd readahead, but I can't see any performance improvement in 4K sequential read in the VM. How can I know if the readahead is taking effect? Thanks.
ceph.conf:
[client]
rbd_cache = true
rbd_cache_size = 335544320
rbd_cache_max_dirty = 251658240
rbd_cache_target_dirty =
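
The cache options above don't control readahead; the knobs I believe were added for this around 0.86/0.87 (names and defaults are assumptions worth verifying against the release notes) live in the same [client] section, for example:

  [client]
  # readahead kicks in after this many sequential requests
  rbd readahead trigger requests = 10
  # maximum readahead window per read
  rbd readahead max bytes = 4194304
  # stop reading ahead after this much data, assuming the guest OS
  # readahead has warmed up by then (0 = never disable)
  rbd readahead disable after bytes = 0

Whether the settings were picked up can be checked on a client with an admin socket configured, e.g. ceph --admin-daemon /var/run/ceph/ceph-client.*.asok config show | grep readahead.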

Re: [ceph-users] Radosgw agent only syncing metadata

2014-11-20 Thread Mark Kirkwood
On 21/11/14 15:52, Mark Kirkwood wrote: On 21/11/14 14:49, Mark Kirkwood wrote: The only things that look odd in the destination zone logs are 383 requests getting 404 rather than 200: $ grep http_status=404 ceph-client.radosgw.us-west-1.log ... 2014-11-21 13:48:58.435201 7ffc4bf7f700 1

Re: [ceph-users] Kernel memory allocation oops Centos 7

2014-11-20 Thread Bond, Darryl
Using the standard CentOS 3.10.0-123.9.3.el7.x86_64 kernel. The NIC is a 10G Ethernet Broadcom, so not InfiniBand. Tried swappiness = 0 without any effect on this kernel. I booted 3.17.3-1.el7.elrepo.x86_64 on one node about 3 hrs ago and copied a lot of data onto the cluster. No sign of an oops

Re: [ceph-users] Kernel memory allocation oops Centos 7

2014-11-20 Thread Christian Balzer
Hello, On Fri, 21 Nov 2014 04:31:18 + Bond, Darryl wrote: Using the standard Centos 3.10.0-123.9.3.el7.x86_64 kernel. The NIC is a 10G Ethernet broadcom so not infiniband. Tried swappiness = 0 without any effect on this kernel. I know, I read your original mail, that's why I suggested

Re: [ceph-users] Kernel memory allocation oops Centos 7

2014-11-20 Thread Bond, Darryl
Interestingly, on the 3.17.3 kernel RAM was freed once the test activity died down.
             total       used       free     shared    buffers     cached
Mem:      32897528    8513140   24384388       9284       3924    1827652
-/+ buffers/cache:    6681564   26215964
Swap:     31249404

Re: [ceph-users] RBD read-ahead didn't improve 4K read performance

2014-11-20 Thread Alexandre DERUMIER
Hi, I haven't tested rbd readahead yet, but maybe you are hitting a qemu limit (by default qemu can use only 1 thread/1 core to manage IOs; check your qemu CPU usage). Do you have some performance results? How many IOPS? But I have had a 4x improvement in qemu-kvm with virtio-scsi + num_queues +
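
For reference, the virtio-scsi multiqueue setup Alexandre mentions looks roughly like this on the qemu command line (a sketch; the property was spelled num_queues in this era's qemu, the pool/image names are placeholders, and the rest of the VM options are elided):

  # attach an rbd image through virtio-scsi with 4 request queues
  qemu-system-x86_64 ... \
      -device virtio-scsi-pci,id=scsi0,num_queues=4 \
      -drive file=rbd:rbdpool/vmdisk,format=raw,if=none,id=drive0,cache=writeback \
      -device scsi-hd,drive=drive0,bus=scsi0.0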