[ceph-users] Remove RBD Image

2015-07-29 Thread Christian Eichelmann
Hi all, I am trying to remove several rbd images from the cluster. Unfortunately, that doesn't work: $ rbd info foo rbd image 'foo': size 1024 GB in 262144 objects order 22 (4096 kB objects) block_name_prefix: rb.0.919443.238e1f29 format: 1 $ rbd rm foo
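
A minimal sketch of a common first diagnostic for a removal that does not finish, assuming the image sits in the default 'rbd' pool (the watcher check is one frequent cause, not necessarily this one): for a format 1 image the header object is named <image>.rbd, and a leftover watcher on it blocks rbd rm.
$ rados -p rbd listwatchers foo.rbd   # any watcher listed here (e.g. a stale krbd mapping) blocks removal
$ rbd rm foo                          # retry once the watcher is gone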

Re: [ceph-users] Remove RBD Image

2015-07-29 Thread Christian Eichelmann
at 11:30 AM, Christian Eichelmann christian.eichelm...@1und1.de wrote: Hi all, I am trying to remove several rbd images from the cluster. Unfortunately, that doesn't work: $ rbd info foo rbd image 'foo': size 1024 GB in 262144 objects order 22 (4096 kB objects

Re: [ceph-users] Scrub Error / How does ceph pg repair work?

2015-05-12 Thread Christian Eichelmann
be aiming your gun at your foot with this! Robert LeBlanc GPG Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Mon, May 11, 2015 at 2:09 AM, Christian Eichelmann christian.eichelm...@1und1.de wrote: Hi all! We are experiencing approximately 1 scrub error

[ceph-users] Scrub Error / How does ceph pg repair work?

2015-05-11 Thread Christian Eichelmann
Hi all! We are experiencing approximately 1 scrub error / inconsistent pg every two days. As far as I know, to fix this you can issue a ceph pg repair, which works fine for us. I have a few questions regarding the behavior of the ceph cluster in such a case: 1. After ceph detects the scrub error,
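
A short sketch of the repair workflow described above (the PG id 0.6 is hypothetical and stands in for whatever ceph health detail reports):
$ ceph health detail | grep inconsistent   # e.g. "pg 0.6 is active+clean+inconsistent, acting [12,7,31]"
$ ceph pg repair 0.6                       # asks the primary OSD of that PG to repair it
$ ceph -w                                  # watch the cluster log for the scrub/repair result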

Re: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

2015-04-21 Thread Christian Eichelmann
to dmesg ? Cheers, Dan On Mon, Apr 20, 2015 at 9:29 AM, Christian Eichelmann christian.eichelm...@1und1.de wrote: Hi Ceph-Users! We currently have a problem where I am not sure whether it has its cause in Ceph or something else. First, some information about our ceph-setup: * ceph version

Re: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

2015-04-21 Thread Christian Eichelmann
has been running for years without a problem. -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Christian Eichelmann Sent: 20 April 2015 14:41 To: Nick Fisk; ceph-users@lists.ceph.com Subject: Re: [ceph-users] 100% IO Wait with CEPH RBD

Re: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

2015-04-21 Thread Christian Eichelmann
On Tue, Apr 21, 2015 at 9:13 AM, Christian Eichelmann christian.eichelm...@1und1.de wrote: Hi Dan, we are already back on the kernel module since the same problems were happening with fuse. I had no special ulimit settings for the fuse process, so that could have been an issue there. I

Re: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

2015-04-20 Thread Christian Eichelmann
I'm using xfs on the rbd disks. They are between 1 and 10TB in size. On 20.04.2015 at 14:32, Nick Fisk wrote: Ah ok, good point. What FS are you using on the RBD? -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Christian Eichelmann

Re: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC

2015-04-20 Thread Christian Eichelmann
might stop this from happening. Nick -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Christian Eichelmann Sent: 20 April 2015 08:29 To: ceph-users@lists.ceph.com Subject: [ceph-users] 100% IO Wait with CEPH RBD and RSYNC Hi Ceph-Users

[ceph-users] 100% IO Wait with CEPH RBD and RSYNC

2015-04-20 Thread Christian Eichelmann
Hi Ceph-Users! We currently have a problem where I am not sure whether it has its cause in Ceph or something else. First, some information about our ceph-setup: * ceph version 0.87.1 * 5 MON * 12 OSD servers with 60x2TB each * 2 RSYNC gateways with 2x10G Ethernet (Kernel: 3.16.3-2~bpo70+1, Debian
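
A small sketch of the client-side checks usually done for this symptom (assuming the rsync gateways use the kernel rbd client; the paths are standard kernel locations, not taken from the post):
$ dmesg | grep -i 'blocked for more than'   # hung-task warnings that often accompany 100% iowait
$ cat /sys/kernel/debug/ceph/*/osdc         # in-flight OSD requests of the kernel client, if debugfs is mounted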

Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread

2015-03-10 Thread Christian Eichelmann
can put this to catch more users? Or maybe a warning issued by the osds themselves or something if they see limits that are low? sage - Karan - On 09 Mar 2015, at 14:48, Christian Eichelmann christian.eichelm...@1und1.de wrote: Hi Karan, as you are actually writing in your own

Re: [ceph-users] Ceph BIG outage : 200+ OSD are down , OSD cannot create thread

2015-03-09 Thread Christian Eichelmann

Re: [ceph-users] Monitor Restart triggers half of our OSDs marked down

2015-02-05 Thread Christian Eichelmann
The behaviour is exactly the same on our system, so it looks like the same issue. We are currently running Giant by the way (0.87), plus many other OSDs like that.

[ceph-users] Monitor Restart triggers half of our OSDs marked down

2015-02-03 Thread Christian Eichelmann
Hi all, during some failover and configuration tests, we are currently observing a strange phenomenon: Restarting one of our monitors (5 in sum) triggers about 300 of the following events: osd.669 10.76.28.58:6935/149172 failed (20 reports from 20 peers after 22.005858 >= grace 20.00)
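
A hedged sketch of the knobs usually looked at for this symptom (the values are only illustrative): the failure reports quoted above are controlled by the OSD heartbeat grace and the number of reporters the monitors require before marking an OSD down.
$ ceph daemon osd.669 config show | grep -E 'osd_heartbeat_grace|mon_osd_min_down'  # run on the OSD's host
$ ceph tell osd.* injectargs '--osd_heartbeat_grace 30'                             # runtime-only change for testing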

Re: [ceph-users] Behaviour of Ceph while OSDs are down

2015-01-21 Thread Christian Eichelmann
at 9:45 AM, Gregory Farnum g...@gregs42.com wrote: On Tue, Jan 20, 2015 at 2:40 AM, Christian Eichelmann christian.eichelm...@1und1.de wrote: Hi all, I want to understand what Ceph does if several OSDs are down. First of all, some words about our setup: We have 5 monitors and 12 OSD servers, each has

[ceph-users] Behaviour of Ceph while OSDs are down

2015-01-20 Thread Christian Eichelmann
Hi all, I want to understand what Ceph does if several OSDs are down. First of all, some words about our setup: We have 5 monitors and 12 OSD servers, each with 60x2TB disks. These servers are spread across 4 racks in our datacenter. Every rack holds 3 OSD servers. We have a replication factor of
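
For context, a brief sketch of the two pool parameters that drive the behaviour being asked about (the pool name 'rbd' is just an example): size is the replication factor, min_size the number of replicas a PG needs before it accepts I/O.
$ ceph osd pool get rbd size       # replication factor, e.g. 3
$ ceph osd pool get rbd min_size   # I/O to a PG blocks once fewer replicas than this are available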

[ceph-users] Placementgroups stuck peering

2015-01-14 Thread Christian Eichelmann
Hi all, after our cluster problems with incomplete placement groups, we've decided to remove our pools and create new ones. This was going fine in the beginning. After adding an additional OSD server, we now have 2 PGs that are stuck in the peering state: HEALTH_WARN 2 pgs peering; 2 pgs stuck
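
A small sketch for digging into a stuck PG (the PG id 3.5a is hypothetical); the recovery_state section of the query output usually says what peering is waiting for.
$ ceph health detail | grep peering   # shows the stuck PG ids and their acting sets
$ ceph pg 3.5a query | less           # check "recovery_state", e.g. entries blocked on down OSDs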

Re: [ceph-users] Is ceph production ready? [was: Ceph PG Incomplete = Cluster unusable]

2015-01-09 Thread Christian Eichelmann

[ceph-users] Documentation of ceph pg num query

2015-01-09 Thread Christian Eichelmann
Hi all, as mentioned last year, our ceph cluster is still broken and unusable. We are still investigating what has happened, and I am taking a deeper look into the output of ceph pg <pgnum> query. The problem is that I can find some information about what some of the sections mean, but mostly I
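
For reference, a sketch of how that output is commonly inspected (the PG id 2.1f is hypothetical); the query returns JSON whose larger sections include info, peer_info and recovery_state.
$ ceph pg 2.1f query > pg-2.1f.json        # dump the full JSON for offline reading
$ grep -n '"recovery_state"' pg-2.1f.json  # the peering/recovery history of the PG starts here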

Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2014-12-30 Thread Christian Eichelmann
again. To tell the truth, I guess that will result in the end of our ceph project (running for 9 months already). Regards, Christian On 29.12.2014 15:59, Nico Schottelius wrote: Hey Christian, Christian Eichelmann [Mon, Dec 29, 2014 at 10:56:59AM +0100]: [incomplete PG / RBD hanging, osd

Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2014-12-30 Thread Christian Eichelmann
for each pool, also different OSDs, maybe this way you can overcome the issue. Cheers Eneko On 30/12/14 12:17, Christian Eichelmann wrote: Hi Nico and all others who answered, After some more trying to somehow get the pgs into a working state (I've tried force_create_pg, which was putting

Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2014-12-30 Thread Christian Eichelmann
in the new pool's image format? On 30/12/14 12:31, Christian Eichelmann wrote: Hi Eneko, I was trying an rbd cp before, but that was hanging as well. But I couldn't find out if the source image was causing the hang or the destination image. That's why I decided to try a posix copy. Our cluster

[ceph-users] Ceph PG Incomplete = Cluster unusable

2014-12-29 Thread Christian Eichelmann
Hi all, we have a ceph cluster, with currently 360 OSDs in 11 Systems. Last week we were replacing one OSD System with a new one. During that, we had a lot of problems with OSDs crashing on all of our systems. But that is not our current problem. After we got everything up and running again, we
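
A minimal sketch of a first look at an incomplete PG (the PG id 4.2b is hypothetical); for incomplete PGs the query output names the down OSDs whose data would be needed to finish peering.
$ ceph health detail | grep incomplete                     # list the incomplete PGs and their acting sets
$ ceph pg 4.2b query | grep -A10 down_osds_we_would_probe  # OSDs the PG would probe to go active again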

Re: [ceph-users] OSDs are crashing with Cannot fork or cannot create thread but plenty of memory is left

2014-09-23 Thread Christian Eichelmann
it doesn't seem to have gotten much traction in terms of informing users. Regards Nathan On 15/09/2014 7:13 PM, Christian Eichelmann wrote: Hi all, I have no idea why running out of filehandles should produce an out of memory error, but well. I've increased the ulimit as you told me, and nothing

Re: [ceph-users] OSDs are crashing with Cannot fork or cannot create thread but plenty of memory is left

2014-09-15 Thread Christian Eichelmann
disks, with OSD SSD journals) with that kind of case and enjoy the fact that my OSDs never fail. ^o^ Christian (another one) On 9/12/2014 10:15 AM, Christian Eichelmann wrote: Hi, I am running all commands as root, so there are no limits for the processes. Regards, Christian

[ceph-users] OSDs are crashing with Cannot fork or cannot create thread but plenty of memory is left

2014-09-12 Thread Christian Eichelmann
Hi Ceph-Users, I have absolutely no idea what is going on on my systems... Hardware: 45 x 4TB harddisks, 2 x 6-core CPUs, 256GB memory. When initializing all disks and joining them to the cluster, after approximately 30 OSDs, other OSDs are crashing. When I try to start them again I see different
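
A hedged sketch of the limits usually implicated by "cannot create thread" on dense OSD hosts (values are illustrative, not tuned recommendations): 45 OSDs on one host can exhaust the default pid/thread ceiling and the per-user limits even with plenty of memory free.
$ sysctl kernel.pid_max             # the default 32768 is easily exhausted by the threads of 45 OSDs
$ sysctl -w kernel.pid_max=4194303  # raise it; persist via /etc/sysctl.d/
$ ulimit -u; ulimit -n              # check nproc/nofile; raise them in /etc/security/limits.conf if low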

Re: [ceph-users] OSDs are crashing with Cannot fork or cannot create thread but plenty of memory is left

2014-09-12 Thread Christian Eichelmann
Hi, I am running all commands as root, so there are no limits for the processes. Regards, Christian ___ From: Mariusz Gronczewski [mariusz.gronczew...@efigence.com] Sent: Friday, 12 September 2014 15:33 To: Christian Eichelmann Cc: ceph-users

Re: [ceph-users] scrub error on firefly

2014-07-10 Thread Christian Eichelmann
I can also confirm that after upgrading to firefly both of our clusters (test and live) went from 0 scrub errors each for about 6 months to about 9-12 per week... This also makes me kind of nervous, since as far as I know all that ceph pg repair does is copy the primary object to all
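
A sketch of how the bad copy is usually identified before repairing (the log path and the PG id 4.6d are illustrative), since, as the post says, repair copies the primary object over the replicas:
$ ceph health detail | grep inconsistent             # find the inconsistent PG and its acting set
$ grep ERR /var/log/ceph/ceph-osd.*.log | grep 4.6d  # the deep-scrub ERR lines name the object and the bad shard
$ ceph pg repair 4.6d                                # only after checking that the primary copy is the good one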

Re: [ceph-users] external monitoring tools for ceph

2014-07-01 Thread Christian Eichelmann

[ceph-users] Behaviour of ceph pg repair on different replication levels

2014-06-23 Thread Christian Eichelmann
Hi ceph users, since our cluster has had a few inconsistent pgs recently, I was wondering what ceph pg repair does depending on the replication level. So I just wanted to check if my assumptions are correct: Replication 2x: Since the cluster can not decide which version is the correct one, it
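
For reference, a small sketch of how the replication level in question is read or changed per pool (the pool name 'rbd' is just an example):
$ ceph osd pool get rbd size    # current replication level of the pool
$ ceph osd pool set rbd size 3  # raise it to 3; the extra copies are created by backfill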

[ceph-users] PG Scrub Error / active+clean+inconsistent

2014-06-10 Thread Christian Eichelmann
Hi all, after coming back from a long weekend, I found my production cluster in an error state, mentioning 6 scrub errors and 6 pgs in active+clean+inconsistent state. Strangely, my pre-live cluster, running on different hardware, is also showing 1 scrub error and 1 inconsistent pg... pg

Re: [ceph-users] PG Scrub Error / active+clean+inconsistent

2014-06-10 Thread Christian Eichelmann
Hi again, I just found the ceph pg repair command :) Now both clusters are OK again. Anyway, I'm really interested in the cause of the problem. Regards, Christian On 10.06.2014 10:28, Christian Eichelmann wrote: Hi all, after coming back from a long weekend, I found my production cluster

[ceph-users] Nagios Check for Ceph-Dash

2014-06-02 Thread Christian Eichelmann
Hi Folks! For those of you who are using ceph-dash (https://github.com/Crapworks/ceph-dash), I've created a Nagios plugin that uses the JSON endpoint to monitor your cluster remotely: * https://github.com/Crapworks/check_ceph_dash I think this can be easily adapted to use the ceph-rest-api as
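
A minimal sketch of polling that JSON endpoint by hand (the hostname, port and content-negotiation detail are assumptions about a typical ceph-dash deployment, not taken from the post):
$ curl -s -H 'Accept: application/json' http://cephdash.example.com:5000/ | python -m json.tool | grep overall_status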

Re: [ceph-users] visualizing a ceph cluster automatically

2014-05-16 Thread Christian Eichelmann
I have written a small and lightweight GUI, which can also act as a JSON REST API (for non-interactive monitoring): https://github.com/Crapworks/ceph-dash Maybe that's what you're searching for. Regards, Christian From: ceph-users