Re: [ceph-users] OSD crash after change of osd_memory_target

2020-01-22 Thread Igor Fedotov
Hi Martin, looks like a bug to me. You might want to remove all custom settings from the config database and try to set osd_memory_target only. Would that help? Thanks, Igor On 1/22/2020 3:43 PM, Martin Mlynář wrote: On 21. 01. 2020 at 21:12, Stefan Kooman wrote: Quoting Martin Mlynář
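For reference, a minimal sketch of the cleanup Igor suggests, assuming a Nautilus cluster; the removed option name is a placeholder for whatever custom OSD overrides the dump shows, and 2 GiB is only an example value:

    ceph config dump                                   # review every override stored in the mons' config database
    ceph config rm osd <some_custom_option>            # repeat for each custom osd override found above
    ceph config set osd osd_memory_target 2147483648   # then set only the memory target (2 GiB here)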

Re: [ceph-users] OSD crash after change of osd_memory_target

2020-01-22 Thread Martin Mlynář
On 21. 01. 2020 at 21:12, Stefan Kooman wrote: > Quoting Martin Mlynář (nexus+c...@smoula.net): > >> Do you think this could help? The OSD does not even start, so I'm getting a little >> lost as to how flushing caches could help. > I might have misunderstood. I thought the OSDs crashed when you set the >

Re: [ceph-users] OSD crash after change of osd_memory_target

2020-01-21 Thread Stefan Kooman
Quoting Martin Mlynář (nexus+c...@smoula.net): > Do you think this could help? The OSD does not even start, so I'm getting a little > lost as to how flushing caches could help. I might have misunderstood. I thought the OSDs crashed when you set the config setting. > According to the trace I suspect something

Re: [ceph-users] OSD crash after change of osd_memory_target

2020-01-21 Thread Martin Mlynář
On Tue, 21. 1. 2020 at 17:09, Stefan Kooman wrote: > Quoting Martin Mlynář (nexus+c...@smoula.net): > > > > > When I remove this option: > > # ceph config rm osd osd_memory_target > > > > the OSD starts without any trouble. I've seen the same behaviour when I wrote > > this parameter into

Re: [ceph-users] OSD crash after change of osd_memory_target

2020-01-21 Thread Stefan Kooman
Quoting Martin Mlynář (nexus+c...@smoula.net): > > When I remove this option: > # ceph config rm osd osd_memory_target > > the OSD starts without any trouble. I've seen the same behaviour when I wrote > this parameter into /etc/ceph/ceph.conf > > Is this a known bug? Am I doing something wrong? I

[ceph-users] OSD crash after change of osd_memory_target

2020-01-21 Thread Martin Mlynář
Hi, I'm having trouble changing osd_memory_target on my test cluster. I've upgraded the whole cluster from Luminous to Nautilus; all OSDs are running BlueStore. Because this test lab is short on RAM, I wanted to lower osd_memory_target to save some memory. # ceph version ceph version 14.2.6

Re: [ceph-users] OSD Crash When Upgrading from Jewel to Luminous?

2018-08-22 Thread Gregory Farnum
Adjusting CRUSH weight shouldn't have caused this. Unfortunately the logs don't have a lot of hints — the thread that crashed doesn't have any output except for the Crashed state. If you can reproduce this with more debugging on we ought to be able to track it down; if not it seems we missed a
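As a rough sketch of what "more debugging on" usually means in practice (the debug levels below are examples, not values requested in the thread):

    ceph tell osd.* injectargs '--debug_osd 20 --debug_ms 1'   # bump logging on running daemons
    # or, for an OSD that crashes at start, in ceph.conf on its host before restarting:
    [osd]
        debug osd = 20
        debug ms = 1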

Re: [ceph-users] OSD Crash When Upgrading from Jewel to Luminous?

2018-08-21 Thread Kenneth Van Alstyne
After looking into this further, is it possible that adjusting the CRUSH weight of the OSDs while running mismatched versions of the ceph-osd daemon across the cluster can cause this issue? Under certain circumstances in our cluster, this may happen automatically on the backend. I can’t
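A quick way to confirm whether daemons really are running mixed versions mid-upgrade (a sketch; the second command only exists once the cluster is on Luminous or later):

    ceph tell osd.* version    # ask every running OSD daemon for its version
    ceph versions              # Luminous and later: per-daemon-type version summary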

Re: [ceph-users] OSD Crash When Upgrading from Jewel to Luminous?

2018-08-17 Thread Gregory Farnum
Do you have more logs that indicate what state machine event the crashing OSDs received? This obviously shouldn't have happened, but it's a plausible failure mode, especially if it's a relatively rare combination of events. -Greg On Fri, Aug 17, 2018 at 4:49 PM Kenneth Van Alstyne <

[ceph-users] OSD Crash When Upgrading from Jewel to Luminous?

2018-08-17 Thread Kenneth Van Alstyne
Hello all: I ran into an issue recently with one of my clusters when upgrading from 10.2.10 to 12.2.7. I have previously tested the upgrade in a lab and upgraded one of our five production clusters with no issues. On the second cluster, however, I ran into an issue where all OSDs that

Re: [ceph-users] OSD crash with segfault Luminous 12.2.4

2018-03-27 Thread Brad Hubbard
On Tue, Mar 27, 2018 at 9:04 PM, Dietmar Rieder wrote: > Thanks Brad! Hey Dietmar, you're welcome. > > I added some information to the ticket. > Unfortunately I still could not grab a coredump, since there has been no > segfault lately. OK. That may help to get us started. Getting

Re: [ceph-users] OSD crash with segfault Luminous 12.2.4

2018-03-27 Thread Dietmar Rieder
Thanks Brad! I added some information to the ticket. Unfortunately I still could not grab a coredump, since there has been no segfault lately. http://tracker.ceph.com/issues/23431 Maybe Oliver has something to add as well. Dietmar On 03/27/2018 11:37 AM, Brad Hubbard wrote: > "NOTE: a copy of

Re: [ceph-users] OSD crash with segfault Luminous 12.2.4

2018-03-27 Thread Brad Hubbard
"NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this." Have you ever wondered what this means and why it's there? :) This is at least something you can try. it may provide useful information, it may not. This stack looks like it is either corrupted, or possibly not in

Re: [ceph-users] OSD crash with segfault Luminous 12.2.4

2018-03-23 Thread Dietmar Rieder
Hi, I encountered one more crash two days ago, and I opened a ticket: http://tracker.ceph.com/issues/23431 In our case it is more like one every two weeks, for now... And it is affecting different OSDs on different hosts. Dietmar On 03/23/2018 11:50 AM, Oliver Freyermuth wrote: > Hi all, > > I

Re: [ceph-users] OSD crash with segfault Luminous 12.2.4

2018-03-23 Thread Oliver Freyermuth
Hi all, I'm seeing exactly the same thing, also at the same addresses, on Luminous 12.2.4, CentOS 7. Sadly, the logs are equally unhelpful. It happens randomly on an OSD about once every 2-3 days (out of the 196 OSDs we have in total). It's also not a container environment. Cheers, Oliver On 08.03.2018

Re: [ceph-users] OSD crash with segfault Luminous 12.2.4

2018-03-09 Thread Dietmar Rieder
On 03/09/2018 12:49 AM, Brad Hubbard wrote: > On Fri, Mar 9, 2018 at 3:54 AM, Subhachandra Chandra > wrote: >> I noticed a similar crash too. Unfortunately, I did not get much info in the >> logs. >> >> *** Caught signal (Segmentation fault) ** >> >> Mar 07 17:58:26 data7

Re: [ceph-users] OSD crash with segfault Luminous 12.2.4

2018-03-08 Thread Brad Hubbard
On Fri, Mar 9, 2018 at 3:54 AM, Subhachandra Chandra wrote: > I noticed a similar crash too. Unfortunately, I did not get much info in the > logs. > > *** Caught signal (Segmentation fault) ** > > Mar 07 17:58:26 data7 ceph-osd-run.sh[796380]: in thread 7f63a0a97700 >

Re: [ceph-users] OSD crash with segfault Luminous 12.2.4

2018-03-08 Thread Subhachandra Chandra
I noticed a similar crash too. Unfortunately, I did not get much info in the logs. *** Caught signal (Segmentation fault) ** Mar 07 17:58:26 data7 ceph-osd-run.sh[796380]: in thread 7f63a0a97700 thread_name:safe_timer Mar 07 17:58:28 data7 ceph-osd-run.sh[796380]: docker_exec.sh: line 56:

[ceph-users] OSD crash with segfault Luminous 12.2.4

2018-03-08 Thread Dietmar Rieder
Hi, I noticed in my client logs (using CephFS) that an OSD was unexpectedly going down. While checking the logs for the affected OSD I found that it was segfaulting: 2018-03-07 06:01:28.873049 7fd9af370700 -1 *** Caught signal (Segmentation fault) ** in thread 7fd9af370700

Re: [ceph-users] OSD crash during pg repair - recovery_info.ss.clone_snaps.end and other problems

2018-03-07 Thread Jan Pekař - Imatic
On 6.3.2018 22:28, Gregory Farnum wrote: On Sat, Mar 3, 2018 at 2:28 AM Jan Pekař - Imatic wrote: Hi all, I have a few problems on my cluster that are maybe linked together and have now caused an OSD to go down during pg repair. First few

Re: [ceph-users] OSD crash during pg repair - recovery_info.ss.clone_snaps.end and other problems

2018-03-06 Thread Gregory Farnum
On Sat, Mar 3, 2018 at 2:28 AM Jan Pekař - Imatic wrote: > Hi all, > > I have a few problems on my cluster that are maybe linked together and > have now caused an OSD to go down during pg repair. > > First, a few notes about my cluster: > > 4 nodes, 15 OSDs installed on Luminous (no upgrade).

[ceph-users] OSD crash during pg repair - recovery_info.ss.clone_snaps.end and other problems

2018-03-03 Thread Jan Pekař - Imatic
Hi all, I have a few problems on my cluster that are maybe linked together and have now caused an OSD to go down during pg repair. First, a few notes about my cluster: 4 nodes, 15 OSDs installed on Luminous (no upgrade). Replicated pools, with one pool (pool 6) cached by SSD disks. I don't detect any hardware
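For context, the repair that preceded the crash is normally driven like this (a sketch; the PG id is a placeholder, not one from the thread):

    ceph health detail     # lists inconsistent PGs, e.g. "pg 6.12 is active+clean+inconsistent"
    ceph pg repair 6.12    # ask the acting primary to repair that PG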

Re: [ceph-users] osd crash because rocksdb report  ‘Compaction error: Corruption: block checksum mismatch’

2017-09-17 Thread wei.qiaomiao
om build? Are there any changes to the > > source code? Yes, we built the source code ourselves, but there are no changes to the source code. Original Mail Sender: <s...@newdream.net> To: WeiQiaoMiao00105316 CC: <ceph-users@lists.ceph.com> Date: 2017/09/17 01:56 S

Re: [ceph-users] osd crash because rocksdb report  ‘Compaction error: Corruption: block checksum mismatch’

2017-09-16 Thread Sage Weil
On Fri, 15 Sep 2017, wei.qiaom...@zte.com.cn wrote: > > Hi all, > > My cluster is running 12.2.0 with BlueStore. We used the fio tool with the > librbd ioengine to run an I/O test yesterday, and several OSDs crashed one after > another. > > 3 nodes, 30 OSDs, 1 TB SATA HDD for OSD data, 1 GB SATA SSD

[ceph-users] osd crash because rocksdb report  ‘Compaction error: Corruption: block checksum mismatch’

2017-09-15 Thread wei.qiaomiao
Hi all, my cluster is running 12.2.0 with BlueStore. We used the fio tool with the librbd ioengine to run an I/O test yesterday, and several OSDs crashed one after another. 3 nodes, 30 OSDs, 1 TB SATA HDD for OSD data, 1 GB SATA SSD partition for the DB, 576 MB SATA SSD partition for the WAL. ceph

[ceph-users] OSD crash (hammer): osd/ReplicatedPG.cc: 7477: FAILED assert(repop_queue.front() == repop)

2017-06-09 Thread Ricardo J. Barberis
Hi list, A few days ago we had some problems with our ceph cluster, and now we have some OSDs crashing on start with messages like this right before crashing: 2017-06-09 15:35:02.226430 7fb46d9e4700 -1 log_channel(cluster) log [ERR] : trim_object Snap 4aae0 not in clones I can start those OSDs

Re: [ceph-users] OSD crash loop - FAILED assert(recovery_info.oi.snaps.size())

2017-06-05 Thread Stephen M. Anthony ( Faculty/Staff - Ctr for Innovation in Teach & )
Using rbd ls -l poolname to list all images and their snapshots, then purging the snapshots from each image with rbd snap purge poolname/imagename, and finally reweighting each flapping OSD to 0.0 resolved this issue. -Steve On 2017-06-02 14:15, Steve Anthony wrote: I'm seeing this again on two
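A sketch of that workaround with placeholder pool/image names (whether the reweight was a CRUSH reweight or an override reweight is not stated in the message):

    rbd ls -l rbd                         # list images and their snapshots in pool "rbd"
    rbd snap purge rbd/vm-image-01        # remove every snapshot of one image
    ceph osd crush reweight osd.126 0.0   # drain the flapping OSD (CRUSH reweight to zero)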

Re: [ceph-users] OSD crash loop - FAILED assert(recovery_info.oi.snaps.size())

2017-06-02 Thread Steve Anthony
I'm seeing this again on two OSDs after adding another 20 disks to my cluster. Is there some way I can determine which snapshots the recovery process is looking for? Or maybe find and remove the objects it's trying to recover, since there's apparently a problem with them? Thanks! -Steve On

Re: [ceph-users] OSD crash loop - FAILED assert(recovery_info.oi.snaps.size())

2017-05-18 Thread Steve Anthony
Hmmm, after crashing every 30 seconds for a few days, it's apparently running normally again. Weird. I was thinking that since it's looking for a snapshot object, maybe re-enabling snaptrimming and removing all the snapshots in the pool would remove that object (and the problem)? Never got to that point

Re: [ceph-users] OSD crash loop - FAILED assert(recovery_info.oi.snaps.size())

2017-05-17 Thread Gregory Farnum
On Wed, May 17, 2017 at 10:51 AM Steve Anthony wrote: > Hello, > > After starting a backup (create snap, export and import into a second > cluster - one RBD image still exporting/importing as of this message) > the other day while recovery operations on the primary cluster

[ceph-users] OSD crash loop - FAILED assert(recovery_info.oi.snaps.size())

2017-05-17 Thread Steve Anthony
Hello, After starting a backup the other day (create snap, export and import into a second cluster - one RBD image still exporting/importing as of this message) while recovery operations on the primary cluster were ongoing, I noticed an OSD (osd.126) start to crash; I reweighted it to 0 to prepare

Re: [ceph-users] osd crash - disk hangs

2016-12-01 Thread Warren Wang - ISD
com'" <ceph-users@lists.ceph.com> Subject: [ceph-users] osd crash - disk hangs Hello! Tonight i had a osd crash. See the dump below. Also this osd is still mounted. Whats the cause? A bug? What to do next? I cant do a lsof or ps ax because it hangs. Thank You! Dec 1 00:31:30 ceph2 kern

Re: [ceph-users] osd crash

2016-12-01 Thread VELARTIS Philipp Dürhammer
to the latest in the 4.4 series. From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of VELARTIS Philipp Dürhammer Sent: 01 December 2016 12:04 To: 'ceph-us...@ceph.com' <ceph-us...@ceph.com> Subject: [ceph-users] osd crash Hello! Tonight

[ceph-users] osd crash - disk hangs

2016-12-01 Thread VELARTIS Philipp Dürhammer
Hello! Tonight I had an OSD crash. See the dump below. Also, this OSD is still mounted. What's the cause? A bug? What to do next? I can't do lsof or ps ax because it hangs. Thank you! Dec 1 00:31:30 ceph2 kernel: [17314369.493029] divide error: [#1] SMP Dec 1 00:31:30 ceph2 kernel:

Re: [ceph-users] osd crash

2016-12-01 Thread Nick Fisk
...@ceph.com' <ceph-us...@ceph.com> Subject: [ceph-users] osd crash Hello! Tonight I had an OSD crash. See the dump below. Also, this OSD is still mounted. What's the cause? A bug? What to do next? Thank you! Dec 1 00:31:30 ceph2 kernel: [17314369.493029] divide error: [#

[ceph-users] osd crash

2016-12-01 Thread VELARTIS Philipp Dürhammer
Hello! Tonight I had an OSD crash. See the dump below. Also, this OSD is still mounted. What's the cause? A bug? What to do next? Thank you! Dec 1 00:31:30 ceph2 kernel: [17314369.493029] divide error: [#1] SMP Dec 1 00:31:30 ceph2 kernel: [17314369.493062] Modules linked in: act_police

Re: [ceph-users] OSD crash after conversion to bluestore

2016-03-31 Thread Oliver Dzombic
Hi, if I understand it correctly, BlueStore won't use / is not a filesystem to be mounted. So if an OSD is up and in while we don't see it mounted into the filesystem and accessible, we could assume that it must be powered by BlueStore... !??! -- Mit freundlichen Gruessen / Best regards Oliver
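Rather than inferring the backend from mount points, the object store type can be read from the OSD metadata (a sketch, assuming a Jewel-or-later cluster; the OSD id is a placeholder):

    ceph osd metadata 0 | grep osd_objectstore    # reports "bluestore" or "filestore" for osd.0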

[ceph-users] OSD crash after conversion to bluestore

2016-03-30 Thread Adrian Saul
I upgraded my lab cluster to 10.1.0 specifically to test out BlueStore and see what latency difference it makes. I was able to zap and recreate my OSDs to BlueStore one by one and rebalance the cluster (the change to having new OSDs start with low weight threw me at first, but once I worked

Re: [ceph-users] OSD Crash with scan_min and scan_max values reduced

2016-02-22 Thread M Ranga Swami Reddy
So basically the issue is http://tracker.ceph.com/issues/4698 (osd suicide timeout). On Mon, Feb 22, 2016 at 7:06 PM, M Ranga Swami Reddy wrote: > Hello, > I have reduced the scan_min and scan_max as below. After the below > change, during scrubbing, I got the op_tp_thread

[ceph-users] OSD Crash with scan_min and scan_max values reduced

2016-02-22 Thread M Ranga Swami Reddy
Hello, I have reduced the scan_min and scan_max as below. After the below change, during scrubbing, the op_tp_thread timed out after 15 seconds. After some time, the OSDs crashed as well... Any suggestions would be helpful... Thank you. == -osd_backfill_scan_min = 64 -osd_backfill_scan_max = 512
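For reference, a sketch of how the quoted values would sit in ceph.conf, together with the timeouts the suicide in issue 4698 is tied to (the timeout defaults are from memory of that era and should be checked against the running version):

    [osd]
        osd backfill scan min = 64     # values quoted above
        osd backfill scan max = 512
        # the crash is the op thread hitting its suicide timeout; the relevant knobs are
        # osd_op_thread_timeout (default 15 s) and osd_op_thread_suicide_timeout (default 150 s)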

Re: [ceph-users] OSD crash, unable to restart

2015-12-02 Thread Major Csaba
Hi, On 12/02/2015 08:12 PM, Gregory Farnum wrote: On Wed, Dec 2, 2015 at 11:11 AM, Major Csaba wrote: Hi, [ sorry, I accidentally left out the list address ] This is the content of the LOG file in the directory /var/lib/ceph/osd/ceph-7/current/omap:

Re: [ceph-users] OSD crash, unable to restart

2015-12-02 Thread Gregory Farnum
On Wed, Dec 2, 2015 at 10:54 AM, Major Csaba wrote: > Hi, > > I have a small cluster (5 nodes, 20 OSDs) where an OSD crashed. There is no > other sign of problems. No kernel messages, so the disks seem to be OK. > > I tried to restart the OSD but the process stops

[ceph-users] OSD crash, unable to restart

2015-12-02 Thread Major Csaba
Hi, I have a small cluster (5 nodes, 20 OSDs) where an OSD crashed. There is no other sign of problems. No kernel messages, so the disks seem to be OK. I tried to restart the OSD but the process stops almost immediately with the same logs. The version is 0.94.5

Re: [ceph-users] OSD crash, unable to restart

2015-12-02 Thread Gregory Farnum
On Wed, Dec 2, 2015 at 11:11 AM, Major Csaba wrote: > Hi, > [ sorry, I accidentally left out the list address ] > > This is the content of the LOG file in the directory > /var/lib/ceph/osd/ceph-7/current/omap: > 2015/12/02-18:48:12.241386 7f805fc27900 Recovering log #26281

Re: [ceph-users] osd crash and high server load - ceph-osd crashes with stacktrace

2015-10-25 Thread Jacek Jarosiewicz
We've upgraded Ceph to 0.94.4 and the kernel to 3.16.0-51-generic, but the problem still persists. Lately we see these crashes on a daily basis. I'm leaning toward the conclusion that this is a software problem - this hardware ran stably before, and we're seeing all four nodes crash randomly with

[ceph-users] osd crash and high server load - ceph-osd crashes with stacktrace

2015-10-09 Thread Jacek Jarosiewicz
Hi, We've noticed a problem with our cluster setup: 4 x OSD nodes: E5-1630 CPU, 32 GB RAM, Mellanox MT27520 56Gbps network cards, LSI Logic SAS3008 SATA controller. The storage nodes are connected to two SuperMicro 847E1C-R1K28JBOD chassis. Each node has 2-3 spinning OSDs (6TB drives) and 2 SSD drives

Re: [ceph-users] OSD crash

2015-09-22 Thread Alex Gorbachev
t; <a...@iss-integration.com> > > To: "ceph-users" <ceph-users@lists.ceph.com> > > Sent: Wednesday, 9 September, 2015 6:38:50 AM > > Subject: [ceph-users] OSD crash > > > Hello, > > > We have run into an OSD crash this weekend with the fol

Re: [ceph-users] OSD crash

2015-09-22 Thread Brad Hubbard
- Original Message - > From: "Alex Gorbachev" <a...@iss-integration.com> > To: "ceph-users" <ceph-users@lists.ceph.com> > Sent: Wednesday, 9 September, 2015 6:38:50 AM > Subject: [ceph-users] OSD crash > Hello, > We have run into

[ceph-users] OSD crash

2015-09-08 Thread Alex Gorbachev
Hello, We have run into an OSD crash this weekend with the following dump. Please advise what this could be. Best regards, Alex 2015-09-07 14:55:01.345638 7fae6c158700 0 -- 10.80.4.25:6830/2003934 >> 10.80.4.15:6813/5003974 pipe(0x1dd73000 sd=257 :6830 s=2 pgs=14271 cs=251 l=0

[ceph-users] osd crash with object store as newstore

2015-05-30 Thread Srikanth Madugundi
Hi, I built the Ceph code from wip-newstore on RHEL7 and am running performance tests to compare with FileStore. After a few hours of running the tests, the OSD daemons started to crash. Here is the stack trace; the OSD crashes immediately after a restart, so I could not get the OSD up and running. ceph

[ceph-users] OSD Crash makes whole cluster unusable ?

2014-12-16 Thread Christoph Adomeit
Hi there, today I had an OSD crash with Ceph 0.87/Giant which made my whole cluster unusable for 45 minutes. First it began with a disk error: sd 0:1:2:0: [sdc] CDB: Read(10)Read(10):: 28 28 00 00 0d 15 fe d0 fd 7b e8 f8 00 00 00 00 b0 08 00 00 XFS (sdc1): xfs_imap_to_bp: xfs_trans_read_buf()

Re: [ceph-users] OSD Crash makes whole cluster unusable ?

2014-12-16 Thread Craig Lewis
So the problem started once remapping+backfilling started, and lasted until the cluster was healthy again? Have you adjusted any of the recovery tunables? Are you using SSD journals? I had a similar experience the first time my OSDs started backfilling. The average RadosGW operation latency
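The "recovery tunables" being asked about are typically throttled along these lines when backfill hurts client latency (a sketch; the values are common conservative choices, not ones given in the thread):

    ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1 --osd_recovery_op_priority 1'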

Re: [ceph-users] osd crash: trim_objectcould not find coid

2014-09-19 Thread Francois Deppierraz
Hi Craig, I'm planning to completely re-install this cluster with Firefly because I started to see other OSD crashes with the same trim_object error... So now, I'm more interested in figuring out exactly why the data corruption happened in the first place than in repairing the cluster. Comments

Re: [ceph-users] osd crash: trim_objectcould not find coid

2014-09-19 Thread Craig Lewis
On Fri, Sep 19, 2014 at 2:35 AM, Francois Deppierraz franc...@ctrlaltdel.ch wrote: Hi Craig, I'm planning to completely re-install this cluster with Firefly because I started to see other OSD crashes with the same trim_object error... I did lose data because of this, but it was unrelated

Re: [ceph-users] osd crash: trim_objectcould not find coid

2014-09-16 Thread Craig Lewis
On Mon, Sep 8, 2014 at 2:53 PM, Francois Deppierraz franc...@ctrlaltdel.ch wrote: XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250) All logs from before the disaster are still there; do you have any advice on what would be relevant? This is a problem. It's not

Re: [ceph-users] osd crash: trim_objectcould not find coid

2014-09-12 Thread Francois Deppierraz
Hi, Following up on this issue, I've identified that almost all unfound objects belong to a single RBD volume (with the help of the script below). Now, what's the best way to try to recover the filesystem stored on this RBD volume? 'mark_unfound_lost revert' or 'mark_unfound_lost lost', and then
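For context, the commands being weighed here look roughly like this (the PG id is taken from later in the thread; 'revert' rolls unfound objects back to a previous version rather than deleting them):

    ceph pg 3.3ef list_missing                # enumerate the unfound objects in one PG
    ceph pg 3.3ef mark_unfound_lost revert    # give up on them, reverting to prior versions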

Re: [ceph-users] osd crash: trim_objectcould not find coid

2014-09-12 Thread Gregory Farnum
On Fri, Sep 12, 2014 at 4:41 AM, Francois Deppierraz franc...@ctrlaltdel.ch wrote: Hi, Following up on this issue, I've identified that almost all unfound objects belong to a single RBD volume (with the help of the script below). Now what's the best way to try to recover the filesystem stored

Re: [ceph-users] osd crash: trim_objectcould not find coid

2014-09-11 Thread Francois Deppierraz
Hi Greg, An attempt to recover pg 3.3ef by copying it from broken osd.6 to working osd.32 resulted in one more broken OSD :( Here's what was actually done: root@storage1:~# ceph pg 3.3ef list_missing | head { "offset": { "oid": "", "key": "", "snapid": 0, "hash": 0, "max": 0,

Re: [ceph-users] osd crash: trim_objectcould not find coid

2014-09-08 Thread Gregory Farnum
On Mon, Sep 8, 2014 at 1:42 AM, Francois Deppierraz franc...@ctrlaltdel.ch wrote: Hi, This issue is on a small two-server (44 OSDs) Ceph cluster running 0.72.2 under Ubuntu 12.04. The cluster was filling up (a few OSDs near full) and I tried to increase the number of PGs per pool to 1024 for
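The resize described there would have been done roughly like this (a sketch; the pool name is a placeholder):

    ceph osd pool set rbd pg_num 1024     # raise the placement-group count
    ceph osd pool set rbd pgp_num 1024    # then let data actually remap onto the new PGs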

Re: [ceph-users] osd crash: trim_objectcould not find coid

2014-09-08 Thread Francois Deppierraz
Hi Greg, Thanks for your support! On 08. 09. 14 20:20, Gregory Farnum wrote: The first one is not caused by the same thing as the ticket you reference (it was fixed well before emperor), so it appears to be some kind of disk corruption. The second one is definitely corruption of some kind

Re: [ceph-users] osd crash: trim_objectcould not find coid

2014-09-08 Thread Gregory Farnum
On Mon, Sep 8, 2014 at 2:53 PM, Francois Deppierraz franc...@ctrlaltdel.ch wrote: Hi Greg, Thanks for your support! On 08. 09. 14 20:20, Gregory Farnum wrote: The first one is not caused by the same thing as the ticket you reference (it was fixed well before emperor), so it appears to be

[ceph-users] Osd crash and misplaced objects after rapid object deletion

2013-07-23 Thread Michael Lowe
On two different occasions I've had an OSD crash and misplace objects when rapid object deletion has been triggered by discard/trim operations with the QEMU RBD driver. Has anybody else had this kind of trouble? The objects are still on disk, just not in a place the OSD thinks is valid.
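For context, such bursts of deletes are usually triggered from inside a guest, for example (assuming the virtual disk is attached with discard enabled, e.g. discard=unmap on the QEMU drive):

    fstrim -v /    # in the guest: issue discards for all free space on the root filesystem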

[ceph-users] OSD crash upon pool creation

2013-07-15 Thread Andrey Korolyov
Hello, Using db2bb270e93ed44f9252d65d1d4c9b36875d0ea5 I observed some disaster-like behavior after a ``pool create'' command - every OSD daemon in the cluster will die at least once (some will crash several times in a row after being brought back). Please take a look at the backtraces (almost identical)

Re: [ceph-users] OSD crash during script, 0.56.4

2013-05-13 Thread Travis Rhoden
I'm afraid I don't. I don't think I looked when it happened, and searching for one just now came up empty. :/ If it happens again, I'll be sure to keep my eye out for one. FWIW, this particular server (1 out of 5) has 8GB *less* RAM than the others (one bad stick, it seems), and this has

[ceph-users] OSD crash during script, 0.56.4

2013-05-07 Thread Travis Rhoden
Hey folks, Saw this crash the other day: ceph version 0.56.4 (63b0f854d1cef490624de5d6cf9039735c7de5ca) 1: /usr/bin/ceph-osd() [0x788fba] 2: (()+0xfcb0) [0x7f19d1889cb0] 3: (gsignal()+0x35) [0x7f19d0248425] 4: (abort()+0x17b) [0x7f19d024bb8b] 5: