Re: qemu-1.7.0 and internal snapshot, Was: qemu-1.5.0 savevm error -95 while writing vm with ceph-rbd as storage-backend

2013-12-20 Thread Oliver Francke
Hi Wido, On 12/20/2013 08:06 AM, Wido den Hollander wrote: On 12/17/2013 05:07 PM, Oliver Francke wrote: Hi Alexandre and Wido ;) well, I know this is a pretty old question... but saw some comments from you Wido as well as a most current patch for qemu-1.7.0 in the git.proxmox ( internal

RBD/qemu-1.7.0 memory leak with drive_mirror/live-migration

2013-12-17 Thread Oliver Francke
cancel the job, memory still occupied, after another try more RSS-memory gets filled. Same procedure with qcow2 does not need any more memory, and the job gets cleared after completion. Possibly @Josh: any idea? Logfiles with debug_what=? needed? ;) Thnx in @vance, Oliver. -- Oliver Francke

qemu-1.7.0 and internal snapshot, Was: qemu-1.5.0 savevm error -95 while writing vm with ceph-rbd as storage-backend

2013-12-17 Thread Oliver Francke
=virtio0,media=disk,index=0 Did I miss a relevant point? What would be the correct strategy? Thnx in advance and kind regards, Oliver. P.S.: I use neither libvirt nor proxmox as a complete system. On 05/24/2013 10:57 PM, Oliver Francke wrote: Hi Alexandre, on 24.05.2013 at 17:37,

Re: qemu-1.5.0 savevm error -95 while writing vm with ceph-rbd as storage-backend

2013-06-11 Thread Oliver Francke
den Hollander w...@42on.com To: Oliver Francke oliver.fran...@filoo.de Cc: ceph-devel@vger.kernel.org Sent: Friday, 24 May 2013 17:08:35 Subject: Re: qemu-1.5.0 savevm error -95 while writing vm with ceph-rbd as storage-backend On 05/24/2013 09:46 AM, Oliver Francke wrote: Hi, with a running VM

qemu-1.5.0 savevm error -95 while writing vm with ceph-rbd as storage-backend

2013-05-24 Thread Oliver Francke
of this restriction -, it's more work for the customers to perform a shutdown before they want to make some changes to their VM ;) Any hints welcome, Oliver. -- Oliver Francke filoo GmbH Moltkestraße 25a 0 Gütersloh HRB4355 AG Gütersloh Managing Directors: S.Grewing | J.Rehpöhler | C.Kunz Follow us on

Re: qemu-1.5.0 savevm error -95 while writing vm with ceph-rbd as storage-backend

2013-05-24 Thread Oliver Francke
Well, On 05/24/2013 05:08 PM, Wido den Hollander wrote: On 05/24/2013 09:46 AM, Oliver Francke wrote: Hi, with a running VM I encounter this strange behaviour, former qemu-versions don't show up such an error. Perhaps this comes from the rbd-backend in qemu-1.5.0 in combination with ceph

Re: qemu-1.5.0 savevm error -95 while writing vm with ceph-rbd as storage-backend

2013-05-24 Thread Oliver Francke
a closer look next week. Thnx n regards, Oliver. From: Wido den Hollander w...@42on.com To: Oliver Francke oliver.fran...@filoo.de Cc: ceph-devel@vger.kernel.org Sent: Friday, 24 May 2013 17:08:35 Subject: Re: qemu-1.5.0 savevm error -95 while writing vm with ceph-rbd as storage-backend

OSD memory leak when scrubbing [0.56.6]

2013-05-21 Thread Oliver Francke
large pg's experiencing such behaviour? Any advice on how to proceed? Thnx in advance, Oliver.

Re: OSD memory leak when scrubbing [0.56.6]

2013-05-21 Thread Oliver Francke
Uhm, to be most correct... there was a follow-up even with version 0.56 ;) On 05/21/2013 05:24 PM, Oliver Francke wrote: Well, subject seems familiar, version was 0.48.3 in the last mail. Some more of the story. Before successful upgrade to latest bobtail everything with regards

Re: OSD memory leak when scrubbing [0.56.6]

2013-05-21 Thread Oliver Francke
. Cheers, Sylvain -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body

Re: OSD memory leak when scrubbing [0.56.6]

2013-05-21 Thread Oliver Francke
Well, on 21.05.2013 at 21:31, Sage Weil s...@inktank.com wrote: On Tue, 21 May 2013, Stefan Priebe wrote: On 21.05.2013 17:44, Sage Weil wrote: On Tue, 21 May 2013, Stefan Priebe - Profihost AG wrote: On 21.05.2013 at 17:35, Sylvain Munaut s.mun...@whatever-company.com wrote: Hi,

Re: Latest 0.56.3 and qemu-1.4.0 and cloned VM-image producing massive fs-corruption, not crashing

2013-03-26 Thread Oliver Francke
Hi Josh, thanks for the quick response and... On 03/26/2013 09:30 AM, Josh Durgin wrote: On 03/25/2013 03:04 AM, Oliver Francke wrote: Hi josh, logfile is attached... Thanks. It shows nothing out of the ordinary, but I just reproduced the incorrect rollback locally, so it shouldn't be hard

Latest 0.56.3 and qemu-1.4.0 and cloned VM-image producing massive fs-corruption, not crashing

2013-03-22 Thread Oliver Francke
Hi Josh, all, I did not want to hijack the thread dealing with a crashing VM, but perhaps there are some common things. Today I installed a fresh cluster with mkephfs, went fine, imported a master debian 6.0 image with format 2, made a snapshot, protected it, and made some clones. Clones

Re: A couple of OSD-crashes after serious network trouble

2012-12-13 Thread Oliver Francke
and both files are md5-wise identical?! Not checked the other 5 inconsistencies. Still having three headers missing and 6 OSD's not checked with scrub, though. Will be back... for sure ;) Thnx for now, Oliver. -Sam On Tue, Dec 11, 2012 at 11:38 AM, Oliver Francke oliver.fran...@filoo.de wrote

Re: A couple of OSD-crashes after serious network trouble

2012-12-11 Thread Oliver Francke
Hi Sam, perhaps you have overlooked my comments further down, beginning with been there ? ;) If so, please have a look, cause I'm clueless 8-) On 12/10/2012 11:48 AM, Oliver Francke wrote: Hi Sam, helpful input.. and... not so... On 12/07/2012 10:18 PM, Samuel Just wrote: Ah

Re: A couple of OSD-crashes after serious network trouble

2012-12-11 Thread Oliver Francke
Hi Sage, on 11.12.2012 at 18:04, Sage Weil s...@inktank.com wrote: On Tue, 11 Dec 2012, Oliver Francke wrote: Hi Sam, perhaps you have overlooked my comments further down, beginning with been there ? ;) We're pretty swamped with bobtail stuff at the moment, so ceph-devel inquiries

Re: A couple of OSD-crashes after serious network trouble

2012-12-10 Thread Oliver Francke
repeat for the other 5 cases. Let me know if you have any questions. -Sam On Fri, Dec 7, 2012 at 11:09 AM, Oliver Francke oliver.fran...@filoo.de wrote: Hi Sam, on 07.12.2012 at 19:37, Samuel Just sam.j...@inktank.com wrote: That is very likely to be one of the merge_log bugs fixed between

Re: A couple of OSD-crashes after serious network trouble

2012-12-07 Thread Oliver Francke
-osd.40.log.1.gz: 23: (clone()+0x6d) [0x7f7f2f3fc92d] Thnx for looking, Oliver.

Re: A couple of OSD-crashes after serious network trouble

2012-12-07 Thread Oliver Francke
nodes a scrub leads to slow-requests… couple of minutes, so VM's got stalled… customers pressing the reset-button, so losing caches… Comments welcome, Oliver. On Fri, Dec 7, 2012 at 6:39 AM, Oliver Francke oliver.fran...@filoo.de wrote: Hi, is the following a known one, too? Would be good

Re: A couple of OSD-crashes after serious network trouble

2012-12-06 Thread Oliver Francke
Hi, On 12/05/2012 03:54 PM, Sage Weil wrote: On Wed, 5 Dec 2012, Oliver Francke wrote: Hi *, around midnight yesterday we faced some layer-2 network problems. OSD's started to lose heartbeats and so on. Slow requests... you name it. So, after all OSD's doing their work, we had in sum around 6

A couple of OSD-crashes after serious network trouble

2012-12-05 Thread Oliver Francke
some good news ;) Comments welcome, Oliver.

Best practice with 0.48.2 to take a node into maintenance

2012-12-03 Thread Oliver Francke
Hi *, well, even if 0.48.2 is really stable and reliable, the same is not always true of the Linux kernel. We have a couple of nodes where an update would make life better. So, as our OSD-nodes have to care for VM's too, it's not a problem to let them drain, i.e. migrate all of them to other

Re: Best practice with 0.48.2 to take a node into maintenance

2012-12-03 Thread Oliver Francke
Hi Josh, on 03.12.2012 at 20:14, Josh Durgin josh.dur...@inktank.com wrote: On 12/03/2012 11:05 AM, Oliver Francke wrote: Hi *, well, even if 0.48.2 is really stable and reliable, it is not always the case with the Linux kernel. We have a couple of nodes, where an update would make life

Re: Best practice with 0.48.2 to take a node into maintenance

2012-12-03 Thread Oliver Francke
Hi Florian, on 03.12.2012 at 20:45, Smart Weblications GmbH - Florian Wiessner f.wiess...@smart-weblications.de wrote: On 03.12.2012 20:21, Oliver Francke wrote: Hi Josh, on 03.12.2012 at 20:14, Josh Durgin josh.dur...@inktank.com wrote: On 12/03/2012 11:05 AM, Oliver Francke wrote

Re: rbd STDIN import does not work / wip-rbd-export-stdout

2012-11-26 Thread Oliver Francke
, Stefan

Ubuntu 12.04.1 + xfs + syncfs is still not our friend

2012-11-06 Thread Oliver Francke
-partition? Kind regards, Oliver.

Re: Ubuntu 12.04.1 + xfs + syncfs is still not our friend

2012-11-06 Thread Oliver Francke
Hi Jens, sorry for the double work… answered off-list already ;) Oliver. On 06.11.2012 at 19:46, Jens Rehpöhler jens.rehpoeh...@filoo.de wrote: On 06.11.2012 18:33, Gandalf Corvotempesta wrote: 2012/11/6 Oliver Francke oliver.fran...@filoo.de: 2012-11-06 17:05:51.863921 7f5cc52e3780 0

Reduce bandwidth for remapping/backfill/recover?

2012-11-03 Thread Oliver Francke
Hi *, anybody out there who can help with an idea for reducing bandwidth when incorporating 2 new nodes into a cluster? I know of osd recovery max active = X (default 5), but with 4 OSD's per node, there is enough possibility to saturate our backnet (1Gig at the moment). Any other way to not
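For reference, a minimal ceph.conf sketch of recovery throttling around the option quoted in this thread. Only osd recovery max active is taken from the mail; osd max backfills and osd recovery op priority are assumptions here (they arrived in later releases), so verify they exist in your version before relying on them:

```ini
[osd]
    ; quoted in the thread: concurrent recovery ops per OSD (default 5)
    osd recovery max active = 1
    ; assumptions, not from the thread; check your release supports them
    osd max backfills = 1
    osd recovery op priority = 1
```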

Re: v0.53 released

2012-10-19 Thread Oliver Francke
Hi Josh, On 10/19/2012 07:42 AM, Josh Durgin wrote: On 10/17/2012 04:26 AM, Oliver Francke wrote: Hi Sage, *, after having some trouble with the journals - had to erase the partition and redo a ceph... --mkjournal - I started my testing... Everything fine. This would be due to the change

Re: v0.53 released

2012-10-19 Thread Oliver Francke
Hi Sage, on 19.10.2012 at 17:48, Sage Weil s...@inktank.com wrote: On Fri, 19 Oct 2012, Oliver Francke wrote: Hi Josh, On 10/19/2012 07:42 AM, Josh Durgin wrote: On 10/17/2012 04:26 AM, Oliver Francke wrote: Hi Sage, *, after having some trouble with the journals - had to erase

Re: v0.53 released

2012-10-17 Thread Oliver Francke

Re: v0.48.2 argonaut update released

2012-10-01 Thread Oliver Francke
/Ubuntu packages, see http://ceph.newdream.net/docs/master/install/debian

Re: v0.48.2 argonaut update released

2012-10-01 Thread Oliver Francke
Well, on 01.10.2012 at 18:07, Sage Weil s...@inktank.com wrote: On Mon, 1 Oct 2012, Oliver Francke wrote: Hi *, with reference to the below mentioned objecter: misc fixes for op reordering I assumed it could have something to do with slow requests not being solved for too long. I

OSD-crash on 0.48.1 argonaut, error void ReplicatedPG::recover_got(hobject_t, eversion_t) not seen on list

2012-09-19 Thread Oliver Francke
OSDs per node, one per HDD, 1Gbit was completely saturated, time for next step towards 10Gbit ;) Regards, Oliver.

Re: v0.48.1 argonaut stable update released

2012-08-15 Thread Oliver Francke
Well, On 08/14/2012 09:29 PM, Sage Weil wrote: On Tue, 14 Aug 2012, Oliver Francke wrote: Hi Sage, I just updated to debian-testing/0.50 this afternoon, after some hint: * osd: better tracking of recent slow operations This is actually about the admin socket command to dump operations

Re: v0.48.1 argonaut stable update released

2012-08-14 Thread Oliver Francke
Hi Sage, I just updated to debian-testing/0.50 this afternoon, after some hint: * osd: better tracking of recent slow operations and it is hereby confirmed to be better in my testing environment. Before I had requests which could linger for 480 seconds… not any more. How about this fix

Some findings on 0.48, qemu-1.0.1 eating up RDB-write-cache memory

2012-07-09 Thread Oliver Francke
. 100 VM's running). If the same VM is started with :rbd_cache=false everything stays as it should. Anybody with a similar setup willing to do some testing? Other than that: fast and stable release, it seems ;) Thnx in @vance, Oliver.
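For readers wondering where the :rbd_cache=false from this report goes: it is appended to the rbd drive spec on the qemu command line. A sketch, with the pool name (rbd) and image name (vm-123-disk-1) as hypothetical placeholders:

```
# pool and image names are placeholders; the key part is rbd_cache=false
qemu-system-x86_64 -m 1024 \
  -drive format=rbd,file=rbd:rbd/vm-123-disk-1:rbd_cache=false,if=virtio
```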

Re: all rbd users: set 'filestore fiemap = false'

2012-06-18 Thread Oliver Francke

Re: Random data corruption in VM, possibly caused by rbd

2012-06-08 Thread Oliver Francke

Re: Random data corruption in VM, possibly caused by rbd

2012-06-08 Thread Oliver Francke
Well then, quite busy, too with some other stuff, but... On 06/08/2012 04:50 PM, Josh Durgin wrote: On 06/08/2012 06:55 AM, Sage Weil wrote: On Fri, 8 Jun 2012, Oliver Francke wrote: Hi Guido, yeah, there is something weird going on. I just started to establish some test-VM's. Freshly

Re: Random data corruption in VM, possibly caused by rbd

2012-06-07 Thread Oliver Francke
Hi Guido, unfortunately this sounds very familiar to me. We have been on a long road with similar weird errors. Our setup is something like: start a couple of VM's (qemu-*), let them create a 1G-file each and randomly seek and write 4MB blocks filled with md5sums of the block as payload, to be
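The write-and-verify scheme described above can be sketched in a few lines of Python. This is a standalone approximation, not Oliver's actual test harness: the 4MB block size and seek/write/verify cycle come from the description, while the per-block seeding scheme is my assumption.

```python
import hashlib
import os
import random

BLOCK = 4 * 1024 * 1024  # 4MB blocks, as described in the thread

def make_block(seed: int) -> bytes:
    """Fill a block with repeated md5 digests derived from a seed, so
    corruption anywhere inside the block is detectable on re-read."""
    digest = hashlib.md5(str(seed).encode()).hexdigest().encode()  # 32 bytes
    return digest * (BLOCK // len(digest))  # 4MB is an exact multiple of 32

def verify_block(seed: int, data: bytes) -> bool:
    return data == make_block(seed)

# Randomly seek and write a few blocks into a file, then re-read and verify.
path = "blocks.bin"
written = {}
with open(path, "wb") as f:
    for _ in range(4):
        idx = random.randrange(16)            # pick a random 4MB slot
        written[idx] = random.randrange(1_000_000)
        f.seek(idx * BLOCK)
        f.write(make_block(written[idx]))

with open(path, "rb") as f:
    for idx, seed in written.items():
        f.seek(idx * BLOCK)
        assert verify_block(seed, f.read(BLOCK)), f"corruption in block {idx}"

os.remove(path)
```

Run inside the VM against a file on the rbd-backed disk, a failed assertion pinpoints which block came back corrupted.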

Re: q. about rbd-header

2012-03-15 Thread Oliver Francke
Hi Josh, On 03/14/2012 10:59 PM, Josh Durgin wrote: On 03/14/2012 01:49 PM, Oliver Francke wrote: Well, nobody able to shed some light on it? Did some math and found out how to fill the size bytes. Sorry I didn't respond faster. But, one question never got answered: - why
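On the "size bytes": assuming the format-1 rbd header stores the image size as a little-endian unsigned 64-bit integer (consistent with the on-disk header struct, but verify against your version's sources), the math reduces to one struct call. The 16 GiB image size is a hypothetical example:

```python
import struct

size = 16 * 1024 ** 3                  # hypothetical 16 GiB image
size_bytes = struct.pack("<Q", size)   # little-endian unsigned 64-bit
assert len(size_bytes) == 8
assert struct.unpack("<Q", size_bytes)[0] == size
print(size_bytes.hex())                # the 8 bytes as they appear on disk
```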

q. about rbd-header

2012-03-14 Thread Oliver Francke
Hey, anybody out there who could explain the structure of a rbd-header? After last crash we have about 10 images with a: 2012-03-14 15:22:47.998790 7f45a61e3760 librbd: Error reading header: 2 No such file or directory error opening image vm-266-disk-1.rbd: 2 No such file or directory ...

Re: q. about rbd-header

2012-03-14 Thread Oliver Francke
at 16:05, Oliver Francke wrote: Hey, anybody out there who could explain the structure of a rbd-header? After last crash we have about 10 images with a: 2012-03-14 15:22:47.998790 7f45a61e3760 librbd: Error reading header: 2 No such file or directory error opening image vm-266-disk-1

Still inconsistant pg's, ceph-osd crashes reliably after trying to repair

2012-03-01 Thread Oliver Francke
-osd died, too, after doing some rbd rm pool/image the one block in question remained, visible via rados ls -p pool Any idea, or better clue? ;-) Kind reg's, Oliver.

Re: Still inconsistant pg's, ceph-osd crashes reliably after trying to repair

2012-03-01 Thread Oliver Francke
Well, on 01.03.2012 at 18:15, Oliver Francke wrote: Hi *, after some crashes we still had to care for some remaining inconsistencies reported via ceph -w and friends. Well, we traced one of them down via ceph pg dump and we picked 79. pg=79.7 and found the corresponding file

Recommended number of pools, one Q. ever wanted to ask

2012-02-28 Thread Oliver Francke
some light in 8-) Kind regards, Oliver.

Re: Recommended number of pools, one Q. ever wanted to ask

2012-02-28 Thread Oliver Francke
Well, On 02/28/2012 10:42 AM, Wido den Hollander wrote: Hi, On 02/28/2012 10:35 AM, Oliver Francke wrote: Hi *, well, there was once a comment on our layout in terms of too many pools. Our setup is to have a pool per customer, to simplify the view on used storage capacity. So, if we have
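The usual objection to one-pool-per-customer is that every pool carries its own placement groups, and the OSDs pay for the total. A back-of-the-envelope sketch, with all cluster numbers hypothetical:

```python
def pgs_per_osd(num_pools: int, pg_num: int, replicas: int, num_osds: int) -> float:
    """Average number of PG copies each OSD ends up hosting."""
    return num_pools * pg_num * replicas / num_osds

# One pool per customer multiplies quickly (hypothetical numbers):
few_pools = pgs_per_osd(num_pools=3, pg_num=128, replicas=2, num_osds=12)
many_pools = pgs_per_osd(num_pools=500, pg_num=128, replicas=2, num_osds=12)
print(f"{few_pools:.0f} vs {many_pools:.0f} PGs per OSD")
```

Thousands of PGs per OSD inflate peering and memory costs, which is presumably why fewer, larger pools with per-pool usage accounting tend to be suggested instead.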

Re: Problem with inconsistent PG

2012-02-17 Thread Oliver Francke
Well, on 17.02.2012 at 18:54, Sage Weil wrote: On Fri, 17 Feb 2012, Oliver Francke wrote: Well then, found it via the ceph osd dump via the pool-id, thanks. The according customer opened a ticket this morning for not being able to boot his VM after shutdown. So I had to do some

Re: Problem with inconsistent PG

2012-02-16 Thread Oliver Francke
Hi Sage, *, your tip with truncating from below did not solve the problem. Just to recap: we had two inconsistencies, which we could break down to something like: rb.0.0.__head_DA680EE2 according to the ceph dump from below. Walking to the node with the OSD mounted on /data/osd3

Re: Problem with inconsistent PG

2012-02-16 Thread Oliver Francke
Hi Sage, thnx for the quick response, on 16.02.2012 at 18:17, Sage Weil wrote: On Thu, 16 Feb 2012, Oliver Francke wrote: Hi Sage, *, your tip with truncating from below did not solve the problem. Just to recap: we had two inconsistencies, which we could break down to something like