Second this. Also, regarding the long-lasting snapshot problem and related
performance issues, I can say that cuttlefish improved things greatly,
but creation/deletion of a large snapshot (hundreds of gigabytes of
committed data) can still bring the cluster down for minutes, despite
usage of every possible
Created #5844.
On Thu, Aug 1, 2013 at 10:38 PM, Samuel Just sam.j...@inktank.com wrote:
Is there a bug open for this? I suspect we don't sufficiently
throttle the snapshot removal work.
-Sam
On Thu, Aug 1, 2013 at 7:50 AM, Andrey Korolyov and...@xdel.ru wrote:
Second this. Also for long
On Tue, Aug 20, 2013 at 7:36 PM, Wido den Hollander w...@42on.com wrote:
Hi,
The current [0] libvirt storage pool code simply calls rbd_remove without
anything else.
As far as I know rbd_remove will fail if the image still has snapshots, you
have to remove those snapshots first before you
You may want to reduce the number of scrubbing PGs per OSD to 1 using the
config option and check the results.
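A minimal sketch of what that might look like in ceph.conf, assuming the
option meant is osd max scrubs (the restart/injectargs mechanics are omitted):
[osd]
    osd max scrubs = 1    # cap simultaneous scrub operations per OSD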
On Fri, Aug 30, 2013 at 8:03 PM, Mike Dawson mike.daw...@cloudapt.com wrote:
We've been struggling with an issue of spikes of high i/o latency with
qemu/rbd guests. As we've been chasing this bug, we've
PM, Andrey Korolyov wrote:
You may want to reduce the number of scrubbing PGs per OSD to 1 using the
config option and check the results.
On Fri, Aug 30, 2013 at 8:03 PM, Mike Dawson mike.daw...@cloudapt.com
wrote:
We've been struggling with an issue of spikes of high i/o latency with
qemu/rbd guests
Hello,
Since it has been a long time since cephx was enabled by default and we may
assume that everyone is using it, it seems worthwhile to introduce bits of
code hiding the key from the cmdline. The first applicable place for such an
improvement is most likely OpenStack envs with their sparse security
and usage of the admin
If anyone attends CloudConf Europe, it would be nice to meet
in the real world too.
On Wed, Sep 25, 2013 at 2:29 PM, Wido den Hollander w...@42on.com wrote:
On 09/25/2013 10:53 AM, Loic Dachary wrote:
Hi Eric Patrick,
Yesterday morning Eric suggested that organizing a ceph user meetup
Hello,
Not sure if this matches any real-world problem:
step time server 192.168.10.125 offset 30763065.968946 sec
#0 0x7f2d0294d405 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x7f2d02950b5b in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x7f2d0324b875 in
Just my two cents:
XFS is quite unstable with Ceph, especially along with heavy CPU
usage, up to 3.7 (primarily soft lockups). I used 3.7 for eight months
before upgrading on a production system and it performs just perfectly.
On Tue, Oct 22, 2013 at 1:29 PM, Jeff Liu jeff@oracle.com wrote:
Hello,
Due to the many reports of ENOSPC for xfs-based stores, maybe it is worth
introducing an option to, say, ceph-deploy which will pass the allocsize=
param to the mount, effectively disabling dynamic preallocation? Of
course, not every case is really worth it because of the related performance
impact. If
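A hedged sketch of one place such a parameter can already be set, assuming
the OSD filesystems are mounted through the ceph tooling that honours the
osd mount options xfs setting; the allocsize value is only illustrative:
[osd]
    osd mount options xfs = rw,noatime,inode64,allocsize=4m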
On 03/24/2014 05:30 PM, Haomai Wang wrote:
Hi all,
As we know, a snapshot is a lightweight resource in librbd and we
don't have any statistics about it. But this causes some
problems for cloud management.
We can't measure the size of a snapshot; different snapshots will occur
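As a hedged aside, one common workaround for sizing a snapshot is to sum the
extents reported by rbd diff; the pool, image and snapshot names below are
illustrative:
$ rbd diff rbd/vm0@snap1 | awk '{ sum += $2 } END { print sum " bytes" }'
(add --from-snap to measure only the delta between two snapshots)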
Hello,
I do not know how many of you are aware of this work by Michael Hines
[0], but it looks like it can be extremely useful for critical applications
using qemu and, of course, Ceph at the block level. My thought was that
if the qemu rbd driver can provide any kind of metadata interface to mark
On Fri, Dec 27, 2013 at 9:09 PM, Andrey Korolyov and...@xdel.ru wrote:
On 12/27/2013 08:15 PM, Justin Erenkrantz wrote:
On Thu, Dec 26, 2013 at 9:17 PM, Sage Weil s...@inktank.com wrote:
I think the question comes down to whether Ceph should take some internal
action based on the information
On Wed, Jan 16, 2013 at 10:35 PM, Andrey Korolyov and...@xdel.ru wrote:
On Wed, Jan 16, 2013 at 8:58 PM, Sage Weil s...@inktank.com wrote:
Hi,
On Wed, 16 Jan 2013, Andrey Korolyov wrote:
On Wed, Jan 16, 2013 at 4:58 AM, Chen, Xiaoxi xiaoxi.c...@intel.com wrote:
Hi list,
We
Hi Matthew,
Seems to be a low value of /proc/sys/kernel/threads-max.
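A minimal sketch of checking and raising that limit; the value is illustrative:
$ cat /proc/sys/kernel/threads-max
$ sysctl -w kernel.threads-max=131072    # persist it via /etc/sysctl.conf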
On Thu, Jan 17, 2013 at 12:37 PM, Matthew Anderson
matth...@base3.com.au wrote:
I've run into a limit on the maximum number of RBD backed VM's that I'm able
to run on a single host. I have 20 VM's (21 RBD volumes open)
On Thu, Jan 17, 2013 at 7:00 PM, Atchley, Scott atchle...@ornl.gov wrote:
On Jan 17, 2013, at 9:48 AM, Gandalf Corvotempesta
gandalf.corvotempe...@gmail.com wrote:
2013/1/17 Atchley, Scott atchle...@ornl.gov:
IB DDR should get you close to 2 GB/s with IPoIB. I have gotten our IB QDR
PCI-E
On Tue, Jan 22, 2013 at 10:05 AM, Sage Weil s...@inktank.com wrote:
We observed an interesting situation over the weekend. The XFS volume
ceph-osd locked up (hung in xfs_ilock) for somewhere between 2 and 4
minutes. After 3 minutes (180s), ceph-osd gave up waiting and committed
suicide. XFS
On Thu, Jan 24, 2013 at 12:59 AM, Jens Kristian Søgaard
j...@mermaidconsulting.dk wrote:
Hi Sage,
I think the problem now is just that 'osd target transaction size' is
I set it to 50, and that seems to have solved all my problems.
After a day or so my cluster got to a HEALTH_OK state again.
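For reference, the setting Jens describes would look like this in ceph.conf:
[osd]
    osd target transaction size = 50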
On Thu, Jan 24, 2013 at 8:39 AM, Sage Weil s...@inktank.com wrote:
On Thu, 24 Jan 2013, Andrey Korolyov wrote:
On Thu, Jan 24, 2013 at 12:59 AM, Jens Kristian S?gaard
j...@mermaidconsulting.dk wrote:
Hi Sage,
I think the problem now is just that 'osd target transaction size' is
I set
On Fri, Jan 25, 2013 at 7:51 PM, Sage Weil s...@inktank.com wrote:
On Fri, 25 Jan 2013, Andrey Korolyov wrote:
On Fri, Jan 25, 2013 at 4:52 PM, Ugis ugi...@gmail.com wrote:
I mean if you map an rbd and do not use the rbd lock .. command. Can you
tell which client has mapped a certain rbd anyway
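A hedged sketch of the advisory locking being referred to (image and lock
names are illustrative); without it there is no built-in record of who
mapped the image:
$ rbd lock add mypool/myimage host-a-lock
$ rbd lock list mypool/myimage    # shows lock ids and the locker entities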
On Sat, Jan 26, 2013 at 3:40 AM, Sam Lang sam.l...@inktank.com wrote:
On Fri, Jan 25, 2013 at 10:07 AM, Andrey Korolyov and...@xdel.ru wrote:
Sorry, I wrote too little yesterday because of being sleepy.
That's obviously cache pressure, since dropping caches resulted in the
disappearance
On Sat, Jan 26, 2013 at 12:41 PM, Andrey Korolyov and...@xdel.ru wrote:
On Sat, Jan 26, 2013 at 3:40 AM, Sam Lang sam.l...@inktank.com wrote:
On Fri, Jan 25, 2013 at 10:07 AM, Andrey Korolyov and...@xdel.ru wrote:
Sorry, I wrote too little yesterday because of being sleepy.
That`s
On Mon, Jan 28, 2013 at 5:48 PM, Sam Lang sam.l...@inktank.com wrote:
On Sun, Jan 27, 2013 at 2:52 PM, Andrey Korolyov and...@xdel.ru wrote:
Ahem, once on an almost empty node the same trace was produced by a qemu
process (which was actually pinned to a specific NUMA node), so it seems
that's generally
On Mon, Jan 28, 2013 at 8:55 PM, Andrey Korolyov and...@xdel.ru wrote:
On Mon, Jan 28, 2013 at 5:48 PM, Sam Lang sam.l...@inktank.com wrote:
On Sun, Jan 27, 2013 at 2:52 PM, Andrey Korolyov and...@xdel.ru wrote:
Ahem, once on an almost empty node the same trace was produced by a qemu
process(which
http://xdel.ru/downloads/ceph-log/rados-out.txt.gz
On Thu, Jan 31, 2013 at 10:31 PM, Gregory Farnum g...@inktank.com wrote:
Can you pastebin the output of rados -p rbd ls?
On Thu, Jan 31, 2013 at 10:17 AM, Andrey Korolyov and...@xdel.ru wrote:
Hi,
Please take a look, this data remains
On Thu, Jan 31, 2013 at 11:18 PM, Andrey Korolyov and...@xdel.ru wrote:
On Thu, Jan 31, 2013 at 10:56 PM, Gregory Farnum g...@inktank.com wrote:
On Thu, Jan 31, 2013 at 10:50 AM, Andrey Korolyov and...@xdel.ru wrote:
http://xdel.ru/downloads/ceph-log/rados-out.txt.gz
On Thu, Jan 31, 2013
On Mon, Feb 4, 2013 at 1:46 AM, Gregory Farnum g...@inktank.com wrote:
On Sunday, February 3, 2013 at 11:45 AM, Andrey Korolyov wrote:
Just an update: this data stayed after pool deletion, so there is
probably a way to delete garbage bytes on live pool without doing any
harm(hope so), since
Hi Stefan,
you may be interested in throttle(1) as a side solution with the stdout
export option. By the way, on which interconnect have you managed to
get such speeds, if you mean 'committed' bytes (e.g. not an almost empty
allocated image)?
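A hedged sketch of that combination, assuming the Debian throttle(1) utility
(where -M limits the stream to the given number of megabytes per second) and
an illustrative image name:
$ rbd export rbd/vm-image@snap - | throttle -M 30 > /backup/vm-image.raw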
On Wed, Feb 13, 2013 at 12:22 AM, Stefan Priebe
Can anyone who hit this bug please confirm that your system contains libc 2.15+?
On Tue, Feb 5, 2013 at 1:27 AM, Sébastien Han han.sebast...@gmail.com wrote:
oh nice, the pattern also matches path :D, didn't know that
thanks Greg
--
Regards,
Sébastien Han.
On Mon, Feb 4, 2013 at 10:22 PM,
On Thu, Jan 24, 2013 at 10:01 PM, Sage Weil s...@inktank.com wrote:
On Thu, 24 Jan 2013, Andrey Korolyov wrote:
On Thu, Jan 24, 2013 at 8:39 AM, Sage Weil s...@inktank.com wrote:
On Thu, 24 Jan 2013, Andrey Korolyov wrote:
On Thu, Jan 24, 2013 at 12:59 AM, Jens Kristian S?gaard
j
On Wed, Feb 13, 2013 at 12:22 AM, Stefan Priebe s.pri...@profihost.ag wrote:
Hi,
is there a speed limit option for rbd export? Right now I'm able to cause
several SLOW requests for IMPORTANT valid requests just by exporting a
snapshot which is not really important.
rbd export runs
On Tue, Feb 26, 2013 at 6:56 PM, Stefan Priebe - Profihost AG
s.pri...@profihost.ag wrote:
Hi list,
how can I do a short maintenance like a kernel upgrade on an OSD host?
Right now ceph starts to backfill immediately if I say:
ceph osd out 41
...
Without the ceph osd out command all clients
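One commonly used approach for short maintenance windows (not taken from this
message, just a hedged sketch) is to prevent the OSDs from being marked out
while the host is down, so no backfill is triggered:
$ ceph osd set noout
... reboot / upgrade the host, wait for its OSDs to rejoin ...
$ ceph osd unset noout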
Hello,
I'm experiencing the same long-lasting problem - during recovery ops, some
percentage of read I/O remains in-flight for seconds, rendering the
upper-level filesystem on the qemu client very slow and almost
unusable. Different striping has almost no effect on the visible delays,
and reads may be
Hello,
Is there an existing or planned way to save an image from such a thing,
other than a protected snapshot? While ``rbd snap protect'' is good enough
for small or inactive images, large ones may add significant overhead
in space or I/O when the 'locking' snapshot is present, so it would be nice
to
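For reference, the protection mechanism being weighed here looks roughly like
this (pool/image names are illustrative):
$ rbd snap create dev-rack0/vm0@keep
$ rbd snap protect dev-rack0/vm0@keep     # the snapshot cannot be removed until unprotected
$ rbd snap unprotect dev-rack0/vm0@keep   # undo before deleting the snapshot or the image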
On Thu, Apr 18, 2013 at 5:43 PM, Mark Nelson mark.nel...@inktank.com wrote:
On 04/18/2013 06:46 AM, James Harper wrote:
I'm doing some basic testing so I'm not really fussed about poor
performance, but my write performance appears to be so bad I think I'm doing
something wrong.
Using dd to
Hello,
Using db2bb270e93ed44f9252d65d1d4c9b36875d0ea5 I have observed some
disaster-like behavior after the ``pool create'' command - every osd
daemon in the cluster will die at least once (some will crash several times in
a row after being brought back). Please take a look at the
backtraces (almost identical)
Wow, very glad to hear that. I tried with the regular FS tunable and
there was almost no effect on the regular test, so I thought that
reads cannot be improved at all in this direction.
On Mon, Jul 29, 2013 at 2:24 PM, Li Wang liw...@ubuntukylin.com wrote:
We performed Iozone read test on a
Hi,
Recently I've reduced my test suite from 6 to 4 osds at ~60% usage on a
six-node setup,
and I have removed a bunch of rbd objects during recovery to avoid
overfill.
Right now I'm constantly receiving a warning about nearfull state on a
non-existing osd:
health HEALTH_WARN 1 near full osd(s)
monmap
On Fri, Jul 13, 2012 at 9:09 PM, Sage Weil s...@inktank.com wrote:
On Fri, 13 Jul 2012, Gregory Farnum wrote:
On Fri, Jul 13, 2012 at 1:17 AM, Andrey Korolyov and...@xdel.ru wrote:
Hi,
Recently I`ve reduced my test suite from 6 to 4 osds at ~60% usage on
six-node,
and I have removed
On Mon, Jul 16, 2012 at 10:48 PM, Gregory Farnum g...@inktank.com wrote:
ceph pg set_full_ratio 0.95
ceph pg set_nearfull_ratio 0.94
On Monday, July 16, 2012 at 11:42 AM, Andrey Korolyov wrote:
On Mon, Jul 16, 2012 at 8:12 PM, Gregory Farnum g...@inktank.com
(mailto:g...@inktank.com
On Wed, Jul 18, 2012 at 10:09 AM, Gregory Farnum g...@inktank.com wrote:
On Monday, July 16, 2012 at 11:55 AM, Andrey Korolyov wrote:
On Mon, Jul 16, 2012 at 10:48 PM, Gregory Farnum g...@inktank.com
(mailto:g...@inktank.com) wrote:
ceph pg set_full_ratio 0.95
ceph pg set_nearfull_ratio
On Wed, Jul 18, 2012 at 11:18 AM, Gregory Farnum g...@inktank.com wrote:
On Tuesday, July 17, 2012 at 11:22 PM, Andrey Korolyov wrote:
On Wed, Jul 18, 2012 at 10:09 AM, Gregory Farnum g...@inktank.com
(mailto:g...@inktank.com) wrote:
On Monday, July 16, 2012 at 11:55 AM, Andrey Korolyov
On Wed, Jul 18, 2012 at 10:30 PM, Gregory Farnum g...@inktank.com wrote:
On Wed, Jul 18, 2012 at 12:47 AM, Andrey Korolyov and...@xdel.ru wrote:
On Wed, Jul 18, 2012 at 11:18 AM, Gregory Farnum g...@inktank.com wrote:
On Tuesday, July 17, 2012 at 11:22 PM, Andrey Korolyov wrote:
On Wed, Jul 18
On Thu, Jul 19, 2012 at 1:28 AM, Gregory Farnum g...@inktank.com wrote:
On Wed, Jul 18, 2012 at 12:07 PM, Andrey Korolyov and...@xdel.ru wrote:
On Wed, Jul 18, 2012 at 10:30 PM, Gregory Farnum g...@inktank.com wrote:
On Wed, Jul 18, 2012 at 12:47 AM, Andrey Korolyov and...@xdel.ru wrote
Hi,
I've finally managed to run an rbd-related test on relatively powerful
machines and here is what I have got:
1) Reads on an almost fairly balanced cluster (eight nodes) did very well,
utilizing almost all disk and network bandwidth (dual gbit 802.3ad nics, sata
disks behind an lsi sas 2108 with wt cache gave me
On 07/31/2012 07:17 PM, Mark Nelson wrote:
Hi Andrey!
On 07/31/2012 10:03 AM, Andrey Korolyov wrote:
Hi,
I`ve finally managed to run rbd-related test on relatively powerful
machines and what I have got:
1) Reads on almost fair balanced cluster(eight nodes) did very well,
utilizing almost
On 07/31/2012 07:53 PM, Josh Durgin wrote:
On 07/31/2012 08:03 AM, Andrey Korolyov wrote:
Hi,
I`ve finally managed to run rbd-related test on relatively powerful
machines and what I have got:
1) Reads on almost fair balanced cluster(eight nodes) did very well,
utilizing almost all disk
On Thu, Aug 23, 2012 at 2:33 AM, Sage Weil s...@inktank.com wrote:
On Thu, 23 Aug 2012, Andrey Korolyov wrote:
Hi,
today during a heavy test a pair of osds and one mon died, resulting in a
hard lockup of some kvm processes - they became unresponsive and were
killed, leaving zombie processes ([kvm
On Tue, Aug 28, 2012 at 12:47 AM, Sébastien Han han.sebast...@gmail.com wrote:
Hi community,
For those of you who are interested, I performed several benchmarks of
RADOS and RBD on different types of hardware and use cases.
You can find my results here:
(commit:a7ad701b9bd479f20429f19e6fea7373ca6bba7c)
On Sun, Aug 26, 2012 at 8:52 PM, Andrey Korolyov and...@xdel.ru wrote:
During recovery, the following crash happens (similar to
http://tracker.newdream.net/issues/2126 which was marked resolved long
ago):
http://xdel.ru/downloads/ceph-log/osd-2012-08-26.txt
Hi,
This is completely off-list, but I'm asking because only ceph triggers
such a bug :).
With 0.51, the following happens: if I kill an osd, one or more neighbor
nodes may go into a hung state with cpu lockups, not related to
temperature or overall interrupt count or load average, and it happens randomly
over
On Thu, Sep 13, 2012 at 1:09 AM, Tommi Virtanen t...@inktank.com wrote:
On Wed, Sep 12, 2012 at 10:33 AM, Andrey Korolyov and...@xdel.ru wrote:
Hi,
This is completely off-list, but I`m asking because only ceph trigger
such a bug :) .
With 0.51, following happens: if I kill an osd, one
On Tue, Sep 18, 2012 at 4:37 PM, Guido Winkelmann
guido-c...@thisisnotatest.de wrote:
Am Dienstag, 11. September 2012, 17:25:49 schrieben Sie:
The next stable release will have cephx authentication enabled by default.
Hm, that could be a problem for me. I have tried multiple times to get cephx
On Tue, Sep 18, 2012 at 5:34 PM, Andrey Korolyov and...@xdel.ru wrote:
On Tue, Sep 18, 2012 at 4:37 PM, Guido Winkelmann
guido-c...@thisisnotatest.de wrote:
Am Dienstag, 11. September 2012, 17:25:49 schrieben Sie:
The next stable release will have cephx authentication enabled by default.
Hm
On Thu, Sep 13, 2012 at 1:43 AM, Andrey Korolyov and...@xdel.ru wrote:
On Thu, Sep 13, 2012 at 1:09 AM, Tommi Virtanen t...@inktank.com wrote:
On Wed, Sep 12, 2012 at 10:33 AM, Andrey Korolyov and...@xdel.ru wrote:
Hi,
This is completely off-list, but I`m asking because only ceph trigger
On Mon, Oct 1, 2012 at 8:42 PM, Tommi Virtanen t...@inktank.com wrote:
On Sun, Sep 30, 2012 at 2:55 PM, Andrey Korolyov and...@xdel.ru wrote:
Short post mortem - EX3200/12.1R2.9 may begin to drop packets (seems
to appear more likely on 0.51 traffic patterns, which is very strange
for L2
Hi,
Recent tests on my test rack with a 20G IB (IPoIB, 64k mtu, default
CUBIC, CFQ, LSI SAS 2108 w/ wb cache) interconnect show quite
fantastic performance - on both reads and writes Ceph completely
utilizes all disk bandwidth, as high as 0.9 of the theoretical limit of
the sum of all bandwidths, bearing
On Wed, Oct 31, 2012 at 1:07 AM, Josh Durgin josh.dur...@inktank.com wrote:
On 10/28/2012 03:02 AM, Andrey Korolyov wrote:
Hi,
Should the following behavior be considered normal?
$ rbd map test-rack0/debiantest --user qemukvm --secret qemukvm.key
$ fdisk /dev/rbd1
Command (m for help): p
On Mon, Nov 5, 2012 at 11:33 PM, Stefan Priebe s.pri...@profihost.ag wrote:
Am 04.11.2012 15:12, schrieb Sage Weil:
On Sun, 4 Nov 2012, Stefan Priebe wrote:
Can i merge wip-rbd-read into master?
Yeah. I'm going to do a bit more testing first before I do it, but it
should apply cleanly.
On Thu, Nov 8, 2012 at 4:00 PM, Wido den Hollander w...@widodh.nl wrote:
On 08-11-12 10:04, Stefan Priebe - Profihost AG wrote:
Hello list,
is there any preferred way to set up clock synchronisation?
I've tried running openntpd and ntpd on all servers but I'm still getting:
2012-11-08
On Thu, Nov 8, 2012 at 7:02 PM, Atchley, Scott atchle...@ornl.gov wrote:
On Nov 8, 2012, at 10:00 AM, Scott Atchley atchle...@ornl.gov wrote:
On Nov 8, 2012, at 9:39 AM, Mark Nelson mark.nel...@inktank.com wrote:
On 11/08/2012 07:55 AM, Atchley, Scott wrote:
On Nov 8, 2012, at 3:22 AM,
On Thu, Nov 8, 2012 at 7:53 PM, Alexandre DERUMIER aderum...@odiso.com wrote:
So it is a problem of KVM which lets the processes jump between cores a
lot.
Maybe numad from Red Hat can help?
http://fedoraproject.org/wiki/Features/numad
It tries to keep a process on the same NUMA node and I think
Hi,
Please take a look, seems harmless:
$ rbd mv vm0
terminate called after throwing an instance of 'std::logic_error'
what(): basic_string::_S_construct null not valid
*** Caught signal (Aborted) **
in thread 7f85f5981780
ceph version 0.53 (commit:2528b5ee105b16352c91af064af5c0b5a7d45d7c)
Hi,
For this version, rbd cp assumes that the destination pool is the same as
the source, not 'rbd', if the pool in the destination path is omitted.
rbd cp install/img testimg
rbd ls install
img testimg
Is this change permanent?
Thanks!
Hi,
In 0.54, cephx is probably broken somehow:
$ ceph auth add client.qemukvm osd 'allow *' mon 'allow *' mds 'allow
*' -i qemukvm.key
2012-11-14 15:51:23.153910 7ff06441f780 -1 read 65 bytes from qemukvm.key
added key for client.qemukvm
$ ceph auth list
...
client.admin
key: [xx]
On Thu, Nov 15, 2012 at 4:56 AM, Dan Mick dan.m...@inktank.com wrote:
On 11/12/2012 02:47 PM, Josh Durgin wrote:
On 11/12/2012 08:30 AM, Andrey Korolyov wrote:
Hi,
For this version, rbd cp assumes that destination pool is the same as
source, not 'rbd', if pool in the destination path
On Thu, Nov 15, 2012 at 5:03 PM, Andrey Korolyov and...@xdel.ru wrote:
On Thu, Nov 15, 2012 at 5:12 AM, Yehuda Sadeh yeh...@inktank.com wrote:
On Wed, Nov 14, 2012 at 4:20 AM, Andrey Korolyov and...@xdel.ru wrote:
Hi,
In the 0.54 cephx is probably broken somehow:
$ ceph auth add
closely to the /dev layout, or,
at least, iSCSI targets; when not specifying a full path or using some
predefined default prefix it makes no sense at all.
On Wed, Nov 14, 2012 at 10:43 PM, Andrey Korolyov and...@xdel.ru wrote:
On Thu, Nov 15, 2012 at 4:56 AM, Dan Mick dan.m...@inktank.com wrote:
On 11/12
Hi,
Somehow I have managed to produce an unkillable snapshot, which does not
allow removing itself or the parent image:
$ rbd snap purge dev-rack0/vm2
Removing all snapshots: 100% complete...done.
$ rbd rm dev-rack0/vm2
2012-11-21 16:31:24.184626 7f7e0d172780 -1 librbd: image has snapshots
- not
On Thu, Nov 22, 2012 at 2:05 AM, Josh Durgin josh.dur...@inktank.com wrote:
On 11/21/2012 04:50 AM, Andrey Korolyov wrote:
Hi,
Somehow I have managed to produce unkillable snapshot, which does not
allow to remove itself or parent image:
$ rbd snap purge dev-rack0/vm2
Removing all
Hi,
In recent versions Ceph introduces some unexpected behavior for
permanent connections (VM or kernel clients) - after crash
recovery, I/O will hang on the next planned scrub in the following
scenario:
- launch a bunch of clients doing non-intensive writes,
- lose one or more osd, mark
On Fri, Nov 23, 2012 at 12:35 AM, Sage Weil s...@inktank.com wrote:
On Thu, 22 Nov 2012, Andrey Korolyov wrote:
Hi,
In the recent versions Ceph introduces some unexpected behavior for
the permanent connections (VM or kernel clients) - after crash
recovery, I/O will hang on the next planned
On Wed, Nov 28, 2012 at 5:51 AM, Sage Weil s...@inktank.com wrote:
Hi Stefan,
On Thu, 15 Nov 2012, Sage Weil wrote:
On Thu, 15 Nov 2012, Stefan Priebe - Profihost AG wrote:
Am 14.11.2012 15:59, schrieb Sage Weil:
Hi Stefan,
It would be nice to confirm that no clients are waiting on the
readjusted cluster before the bug shows
itself (say, within a day)?
On Tue, Nov 27, 2012 at 11:47 PM, Andrey Korolyov and...@xdel.ru wrote:
On Wed, Nov 28, 2012 at 5:51 AM, Sage Weil s...@inktank.com wrote:
Hi Stefan,
On Thu, 15 Nov 2012, Sage Weil wrote:
On Thu, 15 Nov 2012, Stefan
On Thu, Nov 29, 2012 at 8:34 PM, Sage Weil s...@inktank.com wrote:
On Thu, 29 Nov 2012, Andrey Korolyov wrote:
$ ceph osd down -
osd.0 is already down
$ ceph osd down ---
osd.0 is already down
the same for ``+'', ``/'', ``%'' and so on - I think that for the osd subsystem
the ceph cli should explicitly
:
If you can reproduce it again, what we really need are the osd logs
from the acting set of a pg stuck in scrub with
debug osd = 20
debug ms = 1
debug filestore = 20.
Thanks,
-Sam
On Sun, Nov 25, 2012 at 2:08 PM, Andrey Korolyov and...@xdel.ru wrote:
On Fri, Nov 23, 2012 at 12:35 AM, Sage
, it's our handling of active_pushes. I'll
have a patch shortly.
Thanks!
-Sam
On Fri, Nov 30, 2012 at 4:14 AM, Andrey Korolyov and...@xdel.ru wrote:
http://xdel.ru/downloads/ceph-log/ceph-scrub-stuck.log.gz
http://xdel.ru/downloads/ceph-log/cluster-w.log.gz
Here, please.
I have initiated
Hi,
Today during a planned kernel upgrade one of the osds (which I have not
touched yet) started to complain about a ``misdirected client'':
2012-12-12 21:22:59.107648 osd.20 [WRN] client.2774043
10.5.0.33:0/1013711 misdirected client.2774043.0:114 pg 5.ad140d42 to
osd.20 in e23834, client e23834 pg 5.542
On Sun, Dec 16, 2012 at 5:59 PM, Jens Kristian Søgaard
j...@mermaidconsulting.dk wrote:
Hi,
My log is filling up with warnings about a single slow request that has been
around for a very long time:
osd.1 10.0.0.2:6800/900 162926 : [WRN] 1 slow requests, 1 included below;
oldest blocked for
Hi,
After the recent switch to the default ``--stripe-count 1'' on image upload I
have observed a strange thing - a single import or deletion of a
striped image may temporarily turn off the entire cluster, literally (see
log below).
Of course the next issued osd map fixes the situation, but all in-flight
On Mon, Dec 17, 2012 at 2:42 AM, Jens Kristian Søgaard
j...@mermaidconsulting.dk wrote:
Hi Andrey,
Thanks for your reply!
Please take a look at this thread:
http://comments.gmane.org/gmane.comp.file-systems.ceph.devel/10843
I took your advice and restarted each of my three osd's
On Mon, Dec 17, 2012 at 2:36 AM, Andrey Korolyov and...@xdel.ru wrote:
Hi,
After recent switch do default ``--stripe-count 1'' on image upload I
have observed some strange thing - single import or deletion of the
striped image may temporarily turn off entire cluster, literally(see
log below
, I have
started playing with TCP settings and found that ipv4.tcp_low_latency
raises the possibility of a ``wrong mark'' event several times over when enabled
- so the area of all possible causes quickly collapsed to a media-only
problem and I fixed the problem soon.
On Wed, Dec 19, 2012 at 3:53 AM, Andrey
On Sun, Dec 30, 2012 at 9:05 PM, Jens Kristian Søgaard
j...@mermaidconsulting.dk wrote:
Hi guys,
I'm testing Ceph as storage for KVM virtual machine images and found an
inconvenience that I am hoping it is possible to find the cause of.
I'm running a single KVM Linux guest on top of Ceph
On Mon, Dec 31, 2012 at 3:12 AM, Jens Kristian Søgaard
j...@mermaidconsulting.dk wrote:
Hi Andrey,
Thanks for your reply!
You may try to play with SCHED_RT; I have found it hard to use
myself, but you can achieve your goal by adding small RT slices via the
``cpu'' cgroup to the vcpu/emulator
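A hedged sketch with the cgroup-v1 cpu controller (requires RT group
scheduling in the kernel); the cgroup path is hypothetical and depends on how
libvirt lays out the vcpu/emulator groups:
# allow up to 10 ms of realtime runtime per 100 ms period for one vcpu group
echo 100000 > /sys/fs/cgroup/cpu/machine/vm1/vcpu0/cpu.rt_period_us
echo 10000  > /sys/fs/cgroup/cpu/machine/vm1/vcpu0/cpu.rt_runtime_us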
On Mon, Dec 31, 2012 at 2:58 PM, Jens Kristian Søgaard
j...@mermaidconsulting.dk wrote:
Hi Andrey,
As I understood it, you have an md device holding both the journal and
the filestore? What type of raid do you have here?
Yes, same md device holding both journal and filestore. It is a raid5.
Ahem, of
Hi,
All osds in the dev cluster died shortly after the upgrade (package-only,
i.e. a binary upgrade, even without restarting the running processes); please
see the attached file.
Was: 0.55.1-356-g850d1d5
Upgraded to: 0.56 tag
The only difference is the version of libstdc++ corresponding to the gcc
- 4.6 on the
On Tue, Jan 1, 2013 at 9:49 PM, Andrey Korolyov and...@xdel.ru wrote:
Hi,
All osds in the dev cluster died shortly after upgrade (packet-only,
i.e. binary upgrade, even without restart running processes), please
see attached file.
Was: 0.55.1-356-g850d1d5
Upgraded to: 0.56 tag
The only
On Wed, Jan 2, 2013 at 12:16 AM, Andrey Korolyov and...@xdel.ru wrote:
On Tue, Jan 1, 2013 at 9:49 PM, Andrey Korolyov and...@xdel.ru wrote:
Hi,
All osds in the dev cluster died shortly after upgrade (packet-only,
i.e. binary upgrade, even without restart running processes), please
see
I have just observed that the ceph-mon process, at least the bobtail one, has
an extremely high density of writes - times above the _overall_ cluster
amount of writes, as measured by the qemu driver (and those numbers are very
close to being fair). For example, a test cluster of 32 osds has 7.5 MByte/s of
writes on each mon node
On Wed, Jan 2, 2013 at 8:00 PM, Joao Eduardo Luis joao.l...@inktank.com wrote:
On 01/02/2013 03:40 PM, Andrey Korolyov wrote:
I have just observed that ceph-mon process, at least bobtail one, has
an extremely high density of writes - times above _overall_ cluster
amount of writes, measured
On Tue, Jan 8, 2013 at 11:30 AM, Stefan Priebe - Profihost AG
s.pri...@profihost.ag wrote:
Hi,
I cannot see any git tag or branch claiming to be 0.56.1? Which commit id is
this?
Greets
Stefan
Same for me; github simply did not send the new tag in the pull to the
local tree for some reason.
/osd-lockup-2-14-33-16.741603.log.gz
Timestamps in the filenames were added for easier lookup; the osdmap marked the
osds as down a couple of beats after those marks.
On Mon, Dec 31, 2012 at 1:16 AM, Andrey Korolyov and...@xdel.ru wrote:
On Sun, Dec 30, 2012 at 10:56 PM, Samuel Just sam.j...@inktank.com
with? If you run a rados bench from both
machines, what do the results look like?
Also, can you do the ceph osd bench on each of your OSDs, please?
(http://ceph.newdream.net/wiki/Troubleshooting#OSD_performance)
-Greg
On Monday, March 19, 2012 at 6:46 AM, Andrey Korolyov wrote:
More strangely
~4096] 0.17eb9fd8) v4)
Sorry for my previous question about rbd chunks, it was really stupid :)
On Mon, Mar 19, 2012 at 10:40 PM, Josh Durgin josh.dur...@dreamhost.com wrote:
On 03/19/2012 11:13 AM, Andrey Korolyov wrote:
Nope, I'm using KVM for rbd guests. Surely I've noticed that Sage
On Fri, Mar 23, 2012 at 5:25 AM, Andrey Korolyov and...@xdel.ru wrote:
Hi Sam,
Can you please suggest where to start profiling the osd? If the
bottleneck were related to such non-complex things as directio speed,
I'm sure I would have been able to catch it long ago, even just by cross-checking
results
Hi,
# virsh blkdeviotune Test vdb --write_iops_sec 50 //file block device
# virsh blkdeviotune Test vda --write_iops_sec 50 //rbd block device
error: Unable to change block I/O throttle
error: invalid argument: No device found for specified path
2012-04-03 07:38:49.170+: 30171: debug :
But I am able to set static limits in the config for rbd :) All I want
is to change them on-the-fly.
It is NOT a cgroups mechanism, but completely qemu-driven.
On Tue, Apr 3, 2012 at 12:21 PM, Wido den Hollander w...@widodh.nl wrote:
Hi,
Op 3-4-2012 10:02, Andrey Korolyov schreef:
Hi,
# virsh
-2012 10:28, Andrey Korolyov schreef:
But I am able to set static limits in the config for rbd :) All I want
is a change on-the-fly.
It is NOT cgroups mechanism, but completely qemu-driven.
Are you sure about that?
http://libvirt.org/formatdomain.html#elementsBlockTuning
Browsing through
The suggested hack works; it seems that the libvirt devs did not remove the block
limitation because they consider this feature experimental, or forgot about
it.
On Tue, Apr 3, 2012 at 12:55 PM, Andrey Korolyov and...@xdel.ru wrote:
At least the elements under the iotune block apply to rbd and you can
test
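For reference, a sketch of the iotune element under a disk definition, per the
libvirt domain XML documentation linked above (device and value are
illustrative):
<disk type='network' device='disk'>
  ...
  <iotune>
    <write_iops_sec>50</write_iops_sec>
  </iotune>
</disk>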