Re: rbd map command hangs for 15 minutes during system start up

2012-11-22 Thread Nick Bartos
Here are the ceph log messages (including the libceph kernel debug stuff you asked for) from a node boot with the rbd command hung for a couple of minutes: https://raw.github.com/gist/4132395/7cb5f0150179b012429c6e57749120dd88616cce/gistfile1.txt On Wed, Nov 21, 2012 at 9:49 PM, Nick Bartos

Re: RBD fio Performance concerns

2012-11-22 Thread Sébastien Han
Hum sorry, you're right. Forget about what I said :) On Thu, Nov 22, 2012 at 4:54 PM, Stefan Priebe - Profihost AG s.pri...@profihost.ag wrote: I thought the client would then write to the 2nd; is this wrong? Stefan Am 22.11.2012 um 16:49 schrieb Sébastien Han han.sebast...@gmail.com: But

Re: RBD fio Performance concerns

2012-11-22 Thread Alexandre DERUMIER
We need something like tmpfs - running in local memory but supporting dio. Maybe with a ramdisk, /dev/ram0? We can format it with a standard filesystem (ext3, ext4, ...), so maybe dio works with it? - Mail original - De: Stefan Priebe - Profihost AG s.pri...@profihost.ag À: Sébastien
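A minimal sketch of trying that, assuming the brd ramdisk module and a 1 GB device (module parameter and mount point are illustrative only):

    modprobe brd rd_size=1048576                  # rd_size is in KiB, so this gives a ~1 GB /dev/ram0
    mkfs.ext4 /dev/ram0
    mkdir -p /mnt/ram-journal
    mount /dev/ram0 /mnt/ram-journal
    # check that O_DIRECT writes succeed on the ramdisk-backed filesystem
    dd if=/dev/zero of=/mnt/ram-journal/dio-test bs=4k count=1 oflag=direct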

Re: OSD daemon changes port no

2012-11-22 Thread Sage Weil
On Thu, 22 Nov 2012, hemant surale wrote: Sir, thanks for the direction. Here I was using the mount.ceph monaddr:ip:/ /home/hemant/mntpoint command. Is it possible to achieve the same effect with mount.ceph as with what you suggested for cephfs (cephfs /mnt/ceph/foo --pool poolid)? But I see that

Re: Files lost after mds rebuild

2012-11-22 Thread Drunkard Zhang
2012/11/22 Gregory Farnum g...@inktank.com: On Tue, Nov 20, 2012 at 8:28 PM, Drunkard Zhang gongfan...@gmail.com wrote: 2012/11/21 Gregory Farnum g...@inktank.com: No, absolutely not. There is no relationship between different RADOS pools. If you've been using the cephfs tool to place some

Re: RBD Backup

2012-11-22 Thread Wido den Hollander
On 11/22/2012 06:57 PM, Stefan Priebe - Profihost AG wrote: Hi, Am 21.11.2012 14:47, schrieb Wido den Hollander: The snapshot isn't consistent since it has no way of telling the VM to flush its buffers. To make it consistent you have to run sync (in the VM) just prior to creating the

Re: RBD fio Performance concerns

2012-11-22 Thread Stefan Priebe - Profihost AG
I thought the client would then write to the 2nd; is this wrong? Stefan Am 22.11.2012 um 16:49 schrieb Sébastien Han han.sebast...@gmail.com: But who cares? it's also on the 2nd node. or even on the 3rd if you have replicas 3. Yes but you could also suffer a crash while writing the first

Re: RBD fio Performance concerns

2012-11-22 Thread Stefan Priebe - Profihost AG
Am 22.11.2012 13:50, schrieb Sébastien Han: The journal is running on tmpfs for me, but that changes nothing. I don't think it works then. According to the doc: Enables using libaio for asynchronous writes to the journal. Requires journal dio set to true. Ah, might be, but as the SSDs are pretty

Re: RBD fio Performance concerns

2012-11-22 Thread Stefan Priebe - Profihost AG
Am 22.11.2012 15:37, schrieb Mark Nelson: I don't think we recommend tmpfs at all for anything other than playing around. :) I discussed this with somebody from Inktank. Had to search the mailing list. It might be OK if you're working with enough replicas and a UPS. I see no other option while

Re: RBD fio Performance concerns

2012-11-22 Thread Mark Nelson
On 11/22/2012 04:49 AM, Sébastien Han wrote: @Alexandre: cool! @ Stefan: Full SSD cluster and 10G switches? Couple of weeks ago I saw that you use journal aio, did you notice performance improvement with it? @Mark Kampe If I read the above correctly, your random operations are 4K and your

Re: 'zombie snapshot' problem

2012-11-22 Thread Andrey Korolyov
On Thu, Nov 22, 2012 at 2:05 AM, Josh Durgin josh.dur...@inktank.com wrote: On 11/21/2012 04:50 AM, Andrey Korolyov wrote: Hi, Somehow I have managed to produce an unkillable snapshot which can neither be removed itself nor allows its parent image to be removed: $ rbd snap purge dev-rack0/vm2 Removing all

Re: RBD fio Performance concerns

2012-11-22 Thread Stefan Priebe - Profihost AG
Otherwise you would have the same problem with disk crashes. Am 22.11.2012 um 16:55 schrieb Sébastien Han han.sebast...@gmail.com: Hum sorry, you're right. Forget about what I said :) On Thu, Nov 22, 2012 at 4:54 PM, Stefan Priebe - Profihost AG s.pri...@profihost.ag wrote: I thought

Question about simulation with crushtool

2012-11-22 Thread Nam Dang
Dear all, I am trying to do some small experiments with crushtool by simulating different CRUSH variants. However, I have run into some problems with crushtool due to its lack of documentation. What is the command to simulate placement in a 32-device bucket system (only 1 bucket)? And
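For reference, a hedged sketch of how such a simulation might be driven; the exact flag spellings varied between crushtool versions, so treat the options below as assumptions to be checked against crushtool --help:

    # build a trivial map with 32 devices under a single straw bucket
    crushtool --outfn test.crushmap --build --num_osds 32 root straw 0
    # simulate placement of 1024 inputs with 2 replicas and print per-device statistics
    crushtool -i test.crushmap --test --num-rep 2 --min-x 0 --max-x 1023 --show-statistics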

Re: RBD fio Performance concerns

2012-11-22 Thread Mark Nelson
I don't think we recommend tmpfs at all for anything other than playing around. :) On 11/22/2012 08:22 AM, Stefan Priebe - Profihost AG wrote: Hi, can someone from Inktank comment on this? Might using /dev/ram0 with an fs on it be better than tmpfs, as we could use dio? Greets, Stefan

Re: rbd map command hangs for 15 minutes during system start up

2012-11-22 Thread Nick Bartos
It's very easy to reproduce now with my automated install script; the most I've seen it succeed with that patch is 2 in a row before hanging on the 3rd, although it hangs on most builds. So it shouldn't take much to get it to do it again. I'll try and get to that tomorrow, when I'm a bit more

Re: RBD fio Performance concerns

2012-11-22 Thread Stefan Priebe - Profihost AG
Am 22.11.2012 14:22, schrieb Sébastien Han: And RAMDISK devices are too expensive. It would make sense in your infra, but yes they are really expensive. We need something like tmpfs - running in local memory but supporting dio. Stefan

[PATCH] rbd block driver fix race between aio completion and aio cancel

2012-11-22 Thread Stefan Priebe
This one fixes a race between aio cancellation and I/O completion which qemu also had in the iSCSI block driver: qemu_rbd_aio_cancel was not synchronously waiting for the end of the command. To achieve this it introduces a new status flag which uses -EINPROGRESS. Signed-off-by: Stefan Priebe

Re: RBD fio Performance concerns

2012-11-22 Thread Stefan Priebe - Profihost AG
Am 22.11.2012 15:52, schrieb Alexandre DERUMIER: I discussed this with somebody from Inktank. Had to search the mailing list. It might be OK if you're working with enough replicas and a UPS. I see no other option while working with SSDs - the only option would be to be able to deactivate the

Re: RBD fio Performance concerns

2012-11-22 Thread Stefan Priebe - Profihost AG
Am 22.11.2012 15:46, schrieb Mark Nelson: I haven't played a whole lot with SSD only OSDs yet (other than noting last summer that iop performance wasn't as high as I wanted it). Is a second partition on the SSD for the journal not an option for you? Haven't tested that. But does this make

Re: RBD fio Performance concerns

2012-11-22 Thread Stefan Priebe - Profihost AG
Am 22.11.2012 11:49, schrieb Sébastien Han: @Alexandre: cool! @ Stefan: Full SSD cluster and 10G switches? Yes Couple of weeks ago I saw that you use journal aio, did you notice performance improvement with it? The journal is running on tmpfs for me, but that changes nothing. Stefan

Re: RBD fio Performance concerns

2012-11-22 Thread Alexandre DERUMIER
I discussed this with somebody from Inktank. Had to search the mailing list. It might be OK if you're working with enough replicas and a UPS. I see no other option while working with SSDs - the only option would be to be able to deactivate the journal entirely. But Ceph does not support this. Do you

Re: [Qemu-devel] [PATCH] overflow of int ret: use ssize_t for ret

2012-11-22 Thread Stefan Priebe - Profihost AG
Hi Andreas, thanks for your comment. Do I have to resend this patch? -- Greets, Stefan Am 22.11.2012 17:40, schrieb Andreas Färber: Am 22.11.2012 10:07, schrieb Stefan Priebe: When acb->cmd is WRITE or DISCARD, block/rbd stores rcb->size into acb->ret. Look here: if (acb->cmd == RBD_AIO_WRITE

tiering of storage pools in ceph in general

2012-11-22 Thread Jimmy Tang
Hi All, Is it possible at this point in time to set up some form of tiering of storage pools in ceph by modifying the crush map? For example I want to have my most recently used data on a small set of nodes that have SSDs and over time migrate data from the SSDs to some bulk spinning disk
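There is no automatic migration between tiers, but pools can be pinned to different parts of the hierarchy with CRUSH rules. A rough sketch, assuming the map already contains a separate 'ssd' root bucket (all names below are illustrative):

    # rule to append to the decompiled map (crushtool -d current.map -o current.txt)
    rule ssd_only {
        ruleset 4
        type replicated
        min_size 1
        max_size 10
        step take ssd
        step chooseleaf firstn 0 type host
        step emit
    }
    # then recompile, inject, and point a pool at the rule
    crushtool -c current.txt -o new.map
    ceph osd setcrushmap -i new.map
    ceph osd pool set hotpool crush_ruleset 4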

Re: OSD daemon changes port no

2012-11-22 Thread hemant surale
Sir, thanks for the direction. Here I was using the mount.ceph monaddr:ip:/ /home/hemant/mntpoint command. Is it possible to achieve the same effect with mount.ceph as with what you suggested for cephfs (cephfs /mnt/ceph/foo --pool poolid)? But I see that cephfs is able to set which osd to use, the
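For illustration, a sketch of the two approaches being compared; the monitor address, secret handling, and the exact cephfs set_layout flags are assumptions and may differ by version:

    # mount the whole filesystem with the kernel client
    mount.ceph 192.168.0.10:6789:/ /home/hemant/mntpoint -o name=admin,secretfile=/etc/ceph/admin.secret
    # pin new files created under one directory to a specific pool with the cephfs tool
    mkdir /home/hemant/mntpoint/foo
    cephfs /home/hemant/mntpoint/foo set_layout --pool 3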

Hangup during scrubbing - possible solutions

2012-11-22 Thread Andrey Korolyov
Hi, In recent versions Ceph introduces some unexpected behavior for permanent connections (VM or kernel clients): after crash recovery, I/O will hang on the next planned scrub in the following scenario: - launch a bunch of clients doing non-intensive writes, - lose one or more OSDs, mark

Re: RBD fio Performance concerns

2012-11-22 Thread Stefan Priebe - Profihost AG
Same for me: rand 4k: 23,000 iops, seq 4k: 13,000 iops. Even in writeback mode, where normally seq 4k writes should be merged into bigger requests. Stefan Am 21.11.2012 17:34, schrieb Mark Nelson: Responding to my own message. :) Talked to Sage a bit offline about this. I think there are two
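For context, the kind of fio jobs typically behind such numbers; the device name, queue depth and runtime are assumptions, not the poster's exact settings:

    fio --name=rand4k --filename=/dev/rbd0 --ioengine=libaio --direct=1 \
        --rw=randwrite --bs=4k --iodepth=32 --runtime=60 --time_based
    fio --name=seq4k --filename=/dev/rbd0 --ioengine=libaio --direct=1 \
        --rw=write --bs=4k --iodepth=32 --runtime=60 --time_based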

[PATCH V2] mds: fix CDir::_commit_partial() bug

2012-11-22 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com When a null dentry is encountered, CDir::_commit_partial() adds an OSD_TMAP_RM command to delete the dentry. But if the dentry is new, the OSD will not find the dentry when handling the command and the tmap update operation will fail totally. This patch also

Re: RBD fio Performance concerns

2012-11-22 Thread Alexandre DERUMIER
but it seems that Alexandre and I have the same results (more rand than seq), he has (at least) one cluster and I have 2. Thus I start to think that's not an isolated issue. Hi, I have bought new servers with more powerful CPUs to build a new 3-node cluster to compare. I'll redo tests in 1

Fwd: does still not recommended place rbd device on nodes, where osd daemon located?

2012-11-22 Thread ruslan usifov
Hello, thanks for your attention, and sorry for my bad English! In my draft architecture, I want to use the same hardware for OSDs and rbd devices. In other words, I have 5 nodes with 5TB of software RAID disk space on each. I want to build a Ceph cluster on these nodes. All 5 nodes will run an OSD and, on the

Re: [Qemu-devel] [PATCH] overflow of int ret: use ssize_t for ret

2012-11-22 Thread Andreas Färber
Am 22.11.2012 10:07, schrieb Stefan Priebe: When acb->cmd is WRITE or DISCARD, block/rbd stores rcb->size into acb->ret. Look here: if (acb->cmd == RBD_AIO_WRITE || acb->cmd == RBD_AIO_DISCARD) { if (r < 0) { acb->ret = r; acb->error = 1; } else

Re: RBD fio Performance concerns

2012-11-22 Thread Stefan Priebe - Profihost AG
Am 22.11.2012 16:26, schrieb Alexandre DERUMIER: Haven't tested that. But does this make sense? I mean data goes to the journal on the disk, and the same disk then has to copy the data from partition A to partition B. Why is this an advantage? Well, if you are cpu limited, I don't think you can use all 8*35000 iops by

Re: [Qemu-devel] [PATCH] use int64_t for return values from rbd instead of int

2012-11-22 Thread Stefan Priebe - Profihost AG
Am 21.11.2012 23:32, schrieb Peter Maydell: On 21 November 2012 17:03, Stefan Weil s...@weilnetz.de wrote: Why do you use int64_t instead of off_t? If the value is related to file sizes, off_t would be a good choice. Looking at the librbd API (which is what the size and ret values come from),

Re: [PATCH] mds: fix CDir::_commit_partial() bug

2012-11-22 Thread Sage Weil
On Thu, 22 Nov 2012, Yan, Zheng wrote: From: Yan, Zheng zheng.z@intel.com When a null dentry is encountered, CDir::_commit_partial() adds an OSD_TMAP_RM command to delete the dentry. But if the dentry is new, the OSD will not find the dentry when handling the command and the tmap update

Re: rbd map command hangs for 15 minutes during system start up

2012-11-22 Thread Sage Weil
On Wed, 21 Nov 2012, Nick Bartos wrote: FYI the build which included all 3.5 backports except patch #50 is still going strong after 21 builds. Okay, that one at least makes some sense. I've opened http://tracker.newdream.net/issues/3519 How easy is this to reproduce? If it is

Re: [Qemu-devel] [PATCH] overflow of int ret: use ssize_t for ret

2012-11-22 Thread Stefan Weil
Am 22.11.2012 20:09, schrieb Stefan Priebe - Profihost AG: Hi Andreas, thanks for your comment. Do I have to resend this patch? -- Greets, Stefan Hi Stefan, I'm afraid yes, you'll have to resend the patch. Signed-off-by is a must, see http://wiki.qemu.org/Contribute/SubmitAPatch When

Re: [Qemu-devel] [PATCH] use int64_t for return values from rbd instead of int

2012-11-22 Thread Peter Maydell
On 22 November 2012 08:23, Stefan Priebe - Profihost AG s.pri...@profihost.ag wrote: Am 21.11.2012 23:32, schrieb Peter Maydell: Looking at the librbd API (which is what the size and ret values come from), it uses size_t and ssize_t for these. So I think probably ssize_t is the right type for

Re: [Qemu-devel] [PATCH] use int64_t for return values from rbd instead of int

2012-11-22 Thread Peter Maydell
On 21 November 2012 17:03, Stefan Weil s...@weilnetz.de wrote: Why do you use int64_t instead of off_t? If the value is related to file sizes, off_t would be a good choice. Looking at the librbd API (which is what the size and ret values come from), it uses size_t and ssize_t for these. So I

Re: how to create snapshots

2012-11-22 Thread Stefan Priebe - Profihost AG
Hi, Am 21.11.2012 15:29, schrieb Wido den Hollander: Use: $ rbd -p kvmpool1 snap create --image vm-113-disk-1 BACKUP rbd -h also says: image-name, snap-name are [pool/]name[@snap], or you may specify individual pieces of names with -p/--pool, --image, and/or --snap. Never tried it, but you
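Both spellings that rbd -h describes should work; a quick sketch (pool, image and snapshot names taken from the thread):

    rbd -p kvmpool1 snap create --image vm-113-disk-1 --snap BACKUP
    rbd snap create kvmpool1/vm-113-disk-1@BACKUP        # equivalent short form
    rbd snap ls kvmpool1/vm-113-disk-1                   # verify the snapshot exists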

Re: RBD fio Performance concerns

2012-11-22 Thread Stefan Priebe - Profihost AG
Hi, can someone from Inktank comment on this? Might using /dev/ram0 with an fs on it be better than tmpfs, as we could use dio? Greets, Stefan - Mail original - De: Stefan Priebe - Profihost AG s.pri...@profihost.ag À: Sébastien Han han.sebast...@gmail.com Cc: Mark Nelson

Re: RBD Backup

2012-11-22 Thread Stefan Priebe - Profihost AG
Hi, Am 21.11.2012 14:47, schrieb Wido den Hollander: The snapshot isn't consistent since it has no way of telling the VM to flush its buffers. To make it consistent you have to run sync (in the VM) just prior to creating the snapshot. Mhm, but between executing sync and executing snap there is

Re: RBD fio Performance concerns

2012-11-22 Thread Mark Kampe
Sequential is faster than random on a disk, but we are not doing I/O to a disk, but a distributed storage cluster: small random operations are striped over multiple objects and servers, and so can proceed in parallel and take advantage of more nodes and disks. This parallelism can

incremental rbd export / sparse files?

2012-11-22 Thread Stefan Priebe - Profihost AG
Hello list, right now an rbd export exports exactly the size of the disk even if there is KNOWN free space. Is this intended to change? Might it be possible to export just the differences between snapshots and merge them later? Greets, Stefan

Re: rbd map command hangs for 15 minutes during system start up

2012-11-22 Thread Nick Bartos
FYI the build which included all 3.5 backports except patch #50 is still going strong after 21 builds. On Wed, Nov 21, 2012 at 9:34 AM, Nick Bartos n...@pistoncloud.com wrote: With 8 successful installs already done, I'm reasonably confident that it's patch #50. I'm making another build which

[PATCH] overflow of int ret: use ssize_t for ret

2012-11-22 Thread Stefan Priebe
When acb->cmd is WRITE or DISCARD, block/rbd stores rcb->size into acb->ret. Look here: if (acb->cmd == RBD_AIO_WRITE || acb->cmd == RBD_AIO_DISCARD) { if (r < 0) { acb->ret = r; acb->error = 1; } else if (!acb->error) { acb->ret = rcb->size;

Re: [Qemu-devel] [PATCH] use int64_t for return values from rbd instead of int

2012-11-22 Thread Stefan Priebe - Profihost AG
Hello, I sent a new patch using ssize_t (subject: [PATCH] overflow of int ret: use ssize_t for ret). Stefan Am 22.11.2012 09:40, schrieb Peter Maydell: On 22 November 2012 08:23, Stefan Priebe - Profihost AG s.pri...@profihost.ag wrote: Am 21.11.2012 23:32, schrieb Peter Maydell: Looking

Re: [Qemu-devel] [PATCH] use int64_t for return values from rbd instead of int

2012-11-22 Thread Stefan Weil
Am 21.11.2012 21:53, schrieb Stefan Priebe - Profihost AG: Not sure about off_t. What is min and max size? Stefan off_t is a signed value which is used in the lseek function to address any byte of a seekable file. The range is typically 64 bit

Re: 'zombie snapshot' problem

2012-11-22 Thread Josh Durgin
On 11/21/2012 04:50 AM, Andrey Korolyov wrote: Hi, Somehow I have managed to produce an unkillable snapshot which can neither be removed itself nor allows its parent image to be removed: $ rbd snap purge dev-rack0/vm2 Removing all snapshots: 100% complete...done. I see one bug with 'snap purge' ignoring the return
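For reference, the usual checks before removing a snapshot; the snapshot name below is illustrative, and a protected snapshot with live clones cannot be removed until the clones are flattened or deleted:

    rbd snap ls dev-rack0/vm2                        # what snapshots are left?
    rbd children dev-rack0/vm2@snap1                 # any clones still depending on it?
    rbd snap unprotect dev-rack0/vm2@snap1           # only needed if the snapshot is protected
    rbd snap rm dev-rack0/vm2@snap1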

[TRIVIAL PATCH] bdi_register: Add __printf verification, fix arg mismatch

2012-11-22 Thread Joe Perches
__printf is useful to verify format and arguments. Signed-off-by: Joe Perches j...@perches.com --- fs/ceph/super.c |2 +- include/linux/backing-dev.h |1 + 2 files changed, 2 insertions(+), 1 deletions(-) diff --git a/fs/ceph/super.c b/fs/ceph/super.c index 2eb43f2..e7dbb5c

Re: Hangup during scrubbing - possible solutions

2012-11-22 Thread Sage Weil
On Thu, 22 Nov 2012, Andrey Korolyov wrote: Hi, In recent versions Ceph introduces some unexpected behavior for permanent connections (VM or kernel clients): after crash recovery, I/O will hang on the next planned scrub in the following scenario: - launch a bunch of clients

[PATCH] mds: fix CDir::_commit_partial() bug

2012-11-22 Thread Yan, Zheng
From: Yan, Zheng zheng.z@intel.com When a null dentry is encountered, CDir::_commit_partial() adds an OSD_TMAP_RM command to delete the dentry. But if the dentry is new, the OSD will not find the dentry when handling the command and the tmap update operation will fail totally. Signed-off-by:

Re: [PATCH] overflow of int ret: use ssize_t for ret

2012-11-22 Thread Stefan Priebe - Profihost AG
Signed-off-by: Stefan Priebe s.pri...@profihost.ag Am 22.11.2012 10:07, schrieb Stefan Priebe: When acb->cmd is WRITE or DISCARD, block/rbd stores rcb->size into acb->ret. Look here: if (acb->cmd == RBD_AIO_WRITE || acb->cmd == RBD_AIO_DISCARD) { if (r < 0) {

Re: RBD fio Performance concerns

2012-11-22 Thread Stefan Priebe - Profihost AG
In my test it was just recovering some replicas, not the whole OSD. Am 22.11.2012 um 16:35 schrieb Alexandre DERUMIER aderum...@odiso.com: But who cares? it's also on the 2nd node. or even on the 3rd if you have replicas 3. Yes, but rebuilding a dead node uses CPU and IOs. (but it should be

Re: RBD fio Performance concerns

2012-11-22 Thread Sébastien Han
Hi Mark, Well the most concerning thing is that I have 2 Ceph clusters and both of them show better rand than seq... I don't have enough background to argue with your assumptions, but I could try to shrink my test platform to a single OSD and see how it performs. We'll keep in touch on that one. But it

Problem with SGID and new inode

2012-11-22 Thread Giorgos Kappes
Hi, I was looking at the source code of the ceph MDS and in particular at the function CInode* Server::prepare_new_inode(...) in the mds/Server.cc file which creates a new inode. At lines 1739-1747 the code checks if the parent directory has the set-group-ID bit set. If this bit is set and the
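The POSIX behaviour the MDS is expected to reproduce can be seen on any local filesystem; a small illustration (the group name is arbitrary):

    mkdir sgid-dir
    chgrp users sgid-dir
    chmod g+s sgid-dir                # set the set-group-ID bit on the directory
    touch sgid-dir/file
    ls -l sgid-dir/file               # the new file inherits the directory's group
    mkdir sgid-dir/subdir
    ls -ld sgid-dir/subdir            # a new subdirectory inherits the group and the SGID bit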

Very bad behavior when

2012-11-22 Thread Sylvain Munaut
Hi, I know that Ceph has time-synced servers as a requirement, but I think a sane failure mode, like a message in the logs instead of uncontrollably growing memory usage, would be a good idea. I had the NTP process die on me tonight on an OSD (for an unknown reason so far ...) and the clock went

Re: RBD fio Performance concerns

2012-11-22 Thread Mark Nelson
I haven't played a whole lot with SSD only OSDs yet (other than noting last summer that iop performance wasn't as high as I wanted it). Is a second partition on the SSD for the journal not an option for you? Mark On 11/22/2012 08:42 AM, Stefan Priebe - Profihost AG wrote: Am 22.11.2012
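A minimal sketch of what that suggestion would look like in ceph.conf; the device path and OSD id are assumptions:

    # excerpt to add to /etc/ceph/ceph.conf
    [osd.0]
        # journal on the second partition of the same SSD (device path is an assumption)
        osd journal = /dev/sda2
        journal dio = true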

Re: RBD Backup

2012-11-22 Thread Josh Durgin
On 11/22/2012 05:13 AM, Wido den Hollander wrote: On 11/22/2012 06:57 PM, Stefan Priebe - Profihost AG wrote: Hi, Am 21.11.2012 14:47, schrieb Wido den Hollander: The snapshot isn't consistent since it has no way of telling the VM to flush its buffers. To make it consistent you have to

Re: RBD Backup

2012-11-22 Thread Stefan Priebe - Profihost AG
Hi Josh, Am 22.11.2012 22:08, schrieb Josh Durgin: This way you have a pretty consistent snapshot. You can get an entirely consistent snapshot using xfs_freeze to stop I/O to the fs until you thaw it. It's done at the vfs level these days, so it works on all filesystems. Great thing we
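Putting the pieces of this thread together, a rough backup sequence might look like the following; the mount point, pool and image names are assumptions:

    # inside the VM: quiesce the filesystem
    xfs_freeze -f /data
    # on a machine with client access to the pool: take the snapshot
    rbd snap create kvmpool1/vm-113-disk-1@backup-$(date +%Y%m%d)
    # inside the VM: resume I/O
    xfs_freeze -u /data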

Re: RBD fio Performance concerns

2012-11-22 Thread Alexandre DERUMIER
But who cares? it's also on the 2nd node. or even on the 3rd if you have replicas 3. Yes, but rebuilding a dead node uses CPU and IOs. (But it should be benchmarked too, to see the impact on production.) - Mail original - De: Stefan Priebe - Profihost AG s.pri...@profihost.ag À:

Re: Problem with SGID and new inode

2012-11-22 Thread Sage Weil
On Thu, 22 Nov 2012, Giorgos Kappes wrote: Hi, I was looking at the source code of the ceph MDS and in particular at the function CInode* Server::prepare_new_inode(...) in the mds/Server.cc file which creates a new inode. At lines 1739-1747 the code checks if the parent directory has the

Re: incremental rbd export / sparse files?

2012-11-22 Thread Sage Weil
On Thu, 22 Nov 2012, Stefan Priebe - Profihost AG wrote: Hello list, right now an rbd export exports exactly the size of the disk even if there is KNOWN free space. Is this intended to change? Might it be possible to export just the differences between snapshots and merge them later? We were

Re: RBD fio Performance concerns

2012-11-22 Thread Alexandre DERUMIER
Haven't tested that. But does this make sense? I mean data goes to the journal on the disk, and the same disk then has to copy the data from partition A to partition B. Why is this an advantage? Well, if you are cpu limited, I don't think you can use all 8*35000 iops by node. So, maybe a benchmark can tell us if the

Re: RBD fio Performance concerns

2012-11-22 Thread Sébastien Han
But who cares? it's also on the 2nd node. or even on the 3rd if you have replicas 3. Yes, but you could also suffer a crash while writing the first replica. If the journal is in tmpfs, there is nothing to replay. On Thu, Nov 22, 2012 at 4:35 PM, Alexandre DERUMIER aderum...@odiso.com wrote:

Re: RBD fio Performance concerns

2012-11-22 Thread Sébastien Han
The journal is running on tmpfs for me, but that changes nothing. I don't think it works then. According to the doc: Enables using libaio for asynchronous writes to the journal. Requires journal dio set to true. On Thu, Nov 22, 2012 at 12:48 PM, Stefan Priebe - Profihost AG s.pri...@profihost.ag
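The line being quoted is the description of the journal aio option; the relevant ceph.conf settings look roughly like this (the size value is illustrative):

    # excerpt to add to /etc/ceph/ceph.conf
    [osd]
        osd journal size = 1024
        # direct I/O to the journal; this is what tmpfs cannot provide
        journal dio = true
        # libaio for asynchronous journal writes; requires journal dio = true
        journal aio = true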

Debian/Ubuntu packages for ceph-deploy

2012-11-22 Thread Martin Gerhard Loschwitz
Hi folks, I figured it might be a cool thing to have packages of ceph-deploy for Debian and Ubuntu 12.04; I took the time and created them (along with packages of python-pushy, which ceph-deploy needs but which was not present in the Debian archive and thus in the Ubuntu archive either). They

RE: [Discussion] Enhancement for CRUSH rules

2012-11-22 Thread Chen, Xiaoxi
Hi list, I am thinking about the possibility of adding some primitives to CRUSH to meet the following user stories: A. Same host, same rack. To balance between availability and performance, one may want a rule like: 3 replicas, where Replica 1 and Replica 2 should be in the same rack while
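Part of story A can already be expressed with existing CRUSH steps; a hedged sketch of a rule that keeps two replicas in one rack and the third in another (bucket type and names are assumptions):

    rule two_in_rack_one_out {
        ruleset 5
        type replicated
        min_size 3
        max_size 3
        step take default
        step choose firstn 2 type rack        # pick two racks
        step chooseleaf firstn 2 type host    # two hosts in each chosen rack
        step emit
    }
    # with size 3 the first three results are used: two hosts from the first rack, one from the second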

RE: incremental rbd export / sparse files?

2012-11-22 Thread Dietmar Maurer
Step 2 is to export the incremental changes. The hangup there is figuring out a generic and portable file format to represent those incremental changes; we'd rather not invent something ourselves that is ceph-specific. Suggestions welcome! AFAIK, both 'zfs' and 'btrfs' already have such
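For illustration only, the kind of CLI workflow under discussion, assuming an export-diff / import-diff style interface that rbd did not have at the time of this thread:

    rbd snap create pool/image@snap1
    rbd export-diff pool/image@snap1 image-to-snap1.diff                   # full export up to snap1
    rbd snap create pool/image@snap2
    rbd export-diff --from-snap snap1 pool/image@snap2 snap1-to-snap2.diff
    rbd import-diff snap1-to-snap2.diff backuppool/image                   # replay onto a backup image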

Re: Cephfs losing files and corrupting others

2012-11-22 Thread Nathan Howell
I upgraded to 0.54 and now there are some hints in the logs. The directories referenced in the log entries are now missing: 2012-11-23 07:28:04.802864 mds.0 [ERR] loaded dup inode 100662f [2,head] v3851654 at /xxx/20120203, but inode 100662f.head v3853093 already exists at