Re: Ignoresync hack no longer applies on 3.6.5

2012-11-05 Thread Sage Weil
On Sun, 4 Nov 2012, Nick Bartos wrote: Unfortunately I'm still seeing deadlocks. The trace was taken after a 'sync' from the command line had been hung for a couple of minutes. There was only one debug message (one fs on the system was mounted with 'mand'): This was with the updated patch

Re: Ubuntu R-Series plans for Ceph

2012-11-05 Thread James Page
-BEGIN PGP SIGNED MESSAGE- Hash: SHA256 Hi Wido On 04/11/12 21:54, Wido den Hollander wrote: I've started to put together a Launchpad Blueprint to act as a placeholder for discussion at the Ubuntu Developer Summit in a couple of weeks time. I'd be interested in feedback on what

RADOS extensions directory

2012-11-05 Thread Pascal de Bruijn
Hi, In light of #cephday I noticed something that struck me as odd. The presentation listed the following as the RADOS extensions directory (if I recall correctly): /var/lib/rados/*.so However, it's very atypical to have shared libraries in /var. As far as I know /var/lib is typically used

massive rbd speed decrease with latest git master

2012-11-05 Thread Stefan Priebe - Profihost AG
Hello list, while updating ceph to the latest master I noticed this: Commit a01b112d71a0b6a1bb206d53867d13536d17bbf6: rand. 4k WRITE: 3000 iop/s Commit 72a710ac47355ae3ca1d055be1496c91956d549d: rand. 4k WRITE: 900 iop/s Same for rand 4k read - haven't tested sequential stuff. Stefan
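
To narrow down which commit between those two introduced the drop, a plain git bisect over that range would be one option (not something suggested in the thread, just a standard approach; rebuild and re-run the rand 4k write benchmark at each step):

    git bisect start
    git bisect bad  72a710ac47355ae3ca1d055be1496c91956d549d
    git bisect good a01b112d71a0b6a1bb206d53867d13536d17bbf6
    # rebuild, benchmark, then mark each step:
    git bisect good    # or: git bisect bad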

Re: massive rbd speed decrease with latest git master

2012-11-05 Thread Stefan Priebe - Profihost AG
On 05.11.2012 11:39, Stefan Priebe - Profihost AG wrote: Hello list, while updating ceph to the latest master I noticed this: Commit a01b112d71a0b6a1bb206d53867d13536d17bbf6: rand. 4k WRITE: 3000 iop/s Commit 72a710ac47355ae3ca1d055be1496c91956d549d: rand. 4k WRITE: 900 iop/s Same for rand

Re: Large numbers of OSD per node

2012-11-05 Thread Wido den Hollander
Hi, On 05-11-12 08:14, Andrew Thrift wrote: Hi, We are evaluating CEPH for deployment. I was wondering if there are any current best practices around the number of OSDs per node? e.g. We are looking at deploying 3 nodes, each with 72x SAS disks, and 2x 10gigabit Ethernet bonded. Would

Re: UBUNTU kernel version and ceph 0.51 compatibility

2012-11-05 Thread Wido den Hollander
On 05-11-12 05:55, hemant surale wrote: Hi Community, I want to know whether there is any compatibility chart between Ubuntu kernel versions and Ceph v0.51, or whether anyone knows the best-suited Ubuntu kernel version for Ceph v0.51. Till now I have been working with Ubuntu 12.04, kernel 3.2.0

Re: RADOS extensions directory

2012-11-05 Thread Wido den Hollander
On 05-11-12 10:30, Pascal de Bruijn wrote: Hi, In light of #cephday I noticed something that struck me as odd. The presentation listed the following as the RADOS extensions directory (if I recall correctly): /var/lib/rados/*.so /var/lib/rados-classes/*.so It's handled by this configuration
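
The directory is not hard-coded but comes from an OSD config option; a minimal ceph.conf sketch, assuming the usual option name and a distribution-dependent default path:

    [osd]
        ; where the OSD loads RADOS class plugins (*.so) from
        osd class dir = /usr/lib/rados-classes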

use striping feature

2012-11-05 Thread Stefan Priebe - Profihost AG
Hello list, is the following syntax to use the new rbd v2 striping feature correct? rbd create -p kvmstor --format 2 --size 32000 --stripe-count 8 --stripe-unit 131072 -s 1048576 $imagename Idea is to have 8 stripes per 1MB object size. Or are there any other recommendations regarding
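
For comparison, a hedged sketch of the intended layout with the object size spelled out explicitly via --order (2^20 = 1MB objects, so 8 stripe units of 128KB fill exactly one object); the pool and image name are taken from the question above:

    rbd create -p kvmstor --format 2 --size 32000 \
        --order 20 --stripe-unit 131072 --stripe-count 8 $imagename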

Re: RGW: Pools .rgw .rgw.control .users.uid .users.email .users

2012-11-05 Thread Sylvain Munaut
Hi, Also, I assume those pools will actually be pretty small and so I can just leave them with PG_NUM=8 without much issue? Data will not be distributed evenly across the cluster, and there may be high contention on these PGs, so it'd affect performance. But what is stored in those

Re: Large numbers of OSD per node

2012-11-05 Thread Mark Nelson
On 11/05/2012 05:01 AM, Wido den Hollander wrote: Hi, On 05-11-12 08:14, Andrew Thrift wrote: Hi, We are evaluating CEPH for deployment. I was wondering if there are any current best practices around the number of OSDs per node? e.g. We are looking at deploying 3 nodes, each with 72x SAS

changing pg_num / pgp_num after adding more osds

2012-11-05 Thread Stefan Priebe - Profihost AG
Hello list, Is there a way to change the number of pg_num / pgp_num after adding more OSDs? I mean I would like to start with 16 OSDs, but I think I'll expand over time to up to 100 OSDs, so I think I need to tune pg_num / pgp_num. Greets, Stefan
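
As a rough illustration of why the initial value matters, using the commonly cited rule of thumb of about 100 PGs per OSD divided by the replica count (an approximation, not an official formula):

    16 OSDs:   16 * 100 / 3 replicas ≈  533  -> round to a power of two:  512
    100 OSDs: 100 * 100 / 3 replicas ≈ 3333  -> round to a power of two: 4096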

recommended cache setting for rbd image

2012-11-05 Thread Stefan Priebe - Profihost AG
Hello list, right now I'm testing rbd block devices with kvm using the default cache setting (no cache). Is there any recommended value for RBD? (I don't want to lose data) Stefan

What would a good OSD node hardware configuration look like?

2012-11-05 Thread Dennis Jacobfeuerborn
Hi, I'm thinking about building a ceph cluster and I'm wondering what a good configuration would look like for 4-8 (and maybe more) 2HU 8-disk or 3HU 16-disk systems. Would it make sense to make each disk an individual OSD, or should I perhaps create several RAID-0 arrays and create OSDs from those? Also
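
For the one-OSD-per-disk layout the config simply lists one OSD section per spindle; a minimal mkcephfs-era sketch, with host and device names purely illustrative:

    [osd.0]
        host = node1
        devs = /dev/sdb
    [osd.1]
        host = node1
        devs = /dev/sdc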

Re: [PATCH V2 2/2] ceph: Fix i_size update race

2012-11-05 Thread Sage Weil
On Mon, 5 Nov 2012, Yan, Zheng wrote: On Mon, Nov 5, 2012 at 12:45 AM, Sage Weil s...@inktank.com wrote: On Sun, 4 Nov 2012, Yan, Zheng wrote: A short write happens when we fail to get the 'Fb' cap for all pages. Why shouldn't we fall back to sync write? I think some user programs assume

Re: massive rbd speed decrease with latest git master

2012-11-05 Thread Stefan Priebe
On 05.11.2012 11:39, Stefan Priebe - Profihost AG wrote: Hello list, while updating ceph to the latest master I noticed this: Commit a01b112d71a0b6a1bb206d53867d13536d17bbf6: rand. 4k WRITE: 3000 iop/s Commit 72a710ac47355ae3ca1d055be1496c91956d549d: rand. 4k WRITE: 900 iop/s Same for rand

Re: [PATCH 1/2] mds: Don't acquire replica object's versionlock

2012-11-05 Thread Sage Weil
On Thu, 1 Nov 2012, Yan, Zheng wrote: From: Yan, Zheng zheng.z@intel.com Both CInode's and CDentry's versionlocks are of type LocalLock. Acquiring a LocalLock in a replica object is useless and problematic. For example, if two requests try acquiring a replica object's versionlock, the first

Re: explicitly specifying pgnum on pool creation

2012-11-05 Thread Josh Durgin
On 11/04/2012 03:42 AM, Sage Weil wrote: The wip-explicit-pgnum changes the 'ceph osd pool create name pgnum' command to require the pg_num value instead of defaulting to 8. This would make it harder for users to get this wrong. I like this. It'd be great if we could add a pgnum argument to
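
With that change, pool creation would always spell out the PG count on the command line, e.g. (pool name and count purely illustrative):

    ceph osd pool create kvmstor 512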

Re: massive rbd speed decrease with latest git master

2012-11-05 Thread Mark Nelson
On 11/05/2012 12:48 PM, Stefan Priebe wrote: On 05.11.2012 11:39, Stefan Priebe - Profihost AG wrote: Hello list, while updating ceph to the latest master I noticed this: Commit a01b112d71a0b6a1bb206d53867d13536d17bbf6: rand. 4k WRITE: 3000 iop/s Commit

Re: massive rbd speed decrease with latest git master

2012-11-05 Thread Stefan Priebe
On 05.11.2012 19:58, Mark Nelson wrote: On 11/05/2012 12:48 PM, Stefan Priebe wrote: On 05.11.2012 11:39, Stefan Priebe - Profihost AG wrote: Hello list, while updating ceph to the latest master I noticed this: Commit a01b112d71a0b6a1bb206d53867d13536d17bbf6: rand. 4k WRITE: 3000 iop/s

Re: BUG: kvm crashing in void librbd::AioCompletion::complete_request

2012-11-05 Thread Stefan Priebe
On 04.11.2012 15:12, Sage Weil wrote: On Sun, 4 Nov 2012, Stefan Priebe wrote: Can I merge wip-rbd-read into master? Yeah. I'm going to do a bit more testing first before I do it, but it should apply cleanly. Hopefully later today. Thanks - seems to be fixed with wip-rbd-read but I have

Re: explicitly specifying pgnum on pool creation

2012-11-05 Thread Sage Weil
On Mon, 5 Nov 2012, Josh Durgin wrote: On 11/04/2012 03:42 AM, Sage Weil wrote: The wip-explicit-pgnum changes the 'ceph osd pool create name pgnum' command to require the pg_num value instead of defaulting to 8. This would make it harder for users to get this wrong. I like this. It'd

Re: use striping feature

2012-11-05 Thread Josh Durgin
On 11/05/2012 04:03 AM, Stefan Priebe - Profihost AG wrote: Hello list, is the following syntax to use the new rbd v2 striping feature correct? rbd create -p kvmstor --format 2 --size 32000 --stripe-count 8 --stripe-unit 131072 -s 1048576 $imagename Idea is to have 8 stripes per 1MB object

Re: BUG: kvm crashing in void librbd::AioCompletion::complete_request

2012-11-05 Thread Stefan Priebe
On 05.11.2012 20:33, Stefan Priebe wrote: On 04.11.2012 15:12, Sage Weil wrote: On Sun, 4 Nov 2012, Stefan Priebe wrote: Can I merge wip-rbd-read into master? Yeah. I'm going to do a bit more testing first before I do it, but it should apply cleanly. Hopefully later today. Thanks -

Re: OSD deadlock with cephfs client and OSD on same machine

2012-11-05 Thread Cláudio Martins
On Fri, 1 Jun 2012 11:35:37 +0200 Amon Ott a@m-privacy.de wrote: After backporting syncfs() support into Debian stable libc6 2.11 and recompiling Ceph with it, our test cluster is now running with syncfs(). Hi, We're running OSDs on top of Debian wheezy, which unfortunately has
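
As a quick sanity check before rebuilding, syncfs() needs both kernel support (2.6.39 or newer) and a glibc that exposes the wrapper (2.14 or newer, or a backported patch as described above); a hedged sketch:

    uname -r                    # kernel >= 2.6.39 provides the syncfs syscall
    ldd --version | head -n1    # glibc < 2.14 has no syncfs() wrapper by default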

Re: BUG: kvm crashing in void librbd::AioCompletion::complete_request

2012-11-05 Thread Andrey Korolyov
On Mon, Nov 5, 2012 at 11:33 PM, Stefan Priebe s.pri...@profihost.ag wrote: On 04.11.2012 15:12, Sage Weil wrote: On Sun, 4 Nov 2012, Stefan Priebe wrote: Can I merge wip-rbd-read into master? Yeah. I'm going to do a bit more testing first before I do it, but it should apply cleanly.

Re: BUG: kvm crashing in void librbd::AioCompletion::complete_request

2012-11-05 Thread Stefan Priebe
On 05.11.2012 21:20, Andrey Korolyov wrote: On Mon, Nov 5, 2012 at 11:33 PM, Stefan Priebe s.pri...@profihost.ag wrote: On 04.11.2012 15:12, Sage Weil wrote: On Sun, 4 Nov 2012, Stefan Priebe wrote: Can I merge wip-rbd-read into master? Yeah. I'm going to do a bit more testing first

Re: changing pg_num / pgp_num after adding more osds

2012-11-05 Thread Josh Durgin
On 11/05/2012 06:14 AM, Stefan Priebe - Profihost AG wrote: Hello list, Is there a way to change the number of pg_num / pgp_num after adding more OSDs? The pg_num/pgp_num settings are only used by mkcephfs at install time. I mean I would like to start with 16 OSDs but I think I'll expand

Re: RGW: Pools .rgw .rgw.control .users.uid .users.email .users

2012-11-05 Thread Yehuda Sadeh
On Mon, Nov 5, 2012 at 4:11 AM, Sylvain Munaut s.mun...@whatever-company.com wrote: Hi, Also, I assume those pools will actually be pretty small and so I can just leave them with PG_NUM=8 without much issue? Data will not be distributed evenly across the cluster, and there may be a high
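
If the default of 8 PGs does turn out to hurt, those metadata pools could be pre-created with a larger pg_num before radosgw first touches them, e.g. (counts purely illustrative):

    ceph osd pool create .rgw 64
    ceph osd pool create .rgw.control 64
    ceph osd pool create .users.uid 64
    ceph osd pool create .users.email 64
    ceph osd pool create .users 64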

Re: Large numbers of OSD per node

2012-11-05 Thread Andrew Thrift
Mark, Wido, Thank you very much for your informed responses. What you have mentioned makes a lot of sense. If we had a single node completely fail, we would have 72TB of data that needed to be replicated to a new OSD. This would take approximately 10.5 hours to complete over 2x Bonded 10gig
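
For reference, the 10.5 hour figure lines up with a back-of-the-envelope estimate (the ~75% effective link utilisation is an assumption):

    72 TB * 8 bit/byte  = 576 Tbit to re-replicate
    2 x 10 GbE bonded   = 20 Gbit/s  -> 576000 Gbit / 20 Gbit/s ≈ 8.0 h at line rate
    at ~75% utilisation (assumed)                               ≈ 10.7 h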

Re: python-ceph

2012-11-05 Thread Travis Rhoden
Hi Greg, I'm not familiar with Python packaging; can you talk about this a bit more? I'd be happy to. PyPI (the Python Package Index) is a repo on python.org for distributing/sharing Python projects. People can publish their code onto it for others to download. If you are a Perl guy, it's a

Re: python-ceph

2012-11-05 Thread Travis Rhoden
On Sat, Nov 3, 2012 at 2:59 PM, Sage Weil s...@inktank.com wrote: On Fri, 2 Nov 2012, Travis Rhoden wrote: Hi folks, Are there any plans to release python-ceph to PyPI? It would be nice to see it packaged up in distutils/egg format and added to PyPI, so that other Python packages can list
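
Purely hypothetically, if the bindings were published under that name, installing or depending on them would become a one-liner for downstream projects:

    # hypothetical: only meaningful once the package is actually on PyPI
    pip install python-ceph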

Re: What would a good OSD node hardware configuration look like?

2012-11-05 Thread Dennis Jacobfeuerborn
On 11/06/2012 01:14 AM, Josh Durgin wrote: On 11/05/2012 09:13 AM, Dennis Jacobfeuerborn wrote: Hi, I'm thinking about building a ceph cluster and I'm wondering what a good configuration would look like for 4-8 (and maybe more) 2HU 8-disk or 3HU 16-disk systems. Would it make sense to make

Fs to use?

2012-11-05 Thread Stefan Priebe - Profihost AG
Hello list, is there any recommendation regarding the fs? I mean, btrfs is still experimental; would you still use it with ceph in production? Do I need big metadata with btrfs? (It seems to make btrfs slow.) Greets, Stefan
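
For context, "big metadata" presumably refers to the larger btrfs metadata block sizes selected at mkfs time; a hedged sketch, with flag spelling per the btrfs-progs of that era and the device name purely illustrative:

    mkfs.btrfs -l 64k -n 64k /dev/sdb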

Re: changing pg_num / pgp_num after adding more osds

2012-11-05 Thread Stefan Priebe - Profihost AG
On 06.11.2012 00:45, Josh Durgin wrote: On 11/05/2012 06:14 AM, Stefan Priebe - Profihost AG wrote: Hello list, Is there a way to change the number of pg_num / pgp_num after adding more OSDs? The pg_num/pgp_num settings are only used by mkcephfs at install time. I mean I would like to

Re: recommended cache setting for rbd image

2012-11-05 Thread Stefan Priebe - Profihost AG
On 06.11.2012 00:57, Josh Durgin wrote: On 11/05/2012 06:53 AM, Stefan Priebe - Profihost AG wrote: Hello list, right now I'm testing rbd block devices with kvm using the default cache setting (no cache). Is there any recommended value for RBD? (I don't want to lose data) It acts like a well-behaved
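
For completeness, enabling the cache is a client-side config setting plus a matching qemu cache mode; a hedged sketch, assuming the option names from the rbd caching docs of that time (writeback is only safe if the guest sends flushes):

    [client]
        rbd cache = true

    # and on the qemu command line, e.g.:
    -drive format=rbd,file=rbd:kvmstor/$imagename,cache=writeback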

Re: What would a good OSD node hardware configuration look like?

2012-11-05 Thread Stefan Priebe - Profihost AG
On 06.11.2012 01:14, Josh Durgin wrote: On 11/05/2012 09:13 AM, Dennis Jacobfeuerborn wrote: Hi, I'm thinking about building a ceph cluster and I'm wondering what a good configuration would look like for 4-8 (and maybe more) 2HU 8-disk or 3HU 16-disk systems. Would it make sense to make each