Re: 0.55 init script Issue?
On 06/12/12 13:35, Sage Weil wrote:
> Yeah, this or something very similar is definitely the correct solution. Sage recently added the ceph upstart job, and we didn't put it through sufficient verification prior to release to catch this issue. Users who aren't using upstart (I expect that's all of them) should just delete the job after running the package install. We'll certainly sort this out prior to the next release; I'm not sure if we want to roll a v0.55.1 right away or not.
> Let's push it to the testing branch, but make sure any other fixes are there before rolling a .1.. maybe tomorrow?
> I've pushed this to the testing branch. If someone wants to verify that the packages built at http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/testing/ are fixed, that would be fabulous!

I think the radosgw init script and upstart configuration are going to conflict in a similar way to the ceph one. I've been working on integrating the upstart configurations into the Ubuntu distro packaging of ceph; it is possible to install multiple upstart configurations into single binary packages using dh_installinit:

    dh_installinit --no-start
    # Install upstart configurations using dh_installinit
    for conf in `ls -1 src/upstart/ceph-*.conf | grep -v mds`; do \
        name=`basename $$conf | cut -d . -f 1`; \
        cp $$conf debian/ceph.$$name.upstart; \
        dh_installinit -pceph --upstart-only --no-start --name=$$name; \
    done
    for conf in `ls -1 src/upstart/ceph-mds*.conf`; do \
        name=`basename $$conf | cut -d . -f 1`; \
        cp $$conf debian/ceph-mds.$$name.upstart; \
        dh_installinit -pceph-mds --upstart-only --no-start --name=$$name; \
    done
    for conf in `ls -1 src/upstart/radosgw*.conf`; do \
        name=`basename $$conf | cut -d . -f 1`; \
        cp $$conf debian/radosgw.$$name.upstart; \
        dh_installinit -pradosgw --upstart-only --no-start --name=$$name; \
    done

--
James Page
Ubuntu Core Developer
Debian Maintainer
james.p...@ubuntu.com

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: on disk encryption
On 19/09/12 02:53, Dustin Kirkland wrote:
>> Looking forward, another option might be to implement encryption inside btrfs (placeholder fields are there in the disk format, introduced along with the compression code way back when). This would let ceph-osd handle more of the key handling internally and do something like, say, only encrypt the current/ and snap_*/ subdirectories. Other ideas? Thoughts?
>> sage
> I love the idea of btrfs supporting encryption natively, much like it does compression. It may be some time before that happens, so in the meantime, I'd love to see Ceph support dm-crypt and/or eCryptfs beneath.

Has this discussion progressed into any sort of implementation yet? It sounds like this is going to be a key feature for users who want top-to-bottom encryption of data right down to the block level.

--
James Page
Ubuntu Core Developer
Debian Maintainer
james.p...@ubuntu.com
Re: A couple of OSD-crashes after serious network trouble
Hi Sam,

helpful input.. and... not so...

On 12/07/2012 10:18 PM, Samuel Just wrote:
> Ah... unfortunately doing a repair in these 6 cases would probably result in the wrong object surviving. It should work, but it might corrupt the rbd image contents. If the images are expendable, you could repair and then delete the images.
>
> The red flag here is that the known size is smaller than the other size. This indicates that it most likely chose the wrong file as the correct one, since rbd image blocks usually get bigger over time. To fix this, you will need to manually copy the file for the larger of the two object replicas to replace the smaller of the two object replicas.
>
> For the first, soid 87c96f10/rb.0.47d9b.1014b7b4.02df/head//65 in pg 65.10:
>
> 1) Find the object on the primary and the replica (from above, primary is 12 and replica is 40). You can use find in the primary and replica current/65.10_head directories to look for a file matching *rb.0.47d9b.1014b7b4.02df*. The file name should be 'rb.0.47d9b.1014b7b4.02df__head_87C96F10__65', I think.
> 2) Stop the primary and replica osds.
> 3) Compare the file sizes for the two files -- you should find that the file sizes do not match.
> 4) Replace the smaller file with the larger one (you'll probably want to keep a copy of the smaller one around, just in case).
> 5) Restart the osds and scrub pg 65.10 -- the pg should come up clean (possibly with a relatively harmless stat mismatch).

been there. On OSD.12 it's:

    -rw-r--r-- 1 root root  699904 Dec  9 06:25 rb.0.47d9b.1014b7b4.02df__head_87C96F10__41

on OSD.40:

    -rw-r--r-- 1 root root 4194304 Dec  9 06:25 rb.0.47d9b.1014b7b4.02df__head_87C96F10__41

Going by a short glance into the files, there are some readable syslog entries in both. As bad luck would have it in this example, the shorter file contains the more current entries?! What exactly happens if I try to copy or export the file? Which block will be chosen? The VM is running as I'm writing, so flexibility is reduced.

Regards,

Oliver.
If this worked out correctly, you can repeat for the other 5 cases. Let me know if you have any questions.
-Sam

On Fri, Dec 7, 2012 at 11:09 AM, Oliver Francke oliver.fran...@filoo.de wrote:
> Hi Sam,
>
> Am 07.12.2012 um 19:37 schrieb Samuel Just sam.j...@inktank.com:
>> That is very likely to be one of the merge_log bugs fixed between 0.48 and 0.55. I could confirm with a stacktrace from gdb with line numbers, or the remainder of the logging dumped when the daemon crashed. My understanding of your situation is that currently all pgs are active+clean, but you are missing some rbd image headers and some rbd images appear to be corrupted. Is that accurate?
>> -Sam
>
> Thanks for dropping in. Uhm, almost correct; there are now 6 pgs in state inconsistent:
>
>     HEALTH_WARN 6 pgs inconsistent
>     pg 65.da is active+clean+inconsistent, acting [1,33]
>     pg 65.d7 is active+clean+inconsistent, acting [13,42]
>     pg 65.10 is active+clean+inconsistent, acting [12,40]
>     pg 65.f is active+clean+inconsistent, acting [13,31]
>     pg 65.75 is active+clean+inconsistent, acting [1,33]
>     pg 65.6a is active+clean+inconsistent, acting [13,31]
>
> I know which images are affected, but does a repair help?
>
>     0 log [ERR] : 65.10 osd.40: soid 87c96f10/rb.0.47d9b.1014b7b4.02df/head//65 size 4194304 != known size 699904
>     0 log [ERR] : 65.6a osd.31: soid 19a2526a/rb.0.2dcf2.1da2a31e.0737/head//65 size 4191744 != known size 2757632
>     0 log [ERR] : 65.75 osd.33: soid 20550575/rb.0.2d520.5c17a6e3.0339/head//65 size 4194304 != known size 1238016
>     0 log [ERR] : 65.d7 osd.42: soid fa3a5d7/rb.0.2c2a8.12ec359d.205c/head//65 size 4194304 != known size 1382912
>     0 log [ERR] : 65.da osd.33: soid c2a344da/rb.0.2be17.cb4bd69.0081/head//65 size 4191744 != known size 1815552
>     0 log [ERR] : 65.f osd.31: soid e8d2430f/rb.0.2d1e9.1339c5dd.0c41/head//65 size 2424832 != known size 2331648
>
> Or make things worse?
> I could only check 14 out of 20 OSDs so far, because on two older nodes a scrub leads to slow requests… a couple of minutes, so VMs got stalled… customers pressing the reset button, so losing caches…
>
> Comments welcome,
>
> Oliver.
>
> On Fri, Dec 7, 2012 at 6:39 AM, Oliver Francke oliver.fran...@filoo.de wrote:
>> Hi,
>>
>> is the following a known one, too? Would be good to get it out of my head:
>>
>>     /var/log/ceph/ceph-osd.40.log.1.gz: 1: /usr/bin/ceph-osd() [0x706c59]
>>     /var/log/ceph/ceph-osd.40.log.1.gz: 2: (()+0xeff0) [0x7f7f306c0ff0]
>>     /var/log/ceph/ceph-osd.40.log.1.gz: 3: (gsignal()+0x35) [0x7f7f2f35f1b5]
>>     /var/log/ceph/ceph-osd.40.log.1.gz: 4: (abort()+0x180) [0x7f7f2f361fc0]
>>     /var/log/ceph/ceph-osd.40.log.1.gz: 5: (__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f7f2fbf3dc5]
>>     /var/log/ceph/ceph-osd.40.log.1.gz: 6: (()+0xcb166) [0x7f7f2fbf2166]
>>     /var/log/ceph/ceph-osd.40.log.1.gz: 7: (()+0xcb193) [0x7f7f2fbf2193]
>>     /var/log/ceph/ceph-osd.40.log.1.gz: 8: (()+0xcb28e) [0x7f7f2fbf228e]
>>     /var/log/ceph/ceph-osd.40.log.1.gz:
Re: [Qemu-devel] [PATCHv6] rbd block driver fix race between aio completion and aio cancel
On 30.11.2012 14:50, Stefan Hajnoczi wrote:
> On Fri, Nov 30, 2012 at 9:55 AM, Stefan Priebe s.pri...@profihost.ag wrote:
>> This one fixes a race, which qemu also had in the iscsi block driver, between cancellation and I/O completion. qemu_rbd_aio_cancel was not synchronously waiting for the end of the command. To achieve this, it introduces a new status flag which uses -EINPROGRESS.
>>
>> Changes since PATCHv5:
>> - qemu_aio_release has to be done in qemu_rbd_aio_cancel if I/O was cancelled
>>
>> Changes since PATCHv4:
>> - removed unnecessary qemu_vfree of acb->bounce as BH will always run
>>
>> Changes since PATCHv3:
>> - removed unnecessary if condition in rbd_start_aio as we haven't started the I/O yet
>> - moved acb->status = 0 to rbd_aio_bh_cb so qemu_aio_wait always waits until the BH was executed
>>
>> Changes since PATCHv2:
>> - fixed missing braces
>> - added vfree for bounce
>>
>> Signed-off-by: Stefan Priebe s.pri...@profihost.ag
>> ---
>>  block/rbd.c | 20 ++++++++++++--------
>>  1 file changed, 12 insertions(+), 8 deletions(-)
>
> Reviewed-by: Stefan Hajnoczi stefa...@gmail.com

Thanks, applied to the block branch.

For future patches, please put a --- line between the real commit message (including the SoB, of course) and the changelog, so that git am automatically removes the changelog.

Kevin
Re: on disk encryption
On Monday, December 10, 2012 at 1:17 AM, James Page wrote:
> Has this discussion progressed into any sort of implementation yet? It sounds like this is going to be a key feature for users who want top-to-bottom encryption of data right down to the block level.

Peter is working on this now — I'll let him discuss the details. :)
-Greg
Re: [ceph-commit] [ceph/ceph] e6a154: osx: compile on OSX
On 9 Dec 2012, at 18:22, Noah Watkins wrote:
> On Sun, Dec 9, 2012 at 10:05 AM, Gregory Farnum g...@inktank.com wrote:
>> Oooh, very nice! Do you have a list of the dependencies that you actually needed to install?
> I can put that together. They were boost, gperf, fuse4x, cryptopp. I think that might have been it.

Is libaio really needed to build ceph-fuse? I use macports on my system, and the last time I tried to make a change set to let ceph/ceph-fuse build on my laptop it failed because I didn't have libaio, though I could just write a port for it.

>> Apart from breaking this up into smaller patches, we'll also want to reformat some of it. Rather than sticking an #if APPLE on top of every spin lock, we should have utility functions that do this for us. ;)
> Definitely. OSX has spinlock implementations for user space, but it's going to take some reading. For example, spinlocks in Ceph are initialized for shared memory, rather than the default private. It isn't clear from the documentation what the semantics of OSX spinlocks are, nor is it clear whether the shared memory attribute is needed.
>> Also, we should be able to find libatomic_ops for OS X (its parent project works under OS X), and we can use that to construct a spin lock if we think it'll be useful. I'm not too sure how effective its mutexes are at spinlock-y workloads.
> This patch set uses the OSX atomic inc/dec ops, rather than spinlocks.
> Another fun fact: msg/Pipe.cc and common/pipe.c are compiled into libcommon_la-Pipe.o and libcommon_la-pipe.o, but HFS+ is case-insensitive by default. The result is duplicate symbols. That took a while to figure out :P

Good catch; that might explain why my last look at ceph on osx failed so miserably.

Jimmy.

--
Senior Software Engineer, Digital Repository of Ireland (DRI)
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/ | jt...@tchpc.tcd.ie
Tel: +353-1-896-3847
Re: [ceph-commit] [ceph/ceph] e6a154: osx: compile on OSX
On Mon, 10 Dec 2012, Jimmy Tang wrote:
> On 9 Dec 2012, at 18:22, Noah Watkins wrote:
> [...]
> Is libaio really needed to build ceph-fuse? I use macports on my system and the last time I tried to make a change set to let ceph/ceph-fuse build on my laptop failed as I didn't have libaio, though I could just write a port for it.

libaio is only used by ceph-osd. Not needed by fuse.

sage

> [rest of quoted message snipped]
Re: [ceph-commit] [ceph/ceph] e6a154: osx: compile on OSX
On 12/10/2012 07:01 AM, Sage Weil wrote:
> On Mon, 10 Dec 2012, Jimmy Tang wrote:
>> Is libaio really needed to build ceph-fuse?
>> [...]
> libaio is only used by ceph-osd. Not needed by fuse.

An alternative on OSX could be aio-lite: https://trac.mcs.anl.gov/projects/aio-lite

It might perform better on linux as well because of the request serialization there, although that library was implemented a few years ago, and the linux implementation may have improved significantly since then. It also wouldn't be hard to do something similar with ceph thread structures instead of depending on an external library like this one.
-sam

> [rest of quoted message snipped]
mds doesn't start after upgrade to 0.55
Hi. I can't start the mds server. The logs show:

    --- begin dump of recent events ---
    -30 2012-12-09 16:49:52.181838 7f547966a780  5 asok(0x1d8c000) register_command perfcounters_dump hook 0x1d74010
    -29 2012-12-09 16:49:52.181884 7f547966a780  5 asok(0x1d8c000) register_command 1 hook 0x1d74010
    -28 2012-12-09 16:49:52.181890 7f547966a780  5 asok(0x1d8c000) register_command perf dump hook 0x1d74010
    -27 2012-12-09 16:49:52.181901 7f547966a780  5 asok(0x1d8c000) register_command perfcounters_schema hook 0x1d74010
    -26 2012-12-09 16:49:52.181907 7f547966a780  5 asok(0x1d8c000) register_command 2 hook 0x1d74010
    -25 2012-12-09 16:49:52.181910 7f547966a780  5 asok(0x1d8c000) register_command perf schema hook 0x1d74010
    -24 2012-12-09 16:49:52.181915 7f547966a780  5 asok(0x1d8c000) register_command config show hook 0x1d74010
    -23 2012-12-09 16:49:52.181919 7f547966a780  5 asok(0x1d8c000) register_command config set hook 0x1d74010
    -22 2012-12-09 16:49:52.181926 7f547966a780  5 asok(0x1d8c000) register_command log flush hook 0x1d74010
    -21 2012-12-09 16:49:52.181932 7f547966a780  5 asok(0x1d8c000) register_command log dump hook 0x1d74010
    -20 2012-12-09 16:49:52.181936 7f547966a780  5 asok(0x1d8c000) register_command log reopen hook 0x1d74010
    -19 2012-12-09 16:49:52.183484 7f547966a780  0 ceph version 0.55 (690f8175606edf37a3177c27a3949c78fd37099f), process ceph-mds, pid 2400
    -18 2012-12-09 16:49:52.184629 7f547966a780  1 finished global_init_daemonize
    -17 2012-12-09 16:49:52.187153 7f547966a780  5 asok(0x1d8c000) init /var/run/ceph/ceph-mds.a.asok
    -16 2012-12-09 16:49:52.187209 7f547966a780  5 asok(0x1d8c000) bind_and_listen /var/run/ceph/ceph-mds.a.asok
    -15 2012-12-09 16:49:52.187274 7f547966a780  5 asok(0x1d8c000) register_command 0 hook 0x1d720b8
    -14 2012-12-09 16:49:52.187291 7f547966a780  5 asok(0x1d8c000) register_command version hook 0x1d720b8
    -13 2012-12-09 16:49:52.187306 7f547966a780  5 asok(0x1d8c000) register_command git_version hook 0x1d720b8
    -12 2012-12-09 16:49:52.187316 7f547966a780  5 asok(0x1d8c000) register_command help hook 0x1d740c0
    -11 2012-12-09 16:49:52.187369 7f547966a780 10 monclient(hunting): build_initial_monmap
    -10 2012-12-09 16:49:52.187697 7f547966a780 10 monclient(hunting): init
     -9 2012-12-09 16:49:52.188025 7f547966a780 10 monclient(hunting): auth_supported 2
     -8 2012-12-09 16:49:52.188049 7f547966a780 10 monclient(hunting): _reopen_session
     -7 2012-12-09 16:49:52.188099 7f547966a780 10 monclient(hunting): _pick_new_mon picked mon.d con 0x1d9cf20 addr 10.0.1.244:6789/0
     -6 2012-12-09 16:49:52.188129 7f547966a780 10 monclient(hunting): _send_mon_message to mon.d at 10.0.1.244:6789/0
     -5 2012-12-09 16:49:52.188142 7f547966a780 10 monclient(hunting): renew_subs
     -4 2012-12-09 16:49:52.188224 7f5475915700  5 asok(0x1d8c000) entry start
     -3 2012-12-09 16:49:52.189164 7f5474112700  0 mds.-1.0 ms_handle_connect on 10.0.1.244:6789/0
     -2 2012-12-09 16:49:52.189942 7f5474112700 10 monclient(hunting): no handler for protocol 0
     -1 2012-12-09 16:49:52.189965 7f5474112700 10 monclient(hunting): none of our auth protocols are supported by the server
      0 2012-12-09 16:49:52.190925 7f547966a780 -1 *** Caught signal (Segmentation fault) **

Maybe an issue with cephx? How can I check this?

Cristian.
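The messages "no handler for protocol 0" and "none of our auth protocols are supported by the server" point at an authentication mismatch rather than a problem in the mds itself (the segfault follows the failed auth). Note that v0.55 enabled cephx authentication by default, so a cluster upgraded with an older ceph.conf can easily end up with daemons that disagree about auth. A sketch of the settings to check, assuming the 0.55-era three-way auth options (older configs used a single "auth supported" option instead):

```ini
[global]
    ; v0.55 turns cephx on by default; make every daemon and client agree.
    auth cluster required = cephx
    auth service required = cephx
    auth client required = cephx

    ; or, to run without authentication while debugging:
    ; auth cluster required = none
    ; auth service required = none
    ; auth client required = none
```

If cephx stays enabled, the mds also needs a keyring that the monitors know about; 'ceph auth list' should show a key for mds.a.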
Re: OSDs don't actually delete files when two CephFS (or more) are in use
Ok! I'm not in production yet, only testing. In about 1/2 months I will use it for real (waiting for v0.56 bobtail). Thanks for your consideration! Really appreciated.

I have another strange behavior, between ceph-fuse and the kernel mount.

With the ceph-fuse command:
- Client 1 creates 2000 files and does ls => 1/2 sec
- Client 2 does ls => 1/2 sec

With mount -t ceph ...:
- Client 1 creates 2000 files and does ls => a few ms
- Client 2 does ls => 4-5 sec

2012/12/10 Sam Lang sam.l...@inktank.com:
> On 12/10/2012 10:34 AM, Geoffrey Hartz wrote:
>> Hi. I waited about more than 15 minutes. I was doing some benchmarks when I noticed that the space never goes back to normal. How can I disable this behavior? Is it on the Ceph side? With one client, OSDs are cleaned after a few seconds; that sounds normal.
>
> Actually, after discussing with Greg and Sage, it sounds like this is a bug. The issue is that the client that didn't remove the file is caching the dentry indefinitely. I've created a ticket to track the issue here: http://tracker.newdream.net/issues/3601.
>
> Unfortunately, I don't think there's a good workaround for the time being. You can try to evict those cached dentries by creating/accessing a bunch of other files, but the default cache size on the client is 16384, which is a lot of files to touch just to free up the space for those removed files. :-) You can decrease the cache size with the config option client_cache_size:
>
>     [client]
>         client cache size = 128
>
> Then you only have to create/touch 128 files to evict the other files from the cache. That's not ideal, because reducing the cache size will affect your overall performance, but if you know that you won't be accessing a lot of files anyway, it's probably your best bet.
> -sam
>
>> 2012/12/10 Sam Lang sam.l...@inktank.com:
>>> On 12/10/2012 08:29 AM, Geoffrey Hartz wrote:
>>>> Hi!
>>>> I'm new to Ceph and I have a strange behavior with CephFS. The config is: Ubuntu 12.04, kernel 3.6.9, Ceph v0.55; 2 OSDs, 1 mon, 1 mds, all on the same host; 2 clients, on separate hosts.
>>>> Ceph.conf: http://paste.ubuntu.com/1423712/
>>>> To mount the share I use: sudo ceph-fuse -m 192.168.80.139:6789 /mnt
>>>> When I create a file on one client, the other sees the file, can download it, etc. But when I delete the file, both clients no longer see the file, BUT the file is still there on the OSDs (using disk space).
>>> Removing a file removes the directory entry (as you've seen), but the inode itself doesn't get removed until all references to it are dropped. The clients may cache the capability for those inodes for a period of time, so you're not seeing the references drop until they get evicted from the cache. Unmounting ensures that they get evicted from the client caches, so all references go to zero. Also, removal of the underlying objects is done lazily, so you may not see the space get freed up right away.
>>> -sam
>>>> When I umount from BOTH clients, the OSDs are updated and the file is actually deleted (same behavior with mount -t ceph). Am I missing something? Thanks!
>>>> --
>>>> Geoffrey HARTZ

--
Geoffrey HARTZ
Re: Few questions about Ceph
3) Forgive the style, it'll be going into the docs shortly :)

It's possible to have multiple independent crush hierarchies within the same crush map. Suppose you want pools to default to osds backed by large spinning disks, but have some pools mapped to osds backed by fast ssds:

    device 0 osd.0
    device 1 osd.1
    device 2 osd.2
    device 3 osd.3
    device 4 osd.4
    device 5 osd.5
    device 6 osd.6
    device 7 osd.7

    host ceph-osd-ssd-server-1 {
            id -1
            alg straw
            hash 0
            item osd.0 weight 1.00
            item osd.1 weight 1.00
    }

    host ceph-osd-ssd-server-2 {
            id -2
            alg straw
            hash 0
            item osd.2 weight 1.00
            item osd.3 weight 1.00
    }

    host ceph-osd-platter-server-1 {
            id -3
            alg straw
            hash 0
            item osd.4 weight 1.00
            item osd.5 weight 1.00
    }

    host ceph-osd-platter-server-2 {
            id -4
            alg straw
            hash 0
            item osd.6 weight 1.00
            item osd.7 weight 1.00
    }

    root platter {
            id -5
            alg straw
            hash 0
            item ceph-osd-platter-server-1 weight 2.00
            item ceph-osd-platter-server-2 weight 2.00
    }

    root ssd {
            id -6
            alg straw
            hash 0
            item ceph-osd-ssd-server-1 weight 2.00
            item ceph-osd-ssd-server-2 weight 2.00
    }

    rule data {
            ruleset 0
            type replicated
            min_size 2
            max_size 2
            step take platter
            step chooseleaf firstn 0 type host
            step emit
    }

    rule metadata {
            ruleset 1
            type replicated
            min_size 0
            max_size 10
            step take platter
            step chooseleaf firstn 0 type host
            step emit
    }

    rule rbd {
            ruleset 2
            type replicated
            min_size 0
            max_size 10
            step take platter
            step chooseleaf firstn 0 type host
            step emit
    }

    rule platter {
            ruleset 3
            type replicated
            min_size 0
            max_size 10
            step take platter
            step chooseleaf firstn 0 type host
            step emit
    }

    rule ssd {
            ruleset 4
            type replicated
            min_size 0
            max_size 10
            step take ssd
            step chooseleaf firstn 0 type host
            step emit
    }

    rule ssd-primary {
            ruleset 4
            type replicated
            min_size 0
            max_size 10
            step take ssd
            step chooseleaf firstn 1 type host
            step emit
            step take platter
            step chooseleaf firstn -1 type host
            step emit
    }

You can then set a pool to use the ssd rule with:

    ceph osd pool set poolname crush_ruleset 4

Similarly, using the ssd-primary rule will cause each pg in the pool to be placed with an ssd as the primary and platters as the replicas.
-Sam

On Mon, Dec 10, 2012 at 11:17 AM, Alexandre Maumené alexan...@maumene.org wrote:
> Hello all,
>
> I have a few questions about Ceph:
>
> 1) Is it possible to run a cluster with some latency between monitor nodes? Latency will be 30ms at worst.
>
> 2) When using RBD, what are the best practices for a direct mount using an XFS filesystem? And for qemu/kvm devices? I'm thinking about writeback, rbd_cache, ...
>
> 3) About the CRUSH map: how can I separate 2 pools onto different OSDs? I'd like to set up a cluster with different disks (like SATA/SAS), and I want to be able to specify on which disks (or OSDs) my data are going to be written.
>
> Thanks in advance for any answer.
>
> Regards,
Re: Few questions about Ceph
oops, ssd-primary should be ruleset 5
-Sam

On Mon, Dec 10, 2012 at 2:22 PM, Samuel Just sam.j...@inktank.com wrote:

3) Forgive the style, it'll be going into the docs shortly :)

It's possible to have multiple independent CRUSH hierarchies within the same crush map. Suppose you want pools to default to OSDs backed by large spinning disks, but want some pools mapped to OSDs backed by fast SSDs:

device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
device 6 osd.6
device 7 osd.7

host ceph-osd-ssd-server-1 {
        id -1
        alg straw
        hash 0
        item osd.0 weight 1.00
        item osd.1 weight 1.00
}

host ceph-osd-ssd-server-2 {
        id -2
        alg straw
        hash 0
        item osd.2 weight 1.00
        item osd.3 weight 1.00
}

host ceph-osd-platter-server-1 {
        id -3
        alg straw
        hash 0
        item osd.4 weight 1.00
        item osd.5 weight 1.00
}

host ceph-osd-platter-server-2 {
        id -4
        alg straw
        hash 0
        item osd.6 weight 1.00
        item osd.7 weight 1.00
}

root platter {
        id -5
        alg straw
        hash 0
        item ceph-osd-platter-server-1 weight 2.00
        item ceph-osd-platter-server-2 weight 2.00
}

root ssd {
        id -6
        alg straw
        hash 0
        item ceph-osd-ssd-server-1 weight 2.00
        item ceph-osd-ssd-server-2 weight 2.00
}

rule data {
        ruleset 0
        type replicated
        min_size 2
        max_size 2
        step take platter
        step chooseleaf firstn 0 type host
        step emit
}

rule metadata {
        ruleset 1
        type replicated
        min_size 0
        max_size 10
        step take platter
        step chooseleaf firstn 0 type host
        step emit
}

rule rbd {
        ruleset 2
        type replicated
        min_size 0
        max_size 10
        step take platter
        step chooseleaf firstn 0 type host
        step emit
}

rule platter {
        ruleset 3
        type replicated
        min_size 0
        max_size 10
        step take platter
        step chooseleaf firstn 0 type host
        step emit
}

rule ssd {
        ruleset 4
        type replicated
        min_size 0
        max_size 10
        step take ssd
        step chooseleaf firstn 0 type host
        step emit
}

rule ssd-primary {
        ruleset 4
        type replicated
        min_size 0
        max_size 10
        step take ssd
        step chooseleaf firstn 1 type host
        step emit
        step take platter
        step chooseleaf firstn -1 type host
        step emit
}

You can then set a pool to use the ssd rule with:

ceph osd pool set poolname crush_ruleset 4

Similarly, using the ssd-primary rule will cause each pg in the pool to be placed with an SSD as the primary and platters as the replicas.
-Sam

On Mon, Dec 10, 2012 at 11:17 AM, Alexandre Maumené alexan...@maumene.org wrote:

Hello all,

I have a few questions about Ceph:

1) Is it possible to run a cluster with some latency between monitor nodes? Latency will be 30ms at worst.

2) When using RBD, what are the best practices for a direct mount using an XFS filesystem? And for qemu/kvm devices? I'm thinking about writeback, rbd_cache, ...

3) About the CRUSH map: how can I separate 2 pools onto different OSDs? I'd like to set up a cluster with different disks (like SATA/SAS) and I want to be able to specify on which disks (or OSDs) my data are going to be written.

Thanks in advance for any answer.

Regards,

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
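For readers following along, the compiled map on a live cluster can be dumped, edited as text in the form shown above, and re-injected. A sketch of that round trip — the pool name "fastpool" is a placeholder, and this assumes a running cluster with crushtool installed and an admin keyring available:

```shell
ceph osd getcrushmap -o crushmap.bin        # dump the compiled CRUSH map
crushtool -d crushmap.bin -o crushmap.txt   # decompile to the text form above
$EDITOR crushmap.txt                        # add the ssd / ssd-primary rules
crushtool -c crushmap.txt -o crushmap.new   # recompile; catches syntax errors
ceph osd setcrushmap -i crushmap.new        # inject the edited map
ceph osd pool set fastpool crush_ruleset 4  # point a pool at the ssd rule
```

Recompiling with crushtool before injecting is the safety net here: a map that fails to compile never reaches the monitors.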
Re: Segmentation fault on rbd client ceph version 0.48.2argonaut
On 12/10/2012 01:54 PM, Vladislav Gorbunov wrote:

but access to iscsi/seodo1 and iscsi/siri1 fails on every rbd client host. Data is completely inaccessible.

root@bender:~# rbd info iscsi/seodo1
*** Caught signal (Segmentation fault) **
 in thread 7fb8c93f5780
 ceph version 0.48.2argonaut (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe)
 1: rbd() [0x41dfea]
 2: (()+0xfcb0) [0x7fb8c796fcb0]
 3: (()+0x16244d) [0x7fb8c6ae444d]
 4: (librbd::read_header_bl(librados::IoCtx&, std::string const&, ceph::buffer::list&, unsigned long*)+0xf9) [0x7fb8c8fadb99]
 5: (librbd::read_header(librados::IoCtx&, std::string const&, rbd_obj_header_ondisk*, unsigned long*)+0x82) [0x7fb8c8fadda2]
 6: (librbd::ictx_refresh(librbd::ImageCtx*)+0x90b) [0x7fb8c8fb05eb]
 7: (librbd::open_image(librbd::ImageCtx*)+0x1b5) [0x7fb8c8fb1165]
 8: (librbd::RBD::open(librados::IoCtx&, librbd::Image&, char const*, char const*)+0x5f) [0x7fb8c8fb16af]
 9: (main()+0x73c) [0x41721c]
 10: (__libc_start_main()+0xed) [0x7fb8c69a376d]
 11: rbd() [0x41a0c9]
 2012-12-11 09:33:14.264755 7fb8c93f5780 -1 *** Caught signal (Segmentation fault) ** in thread 7fb8c93f5780

It sounds like the header object (which rbd uses to determine the prefix for data object names) is corrupted or otherwise inaccessible.

Could you save the header object to a file ('rados -p iscsi get seodo1.rbd') and put that file somewhere accessible?

Did anything happen to your cluster before this header became unreadable? Any disk problems, or osds crashing?

Josh
Re: Errors attaching RBD image to a running VM
On Mon, Dec 10, 2012 at 01:12:45PM -0800, Josh Durgin wrote:

There was a regression in 1.0.0 with attaching non-files, such as RBD. This is fixed by f0e72b2f5c675f927d04545dc5095f9e5998f171, which you could cherry-pick onto 1.0.0. If you'd rather just use a released version, 0.10.2 should be fine.

Libvirt 0.10.2 appears to have fixed my problems. Booting from RBD works, as does attaching/detaching multiple additional volumes. Thanks so much Josh!

-Mike
Re: Segmentation fault on rbd client ceph version 0.48.2argonaut
It looks like the header object on the broken images is empty:

root@bender:~# rados -p iscsi stat seodo1.rbd
iscsi/seodo1.rbd mtime 1354795057, size 0
root@bender:~# rados -p iscsi stat siri.rbd
iscsi/siri.rbd mtime 1355151093, size 0

On an accessible image, the header size is not empty:

root@bender:~# rados -p iscsi stat siri1.rbd
iscsi/siri1.rbd mtime 1355174156, size 112

and the header can't be saved:

root@bender:~# rados -p iscsi get seodo1.rbd seodo1.header
2012-12-11 11:34:06.044164 7fe732f52780 0 wrote 0 byte payload to seodo1.header

Before the header became unreadable, a new osd server was added and the cluster was rebalanced. One of the mon servers (mon.0) crashed, and I restarted it.

2012/12/11 Josh Durgin josh.dur...@inktank.com:

On 12/10/2012 01:54 PM, Vladislav Gorbunov wrote:

but access to iscsi/seodo1 and iscsi/siri1 fails on every rbd client host. Data is completely inaccessible.

root@bender:~# rbd info iscsi/seodo1
*** Caught signal (Segmentation fault) **
 in thread 7fb8c93f5780
 ceph version 0.48.2argonaut (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe)
 1: rbd() [0x41dfea]
 2: (()+0xfcb0) [0x7fb8c796fcb0]
 3: (()+0x16244d) [0x7fb8c6ae444d]
 4: (librbd::read_header_bl(librados::IoCtx&, std::string const&, ceph::buffer::list&, unsigned long*)+0xf9) [0x7fb8c8fadb99]
 5: (librbd::read_header(librados::IoCtx&, std::string const&, rbd_obj_header_ondisk*, unsigned long*)+0x82) [0x7fb8c8fadda2]
 6: (librbd::ictx_refresh(librbd::ImageCtx*)+0x90b) [0x7fb8c8fb05eb]
 7: (librbd::open_image(librbd::ImageCtx*)+0x1b5) [0x7fb8c8fb1165]
 8: (librbd::RBD::open(librados::IoCtx&, librbd::Image&, char const*, char const*)+0x5f) [0x7fb8c8fb16af]
 9: (main()+0x73c) [0x41721c]
 10: (__libc_start_main()+0xed) [0x7fb8c69a376d]
 11: rbd() [0x41a0c9]
 2012-12-11 09:33:14.264755 7fb8c93f5780 -1 *** Caught signal (Segmentation fault) ** in thread 7fb8c93f5780

It sounds like the header object (which rbd uses to determine the prefix for data object names) is corrupted or otherwise inaccessible. Could you save the header object to a file ('rados -p iscsi get seodo1.rbd') and put that file somewhere accessible? Did anything happen to your cluster before this header became unreadable? Any disk problems, or osds crashing?

Josh
Re: mounting fuse from fstab
On Mon, 10 Dec 2012, Sam Lang wrote:

On 12/10/2012 10:33 AM, Sage Weil wrote:

We put together a simple helper script for mounting ceph-fuse via fstab (below). Some of the man pages indicate that the '#' syntax is deprecated, however, and it's not clear to me that whatever replaces it (mount.fuse) will let us accomplish the same thing (pass something along with the mount, control command-line options). Also, it's unclear *when* it was deprecated; if we want this to work on, say, RHEL, the replacement might not be there.

Would the fuse options fsname and subtype be more portable?

Looking at the subtype stuff a bit more, I finally understand: if you set the type to fuse.ceph, it will run /sbin/mount.fuse.ceph with the usual arguments (which include the device name). ...and it appears that that support is present in RHEL6, which is probably the oldest thing we care about. I think that's a better route. Something like:

id=user,foo=bar  /foo  fuse.ceph  defaults  0 0

where the key/value pairs are passed by /sbin/mount.fuse.ceph to ceph-fuse on the command line?

sage

-sam

Anybody know if doing something like the below is a bad idea? Thanks!

sage

---
#!/bin/sh
#
# Helper to mount ceph-fuse from /etc/fstab. To use, add an entry
# like:
#
# # DEVICE                     PATH       TYPE  OPTIONS
# /sbin/ceph-fuse-mount#admin  /mnt/ceph  ceph  defaults 0 0
#
# where 'admin' can be replaced with the client id to use when
# authenticating (if it is not client.admin). This will also control
# which section of ceph.conf will be applied to the ceph-fuse process.

set -e
id=$1
shift
exec ceph-fuse -i $id "$@"
---
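The thread converges on the fuse.ceph subtype, where mount(8) runs /sbin/mount.fuse.ceph with the fstab device string and options. As a rough sketch of the option translation such a helper could perform — the id=/conf= keys and the flag mapping are assumptions for illustration, not the shipped helper:

```shell
# Hypothetical sketch: turn fstab-style "key=value,key=value" options into
# ceph-fuse command-line flags, passing unknown options through to fuse.
build_ceph_fuse_cmd() {
    opts="$1"
    mnt="$2"
    args=""
    oldifs="$IFS"
    IFS=','
    for kv in $opts; do
        case "$kv" in
            id=*)   args="$args --id ${kv#id=}" ;;   # authentication client id
            conf=*) args="$args -c ${kv#conf=}" ;;   # alternate ceph.conf path
            *)      args="$args -o $kv" ;;           # hand anything else to fuse
        esac
    done
    IFS="$oldifs"
    printf '%s\n' "ceph-fuse$args $mnt"
}

build_ceph_fuse_cmd "id=admin,conf=/etc/ceph/ceph.conf" /mnt/ceph
# prints: ceph-fuse --id admin -c /etc/ceph/ceph.conf /mnt/ceph
```

A real helper would exec the resulting command instead of printing it; the echo here just makes the translation visible.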
Re: Behavior of librbd::Image::read_iterate() changed?
Thanks for your reply, Dan. It really does seem to be due to the new striping code. Below are the debug messages of rbd export:

root@ceph1:~# rbd -p rbdtest export image1 out1 --debug_rbd 20 --debug_striper 20
2012-12-11 12:06:21.538932 7f50a5fea780 20 librbd: open_image: ictx = 0x27b33a0 name = 'image1' id = '' snap_name = ''
2012-12-11 12:06:21.542949 7f50a5fea780 20 librbd: detect format of image1 : old
2012-12-11 12:06:21.542970 7f50a5fea780 20 librbd: ictx_refresh 0x27b33a0
2012-12-11 12:06:21.547103 7f50a5fea780 10 librbd::ImageCtx: init_layout stripe_unit 4194304 stripe_count 1 object_size 4194304 prefix rb.0.652e.6b8b4567 format rb.0.652e.6b8b4567.%012llx
2012-12-11 12:06:21.550876 7f50a5fea780 20 librbd::ImageCtx: watching header object returned 0
2012-12-11 12:06:21.550916 7f50a5fea780 20 librbd: info 0x27b33a0
2012-12-11 12:06:21.550921 7f50a5fea780 20 librbd: ictx_check 0x27b33a0
2012-12-11 12:06:21.550983 7f50a5fea780 20 librbd: read_iterate 0x27b33a0 off = 0 len = 1073741824
2012-12-11 12:06:21.550988 7f50a5fea780 20 librbd: ictx_check 0x27b33a0
2012-12-11 12:06:21.551006 7f50a5fea780 20 librbd: aio_read 0x27b33a0 completion 0x27b4b10 [0,4194304]
2012-12-11 12:06:21.551009 7f50a5fea780 20 librbd: ictx_check 0x27b33a0
2012-12-11 12:06:21.551017 7f50a5fea780 10 striper file_to_extents 0~4194304 format rb.0.652e.6b8b4567.%012llx
2012-12-11 12:06:21.553971 7f50a5fea780 20 striper su 4194304 sc 1 os 4194304 stripes_per_object 1
2012-12-11 12:06:21.553981 7f50a5fea780 20 striper off 0 blockno 0 stripeno 0 stripepos 0 objectsetno 0 objectno 0 block_start 0 block_off 0 0~4194304
2012-12-11 12:06:21.554146 7f50a5fea780 20 striper added new extent(rb.0.652e.6b8b4567. (0) in @14 0~4194304 - [])
2012-12-11 12:06:21.554176 7f50a5fea780 15 striper file_to_extents extent(rb.0.652e.6b8b4567. (0) in @14 0~4194304 - [0,4194304]) in @14
2012-12-11 12:06:21.554180 7f50a5fea780 20 librbd: oid rb.0.652e.6b8b4567. 0~4194304 from [0,4194304]
2012-12-11 12:06:21.554194 7f50a5fea780 20 librbd::AioRequest: send 0x27b5530 rb.0.652e.6b8b4567. 0~4194304
2012-12-11 12:06:21.555059 7f50a5fea780 20 librbd::AioCompletion: AioCompletion::finish_adding_requests 0x27b4b10 pending 1
2012-12-11 12:06:21.556193 7f50937fe700 20 librbd::AioRequest: should_complete 0x27b5530 rb.0.652e.6b8b4567. 0~4194304 r = -2
2012-12-11 12:06:21.556246 7f50937fe700 10 striper extent_to_file 0 0~4194304
2012-12-11 12:06:21.556248 7f50937fe700 20 striper stripes_per_object 1
2012-12-11 12:06:21.556249 7f50937fe700 20 striper object 0~4194304 - file 0~4194304
2012-12-11 12:06:21.556251 7f50937fe700 10 librbd::ImageCtx: prune_parent_extents image overlap 0, object overlap 0 from image extents []
2012-12-11 12:06:21.556254 7f50937fe700 10 librbd::AioCompletion: C_AioRead::finish() 0x27b3fc0 r = -2
2012-12-11 12:06:21.556255 7f50937fe700 10 librbd::AioCompletion: got {} for [0,4194304] bl 0
2012-12-11 12:06:21.556264 7f50937fe700 10 striper add_partial_sparse_result(0x27b4bf8) 0 covering {0=0} (offset 0) to [0,4194304]
2012-12-11 12:06:21.556276 7f50937fe700 20 striper t 0~4194304 bl has 0 off 0
2012-12-11 12:06:21.556277 7f50937fe700 20 striper t 0~4194304 bl has 0 off 0
2012-12-11 12:06:21.556278 7f50937fe700 20 striper s at end
2012-12-11 12:06:21.556282 7f50937fe700 20 librbd::AioCompletion: AioCompletion::complete_request() 0x27b4b10 complete_cb=0x7f50a5b7e4c0 pending 1
2012-12-11 12:06:21.556284 7f50937fe700 20 librbd::AioCompletion: AioCompletion::finalize() 0x27b4b10 rval 4194304 read_buf 0 read_bl 0x7fff8f12a9b0
2012-12-11 12:06:21.556285 7f50937fe700 10 striper assemble_result(0x27b4bf8) zero_tail=1
2012-12-11 12:06:21.556290 7f50937fe700 20 striper assemble_result(0x27b4bf8) 0~4194304 0 bytes
2012-12-11 12:06:21.576042 7f50937fe700 20 librbd::AioCompletion: AioCompletion::finalize() moving resulting 4194304 bytes to bl 0x7fff8f12a9b0
writing 4194304 bytes at ofs 0

2012/12/8 Dan Mick dan.m...@inktank.com:

I suspect, but have not figured out yet, that this is due to the new striping code (even on images that don't have advanced striping enabled). I know we want to look at it further; it might be that this is a regression.

On 12/07/2012 07:44 PM, Henry C Chang wrote:

Hi,

I am testing v0.55. I noticed that the behavior of librbd::Image::read_iterate() changed. With 0.48.2, when hitting a hole, the callback function would be called with buf set to NULL. However, with v0.55, I get a zero-ed buffer of the full length of the object (e.g., 4MB). Is this the expected behavior or a bug?

Henry
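The striper values in the debug output above can be checked by hand. A small sketch of the same arithmetic — the formulas are inferred from the names in the debug lines (blockno, stripeno, stripepos, objectsetno, objectno), not copied from the librbd source:

```shell
# Map a file offset to an object number using the layout from the log:
# su = stripe_unit, sc = stripe_count, os = object_size.
su=4194304
sc=1
os=4194304
off=0

stripes_per_object=$((os / su))
blockno=$((off / su))                       # which stripe-unit-sized block
stripeno=$((blockno / sc))                  # which full stripe
stripepos=$((blockno % sc))                 # position within the stripe
objectsetno=$((stripeno / stripes_per_object))
objectno=$((objectsetno * sc + stripepos))  # object holding this offset

echo "blockno=$blockno objectno=$objectno"
# prints: blockno=0 objectno=0
```

With the default layout (stripe_count 1, stripe_unit == object_size) every value collapses to off / object_size, which is why the log shows all zeros for the first 4 MB read.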
Debian packaging question
Hi -

I'm looking for advice on debian multiple-architecture repositories. To date we have been building ceph debian packages on two different machines for the i386 and amd64 platforms, rsyncing the results to a common directory on the build host, then putting the results together using the reprepro command to push out to ceph.com. As all the packages are architecture=linux-any, the arch is embedded in the file names and we don't have any collisions.

The new libcephfs-java, which is architecture=all, ends up being built twice with the same resulting file name, but different checksums depending on where it was built. Not unexpectedly, reprepro complains about this.

I know just enough about debian packaging to be a danger to myself and others. I can see how to fix up the checksums after the fact, but what is the right way to fix the problem?

Thanks,
Gary
Re: Debian packaging question
Hi,

On 12/11/2012 01:19 PM, Gary Lowell wrote:

Hi - I'm looking for advice on debian multiple-architecture repositories. To date we have been building ceph debian packages on two different machines for the i386 and amd64 platforms, rsyncing the results to a common directory on the build host, then putting the results together using the reprepro command to push out to ceph.com. As all the packages are architecture=linux-any, the arch is embedded in the file names and we don't have any collisions. The new libcephfs-java, which is architecture=all, ends up being built twice with the same resulting file name, but different checksums depending on where it was built. Not unexpectedly, reprepro complains about this. I know just enough about debian packaging to be a danger to myself and others. I can see how to fix up the checksums after the fact, but what is the right way to fix the problem?

I assume you are building with dpkg-buildpackage? The manpage shows:

-B    Specifies a binary-only build, limited to architecture dependent packages. Passed to dpkg-genchanges.

-A    Specifies a binary-only build, limited to architecture independent packages. Passed to dpkg-genchanges.

So on the i386 and amd64 machines you'd run with -B and sync them to ceph.com. On one of the machines you'd also run with -A, which should produce the architecture-independent packages like libcephfs-java.

That's the theory, I haven't tested it :)

Wido
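Concretely, the split described above could look like the following untested sketch — it assumes the builders (or pbuilder) pass these flags straight through to dpkg-buildpackage, and -us -uc is just to skip signing in the example:

```shell
# On each architecture builder (amd64 and i386):
# build only the architecture-dependent packages.
dpkg-buildpackage -us -uc -B

# On exactly one builder: build the architecture-independent packages
# (e.g. libcephfs-java) once, so only one checksum ever exists.
dpkg-buildpackage -us -uc -A

# Then rsync both result sets to the common directory and merge with
# reprepro as before; the arch=all deb now appears only once.
```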
Re: Debian packaging question
On Mon, 10 Dec 2012, Gary Lowell wrote: Hi - I'm looking for advice on debian multiple architecture repositories. To date we have been building ceph debian packages on two different machines for the i386 and amd64 platforms, rsyncing the results to a common directory on the build host, then putting the results together using the reprepro command to push out to ceph.com. As all the packages are architecture=linux-any, the arch is embedded in the file names and we don't have any collisions. The new libcephfs-java, which is architecture=all, ends up being built twice with the same resulting file name, but different checksums depending on where it was built. Not unexpectedly, reprepro complains about this.

Can we just ignore the second attempt that fails? Or only try to add the arch=all .dsc once?

sage
RBD using problem
Hi cephers,

I know that we can set cluster addr and public addr to make OSDs listen on different subnets for different purposes. However, I don't have enough public IPs to give every OSD one.

So my question is: if I follow the link http://www.spinics.net/lists/ceph-devel/msg10941.html to set all my pools to use the ssd-primary rule or something like that, and then bind a public IP on every ceph-osd-ssd-server-* (meaning all my primary OSDs have public IPs), will that let me use RBD correctly?

Or is there some way to let clients on the public network use RBD without giving every OSD a public IP?

Regards,
Chuanyu.
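For reference, the public/cluster split mentioned above is usually configured per network rather than per OSD. A hedged ceph.conf sketch — the subnets and addresses are placeholders:

```
[global]
        # subnet clients can reach
        public network = 203.0.113.0/24
        # subnet used for OSD replication traffic
        cluster network = 192.168.1.0/24

[osd.0]
        # optional explicit per-daemon override
        public addr = 203.0.113.10
        cluster addr = 192.168.1.10
```

Note that RBD clients read and write through each PG's primary OSD directly, so any OSD that can become a primary must be reachable on the public network — which is exactly why the question hinges on whether a CRUSH rule like ssd-primary can restrict primaries to the hosts that have public addresses.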
Re: Debian packaging question
On Dec 10, 2012, at 9:34 PM, Wido den Hollander wrote:

Hi,

On 12/11/2012 01:19 PM, Gary Lowell wrote: Hi - I'm looking for advice on debian multiple architecture repositories. To date we have been building ceph debian packages on two different machines for the i386 and amd64 platforms, rsyncing the results to a common directory on the build host, then putting the results together using the reprepro command to push out to ceph.com. As all the packages are architecture=linux-any, the arch is embedded in the file names and we don't have any collisions. The new libcephfs-java, which is architecture=all, ends up being built twice with the same resulting file name, but different checksums depending on where it was built. Not unexpectedly, reprepro complains about this. I know just enough about debian packaging to be a danger to myself and others. I can see how to fix up the checksums after the fact, but what is the right way to fix the problem?

I assume you are building with dpkg-buildpackage? The manpage shows:

-B    Specifies a binary-only build, limited to architecture dependent packages. Passed to dpkg-genchanges.

-A    Specifies a binary-only build, limited to architecture independent packages. Passed to dpkg-genchanges.

So on the i386 and amd64 machines you'd run with -B and sync them to ceph.com. On one of the machines you'd also run with -A, which should produce the architecture-independent packages like libcephfs-java.

That's the theory, I haven't tested it :)

Wido

Thanks Wido. We're using pbuilder, but it looks like it has similar options, or can pass an option string to dpkg-buildpackage. I'll do some testing.

Cheers,
Gary
Re: Debian packaging question
On Dec 10, 2012, at 9:34 PM, Sage Weil wrote:

On Mon, 10 Dec 2012, Gary Lowell wrote: Hi - I'm looking for advice on debian multiple architecture repositories. To date we have been building ceph debian packages on two different machines for the i386 and amd64 platforms, rsyncing the results to a common directory on the build host, then putting the results together using the reprepro command to push out to ceph.com. As all the packages are architecture=linux-any, the arch is embedded in the file names and we don't have any collisions. The new libcephfs-java, which is architecture=all, ends up being built twice with the same resulting file name, but different checksums depending on where it was built. Not unexpectedly, reprepro complains about this.

Can we just ignore the second attempt that fails? Or only try to add the arch=all .dsc once?

For 0.55 that's pretty much what I did, and it still required fixing up the changelog checksums before reprepro would run without error. I was hoping for a cleaner solution. Wido's suggestion looks like it will allow me to build just one version of libcephfs-java, which will help, and shouldn't require much change to the build scripts.

Cheers,
Gary