Re: MDS placement
On 01/22/2013 10:19 PM, Gandalf Corvotempesta wrote:
> Where should the MDS server be placed, in the cluster network or in the
> public network with the MONs?

It should be located in the public network, since clients need to be able
to interact with it.

Wido
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
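For readers setting this up, a minimal ceph.conf sketch of the two-network split discussed in this thread (the subnets are hypothetical placeholders, not values from this list):

```ini
[global]
    ; reachable by clients, MONs and the MDS
    public network = 192.168.0.0/24
    ; OSD replication and heartbeat traffic only
    cluster network = 10.10.0.0/24
```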
Re: /etc/init.d/ceph bug for multi-host when using -a option
On 01/22/2013 11:18 PM, Chen, Xiaoxi wrote:
> Hi List,
>
> Here is part of the /etc/init.d/ceph script:
>
>     case $command in
>         start)
>             # Increase max_open_files, if the configuration calls for it.
>             get_conf max_open_files "8192" "max open files"
>             if [ $max_open_files != "0" ]; then
>                 # Note: Don't try to do math with these numbers, because POSIX shells
>                 # can't do 64-bit math (natively). Just treat them as strings.
>                 cur=`ulimit -n`
>                 if [ "x$max_open_files" != "x$cur" ]; then
>                     ulimit -n $max_open_files      # line 253
>                 fi
>             fi
>
> When used with the -a option, for a **remote** OSD this script still runs
> ulimit on the **local** node, so "ulimit -n" never changes on the remote
> nodes. I think line 253 should use do_cmd instead of running the command
> directly, and so should line 251.

I think you're right. I opened http://tracker.newdream.net/issues/3900 to
track this.

> Here is the output of a local OSD daemon's limits:
>
>     root@ceph-4:~# cat /proc/13131/limits | grep file
>     Max file size        unlimited   unlimited   bytes
>     Max core file size   unlimited   unlimited   bytes
>     Max open files       131072      131072      files
>     Max file locks       unlimited   unlimited   locks
>
> And here is a remote one:
>
>     root@snb-15:~# cat /proc/23709/limits | grep file
>     Max file size        unlimited   unlimited   bytes
>     Max core file size   unlimited   unlimited   bytes
>     Max open files       1024        4096        files
>     Max file locks       unlimited   unlimited   locks
>
> Xiaoxi
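As a side illustration of why line 253 must run on the remote node (this is not Ceph code): file-descriptor limits are per-process state, inherited only by children of the process that set them. A ulimit run in the local init-script shell can never reach a daemon started elsewhere. A minimal Python sketch:

```python
# Illustration (not Ceph code): RLIMIT_NOFILE -- what "ulimit -n" sets --
# is per-process state, inherited only by children of the process that
# set it. Changing it in the local init-script shell therefore cannot
# affect a daemon spawned on a remote node; the change has to happen in
# the remote shell (do_cmd) that actually starts the daemon.
import os
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

pid = os.fork()
if pid == 0:
    # Child: lower its own soft limit; the parent never observes this.
    resource.setrlimit(resource.RLIMIT_NOFILE, (min(64, soft), hard))
    os._exit(0)
os.waitpid(pid, 0)

# The parent's limit is untouched -- just like the remote OSD's limit
# when ulimit runs only locally.
unchanged = resource.getrlimit(resource.RLIMIT_NOFILE) == (soft, hard)
```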
Re: MDS placement
On 01/23/2013 10:44 AM, Gandalf Corvotempesta wrote:
> 2013/1/23 Wido den Hollander w...@widodh.nl:
>> It should be located in the public network since clients need to be able
>> to interact with it.
>
> Ok. Is the cluster network only needed by the OSDs? No other devices
> should be able to access it?

Indeed. The cluster network is only used by the OSDs for their replication
and heartbeat traffic. No other daemon or client needs access to it.

Wido
Re: Hit suicide timeout after adding new osd
Hi Sage,

> I think the problem now is just that 'osd target transaction size' is too
> big (default is 300). Recommended 50.. let's see how that goes. Even
> smaller (20 or 25) would probably be fine.

I set it to 50, and that seems to have solved all my problems. After a day
or so my cluster got to a HEALTH_OK state again. It has been running for a
few days now without any crashes!

Thanks for all your help!

--
Jens Kristian Søgaard, Mermaid Consulting ApS, j...@mermaidconsulting.dk,
http://www.mermaidconsulting.com/
Re: Hit suicide timeout after adding new osd
On 01/23/2013 01:14 PM, Jens Kristian Søgaard wrote:
> Hi Sage,
>
>> I think the problem now is just that 'osd target transaction size' is
>> too big (default is 300). Recommended 50.. let's see how that goes.
>> Even smaller (20 or 25) would probably be fine.

Going through the code and reading that this solved it for Jens, could this
issue be traced back to less powerful CPUs? I've seen this on Atom and
Fusion platforms, which both don't excel in their computing power.

From what I read, the OSD by default does 300 transactions and then commits
them? If the CPU is too slow to handle all the work, timeouts can occur
because it can't do all the transactions inside the set window? By lowering
the number of transactions, it sends out a heartbeat more often, thus
keeping itself alive. Correct?

Wido

> I set it to 50, and that seems to have solved all my problems. After a
> day or so my cluster got to a HEALTH_OK state again. It has been running
> for a few days now without any crashes!
>
> Thanks for all your help!
Re: Hit suicide timeout after adding new osd
Hi Wido,

> Going through the code and reading that this solved it for Jens, could
> this issue be traced back to less powerful CPUs?

Depends on what you mean by less powerful. All my OSD servers are equipped
with Xeon E5606 CPUs. That is a quad-core 2.13 GHz CPU. They are not used
for anything else than Ceph.

--
Jens Kristian Søgaard, Mermaid Consulting ApS, j...@mermaidconsulting.dk,
http://www.mermaidconsulting.com/
Re: Will multi-monitor speed up pg initializing?
On Wed, 23 Jan 2013, Chen, Xiaoxi wrote:
> Hi list,
>
> The first time I start my ceph cluster, it takes more than 15 minutes to
> get all the PGs active+clean. It's fast at first (say 100 PGs/s) but
> quite slow when only hundreds of PGs are left peering. Is this a common
> situation? Since there is quite a bit of disk I/O and network I/O during
> this time, what stops the OSDs from initializing faster? Can multiple
> monitors help with this?

How many PGs and OSDs? Do the OSDs come up at the same time, or is there a
small number of OSDs that initially get all PGs?

sage
Using a Data Pool
Hello All;

I have been trying to associate a directory with a data pool (both called
'Media') according to a previous thread on this list. It all works except
the last line:

    ceph osd pool create Media 500 500
    ceph mds add_data_pool 3
    added data pool 3 to mdsmap
    mkdir /mnt/ceph/Media
    cephfs /mnt/ceph/Media set_layout -p 3
    Segmentation fault

What am I doing wrong? I'm running 0.56.1 on Ubuntu 12.10, kernel 3.7.1.

Thanks,
-Paul
Re: Hit suicide timeout after adding new osd
On Wed, 23 Jan 2013, Wido den Hollander wrote:
> On 01/23/2013 01:14 PM, Jens Kristian Søgaard wrote:
>> Hi Sage,
>>
>>> I think the problem now is just that 'osd target transaction size' is
>>> too big (default is 300). Recommended 50.. let's see how that goes.
>>> Even smaller (20 or 25) would probably be fine.
>
> Going through the code and reading that this solved it for Jens, could
> this issue be traced back to less powerful CPUs? I've seen this on Atom
> and Fusion platforms, which both don't excel in their computing power.
>
> From what I read, the OSD by default does 300 transactions and then
> commits them? If the CPU is too slow to handle all the work, timeouts
> can occur because it can't do all the transactions inside the set
> window? By lowering the number of transactions, it sends out a heartbeat
> more often, thus keeping itself alive. Correct?

In this case, it controls how many operations we stuff into an atomic
transaction when doing something big (like deleting an entire PG). The
speed is as much about the storage as the CPU, although I'm sure a small
CPU helps slow things down. The thread needs to be able to do those N
unlinks (or whatever) within the heartbeat interval or else the OSD will
consider the thread stuck and zap itself.

I think 300 was just a silly initial value... The default is now either 30
or 50.

sage
Re: Understanding Ceph
On Sun, Jan 20, 2013 at 10:39 AM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
> On 1/19/2013 12:16 PM, Sage Weil wrote:
>> We generally recommend the KVM+librbd route, as it is easier to manage
>> the dependencies, and is well integrated with libvirt. FWIW this is what
>> OpenStack and CloudStack normally use.
>
> OK, so is there a quick start document for that configuration?

http://ceph.com/docs/master/rbd/rbd-openstack/

-sam

> (Oh, and "form" in my other message is supposed to be "from": tyop)
>
> Dima
Re: Understanding Ceph
Dimitri,

For what it's worth, I also stepped through the process of spinning up Ceph
and OpenStack on a single EC2 node in a recent blog entry:

http://ceph.com/howto/building-a-public-ami-with-ceph-and-openstack/

It has some shortcuts (read: not meant for production), but it may help
give you a quicker quickstart. Feel free to shout if you have questions,
either here or poke scuttlemonkey on #ceph or twitter.

Good luck. Thanks.

Best Regards,
Patrick

On Wed, Jan 23, 2013 at 10:13 AM, Sam Lang sam.l...@inktank.com wrote:
> On Sun, Jan 20, 2013 at 10:39 AM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote:
>> On 1/19/2013 12:16 PM, Sage Weil wrote:
>>> We generally recommend the KVM+librbd route, as it is easier to manage
>>> the dependencies, and is well integrated with libvirt. FWIW this is
>>> what OpenStack and CloudStack normally use.
>>
>> OK, so is there a quick start document for that configuration?
>
> http://ceph.com/docs/master/rbd/rbd-openstack/
>
> -sam

--
Patrick McGarry
Director, Community
Inktank

@scuttlemonkey @inktank @ceph
Re: [PATCH] net/ceph/osdmap.c: fix undefined behavior when using snprintf()
On 01/22/2013 01:20 PM, Cong Ding wrote:
> The variable str is used as both the source and destination in function
> snprintf(), which is undefined behavior based on C11. The original
> description in C11 is:
>
>     If copying takes place between objects that overlap, the behavior is
>     undefined.

Yes, this was an ill-advised thing to do in this function.

In fact, in the only place this function is used (in osdmap_show()), the
non-static buffer was not initialized before the call. (It might happen to
work because the same stack space was getting reused each time through the
loop. Ew!)

This is just an awful couple of functions.

> And, the function of ceph_osdmap_state_str() is to return the osdmap
> state, so it should return "doesn't exist" when all the conditions are
> not satisfied. I fix it in this patch.
>
> Based on C11, snprintf() does nothing if n==0:
>
>     If n is zero, nothing is written, and s may be a null pointer.
>     Otherwise, output characters beyond the n-1st are discarded rather
>     than being written to the array, and a null character is written at
>     the end of the characters actually written into the array.
>
> so I remove the unnecessary check of len (because it is not a busy path
> and it saves a few lines of code).

True. But since you know it's not going to do anything, why not only make
the call if len is non-zero? I.e.:

    else if (len)
        snprintf(str, len, "doesn't exist");

With your permission I'll make this change and will commit this for you.
OK?
Signed-off-by: Cong Ding ding...@gmail.com
Reviewed-by: Alex Elder el...@inktank.com
---
 net/ceph/osdmap.c | 27 ++++++---------------------
 1 file changed, 8 insertions(+), 19 deletions(-)

diff --git a/net/ceph/osdmap.c b/net/ceph/osdmap.c
index de73214..3131a99d3 100644
--- a/net/ceph/osdmap.c
+++ b/net/ceph/osdmap.c
@@ -13,26 +13,15 @@
 char *ceph_osdmap_state_str(char *str, int len, int state)
 {
-	int flag = 0;
-
-	if (!len)
-		goto done;
-
-	*str = '\0';
-	if (state) {
-		if (state & CEPH_OSD_EXISTS) {
-			snprintf(str, len, "exists");
-			flag = 1;
-		}
-		if (state & CEPH_OSD_UP) {
-			snprintf(str, len, "%s%s%s", str, (flag ? ", " : ""),
-				 "up");
-			flag = 1;
-		}
-	} else {
+	if ((state & CEPH_OSD_EXISTS) && (state & CEPH_OSD_UP))
+		snprintf(str, len, "exists, up");
+	else if (state & CEPH_OSD_EXISTS)
+		snprintf(str, len, "exists");
+	else if (state & CEPH_OSD_UP)
+		snprintf(str, len, "up");
+	else
 		snprintf(str, len, "doesn't exist");
-	}
-done:
+
 	return str;
 }
Re: [PATCH 01/25] mds: fix end check in Server::handle_client_readdir()
Hi Yan,

I pushed this one to next, thanks. BTW, what are you using to reproduce
this? We'd like to continue to improve the coverage of
ceph-qa-suite.git/suites/fs.

Thanks!
sage

On Wed, 23 Jan 2013, Yan, Zheng wrote:
> From: Yan, Zheng zheng.z@intel.com
>
> commit 1174dd3188 (don't retry readdir request after issuing caps)
> introduced a bug that wrongly marks 'end' in the readdir reply. The code
> that touches existing dentries re-uses an iterator, and the iterator is
> used for checking if the readdir is at the end.
>
> Signed-off-by: Yan, Zheng zheng.z@intel.com
> ---
>  src/mds/Server.cc | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/src/mds/Server.cc b/src/mds/Server.cc
> index b70445e..45eed81 100644
> --- a/src/mds/Server.cc
> +++ b/src/mds/Server.cc
> @@ -2895,11 +2895,9 @@ void Server::handle_client_readdir(MDRequest *mdr)
>        continue;
>      } else {
>        // touch everything i _do_ have
> -      for (it = dir->begin();
> -           it != dir->end();
> -           it++)
> -        if (!it->second->get_linkage()->is_null())
> -          mdcache->lru.lru_touch(it->second);
> +      for (CDir::map_t::iterator p = dir->begin(); p != dir->end(); p++)
> +        if (!p->second->get_linkage()->is_null())
> +          mdcache->lru.lru_touch(p->second);
>
>        // already issued caps and leases, reply immediately.
>        if (dnbl.length() > 0) {
> --
> 1.7.11.7
Re: [PATCH] net/ceph/osdmap.c: fix undefined behavior when using snprintf()
On Wed, Jan 23, 2013 at 10:48:07AM -0600, Alex Elder wrote:
> On 01/22/2013 01:20 PM, Cong Ding wrote:
>> The variable str is used as both the source and destination in function
>> snprintf(), which is undefined behavior based on C11. The original
>> description in C11 is:
>>
>>     If copying takes place between objects that overlap, the behavior
>>     is undefined.
>
> Yes, this was an ill-advised thing to do in this function.
>
> In fact, in the only place this function is used (in osdmap_show()), the
> non-static buffer was not initialized before the call. (It might happen
> to work because the same stack space was getting reused each time
> through the loop. Ew!)
>
> This is just an awful couple of functions.
>
>> And, the function of ceph_osdmap_state_str() is to return the osdmap
>> state, so it should return "doesn't exist" when all the conditions are
>> not satisfied. I fix it in this patch.
>>
>> Based on C11, snprintf() does nothing if n==0:
>>
>>     If n is zero, nothing is written, and s may be a null pointer.
>>     Otherwise, output characters beyond the n-1st are discarded rather
>>     than being written to the array, and a null character is written at
>>     the end of the characters actually written into the array.
>>
>> so I remove the unnecessary check of len (because it is not a busy path
>> and it saves a few lines of code).
>
> True. But since you know it's not going to do anything, why not only
> make the call if len is non-zero? I.e.:
>
>     else if (len)
>         snprintf(str, len, "doesn't exist");
>
> With your permission I'll make this change and will commit this for you.
> OK?

It's fine, thanks. But I think it's better to check len at the beginning,
because the other branches also call snprintf() with parameter len. Like
this:

    if (!len)
        return str;

    if ((state & CEPH_OSD_EXISTS) && (state & CEPH_OSD_UP))
        snprintf(str, len, "exists, up");
    else if (state & CEPH_OSD_EXISTS)
        snprintf(str, len, "exists");
    else if (state & CEPH_OSD_UP)
        snprintf(str, len, "up");
    else
        snprintf(str, len, "doesn't exist");

    return str;

or like this:

    if (len) {
        if ((state & CEPH_OSD_EXISTS) && (state & CEPH_OSD_UP))
            snprintf(str, len, "exists, up");
        else if (state & CEPH_OSD_EXISTS)
            snprintf(str, len, "exists");
        else if (state & CEPH_OSD_UP)
            snprintf(str, len, "up");
        else
            snprintf(str, len, "doesn't exist");
    }
    return str;

Thanks,
- cong
Re: [PATCH] net/ceph/osdmap.c: fix undefined behavior when using snprintf()
On 01/23/2013 11:41 AM, Cong Ding wrote:
> On Wed, Jan 23, 2013 at 10:48:07AM -0600, Alex Elder wrote:
>> True. But since you know it's not going to do anything, why not only
>> make the call if len is non-zero? I.e.:
>>
>>     else if (len)
>>         snprintf(str, len, "doesn't exist");
>>
>> With your permission I'll make this change and will commit this for
>> you. OK?
>
> It's fine, thanks. But I think it's better to check len at the
> beginning, because the other branches also call snprintf() with
> parameter len. Like this:
>
>     if (!len)
>         return str;
>
>     if ((state & CEPH_OSD_EXISTS) && (state & CEPH_OSD_UP))
>         snprintf(str, len, "exists, up");
>     else if (state & CEPH_OSD_EXISTS)
>         snprintf(str, len, "exists");
>     else if (state & CEPH_OSD_UP)
>         snprintf(str, len, "up");
>     else
>         snprintf(str, len, "doesn't exist");
>
>     return str;

OK. I'll do this. Thank you.

-Alex
[PATCH 0/2] Fix some autoconf issues
These patches contains some autoconf fixes/cleanups. Danny Al-Gaaf (2): configure: fix RPM_RELEASE configure: remove -m4_include(m4/acx_pthread.m4) configure.ac | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) -- 1.8.1.1 -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] configure: fix RPM_RELEASE
Use git to get RPM_RELEASE only if this is a git repo clone and if the git
command is available on the system.

Signed-off-by: Danny Al-Gaaf danny.al-g...@bisect.de
---
 configure.ac | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/configure.ac b/configure.ac
index b67e5cd..f87140f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -12,8 +12,15 @@ AC_PREREQ(2.59)
 AC_INIT([ceph], [0.56], [ceph-devel@vger.kernel.org])

 # Create release string. Used with VERSION for RPMs.
+RPM_RELEASE=0
 AC_SUBST(RPM_RELEASE)
-RPM_RELEASE=`if expr index $(git describe --always) '-' > /dev/null ; then git describe --always | cut -d- -f2- | tr '-' '.' ; else echo 0; fi`
+if test -d .git ; then
+  AC_CHECK_PROG(GIT_CHECK, git, yes)
+  if test x$GIT_CHECK = xyes; then
+    RPM_RELEASE=`if expr index $(git describe --always) '-' > /dev/null ; then git describe --always | cut -d- -f2- | tr '-' '.' ; else echo 0; fi`
+  fi
+fi
+AC_MSG_NOTICE([RPM_RELEASE='$RPM_RELEASE'])

 AC_CONFIG_MACRO_DIR([m4])
--
1.8.1.1
Re: radosgw
On Wed, Jan 23, 2013 at 9:56 AM, Gandalf Corvotempesta
gandalf.corvotempe...@gmail.com wrote:
> I'm trying to configure RadosGW, but radosgw doesn't start. I've followed
> this guide: http://ceph.com/docs/master/radosgw/config/
>
> The Apache configuration is OK, and I've added the following
> configuration to my ceph.conf:
>
>     [client.radosgw.gateway]
>     host = {host-name}
>     keyring = /etc/ceph/keyring.radosgw.gateway
>     rgw socket path = /tmp/radosgw.sock
>     log file = /var/log/ceph/radosgw.log
>
> I've copied ceph.conf to all servers, even to the radosgw node. Then, on
> the radosgw node, I've run:
>
>     mkdir -p /var/lib/ceph/radosgw/ceph-radosgw.gateway
>     ceph-authtool --create-keyring /etc/ceph/keyring.radosgw.gateway
>     chmod +r /etc/ceph/keyring.radosgw.gateway
>     ceph-authtool /etc/ceph/keyring.radosgw.gateway -n client.radosgw.gateway --gen-key
>     ceph-authtool -n client.radosgw.gateway --cap osd 'allow rwx' --cap mon 'allow r' /etc/ceph/keyring.radosgw.gateway
>     ceph -k /etc/ceph/ceph.keyring auth add client.radosgw.gateway -i /etc/ceph/keyring.radosgw.gateway
>
> but after that, radosgw doesn't start:
>
>     # radosgw
>     radosgw: must specify 'rgw socket path' to run as a daemon
>
> but rgw socket path is already present in ceph.conf.

(resending to all)

try

    # radosgw -n client radosgw.gateway

Yehuda
Re: radosgw
On Wed, Jan 23, 2013 at 10:02 AM, Gandalf Corvotempesta
gandalf.corvotempe...@gmail.com wrote:
> 2013/1/23 Yehuda Sadeh yeh...@inktank.com:
>> try
>> # radosgw -n client radosgw.gateway
>
> Still doesn't work.

whoops, was missing a period there.

>     # radosgw -n client radosgw.gateway
>     error parsing 'client': expected string of the form TYPE.ID, valid
>     types are: auth, mon, osd, mds, client

    # radosgw -n client.radosgw.gateway

That's one step further. What does the log show now?

Yehuda
Re: Consistently reading/writing rados objects via command line
This seems to be working OK for the most part, but I noticed that using
large files gives errors getting them (but not putting them). The problems
start after 2GB which, as you said, is larger than should be used in this
method. It shouldn't affect us, since we shouldn't be using this for files
that large, but I thought it was worth reporting.

This is the test:

    dd if=/dev/zero of=4.bin bs=1M count=100
    export FILE=4.bin
    rados -p swift_ring ls -
    rados -p swift_ring put $FILE.tmp $FILE --object-locator $FILE
    rados -p swift_ring clonedata $FILE.tmp $FILE --object-locator $FILE
    rados -p swift_ring ls -
    rados -p swift_ring rm $FILE.tmp --object-locator $FILE
    rados -p swift_ring ls -
    rados -p swift_ring stat $FILE
    rm -f $FILE.downloaded
    rados -p swift_ring get $FILE $FILE.downloaded

These are the results:

dd if=/dev/zero of=4.bin bs=1M count=1000:

    # rados -p swift_ring stat $FILE
    swift_ring/4.bin mtime 1358967088, size 1048576000
    # rados -p swift_ring get $FILE $FILE.downloaded
    ok

dd if=/dev/zero of=4.bin bs=1M count=2000:

    # rados -p swift_ring stat $FILE
    swift_ring/4.bin mtime 1358967172, size 2097152000
    # rados -p swift_ring get $FILE $FILE.downloaded
    ok

dd if=/dev/zero of=4.bin bs=1M count=3000:

    # rados -p swift_ring stat $FILE
    swift_ring/4.bin mtime 1358966844, size 3145728000
    # rados -p swift_ring get $FILE $FILE.downloaded
    error getting swift_ring/4.bin: Unknown error 1149239296

dd if=/dev/zero of=4.bin bs=1M count=8000:

    # rados -p swift_ring stat $FILE
    swift_ring/4.bin mtime 1358967388, size 8388608000
    # rados -p swift_ring get $FILE $FILE.downloaded
    error getting swift_ring/4.bin: Bad address

On Tue, Jan 22, 2013 at 12:28 PM, Sage Weil s...@inktank.com wrote:
> On Tue, 22 Jan 2013, Nick Bartos wrote:
>> Thanks! Is it safe to just apply that last commit to 0.56.1? Also, is
>> the rados command 'clonedata' instead of 'clone'? That's what it looked
>> like in the code.
>
> Yep, and yep!
>
> sage
>
>> On Tue, Jan 22, 2013 at 9:27 AM, Sage Weil s...@inktank.com wrote:
>>> On Tue, 22 Jan 2013, Nick Bartos wrote:
>>>> Assuming that the clone is atomic, so that the client only ever grabs
>>>> a complete old or new version of the file, that method really seems
>>>> ideal. How much work/time would that be? The objects will likely
>>>> average around 10-20MB, but it's possible that in some cases they may
>>>> grow to a few hundred MB.
>>>
>>> You're in luck--my email load was mercifully light this morning.
>>>
>>>     713  ./rados -p data ls -
>>>     714  ./rados put foo.tmp /etc/passwd -p data --object-locator foo
>>>     715  ./rados clone foo.tmp foo -p data --object-locator foo
>>>     716  ./rados -p data ls -
>>>     717  ./rados -p data rm foo.tmp --object-locator foo
>>>     718  ./rados -p data ls -
>>>     719  ./rados -p data get foo -
>>>
>>> see wip-rados-clone.
>>>
>>> sage
>>>
>>>> On Mon, Jan 21, 2013 at 9:14 PM, Sage Weil s...@inktank.com wrote:
>>>>> With a bit of additional support in the rados tool, we could write
>>>>> to object $foo.tmp with key $foo, and then clone it into position
>>>>> and delete the .tmp. If they're really big objects, though, you may
>>>>> also be better off with radosgw, which provides striping and
>>>>> atomicity..
>>>>>
>>>>> sage
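One observation on the numbers reported above (a guess, not a confirmed diagnosis): the failure threshold sits between the 2000 MB file that worked and the 3000 MB file that failed, which is exactly where a signed 32-bit length field would overflow. A quick check of the arithmetic:

```python
# The reported sizes straddle the signed 32-bit boundary, which would be
# consistent (though unconfirmed) with a 32-bit signed length somewhere
# in the "get" path.
INT32_MAX = 2**31 - 1            # 2147483647

ok_size  = 2000 * 1024 * 1024    # 2097152000 -- last size that worked
bad_size = 3000 * 1024 * 1024    # 3145728000 -- first size that failed

fits_ok  = ok_size <= INT32_MAX    # still representable as a signed int
overflow = bad_size > INT32_MAX    # overflows a signed 32-bit length
```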
Re: Installing Rados Gateway from sources
On Wed, Jan 23, 2013 at 11:53 AM, Cesar Mello cme...@gmail.com wrote:
> Hi,
>
> Sorry if this question is too off-topic. I am a Windows guy without much
> knowledge about Linux. I have successfully installed a Rados Gateway
> through apt-get in a virtual machine. Now I would like to build and
> install from the sources in order to play with and debug the latest
> stuff on my Ubuntu workstation.
>
> After building ceph, I run 'make install'. After following the
> instructions from the docs I have 0.56-395-g371e6fb running nicely. Now
> I'm trying to set up radosgw. But this doesn't work:
>
>     sudo /etc/init.d/radosgw start

You need to run configure with the '--with-radosgw' option.

Yehuda
Re: Installing Rados Gateway from sources
Thanks so much Yehuda! Best regards!

Mello

On Wed, Jan 23, 2013 at 6:02 PM, Yehuda Sadeh yeh...@inktank.com wrote:
> On Wed, Jan 23, 2013 at 11:53 AM, Cesar Mello cme...@gmail.com wrote:
>> Hi,
>>
>> Sorry if this question is too off-topic. I am a Windows guy without
>> much knowledge about Linux. I have successfully installed a Rados
>> Gateway through apt-get in a virtual machine. Now I would like to build
>> and install from the sources in order to play with and debug the latest
>> stuff on my Ubuntu workstation.
>>
>> After building ceph, I run 'make install'. After following the
>> instructions from the docs I have 0.56-395-g371e6fb running nicely. Now
>> I'm trying to set up radosgw. But this doesn't work:
>>
>>     sudo /etc/init.d/radosgw start
>
> You need to run configure with the '--with-radosgw' option.
>
> Yehuda
Re: Hit suicide timeout after adding new osd
Hi Sage,

>> I think the problem now is just that 'osd target transaction size' is
>
> I set it to 50, and that seems to have solved all my problems. After a
> day or so my cluster got to a HEALTH_OK state again. It has been running
> for a few days now without any crashes!

Hmm, one of the OSDs crashed again, sadly. It logs:

    -2> 2013-01-23 18:01:23.563624 7f67524da700  1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7f673affd700' had timed out after 60
    -1> 2013-01-23 18:01:23.563657 7f67524da700  1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7f673affd700' had suicide timed out after 180
     0> 2013-01-23 18:01:24.257996 7f67524da700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f67524da700 time 2013-01-23 18:01:23.563677
    common/HeartbeatMap.cc: 78: FAILED assert(0 == "hit suicide timeout")

With this stack trace:

    ceph version 0.56.1-26-g3bd8f6b (3bd8f6b7235eb14cab778e3c6dcdc636aff4f539)
    1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x2eb) [0x846ecb]
    2: (ceph::HeartbeatMap::is_healthy()+0x8e) [0x8476ae]
    3: (ceph::HeartbeatMap::check_touch_file()+0x28) [0x8478d8]
    4: (CephContextServiceThread::entry()+0x55) [0x8e0f45]
    5: /lib64/libpthread.so.0() [0x3cbc807d14]
    6: (clone()+0x6d) [0x3cbc0f167d]

I have saved the core file, if there's anything in there you need. Or do
you think I just need to set the target transaction size even lower than
50?

--
Jens Kristian Søgaard, Mermaid Consulting ApS, j...@mermaidconsulting.dk,
http://www.mermaidconsulting.com/
Re: radosgw
Now that I built from the sources, I think I got stuck in this too. The logs:

    root@l3:/etc/ceph# cat /var/log/ceph/radosgw.log
    2013-01-23 19:05:42.233438 7ff3dae2c780  0 ceph version 0.56-395-g371e6fb (371e6fbed624ececb385663a59dad907e9153d6a), process radosgw, pid 3811
    2013-01-23 19:05:43.937851 7ff3c9ffb700  2 garbage collection: start
    2013-01-23 19:06:12.234511 7ff3d2bbe700 -1 Initialization timeout, failed to initialize
    2013-01-23 19:06:17.581501 7f7016801780  0 ceph version 0.56-395-g371e6fb (371e6fbed624ececb385663a59dad907e9153d6a), process radosgw, pid 3831
    2013-01-23 19:06:17.596145 7f70057fa700  2 garbage collection: start
    root@l3:/etc/ceph#

On Wed, Jan 23, 2013 at 4:06 PM, Yehuda Sadeh yeh...@inktank.com wrote:
> On Wed, Jan 23, 2013 at 10:02 AM, Gandalf Corvotempesta
> gandalf.corvotempe...@gmail.com wrote:
>> 2013/1/23 Yehuda Sadeh yeh...@inktank.com:
>>> try
>>> # radosgw -n client radosgw.gateway
>>
>> Still doesn't work.
>
> whoops, was missing a period there.
>
>> # radosgw -n client radosgw.gateway
>> error parsing 'client': expected string of the form TYPE.ID, valid
>> types are: auth, mon, osd, mds, client
>
> # radosgw -n client.radosgw.gateway
>
> That's one step further. What does the log show now?
>
> Yehuda
python examples for librados
I just recently found out that ceph has some python bindings (yay!). I see there are a couple of examples for using the rbd bindings here: http://ceph.com/docs/master/rbd/librbdpy/ But that doesn't really include much about the librados bindings. Are there any examples for that? For example I'm interested in converting the following commands into python: rados -p foo put test.tmp test --object-locator test rados -p foo clonedata test.tmp test --object-locator test rados -p foo rm test.tmp --object-locator test rados -p foo get test test.downloaded
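For what it's worth, here is a rough sketch of that sequence against the `rados` Python bindings. Untested against a live cluster; the conffile path, pool name "foo", and object names are assumptions mirroring the CLI example. `clonedata` does not appear to be exposed by the bindings, so the copy step below goes through the client and, unlike the CLI clone, is NOT atomic:

```python
# Rough librados sketch of the CLI sequence above. Assumptions: the `rados`
# Python bindings are installed, /etc/ceph/ceph.conf is readable, and the
# pool is named "foo". copy_object() stands in for `rados clonedata`.
try:
    import rados  # ceph python bindings; only needed to talk to a cluster
except ImportError:
    rados = None

def copy_object(ioctx, src, dst, locator, chunk=4 * 1024 * 1024):
    """Copy object src to dst under the given object locator.

    Copies through the client in bounded chunks; NOT atomic the way the
    rados CLI clonedata operation is.
    """
    ioctx.set_locator_key(locator)
    data = b""
    while True:
        buf = ioctx.read(src, chunk, len(data))  # (key, length, offset)
        data += buf
        if len(buf) < chunk:
            break
    ioctx.write_full(dst, data)

def main():
    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    ioctx = cluster.open_ioctx("foo")
    try:
        ioctx.set_locator_key("test")
        ioctx.write_full("test.tmp", b"...")   # rados put test.tmp --object-locator test
        copy_object(ioctx, "test.tmp", "test", "test")  # stand-in for clonedata
        ioctx.remove_object("test.tmp")        # rados rm test.tmp --object-locator test
        print(ioctx.read("test"))              # rados get test
    finally:
        ioctx.close()
        cluster.shutdown()

if __name__ == "__main__" and rados is not None:
    main()
```

The locator key matters here: the objects were written with --object-locator, so reads and removals must set the same key to find them.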
Re: radosgw
On Wed, Jan 23, 2013 at 1:08 PM, Cesar Mello cme...@gmail.com wrote: Now that I built from the sources I think I got stuck in this too. The logs: root@l3:/etc/ceph# cat /var/log/ceph/radosgw.log 2013-01-23 19:05:42.233438 7ff3dae2c780 0 ceph version 0.56-395-g371e6fb (371e6fbed624ececb385663a59dad907e9153d6a), process radosgw, pid 3811 2013-01-23 19:05:43.937851 7ff3c9ffb700 2 garbage collection: start 2013-01-23 19:06:12.234511 7ff3d2bbe700 -1 Initialization timeout, failed to initialize Either your ceph backend is not completely healthy, or radosgw cannot connect to it. Try adding some logging info (debug ms = 1 in your ceph.conf). Also, try to verify that the ceph cluster is healthy (ceph health, ceph -s). Finally, make sure that the client.radosgw.gateway user is configured correctly and can access the backend: # rados -n client.radosgw.gateway lspools # rados -n client.radosgw.gateway ls -p <name of one of the existing pools> Yehuda
Re: radosgw
On Wed, Jan 23, 2013 at 1:25 PM, Gandalf Corvotempesta gandalf.corvotempe...@gmail.com wrote: 2013/1/23 Gandalf Corvotempesta gandalf.corvotempe...@gmail.com: I've solved it. The Ceph primary keyring should also be placed on the radosgw host. On the radosgw host these two files are needed: /etc/ceph/ceph.conf (the same for the whole cluster) /etc/ceph/ceph.keyring (the same for the whole cluster) Now I'm unable to create a rados user: 2013-01-23 22:25:25.679678 7f6e814c2700 0 -- x.y.z.111:0/1006191 x.y.z.102:6801/7493 pipe(0x1819490 sd=4 :0 pgs=0 cs=0 l=1).fault You mean a radosgw user? You need to run radosgw-admin also with '-n client.radosgw.gateway', or have the client.admin key in your keyring.
Re: Hit suicide timeout after adding new osd
On Thu, Jan 24, 2013 at 12:59 AM, Jens Kristian Søgaard j...@mermaidconsulting.dk wrote: Hi Sage, I think the problem now is just that 'osd target transaction size' is I set it to 50, and that seems to have solved all my problems. After a day or so my cluster got to a HEALTH_OK state again. It has been running for a few days now without any crashes! Hmm, one of the OSDs crashed again, sadly. It logs: -2 2013-01-23 18:01:23.563624 7f67524da700 1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7f673affd700' had timed out after 60 -1 2013-01-23 18:01:23.563657 7f67524da700 1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7f673affd700' had suicide timed out after 180 0 2013-01-23 18:01:24.257996 7f67524da700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f67524da700 time 2013-01-23 18:01:23.563677 common/HeartbeatMap.cc: 78: FAILED assert(0 == hit suicide timeout) With this stack trace: ceph version 0.56.1-26-g3bd8f6b (3bd8f6b7235eb14cab778e3c6dcdc636aff4f539) 1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x2eb) [0x846ecb] 2: (ceph::HeartbeatMap::is_healthy()+0x8e) [0x8476ae] 3: (ceph::HeartbeatMap::check_touch_file()+0x28) [0x8478d8] 4: (CephContextServiceThread::entry()+0x55) [0x8e0f45] 5: /lib64/libpthread.so.0() [0x3cbc807d14] 6: (clone()+0x6d) [0x3cbc0f167d] I have saved the core file, if there's anything in there you need? Or do you think I just need to set the target transaction size even lower than 50? I was able to catch this too on rejoin to very busy cluster and seems I need to lower this value at least at start time. Also c5fe0965572c074a2a33660719ce3222d18c1464 has increased overall time before restarted or new osd will join a cluster, and for 2M objects/3T of replicated data restart of the cluster was took almost a hour before it actually begins to work. 
The worst thing is that a single osd, if restarted, will be marked up after a couple of minutes, then after almost half an hour (eating 100 percent of one cpu) marked down, and then the cluster will start to redistribute data after the 300s timeout while the osd is still doing something. -- Jens Kristian Søgaard, Mermaid Consulting ApS, j...@mermaidconsulting.dk, http://www.mermaidconsulting.com/
Re: Using a Data Pool
On Wednesday, January 23, 2013 at 5:01 AM, Paul Sherriffs wrote: Hello All; I have been trying to associate a directory to a data pool (both called 'Media') according to a previous thread on this list. It all works except the last line: ceph osd pool create Media 500 500 ceph mds add_data_pool 3 added data pool 3 to mdsmap mkdir /mnt/ceph/Media cephfs /mnt/ceph/Media set_layout -p 3 Segmentation fault cephfs is not a super-friendly tool right now — sorry! :( I believe you will find it works correctly if you specify all the layout parameters, not just one of them. -Greg
Re: some questions about ceph
On Wednesday, January 23, 2013 at 3:35 PM, Yue Li wrote: Hi, i have some questions about ceph. ceph provide a POSIX client for users. for aio-read/write, it still use page cache on client side (seems to me). How long will the page cache expire (in case the data on server side has changed)? The kernel client does this automatically; ceph-fuse currently doesn't do page cache invalidation (so, yes, you can get stale data), but fixing this is in our queue and should be coming pretty soon: http://tracker.newdream.net/issues/2215 if we miss the page cache, we need to fetch data from server side for read accesses, what's the minimum transfer unit between client and OSDs? There is no hard limit, although there will be a practical minimum based on the read ahead and prefetch settings you specify. for write accesses, will the client batch the write request data into units of obj size then transferring to OSDs? It will try to write out what it can, but no — if you aren't doing any syncs yourself, then the client will write out dirty data according to an LRU (in ceph-fuse) or the regular page cache eviction algorithms (for the kernel), aggregating the dirty data it has available. generally what's the minimum transfer unit between client and OSDs? No minimum. How to ensure the consistency for multi-write from clients on the same piece of data or parallel read and write on the same data? If you have multiple clients accessing the same piece of data and at least one is a writer, they will go into a synchronous mode and data access is coordinated and ordered by the MDS. -Greg -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Understanding Ceph
On 01/23/2013 10:19 AM, Patrick McGarry wrote: http://ceph.com/howto/building-a-public-ami-with-ceph-and-openstack/ On Wed, Jan 23, 2013 at 10:13 AM, Sam Lang sam.l...@inktank.com wrote: http://ceph.com/docs/master/rbd/rbd-openstack/ These are both great, I'm sure, but Patrick's page says I chose to follow the 5 minute quickstart guide and the rbd-openstack page says Important ... you must have a running Ceph cluster. My problem is I can't find a 5 minute quickstart guide for RHEL 6, and I didn't get a running ceph cluster by trying to follow the existing (ubuntu) guide and adjust for centos 6.3. So I'm stuck at a point way before those guides become relevant: once I had one OSD/MDS/MON box up, I got HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; recovery 21/42 degraded (50.000%) (384 appears to be the number of placement groups created by default). What does that mean? That I only have one OSD? Or is it genuinely unhealthy? -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: Understanding Ceph
On Jan 23, 2013, at 5:10 PM, Dimitri Maziuk dmaz...@bmrb.wisc.edu wrote: On 01/23/2013 10:19 AM, Patrick McGarry wrote: http://ceph.com/howto/building-a-public-ami-with-ceph-and-openstack/ On Wed, Jan 23, 2013 at 10:13 AM, Sam Lang sam.l...@inktank.com wrote: http://ceph.com/docs/master/rbd/rbd-openstack/ These are both great, I'm sure, but Patrick's page says I chose to follow the 5 minute quickstart guide and the rbd-openstack page says Important ... you must have a running Ceph cluster. My problem is I can;t find a 5 minute quickstart guide for RHEL 6. and I didn't get a running ceph cluster by trying to follow the existing (ubuntu) guide and adjust for centos 6.3. http://ceph.com/docs/master/install/rpm/ http://ceph.com/docs/master/start/quick-start/ Between those two links my own quick-start on CentOS 6.3 was maybe 6 minutes. YMMV. After learning that qemu uses librbd (and thus doesn't rely on the rbd kernel module) I was happy to stick with the stock CentOS kernel for my servers (with updated qemu and libvirt builds). So I'm stuck at a point way before those guides become relevant: once I had one OSD/MDS/MON box up, I got HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; recovery 21/42 degraded (50.000%) (384 appears be the number of placement groups created by default). What does that mean? That I only have one OSD? Or is it genuinely unhealthy? Assuming you have more than one host, be sure that iptables or another firewall isn't preventing communication between the ceph daemons. JN -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] configure: remove -m4_include(m4/acx_pthread.m4)
Hi Danny - These two patches are now in the wip-rpm-update-2 branch. Will merge into master after build test. Thanks, Gary On Jan 23, 2013, at 9:57 AM, Danny Al-Gaaf wrote: Since we already use AC_CONFIG_MACRO_DIR, there is no need to include m4/acx_pthread.m4 separately. Signed-off-by: Danny Al-Gaaf danny.al-g...@bisect.de --- configure.ac | 1 - 1 file changed, 1 deletion(-) diff --git a/configure.ac b/configure.ac index f87140f..ffbd150 100644 --- a/configure.ac +++ b/configure.ac @@ -1,6 +1,5 @@ # -*- Autoconf -*- # Process this file with autoconf to produce a configure script. -m4_include(m4/acx_pthread.m4) # Autoconf AC_PREREQ(2.59) -- 1.8.1.1
radosgw: MethodNotAllowed response for AWS C# Sample
Hi, When running the PutObject sample from http://ceph.com/docs/master/radosgw/s3/csharp/ I get a MethodNotAllowed response. Has anyone successfully run this sample? I have tested with a current local build (0.56). Thank you a lot for the attention! Best regards Mello
Re: radosgw: MethodNotAllowed response for AWS C# Sample
On Wed, Jan 23, 2013 at 4:46 PM, Cesar Mello cme...@gmail.com wrote: Hi, When running the PutObject sample from http://ceph.com/docs/master/radosgw/s3/csharp/ I get a MethodNotAllowed response. Please has anyone successfully run this sample? I have tested with a current local build (0.56). Is there anything in the radosgw log? in the apache access, error logs? It might be that your apache has some other (maybe default) site configured to handle requests, and it doesn't really reach radosgw. Yehuda
Re: radosgw: MethodNotAllowed response for AWS C# Sample
Re-sending this due to plain text limitations: Are you just testing a build following the 5 min guide? I did this today, and the method not allowed just meant that apache wasn't actually set up to call the rados gateway via rewrite. I verified this by placing an index.html in /var/www and seeing that I was able to see that page. If you're using the quick guide (http://ceph.com/docs/master/start/quick-rgw/), what I did to fix my issue was change a bit in the supplied apache rgw.conf from this: </VirtualHost> RewriteEngine On RewriteRule ^/([a-zA-Z0-9-_.]*)([/]?.*) /s3gw.fcgi?page=$1params=$2%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L] <VirtualHost *:80> to this: #</VirtualHost> RewriteEngine On RewriteRule ^/([a-zA-Z0-9-_.]*)([/]?.*) /s3gw.fcgi?page=$1params=$2%{QUERY_STRING} [E=HTTP_AUTHORIZATION:%{HTTP:Authorization},L] #<VirtualHost *:80> On Wed, Jan 23, 2013 at 5:51 PM, Yehuda Sadeh yeh...@inktank.com wrote: On Wed, Jan 23, 2013 at 4:46 PM, Cesar Mello cme...@gmail.com wrote: Hi, When running the PutObject sample from http://ceph.com/docs/master/radosgw/s3/csharp/ I get a MethodNotAllowed response. Please has anyone successfully run this sample? I have tested with a current local build (0.56). Is there anything in the radosgw log? in the apache access, error logs? It might be that your apache has some other (maybe default) site configured to handle requests, and it doesn't really reach radosgw. Yehuda
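As a sanity check on that RewriteRule, the capture pattern can be exercised outside Apache. This is just the regex from the quick-rgw guide applied with Python's re module (illustrative only; Apache's own matching is what actually counts, and the separators between page, params, and the query string are handled by Apache when it builds the s3gw.fcgi target):

```python
import re

# The RewriteRule pattern from the quick-rgw guide. $1 becomes the "page"
# (the leading path component) and $2 the trailing "params"; radosgw then
# receives these via FastCGI as the query string.
REWRITE = re.compile(r"^/([a-zA-Z0-9-_.]*)([/]?.*)")

def split_request_path(path):
    """Return the (page, params) captures the rule would produce, or None."""
    m = REWRITE.match(path)
    return (m.group(1), m.group(2)) if m else None
```

A bare PUT on "/" captures two empty groups, which is consistent with the QUERY_STRING=page=params= seen in the radosgw log earlier in this thread.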
Re: radosgw: MethodNotAllowed response for AWS C# Sample
Yes the radosgw log shows the request. The handler-get_op call is returning null at rgw/rgw_main.cc (RGWProcess::handle_request). I've confirmed putting this log: if (!op) { req-log(s, get_op failed!); abort_early(s, -ERR_METHOD_NOT_ALLOWED); goto done; } I'm pasting a copy of the log below. Any suggestion for helping me debug this by myself is appreciated. Thanks for the attention! Best regards Mello 2013-01-24 00:06:32.935275 7f2ec1c28780 20 enqueued request req=0x194c9c0 2013-01-24 00:06:32.935317 7f2ec1c28780 20 RGWWQ: 2013-01-24 00:06:32.935327 7f2ec1c28780 20 req: 0x194c9c0 2013-01-24 00:06:32.935344 7f2ec1c28780 10 allocated request req=0x1951ef0 2013-01-24 00:06:32.935357 7f2e5a79c700 20 dequeued request req=0x194c9c0 2013-01-24 00:06:32.935379 7f2e5a79c700 20 RGWWQ: empty 2013-01-24 00:06:32.935387 7f2e5a79c700 1 == starting new request req=0x194c9c0 = 2013-01-24 00:06:32.935458 7f2e5a79c700 2 req 1:0.71initializing 2013-01-24 00:06:32.935470 7f2e5a79c700 2 req 1:0.84initializing do MELLO 2013-01-24 00:06:32.935500 7f2e5a79c700 10 meta HTTP_X_AMZ_DATE=Thu, 24 Jan 2013 02:06:35 GMT 2013-01-24 00:06:32.935521 7f2e5a79c700 10 x x-amz-date:Thu, 24 Jan 2013 02:06:35 GMT 2013-01-24 00:06:32.935569 7f2e5a79c700 10 s-object=NULL s-bucket=NULL 2013-01-24 00:06:32.935583 7f2e5a79c700 20 FCGI_ROLE=RESPONDER 2013-01-24 00:06:32.935584 7f2e5a79c700 20 SCRIPT_URL=/ 2013-01-24 00:06:32.935585 7f2e5a79c700 20 SCRIPT_URI=http://my-new-bucket.l3/ 2013-01-24 00:06:32.935586 7f2e5a79c700 20 HTTP_AUTHORIZATION=AWS JJABVJ3AWBS1ZOCML7NS:iASHPmV0rFQH5/zPslZDs4Wa+A8= 2013-01-24 00:06:32.935587 7f2e5a79c700 20 HTTP_USER_AGENT=aws-sdk-dotnet/1.5.10.0 .NET Runtime/4.0 .NET Framework/4.0 OS/6.0.6002.131072 S3Sync 2013-01-24 00:06:32.935590 7f2e5a79c700 20 HTTP_X_AMZ_DATE=Thu, 24 Jan 2013 02:06:35 GMT 2013-01-24 00:06:32.935591 7f2e5a79c700 20 HTTP_HOST=my-new-bucket.l3 2013-01-24 00:06:32.935592 7f2e5a79c700 20 CONTENT_LENGTH=0 2013-01-24 00:06:32.935593 7f2e5a79c700 20 
HTTP_CONNECTION=Keep-Alive 2013-01-24 00:06:32.935594 7f2e5a79c700 20 PATH=/usr/local/bin:/usr/bin:/bin 2013-01-24 00:06:32.935595 7f2e5a79c700 20 SERVER_SIGNATURE= 2013-01-24 00:06:32.935596 7f2e5a79c700 20 SERVER_SOFTWARE=Apache/2.2.22 (Ubuntu) 2013-01-24 00:06:32.935597 7f2e5a79c700 20 SERVER_NAME=my-new-bucket.l3 2013-01-24 00:06:32.935598 7f2e5a79c700 20 SERVER_ADDR=192.168.25.2 2013-01-24 00:06:32.935601 7f2e5a79c700 20 SERVER_PORT=80 2013-01-24 00:06:32.935602 7f2e5a79c700 20 REMOTE_ADDR=192.168.25.3 2013-01-24 00:06:32.935603 7f2e5a79c700 20 DOCUMENT_ROOT=/var/www 2013-01-24 00:06:32.935604 7f2e5a79c700 20 SERVER_ADMIN=cme...@gmail.com 2013-01-24 00:06:32.935605 7f2e5a79c700 20 SCRIPT_FILENAME=/var/www/s3gw.fcgi 2013-01-24 00:06:32.935606 7f2e5a79c700 20 REMOTE_PORT=50190 2013-01-24 00:06:32.935606 7f2e5a79c700 20 GATEWAY_INTERFACE=CGI/1.1 2013-01-24 00:06:32.935607 7f2e5a79c700 20 SERVER_PROTOCOL=HTTP/1.1 2013-01-24 00:06:32.935608 7f2e5a79c700 20 REQUEST_METHOD=PUT 2013-01-24 00:06:32.935609 7f2e5a79c700 20 QUERY_STRING=page=params= 2013-01-24 00:06:32.935610 7f2e5a79c700 20 REQUEST_URI=/ 2013-01-24 00:06:32.935611 7f2e5a79c700 20 SCRIPT_NAME=/ 2013-01-24 00:06:32.935619 7f2e5a79c700 2 req 1:0.000233:s3:PUT /::getting op 2013-01-24 00:06:32.935625 7f2e5a79c700 2 req 1:0.000239:s3:PUT /::get_op failed! 2013-01-24 00:06:32.935687 7f2e5a79c700 2 req 1:0.000301:s3:PUT /::http status=405 2013-01-24 00:06:32.935781 7f2e5a79c700 1 == req done req=0x194c9c0 http_status=405 == 2013-01-24 00:06:33.056517 7f2d5e7fc700 1 -- 127.0.0.1:0/1008137 == osd.1 127.0.0.1:6804/5559 49 osd_op_reply(77 gc.31 [call] ack = 0) v4 104+0+0 (4138896711 0 0) 0x7f2d3c000d40 con 0x7f2d48002260 On Wed, Jan 23, 2013 at 10:51 PM, Yehuda Sadeh yeh...@inktank.com wrote: On Wed, Jan 23, 2013 at 4:46 PM, Cesar Mello cme...@gmail.com wrote: Hi, When running the PutObject sample from http://ceph.com/docs/master/radosgw/s3/csharp/ I get a MethodNotAllowed response. 
Please has anyone successfully run this sample? I have tested with a current local build (0.56). Is there anything in the radosgw log? in the apache access, error logs? It might be that your apache has some other (maybe default) site configured to handle requests, and it doesn't really reach radosgw. Yehuda -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 01/25] mds: fix end check in Server::handle_client_readdir()
On 01/24/2013 01:17 AM, Sage Weil wrote: Hi Yan, I pushed this one to next, thanks. BTW what are you using to reproduce this? We'd like to continue to improve the coverage of ceph-qa-suite.git/suites/fs. My scripts delete the test directory after finishing a round of fsstress testing. I noticed that 'rm' emitted "cannot remove directory: Directory not empty" errors. These errors are transient; if you retry deleting the directories, it succeeds. Regards Yan, Zheng Thanks! sage On Wed, 23 Jan 2013, Yan, Zheng wrote: From: Yan, Zheng zheng.z@intel.com commit 1174dd3188 (don't retry readdir request after issuing caps) introduced a bug that wrongly marks 'end' in the readdir reply. The code that touches existing dentries re-uses an iterator, and the iterator is used for checking if readdir is at the end. Signed-off-by: Yan, Zheng zheng.z@intel.com --- src/mds/Server.cc | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/src/mds/Server.cc b/src/mds/Server.cc index b70445e..45eed81 100644 --- a/src/mds/Server.cc +++ b/src/mds/Server.cc @@ -2895,11 +2895,9 @@ void Server::handle_client_readdir(MDRequest *mdr) continue; } else { // touch everything i _do_ have -for (it = dir->begin(); - it != dir->end(); - it++) - if (!it->second->get_linkage()->is_null()) -mdcache->lru.lru_touch(it->second); +for (CDir::map_t::iterator p = dir->begin(); p != dir->end(); p++) + if (!p->second->get_linkage()->is_null()) +mdcache->lru.lru_touch(p->second); // already issued caps and leases, reply immediately. if (dnbl.length() > 0) { -- 1.7.11.7
Re: radosgw: MethodNotAllowed response for AWS C# Sample
On Wed, Jan 23, 2013 at 6:13 PM, Cesar Mello cme...@gmail.com wrote: Yes the radosgw log shows the request. The handler-get_op call is returning null at rgw/rgw_main.cc (RGWProcess::handle_request). I've confirmed putting this log: if (!op) { req-log(s, get_op failed!); abort_early(s, -ERR_METHOD_NOT_ALLOWED); goto done; } I'm pasting a copy of the log below. Any suggestion for helping me debug this by myself is appreciated. Thanks for the attention! Best regards Mello 2013-01-24 00:06:32.935275 7f2ec1c28780 20 enqueued request req=0x194c9c0 2013-01-24 00:06:32.935317 7f2ec1c28780 20 RGWWQ: 2013-01-24 00:06:32.935327 7f2ec1c28780 20 req: 0x194c9c0 2013-01-24 00:06:32.935344 7f2ec1c28780 10 allocated request req=0x1951ef0 2013-01-24 00:06:32.935357 7f2e5a79c700 20 dequeued request req=0x194c9c0 2013-01-24 00:06:32.935379 7f2e5a79c700 20 RGWWQ: empty 2013-01-24 00:06:32.935387 7f2e5a79c700 1 == starting new request req=0x194c9c0 = 2013-01-24 00:06:32.935458 7f2e5a79c700 2 req 1:0.71initializing 2013-01-24 00:06:32.935470 7f2e5a79c700 2 req 1:0.84initializing do MELLO 2013-01-24 00:06:32.935500 7f2e5a79c700 10 meta HTTP_X_AMZ_DATE=Thu, 24 Jan 2013 02:06:35 GMT 2013-01-24 00:06:32.935521 7f2e5a79c700 10 x x-amz-date:Thu, 24 Jan 2013 02:06:35 GMT 2013-01-24 00:06:32.935569 7f2e5a79c700 10 s-object=NULL s-bucket=NULL 2013-01-24 00:06:32.935583 7f2e5a79c700 20 FCGI_ROLE=RESPONDER 2013-01-24 00:06:32.935584 7f2e5a79c700 20 SCRIPT_URL=/ 2013-01-24 00:06:32.935585 7f2e5a79c700 20 SCRIPT_URI=http://my-new-bucket.l3/ 2013-01-24 00:06:32.935586 7f2e5a79c700 20 HTTP_AUTHORIZATION=AWS JJABVJ3AWBS1ZOCML7NS:iASHPmV0rFQH5/zPslZDs4Wa+A8= 2013-01-24 00:06:32.935587 7f2e5a79c700 20 HTTP_USER_AGENT=aws-sdk-dotnet/1.5.10.0 .NET Runtime/4.0 .NET Framework/4.0 OS/6.0.6002.131072 S3Sync 2013-01-24 00:06:32.935590 7f2e5a79c700 20 HTTP_X_AMZ_DATE=Thu, 24 Jan 2013 02:06:35 GMT 2013-01-24 00:06:32.935591 7f2e5a79c700 20 HTTP_HOST=my-new-bucket.l3 2013-01-24 00:06:32.935592 7f2e5a79c700 20 
CONTENT_LENGTH=0 2013-01-24 00:06:32.935593 7f2e5a79c700 20 HTTP_CONNECTION=Keep-Alive 2013-01-24 00:06:32.935594 7f2e5a79c700 20 PATH=/usr/local/bin:/usr/bin:/bin 2013-01-24 00:06:32.935595 7f2e5a79c700 20 SERVER_SIGNATURE= 2013-01-24 00:06:32.935596 7f2e5a79c700 20 SERVER_SOFTWARE=Apache/2.2.22 (Ubuntu) 2013-01-24 00:06:32.935597 7f2e5a79c700 20 SERVER_NAME=my-new-bucket.l3 2013-01-24 00:06:32.935598 7f2e5a79c700 20 SERVER_ADDR=192.168.25.2 2013-01-24 00:06:32.935601 7f2e5a79c700 20 SERVER_PORT=80 2013-01-24 00:06:32.935602 7f2e5a79c700 20 REMOTE_ADDR=192.168.25.3 2013-01-24 00:06:32.935603 7f2e5a79c700 20 DOCUMENT_ROOT=/var/www 2013-01-24 00:06:32.935604 7f2e5a79c700 20 SERVER_ADMIN=cme...@gmail.com 2013-01-24 00:06:32.935605 7f2e5a79c700 20 SCRIPT_FILENAME=/var/www/s3gw.fcgi 2013-01-24 00:06:32.935606 7f2e5a79c700 20 REMOTE_PORT=50190 2013-01-24 00:06:32.935606 7f2e5a79c700 20 GATEWAY_INTERFACE=CGI/1.1 2013-01-24 00:06:32.935607 7f2e5a79c700 20 SERVER_PROTOCOL=HTTP/1.1 2013-01-24 00:06:32.935608 7f2e5a79c700 20 REQUEST_METHOD=PUT 2013-01-24 00:06:32.935609 7f2e5a79c700 20 QUERY_STRING=page=params= 2013-01-24 00:06:32.935610 7f2e5a79c700 20 REQUEST_URI=/ 2013-01-24 00:06:32.935611 7f2e5a79c700 20 SCRIPT_NAME=/ 2013-01-24 00:06:32.935619 7f2e5a79c700 2 req 1:0.000233:s3:PUT /::getting op 2013-01-24 00:06:32.935625 7f2e5a79c700 2 req 1:0.000239:s3:PUT /::get_op failed! 2013-01-24 00:06:32.935687 7f2e5a79c700 2 req 1:0.000301:s3:PUT /::http status=405 2013-01-24 00:06:32.935781 7f2e5a79c700 1 == req done req=0x194c9c0 http_status=405 == 2013-01-24 00:06:33.056517 7f2d5e7fc700 1 -- 127.0.0.1:0/1008137 == osd.1 127.0.0.1:6804/5559 49 osd_op_reply(77 gc.31 [call] ack = 0) v4 104+0+0 (4138896711 0 0) 0x7f2d3c000d40 con 0x7f2d48002260 It's using the bucket virtual subdomain calling convention, but you haven't set up 'rgw dns name'. 
Yehuda
Re: radosgw: MethodNotAllowed response for AWS C# Sample
Oh man now it works perfectly! Thank you so much!!! Just added the line 'rgw dns name=l3' to the [client.radosgw.gateway] section of ceph.conf. Best regards Mello On Thu, Jan 24, 2013 at 12:26 AM, Yehuda Sadeh yeh...@inktank.com wrote: On Wed, Jan 23, 2013 at 6:13 PM, Cesar Mello cme...@gmail.com wrote: Yes the radosgw log shows the request. The handler-get_op call is returning null at rgw/rgw_main.cc (RGWProcess::handle_request). I've confirmed putting this log: if (!op) { req-log(s, get_op failed!); abort_early(s, -ERR_METHOD_NOT_ALLOWED); goto done; } I'm pasting a copy of the log below. Any suggestion for helping me debug this by myself is appreciated. Thanks for the attention! Best regards Mello 2013-01-24 00:06:32.935275 7f2ec1c28780 20 enqueued request req=0x194c9c0 2013-01-24 00:06:32.935317 7f2ec1c28780 20 RGWWQ: 2013-01-24 00:06:32.935327 7f2ec1c28780 20 req: 0x194c9c0 2013-01-24 00:06:32.935344 7f2ec1c28780 10 allocated request req=0x1951ef0 2013-01-24 00:06:32.935357 7f2e5a79c700 20 dequeued request req=0x194c9c0 2013-01-24 00:06:32.935379 7f2e5a79c700 20 RGWWQ: empty 2013-01-24 00:06:32.935387 7f2e5a79c700 1 == starting new request req=0x194c9c0 = 2013-01-24 00:06:32.935458 7f2e5a79c700 2 req 1:0.71initializing 2013-01-24 00:06:32.935470 7f2e5a79c700 2 req 1:0.84initializing do MELLO 2013-01-24 00:06:32.935500 7f2e5a79c700 10 meta HTTP_X_AMZ_DATE=Thu, 24 Jan 2013 02:06:35 GMT 2013-01-24 00:06:32.935521 7f2e5a79c700 10 x x-amz-date:Thu, 24 Jan 2013 02:06:35 GMT 2013-01-24 00:06:32.935569 7f2e5a79c700 10 s-object=NULL s-bucket=NULL 2013-01-24 00:06:32.935583 7f2e5a79c700 20 FCGI_ROLE=RESPONDER 2013-01-24 00:06:32.935584 7f2e5a79c700 20 SCRIPT_URL=/ 2013-01-24 00:06:32.935585 7f2e5a79c700 20 SCRIPT_URI=http://my-new-bucket.l3/ 2013-01-24 00:06:32.935586 7f2e5a79c700 20 HTTP_AUTHORIZATION=AWS JJABVJ3AWBS1ZOCML7NS:iASHPmV0rFQH5/zPslZDs4Wa+A8= 2013-01-24 00:06:32.935587 7f2e5a79c700 20 HTTP_USER_AGENT=aws-sdk-dotnet/1.5.10.0 .NET Runtime/4.0 .NET 
Framework/4.0 OS/6.0.6002.131072 S3Sync 2013-01-24 00:06:32.935590 7f2e5a79c700 20 HTTP_X_AMZ_DATE=Thu, 24 Jan 2013 02:06:35 GMT 2013-01-24 00:06:32.935591 7f2e5a79c700 20 HTTP_HOST=my-new-bucket.l3 2013-01-24 00:06:32.935592 7f2e5a79c700 20 CONTENT_LENGTH=0 2013-01-24 00:06:32.935593 7f2e5a79c700 20 HTTP_CONNECTION=Keep-Alive 2013-01-24 00:06:32.935594 7f2e5a79c700 20 PATH=/usr/local/bin:/usr/bin:/bin 2013-01-24 00:06:32.935595 7f2e5a79c700 20 SERVER_SIGNATURE= 2013-01-24 00:06:32.935596 7f2e5a79c700 20 SERVER_SOFTWARE=Apache/2.2.22 (Ubuntu) 2013-01-24 00:06:32.935597 7f2e5a79c700 20 SERVER_NAME=my-new-bucket.l3 2013-01-24 00:06:32.935598 7f2e5a79c700 20 SERVER_ADDR=192.168.25.2 2013-01-24 00:06:32.935601 7f2e5a79c700 20 SERVER_PORT=80 2013-01-24 00:06:32.935602 7f2e5a79c700 20 REMOTE_ADDR=192.168.25.3 2013-01-24 00:06:32.935603 7f2e5a79c700 20 DOCUMENT_ROOT=/var/www 2013-01-24 00:06:32.935604 7f2e5a79c700 20 SERVER_ADMIN=cme...@gmail.com 2013-01-24 00:06:32.935605 7f2e5a79c700 20 SCRIPT_FILENAME=/var/www/s3gw.fcgi 2013-01-24 00:06:32.935606 7f2e5a79c700 20 REMOTE_PORT=50190 2013-01-24 00:06:32.935606 7f2e5a79c700 20 GATEWAY_INTERFACE=CGI/1.1 2013-01-24 00:06:32.935607 7f2e5a79c700 20 SERVER_PROTOCOL=HTTP/1.1 2013-01-24 00:06:32.935608 7f2e5a79c700 20 REQUEST_METHOD=PUT 2013-01-24 00:06:32.935609 7f2e5a79c700 20 QUERY_STRING=page=params= 2013-01-24 00:06:32.935610 7f2e5a79c700 20 REQUEST_URI=/ 2013-01-24 00:06:32.935611 7f2e5a79c700 20 SCRIPT_NAME=/ 2013-01-24 00:06:32.935619 7f2e5a79c700 2 req 1:0.000233:s3:PUT /::getting op 2013-01-24 00:06:32.935625 7f2e5a79c700 2 req 1:0.000239:s3:PUT /::get_op failed! 
2013-01-24 00:06:32.935687 7f2e5a79c700 2 req 1:0.000301:s3:PUT /::http status=405 2013-01-24 00:06:32.935781 7f2e5a79c700 1 == req done req=0x194c9c0 http_status=405 == 2013-01-24 00:06:33.056517 7f2d5e7fc700 1 -- 127.0.0.1:0/1008137 == osd.1 127.0.0.1:6804/5559 49 osd_op_reply(77 gc.31 [call] ack = 0) v4 104+0+0 (4138896711 0 0) 0x7f2d3c000d40 con 0x7f2d48002260 It's using the bucket virtual subdomain calling convention, but you haven't set up 'rgw dns name'. Yehuda -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
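For anyone hitting the same thing: with virtual-host style requests, radosgw has to strip its own configured domain off the Host header to recover the bucket name. A rough model of that behaviour (my own sketch, not radosgw's actual code; it ignores path-style requests and port suffixes) shows why an unset 'rgw dns name' leaves no bucket, so the PUT lands on "/" and fails with MethodNotAllowed:

```python
def bucket_from_host(http_host, rgw_dns_name):
    """Bucket implied by a virtual-host style Host header, else None.

    Rough model only: real radosgw also handles path-style requests and
    port suffixes, which are ignored here.
    """
    if not rgw_dns_name:
        return None  # rgw dns name unset: no bucket can be recovered
    suffix = "." + rgw_dns_name
    if http_host.endswith(suffix):
        return http_host[: -len(suffix)]
    return None
```

With the thread's values, Host "my-new-bucket.l3" against rgw dns name "l3" yields the bucket "my-new-bucket"; with the setting empty, nothing is recovered.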
Re: Understanding Ceph
On 01/23/2013 06:17 PM, John Nielsen wrote: ... http://ceph.com/docs/master/install/rpm/ http://ceph.com/docs/master/start/quick-start/ Between those two links my own quick-start on CentOS 6.3 was maybe 6 minutes. YMMV. It does, obviously, since Deploy the configuration ... 2. Execute the following on the Ceph server host cd /etc/ceph sudo mkcephfs -a -c /etc/ceph/ceph.conf -k ceph.keyring was failing here until I booted an elrepo 3.7 kernel with rbd.ko. HEALTH_WARN 384 pgs degraded; 384 pgs stuck unclean; recovery 21/42 degraded (50.000%) What does that mean? That I only have one OSD? Or is it genuinely unhealthy? Assuming you have more than one host ... I just said I have one host. So is that expected when I only have one host? -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: Consistently reading/writing rados objects via command line
Hi Nick-

The problem here looks to just be that do_get() in rados.cc isn't making any attempt to read large objects in chunks. I'm not sure where the 2GB limit is, but it is well beyond non-optimal before it gets to that point. That function needs to read in chunks of a few MB and keep going until it gets a short read, modulo some extra futzing for stdout. Any takers? :)

sage

On Wed, 23 Jan 2013, Nick Bartos wrote:

This seems to be working ok for the most part, but I noticed that using large files gives errors getting them (but not putting them). The problems start after 2GB which, as you said, is larger than should be used in this method. It shouldn't affect us since we shouldn't be using this for files that large, but I thought it was worth reporting.

This is the test:

  dd if=/dev/zero of=4.bin bs=1M count=100
  export FILE=4.bin
  rados -p swift_ring ls -
  rados -p swift_ring put $FILE.tmp $FILE --object-locator $FILE
  rados -p swift_ring clonedata $FILE.tmp $FILE --object-locator $FILE
  rados -p swift_ring ls -
  rados -p swift_ring rm $FILE.tmp --object-locator $FILE
  rados -p swift_ring ls -
  rados -p swift_ring stat $FILE
  rm -f $FILE.downloaded
  rados -p swift_ring get $FILE $FILE.downloaded

These are the results:

dd if=/dev/zero of=4.bin bs=1M count=1000:
  # rados -p swift_ring stat $FILE
  swift_ring/4.bin mtime 1358967088, size 1048576000
  # rados -p swift_ring get $FILE $FILE.downloaded
  ok

dd if=/dev/zero of=4.bin bs=1M count=2000:
  # rados -p swift_ring stat $FILE
  swift_ring/4.bin mtime 1358967172, size 2097152000
  # rados -p swift_ring get $FILE $FILE.downloaded
  ok

dd if=/dev/zero of=4.bin bs=1M count=3000:
  # rados -p swift_ring stat $FILE
  swift_ring/4.bin mtime 1358966844, size 3145728000
  # rados -p swift_ring get $FILE $FILE.downloaded
  error getting swift_ring/4.bin: Unknown error 1149239296

dd if=/dev/zero of=4.bin bs=1M count=8000:
  # rados -p swift_ring stat $FILE
  swift_ring/4.bin mtime 1358967388, size 8388608000
  # rados -p swift_ring get $FILE $FILE.downloaded
  error getting swift_ring/4.bin: Bad address

On Tue, Jan 22, 2013 at 12:28 PM, Sage Weil s...@inktank.com wrote:

On Tue, 22 Jan 2013, Nick Bartos wrote:

Thanks! Is it safe to just apply that last commit to 0.56.1? Also, is the rados command 'clonedata' instead of 'clone'? That's what it looked like in the code.

Yep, and yep!

s

On Tue, Jan 22, 2013 at 9:27 AM, Sage Weil s...@inktank.com wrote:

On Tue, 22 Jan 2013, Nick Bartos wrote:

Assuming that the clone is atomic so that the client only ever grabbed a complete old or new version of the file, that method really seems ideal. How much work/time would that be? The objects will likely average around 10-20MB, but it's possible that in some cases they may grow to a few hundred MB.

You're in luck--my email load was mercifully light this morning.

  713  ./rados -p data ls -
  714  ./rados put foo.tmp /etc/passwd -p data --object-locator foo
  715  ./rados clone foo.tmp foo -p data --object-locator foo
  716  ./rados -p data ls -
  717  ./rados -p data rm foo.tmp --object-locator foo
  718  ./rados -p data ls -
  719  ./rados -p data get foo -

see wip-rados-clone.

sage

On Mon, Jan 21, 2013 at 9:14 PM, Sage Weil s...@inktank.com wrote:

With a bit of additional support in the rados tool, we could write to object $foo.tmp with key $foo, and then clone it into position and delete the .tmp. If they're really big objects, though, you may also be better off with radosgw, which provides striping and atomicity..

sage

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
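Sage's suggested fix (read in chunks of a few MB and keep going until a short read) can be sketched as follows. This is an illustrative Python sketch, not the actual rados.cc code; read_chunk here is a hypothetical stand-in for whatever positional read call the tool would issue against librados:

```python
def get_object(read_chunk, chunk_size=4 * 1024 * 1024):
    """Fetch an object of arbitrary size in fixed-size chunks.

    read_chunk(offset, length) returns up to `length` bytes starting at
    `offset`; a short (or empty) result signals the end of the object.
    This avoids issuing one giant read, which is where very large
    objects fall over.
    """
    parts = []
    offset = 0
    while True:
        buf = read_chunk(offset, chunk_size)
        parts.append(buf)
        offset += len(buf)
        if len(buf) < chunk_size:  # short read: we've hit the end
            break
    return b"".join(parts)

# Simulate a 10 MB object backed by an in-memory buffer.
data = b"x" * (10 * 1024 * 1024)
fetched = get_object(lambda off, ln: data[off:off + ln])
assert fetched == data
```

In the real tool the loop would also stream each chunk to the output file (or stdout) instead of accumulating in memory, but the short-read termination logic is the same.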
Re: Hit suicide timeout after adding new osd
On Wed, 23 Jan 2013, Jens Kristian Søgaard wrote:

Hi Sage,

I think the problem now is just that 'osd target transaction size' is

I set it to 50, and that seems to have solved all my problems. After a day or so my cluster got to a HEALTH_OK state again. It has been running for a few days now without any crashes!

Hmm, one of the OSDs crashed again, sadly. It logs:

  -2 2013-01-23 18:01:23.563624 7f67524da700  1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7f673affd700' had timed out after 60
  -1 2013-01-23 18:01:23.563657 7f67524da700  1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7f673affd700' had suicide timed out after 180
   0 2013-01-23 18:01:24.257996 7f67524da700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f67524da700 time 2013-01-23 18:01:23.563677
  common/HeartbeatMap.cc: 78: FAILED assert(0 == "hit suicide timeout")

With this stack trace:

  ceph version 0.56.1-26-g3bd8f6b (3bd8f6b7235eb14cab778e3c6dcdc636aff4f539)
  1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x2eb) [0x846ecb]
  2: (ceph::HeartbeatMap::is_healthy()+0x8e) [0x8476ae]
  3: (ceph::HeartbeatMap::check_touch_file()+0x28) [0x8478d8]
  4: (CephContextServiceThread::entry()+0x55) [0x8e0f45]
  5: /lib64/libpthread.so.0() [0x3cbc807d14]
  6: (clone()+0x6d) [0x3cbc0f167d]

I have saved the core file, if there's anything in there you need? Or do you think I just need to set the target transaction size even lower than 50?

Can you share the output from 'thread apply all bt' so we can see what it was doing?

thanks!
s
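For context, the mechanism behind this assert can be illustrated with a small sketch. This is hypothetical Python, not the actual HeartbeatMap.cc code: each worker thread periodically touches its heartbeat handle, and a checker compares the age of the last touch against a grace period (60s in the log above, reported as "timed out") and a larger suicide grace (180s, which fires the "hit suicide timeout" assert and kills the daemon):

```python
import time

class HeartbeatHandle:
    """One watched thread: FileStore::op_tp in the crash above."""
    def __init__(self, name, grace, suicide_grace):
        self.name = name
        self.grace = grace
        self.suicide_grace = suicide_grace
        self.last_touch = time.time()

    def touch(self):
        # Called by the worker thread to prove it is making progress.
        self.last_touch = time.time()

def check(handle, now=None):
    """Return 'healthy', 'timed out', or 'suicide' for one handle."""
    now = time.time() if now is None else now
    age = now - handle.last_touch
    if age > handle.suicide_grace:
        return "suicide"      # the real code asserts here, killing the osd
    if age > handle.grace:
        return "timed out"    # only a warning is logged
    return "healthy"

h = HeartbeatHandle("FileStore::op_tp", grace=60, suicide_grace=180)
t0 = h.last_touch
assert check(h, now=t0 + 30) == "healthy"
assert check(h, now=t0 + 90) == "timed out"
assert check(h, now=t0 + 200) == "suicide"
```

So the crash means the op_tp thread made no progress for over 180 seconds, e.g. because a single overly large transaction blocked it, which is why shrinking 'osd target transaction size' helps.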
Re: Hit suicide timeout after adding new osd
On Thu, 24 Jan 2013, Andrey Korolyov wrote:

On Thu, Jan 24, 2013 at 12:59 AM, Jens Kristian Søgaard j...@mermaidconsulting.dk wrote:

Hi Sage,

I think the problem now is just that 'osd target transaction size' is

I set it to 50, and that seems to have solved all my problems. After a day or so my cluster got to a HEALTH_OK state again. It has been running for a few days now without any crashes!

Hmm, one of the OSDs crashed again, sadly. It logs:

  -2 2013-01-23 18:01:23.563624 7f67524da700  1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7f673affd700' had timed out after 60
  -1 2013-01-23 18:01:23.563657 7f67524da700  1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7f673affd700' had suicide timed out after 180
   0 2013-01-23 18:01:24.257996 7f67524da700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f67524da700 time 2013-01-23 18:01:23.563677
  common/HeartbeatMap.cc: 78: FAILED assert(0 == "hit suicide timeout")

With this stack trace:

  ceph version 0.56.1-26-g3bd8f6b (3bd8f6b7235eb14cab778e3c6dcdc636aff4f539)
  1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x2eb) [0x846ecb]
  2: (ceph::HeartbeatMap::is_healthy()+0x8e) [0x8476ae]
  3: (ceph::HeartbeatMap::check_touch_file()+0x28) [0x8478d8]
  4: (CephContextServiceThread::entry()+0x55) [0x8e0f45]
  5: /lib64/libpthread.so.0() [0x3cbc807d14]
  6: (clone()+0x6d) [0x3cbc0f167d]

I have saved the core file, if there's anything in there you need? Or do you think I just need to set the target transaction size even lower than 50?

I was able to catch this too on rejoin to a very busy cluster, and it seems I need to lower this value at least at start time. Also, c5fe0965572c074a2a33660719ce3222d18c1464 has increased the overall time before a restarted or new osd will join the cluster; for 2M objects / 3T of replicated data, a restart of the cluster took almost an hour before it actually began to work.

The worst thing is that a single osd, if restarted, will be marked up after a couple of minutes, then after almost half an hour (eating 100 percent of one cpu) marked down, and then the cluster will start to redistribute data after the 300s timeout while the osd is still doing something.

Okay, something is very wrong. Can you reproduce this with a log? Or even a partial log while it is spinning? You can adjust the log level on a running process with

  ceph --admin-daemon /var/run/ceph-osd.NN.asok config set debug_osd 20
  ceph --admin-daemon /var/run/ceph-osd.NN.asok config set debug_ms 1

We haven't been able to reproduce this, so I'm very much interested in any light you can shine here.

Thanks!
sage
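For reference, the option being tuned in this thread lives in the [osd] section of ceph.conf; 50 is the value Jens reports using (section placement shown here is the conventional one, check your Ceph version's documentation):

```ini
[osd]
; Smaller transactions shorten the time each FileStore op_tp
; operation can block, at the cost of more transactions overall.
osd target transaction size = 50
```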
Re: Consistently reading/writing rados objects via command line
Try wip-rados-get

On Wed, 23 Jan 2013, Sage Weil wrote:

Hi Nick-

The problem here looks to just be that do_get() in rados.cc isn't making any attempt to read large objects in chunks. I'm not sure where the 2GB limit is, but it is well beyond non-optimal before it gets to that point. That function needs to read in chunks of a few MB and keep going until it gets a short read, modulo some extra futzing for stdout. Any takers? :)

sage

On Wed, 23 Jan 2013, Nick Bartos wrote:

This seems to be working ok for the most part, but I noticed that using large files gives errors getting them (but not putting them). The problems start after 2GB which, as you said, is larger than should be used in this method. It shouldn't affect us since we shouldn't be using this for files that large, but I thought it was worth reporting.

This is the test:

  dd if=/dev/zero of=4.bin bs=1M count=100
  export FILE=4.bin
  rados -p swift_ring ls -
  rados -p swift_ring put $FILE.tmp $FILE --object-locator $FILE
  rados -p swift_ring clonedata $FILE.tmp $FILE --object-locator $FILE
  rados -p swift_ring ls -
  rados -p swift_ring rm $FILE.tmp --object-locator $FILE
  rados -p swift_ring ls -
  rados -p swift_ring stat $FILE
  rm -f $FILE.downloaded
  rados -p swift_ring get $FILE $FILE.downloaded

These are the results:

dd if=/dev/zero of=4.bin bs=1M count=1000:
  # rados -p swift_ring stat $FILE
  swift_ring/4.bin mtime 1358967088, size 1048576000
  # rados -p swift_ring get $FILE $FILE.downloaded
  ok

dd if=/dev/zero of=4.bin bs=1M count=2000:
  # rados -p swift_ring stat $FILE
  swift_ring/4.bin mtime 1358967172, size 2097152000
  # rados -p swift_ring get $FILE $FILE.downloaded
  ok

dd if=/dev/zero of=4.bin bs=1M count=3000:
  # rados -p swift_ring stat $FILE
  swift_ring/4.bin mtime 1358966844, size 3145728000
  # rados -p swift_ring get $FILE $FILE.downloaded
  error getting swift_ring/4.bin: Unknown error 1149239296

dd if=/dev/zero of=4.bin bs=1M count=8000:
  # rados -p swift_ring stat $FILE
  swift_ring/4.bin mtime 1358967388, size 8388608000
  # rados -p swift_ring get $FILE $FILE.downloaded
  error getting swift_ring/4.bin: Bad address

On Tue, Jan 22, 2013 at 12:28 PM, Sage Weil s...@inktank.com wrote:

On Tue, 22 Jan 2013, Nick Bartos wrote:

Thanks! Is it safe to just apply that last commit to 0.56.1? Also, is the rados command 'clonedata' instead of 'clone'? That's what it looked like in the code.

Yep, and yep!

s

On Tue, Jan 22, 2013 at 9:27 AM, Sage Weil s...@inktank.com wrote:

On Tue, 22 Jan 2013, Nick Bartos wrote:

Assuming that the clone is atomic so that the client only ever grabbed a complete old or new version of the file, that method really seems ideal. How much work/time would that be? The objects will likely average around 10-20MB, but it's possible that in some cases they may grow to a few hundred MB.

You're in luck--my email load was mercifully light this morning.

  713  ./rados -p data ls -
  714  ./rados put foo.tmp /etc/passwd -p data --object-locator foo
  715  ./rados clone foo.tmp foo -p data --object-locator foo
  716  ./rados -p data ls -
  717  ./rados -p data rm foo.tmp --object-locator foo
  718  ./rados -p data ls -
  719  ./rados -p data get foo -

see wip-rados-clone.

sage

On Mon, Jan 21, 2013 at 9:14 PM, Sage Weil s...@inktank.com wrote:

With a bit of additional support in the rados tool, we could write to object $foo.tmp with key $foo, and then clone it into position and delete the .tmp. If they're really big objects, though, you may also be better off with radosgw, which provides striping and atomicity..

sage
Re: Hit suicide timeout after adding new osd
On Thu, Jan 24, 2013 at 8:39 AM, Sage Weil s...@inktank.com wrote:

On Thu, 24 Jan 2013, Andrey Korolyov wrote:

On Thu, Jan 24, 2013 at 12:59 AM, Jens Kristian Søgaard j...@mermaidconsulting.dk wrote:

Hi Sage,

I think the problem now is just that 'osd target transaction size' is

I set it to 50, and that seems to have solved all my problems. After a day or so my cluster got to a HEALTH_OK state again. It has been running for a few days now without any crashes!

Hmm, one of the OSDs crashed again, sadly. It logs:

  -2 2013-01-23 18:01:23.563624 7f67524da700  1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7f673affd700' had timed out after 60
  -1 2013-01-23 18:01:23.563657 7f67524da700  1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7f673affd700' had suicide timed out after 180
   0 2013-01-23 18:01:24.257996 7f67524da700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' thread 7f67524da700 time 2013-01-23 18:01:23.563677
  common/HeartbeatMap.cc: 78: FAILED assert(0 == "hit suicide timeout")

With this stack trace:

  ceph version 0.56.1-26-g3bd8f6b (3bd8f6b7235eb14cab778e3c6dcdc636aff4f539)
  1: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, long)+0x2eb) [0x846ecb]
  2: (ceph::HeartbeatMap::is_healthy()+0x8e) [0x8476ae]
  3: (ceph::HeartbeatMap::check_touch_file()+0x28) [0x8478d8]
  4: (CephContextServiceThread::entry()+0x55) [0x8e0f45]
  5: /lib64/libpthread.so.0() [0x3cbc807d14]
  6: (clone()+0x6d) [0x3cbc0f167d]

I have saved the core file, if there's anything in there you need? Or do you think I just need to set the target transaction size even lower than 50?

I was able to catch this too on rejoin to a very busy cluster, and it seems I need to lower this value at least at start time. Also, c5fe0965572c074a2a33660719ce3222d18c1464 has increased the overall time before a restarted or new osd will join the cluster; for 2M objects / 3T of replicated data, a restart of the cluster took almost an hour before it actually began to work.

The worst thing is that a single osd, if restarted, will be marked up after a couple of minutes, then after almost half an hour (eating 100 percent of one cpu) marked down, and then the cluster will start to redistribute data after the 300s timeout while the osd is still doing something.

Okay, something is very wrong. Can you reproduce this with a log? Or even a partial log while it is spinning? You can adjust the log level on a running process with

  ceph --admin-daemon /var/run/ceph-osd.NN.asok config set debug_osd 20
  ceph --admin-daemon /var/run/ceph-osd.NN.asok config set debug_ms 1

We haven't been able to reproduce this, so I'm very much interested in any light you can shine here.

Unfortunately the cluster finally hit the ``suicide timeout'' on every osd, so there are no logs, only some backtraces [1]. Yesterday, after an osd was not able to join the cluster within an hour, I decided to wait until the data was remapped, then tried to restart the cluster, leaving it overnight; by morning all osd processes were dead, with the same backtraces. Before that, after a silly node crash (related to deadlocks in kernel kvm code), some pgs remained stuck in the peering state without any blocker in the json output, so I had decided to restart the osd to which the primary copy belongs, because that helped before. So the most interesting part is missing, but I`ll reformat the cluster soon and will try to catch this again after filling in some data.

[1]. http://xdel.ru/downloads/ceph-log/osd-heartbeat/

Thanks!
sage