Re: [ceph-users] Ceph Supermicro hardware recommendation
I ended up destroying the EC pool and starting over. It was killing all of my OSD machines, and I couldn't keep anything working right with EC in use. So there are no core dumps, and I'm no longer in a position to reproduce it easily. This was with Giant on Ubuntu 14.04.

On Thu Feb 12 2015 at 7:07:38 AM Mark Nelson mnel...@redhat.com wrote:

On 02/08/2015 10:41 PM, Scott Laird wrote:

Does anyone have a good recommendation for per-OSD memory for EC? My EC test blew up in my face when my OSDs suddenly spiked to 10+ GB per OSD process as soon as any reconstruction was needed. Which (of course) caused OSDs to OOM, which meant more reconstruction, which fairly immediately led to a dead cluster. This was with Giant. Is this typical?

Doh, that shouldn't happen. Can you reproduce it? It would be especially nice if we could get a core dump, or if you could make it happen under valgrind. If the CPUs are spinning, even a perf report might prove useful.

On Fri Feb 06 2015 at 2:41:50 AM Mohamed Pakkeer mdfakk...@gmail.com wrote:

Hi all,

We are building an EC cluster with a cache tier for CephFS. We are planning to use the following 1U chassis along with Intel SSD DC S3700 drives for the cache tier. It has 10 * 2.5 slots. Could you recommend a suitable Intel processor and amount of RAM to serve 10 SSDs?

http://www.supermicro.com/products/system/1U/1028/SYS-1028R-WTRT.cfm

Regards
K.Mohamed Pakkeer

On Fri, Feb 6, 2015 at 2:57 PM, Stephan Seitz s.se...@heinlein-support.de wrote:

Hi,

On Tuesday, 03.02.2015, 15:16 +, Colombo Marco wrote:

Hi all, I have to build a new Ceph storage cluster. After reading the hardware recommendations and some mail from this mailing list, I would like to buy these servers:

Just FYI: SuperMicro already focuses on Ceph with a product line:
http://www.supermicro.com/solutions/datasheet_Ceph.pdf
http://www.supermicro.com/solutions/storage_ceph.cfm

regards,
Stephan Seitz

--
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin
http://www.heinlein-support.de
Tel: 030 / 405051-44
Fax: 030 / 405051-19
Mandatory disclosures per §35a GmbHG: HRB 93818 B / Amtsgericht Berlin-Charlottenburg, Managing Director: Peer Heinlein -- Registered office: Berlin

--
Thanks & Regards
K.Mohamed Pakkeer
Mobile- 0091-8754410114

___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Having problem to start Radosgw
Hello Yehuda,

Thanks for your response!

This is my RGW configuration: https://gist.github.com/anonymous/c0f62783feac88e069c7

And this is the Tengine configuration: https://gist.github.com/anonymous/90b77c168ed0606db03d

Please let me know if you need anything else.

Best!

On Feb 14, 2015, at 6:22 PM, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote:

Error 9 is EBADF (bad file number). Looks like there's an issue with the socket created for the fastcgi communication. How did you configure it?

Yehuda
Re: [ceph-users] CRUSHMAP for chassis balance
Hi Gregory,

Thanks for the direction. I ended up with 3 different rules in one ruleset, one for each range of replica sizes. I tested them: there are no bad mappings, and hosts / OSDs are correctly balanced between the 2 chassis. I'm not sure whether it can be optimized further, but I am happy with the current result:

rule rule_rep2 {
    ruleset 0
    type replicated
    min_size 2
    max_size 2
    step take chassis1
    step chooseleaf firstn 1 type host
    step emit
    step take chassis2
    step chooseleaf firstn 1 type host
    step emit
}

rule rule_rep34 {
    ruleset 0
    type replicated
    min_size 3
    max_size 4
    step take default
    step choose firstn 2 type chassis
    step chooseleaf firstn 2 type host
    step emit
}

rule rule_rep56 {
    ruleset 0
    type replicated
    min_size 5
    max_size 6
    step take default
    step choose firstn 3 type chassis
    step chooseleaf firstn 3 type host
    step emit
}

Luke

From: Gregory Farnum [mailto:g...@gregs42.com]
Sent: Friday, February 13, 2015 11:01 PM
To: Luke Kao; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] CRUSHMAP for chassis balance

With sufficiently new CRUSH versions (all the latest point releases on LTS?) I think you can simply have the rule return extra IDs, which are dropped if they exceed the number required. So you can choose two chassis, then have those both choose two leaf OSDs, and return those 4 from the rule.
-Greg

On Fri, Feb 13, 2015 at 6:13 AM Luke Kao luke@mycom-osi.com wrote:

Dear cephers,

Currently I am working on the crushmap to make sure that at least one copy goes to a different chassis. Say chassis1 has host1, host2, host3, and chassis2 has host4, host5, host6.

With replication=2 it’s not a problem; I can use the following steps in the rule:

    step take chassis1
    step chooseleaf firstn 1 type host
    step emit
    step take chassis2
    step chooseleaf firstn 1 type host
    step emit

But for replication=3, I tried:

    step take chassis1
    step chooseleaf firstn 1 type host
    step emit
    step take chassis2
    step chooseleaf firstn 1 type host
    step emit
    step take default
    step chooseleaf firstn 1 type host
    step emit

In the end, the 3rd OSD returned in the rule test is always a duplicate of the first or the second one.

Any idea, or what’s the direction to move forward? Thanks in advance.

BR,
Luke
MYCOM-OSI

This electronic message contains information from Mycom which may be privileged or confidential. The information is intended to be for the use of the individual(s) or entity named above. If you are not the intended recipient, be aware that any disclosure, copying, distribution or any other use of the contents of this information is prohibited. If you have received this electronic message in error, please notify us by post or telephone (to the numbers or correspondence address above) or by email (at the email address above) immediately.
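The check described in this thread (no OSD returned twice for one placement, and copies spread over both chassis) can be sketched in a few lines of Python. This is only an illustration: the OSD numbering, the one-OSD-per-host layout, and the function name `check_mapping` are assumptions made for the example, not details from Luke's cluster or from any Ceph tool.

```python
from collections import Counter

# Hypothetical topology following the thread's description:
# chassis1 holds host1..host3, chassis2 holds host4..host6.
# One OSD per host is an assumption made only for this sketch.
OSD_TO_HOST = {0: "host1", 1: "host2", 2: "host3",
               3: "host4", 4: "host5", 5: "host6"}
HOST_TO_CHASSIS = {
    "host1": "chassis1", "host2": "chassis1", "host3": "chassis1",
    "host4": "chassis2", "host5": "chassis2", "host6": "chassis2",
}


def check_mapping(osds, min_chassis=2):
    """Return a list of problems for one placement (empty list means OK)."""
    problems = []

    # An OSD that appears twice means two "copies" share one disk.
    dup_osds = sorted(o for o, n in Counter(osds).items() if n > 1)
    if dup_osds:
        problems.append(f"duplicate OSDs: {dup_osds}")

    # Two copies on the same host defeat host-level separation.
    hosts = [OSD_TO_HOST[o] for o in osds]
    dup_hosts = sorted(h for h, n in Counter(hosts).items() if n > 1)
    if dup_hosts:
        problems.append(f"duplicate hosts: {dup_hosts}")

    # The goal in the thread: copies must span both chassis.
    chassis = {HOST_TO_CHASSIS[h] for h in hosts}
    if len(chassis) < min_chassis:
        problems.append(f"only {len(chassis)} chassis used: {sorted(chassis)}")

    return problems


if __name__ == "__main__":
    for placement in ([0, 3, 1], [0, 3, 0], [0, 1, 2]):
        print(placement, check_mapping(placement) or "OK")
```

The second placement mimics the symptom reported for replication=3, where the third OSD repeats one of the first two; the third placement shows a mapping that has no duplicates but still sits entirely in one chassis.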
Re: [ceph-users] Having problem to start Radosgw
----- Original Message -----
From: B L super.itera...@gmail.com
To: ceph-users@lists.ceph.com
Sent: Friday, February 13, 2015 11:55:22 PM
Subject: [ceph-users] Having problem to start Radosgw

Hi all,

I’m having a problem starting radosgw; it gives me an error that I can’t diagnose:

$ radosgw -c ceph.conf -d
2015-02-14 07:46:58.435802 7f9d739557c0 0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 27609
2015-02-14 07:46:58.437284 7f9d739557c0 -1 asok(0x7f9d74da80a0) AdminSocketConfigObs::init: failed: AdminSocket::bind_and_listen: failed to bind the UNIX domain socket to '/var/run/ceph/ceph-client.admin.asok': (17) File exists
2015-02-14 07:46:58.499004 7f9d739557c0 0 framework: fastcgi
2015-02-14 07:46:58.499016 7f9d739557c0 0 starting handler: fastcgi
2015-02-14 07:46:58.501160 7f9d477fe700 0 ERROR: FCGX_Accept_r returned -9
2015-02-14 07:46:58.594271 7f9d648ab700 -1 failed to list objects pool_iterate returned r=-2
2015-02-14 07:46:58.594276 7f9d648ab700 0 ERROR: lists_keys_next(): ret=-2
2015-02-14 07:46:58.594278 7f9d648ab700 0 ERROR: sync_all_users() returned ret=-2
^C2015-02-14 07:47:29.119185 7f9d47fff700 1 handle_sigterm
2015-02-14 07:47:29.119214 7f9d47fff700 1 handle_sigterm set alarm for 120
2015-02-14 07:47:29.119222 7f9d739557c0 -1 shutting down
2015-02-14 07:47:29.142726 7f9d739557c0 1 final shutdown

Since it complains that this file exists: /var/run/ceph/ceph-client.admin.asok, I removed it, but now I get this error:

$ radosgw -c ceph.conf -d
2015-02-14 07:47:55.140276 7f31cc0637c0 0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 27741
2015-02-14 07:47:55.201561 7f31cc0637c0 0 framework: fastcgi
2015-02-14 07:47:55.201567 7f31cc0637c0 0 starting handler: fastcgi
2015-02-14 07:47:55.203443 7f319effd700 0 ERROR: FCGX_Accept_r returned -9

Error 9 is EBADF (bad file number). Looks like there's an issue with the socket created for the fastcgi communication. How did you configure it?

Yehuda

2015-02-14 07:47:55.304048 7f319700 -1 failed to list objects pool_iterate returned r=-2
2015-02-14 07:47:55.304054 7f319700 0 ERROR: lists_keys_next(): ret=-2
2015-02-14 07:47:55.304060 7f319700 0 ERROR: sync_all_users() returned ret=-2

Can somebody help me figure out where to start fixing this? Thanks!
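The numeric codes in these logs can be decoded the same way Yehuda reads -9 as EBADF. The short Python sketch below does this with the standard `errno` and `os` modules. It assumes, as that reading suggests, that the values are (possibly negated) errno numbers on a Linux host; the helper name `describe` is made up for the example.

```python
import errno
import os


def describe(ret):
    """Map a return code such as -9 or 17 to (symbolic name, message)."""
    code = abs(ret)  # the logs show both "(17)" and negated values like -9
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # Codes taken from the log excerpts in this thread.
    for ret in (17, -9, -2):
        name, message = describe(ret)
        print(f"{ret:>3}: {name} ({message})")
```

Read this way, `(17) File exists` is EEXIST for the stale admin socket, `-9` is EBADF from `FCGX_Accept_r`, and `ret=-2` is ENOENT ("No such file or directory") from the object listing calls.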
Re: [ceph-users] Having problem to start Radosgw
Shall I run it like this:

sudo radosgw -c ceph.conf -d strace -F -T -tt -o/tmp/strace.out radosgw -f

On Feb 14, 2015, at 6:55 PM, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote:

strace -F -T -tt -o/tmp/strace.out radosgw -f
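For reference, the form given later in this thread is `sudo strace -F -T -tt -o/tmp/strace.out radosgw -c ceph.conf -f`: strace goes in front, and the gateway keeps its usual arguments after it. A small Python sketch of that composition follows; the helper name `wrap_with_strace` is hypothetical, and the script only prints the command line rather than running it.

```python
import shlex


def wrap_with_strace(gateway_argv, trace_file="/tmp/strace.out"):
    """Prefix an existing command line with the strace options from the thread."""
    strace = ["strace", "-F", "-T", "-tt", f"-o{trace_file}"]
    return strace + list(gateway_argv)


if __name__ == "__main__":
    # The gateway's own arguments stay together, after the strace options.
    usual = ["radosgw", "-c", "ceph.conf", "-f"]
    print(shlex.join(["sudo"] + wrap_with_strace(usual)))
```

The point of keeping the gateway arguments as one list is that whatever parameters radosgw is normally started with can be passed through unchanged.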
Re: [ceph-users] Having problem to start Radosgw
Hello Yehuda,

The strace command you pointed me to shows this: https://gist.github.com/anonymous/8e9f1ced485996a263bb

Additionally, I checked this log file: /var/log/radosgw/ceph-client.radosgw.gateway. It contains the following:

2015-02-12 18:23:32.247679 7fecca5257c0 -1 did not load config file, using default settings.
2015-02-12 18:23:32.247745 7fecca5257c0 0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 20477
2015-02-12 18:23:32.251192 7fecca5257c0 -1 Couldn't init storage provider (RADOS)
2015-02-12 18:23:58.494026 7faab31377c0 -1 did not load config file, using default settings.
2015-02-12 18:23:58.494092 7faab31377c0 0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 20509
2015-02-12 18:23:58.497420 7faab31377c0 -1 Couldn't init storage provider (RADOS)
2015-02-14 17:13:03.478688 7f86f09567c0 -1 did not load config file, using default settings.
2015-02-14 17:13:03.478778 7f86f09567c0 0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 2989
2015-02-14 17:13:03.482850 7f86f09567c0 -1 Couldn't init storage provider (RADOS)
2015-02-14 17:13:29.477530 7ff18226a7c0 -1 did not load config file, using default settings.
2015-02-14 17:13:29.477595 7ff18226a7c0 0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 3033
2015-02-14 17:13:29.481173 7ff18226a7c0 -1 Couldn't init storage provider (RADOS)
2015-02-14 17:21:00.950847 7ffee3a3b7c0 -1 did not load config file, using default settings.
2015-02-14 17:21:00.950916 7ffee3a3b7c0 0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 3086
2015-02-14 17:21:00.954085 7ffee3a3b7c0 -1 Couldn't init storage provider (RADOS)

It turns out that the last line of the log is emitted by this piece of code in rgw_main.cc:

    … …
    FCGX_Init();
    RGWStoreManager store_manager;
    if (!store_manager.init("rados", g_ceph_context)) {
      derr << "Couldn't init storage provider (RADOS)" << dendl;
      return EIO;
    }
    RGWProcess process(g_ceph_context, 20);
    process.run();
    return 0;

N.B. you can find it at http://workbench.dachary.org/ceph/ceph/raw/8d63e140777bbdd061baa6845d57e6c3cc771f76/src/rgw/rgw_main.cc , 10th line from the bottom.

Is that in any way related to the problem?

On Feb 14, 2015, at 7:24 PM, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote:

sudo strace -F -T -tt -o/tmp/strace.out radosgw -c ceph.conf -f
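When a log like the one above gets long, it can help to pull out only the failure lines. The Python sketch below parses the line layout seen in these excerpts (timestamp, thread id, numeric level, message) and keeps lines with a negative level or an `ERROR` marker. The regular expression is inferred from the excerpts in this thread, not taken from Ceph's documentation, and the function names are made up for the example.

```python
import re

# Layout inferred from the excerpts in this thread:
#   <date> <time.micros> <thread id (hex)> <level> <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<thread>[0-9a-f]+)\s+"
    r"(?P<level>-?\d+) "
    r"(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["level"] = int(record["level"])
    return record


def errors_only(lines):
    """Keep records logged at a negative level or carrying an ERROR marker."""
    records = (parse_line(line) for line in lines)
    return [r for r in records
            if r is not None and (r["level"] < 0 or "ERROR" in r["msg"])]


# Lines copied from this thread; the last one comes from another message in it.
SAMPLE = [
    "2015-02-14 17:21:00.950847 7ffee3a3b7c0 -1 did not load config file, using default settings.",
    "2015-02-14 17:21:00.950916 7ffee3a3b7c0 0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 3086",
    "2015-02-14 17:21:00.954085 7ffee3a3b7c0 -1 Couldn't init storage provider (RADOS)",
    "2015-02-14 22:27:57.575349 7f269affd700 0 ERROR: FCGX_Accept_r returned -9",
]

if __name__ == "__main__":
    for record in errors_only(SAMPLE):
        print(record["ts"], record["level"], record["msg"])
```

Lines that do not start with a timestamp, such as the ones prefixed with `^C`, are skipped rather than guessed at.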
Re: [ceph-users] Having problem to start Radosgw
----- Original Message -----
From: B L super.itera...@gmail.com
To: Yehuda Sadeh-Weinraub yeh...@redhat.com
Cc: ceph-users@lists.ceph.com
Sent: Saturday, February 14, 2015 11:03:42 AM
Subject: Re: [ceph-users] Having problem to start Radosgw

Hello Yehuda,

The strace command you pointed me to shows this: https://gist.github.com/anonymous/8e9f1ced485996a263bb

Additionally, I checked this log file: /var/log/radosgw/ceph-client.radosgw.gateway. It contains the following:

2015-02-12 18:23:32.247679 7fecca5257c0 -1 did not load config file, using default settings.
2015-02-12 18:23:32.247745 7fecca5257c0 0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 20477
2015-02-12 18:23:32.251192 7fecca5257c0 -1 Couldn't init storage provider (RADOS)
2015-02-12 18:23:58.494026 7faab31377c0 -1 did not load config file, using default settings.
2015-02-12 18:23:58.494092 7faab31377c0 0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 20509
2015-02-12 18:23:58.497420 7faab31377c0 -1 Couldn't init storage provider (RADOS)
2015-02-14 17:13:03.478688 7f86f09567c0 -1 did not load config file, using default settings.
2015-02-14 17:13:03.478778 7f86f09567c0 0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 2989
2015-02-14 17:13:03.482850 7f86f09567c0 -1 Couldn't init storage provider (RADOS)
2015-02-14 17:13:29.477530 7ff18226a7c0 -1 did not load config file, using default settings.
2015-02-14 17:13:29.477595 7ff18226a7c0 0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 3033
2015-02-14 17:13:29.481173 7ff18226a7c0 -1 Couldn't init storage provider (RADOS)
2015-02-14 17:21:00.950847 7ffee3a3b7c0 -1 did not load config file, using default settings.
2015-02-14 17:21:00.950916 7ffee3a3b7c0 0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 3086
2015-02-14 17:21:00.954085 7ffee3a3b7c0 -1 Couldn't init storage provider (RADOS)

It turns out that the last line of the log is emitted by this piece of code in rgw_main.cc:

    … …
    FCGX_Init();
    RGWStoreManager store_manager;
    if (!store_manager.init("rados", g_ceph_context)) {
      derr << "Couldn't init storage provider (RADOS)" << dendl;
      return EIO;
    }
    RGWProcess process(g_ceph_context, 20);
    process.run();
    return 0;

N.B. you can find it at http://workbench.dachary.org/ceph/ceph/raw/8d63e140777bbdd061baa6845d57e6c3cc771f76/src/rgw/rgw_main.cc , 10th line from the bottom.

Is that in any way related to the problem?

Not related. This actually means that it couldn't connect to the rados backend, so there's a different issue now. The strace log doesn't provide much with regard to the original issue, as it didn't get to that part this time. You can try bumping up the debug level (debug rgw = 20, debug ms = 1). I assume that the issue you're seeing is that the wrong rados user and/or wrong cephx keys are being used. Try to run it again as you usually do, and see which regular params are being passed when starting radosgw; use these when running the strace command.

Yehuda

On Feb 14, 2015, at 7:24 PM, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote:

sudo strace -F -T -tt -o/tmp/strace.out radosgw -c ceph.conf -f
Re: [ceph-users] Having problem to start Radosgw
That’s what I usually do to check whether rgw is running without problems:

sudo radosgw -c ceph.conf -d

I already bumped up the log level, but I can’t see any change or increase in the verbosity of the logs. I still get the same:

2015-02-14 22:27:57.513151 7f26c79d27c0 0 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3), process radosgw, pid 7924
2015-02-14 22:27:57.573564 7f26c79d27c0 0 framework: fastcgi
2015-02-14 22:27:57.573569 7f26c79d27c0 0 starting handler: fastcgi
2015-02-14 22:27:57.575349 7f269affd700 0 ERROR: FCGX_Accept_r returned -9
2015-02-14 22:27:57.670610 7f269bfff700 0 ERROR: can't read user header: ret=-2
2015-02-14 22:27:57.670613 7f269bfff700 0 ERROR: sync_user() failed, user=cephtest ret=-2
2015-02-14 22:27:57.671382 7f269bfff700 0 ERROR: can't read user header: ret=-2
2015-02-14 22:27:57.671384 7f269bfff700 0 ERROR: sync_user() failed, user=cephtestss ret=-2
^C2015-02-14 22:28:30.693140 7f269b7fe700 1 handle_sigterm
2015-02-14 22:28:30.693170 7f269b7fe700 1 handle_sigterm set alarm for 120
2015-02-14 22:28:30.693179 7f26c79d27c0 -1 shutting down
2015-02-14 22:28:30.717340 7f26c79d27c0 1 final shutdown

Please let me know if I can do something more.

Now I have 2 questions:
1- Which RADOS user are you referring to?
2- How would I know that I am using the wrong cephx keys unless I see an authentication error or a relevant warning?

Thanks!
Beanos

On Feb 14, 2015, at 11:29 PM, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote:

Not related. This actually means that it couldn't connect to the rados backend, so there's a different issue now. The strace log doesn't provide much with regard to the original issue, as it didn't get to that part this time. You can try bumping up the debug level (debug rgw = 20, debug ms = 1). I assume that the issue you're seeing is that the wrong rados user and/or wrong cephx keys are being used. Try to run it again as you usually do, and see which regular params are being passed when starting radosgw; use these when running the strace command.

Yehuda
Re: [ceph-users] Having problem to start Radosgw
Hello Yehuda,

This is the resulting output after adding “-n client.radosgw.gateway”: https://gist.github.com/anonymous/f16701d6cacc8911620f

I can see only one problem in the above output:

-1 Couldn't init storage provider (RADOS)

Please check the output; you can probably find something useful.

On Feb 15, 2015, at 1:28 AM, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote:

Add the '-n client.radosgw.gateway' param when you're running the gateway; all your settings are under that user.

Yehuda

----- Original Message -----
From: B L super.itera...@gmail.com
To: Yehuda Sadeh-Weinraub yeh...@redhat.com
Cc: ceph-users@lists.ceph.com
Sent: Saturday, February 14, 2015 2:56:54 PM
Subject: Re: [ceph-users] Having problem to start Radosgw

Yehuda, in case you need to know more about my system, here is my full cluster configuration: https://gist.github.com/anonymous/fb4c314320d7df75569a

And that’s my ceph cluster status:

$ ceph -s
    cluster 17bea68b-1634-4cd1-8b2a-00a60ef4761d
     health HEALTH_WARN 203 pgs degraded; 203 pgs stuck unclean; recovery 6/151 objects degraded (3.974%)
     monmap e1: 1 mons at {ceph-node1=172.31.0.84:6789/0}, election epoch 2, quorum 0 ceph-node1
     osdmap e93: 6 osds: 6 up, 6 in
      pgmap v3676: 1920 pgs, 16 pools, 10241 kB data, 51 objects
            279 MB used, 18086 MB / 18365 MB avail
            6/151 objects degraded (3.974%)
                 203 active+degraded
                1717 active+clean

It was fully healthy before adding the radosgw pools. Still, I can put objects into the cluster (without using RGW).

Best!
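Yehuda's point that "all your settings are under that user" can be illustrated with a rough Python sketch. The earlier logs in this thread show the admin socket path `ceph-client.admin.asok`, which suggests the gateway was running as `client.admin` when `-n` was omitted. The sample file below is invented for the example (the keyring path is hypothetical, and the debug values are the ones suggested earlier in the thread), and `configparser` is used only as a stand-in: Ceph's real config parsing has more rules than this.

```python
import configparser

# Invented sample; not the poster's actual ceph.conf.
SAMPLE_CONF = """
[global]
auth cluster required = cephx

[client.radosgw.gateway]
keyring = /etc/ceph/ceph.client.radosgw.keyring
debug rgw = 20
debug ms = 1
"""


def settings_for(conf_text, name):
    """Merge [global] with the section named after the client, if present."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    merged = dict(parser["global"]) if parser.has_section("global") else {}
    if parser.has_section(name):
        merged.update(parser[name])
    return merged


if __name__ == "__main__":
    for name in ("client.admin", "client.radosgw.gateway"):
        print(name, settings_for(SAMPLE_CONF, name))
```

In this simplified model, a process named `client.admin` never sees the `debug rgw` or `keyring` lines, which would be consistent with the earlier observation that raising the log level changed nothing.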