New Defects reported by Coverity Scan for ceph
Hi,

Please find the latest report on new defect(s) introduced to ceph found with Coverity Scan.

Defect(s) Reported-by: Coverity Scan
Showing 6 of 6 defect(s)

** CID 1019567: Thread deadlock (ORDER_REVERSAL)
** CID 1231681: Thread deadlock (ORDER_REVERSAL)
** CID 1231682: Thread deadlock (ORDER_REVERSAL)
** CID 1231683: Thread deadlock (ORDER_REVERSAL)
** CID 1231684: Thread deadlock (ORDER_REVERSAL)
** CID 1231685: Use after free (USE_AFTER_FREE)

*** CID 1019567: Thread deadlock (ORDER_REVERSAL)
/osd/OSD.cc: 3689 in OSD::handle_osd_ping(MOSDPing *)()
3683             << ", " << debug_heartbeat_drops_remaining[from]
3684             << " remaining to drop" << dendl;
3685           break;
3686         }
3687       }
3688
>>> CID 1019567: Thread deadlock (ORDER_REVERSAL)
>>> Calling "is_healthy" acquires lock "RWLock.L" while holding lock
>>> "Mutex._m" (count: 7 / 14).
3689       if (!cct->get_heartbeat_map()->is_healthy()) {
3690         dout(10) << "internal heartbeat not healthy, dropping ping request" << dendl;
3691         break;
3692       }
3693
3694       Message *r = new MOSDPing(monc->get_fsid(),

*** CID 1231681: Thread deadlock (ORDER_REVERSAL)
/librados/RadosClient.cc: 111 in librados::RadosClient::lookup_pool(const char *)()
105     int r = wait_for_osdmap();
106     if (r < 0) {
107       lock.Unlock();
108       return r;
109     }
110     int64_t ret = osdmap.lookup_pg_pool_name(name);
>>> CID 1231681: Thread deadlock (ORDER_REVERSAL)
>>> Calling "get_write" acquires lock "RWLock.L" while holding lock
>>> "Mutex._m" (count: 7 / 14).
111     pool_cache_rwl.get_write();
112     lock.Unlock();
113     if (ret < 0) {
114       pool_cache_rwl.unlock();
115       return -ENOENT;
116     }

*** CID 1231682: Thread deadlock (ORDER_REVERSAL)
/osd/OSD.cc: 2369 in OSD::shutdown()()
2363     service.start_shutdown();
2364
2365     clear_waiting_sessions();
2366
2367     // Shutdown PGs
2368     {
>>> CID 1231682: Thread deadlock (ORDER_REVERSAL)
>>> Calling "RLocker" acquires lock "RWLock.L" while holding lock
>>> "Mutex._m" (count: 7 / 14).
2369       RWLock::RLocker l(pg_map_lock);
2370       for (ceph::unordered_map<spg_t, PG*>::iterator p = pg_map.begin();
2371            p != pg_map.end();
2372            ++p) {
2373         dout(20) << " kicking pg " << p->first << dendl;
2374         p->second->lock();

*** CID 1231683: Thread deadlock (ORDER_REVERSAL)
/client/Client.cc: 372 in Client::init()()
366     client_lock.Unlock();
367     objecter->init_unlocked();
368     client_lock.Lock();
369
370     objecter->init_locked();
371
>>> CID 1231683: Thread deadlock (ORDER_REVERSAL)
>>> Calling "set_want_keys" acquires lock "RWLock.L" while holding lock
>>> "Mutex._m" (count: 7 / 14).
372     monclient->set_want_keys(CEPH_ENTITY_TYPE_MDS | CEPH_ENTITY_TYPE_OSD);
373     monclient->sub_want("mdsmap", 0, 0);
374     monclient->sub_want("osdmap", 0, CEPH_SUBSCRIBE_ONETIME);
375     monclient->renew_subs();
376
377     // logger

*** CID 1231684: Thread deadlock (ORDER_REVERSAL)
/osd/OSD.h: 2237 in OSD::RepScrubWQ::_process(MOSDRepScrub *, ThreadPool::TPHandle &)()
2231         ThreadPool::TPHandle &handle) {
2232       osd->osd_lock.Lock();
2233       if (osd->is_stopping()) {
2234         osd->osd_lock.Unlock();
2235         return;
2236       }
>>> CID 1231684: Thread deadlock (ORDER_REVERSAL)
>>> Calling "_have_pg" acquires lock "RWLock.L" while holding lock
>>> "Mutex._m" (count: 7 / 14).
2237       if (osd->_have_pg(msg->pgid)) {
2238         PG *pg = osd->_lookup_lock_pg(msg->pgid);
2239         osd->osd_lock.Unlock();
2240         pg->replica_scrub(msg, handle);
2241         msg->put();
2242         pg->unlock();

/osd/OSD.h: 2238 in OSD::RepScrubWQ::_process(MOSDRepScrub *, ThreadPool::TPHandle &)()
2232       osd->osd_lock.Lock();
2233       if (osd->is_stopping()) {
2234         osd->osd_lock.Unlock();
2235         return;
2236       }
2237       if (osd->_have_pg(msg->pgid)) {
>>> CID 1231684: Thread deadlock (ORDER_REVERSAL)
>>> Calling "_lookup_lock_pg" acquires lock "RWLock.L" while holding lock
>>> "Mutex._m" (count: 7 / 14).
2238         PG *pg =
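All five ORDER_REVERSAL reports share the same shape: one path acquires the Mutex and then the RWLock, while another path acquires the same two locks in the opposite order, which can deadlock if two threads interleave. A minimal self-contained sketch of the pattern being flagged, with std:: types standing in for ceph's Mutex/RWLock (names hypothetical, not ceph's actual call sites):

    #include <mutex>
    #include <shared_mutex>

    std::mutex m;                    // plays the role of "Mutex._m"
    std::shared_timed_mutex rw;      // plays the role of "RWLock.L"

    void thread_a() {
      std::lock_guard<std::mutex> l1(m);                  // 1. mutex first
      std::unique_lock<std::shared_timed_mutex> l2(rw);   // 2. then the rwlock
    }

    void thread_b() {
      std::unique_lock<std::shared_timed_mutex> l2(rw);   // 1. rwlock first
      std::lock_guard<std::mutex> l1(m);                  // 2. then the mutex --
    }                                                     // deadlocks if thread_a
                                                          // holds m and waits on rw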
New Defects reported by Coverity Scan for ceph
Hi,

Please find the latest report on new defect(s) introduced to ceph found with Coverity Scan.

Defect(s) Reported-by: Coverity Scan
Showing 1 of 1 defect(s)

** CID 1230671: Missing unlock (LOCK)
/msg/SimpleMessenger.cc: 258 in SimpleMessenger::reaper()()

*** CID 1230671: Missing unlock (LOCK)
/msg/SimpleMessenger.cc: 258 in SimpleMessenger::reaper()()
252     ::close(p->sd);
253     ldout(cct,10) << "reaper reaped pipe " << p << " " << p->get_peer_addr() << dendl;
254     p->put();
255     ldout(cct,10) << "reaper deleted pipe " << p << dendl;
256   }
257   ldout(cct,10) << "reaper done" << dendl;
>>> CID 1230671: Missing unlock (LOCK)
>>> Returning without unlocking "this->lock._m".
258 }
259
260 void SimpleMessenger::queue_reap(Pipe *pipe)
261 {
262   ldout(cct,10) << "queue_reap " << pipe << dendl;
263   lock.Lock();

To view the defects in Coverity Scan visit,
http://scan.coverity.com/projects/25?tab=overview
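For what it's worth, this class of report usually disappears once the lock is held through an RAII guard, since every return path then unlocks. A minimal sketch with std::mutex standing in for ceph's Mutex (not the actual SimpleMessenger fix):

    #include <mutex>

    std::mutex lock;   // stand-in for SimpleMessenger::lock

    void reaper(bool stop_early) {
      std::lock_guard<std::mutex> guard(lock);  // acquired here
      if (stop_early)
        return;   // guard's destructor unlocks -- no path can leak the mutex
      // ... reap pipes ...
    }             // unlocked here on the normal path as well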
New Defects reported by Coverity Scan for ceph
Hi,

Please find the latest report on new defect(s) introduced to ceph found with Coverity Scan.

Defect(s) Reported-by: Coverity Scan
Showing 1 of 1 defect(s)

** CID 1241497: Thread deadlock (ORDER_REVERSAL)

*** CID 1241497: Thread deadlock (ORDER_REVERSAL)
/osdc/Filer.cc: 314 in Filer::_do_purge_range(PurgeRange *, int)()
308     return;
309   }
310
311   int max = 10 - pr->uncommitted;
312   while (pr->num > 0 && max > 0) {
313     object_t oid = file_object_t(pr->ino, pr->first);
>>> CID 1241497: Thread deadlock (ORDER_REVERSAL)
>>> Calling "get_osdmap_read" acquires lock "RWLock.L" while holding lock
>>> "Mutex._m" (count: 15 / 30).
314     const OSDMap *osdmap = objecter->get_osdmap_read();
315     object_locator_t oloc = osdmap->file_to_object_locator(pr->layout);
316     objecter->put_osdmap_read();
317     objecter->remove(oid, oloc, pr->snapc, pr->mtime, pr->flags,
318                      NULL, new C_PurgeRange(this, pr));
319     pr->uncommitted++;
New Defects reported by Coverity Scan for ceph
Hi,

Please find the latest report on new defect(s) introduced to ceph found with Coverity Scan.

Defect(s) Reported-by: Coverity Scan
Showing 1 of 1 defect(s)

** CID 1243158: Resource leak (RESOURCE_LEAK)
/test/librbd/test_librbd.cc: 1370 in LibRBD_ListChildrenTiered_Test::TestBody()()

*** CID 1243158: Resource leak (RESOURCE_LEAK)
/test/librbd/test_librbd.cc: 1370 in LibRBD_ListChildrenTiered_Test::TestBody()()
1364
1365   int features = RBD_FEATURE_LAYERING;
1366   rbd_image_t parent;
1367   int order = 0;
1368
1369   // make a parent to clone from
>>> CID 1243158: Resource leak (RESOURCE_LEAK)
>>> Variable "ioctx2" going out of scope leaks the storage it points to.
1370   ASSERT_EQ(0, create_image_full(ioctx1, "parent", 4<<20, &order,
1371                                  false, features));
1372   ASSERT_EQ(0, rbd_open(ioctx1, "parent", &parent, NULL));
1373   // create a snapshot, reopen as the parent we're interested in
1374   ASSERT_EQ(0, rbd_snap_create(parent, "parent_snap"));
1375   ASSERT_EQ(0, rbd_snap_set(parent, "parent_snap"));
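For readers unfamiliar with the librados C API: the leak Coverity flags is an io context created without a matching destroy on some exit path. A hedged sketch of the pattern and its cleanup (hypothetical pool names, not the actual test code):

    #include <rados/librados.h>

    // Every successful rados_ioctx_create() needs a matching
    // rados_ioctx_destroy() on all exit paths.
    int open_two_pools(rados_t cluster) {
      rados_ioctx_t ioctx1, ioctx2;
      if (rados_ioctx_create(cluster, "pool1", &ioctx1) < 0)
        return -1;
      if (rados_ioctx_create(cluster, "pool2", &ioctx2) < 0) {
        rados_ioctx_destroy(ioctx1);   // don't leak the first on failure
        return -1;
      }
      // ... use both pools ...
      rados_ioctx_destroy(ioctx2);     // the step Coverity says is skipped
      rados_ioctx_destroy(ioctx1);
      return 0;
    }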
New Defects reported by Coverity Scan for ceph
Hi,

Please find the latest report on new defect(s) introduced to ceph found with Coverity Scan.

14 new defect(s) introduced to ceph found with Coverity Scan.
4 defect(s), reported by Coverity Scan earlier, were marked fixed in the recent build analyzed by Coverity Scan.

New defect(s) Reported-by: Coverity Scan
Showing 14 of 14 defect(s)

** CID 1296388: Uninitialized members (UNINIT_CTOR)
/librbd/RebuildObjectMapRequest.cc: 35 in librbd::C_VerifyObject::C_VerifyObject(librbd::AsyncObjectThrottle &, librbd::ImageCtx *, unsigned long, unsigned long)()

*** CID 1296388: Uninitialized members (UNINIT_CTOR)
/librbd/RebuildObjectMapRequest.cc: 35 in librbd::C_VerifyObject::C_VerifyObject(librbd::AsyncObjectThrottle &, librbd::ImageCtx *, unsigned long, unsigned long)()
29     : C_AsyncObjectThrottle(throttle), m_image_ctx(*image_ctx),
30       m_snap_id(snap_id), m_object_no(object_no),
31       m_oid(m_image_ctx.get_object_name(m_object_no))
32   {
33     m_io_ctx.dup(m_image_ctx.md_ctx);
34     m_io_ctx.snap_set_read(CEPH_SNAPDIR);
>>> CID 1296388: Uninitialized members (UNINIT_CTOR)
>>> Non-static class member "m_snap_list_ret" is not initialized in this
>>> constructor nor in any functions that it calls.
35   }
36
37   virtual void complete(int r) {
38     if (should_complete(r)) {
39       ldout(m_image_ctx.cct, 20) << m_oid << " C_VerifyObject completed "
40                                  << dendl;

** CID 1296387: (UNCAUGHT_EXCEPT)
/test/system/rados_watch_notify.cc: 59 in main()

*** CID 1296387: (UNCAUGHT_EXCEPT)
/test/system/rados_watch_notify.cc: 59 in main()
53
54   const char *get_id_str()
55   {
56     return "main";
57   }
58
>>> CID 1296387: (UNCAUGHT_EXCEPT)
>>> In function "main(int, char const **)" an exception of type
>>> "ceph::FailedAssertion" is thrown and never caught.
59 int main(int argc, const char **argv)
60 {
61   std::string pool = "foo." + stringify(getpid());
62   CrossProcessSem *setup_sem = NULL;
63   RETURN1_IF_NONZERO(CrossProcessSem::create(0, &setup_sem));
64   CrossProcessSem *watch_sem = NULL;

** CID 1296386: (UNCAUGHT_EXCEPT)
/test/system/rados_open_pools_parallel.cc: 98 in main()

*** CID 1296386: (UNCAUGHT_EXCEPT)
/test/system/rados_open_pools_parallel.cc: 98 in main()
92
93   const char *get_id_str()
94   {
95     return "main";
96   }
97
>>> CID 1296386: (UNCAUGHT_EXCEPT)
>>> In function "main(int, char const **)" an exception of type
>>> "ceph::FailedAssertion" is thrown and never caught.
98 int main(int argc, const char **argv)
99 {
100   // first test: create a pool, shut down the client, access
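The standard remedy for an UNINIT_CTOR finding is to give every scalar member a value in the constructor's initializer list. A minimal sketch of the pattern (hypothetical class, not the actual RebuildObjectMapRequest fix):

    #include <string>

    class C_Example {
      std::string m_oid;
      int m_snap_list_ret;   // the kind of member Coverity flags when omitted below
    public:
      explicit C_Example(const std::string &oid)
        : m_oid(oid),
          m_snap_list_ret(0)   // initialize every scalar member here
      {}
    };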
RE: loadable objectstore
Hi Varada,

Have you rebased the pull request to master already?

Thanks,
James

-----Original Message-----
From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Varada Kari
Sent: Friday, September 11, 2015 3:28 AM
To: Sage Weil; Matt W. Benjamin; Loic Dachary
Cc: ceph-devel
Subject: RE: loadable objectstore

Hi Sage/Matt,

I have submitted the pull request based on the wip-plugin branch for the object store factory implementation at https://github.com/ceph/ceph/pull/5884 . I haven't rebased to master yet; I am working on the rebase and on including the new store in the factory implementation. Please have a look and let me know your comments. I will submit a rebased PR soon with the new store integration.

Thanks,
Varada

-----Original Message-----
From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Varada Kari
Sent: Friday, July 03, 2015 7:31 PM
To: Sage Weil; Adam Crume
Cc: Loic Dachary; ceph-devel; Matt W. Benjamin
Subject: RE: loadable objectstore

Hi All,

I am not able to make much progress after making common a shared object along with the object store. Compilation of the test binaries is failing with "./.libs/libceph_filestore.so: undefined reference to `tracepoint_dlopen'".

  CXXLD    ceph_streamtest
./.libs/libceph_filestore.so: undefined reference to `tracepoint_dlopen'
collect2: error: ld returned 1 exit status
make[3]: *** [ceph_streamtest] Error 1

But libceph_filestore.so is linked with lttng-ust:

src/.libs$ ldd libceph_filestore.so
  libceph_keyvaluestore.so.1 => /home/varada/obs-factory/plugin-work/src/.libs/libceph_keyvaluestore.so.1 (0x7f5e50f5)
  libceph_os.so.1 => /home/varada/obs-factory/plugin-work/src/.libs/libceph_os.so.1 (0x7f5e4f93a000)
  libcommon.so.1 => /home/varada/obs-factory/plugin-work/src/.libs/libcommon.so.1 (0x7f5e4b5df000)
  liblttng-ust.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ust.so.0 (0x7f5e4b179000)
  liblttng-ust-tracepoint.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ust-tracepoint.so.0 (0x7f5e4a021000)
  liburcu-bp.so.1 => /usr/lib/liburcu-bp.so.1 (0x7f5e49e1a000)
  liburcu-cds.so.1 => /usr/lib/liburcu-cds.so.1 (0x7f5e49c12000)

I edited the above output to show just the dependencies. Did anyone face this issue before? Any help would be much appreciated.

Thanks,
Varada

-----Original Message-----
From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Varada Kari
Sent: Friday, June 26, 2015 3:34 PM
To: Sage Weil
Cc: Loic Dachary; ceph-devel; Matt W. Benjamin
Subject: RE: loadable objectstore

Hi,

I made some more changes to resolve the lttng problems at https://github.com/varadakari/ceph/commits/wip-plugin, but couldn't bypass the issues. I am facing errors like the one below:

./.libs/libceph_filestore.so: undefined reference to `tracepoint_dlopen'

Compiling with -llttng-ust is not resolving the problem. I have seen some threads on the devel list before mentioning this problem. Can anyone take a look and guide me to fix it? I haven't made the changes to rename the plugin etc.; I will be making them as part of cleanup.

Thanks,
Varada

-----Original Message-----
From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Varada Kari
Sent: Monday, June 22, 2015 8:57 PM
To: Matt W. Benjamin
Cc: Loic Dachary; ceph-devel; Sage Weil
Subject: RE: loadable objectstore

Hi Matt,

The majority of the changes are segregating the files into their corresponding shared objects and creating a factory object. The naming is mostly taken from the erasure-coding plugins. I want a good naming convention :-), hence a preliminary review. I do agree we have a lot of loadable interfaces, and I think we are on the way to making them on-demand (if possible) loadable modules.

Varada

-----Original Message-----
From: Matt W. Benjamin [mailto:m...@cohortfs.com]
Sent: Monday, June 22, 2015 8:37 PM
To: Varada Kari
Cc: Loic Dachary; ceph-devel; Sage Weil
Subject: Re: loadable objectstore

Hi,

It's just aesthetic, but it feels clunky to change the names of well-known modules to Plugin -- esp. if that generalizes forward to new loadable modules (and we have a lot of loadable interfaces).

Matt

----- "Varada Kari" wrote:
> Hi Sage,
>
> Please find the initial implementation of the object store factory
> (initial cut) at
> https://github.com/varadakari/ceph/commit/9d5fe2fecf38ba106c7c7b7a3ede4f189ec7e1c8
>
> This is still a work-in-progress branch. Right now I am facing LTTng
> issues:
> LTTng-UST: Error (-17) while registering tracepoint probe. Duplicate
> registration of tracepoint probes having the same name is not allowed.
>
> Might be an issue with libcommon inclusion. Trying to resolve the issue
> now. Seems I need to make libcommon
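A note on the `tracepoint_dlopen' link error for anyone following along: in lttng-ust 2.x that symbol is only emitted by a translation unit that defines TRACEPOINT_DEFINE before including the tracepoint provider header, so the error suggests no TU in the final link does so. A hedged sketch of the usual pattern (provider header name hypothetical); conversely, the duplicate-registration error quoted above is what happens when more than one loaded object defines the same provider:

    // tracepoint_impl.cc -- exactly one of these per final link
    #define TRACEPOINT_DEFINE
    #define TRACEPOINT_PROBE_DYNAMIC_LINKAGE  // resolve the probe provider via dlopen()
    #include "tracing/objectstore.h"          // hypothetical TRACEPOINT_EVENT provider header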
Re: Adding Data-At-Rest compression support to Ceph
On 25.09.2015 17:14, Sage Weil wrote:
> On Fri, 25 Sep 2015, Igor Fedotov wrote:
>> Another thing to note is that we don't have the whole object ready for
>> compression. We just have some new data block written (appended) to the
>> object, and we should either compress that block and save the mentioned
>> mapping data, or decompress the existing object data and do a full
>> compression again. And IMO introducing seek points is largely similar to
>> what we were talking about - it requires a sort of offset mapping as well.
>> Probably compression at the OSD has some pros as well, but it wouldn't
>> eliminate the need to "muck with stripe sizes or anything".
>
> I think the best option here is going to be to compress the "stripe
> unit". I.e., if you have a stripe_size of 64K, and are doing k=4 m=2,
> then the stripe unit is 16K (64/4). Then each shard has an independent
> unit it can compress/decompress and we don't break the ability to read a
> small extent by talking to only a single shard.

Sage, are you considering compression applied after erasure coding here? Please note that one needs to compress an additional 50% of data this way: the generated 'm' chunks need to be processed as well. And you lose the ability to perform recovery on OSD down without applying decompression (and probably another compression) to the remaining shards.

Conversely, doing compression before EC produces a reduced data set for EC (some CPU cycle savings) and suits a recovery procedure that does not involve an additional decompression/compression pair. But I suppose the 'stripe unit' from the above wouldn't work in this case - the compression entity would have to produce blocks of exactly "stripe unit" size so that all compressed data fits into a single shard, and that's hard to achieve.

Thus, as usual, we should choose which drawbacks (benefits) are less (more) important here: the ability to read a small extent from a single shard plus an increased data set for compression, vs. the ability to omit total decompression on recovery plus a reduced data set for EC.

> *Maybe* the shard could compress contiguous stripe units if multiple
> stripes are written together.. In any case, though, there will be some
> metadata it has to track with the object, because the stripe units are no
> longer fixed size, and there will be object_size/stripe_size of them. I
> forget if we are already storing a CRC for each stripe unit or if it is
> for the entire shard... if it's the former then this won't be a huge
> change, I think.
>
> sage

On 24.09.2015 20:53, Samuel Just wrote:
> The catch is that currently accessing 4k in the middle of a 4MB object
> does not require reading the whole object, so you'd need some kind of
> logical offset -> compressed offset mapping.
> -Sam

On Thu, Sep 24, 2015 at 10:36 AM, Robert LeBlanc wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA256
>
> I'm probably missing something, but since we are talking about data at
> rest, can't we just have the OSD compress the object as it goes to disk?
> Instead of rbd\udata.1ba49c10d9b00c.6859__head_2AD1002B__11 it would be
> rbd\udata.1ba49c10d9b00c.6859__head_2AD1002B__11.{gz,xz,bz2,lzo,etc}.
> Then it seems that you don't have to muck with stripe sizes or anything.
> For compressible objects they would be less than 4MB; some of these
> algorithms already say if it is not compressible enough, just store it.
> Something like zlib Z_FULL_FLUSH may help provide some seek points within
> an archive to prevent decompressing the whole object for reads?
>
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1

On Thu, Sep 24, 2015 at 10:25 AM, Igor Fedotov wrote:
> On 24.09.2015 19:03, Sage Weil wrote:
>> On Thu, 24 Sep 2015, Igor Fedotov wrote:
>> Dynamic stripe sizes are possible but it's a significant change from the
>> way the EC pool currently works. I would make that a separate project
>> (as it's useful in its own right) and not complicate the compression
>> situation. Or, if it simplifies the compression approach, then I'd make
>> that change first.
>> sage
>
> Just to clarify a bit, here is what I saw when I played with Ceph; please
> correct me if I'm wrong. For low-level RADOS access, client data written
> to an EC pool has to be aligned with the stripe size. The last block can
> be unaligned, though no more appends are permitted in that case. Data
> copied from the cache goes in blocks of up to 8Mb, and in the general case
> the last block seems to have an unaligned size too. The EC pool
> additionally aligns the incoming blocks to the stripe boundary internally,
> so blocks going to the EC lib are always aligned. We should probably
> perform compression prior to this alignment. Thus some dependency on
> stripe size is present in EC pools, but it's not that strict.
>
> Thanks,
> Igor
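Sam's "logical offset -> compressed offset mapping" is the crux of the random-read concern. A minimal sketch of what such per-object metadata could look like, so a 4K read seeks into one compressed block instead of decompressing the whole 4MB object (illustrative data structure only, not a proposed on-disk format):

    #include <cstdint>
    #include <map>

    // logical start offset -> (compressed offset, compressed length)
    struct Extent { uint64_t c_off; uint32_t c_len; };
    using OffsetMap = std::map<uint64_t, Extent>;

    // Find the compressed extent holding logical offset 'off'.
    const Extent *lookup(const OffsetMap &m, uint64_t off) {
      auto it = m.upper_bound(off);   // first entry strictly past 'off'
      if (it == m.begin())
        return nullptr;               // 'off' precedes the first mapped unit
      return &(--it)->second;         // entry whose logical range covers 'off'
    }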
New Defects reported by Coverity Scan for ceph
Hi,

Please find the latest report on new defect(s) introduced to ceph found with Coverity Scan.

Defect(s) Reported-by: Coverity Scan
Showing 20 of 38 defect(s)

** CID 717233: Uninitialized scalar field (UNINIT_CTOR)
/mds/Capability.h: 249 in Capability::Capability(CInode *, unsigned long, client_t)()
** CID 1238869: Value not atomically updated (ATOMICITY)
/osdc/Objecter.cc: 3055 in Objecter::handle_pool_op_reply(MPoolOpReply *)()
** CID 1238870: Unchecked return value (CHECKED_RETURN)
/test/test_snap_mapper.cc: 562 in MapperVerifier::remove_oid()()
** CID 1238871: Dereference after null check (FORWARD_NULL)
/mds/Server.cc: 6988 in Server::do_rename_rollback(ceph::buffer::list &, int, std::tr1::shared_ptr &, bool)()
/mds/Server.cc: 7107 in Server::do_rename_rollback(ceph::buffer::list &, int, std::tr1::shared_ptr &, bool)()
** CID 1238872: Unchecked return value (CHECKED_RETURN)
/tools/ceph_objectstore_tool.cc: 1284 in do_import_rados(std::basic_string)()
** CID 1238873: Unchecked return value (CHECKED_RETURN)
/rbd_replay/Replayer.cc: 154 in rbd_replay::Replayer::run(const std::basic_string &)()
** CID 1238874: Missing unlock (LOCK)
/osdc/Objecter.cc: 1855 in Objecter::op_cancel(Objecter::OSDSession *, unsigned long, int)()
** CID 1238875: Unrecoverable parse warning (PARSE_ERROR)
/client/Client.cc: 7737 in ()
** CID 1238876: Unrecoverable parse warning (PARSE_ERROR)
/client/Client.cc: 7735 in ()
** CID 1238877: Missing unlock (LOCK)
/common/Timer.cc: 240 in RWTimer::shutdown()()
** CID 1238878: Unrecoverable parse warning (PARSE_ERROR)
/client/Client.cc: 7734 in ()
** CID 1238879: Thread deadlock (ORDER_REVERSAL)
** CID 1238880: Thread deadlock (ORDER_REVERSAL)
** CID 1238881: Thread deadlock (ORDER_REVERSAL)
** CID 1238882: Thread deadlock (ORDER_REVERSAL)
** CID 1238883: Improper use of negative value (NEGATIVE_RETURNS)
/mds/MDS.cc: 962 in MDS::handle_mds_map(MMDSMap *)()
** CID 1238884: Unrecoverable parse warning (PARSE_ERROR)
/client/Client.cc: 7733 in ()
** CID 1238885: Thread deadlock (ORDER_REVERSAL)
** CID 1238886: Thread deadlock (ORDER_REVERSAL)
** CID 1238887: Thread deadlock (ORDER_REVERSAL)

*** CID 717233: Uninitialized scalar field (UNINIT_CTOR)
/mds/Capability.h: 249 in Capability::Capability(CInode *, unsigned long, client_t)()
243     suppress(0), state(0),
244     client_follows(0), client_xattr_version(0),
245     client_inline_version(0),
246     item_session_caps(this), item_snaprealm_caps(this), item_revoking_caps(this) {
247     g_num_cap++;
248     g_num_capa++;
>>> CID 717233: Uninitialized scalar field (UNINIT_CTOR)
>>> Non-static class member "num_revoke_warnings" is not initialized in
>>> this constructor nor in any functions that it calls.
249   }
250   ~Capability() {
251     g_num_cap--;
252     g_num_caps++;
253   }
254

*** CID 1238869: Value not atomically updated (ATOMICITY)
/osdc/Objecter.cc: 3055 in Objecter::handle_pool_op_reply(MPoolOpReply *)()
3049     if (!rwlock.is_wlocked()) {
3050       rwlock.unlock();
3051       rwlock.get_write();
3052     }
3053     iter = pool_ops.find(tid);
3054     if (iter != pool_ops.end()) {
>>> CID 1238869: Value not atomically updated (ATOMICITY)
>>> Using an unreliable value of "op" inside the second locked section. If
>>> the data that "op" depends on was changed by another thread, this use might
>>> be incorrect.
3055       _finish_pool_op(op);
3056     }
3057   } else {
3058     ldout(cct, 10) << "unknown request " << tid << dendl;
3059   }
3060   rwlock.unlock();
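On the ATOMICITY finding: once the read lock is dropped and the write lock taken, anything computed under the old lock may be stale, so the usual shape of the fix is to repeat the lookup under the new lock and use the fresh result. A self-contained sketch of that pattern (std::mutex standing in for the Objecter's RWLock; not the actual fix):

    #include <map>
    #include <mutex>

    std::map<unsigned long, int*> pool_ops;
    std::mutex rwlock;   // stand-in; the real code uses a ceph RWLock

    void finish_op(int *op) { delete op; }

    void handle_reply(unsigned long tid) {
      rwlock.lock();               // pretend this was only a read lock
      // ... decide we need write access ...
      rwlock.unlock();             // lock dropped: any 'op' found before is now stale
      rwlock.lock();               // reacquired "for write"
      auto iter = pool_ops.find(tid);    // repeat the lookup under the new lock
      if (iter != pool_ops.end()) {
        finish_op(iter->second);   // use the fresh value, never a pre-upgrade pointer
        pool_ops.erase(iter);
      }
      rwlock.unlock();
    }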
Re: 09/23/2015 Weekly Ceph Performance Meeting IS ON!
Thanks Mark!

>> Again, sorry for the delay on these.

No problem, it's already fantastic that you manage these meetings each week!

Regards,
Alexandre

----- Mail original -----
From: "Mark Nelson" To: "aderumier" Cc: "ceph-devel"
Sent: Monday, 28 September 2015 18:24:22
Subject: Re: 09/23/2015 Weekly Ceph Performance Meeting IS ON!

Hi Alexandre,

Sorry for the long delay. I think I got through all of them. They should be public now and I've listed them in the etherpad:
http://pad.ceph.com/p/performance_weekly

Again, sorry for the delay on these. I can't find any way to make bluejeans default to making the meetings public.

Mark

On 09/23/2015 11:44 AM, Alexandre DERUMIER wrote:
> Hi Mark,
>
> can you post the video records of previous meetings?
>
> Thanks
>
> Alexandre
>
> ----- Mail original -----
> From: "Mark Nelson"
> To: "ceph-devel"
> Sent: Wednesday, 23 September 2015 15:51:21
> Subject: 09/23/2015 Weekly Ceph Performance Meeting IS ON!
>
> 8AM PST as usual! Discussion topics include an update on transparent
> huge pages testing and I think Ben would like to talk a bit about CBT
> PRs. Please feel free to add your own!
>
> Here's the links:
>
> Etherpad URL: http://pad.ceph.com/p/performance_weekly
> To join the Meeting: https://bluejeans.com/268261044
> To join via Browser: https://bluejeans.com/268261044/browser
> To join with Lync: https://bluejeans.com/268261044/lync
>
> To join via Room System:
> Video Conferencing System: bjn.vc -or- 199.48.152.152
> Meeting ID: 268261044
>
> To join via Phone:
> 1) Dial:
> +1 408 740 7256
> +1 888 240 2560 (US Toll Free)
> +1 408 317 9253 (Alternate Number)
> (see all numbers - http://bluejeans.com/numbers)
> 2) Enter Conference ID: 268261044
>
> Mark
[puppet] Moving puppet-ceph to the Openstack big tent
Hi,

puppet-ceph currently lives in stackforge [1], which is being retired [2]. puppet-ceph is also mirrored on the Ceph Github organization [3]. This version of the puppet-ceph module was created from scratch and not as a fork of the (then) upstream puppet-ceph by Enovance [4]. Today, the version by Enovance is no longer officially maintained since Red Hat has adopted the new release.

Being an Openstack project under Stackforge or Openstack brings a lot of benefits, but it's not black and white; there are cons too. It provides us with the tools, the processes and the frameworks to review and test each contribution to ensure we ship a module that is stable and is held to the highest standards. But it also means that:
- We forego some level of ownership back to the Openstack foundation, its technical committee and the Puppet Openstack PTL.
- puppet-ceph contributors will also be required to sign the Contributors License Agreement and jump through the Gerrit hoops [5], which can make contributing to the project harder.

We have put tremendous effort into creating a quality module, and as such it was the first puppet module in the stackforge organization to implement not only unit tests but also integration tests with third-party CI. Integration testing for other puppet modules is just now starting to take shape by using the Openstack CI infrastructure.

In the context of Openstack, RDO already ships with a means to install Ceph with this very module, and Fuel will be adopting it soon as well. This means the module will benefit from real-world experience and improvements by the Openstack community and packagers. This will help further reinforce that not only is Ceph the best unified storage solution for Openstack, but that we have the means to deploy it in the real world easily. We all know that Ceph is also deployed outside of this context, and this is why the core reviewers make sure that contributions remain generic and usable outside of this use case.

Today, the core members of the project discussed whether or not we should move puppet-ceph to the Openstack big tent, and we had a consensus approving the move. We would also like to hear the thoughts of the community on this topic.

Please let us know what you think.

Thanks,

[1]: https://github.com/stackforge/puppet-ceph
[2]: https://review.openstack.org/#/c/192016/
[3]: https://github.com/ceph/puppet-ceph
[4]: https://github.com/redhat-cip/puppet-ceph
[5]: https://wiki.openstack.org/wiki/How_To_Contribute

David Moreau Simard
Compression implementation options
Hi folks,

Here is a brief summary of the potential compression implementation options. I think we should choose the desired approach prior to starting work on the compression feature. Comments, additions and fixes are welcome.

Compression At Client - compression/decompression to be performed at the client level (most preferably Rados) before sending / after receiving data to/from Ceph.
Pros:
* The Ceph cluster isn't loaded with an additional computation burden.
* All Ceph cluster components and data transfers benefit from the reduced data volume.
* Compression is transparent to Ceph cluster components.
Cons:
* Weak clients can lack the CPU resources to handle their traffic.
* Any read/write access requires at least two sequential requests to the Ceph cluster: the first to retrieve the "original to compressed" offset mapping for the desired data block, the second to get the compressed data block.
* Random write access handling is tricky (see notes below). Even more requests to the cluster per single user request might be needed in this case.

Compression At Replicated Pool - compression to be performed at primary Ceph entities at the Replicated Pool level, prior to data replication.
Pros:
* Clients benefit from cluster CPU resource utilization.
* Compression for a specific data block is performed at a single point only - thus total CPU utilization for the Ceph cluster is lower.
* Underlying Ceph components and data transfers benefit from the reduced data volume.
Cons:
* Clients that use EC pools directly lack compression unless it's implemented there too.
* In a two-tier model, data compression at the cache tier may be inappropriate for performance reasons. Compression at the cache tier also prevents cache removal when/if needed.
* Random write access handling is tricky (see notes below).

Compression At Erasure Coded pool - compression to be performed at primary Ceph entities at the EC Pool level, prior to erasure coding.
Pros:
* Clients benefit from cluster CPU resource utilization.
* Erasure coding "inflates" the processed data block (by up to ~50%), so doing compression prior to that reduces CPU utilization.
* Natural combination with EC means. Compression and EC have similar purposes - saving storage space at the cost of CPU usage - and one can reuse EC infrastructure and design solutions.
* No need for random write access support - EC pools don't provide that on their own, so we can reuse the same approach to resolve the issue when needed. Implementation becomes much easier.
* Underlying Ceph components and data transfers benefit from the reduced data volume.
Cons:
* Limited applicability - clients that don't use EC pools lack compression.

Compression At Ceph Filestore entity - compression to be performed by the Ceph File Store component prior to saving object data to the underlying file system.
Pros:
* Clients benefit from cluster CPU resource utilization.
Cons:
* Random write access is tricky (see notes below).
* From the cluster perspective, compression is performed either on each replicated block or on a block "inflated" by erasure coding. Thus total Ceph cluster CPU utilization for compression becomes considerably higher (a three-times increase for replicated pools and a ~50% one for EC pools).
* No benefit from reduced data transfers over the net.
* A recovery procedure caused by an OSD going down triggers complete data set decompression and compression when an EC pool is used. This might considerably increase CPU utilization for the recovery process.
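To make the replication/EC overhead numbers above concrete, here is the arithmetic for 1 MB of client data, assuming 3x replication and a k=4, m=2 EC profile as examples:

    Compress before distribution:       1 MB passes through the compressor once.
    Replicated pool, compress at
    Filestore:                          3 replicas -> 3 MB compressed, i.e. 3x the CPU work.
    EC pool (k=4, m=2), compress at
    Filestore:                          6 shards of 1/4 MB each -> (4+2)/4 = 1.5 MB,
                                        i.e. ~50% more CPU work.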
Compression Externally at File System - compression to be performed at the File Store node by means of the underlying file system.
Pros:
* Compression is (mostly) transparent to Ceph.
* Clients benefit from cluster CPU resource utilization.
Cons:
* File system "lock-in". One can use the BTRFS file system only for now, and its production readiness is questionable.
* Limited flexibility - compression is a partition/mount-point property. It is hard to get better granularity - per-pool or per-object - and there is no way to disable compression.
* From the cluster perspective, compression is performed either on each replicated block or on a block "inflated" by erasure coding. Thus total Ceph cluster CPU utilization for compression becomes considerably higher (a three-times increase for replicated pools and a ~50% one for EC pools).
* No benefit from reduced data transfers over the net.
* A recovery procedure caused by an OSD going down triggers complete data set decompression and compression when an EC pool is used. This might considerably increase CPU utilization for the recovery process.

Compression Externally at Block Device - compression to be performed at the File Store node by means of an underlying block device that supports inline data
Re: 09/23/2015 Weekly Ceph Performance Meeting IS ON!
Hi Alexandre,

Sorry for the long delay. I think I got through all of them. They should be public now and I've listed them in the etherpad:

http://pad.ceph.com/p/performance_weekly

Again, sorry for the delay on these. I can't find any way to make bluejeans default to making the meetings public.

Mark

On 09/23/2015 11:44 AM, Alexandre DERUMIER wrote:

Hi Mark,

can you post the video records of previous meetings?

Thanks

Alexandre

----- Mail original -----
From: "Mark Nelson"
To: "ceph-devel"
Sent: Wednesday, 23 September 2015 15:51:21
Subject: 09/23/2015 Weekly Ceph Performance Meeting IS ON!

8AM PST as usual! Discussion topics include an update on transparent huge pages testing and I think Ben would like to talk a bit about CBT PRs. Please feel free to add your own!

Here's the links:

Etherpad URL: http://pad.ceph.com/p/performance_weekly
To join the Meeting: https://bluejeans.com/268261044
To join via Browser: https://bluejeans.com/268261044/browser
To join with Lync: https://bluejeans.com/268261044/lync

To join via Room System:
Video Conferencing System: bjn.vc -or- 199.48.152.152
Meeting ID: 268261044

To join via Phone:
1) Dial:
   +1 408 740 7256
   +1 888 240 2560 (US Toll Free)
   +1 408 317 9253 (Alternate Number)
   (see all numbers - http://bluejeans.com/numbers)
2) Enter Conference ID: 268261044

Mark
Re: libcephfs invalidate upcalls
Hi,

--
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103
http://www.redhat.com/en/technologies/storage
tel. 734-761-4689
fax. 734-769-8938
cel. 734-216-5309

----- Original Message -----
> From: "John Spray"
> To: "Matt Benjamin"
> Cc: "Ceph Development"
> Sent: Monday, September 28, 2015 9:01:28 AM
> Subject: Re: libcephfs invalidate upcalls
>
> On Sat, Sep 26, 2015 at 8:03 PM, Matt Benjamin wrote:
> > Hi John,
> >
> > I prototyped an invalidate upcall for libcephfs and the Ganesha Ceph fsal,
> > building on the Client invalidation callback registrations.
> >
> > As you suggested, NFS (or AFS, or DCE) minimally expect a more generic
> > "cached vnode may have changed" trigger than the current inode and dentry
> > invalidates, so I extended the model slightly to hook cap revocation;
> > feedback appreciated.
>
> In cap_release, we probably need to be a bit more discriminating about
> when to drop; e.g., if we've only lost our exclusive write caps, the
> rest of our metadata might all still be fine to cache. Is ganesha in
> general doing any data caching? I think I had implicitly assumed that
> we were only worrying about metadata here but now I realise I never
> checked that.

Ganesha isn't currently, though it did once, and is likely to again at some point. The exclusive write cap is in fact something with a direct mapping to NFSv4 delegations, so we do want to be able to trigger a recall in this case.

> The awkward part is Client::trim_caps. In the Client::trim_caps case,
> the lru_is_expirable part won't be true until something has already
> been invalidated, so there needs to be an explicit hook there --
> rather than invalidating in response to cap release, we need to
> invalidate in order to get ganesha to drop its handle, which will
> render something expirable, and finally when we expire it, the cap
> gets released.

Ok, sure.

> In that case maybe we need a hook in ganesha to say "invalidate
> everything you can" so that we don't have to make a very large number
> of function calls to invalidate things. In the fuse/kernel case we
> can only sometimes invalidate a piece of metadata (e.g. we can't if
> it's flocked or whatever), so we ask it to invalidate everything. But
> perhaps in the NFS case we can always expect our invalidate calls to
> be respected, so we could just invalidate a smaller number of things
> (the difference between actual cache size and desired)?

As you noted above, what we're invalidating is a cache entry. With Dan's mdcache work, we might no longer be caching at the Ganesha level, but I didn't assume that here.

Matt

> John
>
> > g...@github.com:linuxbox2/ceph.git , branch invalidate
> > g...@github.com:linuxbox2/nfs-ganesha.git , branch ceph-invalidates
> >
> > thanks,
> >
> > Matt
> >
> > --
> > Matt Benjamin
> > Red Hat, Inc.
> > 315 West Huron Street, Suite 140A
> > Ann Arbor, Michigan 48103
> >
> > http://www.redhat.com/en/technologies/storage
> >
> > tel. 734-761-4689
> > fax. 734-769-8938
> > cel. 734-216-5309
Re: aarch64 not using crc32?
Hi Sage,

HWCAP_CRC32 and others were added to the kernel with commit 4bff28ccda2b7a3fbdf8e80aef7a599284681dc6; it looks like this first landed in v3.14. Are you using the stock kernel on Trusty (v3.13?)? Can you update to a later version for gitbuilder? For regular testing, v3.19 (lts-vivid) may be a good choice since it also includes the arm64 CRC32 kernel module.

Thanks,
Yazen

On Thu, Sep 24, 2015 at 9:35 AM, Pankaj Garg wrote:
> Hi Sage,
> I actually had the same issue just a couple weeks back. The hardware actually
> has the CRC32 capability and we have tested it. The issue lies in the
> toolchain on the machine. There are several versions of the .h files present
> with different HWCAP #defines. We need to fix it so that we are using the
> right one. There is a version of that file present which defines this
> capability, but it is not being included. I will look into this issue and let
> you know. Since my builds were just being tested on ARM, I hardcoded the
> presence of CRC.
>
> Thanks,
> -Pankaj
>
> On Sep 24, 2015 5:17 AM, Sage Weil wrote:
>>
>> Hi Pankaj,
>>
>> In order to get the build going on the new trusty gitbuilder I had to make
>> this change:
>>
>> https://github.com/ceph/ceph/commit/3123b2c5d3b72c9d43b10d8f296305d41b68b730
>>
>> It was clearly a bug, but what worries me is that the fact that I hit it
>> means HWCAP_CRC32 is not present. Is there a problem with the
>> hardware feature detection or is that feature simply missing from the box
>> we're using?
>>
>> Thanks!
>> sage
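For reference, a minimal user-space probe for the feature bit in question reads the ELF auxiliary vector; the fallback #define covers the stale-toolchain-header situation Pankaj describes (HWCAP_CRC32 is bit 7 on arm64):

    #include <stdio.h>
    #include <sys/auxv.h>      /* getauxval(), AT_HWCAP */

    #ifndef HWCAP_CRC32        /* normally from <asm/hwcap.h> on arm64 */
    #define HWCAP_CRC32 (1 << 7)
    #endif

    int main(void) {
      /* bits are zero if the kernel (< 3.14) doesn't report them */
      unsigned long hwcap = getauxval(AT_HWCAP);
      printf("CRC32 instructions: %s\n",
             (hwcap & HWCAP_CRC32) ? "reported" : "not reported");
      return 0;
    }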
ceph branch status
-- All Branches --
Abhishek Varshney
  2015-09-22 15:11:25 +0530  hammer-backports
Adam C. Emerson
  2015-09-14 12:32:18 -0400  wip-cxx11time
  2015-09-15 12:09:20 -0400  wip-cxx11concurrency
Adam Crume
  2014-12-01 20:45:58 -0800  wip-doc-rbd-replay
Alfredo Deza
  2015-03-23 16:39:48 -0400  wip-11212
Alfredo Deza
  2014-07-08 13:58:35 -0400  wip-8679
  2014-09-04 13:58:14 -0400  wip-8366
  2014-10-13 11:10:10 -0400  wip-9730
Ali Maredia
  2015-09-22 15:10:10 -0400  wip-cmake
  2015-09-24 12:53:48 -0400  wip-10587-split-servers
Boris Ranto
  2015-09-04 15:19:11 +0200  wip-bash-completion
Dan Mick
  2013-07-16 23:00:06 -0700  wip-5634
Danny Al-Gaaf
  2015-04-23 16:32:00 +0200  wip-da-SCA-20150421
  2015-04-23 17:18:57 +0200  wip-nosetests
  2015-04-23 18:20:16 +0200  wip-unify-num_objects_degraded
  2015-09-24 14:55:35 +0200  wip-da-SCA-20150910
David Zafman
  2014-08-29 10:41:23 -0700  wip-libcommon-rebase
  2015-04-24 13:14:23 -0700  wip-cot-giant
  2015-08-04 07:39:00 -0700  wip-12577-hammer
  2015-09-09 20:52:03 -0700  wip-zafman-testing
Dongmao Zhang
  2014-11-14 19:14:34 +0800  thesues-master
Greg Farnum
  2015-04-29 21:44:11 -0700  wip-init-names
  2015-07-16 09:28:24 -0700  hammer-12297
  2015-09-22 10:35:08 -0700  greg-fs-testing
Greg Farnum
  2014-10-23 13:33:44 -0700  wip-forward-scrub
Guang G Yang
  2015-06-26 20:31:44 +0000  wip-ec-readall
  2015-07-23 16:13:19 +0000  wip-12316
Guang Yang
  2014-08-08 10:41:12 +0000  wip-guangyy-pg-splitting
  2014-09-25 00:47:46 +0000  wip-9008
  2014-09-30 10:36:39 +0000  guangyy-wip-9614
Haomai Wang
  2014-07-27 13:37:49 +0800  wip-flush-set
  2015-04-20 00:47:59 +0800  update-organization
  2015-07-21 19:33:56 +0800  fio-objectstore
  2015-08-26 09:57:27 +0800  wip-recovery-attr
Ilya Dryomov
  2014-09-05 16:15:10 +0400  wip-rbd-notify-errors
Ivo Jimenez
  2015-08-24 23:12:45 -0700  hammer-with-new-workunit-for-wip-12551
Jason Dillaman
  2015-07-31 13:55:23 -0400  wip-12383-next
  2015-08-31 23:17:53 -0400  wip-12698
  2015-09-01 10:17:02 -0400  wip-11287
Jenkins
  2014-07-29 05:24:39 -0700  wip-nhm-hang
  2015-02-02 10:35:28 -0800  wip-sam-v0.92
  2015-08-21 12:46:32 -0700  last
  2015-09-15 10:23:18 -0700  rhcs-v0.80.8
  2015-09-21 16:48:32 -0700  rhcs-v0.94.1-ubuntu
Joao Eduardo Luis
  2014-09-10 09:39:23 +0100  wip-leveldb-get.dumpling
Joao Eduardo Luis
  2014-07-22 15:41:42 +0100  wip-leveldb-misc
Joao Eduardo Luis
  2014-09-02 17:19:52 +0100  wip-leveldb-get
  2014-10-17 16:20:11 +0100  wip-paxos-fix
  2014-10-21 21:32:46 +0100  wip-9675.dumpling
  2015-07-27 21:56:42 +0100  wip-11470.hammer
  2015-09-09 15:45:45 +0100  wip-11786.hammer
Joao Eduardo Luis
  2014-11-17 16:43:53 +0000  wip-mon-osdmap-cleanup
  2014-12-15 16:18:56 +0000  wip-giant-mon-backports
  2014-12-17 17:13:57 +0000  wip-mon-backports.firefly
  2014-12-17 23:15:10 +0000  wip-mon-sync-fix.dumpling
  2015-01-07 23:01:00 +0000  wip-mon-blackhole-mlog-0.87.7
  2015-01-10 02:40:42 +0000  wip-dho-joao
  2015-01-10 02:46:31 +0000  wip-mon-paxos-fix
  2015-01-26 13:00:09 +0000  wip-mon-datahealth-fix
  2015-02-04 22:36:14 +0000  wip-10643
  2015-09-09 15:43:51 +0100  wip-11786.firefly
Joao Eduardo Luis
  2015-05-27 23:48:45 +0100  wip-mon-scrub
  2015-05-29 12:21:43 +0100  wip-11545
  2015-06-05 16:12:57 +0100  wip-10507
  2015-06-16 14:34:11 +0100  wip-11470
  2015-06-25 00:16:41 +0100  wip-10507-2
  2015-07-14 16:52:35 +0100  wip-joao-testing
  2015-09-08 09:48:41 +0100  wip-leveldb-hang
John Spray
  2015-08-14 15:50:13 +0100  wip-9663-hashorder
  2015-08-25 12:14:40 +0100  wip-scrub-basic
  2015-09-01 16:43:40 +0100  wip-12133
  2015-09-25 16:12:51 +0100  wip-jcsp-test
John Wilkins
  2013-07-31 18:00:50 -0700  wip-doc-rados-python-api
  2014-07-03 07:31:14 -0700  wip-doc-rgw-federated
  2014-11-03 14:04:33 -0800
RE: Very slow recovery/peering with latest master
FWIW, blkid works well with both GPT (created by parted) and MSDOS (created by fdisk) partition tables in my environment.

But blkid doesn't show the information for disks in an external bay (which is connected by a JBOD controller) in my setup.

See below: sdb and sdh are SSDs attached to the front panel, but the rest of the osd disks (0-9) are from an external bay.

/dev/sdc  976285652 294887592 681398060 31% /var/lib/ceph/mnt/osd-device-0-data
/dev/sdd  976285652 269840116 706445536 28% /var/lib/ceph/mnt/osd-device-1-data
/dev/sde  976285652 257610832 718674820 27% /var/lib/ceph/mnt/osd-device-2-data
/dev/sdf  976285652 293460620 682825032 31% /var/lib/ceph/mnt/osd-device-3-data
/dev/sdg  976285652 29100 681841552 31% /var/lib/ceph/mnt/osd-device-4-data
/dev/sdi  976285652 288416840 687868812 30% /var/lib/ceph/mnt/osd-device-5-data
/dev/sdj  976285652 273090960 703194692 28% /var/lib/ceph/mnt/osd-device-6-data
/dev/sdk  976285652 302720828 673564824 32% /var/lib/ceph/mnt/osd-device-7-data
/dev/sdl  976285652 268207968 708077684 28% /var/lib/ceph/mnt/osd-device-8-data
/dev/sdm  976285652 293316752 682968900 31% /var/lib/ceph/mnt/osd-device-9-data
/dev/sdb1 292824376 10629024 282195352 4% /var/lib/ceph/mnt/osd-device-40-data
/dev/sdh1 292824376 11413956 281410420 4% /var/lib/ceph/mnt/osd-device-41-data

root@osd1:~# blkid
/dev/sdb1: UUID="907806fe-1d29-4ef7-ad11-5a933a11601e" TYPE="xfs"
/dev/sdh1: UUID="9dfe68ac-f297-4a02-8d21-50c194af4ff2" TYPE="xfs"
/dev/sda1: UUID="cdf945ce-a345-4766-b89e-cecc33689016" TYPE="ext4"
/dev/sda2: UUID="7a565029-deb9-4e68-835c-f097c2b1514e" TYPE="ext4"
/dev/sda5: UUID="e61bfc35-932d-442f-a5ca-795897f62744" TYPE="swap"

> -----Original Message-----
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Somnath Roy
> Sent: Friday, September 25, 2015 12:09 AM
> To: Podoski, Igor
> Cc: Samuel Just; Samuel Just (sam.j...@inktank.com); ceph-devel; Sage Weil; Handzik, Joe
> Subject: RE: Very slow recovery/peering with latest master
>
> Yeah, Igor, maybe.
> Meanwhile, I was able to get a gdb trace of the hang:
>
> (gdb) bt
> #0  0x7f6f6bf043bd in read () at ../sysdeps/unix/syscall-template.S:81
> #1  0x7f6f6af3b066 in ?? () from /lib/x86_64-linux-gnu/libblkid.so.1
> #2  0x7f6f6af43ae2 in ?? () from /lib/x86_64-linux-gnu/libblkid.so.1
> #3  0x7f6f6af42788 in ?? () from /lib/x86_64-linux-gnu/libblkid.so.1
> #4  0x7f6f6af42a53 in ?? () from /lib/x86_64-linux-gnu/libblkid.so.1
> #5  0x7f6f6af3c17b in blkid_do_safeprobe () from /lib/x86_64-linux-gnu/libblkid.so.1
> #6  0x7f6f6af3e0c4 in blkid_verify () from /lib/x86_64-linux-gnu/libblkid.so.1
> #7  0x7f6f6af387fb in blkid_get_dev () from /lib/x86_64-linux-gnu/libblkid.so.1
> #8  0x7f6f6af38acb in ?? () from /lib/x86_64-linux-gnu/libblkid.so.1
> #9  0x7f6f6af3946d in ?? () from /lib/x86_64-linux-gnu/libblkid.so.1
> #10 0x7f6f6af39892 in blkid_probe_all_new () from /lib/x86_64-linux-gnu/libblkid.so.1
> #11 0x7f6f6af3dc10 in blkid_find_dev_with_tag () from /lib/x86_64-linux-gnu/libblkid.so.1
> #12 0x7f6f6d3bf923 in get_device_by_uuid (dev_uuid=..., label=label@entry=0x7f6f6d535fe5 "PARTUUID", partition=partition@entry=0x7f6f347eb5a0 "", device=device@entry=0x7f6f347ec5a0 "") at common/blkdev.cc:193
> #13 0x7f6f6d147de5 in FileStore::collect_metadata (this=0x7f6f68893000, pm=0x7f6f21419598) at os/FileStore.cc:660
> #14 0x7f6f6cebfa9a in OSD::_collect_metadata (this=this@entry=0x7f6f6894f000, pm=pm@entry=0x7f6f21419598) at osd/OSD.cc:4586
> #15 0x7f6f6cec0614 in OSD::_send_boot (this=this@entry=0x7f6f6894f000) at osd/OSD.cc:4568
> #16 0x7f6f6cec203a in OSD::_maybe_boot (this=0x7f6f6894f000, oldest=1, newest=100) at osd/OSD.cc:4463
> #17 0x7f6f6cefc5e1 in Context::complete (this=0x7f6f3d3864e0, r=) at ./include/Context.h:64
> #18 0x7f6f6d2eed08 in Finisher::finisher_thread_entry (this=0x7ffee7272d70) at common/Finisher.cc:65
> #19 0x7f6f6befd182 in start_thread (arg=0x7f6f347ee700) at pthread_create.c:312
> #20 0x7f6f6a24347d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
>
> strace was not much help since the other threads are not blocked and keep printing futex traces.
>
> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: Podoski, Igor [mailto:igor.podo...@ts.fujitsu.com]
> Sent: Wednesday, September 23, 2015 11:33 PM
> To: Somnath Roy
> Cc: Samuel Just; Samuel Just (sam.j...@inktank.com); ceph-devel; Sage Weil; Handzik, Joe
> Subject: RE: Very slow recovery/peering with latest master
>
> > -----Original Message-----
> > From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Sage Weil
> > Sent: Thursday, September 24, 2015 3:32 AM
> > To: Handzik, Joe
> > Cc: Somnath Roy; Samuel Just
How to make a osd status become down+peering?
Hi,

I am running a cluster on giant (0.87). After a series of disk failures, we had to add new osds and set the bad ones out, so the rebalance process began. Unfortunately, one osd became stuck in down+peering and could not recover by itself.

From the log we found that one pg was stuck because two files in this pg were corrupted, so the peering process aborted. I tried `ceph osd lost osd.n`, but it didn't work. In the end I deleted the two bad files and the osd recovered successfully.

Now I am trying to fix this bug, unless someone has already fixed it. Does anyone know how to solve this problem, or has anyone encountered it before? Any help would be appreciated!
Re: Very slow recovery/peering with latest master
That's really good info, thanks for tracking that down. Do you expect this to be a common configuration going forward in Ceph deployments?

Joe

> On Sep 28, 2015, at 3:43 AM, Somnath Roy wrote:
>
> Xiaoxi,
> Thanks for giving me some pointers.
> Now, with the help of strace, I am able to figure out why it is taking so long
> in my setup to complete the blkid* calls.
> In my case, the partitions show up properly even when connected to a
> JBOD controller.
>
> root@emsnode10:~/wip-write-path-optimization/src/os# strace -t -o /root/strace_blkid.txt blkid
> /dev/sda1: UUID="d2060642-1af4-424f-9957-6a8dc77ff301" TYPE="ext4"
> /dev/sda5: UUID="2a987cc0-e3cd-43d4-99cd-b8d8e58617e7" TYPE="swap"
> /dev/sdy2: UUID="0ebd1631-52e7-4dc2-8bff-07102b877bfc" TYPE="xfs"
> /dev/sdw2: UUID="29f1203b-6f44-45e3-8f6a-8ad1d392a208" TYPE="xfs"
> /dev/sdt2: UUID="94f6bb55-ac61-499c-8552-600581e13dfa" TYPE="xfs"
> /dev/sdr2: UUID="b629710e-915d-4c56-b6a5-4782e6d6215d" TYPE="xfs"
> /dev/sdv2: UUID="69623b7f-9036-4a35-8298-dc7f5cecdb21" TYPE="xfs"
> /dev/sds2: UUID="75d941c5-a85c-4c37-b409-02de34483314" TYPE="xfs"
> /dev/sdx: UUID="cc84bc66-208b-4387-8470-071ec71532f2" TYPE="xfs"
> /dev/sdu2: UUID="c9817831-8362-48a9-9a6c-920e0f04d029" TYPE="xfs"
>
> But it is taking time on the drives that are not reserved for this host.
> Basically, I am using 2 heads in front of a JBOF and I am using sg_persist to
> reserve the drives between the 2 hosts.
> Here is the strace output of blkid:
>
> http://pastebin.com/qz2Z7Phj
>
> You can see a lot of input/output errors on accessing the drives that are not
> reserved for this host.
>
> This looks like an inefficiency in the blkid* calls (?) since calls like
> fdisk/lsscsi are not taking time.
>
> Regards
> Somnath
>
> -----Original Message-----
> From: Chen, Xiaoxi [mailto:xiaoxi.c...@intel.com]
> Sent: Monday, September 28, 2015 1:02 AM
> To: Somnath Roy; Podoski, Igor
> Cc: Samuel Just; Samuel Just (sam.j...@inktank.com); ceph-devel; Sage Weil; Handzik, Joe
> Subject: RE: Very slow recovery/peering with latest master
>
> FWIW, blkid works well with both GPT (created by parted) and MSDOS (created by
> fdisk) partition tables in my environment.
>
> But blkid doesn't show the information for disks in an external bay (which is
> connected by a JBOD controller) in my setup.
>
> See below: sdb and sdh are SSDs attached to the front panel, but the rest of the
> osd disks (0-9) are from an external bay.
>
> /dev/sdc  976285652 294887592 681398060 31% /var/lib/ceph/mnt/osd-device-0-data
> /dev/sdd  976285652 269840116 706445536 28% /var/lib/ceph/mnt/osd-device-1-data
> /dev/sde  976285652 257610832 718674820 27% /var/lib/ceph/mnt/osd-device-2-data
> /dev/sdf  976285652 293460620 682825032 31% /var/lib/ceph/mnt/osd-device-3-data
> /dev/sdg  976285652 29100 681841552 31% /var/lib/ceph/mnt/osd-device-4-data
> /dev/sdi  976285652 288416840 687868812 30% /var/lib/ceph/mnt/osd-device-5-data
> /dev/sdj  976285652 273090960 703194692 28% /var/lib/ceph/mnt/osd-device-6-data
> /dev/sdk  976285652 302720828 673564824 32% /var/lib/ceph/mnt/osd-device-7-data
> /dev/sdl  976285652 268207968 708077684 28% /var/lib/ceph/mnt/osd-device-8-data
> /dev/sdm  976285652 293316752 682968900 31% /var/lib/ceph/mnt/osd-device-9-data
> /dev/sdb1 292824376 10629024 282195352 4% /var/lib/ceph/mnt/osd-device-40-data
> /dev/sdh1 292824376 11413956 281410420 4% /var/lib/ceph/mnt/osd-device-41-data
>
> root@osd1:~# blkid
> /dev/sdb1: UUID="907806fe-1d29-4ef7-ad11-5a933a11601e" TYPE="xfs"
> /dev/sdh1: UUID="9dfe68ac-f297-4a02-8d21-50c194af4ff2" TYPE="xfs"
> /dev/sda1: UUID="cdf945ce-a345-4766-b89e-cecc33689016" TYPE="ext4"
> /dev/sda2: UUID="7a565029-deb9-4e68-835c-f097c2b1514e" TYPE="ext4"
> /dev/sda5: UUID="e61bfc35-932d-442f-a5ca-795897f62744" TYPE="swap"
>
>> -----Original Message-----
>> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Somnath Roy
>> Sent: Friday, September 25, 2015 12:09 AM
>> To: Podoski, Igor
>> Cc: Samuel Just; Samuel Just (sam.j...@inktank.com); ceph-devel; Sage Weil; Handzik, Joe
>> Subject: RE: Very slow recovery/peering with latest master
>>
>> Yeah, Igor, maybe.
>> Meanwhile, I was able to get a gdb trace of the hang:
>>
>> (gdb) bt
>> #0  0x7f6f6bf043bd in read () at ../sysdeps/unix/syscall-template.S:81
>> #1  0x7f6f6af3b066 in ?? () from /lib/x86_64-linux-gnu/libblkid.so.1
>> #2  0x7f6f6af43ae2 in ?? () from /lib/x86_64-linux-gnu/libblkid.so.1
>> #3  0x7f6f6af42788 in ?? () from /lib/x86_64-linux-gnu/libblkid.so.1
>> #4  0x7f6f6af42a53 in ?? () from /lib/x86_64-linux-gnu/libblkid.so.1
>> #5  0x7f6f6af3c17b in blkid_do_safeprobe () from /lib/x86_64-linux-gnu/libblkid.so.1
>> #6  0x7f6f6af3e0c4 in blkid_verify () from /lib/x86_64-linux-gnu/libblkid.so.1
>> #7
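The backtrace stalls inside blkid_probe_all_new(), i.e. a scan of every block device, including the unreserved drives that return I/O errors. One possible direction is to probe only the device in question with libblkid's low-level API instead of walking the whole device cache; a sketch under that assumption, not a tested ceph patch:

    #include <blkid/blkid.h>
    #include <stdio.h>

    /* Probe a single named device; avoids the cache-wide scan that
     * blkid_find_dev_with_tag() triggers via blkid_probe_all_new(). */
    int print_uuid(const char *dev) {
      blkid_probe pr = blkid_new_probe_from_filename(dev);
      if (!pr)
        return -1;
      if (blkid_do_safeprobe(pr) == 0) {
        const char *uuid = NULL;
        if (blkid_probe_lookup_value(pr, "UUID", &uuid, NULL) == 0)
          printf("%s: UUID=%s\n", dev, uuid);
      }
      blkid_free_probe(pr);
      return 0;
    }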
[Hammer Backports] Should rest-bench be removed on hammer ?
Hi,

The rest-bench tool has been removed in master through PR #5428 (https://github.com/ceph/ceph/pull/5428). The backport PR #5812 (https://github.com/ceph/ceph/pull/5812) is currently causing failures on the hammer-backports integration branch. These failures can be resolved by either backporting PR #5428 or by adding a hammer-specific commit to PR #5812.

How should we proceed here?

Thanks
Abhishek
Re: [Hammer Backports] Should rest-bench be removed on hammer ?
Hi,

On 28/09/2015 12:19, Abhishek Varshney wrote:
> Hi,
>
> The rest-bench tool has been removed in master through PR #5428
> (https://github.com/ceph/ceph/pull/5428). The backport PR #5812
> (https://github.com/ceph/ceph/pull/5812) is currently causing failures
> on the hammer-backports integration branch. These failures can be
> resolved by either backporting PR #5428 or by adding a hammer-specific
> commit to PR #5812.
>
> How should we proceed here?

It looks like rest-bench support was removed because cosbench can replace it. The string cosbench or rest.bench does not show up in ceph-qa-suite or in ceph master or hammer, which probably means tests using rest-bench are outside the scope of the ceph project. Deprecating rest-bench from hammer by backporting https://github.com/ceph/ceph/pull/5428 seems sensible.

Cheers

--
Loïc Dachary, Artisan Logiciel Libre
RE: Very slow recovery/peering with latest master
Xiaoxi, Thanks for giving me some pointers. Now, with the help of strace, I was able to figure out why the blkid* calls take so long to complete in my setup. In my case, the partitions show up properly even when connected to a JBOD controller.

root@emsnode10:~/wip-write-path-optimization/src/os# strace -t -o /root/strace_blkid.txt blkid
/dev/sda1: UUID="d2060642-1af4-424f-9957-6a8dc77ff301" TYPE="ext4"
/dev/sda5: UUID="2a987cc0-e3cd-43d4-99cd-b8d8e58617e7" TYPE="swap"
/dev/sdy2: UUID="0ebd1631-52e7-4dc2-8bff-07102b877bfc" TYPE="xfs"
/dev/sdw2: UUID="29f1203b-6f44-45e3-8f6a-8ad1d392a208" TYPE="xfs"
/dev/sdt2: UUID="94f6bb55-ac61-499c-8552-600581e13dfa" TYPE="xfs"
/dev/sdr2: UUID="b629710e-915d-4c56-b6a5-4782e6d6215d" TYPE="xfs"
/dev/sdv2: UUID="69623b7f-9036-4a35-8298-dc7f5cecdb21" TYPE="xfs"
/dev/sds2: UUID="75d941c5-a85c-4c37-b409-02de34483314" TYPE="xfs"
/dev/sdx: UUID="cc84bc66-208b-4387-8470-071ec71532f2" TYPE="xfs"
/dev/sdu2: UUID="c9817831-8362-48a9-9a6c-920e0f04d029" TYPE="xfs"

But it is taking time on the drives that are not reserved for this host. Basically, I am using 2 heads in front of a JBOF and using sg_persist to reserve the drives between the 2 hosts. Here is the strace output of blkid: http://pastebin.com/qz2Z7Phj You can see a lot of input/output errors on accessing the drives that are not reserved for this host. This looks like an inefficiency in the blkid* calls (?), since calls like fdisk/lsscsi do not take this long. Regards Somnath

-Original Message-
From: Chen, Xiaoxi [mailto:xiaoxi.c...@intel.com]
Sent: Monday, September 28, 2015 1:02 AM
To: Somnath Roy; Podoski, Igor
Cc: Samuel Just; Samuel Just (sam.j...@inktank.com); ceph-devel; Sage Weil; Handzik, Joe
Subject: RE: Very slow recovery/peering with latest master

FWIW, blkid works well with both GPT (created by parted) and MSDOS (created by fdisk) partitions in my environment. But blkid doesn't show the information of the disks in the external bay (which is connected by a JBOD controller) in my setup. See below: SDB and SDH are SSDs attached to the front panel, but the rest of the OSD disks (0-9) are from an external bay.
/dev/sdc  976285652 294887592 681398060 31% /var/lib/ceph/mnt/osd-device-0-data
/dev/sdd  976285652 269840116 706445536 28% /var/lib/ceph/mnt/osd-device-1-data
/dev/sde  976285652 257610832 718674820 27% /var/lib/ceph/mnt/osd-device-2-data
/dev/sdf  976285652 293460620 682825032 31% /var/lib/ceph/mnt/osd-device-3-data
/dev/sdg  976285652 29100     681841552 31% /var/lib/ceph/mnt/osd-device-4-data
/dev/sdi  976285652 288416840 687868812 30% /var/lib/ceph/mnt/osd-device-5-data
/dev/sdj  976285652 273090960 703194692 28% /var/lib/ceph/mnt/osd-device-6-data
/dev/sdk  976285652 302720828 673564824 32% /var/lib/ceph/mnt/osd-device-7-data
/dev/sdl  976285652 268207968 708077684 28% /var/lib/ceph/mnt/osd-device-8-data
/dev/sdm  976285652 293316752 682968900 31% /var/lib/ceph/mnt/osd-device-9-data
/dev/sdb1 292824376 10629024  282195352  4% /var/lib/ceph/mnt/osd-device-40-data
/dev/sdh1 292824376 11413956  281410420  4% /var/lib/ceph/mnt/osd-device-41-data

root@osd1:~# blkid
/dev/sdb1: UUID="907806fe-1d29-4ef7-ad11-5a933a11601e" TYPE="xfs"
/dev/sdh1: UUID="9dfe68ac-f297-4a02-8d21-50c194af4ff2" TYPE="xfs"
/dev/sda1: UUID="cdf945ce-a345-4766-b89e-cecc33689016" TYPE="ext4"
/dev/sda2: UUID="7a565029-deb9-4e68-835c-f097c2b1514e" TYPE="ext4"
/dev/sda5: UUID="e61bfc35-932d-442f-a5ca-795897f62744" TYPE="swap"

> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Somnath Roy
> Sent: Friday, September 25, 2015 12:09 AM
> To: Podoski, Igor
> Cc: Samuel Just; Samuel Just (sam.j...@inktank.com); ceph-devel; Sage Weil; Handzik, Joe
> Subject: RE: Very slow recovery/peering with latest master
>
> Yeah, Igor may be..
> Meanwhile, I am able to get gdb trace of the hang..
>
> (gdb) bt
> #0  0x7f6f6bf043bd in read () at ../sysdeps/unix/syscall-template.S:81
> #1  0x7f6f6af3b066 in ?? () from /lib/x86_64-linux-gnu/libblkid.so.1
> #2  0x7f6f6af43ae2 in ?? () from /lib/x86_64-linux-gnu/libblkid.so.1
> #3  0x7f6f6af42788 in ?? () from /lib/x86_64-linux-gnu/libblkid.so.1
> #4  0x7f6f6af42a53 in ?? () from /lib/x86_64-linux-gnu/libblkid.so.1
> #5  0x7f6f6af3c17b in blkid_do_safeprobe () from /lib/x86_64-linux-gnu/libblkid.so.1
> #6  0x7f6f6af3e0c4 in blkid_verify () from /lib/x86_64-linux-gnu/libblkid.so.1
> #7  0x7f6f6af387fb in blkid_get_dev () from /lib/x86_64-linux-gnu/libblkid.so.1
> #8  0x7f6f6af38acb in ?? () from /lib/x86_64-linux-gnu/libblkid.so.1
> #9  0x7f6f6af3946d in ?? () from /lib/x86_64-linux-gnu/libblkid.so.1
> #10 0x7f6f6af39892 in blkid_probe_all_new () from /lib/x86_64-linux-gnu/libblkid.so.1
> #11 0x7f6f6af3dc10 in blkid_find_dev_with_tag () from /lib/x86_64-linux-gnu/libblkid.so.1
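The backtrace above bottoms out in blkid_probe_all_new(), i.e. libblkid is walking every block device on the system, including the drives reserved for the other head, and each read() against an unreadable drive has to fail or time out. For contrast, here is a minimal C sketch (using only the stock libblkid probing API; error handling trimmed) of probing one named device directly, which never touches devices outside the list you give it:

#include <stdio.h>
#include <blkid/blkid.h>

/* Probe a single named device rather than scanning all block devices.
 * blkid_new_probe_from_filename() opens only the given path, so drives
 * reserved for the other host are never read. Build with -lblkid. */
static int print_fstype(const char *devname)
{
    blkid_probe pr = blkid_new_probe_from_filename(devname);
    if (!pr)
        return -1;  /* open failed, e.g. reservation conflict */
    const char *type = NULL;
    if (blkid_do_safeprobe(pr) == 0 &&
        blkid_probe_lookup_value(pr, "TYPE", &type, NULL) == 0)
        printf("%s: TYPE=\"%s\"\n", devname, type);
    blkid_free_probe(pr);
    return 0;
}

int main(int argc, char **argv)
{
    for (int i = 1; i < argc; i++)
        print_fstype(argv[i]);
    return 0;
}

Whether ceph-osd can take the same shortcut depends on the call site: the trace shows blkid_find_dev_with_tag(), which resolves a tag (e.g. a UUID) to a device and therefore has to scan.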
Re: Teuthology Integration to native openstack
Hi, On 28/09/2015 07:24, Bharath Krishna wrote:
> Hi Dachary,
> Thanks for the reply. I am following your blog http://dachary.org/?p=3767 and the README in https://github.com/dachary/teuthology/tree/wip-6502-openstack-v2/#openstack-backend

The up-to-date instructions are at https://github.com/dachary/teuthology/tree/openstack/#openstack-backend (the link you used comes from http://dachary.org/?p=3828 and I just updated it so no one else will be confused).

> I have sourced the openrc file of my Openstack deployment and verified that the clients are working fine. My Openstack deployment has Cinder integrated with a CEPH backend.
> I have cloned and installed teuthology using the below steps:
> $ git clone -b wip-6502-openstack-v2 http://github.com/dachary/teuthology
> $ cd teuthology ; ./bootstrap install
> $ source virtualenv/bin/activate
> Then I tried to run a dummy suite as a test and I ran into the following error:
> Traceback (most recent call last):
>   File "/root/teuthology/virtualenv/bin/teuthology-openstack", line 9, in <module>
>     load_entry_point('teuthology==0.1.0', 'console_scripts', 'teuthology-openstack')()
>   File "/root/teuthology/scripts/openstack.py", line 8, in main
>     teuthology.openstack.main(parse_args(argv), argv)
>   File "/root/teuthology/teuthology/openstack.py", line 375, in main
>     return TeuthologyOpenStack(ctx, teuth_config, argv).main()
>   File "/root/teuthology/teuthology/openstack.py", line 181, in main
>     self.verify_openstack()
>   File "/root/teuthology/teuthology/openstack.py", line 270, in verify_openstack
>     str(providers))
> Exception: ('OS_AUTH_URL=http://:5000/v2.0', " does is not a known OpenStack provider (('cloud.ovh.net', 'ovh'), ('control.os1.phx2', 'redhat'), ('entercloudsuite.com', 'entercloudsuite'))")

This limitation was in an earlier implementation and should not be a problem now. Cheers

> Thank you.
> Regards,
> M Bharath Krishna
>
> On 9/28/15, 1:47 AM, "Loic Dachary" wrote:
>> [moving to ceph-devel]
>> Hi,
>> On 27/09/2015 21:20, Bharath Krishna wrote:
>>> Hi,
>>> We have an openstack deployment in place with CEPH as CINDER backend.
>>> We would like to perform functional testing for CEPH and found teuthology as the recommended option.
>>> Have successfully installed teuthology. Now to integrate it with Openstack, I could see that the possible providers could be either OVH, REDHAT or ENTERCLOUDSITE.
>>> Is there any option where we can source an openstack deployment of our own and test CEPH using teuthology?
>> The documentation mentions these providers because they have been tested. But there should be no blocker to run teuthology against a regular OpenStack provider. Should you run into trouble, please let me know and I'll help.
>> Cheers
>>> If NO, please suggest how to test CEPH in such scenarios?
>>> Please help.
>>> Thank you.
>>> Bharath Krishna
>> -- Loïc Dachary, Artisan Logiciel Libre
-- Loïc Dachary, Artisan Logiciel Libre
[CEPH-DEVEL] [Workaround] Keystone API v3
Since the OpenStack Keystone team will move to the v3 API and try to decommission v2 completely, we probably need to modify code in /src/rgw/:

./src/common/config_opts.h
./src/rgw/rgw_json_enc.cc
./src/rgw/rgw_swift.cc
./src/rgw/rgw_swift_auth.cc
./src/rgw/rgw_rest_swift.cc
./src/rgw/rgw_keystone.h

I think there should be no backward compatibility for v2 anymore, for security reasons. What do you think? I'm pretty sure I've missed something anyhow -; Shinobu
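For anyone sizing up the change: the main difference on the wire is that v2 POSTs credentials to {auth_url}/v2.0/tokens and returns the token in the response body, while v3 POSTs to {auth_url}/v3/auth/tokens with the credentials nested under auth.identity, scoping split out into auth.scope, and the issued token returned in the X-Subject-Token response header. A rough C++ sketch of building the v3 request body (the function name and parameters are illustrative, not existing rgw code, and a real implementation must JSON-escape the inputs):

#include <sstream>
#include <string>

// Illustrative only: body for a Keystone v3 password-authenticated,
// project-scoped token request (POST {OS_AUTH_URL}/v3/auth/tokens).
// Inputs are assumed pre-escaped; real code should use a JSON encoder.
std::string make_v3_token_request(const std::string& user,
                                  const std::string& password,
                                  const std::string& project,
                                  const std::string& domain)
{
  std::ostringstream os;
  os << "{\"auth\":{"
     <<   "\"identity\":{\"methods\":[\"password\"],"
     <<     "\"password\":{\"user\":{\"name\":\"" << user << "\","
     <<       "\"domain\":{\"name\":\"" << domain << "\"},"
     <<       "\"password\":\"" << password << "\"}}},"
     <<   "\"scope\":{\"project\":{\"name\":\"" << project << "\","
     <<     "\"domain\":{\"name\":\"" << domain << "\"}}}}}";
  return os.str();
}

Unlike v2, every user and project is qualified by a domain in v3, which is part of why the existing rgw_swift.cc request/response handling cannot be reused as-is.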
Follow-Up on Alexandre's Transparent Huge Pages Testing
Hi Everyone, A while back Alexandre Derumier posted some test results looking at how transparent huge pages can reduce memory usage with jemalloc. I went back and ran a number of new tests on the community performance cluster to verify his findings and also look at how performance and CPU usage were affected, both during various fio benchmark tests and also during a 4k random write recovery scenario. I tested tcmalloc 2.4 with 32MB thread cache, 128MB thread cache, and jemalloc 4.0. The gist of it is that I also see a reduction in memory usage, most pronounced with jemalloc. Unfortunately, the best reduction in memory usage comes when memory usage is already fairly low. The most important case is the memory spike when OSDs are marked back up/in during a recovery test. In this case there is still a benefit, though memory usage is still a little higher than tcmalloc with a 128MB thread cache. There's a bit of a concerning trend where memory usage appears to increase fairly quickly after the recovery test is complete and the post-recovery phase of the benchmark is running. That will likely need to be investigated in more depth. I have been doing some other tests with the async messenger and newstore, but those will have to wait for another paper. Here are the results: https://drive.google.com/file/d/0B2gTBZrkrnpZY3U3TUU3RkJVeVk/view Mark
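As background on the mechanism being measured here: on Linux, an allocator can opt its mappings into transparent huge pages via madvise(). Whether and when a given jemalloc or tcmalloc build issues this advice is version-dependent, so treat the following as a sketch of the kernel interface only, not of either allocator's internals:

#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* Map an arena-sized region, then mark it eligible for THP.
     * MADV_HUGEPAGE takes effect even when the system-wide setting
     * (/sys/kernel/mm/transparent_hugepage/enabled) is "madvise"
     * rather than "always". */
    const size_t len = 16UL * 1024 * 1024;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;
    if (madvise(p, len, MADV_HUGEPAGE) != 0)
        perror("madvise");  /* e.g. kernel built without THP */
    munmap(p, len);
    return 0;
}

The flip side, and a plausible suspect for the post-recovery growth noted above, is that huge-page-backed regions release memory at a coarser granularity, so RSS can stay elevated after the allocator frees at the page level.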
RE: loadable objectstore
No James, I am facing library issues with libnss3 and libcommon (ceph). I will resolve them and generate a new pull request on master soon. Thanks, Varada

> -Original Message-
> From: James (Fei) Liu-SSI [mailto:james@ssi.samsung.com]
> Sent: Monday, September 28, 2015 11:51 PM
> To: Varada Kari; Sage Weil; Matt W. Benjamin; Loic Dachary
> Cc: ceph-devel
> Subject: RE: loadable objectstore
>
> Hi Varada,
> Have you rebased the pull request to master already?
>
> Thanks,
> James
>
> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Varada Kari
> Sent: Friday, September 11, 2015 3:28 AM
> To: Sage Weil; Matt W. Benjamin; Loic Dachary
> Cc: ceph-devel
> Subject: RE: loadable objectstore
>
> Hi Sage / Matt,
>
> I have submitted the pull request based on the wip-plugin branch for the object store factory implementation at https://github.com/ceph/ceph/pull/5884. Haven't rebased to master yet. Working on the rebase and on including the new store in the factory implementation. Please have a look and let me know your comments. Will submit a rebased PR with the new store integration soon.
>
> Thanks,
> Varada
>
> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Varada Kari
> Sent: Friday, July 03, 2015 7:31 PM
> To: Sage Weil; Adam Crume
> Cc: Loic Dachary; ceph-devel; Matt W. Benjamin
> Subject: RE: loadable objectstore
>
> Hi All,
>
> Not able to make much progress after making common a shared object along with the object store. Compilation of the test binaries is failing with "./.libs/libceph_filestore.so: undefined reference to `tracepoint_dlopen'".
>
> CXXLD ceph_streamtest
> ./.libs/libceph_filestore.so: undefined reference to `tracepoint_dlopen'
> collect2: error: ld returned 1 exit status
> make[3]: *** [ceph_streamtest] Error 1
>
> But libceph_filestore.so is linked with lttng-ust:
>
> src/.libs$ ldd libceph_filestore.so
> libceph_keyvaluestore.so.1 => /home/varada/obs-factory/plugin-work/src/.libs/libceph_keyvaluestore.so.1 (0x7f5e50f5)
> libceph_os.so.1 => /home/varada/obs-factory/plugin-work/src/.libs/libceph_os.so.1 (0x7f5e4f93a000)
> libcommon.so.1 => /home/varada/obs-factory/plugin-work/src/.libs/libcommon.so.1 (0x7f5e4b5df000)
> liblttng-ust.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ust.so.0 (0x7f5e4b179000)
> liblttng-ust-tracepoint.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ust-tracepoint.so.0 (0x7f5e4a021000)
> liburcu-bp.so.1 => /usr/lib/liburcu-bp.so.1 (0x7f5e49e1a000)
> liburcu-cds.so.1 => /usr/lib/liburcu-cds.so.1 (0x7f5e49c12000)
>
> Edited the above output just to show the dependencies. Did anyone face this issue before? Any help would be much appreciated.
>
> Thanks,
> Varada
>
> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Varada Kari
> Sent: Friday, June 26, 2015 3:34 PM
> To: Sage Weil
> Cc: Loic Dachary; ceph-devel; Matt W. Benjamin
> Subject: RE: loadable objectstore
>
> Hi,
>
> Made some more changes to resolve the lttng problems at https://github.com/varadakari/ceph/commits/wip-plugin, but couldn't bypass the issues. Still facing failures like the one below:
>
> ./.libs/libceph_filestore.so: undefined reference to `tracepoint_dlopen'
>
> Compiling with -llttng-ust is not resolving the problem. I have seen some threads on the devel list before mentioning this problem. Can anyone take a look and guide me to fix it?
> Haven't made the changes to rename the plugin etc.; will make them as part of cleanup.
>
> Thanks,
> Varada
>
> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Varada Kari
> Sent: Monday, June 22, 2015 8:57 PM
> To: Matt W. Benjamin
> Cc: Loic Dachary; ceph-devel; Sage Weil
> Subject: RE: loadable objectstore
>
> Hi Matt,
>
> The majority of the changes are segregating the files into the corresponding shared objects and creating a factory object. The naming is mostly taken from the erasure-coding plugins. Want a good naming convention :-), hence a preliminary review. I do agree we have a lot of loadable interfaces, and I think we are on the way to making them on-demand (if possible) loadable modules.
>
> Varada
>
> -Original Message-
> From: Matt W. Benjamin [mailto:m...@cohortfs.com]
> Sent: Monday, June 22, 2015 8:37 PM
> To: Varada Kari
> Cc: Loic Dachary; ceph-devel; Sage Weil
> Subject: Re: loadable objectstore
>
> Hi,
>
> It's just aesthetic, but it feels clunky to change the names of well known
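On the undefined reference to `tracepoint_dlopen' above: in lttng-ust, that symbol is only emitted by a translation unit that defines TRACEPOINT_DEFINE before including the tracepoint provider header, so linking -llttng-ust alone is not enough; each shared object with tracepoints needs exactly one such file linked in. A sketch of what that file usually looks like (the header path here is hypothetical):

/* tracing_objectstore.c -- exactly one translation unit per .so should
 * define these macros before including the provider header; this is
 * what actually defines tracepoint_dlopen and the provider symbols
 * that every other object file in the library references. */
#define TRACEPOINT_DEFINE
#define TRACEPOINT_PROBE_DYNAMIC_LINKAGE  /* probes resolved at dlopen time */
#include "tracing/objectstore.h"          /* hypothetical provider header */

In the dynamic-linkage case the library additionally needs -ldl at link time; a likely explanation for the failure above is that the file defining TRACEPOINT_DEFINE stayed behind in the main binary when the objectstore code moved into libceph_filestore.so.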
Re: [Hammer Backports] Should rest-bench be removed on hammer ?
On Mon, 28 Sep 2015, Loic Dachary wrote:
> Hi,
> On 28/09/2015 12:19, Abhishek Varshney wrote:
>> Hi,
>> The rest-bench tool has been removed in master through PR #5428 (https://github.com/ceph/ceph/pull/5428). The backport PR #5812 (https://github.com/ceph/ceph/pull/5812) is currently causing failures on the hammer-backports integration branch. These failures can be resolved by either backporting PR #5428 or by adding a hammer-specific commit to PR #5812.
>> How should we proceed here?
> It looks like rest-bench support was removed because cosbench can replace it. The string cosbench or rest.bench does not appear in ceph-qa-suite / ceph master or hammer, which probably means tests using rest-bench are outside the scope of the ceph project. Deprecating rest-bench from hammer by backporting https://github.com/ceph/ceph/pull/5428 seems sensible.

I don't think we should be removing tools in a stable series unless we have a really good reason to do so. In this case we're dropping rest-bench because we don't want to maintain it, not because it is fatally broken. Hammer users who are using it shouldn't find that it is removed in a later point release. s
Re: libcephfs invalidate upcalls
On Sat, Sep 26, 2015 at 8:03 PM, Matt Benjamin wrote:
> Hi John,
> I prototyped an invalidate upcall for libcephfs and the Ganesha Ceph FSAL, building on the Client invalidation callback registrations.
> As you suggested, NFS (or AFS, or DCE) minimally expects a more generic "cached vnode may have changed" trigger than the current inode and dentry invalidates, so I extended the model slightly to hook cap revocation; feedback appreciated.

In cap_release, we probably need to be a bit more discriminating about when to drop, e.g. if we've only lost our exclusive write caps, the rest of our metadata might all still be fine to cache. Is ganesha in general doing any data caching? I think I had implicitly assumed that we were only worrying about metadata here, but now I realise I never checked that.

The awkward part is Client::trim_caps. In the Client::trim_caps case, the lru_is_expirable part won't be true until something has already been invalidated, so there needs to be an explicit hook there -- rather than invalidating in response to cap release, we need to invalidate in order to get ganesha to drop its handle, which will render something expirable, and finally when we expire it, the cap gets released. In that case maybe we need a hook in ganesha to say "invalidate everything you can" so that we don't have to make a very large number of function calls to invalidate things.

In the fuse/kernel case we can only sometimes invalidate a piece of metadata (e.g. we can't if it's flocked or whatever), so we ask it to invalidate everything. But perhaps in the NFS case we can always expect our invalidate calls to be respected, so we could just invalidate a smaller number of things (the difference between the actual and desired cache size)?

John

> g...@github.com:linuxbox2/ceph.git , branch invalidate
> g...@github.com:linuxbox2/nfs-ganesha.git , branch ceph-invalidates
>
> thanks,
> Matt
>
> --
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
> http://www.redhat.com/en/technologies/storage
> tel. 734-761-4689
> fax. 734-769-8938
> cel. 734-216-5309
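A sketch of the discrimination suggested in the first paragraph, with hypothetical names and cap bits (this is not the actual libcephfs interface, only the shape of the policy): invalidate Ganesha's cached handle only when a revocation touches the shared caps that guard cached metadata, and leave the cache alone when only exclusive write caps are lost.

#include <cstdint>

// Hypothetical cap bits and upcall, illustrative of the policy rather
// than of the real ceph cap constants or callback signature.
enum : uint32_t {
  CAP_AUTH_SHARED  = 1 << 0,   // owner/mode cached under this cap
  CAP_LINK_SHARED  = 1 << 1,   // link count
  CAP_XATTR_SHARED = 1 << 2,   // xattrs
  CAP_FILE_SHARED  = 1 << 3,   // size/mtime, dir contents
  CAP_FILE_EXCL    = 1 << 4,   // exclusive write caps
};

void on_cap_revoke(uint64_t ino, uint32_t revoked,
                   void (*invalidate_upcall)(uint64_t)) {
  // Only shared metadata caps guard what the NFS layer caches; losing
  // just FILE_EXCL (e.g. another writer appears) does not stale the
  // attributes we still hold SHARED caps for, so no upcall is needed.
  const uint32_t metadata_caps =
      CAP_AUTH_SHARED | CAP_LINK_SHARED | CAP_XATTR_SHARED | CAP_FILE_SHARED;
  if (revoked & metadata_caps)
    invalidate_upcall(ino);
}

The trim_caps case would sit on top of this as the separate "invalidate everything you can" bulk hook discussed above, since there the goal is to shed handles rather than to keep the cache coherent.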