New Defects reported by Coverity Scan for ceph

2015-09-28 Thread scan-admin

Hi,


Please find the latest report on new defect(s) introduced to ceph found with 
Coverity Scan.

Defect(s) Reported-by: Coverity Scan
Showing 6 of 6 defect(s)


** CID 1019567:  Thread deadlock  (ORDER_REVERSAL)


** CID 1231681:  Thread deadlock  (ORDER_REVERSAL)


** CID 1231682:  Thread deadlock  (ORDER_REVERSAL)


** CID 1231683:  Thread deadlock  (ORDER_REVERSAL)


** CID 1231684:  Thread deadlock  (ORDER_REVERSAL)



** CID 1231685:  Use after free  (USE_AFTER_FREE)




*** CID 1019567:  Thread deadlock  (ORDER_REVERSAL)
/osd/OSD.cc: 3689 in OSD::handle_osd_ping(MOSDPing *)()
3683  << ", " << debug_heartbeat_drops_remaining[from]
3684  << " remaining to drop" << dendl;
3685  break;
3686   }
3687   }
3688 
>>> CID 1019567:  Thread deadlock  (ORDER_REVERSAL)
>>> Calling "is_healthy" acquires lock "RWLock.L" while holding lock 
>>> "Mutex._m" (count: 7 / 14).
3689   if (!cct->get_heartbeat_map()->is_healthy()) {
3690   dout(10) << "internal heartbeat not healthy, dropping ping request" << dendl;
3691   break;
3692   }
3693 
3694   Message *r = new MOSDPing(monc->get_fsid(),
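What ORDER_REVERSAL means here: one code path acquires "Mutex._m" and then "RWLock.L",
while some other path takes the same pair in the opposite order, so two threads can
deadlock. A minimal generic sketch of the inversion and the usual remedy, using plain
std::mutex stand-ins rather than Ceph's Mutex/RWLock classes:

    #include <mutex>

    std::mutex mutex_m;   // stands in for "Mutex._m" in the report
    std::mutex rwlock_l;  // stands in for "RWLock.L" (a plain mutex here for brevity)

    // Path 1: takes mutex_m, then rwlock_l (the order flagged in
    // handle_osd_ping -> is_healthy()).
    void path_one() {
      std::lock_guard<std::mutex> a(mutex_m);
      std::lock_guard<std::mutex> b(rwlock_l);
      // ...
    }

    // Path 2 (elsewhere): same two locks in the opposite order. Run concurrently
    // with path_one(), each thread can hold one lock and wait forever for the
    // other -- the deadlock Coverity warns about.
    void path_two() {
      std::lock_guard<std::mutex> b(rwlock_l);
      std::lock_guard<std::mutex> a(mutex_m);
      // ...
    }

    // Typical fix: pick one global order (or query the rwlock-protected state
    // before taking the mutex) so every path acquires the locks consistently.
    void path_two_fixed() {
      std::lock_guard<std::mutex> a(mutex_m);
      std::lock_guard<std::mutex> b(rwlock_l);
      // ...
    }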


*** CID 1231681:  Thread deadlock  (ORDER_REVERSAL)
/librados/RadosClient.cc: 111 in librados::RadosClient::lookup_pool(const char 
*)()
105   int r = wait_for_osdmap();
106   if (r < 0) {
107 lock.Unlock();
108 return r;
109   }
110   int64_t ret = osdmap.lookup_pg_pool_name(name);
>>> CID 1231681:  Thread deadlock  (ORDER_REVERSAL)
>>> Calling "get_write" acquires lock "RWLock.L" while holding lock 
>>> "Mutex._m" (count: 7 / 14).
111   pool_cache_rwl.get_write();
112   lock.Unlock();
113   if (ret < 0) {
114 pool_cache_rwl.unlock();
115 return -ENOENT;
116   }


*** CID 1231682:  Thread deadlock  (ORDER_REVERSAL)
/osd/OSD.cc: 2369 in OSD::shutdown()()
2363   service.start_shutdown();
2364 
2365   clear_waiting_sessions();
2366 
2367   // Shutdown PGs
2368   {
>>> CID 1231682:  Thread deadlock  (ORDER_REVERSAL)
>>> Calling "RLocker" acquires lock "RWLock.L" while holding lock 
>>> "Mutex._m" (count: 7 / 14).
2369 RWLock::RLocker l(pg_map_lock);
2370 for (ceph::unordered_map::iterator p = pg_map.begin();
2371 p != pg_map.end();
2372 ++p) {
2373   dout(20) << " kicking pg " << p->first << dendl;
2374   p->second->lock();


*** CID 1231683:  Thread deadlock  (ORDER_REVERSAL)
/client/Client.cc: 372 in Client::init()()
366   client_lock.Unlock();
367   objecter->init_unlocked();
368   client_lock.Lock();
369 
370   objecter->init_locked();
371 
>>> CID 1231683:  Thread deadlock  (ORDER_REVERSAL)
>>> Calling "set_want_keys" acquires lock "RWLock.L" while holding lock 
>>> "Mutex._m" (count: 7 / 14).
372   monclient->set_want_keys(CEPH_ENTITY_TYPE_MDS | CEPH_ENTITY_TYPE_OSD);
373   monclient->sub_want("mdsmap", 0, 0);
374   monclient->sub_want("osdmap", 0, CEPH_SUBSCRIBE_ONETIME);
375   monclient->renew_subs();
376 
377   // logger


*** CID 1231684:  Thread deadlock  (ORDER_REVERSAL)
/osd/OSD.h: 2237 in OSD::RepScrubWQ::_process(MOSDRepScrub *, 
ThreadPool::TPHandle &)()
2231   ThreadPool::TPHandle ) {
2232   osd->osd_lock.Lock();
2233   if (osd->is_stopping()) {
2234     osd->osd_lock.Unlock();
2235     return;
2236   }
>>> CID 1231684:  Thread deadlock  (ORDER_REVERSAL)
>>> Calling "_have_pg" acquires lock "RWLock.L" while holding lock 
>>> "Mutex._m" (count: 7 / 14).
2237   if (osd->_have_pg(msg->pgid)) {
2238     PG *pg = osd->_lookup_lock_pg(msg->pgid);
2239     osd->osd_lock.Unlock();
2240     pg->replica_scrub(msg, handle);
2241     msg->put();
2242     pg->unlock();
/osd/OSD.h: 2238 in OSD::RepScrubWQ::_process(MOSDRepScrub *, 
ThreadPool::TPHandle &)()
2232   osd->osd_lock.Lock();
2233   if (osd->is_stopping()) {
2234     osd->osd_lock.Unlock();
2235     return;
2236   }
2237   if (osd->_have_pg(msg->pgid)) {
>>> CID 1231684:  Thread deadlock  (ORDER_REVERSAL)
>>> Calling "_lookup_lock_pg" acquires lock "RWLock.L" while holding lock 
>>> "Mutex._m" (count: 7 / 14).
2238     PG *pg = 

New Defects reported by Coverity Scan for ceph

2015-09-28 Thread scan-admin

Hi,


Please find the latest report on new defect(s) introduced to ceph found with 
Coverity Scan.

Defect(s) Reported-by: Coverity Scan
Showing 1 of 1 defect(s)


** CID 1230671:  Missing unlock  (LOCK)
/msg/SimpleMessenger.cc: 258 in SimpleMessenger::reaper()()



*** CID 1230671:  Missing unlock  (LOCK)
/msg/SimpleMessenger.cc: 258 in SimpleMessenger::reaper()()
252   ::close(p->sd);
253 ldout(cct,10) << "reaper reaped pipe " << p << " " << p->get_peer_addr() << dendl;
254 p->put();
255 ldout(cct,10) << "reaper deleted pipe " << p << dendl;
256   }
257   ldout(cct,10) << "reaper done" << dendl;
>>> CID 1230671:  Missing unlock  (LOCK)
>>> Returning without unlocking "this->lock._m".
258 }
259 
260 void SimpleMessenger::queue_reap(Pipe *pipe)
261 {
262   ldout(cct,10) << "queue_reap " << pipe << dendl;
263   lock.Lock();
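The usual remedy for this class of defect is an RAII guard, so that every return path
releases the lock. A generic sketch with plain std::mutex/std::lock_guard rather than
Ceph's own Mutex type, purely illustrative:

    #include <mutex>

    std::mutex lock;  // stands in for SimpleMessenger::lock

    // Before: an early return can leave 'lock' held.
    void reaper_manual() {
      lock.lock();
      // ... reap pipes ...
      // if (some_error) return;   // <-- would return without unlocking
      lock.unlock();
    }

    // After: the guard's destructor unlocks on every path, including early returns.
    void reaper_raii() {
      std::lock_guard<std::mutex> l(lock);
      // ... reap pipes ...
    }   // unlocked here automatically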



To view the defects in Coverity Scan visit, 
http://scan.coverity.com/projects/25?tab=overview

To unsubscribe from the email notification for new defects, 
http://scan5.coverity.com/cgi-bin/unsubscribe.py



--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


New Defects reported by Coverity Scan for ceph

2015-09-28 Thread scan-admin

Hi,


Please find the latest report on new defect(s) introduced to ceph found with 
Coverity Scan.

Defect(s) Reported-by: Coverity Scan
Showing 1 of 1 defect(s)


** CID 1241497:  Thread deadlock  (ORDER_REVERSAL)




*** CID 1241497:  Thread deadlock  (ORDER_REVERSAL)
/osdc/Filer.cc: 314 in Filer::_do_purge_range(PurgeRange *, int)()
308 return;
309   }
310 
311   int max = 10 - pr->uncommitted;
312   while (pr->num > 0 && max > 0) {
313 object_t oid = file_object_t(pr->ino, pr->first);
>>> CID 1241497:  Thread deadlock  (ORDER_REVERSAL)
>>> Calling "get_osdmap_read" acquires lock "RWLock.L" while holding lock 
>>> "Mutex._m" (count: 15 / 30).
314 const OSDMap *osdmap = objecter->get_osdmap_read();
315 object_locator_t oloc = osdmap->file_to_object_locator(pr->layout);
316 objecter->put_osdmap_read();
317 objecter->remove(oid, oloc, pr->snapc, pr->mtime, pr->flags,
318  NULL, new C_PurgeRange(this, pr));
319 pr->uncommitted++;



To view the defects in Coverity Scan visit, 
http://scan.coverity.com/projects/25?tab=overview

To unsubscribe from the email notification for new defects, 
http://scan5.coverity.com/cgi-bin/unsubscribe.py



--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


New Defects reported by Coverity Scan for ceph

2015-09-28 Thread scan-admin

Hi,


Please find the latest report on new defect(s) introduced to ceph found with 
Coverity Scan.

Defect(s) Reported-by: Coverity Scan
Showing 1 of 1 defect(s)


** CID 1243158:  Resource leak  (RESOURCE_LEAK)
/test/librbd/test_librbd.cc: 1370 in 
LibRBD_ListChildrenTiered_Test::TestBody()()



*** CID 1243158:  Resource leak  (RESOURCE_LEAK)
/test/librbd/test_librbd.cc: 1370 in 
LibRBD_ListChildrenTiered_Test::TestBody()()
1364 
1365   int features = RBD_FEATURE_LAYERING;
1366   rbd_image_t parent;
1367   int order = 0;
1368 
1369   // make a parent to clone from
>>> CID 1243158:  Resource leak  (RESOURCE_LEAK)
>>> Variable "ioctx2" going out of scope leaks the storage it points to.
1370   ASSERT_EQ(0, create_image_full(ioctx1, "parent", 4<<20, ,
1371 false, features));
1372   ASSERT_EQ(0, rbd_open(ioctx1, "parent", , NULL));
1373   // create a snapshot, reopen as the parent we're interested in
1374   ASSERT_EQ(0, rbd_snap_create(parent, "parent_snap"));
1375   ASSERT_EQ(0, rbd_snap_set(parent, "parent_snap"));
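The usual fix for this kind of leak is to make sure the io context is destroyed on
every exit path; in a gtest body, where a failed ASSERT_* returns early, a tiny RAII
guard is the common pattern. A sketch using the public librados C API; the guard type
is hypothetical, not part of the test suite:

    #include <rados/librados.h>

    // Tiny RAII helper so rados_ioctx_destroy() runs even when an ASSERT_* macro
    // returns early from the test body. Illustrative only.
    struct IoCtxGuard {
      rados_ioctx_t ioctx;
      explicit IoCtxGuard(rados_ioctx_t i) : ioctx(i) {}
      ~IoCtxGuard() { rados_ioctx_destroy(ioctx); }
    };

    // Usage inside a test body (sketch):
    //   rados_ioctx_t ioctx2;
    //   ASSERT_EQ(0, rados_ioctx_create(cluster, "pool2", &ioctx2));
    //   IoCtxGuard g2(ioctx2);   // ioctx2 is now released on every path
    //   ... rest of the test ...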



To view the defects in Coverity Scan visit, 
http://scan.coverity.com/projects/25?tab=overview

To unsubscribe from the email notification for new defects, 
http://scan5.coverity.com/cgi-bin/unsubscribe.py



--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


New Defects reported by Coverity Scan for ceph

2015-09-28 Thread scan-admin

Hi,

Please find the latest report on new defect(s) introduced to ceph found with 
Coverity Scan.

14 new defect(s) introduced to ceph found with Coverity Scan.
4 defect(s), reported by Coverity Scan earlier, were marked fixed in the recent 
build analyzed by Coverity Scan.

New defect(s) Reported-by: Coverity Scan
Showing 14 of 14 defect(s)


** CID 1296388:  Uninitialized members  (UNINIT_CTOR)
/librbd/RebuildObjectMapRequest.cc: 35 in 
librbdC_VerifyObject::C_VerifyObject(librbd::AsyncObjectThrottle 
&, librbd::ImageCtx *, unsigned long, unsigned long)()



*** CID 1296388:  Uninitialized members  (UNINIT_CTOR)
/librbd/RebuildObjectMapRequest.cc: 35 in 
librbdC_VerifyObject::C_VerifyObject(librbd::AsyncObjectThrottle 
&, librbd::ImageCtx *, unsigned long, unsigned long)()
29 : C_AsyncObjectThrottle(throttle), m_image_ctx(*image_ctx),
30   m_snap_id(snap_id), m_object_no(object_no),
31   m_oid(m_image_ctx.get_object_name(m_object_no))
32   {
33 m_io_ctx.dup(m_image_ctx.md_ctx);
34 m_io_ctx.snap_set_read(CEPH_SNAPDIR);
>>> CID 1296388:  Uninitialized members  (UNINIT_CTOR)
>>> Non-static class member "m_snap_list_ret" is not initialized in this 
>>> constructor nor in any functions that it calls.
35   }
36 
37   virtual void complete(int r) {
38 if (should_complete(r)) {
39   ldout(m_image_ctx.cct, 20) << m_oid << " C_VerifyObject completed "
40  << dendl;
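UNINIT_CTOR reports of this kind are normally resolved by adding the member to the
constructor's initializer list. A stripped-down sketch; only the reported member is
shown, the real class has many more:

    // Simplified: only the reported member is shown.
    class C_VerifyObject_Sketch {
      int m_snap_list_ret;   // the member Coverity reports as uninitialized
    public:
      C_VerifyObject_Sketch()
        : m_snap_list_ret(0)   // initialize it in the ctor initializer list
      {}
    };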

** CID 1296387:(UNCAUGHT_EXCEPT)
/test/system/rados_watch_notify.cc: 59 in main()



*** CID 1296387:(UNCAUGHT_EXCEPT)
/test/system/rados_watch_notify.cc: 59 in main()
53 
54 const char *get_id_str()
55 {
56   return "main";
57 }
58 
>>> CID 1296387:(UNCAUGHT_EXCEPT)
>>> In function "main(int, char const **)" an exception of type 
>>> "ceph::FailedAssertion" is thrown and never caught.
59 int main(int argc, const char **argv)
60 {
61   std::string pool = "foo." + stringify(getpid());
62   CrossProcessSem *setup_sem = NULL;
63   RETURN1_IF_NONZERO(CrossProcessSem::create(0, _sem));
64   CrossProcessSem *watch_sem = NULL;

** CID 1296386:(UNCAUGHT_EXCEPT)
/test/system/rados_open_pools_parallel.cc: 98 in main()



*** CID 1296386:(UNCAUGHT_EXCEPT)
/test/system/rados_open_pools_parallel.cc: 98 in main()
92 
93 const char *get_id_str()
94 {
95   return "main";
96 }
97 
>>> CID 1296386:(UNCAUGHT_EXCEPT)
>>> In function "main(int, char const **)" an exception of type 
>>> "ceph::FailedAssertion" is thrown and never caught.
98 int main(int argc, const char **argv)
99 {
100   // first test: create a pool, shut down the client, access 
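Coverity's UNCAUGHT_EXCEPT checker wants main() to catch the exception type it sees
escaping. A generic sketch of the shape of the fix; the FailedAssertion type below is
a stand-in, not the real ceph assert definition:

    #include <exception>
    #include <iostream>

    namespace ceph { struct FailedAssertion {}; }   // stand-in for the real type

    static int run(int argc, const char **argv) {
      // ... the existing body of main() ...
      (void)argc; (void)argv;
      return 0;
    }

    int main(int argc, const char **argv) {
      try {
        return run(argc, argv);
      } catch (const ceph::FailedAssertion &) {
        std::cerr << "assertion failed" << std::endl;
        return 1;
      } catch (const std::exception &e) {
        std::cerr << "uncaught exception: " << e.what() << std::endl;
        return 1;
      }
    }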

RE: loadable objectstore

2015-09-28 Thread James (Fei) Liu-SSI
Hi Varada,
  Have you rebased the pull request to master already?  

  Thanks,
  James 

-Original Message-
From: ceph-devel-ow...@vger.kernel.org 
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Varada Kari
Sent: Friday, September 11, 2015 3:28 AM
To: Sage Weil; Matt W. Benjamin; Loic Dachary
Cc: ceph-devel
Subject: RE: loadable objectstore

Hi Sage/ Matt,

I have submitted the pull request based on wip-plugin branch for the object 
store factory implementation at https://github.com/ceph/ceph/pull/5884 . 
Haven't rebased to the master yet. Working on rebase and including new store in 
the factory implementation.  Please have a look and let me know your comments. 
Will submit a rebased PR soon with new store integration. 

Thanks,
Varada

-Original Message-
From: ceph-devel-ow...@vger.kernel.org 
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Varada Kari
Sent: Friday, July 03, 2015 7:31 PM
To: Sage Weil ; Adam Crume 
Cc: Loic Dachary ; ceph-devel ; 
Matt W. Benjamin 
Subject: RE: loadable objectstore

Hi All,

I'm not able to make much progress after building common as a shared object along 
with the object store. 
Compilation of the test binaries is failing with 
"./.libs/libceph_filestore.so: undefined reference to `tracepoint_dlopen'".

  CXXLDceph_streamtest
./.libs/libceph_filestore.so: undefined reference to `tracepoint_dlopen'
collect2: error: ld returned 1 exit status
make[3]: *** [ceph_streamtest] Error 1

But libfilestore.so is linked with lttng-ust.

src/.libs$ ldd libceph_filestore.so
libceph_keyvaluestore.so.1 => 
/home/varada/obs-factory/plugin-work/src/.libs/libceph_keyvaluestore.so.1 
(0x7f5e50f5)
libceph_os.so.1 => 
/home/varada/obs-factory/plugin-work/src/.libs/libceph_os.so.1 
(0x7f5e4f93a000)
libcommon.so.1 => /home/varada/ 
obs-factory/plugin-work/src/.libs/libcommon.so.1 (0x7f5e4b5df000)
liblttng-ust.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ust.so.0 
(0x7f5e4b179000)
liblttng-ust-tracepoint.so.0 => 
/usr/lib/x86_64-linux-gnu/liblttng-ust-tracepoint.so.0 (0x7f5e4a021000)
liburcu-bp.so.1 => /usr/lib/liburcu-bp.so.1 (0x7f5e49e1a000)
liburcu-cds.so.1 => /usr/lib/liburcu-cds.so.1 (0x7f5e49c12000)

I edited the above output to show just the dependencies.
Has anyone faced this issue before?
Any help would be much appreciated. 

Thanks,
Varada
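For what it's worth, one common cause of an "undefined reference to tracepoint_dlopen"
link error with lttng-ust is that no translation unit in the link defines the
tracepoint probes. A hedged sketch of the usual arrangement; the provider header name
below is a placeholder, and whether this matches Ceph's actual tracing setup needs
checking:

    // probes.cc -- exactly one translation unit per shared object defines the probes.
    #define TRACEPOINT_DEFINE
    #define TRACEPOINT_PROBE_DYNAMIC_LINKAGE   // only when the provider .so is dlopen()'d
    #include "my_provider_tp.h"                // placeholder tracepoint provider header

    // The object built from this file must be linked into libceph_filestore.so
    // (or the final binary) together with -llttng-ust -ldl; otherwise the inlined
    // tracepoint() call sites reference tracepoint_dlopen and nothing defines it.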

-Original Message-
From: ceph-devel-ow...@vger.kernel.org 
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Varada Kari
Sent: Friday, June 26, 2015 3:34 PM
To: Sage Weil
Cc: Loic Dachary; ceph-devel; Matt W. Benjamin
Subject: RE: loadable objectstore

Hi,

Made some more changes to resolve the lttng problems at 
https://github.com/varadakari/ceph/commits/wip-plugin.
But I couldn’t bypass the issues; I'm still hitting errors like the one below.

./.libs/libceph_filestore.so: undefined reference to `tracepoint_dlopen'

Compiling with -llttng-ust does not resolve the problem. I've seen some threads on 
the devel list before mentioning this problem. 
Can anyone take a look and guide me on how to fix it?

Haven't made the changes to change the plugin name etc... will be making them 
as part of cleanup.

Thanks,
Varada

-Original Message-
From: ceph-devel-ow...@vger.kernel.org 
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Varada Kari
Sent: Monday, June 22, 2015 8:57 PM
To: Matt W. Benjamin
Cc: Loic Dachary; ceph-devel; Sage Weil
Subject: RE: loadable objectstore

Hi Matt,

The majority of the changes segregate the files into their corresponding shared 
objects and create a factory object. The naming is mostly taken from the 
erasure-coding plugins. I want a good naming convention :-), hence the preliminary 
review. I do agree we have a lot of loadable interfaces, and I think we are on the 
way to making them on-demand (if possible) loadable modules.

Varada

-Original Message-
From: Matt W. Benjamin [mailto:m...@cohortfs.com]
Sent: Monday, June 22, 2015 8:37 PM
To: Varada Kari
Cc: Loic Dachary; ceph-devel; Sage Weil
Subject: Re: loadable objectstore

Hi,

It's just aesthetic, but it feels clunky to change the names of well known 
modules to Plugin--esp. if that generalizes forward to new loadable 
modules (and we have a lot of loadable interfaces).

Matt

- "Varada Kari"  wrote:

> Hi Sage,
>
> Please find the initial implementation of the object store factory 
> (initial cut) at 
> https://github.com/varadakari/ceph/commit/9d5fe2fecf38ba106c7c7b7a3ede4f189ec7e1c8
>
> This is still a work-in-progress branch. Right now I am facing LTTng 
> issues:
> LTTng-UST: Error (-17) while registering tracepoint probe. Duplicate 
> registration of tracepoint probes having the same name is not allowed.
>
> Might be an issue with libcommon inclusion. Trying to resolve the issue 
> now. Seems I need to make libcommon 

Re: Adding Data-At-Rest compression support to Ceph

2015-09-28 Thread Igor Fedotov


On 25.09.2015 17:14, Sage Weil wrote:

On Fri, 25 Sep 2015, Igor Fedotov wrote:

Another thing to note is that we don't have the whole object ready for
compression. We just have some new data block written (appended) to the object,
and we should either compress that block and save the mapping data mentioned above,
or decompress the existing object data and do a full compression again.
And IMO introducing seek points is largely similar to what we were talking
about - it requires a sort of offset mapping as well.

Probably compression at OSD has some Pros as well. But it wouldn't eliminate
the need to "muck with stripe sizes or anything".

I think the best option here is going to be to compress the "stripe unit".
I.e., if you have a stripe_size of 64K, and are doing k=4 m=2, then the
stripe unit is 16K (64/4).  Then each shard has an independent unit it can
compress/decompress and we don't break the ability to read a small extent
by talking to only a single shard.
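A sketch of the arithmetic and addressing behind this, i.e. why a small read still
only touches one shard when each 16K stripe unit is compressed independently. Generic
code, not the EC pool implementation:

    #include <cstdint>

    // With stripe_size = 64K and k = 4 data shards, each shard's independently
    // compressible unit is stripe_size / k = 16K (m = 2 coding shards not shown).
    constexpr uint64_t stripe_size = 64 * 1024;
    constexpr unsigned k = 4;
    constexpr uint64_t stripe_unit = stripe_size / k;   // 16K

    // Reading a small extent still touches a single shard: the shard index and the
    // offset within that shard's (compressed) unit are derived from the logical offset.
    struct ShardExtent { unsigned shard; uint64_t unit_index; uint64_t offset_in_unit; };

    ShardExtent locate(uint64_t logical_off) {
      uint64_t unit = logical_off / stripe_unit;      // which stripe unit overall
      return { static_cast<unsigned>(unit % k),       // which shard holds it
               unit / k,                              // which unit within that shard
               logical_off % stripe_unit };           // offset inside the unit
    }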

Sage, are you considering compression applied after erasure coding here?
Please note that one needs to compress an additional 50% of data this way, since 
the generated 'm' chunks need to be processed as well.
And you lose the ability to recover from an OSD going down without applying 
decompression (and probably recompression) to the remaining shards.


By contrast, doing compression before EC produces a reduced data set for EC 
(saving some CPU cycles) and allows a recovery procedure that does not 
involve an additional decompression/compression pair.
But I suppose the 'stripe unit' approach above wouldn't work in this case - 
the compression entity would have to produce blocks of exactly "stripe unit" size 
so that all the compressed data fits into a single shard, and 
that's hard to achieve.


Thus, as usual, we should choose which drawbacks (benefits) are less (more) 
important here:
the ability to read a small extent from a single shard + an increased data set for 
compression, vs. the ability to omit total decompression on recovery + 
a reduced data set for EC.






*Maybe* the shard could compress contiguous stripe units if multiple
stripes are written together..

In any case, though, there will some metadata it has to track with the
object, because the stripe units are no longer fixed size, and there will
be object_size/stripe_size of them.  I forget if we are already storing a
CRC for each stripe unit or if it is for the entire shard... if it's the
former then this won't be a huge change, I think.

sage




On 24.09.2015 20:53, Samuel Just wrote:

The catch is that currently accessing 4k in the middle of a 4MB object
does not require reading the whole object, so you'd need some kind of
logical offset -> compressed offset mapping.
-Sam
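For reference, a minimal sketch of the kind of logical-offset to compressed-offset
mapping mentioned here: an in-memory sorted map, purely illustrative, assuming the
map is non-empty and covers offset 0. It is not a proposed on-disk format:

    #include <cstdint>
    #include <map>

    // Each entry: logical start offset -> (compressed start offset, compressed length).
    // To read logical range [off, off+len) only the blocks overlapping it are
    // decompressed, so a 4k read in a 4MB object does not touch the whole object.
    struct CompressedBlock { uint64_t c_off; uint32_t c_len; };
    using OffsetMap = std::map<uint64_t, CompressedBlock>;

    CompressedBlock find_block(const OffsetMap &m, uint64_t logical_off) {
      auto it = m.upper_bound(logical_off);  // first block starting after logical_off
      --it;                                  // the block containing logical_off
      return it->second;
    }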

On Thu, Sep 24, 2015 at 10:36 AM, Robert LeBlanc 
wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

I'm probably missing something, but since we are talking about data at
rest, can't we just have the OSD compress the object as it goes to
disk? Instead of
rbd\udata.1ba49c10d9b00c.6859__head_2AD1002B__11 it would
be
rbd\udata.1ba49c10d9b00c.6859__head_2AD1002B__11.{gz,xz,bz2,lzo,etc}.
Then it seems that you don't have to muck with stripe sizes or
anything. For compressible objects they would be less than 4MB, some
of these algorithms already say if it is not compressible enough,
just store it.

Something like zlib Z_FULL_FLUSH may help provide some seek points
within an archive to prevent decompressing the whole object for reads?

- 
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
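A sketch of the Z_FULL_FLUSH idea: flushing at fixed logical intervals resets the
compressor state, so decompression can later start at any recorded flush point
instead of at the beginning of the object. Standard zlib API, error handling omitted:

    #include <zlib.h>
    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Compress 'in' in 64K chunks, issuing Z_FULL_FLUSH after each chunk and
    // recording the compressed offset of every flush point as a seek point.
    std::vector<uint64_t> compress_with_seek_points(const unsigned char *in, size_t len,
                                                    std::vector<unsigned char> &out) {
      std::vector<uint64_t> seek_points;
      const size_t chunk = 64 * 1024;
      z_stream zs{};
      deflateInit(&zs, Z_DEFAULT_COMPRESSION);
      std::vector<unsigned char> buf(deflateBound(&zs, chunk));
      for (size_t off = 0; off < len; off += chunk) {
        seek_points.push_back(out.size());              // a decompression entry point
        zs.next_in  = const_cast<unsigned char *>(in + off);
        zs.avail_in = static_cast<uInt>(std::min(chunk, len - off));
        do {
          zs.next_out  = buf.data();
          zs.avail_out = static_cast<uInt>(buf.size());
          deflate(&zs, Z_FULL_FLUSH);                   // flush to a byte boundary, reset state
          out.insert(out.end(), buf.data(), buf.data() + (buf.size() - zs.avail_out));
        } while (zs.avail_in > 0);
      }
      // A final deflate(&zs, Z_FINISH) would terminate the stream; omitted for brevity.
      deflateEnd(&zs);
      return seek_points;
    }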


On Thu, Sep 24, 2015 at 10:25 AM, Igor Fedotov  wrote:

On 24.09.2015 19:03, Sage Weil wrote:

On Thu, 24 Sep 2015, Igor Fedotov wrote:

Dynamic stripe sizes are possible but it's a significant change from
the
way the EC pool currently works. I would make that a separate project
(as
its useful in its own right) and not complicate the compression
situation.
Or, if it simplifies the compression approach, then I'd make that
change
first. sage

Just to clarify a bit - this is what I saw when I played with Ceph. Please correct me
if I'm wrong..

For low-level RADOS access, client data written to an EC pool has to be aligned
with the stripe size. The last block can be unaligned, but no more
appends are permitted in that case.
Data copied from the cache goes in blocks of up to 8 MB. In the general case the
last block seems to have an unaligned size too.

The EC pool additionally aligns the incoming blocks to the stripe
boundary internally, so blocks going to the EC library are always aligned.
We should probably perform compression prior to this alignment.
Thus some dependency on stripe size is present in EC pools, but it's not that
strict.

Thanks,
Igor

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

-BEGIN PGP SIGNATURE-
Version: Mailvelope v1.1.0
Comment: 

New Defects reported by Coverity Scan for ceph

2015-09-28 Thread scan-admin

Hi,


Please find the latest report on new defect(s) introduced to ceph found with 
Coverity Scan.

Defect(s) Reported-by: Coverity Scan
Showing 20 of 38 defect(s)


** CID 717233:  Uninitialized scalar field  (UNINIT_CTOR)
/mds/Capability.h: 249 in Capability::Capability(CInode *, unsigned long, 
client_t)()

** CID 1238869:  Value not atomically updated  (ATOMICITY)
/osdc/Objecter.cc: 3055 in Objecter::handle_pool_op_reply(MPoolOpReply *)()

** CID 1238870:  Unchecked return value  (CHECKED_RETURN)
/test/test_snap_mapper.cc: 562 in MapperVerifier::remove_oid()()

** CID 1238871:  Dereference after null check  (FORWARD_NULL)
/mds/Server.cc: 6988 in Server::do_rename_rollback(ceph::buffer::list &, int, 
std::tr1::shared_ptr &, bool)()
/mds/Server.cc: 7107 in Server::do_rename_rollback(ceph::buffer::list &, int, 
std::tr1::shared_ptr &, bool)()

** CID 1238872:  Unchecked return value  (CHECKED_RETURN)
/tools/ceph_objectstore_tool.cc: 1284 in 
do_import_rados(std::basic_string)()

** CID 1238873:  Unchecked return value  (CHECKED_RETURN)
/rbd_replay/Replayer.cc: 154 in rbd_replay::Replayer::run(const 
std::basic_string&)()

** CID 1238874:  Missing unlock  (LOCK)
/osdc/Objecter.cc: 1855 in Objecter::op_cancel(Objecter::OSDSession *, unsigned 
long, int)()

** CID 1238875:  Unrecoverable parse warning  (PARSE_ERROR)
/client/Client.cc: 7737 in ()

** CID 1238876:  Unrecoverable parse warning  (PARSE_ERROR)
/client/Client.cc: 7735 in ()

** CID 1238877:  Missing unlock  (LOCK)
/common/Timer.cc: 240 in RWTimer::shutdown()()

** CID 1238878:  Unrecoverable parse warning  (PARSE_ERROR)
/client/Client.cc: 7734 in ()

** CID 1238879:  Thread deadlock  (ORDER_REVERSAL)


** CID 1238880:  Thread deadlock  (ORDER_REVERSAL)



** CID 1238881:  Thread deadlock  (ORDER_REVERSAL)



** CID 1238882:  Thread deadlock  (ORDER_REVERSAL)


** CID 1238883:  Improper use of negative value  (NEGATIVE_RETURNS)
/mds/MDS.cc: 962 in MDS::handle_mds_map(MMDSMap *)()

** CID 1238884:  Unrecoverable parse warning  (PARSE_ERROR)
/client/Client.cc: 7733 in ()

** CID 1238885:  Thread deadlock  (ORDER_REVERSAL)


** CID 1238886:  Thread deadlock  (ORDER_REVERSAL)


** CID 1238887:  Thread deadlock  (ORDER_REVERSAL)




*** CID 717233:  Uninitialized scalar field  (UNINIT_CTOR)
/mds/Capability.h: 249 in Capability::Capability(CInode *, unsigned long, 
client_t)()
243 suppress(0), state(0),
244 client_follows(0), client_xattr_version(0),
245 client_inline_version(0),
246 item_session_caps(this), item_snaprealm_caps(this), item_revoking_caps(this) {
247 g_num_cap++;
248 g_num_capa++;
>>> CID 717233:  Uninitialized scalar field  (UNINIT_CTOR)
>>> Non-static class member "num_revoke_warnings" is not initialized in 
>>> this constructor nor in any functions that it calls.
249   }
250   ~Capability() {
251 g_num_cap--;
252 g_num_caps++;
253   }
254 


*** CID 1238869:  Value not atomically updated  (ATOMICITY)
/osdc/Objecter.cc: 3055 in Objecter::handle_pool_op_reply(MPoolOpReply *)()
3049 if (!rwlock.is_wlocked()) {
3050   rwlock.unlock();
3051   rwlock.get_write();
3052 }
3053 iter = pool_ops.find(tid);
3054 if (iter != pool_ops.end()) {
>>> CID 1238869:  Value not atomically updated  (ATOMICITY)
>>> Using an unreliable value of "op" inside the second locked section. If 
>>> the data that "op" depends on was changed by another thread, this use might 
>>> be incorrect.
3055   _finish_pool_op(op);
3056 }
3057   } else {
3058 ldout(cct, 10) << "unknown request " << tid << dendl;
3059   }
3060   rwlock.unlock();
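The generic fix for this kind of ATOMICITY report is to re-derive the pointer after
the lock has been re-acquired, rather than reusing one computed under the earlier
lock. A sketch; names mirror the snippet, container and lock types are simplified:

    #include <cstdint>
    #include <map>
    #include <mutex>

    struct PoolOp { /* ... */ };
    std::map<uint64_t, PoolOp*> pool_ops;
    std::mutex rwlock;   // stands in for the RWLock in the snippet

    void handle_reply(uint64_t tid) {
      std::unique_lock<std::mutex> l(rwlock);
      // ... the lock may be dropped and re-acquired here (read -> write upgrade) ...
      // Re-look up under the lock that is actually held now, instead of using a
      // PoolOp* captured before the upgrade:
      auto iter = pool_ops.find(tid);
      if (iter != pool_ops.end()) {
        PoolOp *op = iter->second;   // fresh value, valid under the current lock
        (void)op;                    // _finish_pool_op(op) in the real code
      }
    }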

Re: 09/23/2015 Weekly Ceph Performance Meeting IS ON!

2015-09-28 Thread Alexandre DERUMIER
Thanks Mark !

>>Again, sorry for the delay on these. 

No problem, it's already fantastic that you manage these meetings each week!

Regards,

Alexandre
- Original Message -
From: "Mark Nelson" 
To: "aderumier" 
Cc: "ceph-devel" 
Sent: Monday, September 28, 2015 18:24:22
Subject: Re: 09/23/2015 Weekly Ceph Performance Meeting IS ON!

Hi Alexandre, 

Sorry for the long delay. I think I got through all of them. They 
should be public now and I've listed them in the etherpad: 

http://pad.ceph.com/p/performance_weekly 

Again, sorry for the delay on these. I can't find any way to make 
bluejeans default to making the meetings public. 

Mark 

On 09/23/2015 11:44 AM, Alexandre DERUMIER wrote: 
> Hi Mark, 
> 
> can you post the video records of previous meetings ? 
> 
> Thanks 
> 
> Alexandre 
> 
> 
> - Original Message - 
> From: "Mark Nelson"  
> To: "ceph-devel"  
> Sent: Wednesday, September 23, 2015 15:51:21 
> Subject: 09/23/2015 Weekly Ceph Performance Meeting IS ON! 
> 
> 8AM PST as usual! Discussion topics include an update on transparent 
> huge pages testing and I think Ben would like to talk a bit about CBT 
> PRs. Please feel free to add your own! 
> 
> Here's the links: 
> 
> Etherpad URL: 
> http://pad.ceph.com/p/performance_weekly 
> 
> To join the Meeting: 
> https://bluejeans.com/268261044 
> 
> To join via Browser: 
> https://bluejeans.com/268261044/browser 
> 
> To join with Lync: 
> https://bluejeans.com/268261044/lync 
> 
> 
> To join via Room System: 
> Video Conferencing System: bjn.vc -or- 199.48.152.152 
> Meeting ID: 268261044 
> 
> To join via Phone: 
> 1) Dial: 
> +1 408 740 7256 
> +1 888 240 2560(US Toll Free) 
> +1 408 317 9253(Alternate Number) 
> (see all numbers - http://bluejeans.com/numbers) 
> 2) Enter Conference ID: 268261044 
> 
> Mark 
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[puppet] Moving puppet-ceph to the Openstack big tent

2015-09-28 Thread David Moreau Simard
Hi,

puppet-ceph currently lives in stackforge [1] which is being retired
[2]. puppet-ceph is also mirrored on the Ceph Github organization [3].
This version of the puppet-ceph module was created from scratch and
not as a fork of the (then) upstream puppet-ceph by Enovance [4].
Today, the version by Enovance is no longer officially maintained
since Red Hat has adopted the new release.

Being an Openstack project under Stackforge or Openstack brings a lot
of benefits but it's not black and white, there are cons too.

It provides us with the tools, the processes and the frameworks to
review and test each contribution to ensure we ship a module that is
stable and is held to the highest standards.
But it also means that:
- We cede some level of ownership back to the Openstack foundation,
its technical committee and the Puppet Openstack PTL.
- puppet-ceph contributors will also be required to sign the
Contributors License Agreement and jump through the Gerrit hoops [5]
which can make contributing to the project harder.

We have put tremendous efforts into creating a quality module and as
such it was the first puppet module in the stackforge organization to
implement not only unit tests but also integration tests with third
party CI.
Integration testing for other puppet modules is just now starting to
take shape using the Openstack CI infrastructure.

In the context of Openstack, RDO already ships with a means to install
Ceph with this very module, and Fuel will be adopting it soon as well.
This means the module will benefit from real world experience and
improvements by the Openstack community and packagers.
This will help further reinforce that not only Ceph is the best
unified storage solution for Openstack but that we have means to
deploy it in the real world easily.

We all know that Ceph is also deployed outside of this context and
this is why the core reviewers make sure that contributions remain
generic and usable outside of this use case.

Today, the core members of the project discussed whether or not we
should move puppet-ceph to the Openstack big tent and we had a
consensus approving the move.
We would also like to hear the thoughts of the community on this topic.

Please let us know what you think.

Thanks,

[1]: https://github.com/stackforge/puppet-ceph
[2]: https://review.openstack.org/#/c/192016/
[3]: https://github.com/ceph/puppet-ceph
[4]: https://github.com/redhat-cip/puppet-ceph
[5]: https://wiki.openstack.org/wiki/How_To_Contribute

David Moreau Simard
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Compression implementation options

2015-09-28 Thread Igor Fedotov

Hi folks,

Here is a brief summary of the potential compression implementation options.
I think we should choose the desired approach prior to starting work on 
the compression feature.


Comments, additions and fixes are welcome.

Compression At Client - compression/decompression to be performed at the 
client level (most preferably - Rados) before sending/after receiving 
data to/from Ceph.

Pros:
* Ceph cluster isn’t loaded with additional computation burden.
* All Ceph cluster components and data transfers benefit from 
the reduced data volume.

* Compression is transparent to Ceph cluster components
Cons:
* Weak clients can lack CPU resources to handle their traffic.
* Any Read/Write access requires at least two sequential 
requests to the Ceph cluster to get data: the first one to retrieve the 
“original to compressed“ offset mapping for the desired data block, the 
second one to get the compressed data block.
* Random write access handling is tricky (see notes below). 
Even more requests to the cluster per single user request might be needed in 
this case.


Compression At Replicated Pool - compression to be performed at primary 
Ceph entities at Replicated Pool level prior to data replication.

Pros:
* Clients benefit from cluster CPU resources utilization.
* Compression for specific data block is performed at a single 
point only - thus total CPU utilization for Ceph cluster is less.
* Underlying Ceph components and data transfers benefit from 
the reduced data volume.

Cons:
* Clients that use EC pools directly lack compression unless 
it’s implemented there too.
* In two-tier model data compression at cache tier may be 
inappropriate due to performance reasons. Compression at cache tier also 
prevents from cache removal when/if needed.

* Random write access handling is tricky (see notes below).

Compression At Erasure Coded pool - compression to be performed at 
primary Ceph entities at EC Pool level prior to Erasure Coding.

Pros:
* Clients benefit from cluster CPU resources utilization.
* Erasure Coding “inflates” processed data block (up to ~50%). 
Thus doing compression prior to that reduces CPU utilization.
* Natural combination with EC means. Compression and EC have 
similar purposes - save storage space at the cost of CPU usage. One can 
reuse EC infrastructure and design solutions.
* No need for random write access support - EC pools don’t 
provide that on their own. Thus we can reuse the same approach to resolve 
the issue when needed, and the implementation becomes much easier.
* Underlying Ceph components and data transfers benefit from 
reduced data volume.

Cons:
* Limited applicability - clients that don’t use EC pools lack 
compression.


Compression At Ceph Filestore entity - compression to be performed by 
Ceph File Store component prior to saving object data to underlying file 
system.

Pros:
* Clients benefit from cluster CPU resources utilization.

Cons:
* Random write access is tricky (see notes below).
* From the cluster perspective, compression is performed either on 
each replicated block or on a block “inflated” by erasure coding. Thus 
total Ceph cluster CPU utilization to perform compression becomes 
considerably higher (a threefold increase for replicated pools and ~50% 
for EC pools).

* No benefit in reduced data transfers over the net.
* A recovery procedure caused by an OSD going down triggers complete data 
set decompression and recompression when an EC pool is used. This might 
considerably increase CPU utilization for the recovery process.


Compression Externally at File System - compression to be performed at 
File Store node by means of underlying file system.

Pros:
* Compression is (mostly) transparent to Ceph
* Clients benefit from cluster CPU resources utilization.
Cons:
* File system “lock-in”. Only the BTRFS file system can be used for 
now, and its production readiness is questionable.
* Limited flexibility - compression is a partition/mount-point 
property. It is hard to get better granularity - per-pool or per-object - 
and there is no way to disable compression.
* From the cluster perspective, compression is performed either on 
each replicated block or on a block “inflated” by erasure coding. Thus 
total Ceph cluster CPU utilization to perform compression becomes 
considerably higher (a threefold increase for replicated pools and ~50% 
for EC pools).

* No benefit in reduced data transfers over the net.
* A recovery procedure caused by an OSD going down triggers complete data 
set decompression and recompression when an EC pool is used. This might 
considerably increase CPU utilization for the recovery process.


Compression Externally at Block Device - compression to be performed at 
File Store node by means of underlying block device that supports inline 
data 

Re: 09/23/2015 Weekly Ceph Performance Meeting IS ON!

2015-09-28 Thread Mark Nelson

Hi Alexandre,

Sorry for the long delay.  I think I got through all of them.  They 
should be public now and I've listed them in the etherpad:


http://pad.ceph.com/p/performance_weekly

Again, sorry for the delay on these.  I can't find any way to make 
bluejeans default to making the meetings public.


Mark

On 09/23/2015 11:44 AM, Alexandre DERUMIER wrote:

Hi Mark,

can you post the video records of previous meetings ?

Thanks

Alexandre


- Original Message -
From: "Mark Nelson" 
To: "ceph-devel" 
Sent: Wednesday, September 23, 2015 15:51:21
Subject: 09/23/2015 Weekly Ceph Performance Meeting IS ON!

8AM PST as usual! Discussion topics include an update on transparent
huge pages testing and I think Ben would like to talk a bit about CBT
PRs. Please feel free to add your own!

Here's the links:

Etherpad URL:
http://pad.ceph.com/p/performance_weekly

To join the Meeting:
https://bluejeans.com/268261044

To join via Browser:
https://bluejeans.com/268261044/browser

To join with Lync:
https://bluejeans.com/268261044/lync


To join via Room System:
Video Conferencing System: bjn.vc -or- 199.48.152.152
Meeting ID: 268261044

To join via Phone:
1) Dial:
+1 408 740 7256
+1 888 240 2560(US Toll Free)
+1 408 317 9253(Alternate Number)
(see all numbers - http://bluejeans.com/numbers)
2) Enter Conference ID: 268261044

Mark


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: libcephfs invalidate upcalls

2015-09-28 Thread Matt Benjamin
Hi,

-- 
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-761-4689
fax.  734-769-8938
cel.  734-216-5309

- Original Message -
> From: "John Spray" 
> To: "Matt Benjamin" 
> Cc: "Ceph Development" 
> Sent: Monday, September 28, 2015 9:01:28 AM
> Subject: Re: libcephfs invalidate upcalls
> 
> On Sat, Sep 26, 2015 at 8:03 PM, Matt Benjamin  wrote:
> > Hi John,
> >
> > I prototyped an invalidate upcall for libcephfs and the Gasesha Ceph fsal,
> > building on the Client invalidation callback registrations.
> >
> > As you suggested, NFS (or AFS, or DCE) minimally expect a more generic
> > "cached vnode may have changed" trigger than the current inode and dentry
> > invalidates, so I extended the model slightly to hook cap revocation,
> > feedback appreciated.
> 
> In cap_release, we probably need to be a bit more discriminating about
> when to drop, e.g. if we've only lost our exclusive write caps, the
> rest of our metadata might all still be fine to cache.  Is ganesha in
> general doing any data caching?  I think I had implicitly assumed that
> we were only worrying about metadata here but now I realise I never
> checked that.

Ganesha isn't currently, though it did once, and is likely to again, at some 
point.

The exclusive write cap is in fact something with a direct mapping to NFSv4 
delegations,
so we do want to be able to trigger a recall, in this case.

> 
> The awkward part is Client::trim_caps.  In the Client::trim_caps case,
> the lru_is_expirable part won't be true until something has already
> been invalidated, so there needs to be an explicit hook there --
> rather than invalidating in response to cap release, we need to
> invalidate in order to get ganesha to drop its handle, which will
> render something expirable, and finally when we expire it, the cap
> gets released.

Ok, sure.

> 
> In that case maybe we need a hook in ganesha to say "invalidate
> everything you can" so that we don't have to make a very large number
> of function calls to invalidate things.  In the fuse/kernel case we
> can only sometimes invalidate a piece of metadata (e.g. we can't if
> its flocked or whatever), so we ask it to invalidate everything.  But
> perhaps in the NFS case we can always expect our invalidate calls to
> be respected, so we could just invalidate a smaller number of things
> (the difference between actual cache size and desired)?

As you noted above, what we're invalidating is a cache entry.  With Dan's
mdcache work, we might no longer be caching at the Ganesha level, but
I didn't assume that here.

Matt

> 
> John
> 
> >
> > g...@github.com:linuxbox2/ceph.git , branch invalidate
> > g...@github.com:linuxbox2/nfs-ganesha.git , branch ceph-invalidates
> >
> > thanks,
> >
> > Matt
> >
> > --
> > Matt Benjamin
> > Red Hat, Inc.
> > 315 West Huron Street, Suite 140A
> > Ann Arbor, Michigan 48103
> >
> > http://www.redhat.com/en/technologies/storage
> >
> > tel.  734-761-4689
> > fax.  734-769-8938
> > cel.  734-216-5309
> >
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: aarch64 not using crc32?

2015-09-28 Thread Yazen Ghannam
Hi Sage,

HWCAP_CRC32 and others were added to the Kernel with this commit
4bff28ccda2b7a3fbdf8e80aef7a599284681dc6; it looks like this first
landed in v3.14. Are you using the stock Kernel on Trusty (v3.13?)?
Can you update to a later version for gitbuilder? For regular testing
v3.19 (lts-vivid) may be a good choice since this also includes the
arm64 CRC32 Kernel module.

Thanks,
Yazen
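For reference, the runtime detection in question boils down to checking the CRC32 bit
in the ELF auxiliary vector. A sketch of what that check looks like on arm64; this is
generic glibc/kernel API usage, not Ceph's exact probe code:

    #include <sys/auxv.h>   // getauxval, AT_HWCAP

    // HWCAP_CRC32 is bit 7 of AT_HWCAP on arm64; kernels older than v3.14 never
    // set it, so the feature looks absent even when the CPU supports it.
    #ifndef HWCAP_CRC32
    #define HWCAP_CRC32 (1 << 7)
    #endif

    bool have_arm64_crc32() {
      unsigned long hwcap = getauxval(AT_HWCAP);
      return (hwcap & HWCAP_CRC32) != 0;
    }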

On Thu, Sep 24, 2015 at 9:35 AM, Pankaj Garg
 wrote:
> Hi Sage,
> I actually had the same issue just a couple of weeks back. The hardware actually 
> has the CRC32 capability and we have tested it. The issue lies in the toolchain 
> on the machine. There are several versions of .h files present with different 
> HWCAP #defines. We need to fix it so that we are using the right one. There 
> is a version of that file present which defines this capability, but it is not 
> being included. I will look into this issue and let you know. Since my builds 
> were just being tested on ARM, I hardcoded the presence of CRC.
>
> Thanks,
> -Pankaj
>
> On Sep 24, 2015 5:17 AM, Sage Weil  wrote:
>>
>> Hi Pankaj,
>>
>> In order to get the build going on the new trusty gitbuilder I had to make
>> this change:
>>
>> https://github.com/ceph/ceph/commit/3123b2c5d3b72c9d43b10d8f296305d41b68b730
>>
>> It was clearly a bug, but what worries me is that the fact that I hit it
>> means the HWCAP_CRC32 is not present.  Is there a problem with the
>> hardware feature detection or is that feature simply missing from the box
>> we're using?
>>
>> Thanks!
>> sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


ceph branch status

2015-09-28 Thread ceph branch robot
-- All Branches --

Abhishek Varshney 
2015-09-22 15:11:25 +0530   hammer-backports

Adam C. Emerson 
2015-09-14 12:32:18 -0400   wip-cxx11time
2015-09-15 12:09:20 -0400   wip-cxx11concurrency

Adam Crume 
2014-12-01 20:45:58 -0800   wip-doc-rbd-replay

Alfredo Deza 
2015-03-23 16:39:48 -0400   wip-11212

Alfredo Deza 
2014-07-08 13:58:35 -0400   wip-8679
2014-09-04 13:58:14 -0400   wip-8366
2014-10-13 11:10:10 -0400   wip-9730

Ali Maredia 
2015-09-22 15:10:10 -0400   wip-cmake
2015-09-24 12:53:48 -0400   wip-10587-split-servers

Boris Ranto 
2015-09-04 15:19:11 +0200   wip-bash-completion

Dan Mick 
2013-07-16 23:00:06 -0700   wip-5634

Danny Al-Gaaf 
2015-04-23 16:32:00 +0200   wip-da-SCA-20150421
2015-04-23 17:18:57 +0200   wip-nosetests
2015-04-23 18:20:16 +0200   wip-unify-num_objects_degraded
2015-09-24 14:55:35 +0200   wip-da-SCA-20150910

David Zafman 
2014-08-29 10:41:23 -0700   wip-libcommon-rebase
2015-04-24 13:14:23 -0700   wip-cot-giant
2015-08-04 07:39:00 -0700   wip-12577-hammer
2015-09-09 20:52:03 -0700   wip-zafman-testing

Dongmao Zhang 
2014-11-14 19:14:34 +0800   thesues-master

Greg Farnum 
2015-04-29 21:44:11 -0700   wip-init-names
2015-07-16 09:28:24 -0700   hammer-12297
2015-09-22 10:35:08 -0700   greg-fs-testing

Greg Farnum 
2014-10-23 13:33:44 -0700   wip-forward-scrub

Guang G Yang 
2015-06-26 20:31:44 +   wip-ec-readall
2015-07-23 16:13:19 +   wip-12316

Guang Yang 
2014-08-08 10:41:12 +   wip-guangyy-pg-splitting
2014-09-25 00:47:46 +   wip-9008
2014-09-30 10:36:39 +   guangyy-wip-9614

Haomai Wang 
2014-07-27 13:37:49 +0800   wip-flush-set
2015-04-20 00:47:59 +0800   update-organization
2015-07-21 19:33:56 +0800   fio-objectstore
2015-08-26 09:57:27 +0800   wip-recovery-attr

Ilya Dryomov 
2014-09-05 16:15:10 +0400   wip-rbd-notify-errors

Ivo Jimenez 
2015-08-24 23:12:45 -0700   hammer-with-new-workunit-for-wip-12551

Jason Dillaman 
2015-07-31 13:55:23 -0400   wip-12383-next
2015-08-31 23:17:53 -0400   wip-12698
2015-09-01 10:17:02 -0400   wip-11287

Jenkins 
2014-07-29 05:24:39 -0700   wip-nhm-hang
2015-02-02 10:35:28 -0800   wip-sam-v0.92
2015-08-21 12:46:32 -0700   last
2015-09-15 10:23:18 -0700   rhcs-v0.80.8
2015-09-21 16:48:32 -0700   rhcs-v0.94.1-ubuntu

Joao Eduardo Luis 
2014-09-10 09:39:23 +0100   wip-leveldb-get.dumpling

Joao Eduardo Luis 
2014-07-22 15:41:42 +0100   wip-leveldb-misc

Joao Eduardo Luis 
2014-09-02 17:19:52 +0100   wip-leveldb-get
2014-10-17 16:20:11 +0100   wip-paxos-fix
2014-10-21 21:32:46 +0100   wip-9675.dumpling
2015-07-27 21:56:42 +0100   wip-11470.hammer
2015-09-09 15:45:45 +0100   wip-11786.hammer

Joao Eduardo Luis 
2014-11-17 16:43:53 +   wip-mon-osdmap-cleanup
2014-12-15 16:18:56 +   wip-giant-mon-backports
2014-12-17 17:13:57 +   wip-mon-backports.firefly
2014-12-17 23:15:10 +   wip-mon-sync-fix.dumpling
2015-01-07 23:01:00 +   wip-mon-blackhole-mlog-0.87.7
2015-01-10 02:40:42 +   wip-dho-joao
2015-01-10 02:46:31 +   wip-mon-paxos-fix
2015-01-26 13:00:09 +   wip-mon-datahealth-fix
2015-02-04 22:36:14 +   wip-10643
2015-09-09 15:43:51 +0100   wip-11786.firefly

Joao Eduardo Luis 
2015-05-27 23:48:45 +0100   wip-mon-scrub
2015-05-29 12:21:43 +0100   wip-11545
2015-06-05 16:12:57 +0100   wip-10507
2015-06-16 14:34:11 +0100   wip-11470
2015-06-25 00:16:41 +0100   wip-10507-2
2015-07-14 16:52:35 +0100   wip-joao-testing
2015-09-08 09:48:41 +0100   wip-leveldb-hang

John Spray 
2015-08-14 15:50:13 +0100   wip-9663-hashorder
2015-08-25 12:14:40 +0100   wip-scrub-basic
2015-09-01 16:43:40 +0100   wip-12133
2015-09-25 16:12:51 +0100   wip-jcsp-test

John Wilkins 
2013-07-31 18:00:50 -0700   wip-doc-rados-python-api
2014-07-03 07:31:14 -0700   wip-doc-rgw-federated
2014-11-03 14:04:33 -0800   

RE: Very slow recovery/peering with latest master

2015-09-28 Thread Chen, Xiaoxi
FWIW, blkid works well in both GPT(created by parted) and MSDOS(created by 
fdisk) in my environment.

But blkid doesn't show the information of disk in external bay (which is 
connected by a JBOD controller) in my setup.

See below, SDB and SDH are SSDs attached to the front panel but the rest osd 
disks(0-9) are from an external bay.

/dev/sdc   976285652 294887592 681398060  31% 
/var/lib/ceph/mnt/osd-device-0-data
/dev/sdd   976285652 269840116 706445536  28% 
/var/lib/ceph/mnt/osd-device-1-data
/dev/sde   976285652 257610832 718674820  27% 
/var/lib/ceph/mnt/osd-device-2-data
/dev/sdf   976285652 293460620 682825032  31% 
/var/lib/ceph/mnt/osd-device-3-data
/dev/sdg   976285652 29100 681841552  31% 
/var/lib/ceph/mnt/osd-device-4-data
/dev/sdi   976285652 288416840 687868812  30% 
/var/lib/ceph/mnt/osd-device-5-data
/dev/sdj   976285652 273090960 703194692  28% 
/var/lib/ceph/mnt/osd-device-6-data
/dev/sdk   976285652 302720828 673564824  32% 
/var/lib/ceph/mnt/osd-device-7-data
/dev/sdl   976285652 268207968 708077684  28% 
/var/lib/ceph/mnt/osd-device-8-data
/dev/sdm   976285652 293316752 682968900  31% 
/var/lib/ceph/mnt/osd-device-9-data
/dev/sdb1  292824376  10629024 282195352   4% 
/var/lib/ceph/mnt/osd-device-40-data
/dev/sdh1  292824376  11413956 281410420   4% 
/var/lib/ceph/mnt/osd-device-41-data



root@osd1:~# blkid 
/dev/sdb1: UUID="907806fe-1d29-4ef7-ad11-5a933a11601e" TYPE="xfs" 
/dev/sdh1: UUID="9dfe68ac-f297-4a02-8d21-50c194af4ff2" TYPE="xfs" 
/dev/sda1: UUID="cdf945ce-a345-4766-b89e-cecc33689016" TYPE="ext4" 
/dev/sda2: UUID="7a565029-deb9-4e68-835c-f097c2b1514e" TYPE="ext4" 
/dev/sda5: UUID="e61bfc35-932d-442f-a5ca-795897f62744" TYPE="swap"
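For comparison, when the device path is already known, libblkid can probe just that
device instead of walking every block device on the system (the full scan is what
stalls on inaccessible paths). A sketch with error handling mostly omitted; whether
the OSD code can know the path up front is a separate question:

    #include <blkid/blkid.h>
    #include <string>

    // Probe a single known device for its filesystem UUID/TYPE rather than
    // scanning the whole system (which can stall on inaccessible JBOD/JBOF paths).
    bool probe_one_device(const char *dev, std::string &uuid, std::string &type) {
      blkid_probe pr = blkid_new_probe_from_filename(dev);
      if (!pr)
        return false;
      blkid_probe_enable_superblocks(pr, 1);
      if (blkid_do_safeprobe(pr) == 0) {
        const char *v = nullptr;
        if (blkid_probe_lookup_value(pr, "UUID", &v, nullptr) == 0 && v)
          uuid = v;
        if (blkid_probe_lookup_value(pr, "TYPE", &v, nullptr) == 0 && v)
          type = v;
      }
      blkid_free_probe(pr);
      return !uuid.empty() || !type.empty();
    }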

 

> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> ow...@vger.kernel.org] On Behalf Of Somnath Roy
> Sent: Friday, September 25, 2015 12:09 AM
> To: Podoski, Igor
> Cc: Samuel Just; Samuel Just (sam.j...@inktank.com); ceph-devel; Sage Weil;
> Handzik, Joe
> Subject: RE: Very slow recovery/peering with latest master
> 
> Yeah , Igor may be..
> Meanwhile, I am able to get gdb trace of the hang..
> 
> (gdb) bt
> #0  0x7f6f6bf043bd in read () at ../sysdeps/unix/syscall-template.S:81
> #1  0x7f6f6af3b066 in ?? () from /lib/x86_64-linux-gnu/libblkid.so.1
> #2  0x7f6f6af43ae2 in ?? () from /lib/x86_64-linux-gnu/libblkid.so.1
> #3  0x7f6f6af42788 in ?? () from /lib/x86_64-linux-gnu/libblkid.so.1
> #4  0x7f6f6af42a53 in ?? () from /lib/x86_64-linux-gnu/libblkid.so.1
> #5  0x7f6f6af3c17b in blkid_do_safeprobe () from /lib/x86_64-linux-
> gnu/libblkid.so.1
> #6  0x7f6f6af3e0c4 in blkid_verify () from /lib/x86_64-linux-
> gnu/libblkid.so.1
> #7  0x7f6f6af387fb in blkid_get_dev () from /lib/x86_64-linux-
> gnu/libblkid.so.1
> #8  0x7f6f6af38acb in ?? () from /lib/x86_64-linux-gnu/libblkid.so.1
> #9  0x7f6f6af3946d in ?? () from /lib/x86_64-linux-gnu/libblkid.so.1
> #10 0x7f6f6af39892 in blkid_probe_all_new () from /lib/x86_64-linux-
> gnu/libblkid.so.1
> #11 0x7f6f6af3dc10 in blkid_find_dev_with_tag () from /lib/x86_64-
> linux-gnu/libblkid.so.1
> #12 0x7f6f6d3bf923 in get_device_by_uuid (dev_uuid=...,
> label=label@entry=0x7f6f6d535fe5 "PARTUUID",
> partition=partition@entry=0x7f6f347eb5a0 "",
> device=device@entry=0x7f6f347ec5a0 "")
> at common/blkdev.cc:193
> #13 0x7f6f6d147de5 in FileStore::collect_metadata (this=0x7f6f68893000,
> pm=0x7f6f21419598) at os/FileStore.cc:660
> #14 0x7f6f6cebfa9a in OSD::_collect_metadata
> (this=this@entry=0x7f6f6894f000, pm=pm@entry=0x7f6f21419598) at
> osd/OSD.cc:4586
> #15 0x7f6f6cec0614 in OSD::_send_boot
> (this=this@entry=0x7f6f6894f000) at osd/OSD.cc:4568
> #16 0x7f6f6cec203a in OSD::_maybe_boot (this=0x7f6f6894f000,
> oldest=1, newest=100) at osd/OSD.cc:4463
> #17 0x7f6f6cefc5e1 in Context::complete (this=0x7f6f3d3864e0,
> r=) at ./include/Context.h:64
> #18 0x7f6f6d2eed08 in Finisher::finisher_thread_entry
> (this=0x7ffee7272d70) at common/Finisher.cc:65
> #19 0x7f6f6befd182 in start_thread (arg=0x7f6f347ee700) at
> pthread_create.c:312
> #20 0x7f6f6a24347d in clone ()
> at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> 
> 
> Strace was not helpful much since other threads are not block and keep
> printing the futex traces..
> 
> Thanks & Regards
> Somnath
> 
> -Original Message-
> From: Podoski, Igor [mailto:igor.podo...@ts.fujitsu.com]
> Sent: Wednesday, September 23, 2015 11:33 PM
> To: Somnath Roy
> Cc: Samuel Just; Samuel Just (sam.j...@inktank.com); ceph-devel; Sage Weil;
> Handzik, Joe
> Subject: RE: Very slow recovery/peering with latest master
> 
> > -Original Message-
> > From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> > ow...@vger.kernel.org] On Behalf Of Sage Weil
> > Sent: Thursday, September 24, 2015 3:32 AM
> > To: Handzik, Joe
> > Cc: Somnath Roy; Samuel Just; Samuel Just 

How to make a osd status become down+peering?

2015-09-28 Thread chen kael
Hi,
I am running a cluster on Giant (0.87). After a series of disk
failures, we had to add new OSDs and mark the bad ones out, so the
rebalance process began.
Unfortunately, one OSD was stuck in down+peering and could not recover by itself.
From the log, we found that one PG was stuck because two files on this PG
were broken, so the peering process aborted.
I tried "ceph osd lost osd.n", but it didn't work. In the end I deleted
those two bad files and the OSD recovered successfully.

Now I am trying to fix this bug; maybe someone has already fixed it. If
anyone knows how to solve this problem, or has encountered it before, your
help would be appreciated!
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Very slow recovery/peering with latest master

2015-09-28 Thread Handzik, Joe
That's really good info, thanks for tracking that down. Do you expect this to 
be a common configuration going forward in Ceph deployments? 

Joe

> On Sep 28, 2015, at 3:43 AM, Somnath Roy  wrote:
> 
> Xiaoxi,
> Thanks for giving me some pointers.
> Now, with the help of strace I am able to figure out why it is taking so long 
> in my setup to complete blkid* calls.
> In my case, the partitions are showing properly even if it is connected to 
> JBOD controller.
> 
> root@emsnode10:~/wip-write-path-optimization/src/os# strace -t -o 
> /root/strace_blkid.txt blkid
> /dev/sda1: UUID="d2060642-1af4-424f-9957-6a8dc77ff301" TYPE="ext4"
> /dev/sda5: UUID="2a987cc0-e3cd-43d4-99cd-b8d8e58617e7" TYPE="swap"
> /dev/sdy2: UUID="0ebd1631-52e7-4dc2-8bff-07102b877bfc" TYPE="xfs"
> /dev/sdw2: UUID="29f1203b-6f44-45e3-8f6a-8ad1d392a208" TYPE="xfs"
> /dev/sdt2: UUID="94f6bb55-ac61-499c-8552-600581e13dfa" TYPE="xfs"
> /dev/sdr2: UUID="b629710e-915d-4c56-b6a5-4782e6d6215d" TYPE="xfs"
> /dev/sdv2: UUID="69623b7f-9036-4a35-8298-dc7f5cecdb21" TYPE="xfs"
> /dev/sds2: UUID="75d941c5-a85c-4c37-b409-02de34483314" TYPE="xfs"
> /dev/sdx: UUID="cc84bc66-208b-4387-8470-071ec71532f2" TYPE="xfs"
> /dev/sdu2: UUID="c9817831-8362-48a9-9a6c-920e0f04d029" TYPE="xfs"
> 
> But, it is taking time on the drives those are not reserved for this host. 
> Basically, I am using 2 heads in front of a JBOF and I am using sg_persist to 
> reserve the drives between 2 hosts.
> Here is the strace output of blkid.
> 
> http://pastebin.com/qz2Z7Phj
> 
> You can see lot of input/output errors on accessing the drives which are not 
> reserved for this host.
> 
> This is an inefficiency part of blkid* calls (?) since calls like 
> fdisk/lsscsi are not taking time.
> 
> Regards
> Somnath
> 
> 
> -Original Message-
> From: Chen, Xiaoxi [mailto:xiaoxi.c...@intel.com]
> Sent: Monday, September 28, 2015 1:02 AM
> To: Somnath Roy; Podoski, Igor
> Cc: Samuel Just; Samuel Just (sam.j...@inktank.com); ceph-devel; Sage Weil; 
> Handzik, Joe
> Subject: RE: Very slow recovery/peering with latest master
> 
> FWIW, blkid works well in both GPT(created by parted) and MSDOS(created by 
> fdisk) in my environment.
> 
> But blkid doesn't show the information of disk in external bay (which is 
> connected by a JBOD controller) in my setup.
> 
> See below, SDB and SDH are SSDs attached to the front panel but the rest osd 
> disks(0-9) are from an external bay.
> 
> /dev/sdc   976285652 294887592 681398060  31% 
> /var/lib/ceph/mnt/osd-device-0-data
> /dev/sdd   976285652 269840116 706445536  28% 
> /var/lib/ceph/mnt/osd-device-1-data
> /dev/sde   976285652 257610832 718674820  27% 
> /var/lib/ceph/mnt/osd-device-2-data
> /dev/sdf   976285652 293460620 682825032  31% 
> /var/lib/ceph/mnt/osd-device-3-data
> /dev/sdg   976285652 29100 681841552  31% 
> /var/lib/ceph/mnt/osd-device-4-data
> /dev/sdi   976285652 288416840 687868812  30% 
> /var/lib/ceph/mnt/osd-device-5-data
> /dev/sdj   976285652 273090960 703194692  28% 
> /var/lib/ceph/mnt/osd-device-6-data
> /dev/sdk   976285652 302720828 673564824  32% 
> /var/lib/ceph/mnt/osd-device-7-data
> /dev/sdl   976285652 268207968 708077684  28% 
> /var/lib/ceph/mnt/osd-device-8-data
> /dev/sdm   976285652 293316752 682968900  31% 
> /var/lib/ceph/mnt/osd-device-9-data
> /dev/sdb1  292824376  10629024 282195352   4% 
> /var/lib/ceph/mnt/osd-device-40-data
> /dev/sdh1  292824376  11413956 281410420   4% 
> /var/lib/ceph/mnt/osd-device-41-data
> 
> 
> 
> root@osd1:~# blkid
> /dev/sdb1: UUID="907806fe-1d29-4ef7-ad11-5a933a11601e" TYPE="xfs"
> /dev/sdh1: UUID="9dfe68ac-f297-4a02-8d21-50c194af4ff2" TYPE="xfs"
> /dev/sda1: UUID="cdf945ce-a345-4766-b89e-cecc33689016" TYPE="ext4"
> /dev/sda2: UUID="7a565029-deb9-4e68-835c-f097c2b1514e" TYPE="ext4"
> /dev/sda5: UUID="e61bfc35-932d-442f-a5ca-795897f62744" TYPE="swap"
> 
> 
> 
>> -Original Message-
>> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
>> ow...@vger.kernel.org] On Behalf Of Somnath Roy
>> Sent: Friday, September 25, 2015 12:09 AM
>> To: Podoski, Igor
>> Cc: Samuel Just; Samuel Just (sam.j...@inktank.com); ceph-devel; Sage
>> Weil; Handzik, Joe
>> Subject: RE: Very slow recovery/peering with latest master
>> 
>> Yeah , Igor may be..
>> Meanwhile, I am able to get gdb trace of the hang..
>> 
>> (gdb) bt
>> #0  0x7f6f6bf043bd in read () at
>> ../sysdeps/unix/syscall-template.S:81
>> #1  0x7f6f6af3b066 in ?? () from
>> /lib/x86_64-linux-gnu/libblkid.so.1
>> #2  0x7f6f6af43ae2 in ?? () from
>> /lib/x86_64-linux-gnu/libblkid.so.1
>> #3  0x7f6f6af42788 in ?? () from
>> /lib/x86_64-linux-gnu/libblkid.so.1
>> #4  0x7f6f6af42a53 in ?? () from
>> /lib/x86_64-linux-gnu/libblkid.so.1
>> #5  0x7f6f6af3c17b in blkid_do_safeprobe () from
>> /lib/x86_64-linux-
>> gnu/libblkid.so.1
>> #6  0x7f6f6af3e0c4 in blkid_verify () from /lib/x86_64-linux-
>> gnu/libblkid.so.1
>> #7 

[Hammer Backports] Should rest-bench be removed on hammer ?

2015-09-28 Thread Abhishek Varshney
Hi,

The rest-bench tool has been removed in master through PR #5428
(https://github.com/ceph/ceph/pull/5428). The backport PR #5812
(https://github.com/ceph/ceph/pull/5812) is currently causing failures
on the hammer-backports integration branch. These failures can be
resolved by either backporting PR #5428 or by adding a hammer-specific
commit to PR #5812.

How should we proceed here?

Thanks
Abhishek


Re: [Hammer Backports] Should rest-bench be removed on hammer ?

2015-09-28 Thread Loic Dachary
Hi,

On 28/09/2015 12:19, Abhishek Varshney wrote:
> Hi,
> 
> The rest-bench tool has been removed in master through PR #5428
> (https://github.com/ceph/ceph/pull/5428). The backport PR #5812
> (https://github.com/ceph/ceph/pull/5812) is currently causing failures
> on the hammer-backports integration branch. These failures can be
> resolved by either backporting PR #5428 or by adding a hammer-specific
> commit to PR #5812.
> 
> How should we proceed here?

It looks like rest-bench support was removed because cosbench can replace it. 
The string cosbench or rest.bench does not appear in ceph-qa-suite or in ceph master 
or hammer, which probably means tests using rest-bench are outside of the scope 
of the ceph project. Deprecating rest-bench from hammer by backporting 
https://github.com/ceph/ceph/pull/5428 seems sensible.
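
If we go that way, the backport itself should be mostly mechanical, something 
like the following (remote and branch names below are placeholders, and the 
actual commits of PR #5428 would need to be looked up):

$ git checkout -b wip-rest-bench-removal-hammer ceph/hammer
$ git cherry-pick -x <commits merged by https://github.com/ceph/ceph/pull/5428>
$ git push origin wip-rest-bench-removal-hammer   # then open a PR against hammer

and https://github.com/ceph/ceph/pull/5812 could then be rebased on top of it.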

Cheers

> 
> Thanks
> Abhishek
> 

-- 
Loïc Dachary, Artisan Logiciel Libre





RE: Very slow recovery/peering with latest master

2015-09-28 Thread Somnath Roy
Xiaoxi,
Thanks for giving me some pointers.
Now, with the help of strace, I am able to figure out why the blkid* calls are 
taking so long to complete in my setup.
In my case, the partitions are showing up properly even though they are connected 
to a JBOD controller.

root@emsnode10:~/wip-write-path-optimization/src/os# strace -t -o 
/root/strace_blkid.txt blkid
/dev/sda1: UUID="d2060642-1af4-424f-9957-6a8dc77ff301" TYPE="ext4"
/dev/sda5: UUID="2a987cc0-e3cd-43d4-99cd-b8d8e58617e7" TYPE="swap"
/dev/sdy2: UUID="0ebd1631-52e7-4dc2-8bff-07102b877bfc" TYPE="xfs"
/dev/sdw2: UUID="29f1203b-6f44-45e3-8f6a-8ad1d392a208" TYPE="xfs"
/dev/sdt2: UUID="94f6bb55-ac61-499c-8552-600581e13dfa" TYPE="xfs"
/dev/sdr2: UUID="b629710e-915d-4c56-b6a5-4782e6d6215d" TYPE="xfs"
/dev/sdv2: UUID="69623b7f-9036-4a35-8298-dc7f5cecdb21" TYPE="xfs"
/dev/sds2: UUID="75d941c5-a85c-4c37-b409-02de34483314" TYPE="xfs"
/dev/sdx: UUID="cc84bc66-208b-4387-8470-071ec71532f2" TYPE="xfs"
/dev/sdu2: UUID="c9817831-8362-48a9-9a6c-920e0f04d029" TYPE="xfs"

But it is taking time on the drives that are not reserved for this host. 
Basically, I am using 2 heads in front of a JBOF and I am using sg_persist to 
reserve the drives between the 2 hosts.
Here is the strace output of blkid.

http://pastebin.com/qz2Z7Phj

You can see a lot of input/output errors when accessing the drives that are not 
reserved for this host.

This looks like an inefficiency in the blkid* calls (?), since calls like 
fdisk/lsscsi are not taking time.
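
As a workaround I am thinking of probing only the reserved drives instead of 
letting blkid walk every device. A rough sketch of what I mean (device names are 
just examples from my box, and the grep pattern is a guess at the sg_persist 
output format on this version of sg3_utils):

# list only the drives this head holds a persistent reservation on
for dev in /dev/sd?; do
    sg_persist --in --read-reservation "$dev" 2>/dev/null | grep -q "Key=" && echo "$dev"
done

# then probe just those devices, so blkid never touches the unreserved slots
blkid /dev/sdy2 /dev/sdw2 /dev/sdt2

That would keep the input/output errors from the unreserved drives out of the 
probe path entirely.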

Regards
Somnath


-Original Message-
From: Chen, Xiaoxi [mailto:xiaoxi.c...@intel.com]
Sent: Monday, September 28, 2015 1:02 AM
To: Somnath Roy; Podoski, Igor
Cc: Samuel Just; Samuel Just (sam.j...@inktank.com); ceph-devel; Sage Weil; 
Handzik, Joe
Subject: RE: Very slow recovery/peering with latest master

FWIW, blkid works well with both GPT (created by parted) and MSDOS (created by 
fdisk) in my environment.

But blkid doesn't show the information for the disks in the external bay (which is 
connected through a JBOD controller) in my setup.

See below: SDB and SDH are SSDs attached to the front panel, but the rest of the 
osd disks (0-9) are from an external bay.

/dev/sdc   976285652 294887592 681398060  31% 
/var/lib/ceph/mnt/osd-device-0-data
/dev/sdd   976285652 269840116 706445536  28% 
/var/lib/ceph/mnt/osd-device-1-data
/dev/sde   976285652 257610832 718674820  27% 
/var/lib/ceph/mnt/osd-device-2-data
/dev/sdf   976285652 293460620 682825032  31% 
/var/lib/ceph/mnt/osd-device-3-data
/dev/sdg   976285652 29100 681841552  31% 
/var/lib/ceph/mnt/osd-device-4-data
/dev/sdi   976285652 288416840 687868812  30% 
/var/lib/ceph/mnt/osd-device-5-data
/dev/sdj   976285652 273090960 703194692  28% 
/var/lib/ceph/mnt/osd-device-6-data
/dev/sdk   976285652 302720828 673564824  32% 
/var/lib/ceph/mnt/osd-device-7-data
/dev/sdl   976285652 268207968 708077684  28% 
/var/lib/ceph/mnt/osd-device-8-data
/dev/sdm   976285652 293316752 682968900  31% 
/var/lib/ceph/mnt/osd-device-9-data
/dev/sdb1  292824376  10629024 282195352   4% 
/var/lib/ceph/mnt/osd-device-40-data
/dev/sdh1  292824376  11413956 281410420   4% 
/var/lib/ceph/mnt/osd-device-41-data



root@osd1:~# blkid
/dev/sdb1: UUID="907806fe-1d29-4ef7-ad11-5a933a11601e" TYPE="xfs"
/dev/sdh1: UUID="9dfe68ac-f297-4a02-8d21-50c194af4ff2" TYPE="xfs"
/dev/sda1: UUID="cdf945ce-a345-4766-b89e-cecc33689016" TYPE="ext4"
/dev/sda2: UUID="7a565029-deb9-4e68-835c-f097c2b1514e" TYPE="ext4"
/dev/sda5: UUID="e61bfc35-932d-442f-a5ca-795897f62744" TYPE="swap"



> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> ow...@vger.kernel.org] On Behalf Of Somnath Roy
> Sent: Friday, September 25, 2015 12:09 AM
> To: Podoski, Igor
> Cc: Samuel Just; Samuel Just (sam.j...@inktank.com); ceph-devel; Sage
> Weil; Handzik, Joe
> Subject: RE: Very slow recovery/peering with latest master
>
> Yeah , Igor may be..
> Meanwhile, I am able to get gdb trace of the hang..
>
> (gdb) bt
> #0  0x7f6f6bf043bd in read () at
> ../sysdeps/unix/syscall-template.S:81
> #1  0x7f6f6af3b066 in ?? () from
> /lib/x86_64-linux-gnu/libblkid.so.1
> #2  0x7f6f6af43ae2 in ?? () from
> /lib/x86_64-linux-gnu/libblkid.so.1
> #3  0x7f6f6af42788 in ?? () from
> /lib/x86_64-linux-gnu/libblkid.so.1
> #4  0x7f6f6af42a53 in ?? () from
> /lib/x86_64-linux-gnu/libblkid.so.1
> #5  0x7f6f6af3c17b in blkid_do_safeprobe () from
> /lib/x86_64-linux-
> gnu/libblkid.so.1
> #6  0x7f6f6af3e0c4 in blkid_verify () from /lib/x86_64-linux-
> gnu/libblkid.so.1
> #7  0x7f6f6af387fb in blkid_get_dev () from /lib/x86_64-linux-
> gnu/libblkid.so.1
> #8  0x7f6f6af38acb in ?? () from
> /lib/x86_64-linux-gnu/libblkid.so.1
> #9  0x7f6f6af3946d in ?? () from
> /lib/x86_64-linux-gnu/libblkid.so.1
> #10 0x7f6f6af39892 in blkid_probe_all_new () from
> /lib/x86_64-linux-
> gnu/libblkid.so.1
> #11 0x7f6f6af3dc10 in blkid_find_dev_with_tag () from /lib/x86_64-
> linux-gnu/libblkid.so.1
> 

Re: Teuthology Integration to native openstack

2015-09-28 Thread Loic Dachary
Hi,

On 28/09/2015 07:24, Bharath Krishna wrote:
> Hi Dachary,
> 
> Thanks for the reply. I am following your blog post http://dachary.org/?p=3767
> and the README in 
> https://github.com/dachary/teuthology/tree/wip-6502-openstack-v2/#openstack
> -backend

The up-to-date instructions are at 
https://github.com/dachary/teuthology/tree/openstack/#openstack-backend (the 
link you used comes from http://dachary.org/?p=3828 and I just updated it so 
no one else will be confused).
> 
> I have sourced the openrc file of my Openstack deployment and verified
> that clients are working fine. My Openstack deployment has Cinder
> integrated with CEPH backend.
> 
> I have cloned and installed teuthology using the below steps:
> 
> $ git clone -b wip-6502-openstack-v2 http://github.com/dachary/teuthology
> $ cd teuthology ; ./bootstrap install
> $ source virtualenv/bin/activate
> 
> 
> Then I tried to run a dummy suite as test and I ran into following error:
> 
> Traceback (most recent call last):
>   File "/root/teuthology/virtualenv/bin/teuthology-openstack", line 9, in
> 
> load_entry_point('teuthology==0.1.0', 'console_scripts',
> 'teuthology-openstack')()
>   File "/root/teuthology/scripts/openstack.py", line 8, in main
> teuthology.openstack.main(parse_args(argv), argv)
>   File "/root/teuthology/teuthology/openstack.py", line 375, in main
> return TeuthologyOpenStack(ctx, teuth_config, argv).main()
>   File "/root/teuthology/teuthology/openstack.py", line 181, in main
> self.verify_openstack()
>   File "/root/teuthology/teuthology/openstack.py", line 270, in
> verify_openstack
> str(providers))
> Exception: ('OS_AUTH_URL=http://:5000/v2.0', " does is not a
> known OpenStack provider (('cloud.ovh.net', 'ovh'), ('control.os1.phx2',
> 'redhat'), ('entercloudsuite.com', 'entercloudsuite'))")

This limitation was in an earlier implementation and should not be a problem 
now.
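
If it helps, this is roughly what I would do to pick up the current branch and 
double-check the credentials teuthology will see (the openrc path is just an 
example; the openstack commands are the standard python-openstackclient ones):

$ git clone -b openstack http://github.com/dachary/teuthology
$ cd teuthology ; ./bootstrap install
$ source virtualenv/bin/activate
$ source ~/openrc.sh          # your own cloud's credentials
$ echo $OS_AUTH_URL           # should point at your Keystone endpoint
$ openstack server list       # confirms the credentials and endpoint work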

Cheers

> 
> 
> Thank you.
> 
> Regards,
> M Bharath Krishna
> 
> On 9/28/15, 1:47 AM, "Loic Dachary"  wrote:
> 
>> [moving to ceph-devel]
>>
>> Hi,
>>
>> On 27/09/2015 21:20, Bharath Krishna wrote:
>>> Hi,
>>>
>>> We have an openstack deployment in place with CEPH as CINDER backend.
>>>
>>> We would like to perform functional testing for CEPH and found
>>> teuthology as recommended option.
>>>
>>> Have successfully installed teuthology. Now to integrate it with
>>> Openstack, I could see that the possible providers could be either OVH,
>>> REDHAT or ENTERCLOUDSITE.
>>>
>>> Is there any option where in we can source openstack deployment of our
>>> own and test CEPH using teuthology?
>>
>> The documentation mentions these providers because they have been tested.
>> But there should be no blocker to run teuthology against a regular
>> OpenStack provider. Should you run into troubles, please let me know and
>> I'll help.
>>
>> Cheers
>>
>>>
>>> If NO, please suggest on how to test CEPH in such scenarios?
>>>
>>> Please help.
>>>
>>> Thank you.
>>> Bharath Krishna
>>>
>>
>> -- 
>> Loïc Dachary, Artisan Logiciel Libre
>>
> 

-- 
Loïc Dachary, Artisan Logiciel Libre





[CEPH-DEVEL] [Workaround] Keystone API v3

2015-09-28 Thread Shinobu Kinjo
Since the OpenStack Keystone team will move to the v3 API and try to 
decommission v2 completely, we probably need to modify the code in /src/rgw/.

./src/common/config_opts.h
./src/rgw/rgw_json_enc.cc
./src/rgw/rgw_swift.cc
./src/rgw/rgw_swift_auth.cc
./src/rgw/rgw_rest_swift.cc
./src/rgw/rgw_keystone.h

I think there will be no backward compatibility for v2 anymore, for security 
reasons.
What do you think?
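
For reference, the shape of the password-auth request changes quite a bit 
between the two versions. A rough curl sketch (endpoint, user, project and 
password are placeholders; in v3 the token comes back in the X-Subject-Token 
response header rather than in the JSON body as in v2.0):

# v2.0 style
curl -s -X POST http://keystone:5000/v2.0/tokens \
  -H 'Content-Type: application/json' \
  -d '{"auth": {"passwordCredentials": {"username": "rgw", "password": "secret"}, "tenantName": "service"}}'

# v3 style
curl -si -X POST http://keystone:5000/v3/auth/tokens \
  -H 'Content-Type: application/json' \
  -d '{"auth": {"identity": {"methods": ["password"], "password": {"user": {"name": "rgw", "domain": {"id": "default"}, "password": "secret"}}}, "scope": {"project": {"name": "service", "domain": {"id": "default"}}}}}'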

I'm pretty sure I've missed something anyhow -;

Shinobu


Follow-Up on Alexandre's Transparent Huge Pages Testing

2015-09-28 Thread Mark Nelson

Hi Everyone,

A while back Alexandre Derumier posted some test results looking at how 
transparent huge pages can reduce memory usage with jemalloc.  I went 
back and ran a number of new tests on the community performance cluster 
to verify his findings and also look at how performance and cpu usage 
were affected, both during various fio benchmark tests and also during a 
4k random write recovery scenario.  I tested tcmalloc 2.4 with 32MB 
thread cache, 128MB thread cache, and jemalloc 4.0.
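
(For anyone who wants to reproduce the setup, these are the knobs I mean, 
roughly; the sysfs path is the usual kernel one and the tcmalloc variable is 
the standard gperftools environment variable, values below are just examples.)

# transparent huge pages
cat /sys/kernel/mm/transparent_hugepage/enabled             # e.g. [always] madvise never
echo always > /sys/kernel/mm/transparent_hugepage/enabled   # as root, to force THP on

# tcmalloc thread cache, exported in the OSD's environment
export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=134217728      # 128MB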


The gist of it is that I also see a reduction in memory usage, most 
pronounced with jemalloc.  Unfortunately the best reduction in memory 
usage comes when memory usage is already fairly low.  The most important 
case is the memory spike when OSDs are marked back up/in during a 
recovery test.  In this case there is still a benefit, though memory 
usage is still a little higher than TCMalloc with 128MB thread cache. 
There's a little bit of a concerning trend where memory usage appears to 
increase fairly quickly after the recovery test is complete and the 
post-recovery phase of the benchmark is running.  That will likely need 
to be investigated in more depth.


I have been doing some other tests with the async messenger and 
newstore, but those will have to wait for another paper.


Here are the results:

https://drive.google.com/file/d/0B2gTBZrkrnpZY3U3TUU3RkJVeVk/view

Mark


RE: loadable objectstore

2015-09-28 Thread Varada Kari
No James, I am facing library issues with libnss3 and libcommon (ceph). I will 
resolve them and generate a new pull request on master soon.
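
The checks I am using to chase the unresolved symbols look roughly like this 
(library paths are the ones from the ldd output further down; nm and ldd are 
the stock binutils/glibc tools):

$ nm -D /usr/lib/x86_64-linux-gnu/liblttng-ust-tracepoint.so.0 | grep tracepoint_dlopen
$ ldd -r ./.libs/libcommon.so.1 2>&1 | grep 'undefined symbol'
$ ldd -r ./.libs/libceph_filestore.so 2>&1 | grep 'undefined symbol'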

Thanks,
Varada

> -Original Message-
> From: James (Fei) Liu-SSI [mailto:james@ssi.samsung.com]
> Sent: Monday, September 28, 2015 11:51 PM
> To: Varada Kari ; Sage Weil
> ; Matt W. Benjamin ; Loic
> Dachary 
> Cc: ceph-devel 
> Subject: RE: loadable objectstore
>
> Hi Varada,
>   Have you rebased the pull request to master already?
>
>   Thanks,
>   James
>
> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> ow...@vger.kernel.org] On Behalf Of Varada Kari
> Sent: Friday, September 11, 2015 3:28 AM
> To: Sage Weil; Matt W. Benjamin; Loic Dachary
> Cc: ceph-devel
> Subject: RE: loadable objectstore
>
> Hi Sage/ Matt,
>
> I have submitted the pull request based on the wip-plugin branch for the object
> store factory implementation at https://github.com/ceph/ceph/pull/5884 .
> I haven't rebased to master yet; I am working on the rebase and on including the
> new store in the factory implementation.  Please have a look and let me know
> your comments. I will submit a rebased PR with the new store integration soon.
>
> Thanks,
> Varada
>
> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> ow...@vger.kernel.org] On Behalf Of Varada Kari
> Sent: Friday, July 03, 2015 7:31 PM
> To: Sage Weil ; Adam Crume
> 
> Cc: Loic Dachary ; ceph-devel  de...@vger.kernel.org>; Matt W. Benjamin 
> Subject: RE: loadable objectstore
>
> Hi All,
>
> Not able to make much progress after making common a shared object
> along with the object store.
> Compilation of the test binaries is failing with "./.libs/libceph_filestore.so:
> undefined reference to `tracepoint_dlopen'".
> undefined reference to `tracepoint_dlopen'".
>
>   CXXLDceph_streamtest
> ./.libs/libceph_filestore.so: undefined reference to `tracepoint_dlopen'
> collect2: error: ld returned 1 exit status
> make[3]: *** [ceph_streamtest] Error 1
>
> But libfilestore.so is linked with lttng-ust.
>
> src/.libs$ ldd libceph_filestore.so
> libceph_keyvaluestore.so.1 => /home/varada/obs-factory/plugin-
> work/src/.libs/libceph_keyvaluestore.so.1 (0x7f5e50f5)
> libceph_os.so.1 => /home/varada/obs-factory/plugin-
> work/src/.libs/libceph_os.so.1 (0x7f5e4f93a000)
> libcommon.so.1 => /home/varada/obs-factory/plugin-
> work/src/.libs/libcommon.so.1 (0x7f5e4b5df000)
> liblttng-ust.so.0 => /usr/lib/x86_64-linux-gnu/liblttng-ust.so.0
> (0x7f5e4b179000)
> liblttng-ust-tracepoint.so.0 => 
> /usr/lib/x86_64-linux-gnu/liblttng-ust-
> tracepoint.so.0 (0x7f5e4a021000)
> liburcu-bp.so.1 => /usr/lib/liburcu-bp.so.1 (0x7f5e49e1a000)
> liburcu-cds.so.1 => /usr/lib/liburcu-cds.so.1 (0x7f5e49c12000)
>
> Edited the above output to show just the dependencies.
> Did anyone face this issue before?
> Any help would be much appreciated.
>
> Thanks,
> Varada
>
> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> ow...@vger.kernel.org] On Behalf Of Varada Kari
> Sent: Friday, June 26, 2015 3:34 PM
> To: Sage Weil
> Cc: Loic Dachary; ceph-devel; Matt W. Benjamin
> Subject: RE: loadable objectstore
>
> Hi,
>
> Made some more changes to resolve lttng problems at
> https://github.com/varadakari/ceph/commits/wip-plugin.
> But I couldn't bypass the issues. I am facing errors like the one mentioned below.
>
> ./.libs/libceph_filestore.so: undefined reference to `tracepoint_dlopen'
>
> Compiling with -llttng-ust is not resolving the problem. I have seen some threads on
> the devel list before mentioning this problem.
> Can anyone take a look and guide me on how to fix it?
>
> I haven't made the changes to rename the plugin etc. yet; I will be making
> them as part of the cleanup.
>
> Thanks,
> Varada
>
> -Original Message-
> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
> ow...@vger.kernel.org] On Behalf Of Varada Kari
> Sent: Monday, June 22, 2015 8:57 PM
> To: Matt W. Benjamin
> Cc: Loic Dachary; ceph-devel; Sage Weil
> Subject: RE: loadable objectstore
>
> Hi Matt,
>
> The majority of the changes segregate the files into the corresponding shared
> objects and create a factory object. The naming is mostly taken from the
> erasure-coding plugins. I want a good naming convention :-), hence a
> preliminary review. I do agree we have a lot of loadable interfaces, and I think
> we are on the way to making them on-demand (if possible) loadable
> modules.
>
> Varada
>
> -Original Message-
> From: Matt W. Benjamin [mailto:m...@cohortfs.com]
> Sent: Monday, June 22, 2015 8:37 PM
> To: Varada Kari
> Cc: Loic Dachary; ceph-devel; Sage Weil
> Subject: Re: loadable objectstore
>
> Hi,
>
> It's just aesthetic, but it feels clunky to change the names of well known
> 

Re: [Hammer Backports] Should rest-bench be removed on hammer ?

2015-09-28 Thread Sage Weil
On Mon, 28 Sep 2015, Loic Dachary wrote:
> Hi,
> 
> On 28/09/2015 12:19, Abhishek Varshney wrote:
> > Hi,
> > 
> > The rest-bench tool has been removed in master through PR #5428
> > (https://github.com/ceph/ceph/pull/5428). The backport PR #5812
> > (https://github.com/ceph/ceph/pull/5812) is currently causing failures
> > on the hammer-backports integration branch. These failures can be
> > resolved by either backporting PR #5428 or by adding a hammer-specific
> > commit to PR #5812.
> > 
> > How should we proceed here?
> 
> It looks like rest-bench support was removed because cosbench can replace it. 
> The string cosbench or rest.bench does not appear in ceph-qa-suite or in ceph 
> master or hammer, which probably means tests using rest-bench are outside of 
> the scope of the ceph project. Deprecating rest-bench from hammer by 
> backporting https://github.com/ceph/ceph/pull/5428 seems sensible.

I don't think we should be removing tools in a stable series unless we 
have a really good reason to do so.  In this case we're dropping 
rest-bench because we don't want to maintain it, not because it is fatally 
broken.  Hammer users who are using it shouldn't find that it is removed in 
a later point release.

s



Re: libcephfs invalidate upcalls

2015-09-28 Thread John Spray
On Sat, Sep 26, 2015 at 8:03 PM, Matt Benjamin  wrote:
> Hi John,
>
> I prototyped an invalidate upcall for libcephfs and the Ganesha Ceph fsal, 
> building on the Client invalidation callback registrations.
>
> As you suggested, NFS (or AFS, or DCE) minimally expect a more generic 
> "cached vnode may have changed" trigger than the current inode and dentry 
> invalidates, so I extended the model slightly to hook cap revocation, 
> feedback appreciated.

In cap_release, we probably need to be a bit more discriminating about
when to drop, e.g. if we've only lost our exclusive write caps, the
rest of our metadata might all still be fine to cache.  Is ganesha in
general doing any data caching?  I think I had implicitly assumed that
we were only worrying about metadata here but now I realise I never
checked that.

The awkward part is Client::trim_caps.  In the Client::trim_caps case,
the lru_is_expirable part won't be true until something has already
been invalidated, so there needs to be an explicit hook there --
rather than invalidating in response to cap release, we need to
invalidate in order to get ganesha to drop its handle, which will
render something expirable, and finally when we expire it, the cap
gets released.

In that case maybe we need a hook in ganesha to say "invalidate
everything you can" so that we don't have to make a very large number
of function calls to invalidate things.  In the fuse/kernel case we
can only sometimes invalidate a piece of metadata (e.g. we can't if
it's flocked or whatever), so we ask it to invalidate everything.  But
perhaps in the NFS case we can always expect our invalidate calls to
be respected, so we could just invalidate a smaller number of things
(the difference between actual cache size and desired)?

John

>
> g...@github.com:linuxbox2/ceph.git , branch invalidate
> g...@github.com:linuxbox2/nfs-ganesha.git , branch ceph-invalidates
>
> thanks,
>
> Matt
>
> --
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
>
> http://www.redhat.com/en/technologies/storage
>
> tel.  734-761-4689
> fax.  734-769-8938
> cel.  734-216-5309
>