Re: [ceph-users] Help needed porting Ceph to RSockets

2014-02-05 Thread Gandalf Corvotempesta
2013-10-31 Hefty, Sean sean.he...@intel.com: Can you please try the attached patch in place of all previous patches? Any updates on ceph with rsockets?

RE: [ceph-users] Help needed porting Ceph to RSockets

2013-10-30 Thread Hefty, Sean
Sorry it took so long to get to this. after some more analysis and debugging, I found workarounds for my problems; I have added these workarounds to the last version of the patch for the poll problem by Sean; see the attachment to this posting. The shutdown() operations below are all

Re: [ceph-users] Help needed porting Ceph to RSockets

2013-09-16 Thread Gandalf Corvotempesta
2013/9/12 Andreas Bluemle andreas.blue...@itxperts.de: I have not yet done any performance testing. The next step I have to take is more related to setting up a larger cluster with something like 150 OSDs without hitting any resource limitations. How do you manage failover? Will you use

Re: [ceph-users] Help needed porting Ceph to RSockets

2013-09-12 Thread Gandalf Corvotempesta
2013/9/10 Andreas Bluemle andreas.blue...@itxperts.de: Since I have added these workarounds to my version of the librdmacm library, I can at least start up ceph using LD_PRELOAD and end up in a healthy ceph cluster state. Have you seen any performance improvement by using LD_PRELOAD with ceph?

Re: [ceph-users] Help needed porting Ceph to RSockets

2013-09-12 Thread Andreas Bluemle
On Thu, 12 Sep 2013 12:20:03 +0200 Gandalf Corvotempesta gandalf.corvotempe...@gmail.com wrote: 2013/9/10 Andreas Bluemle andreas.blue...@itxperts.de: Since I have added these workarounds to my version of the librdmacm library, I can at least start up ceph using LD_PRELOAD and end up in a

Re: [ceph-users] Help needed porting Ceph to RSockets

2013-09-10 Thread Andreas Bluemle
Hi, after some more analysis and debugging, I found workarounds for my problems; I have added these workarounds to the last version of the patch for the poll problem by Sean; see the attachment to this posting. The shutdown() operations below are all SHUT_RDWR. 1. shutdown() on side A of a

RE: [ceph-users] Help needed porting Ceph to RSockets

2013-08-22 Thread Hefty, Sean
I tested out the patch and unfortunately had the same results as Andreas. About 50% of the time the rpoll() thread in Ceph still hangs when rshutdown() is called. I saw a similar behaviour when increasing the poll time on the pre-patched version if that's of any relevance. I'm not optimistic,

Re: [ceph-users] Help needed porting Ceph to RSockets

2013-08-21 Thread Matthew Anderson
Hi Sean, I tested out the patch and unfortunately had the same results as Andreas. About 50% of the time the rpoll() thread in Ceph still hangs when rshutdown() is called. I saw a similar behaviour when increasing the poll time on the pre-patched version if that's of any relevance. Thanks On

Re: [ceph-users] Help needed porting Ceph to RSockets

2013-08-20 Thread Andreas Bluemle
Hi Sean, I will re-check by the end of the week; there is some test scheduling issue with our test system, which affects my access times. Thanks Andreas On Mon, 19 Aug 2013 17:10:11 + Hefty, Sean sean.he...@intel.com wrote: Can you see if the patch below fixes the hang?

Re: [ceph-users] Help needed porting Ceph to RSockets

2013-08-20 Thread Andreas Bluemle
Hi, I have added the patch and re-tested: I still encounter hangs of my application. I am not quite sure whether I hit the same error on the shutdown, because now I don't hit the error every time, but only every now and then. When adding the patch to my code base (git tag v1.0.17) I notice an

RE: [ceph-users] Help needed porting Ceph to RSockets

2013-08-20 Thread Hefty, Sean
I have added the patch and re-tested: I still encounter hangs of my application. I am not quite sure whether I hit the same error on the shutdown, because now I don't hit the error every time, but only every now and then. I guess this is at least some progress... :/ When adding the patch to

RE: [ceph-users] Help needed porting Ceph to RSockets

2013-08-19 Thread Hefty, Sean
Can you see if the patch below fixes the hang? Signed-off-by: Sean Hefty sean.he...@intel.com --- src/rsocket.c | 11 ++- 1 files changed, 10 insertions(+), 1 deletions(-) diff --git a/src/rsocket.c b/src/rsocket.c index d544dd0..e45b26d 100644 --- a/src/rsocket.c +++ b/src/rsocket.c

RE: [ceph-users] Help needed porting Ceph to RSockets

2013-08-16 Thread Hefty, Sean
I am looking at a multithreaded application here, and I believe that the race is between thread A calling the rpoll() for POLLIN event and thread B calling the shutdown(SHUT_RDWR) for reading and writing of the (r)socket almost immediately afterwards. I modified a test program, and I can
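
For readers following along, the pattern described here (thread A blocked in a poll for POLLIN, thread B shutting the same socket down with SHUT_RDWR shortly afterwards) can be reproduced with ordinary kernel sockets. The sketch below is only an illustration under stated assumptions: it uses Python's standard `socket.socketpair()` and `select.poll()`, not rsockets, and the function name `poll_during_shutdown` and its parameters are hypothetical. With kernel sockets the poller is expected to wake up once the shutdown happens; the hang discussed in this thread is specific to rpoll()/rshutdown().

```python
import select
import socket
import threading
import time


def poll_during_shutdown(delay_s=0.05, timeout_ms=2000):
    """Thread A polls a socket for POLLIN while thread B shuts it down.

    Returns the event list seen by the poller. An empty list would mean
    the poll timed out without noticing the shutdown.
    """
    a, b = socket.socketpair()  # plain kernel sockets, not rsockets
    seen = {}

    def poller():
        # Thread A: block until the socket reports an event or we time out.
        p = select.poll()
        p.register(a.fileno(), select.POLLIN)
        seen["events"] = p.poll(timeout_ms)

    t = threading.Thread(target=poller)
    t.start()
    time.sleep(delay_s)           # give thread A time to enter poll()
    a.shutdown(socket.SHUT_RDWR)  # thread B: shut down reading and writing
    t.join()
    a.close()
    b.close()
    return seen["events"]


if __name__ == "__main__":
    print(poll_during_shutdown())
```

Because the kernel reports the shutdown state whenever poll is called, the ordering of the two threads should not matter here; that is the behaviour an application ported to rsockets implicitly relies on.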

Re: [ceph-users] Help needed porting Ceph to RSockets

2013-08-14 Thread Andreas Bluemle
Hi, maybe some information about the environment I am working in: - CentOS 6.4 with custom kernel 3.8.13 - librdmacm / librspreload from git, tag 1.0.17 - application started with librspreload in LD_PRELOAD environment Currently, I have increased the value of the spin time by setting the

Re: [ceph-users] Help needed porting Ceph to RSockets

2013-08-14 Thread Atchley, Scott
On Aug 14, 2013, at 3:21 AM, Andreas Bluemle andreas.blue...@itxperts.de wrote: Hi, maybe some information about the environment I am working in: - CentOS 6.4 with custom kernel 3.8.13 - librdmacm / librspreload from git, tag 1.0.17 - application started with librspreload in LD_PRELOAD

RE: [ceph-users] Help needed porting Ceph to RSockets

2013-08-14 Thread Hefty, Sean
The first question I would have is: why is the rpoll() split into these two pieces? There must have been some reason to do a busy loop on some local state information rather than just call the real poll() directly. As Scott mentioned in his email, this is done for performance reasons. The
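
A rough sketch of the two-piece structure being asked about: spin for a short time on a cheap readiness check, and only fall back to a blocking poll if nothing showed up. This is not the librdmacm code. In this illustration a zero-timeout `poll()` stands in for the check of local state, and the names `two_phase_poll` and `spin_us` are hypothetical; `spin_us` plays the role that the thread attributes to `polling_time`.

```python
import select
import socket
import time


def two_phase_poll(fd, events, timeout_ms, spin_us=10):
    """Busy-poll for up to spin_us microseconds, then block in poll().

    Returns (ready_list, phase), where phase is "spin" if the event was
    caught during the busy loop and "block" otherwise.
    """
    p = select.poll()
    p.register(fd, events)

    # Phase 1: busy loop with a non-blocking check. This avoids the cost
    # of sleeping and being woken up when data arrives almost immediately.
    deadline = time.monotonic() + spin_us / 1e6
    while True:
        ready = p.poll(0)
        if ready:
            return ready, "spin"
        if time.monotonic() >= deadline:
            break

    # Phase 2: nothing arrived during the spin window, so block.
    return p.poll(timeout_ms), "block"


if __name__ == "__main__":
    a, b = socket.socketpair()
    b.send(b"x")
    print(two_phase_poll(a.fileno(), select.POLLIN, 100))
    a.close()
    b.close()
```

The trade-off is the usual one: a longer spin window burns more CPU but catches more events without a sleep and wakeup, which is presumably why changing the spin time alters the timing of the hang reported in this thread.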

Re: [ceph-users] Help needed porting Ceph to RSockets

2013-08-13 Thread Andreas Bluemle
Hi Matthew, I found a workaround for my (our) problem: in the librdmacm code, rsocket.c, there is a global constant polling_time, which is set to 10 microseconds at the moment. I raise this to 1 - and all of a sudden things work nicely. I think we are looking at two issues here: 1. the

Re: [ceph-users] Help needed porting Ceph to RSockets

2013-08-13 Thread Atchley, Scott
On Aug 13, 2013, at 10:06 AM, Andreas Bluemle andreas.blue...@itxperts.de wrote: Hi Matthew, I found a workaround for my (our) problem: in the librdmacm code, rsocket.c, there is a global constant polling_time, which is set to 10 microseconds at the moment. I raise this to 1 - and

RE: [ceph-users] Help needed porting Ceph to RSockets

2013-08-13 Thread Hefty, Sean
I found a workaround for my (our) problem: in the librdmacm code, rsocket.c, there is a global constant polling_time, which is set to 10 microseconds at the moment. I raise this to 1 - and all of a sudden things work nicely. I am adding the linux-rdma list to CC so Sean might see

Re: [ceph-users] Help needed porting Ceph to RSockets

2013-08-12 Thread Matthew Anderson
Moving this conversation to ceph-devel where the devs might be able to shed some light on this. I've added some additional debug to my code to narrow the issue down a bit and the reader thread appears to be getting locked by tcp_read_wait() because rpoll never returns an event when the socket is

Re: [ceph-users] Help needed porting Ceph to RSockets

2013-08-12 Thread Andreas Bluemle
Hi Matthew, I can confirm the behaviour which you describe. I too believe that the problem is on the client side (ceph command). My log files show the very same symptom, i.e. the client side not being able to shut down the pipes properly. (Q: I had problems yesterday sending a mail to ceph-users