Add an atomic variable 'teardowning' as a flag in struct ceph_messenger,
set the flag to 1 in ceph_destroy_client(), and add a check in
ceph_data_ready() that tests the flag and returns immediately if it is set.
Signed-off-by: Guanjun He g...@suse.com
---
On 6/21/2012 at 12:04 AM, in message
pine.lnx.4.64.1206200902110.14...@cobra.newdream.net, Sage Weil
s...@inktank.com wrote:
On Wed, 20 Jun 2012, Guan Jun He wrote:
On 6/19/2012 at 11:33 PM, in message
pine.lnx.4.64.1206190828510.21...@cobra.newdream.net, Sage Weil
s...@inktank.com
Hi,
The 'sudo /etc/init.d/ceph -a stop' script command didn't stop ceph.
I tried with version ceph-0.47.3 as well, but it did not stop either.
'sudo /etc/init.d/ceph -a start' is working fine.
And 'sudo /etc/init.d/ceph -a killall' is working fine too, but when
I stop the
Hello list,
I'm able to reproducibly crash OSD daemons.
How I can reproduce it:
Kernel: 3.5.0-rc3
Ceph: 0.47.3
FS: btrfs
Journal: 2GB tmpfs per OSD
OSD: 3x servers with 4x Intel SSD OSDs each
10GBE Network
rbd_cache_max_age: 2.0
rbd_cache_size: 33554432
Disk is set to writeback.
Start a KVM VM
When I now start the OSD again it seems to hang forever. Load goes
up to 200 and I/O waits rise from 0% to 20%.
On 21.06.2012 14:55, Stefan Priebe - Profihost AG wrote:
Hello list,
I'm able to reproducibly crash OSD daemons.
How I can reproduce it:
Kernel: 3.5.0-rc3
Ceph: 0.47.3
FS:
Another strange thing. Why does THIS OSD have 24GB and the others just
650MB?
/dev/sdb1 224G 654M 214G 1% /srv/osd.20
/dev/sdc1 224G 638M 214G 1% /srv/osd.21
/dev/sdd1 224G 24G 190G 12% /srv/osd.22
/dev/sde1 224G 607M 214G 1%
Hmm, is this normal? (ceph health is now OK again)
/dev/sdb1 224G 655M 214G 1% /srv/osd.20
/dev/sdc1 224G 640M 214G 1% /srv/osd.21
/dev/sdd1 224G 34G 181G 16% /srv/osd.22
/dev/sde1 224G 608M 214G 1% /srv/osd.23
Why does one OSD have
Currently the socket state change event handler records an error
message on a connection to distinguish a close while connecting from
a close while a connection was already established.
Changing connection information during handling of a socket event is
not very clean, so instead move this
A connection state's NEGOTIATING bit gets set while in CONNECTING
state after we have successfully exchanged a ceph banner and IP
addresses with the connection's peer (the server). But that bit
is not cleared again--at least not until another connection attempt
is initiated.
Instead, clear it as
There is no state explicitly defined when a ceph connection is fully
operational. So define one.
It's set when the connection sequence completes successfully, and is
cleared when the connection gets closed.
Be a little more careful when examining the old state when a socket
disconnect event is
Currently a ceph connection enters a CONNECTING state when it
begins the process of (re-)connecting with its peer. Once the two
ends have successfully exchanged their banner and addresses, an
additional NEGOTIATING bit is set in the ceph connection's state to
indicate the connection information
This patch gathers a few small changes in net/ceph/messenger.c:
out_msg_pos_next()
- small logic change that mostly affects indentation
write_partial_msg_pages().
- use a local variable trail_off to represent the offset into
a message of the trail portion of the data (if present)
The functions ceph_con_get() and ceph_con_put() are both only ever
used in net/ceph/messenger.c, so change them to have static scope.
Move their definition up in the source file so they're both defined
before their first use.
Signed-off-by: Alex Elder el...@inktank.com
---
The following commit changed things so that the SOCK_CLOSED bit is stored in
a connection's new flags field rather than its state field.
libceph: start separating connection flags from state
commit 928443cd
That bit is used in con_close_socket() to protect against setting an
error message more than
In con_close_socket(), a connection's SOCK_CLOSED flag gets set and
then cleared while its shutdown method is called and its reference
gets dropped.
Previously, that flag got set only if it had not already been set,
so setting it in con_close_socket() might have prevented additional
processing
Sage liked the state diagram I put in my commit description so
I'm putting it in with the code.
Signed-off-by: Alex Elder el...@inktank.com
---
net/ceph/messenger.c | 42 +-
1 file changed, 41 insertions(+), 1 deletion(-)
Index: b/net/ceph/messenger.c
There are two phases in the process of linking together the two ends
of a ceph connection. The first involves exchanging a banner and
IP addresses, and if that is successful a second phase exchanges
some detail about each side's connection capabilities.
When initiating a connection, the client
Hi Alexandre,
[Sorry I didn't follow up earlier; I didn't understand your question.]
If you turn off the journal completely, you will see bursty write commits
from the perspective of the client, because the OSD is periodically doing
a sync or snapshot and only acking the writes then.
If you
Do you see any error messages? Are daemons still running on all machines, or
just remote ones?
On Jun 21, 2012, at 4:35 AM, ramu ramu.freesyst...@gmail.com wrote:
Hi,
The 'sudo /etc/init.d/ceph -a stop' script command didn't stop ceph.
I tried with version ceph-0.47.3 as well, but
You can also try -v to get more output.
On 06/21/2012 09:59 AM, Dan Mick wrote:
Do you see any error messages? Are daemons still running on all machines, or
just remote ones?
On Jun 21, 2012, at 4:35 AM, ramu ramu.freesyst...@gmail.com wrote:
Hi,
The sudo /etc/init.d/ceph -a stop script
I get the feeling that all of these lists of clear/set_bit calls would go
away if we
- verify we are under the mutex at each of these sites
- replace con->state with an enum
Is there a reason you stopped short of doing that (besides time)?
sage
On Thu, 21 Jun 2012, Alex Elder wrote:
On 06/21/2012 01:44 PM, Sage Weil wrote:
I get the feeling that all of these lists of clear/set_bit calls would go
away if we
- verify we are under the mutex at each of these sites
- replace con->state with an enum
Is there a reason you stopped short of doing that (besides time)?
Time.
These should actually go away entirely; pushed a couple patches that do
that, and remove the now-unused con->nref member.
sage
On Thu, 21 Jun 2012, Alex Elder wrote:
The functions ceph_con_get() and ceph_con_put() are both only ever
used in net/ceph/messenger.c, so change them to have static
OK, I discovered this time that all OSDs had the same disk usage before
the crash. After starting the OSD again I got this one:
/dev/sdb1 224G 23G 191G 11% /srv/osd.30
/dev/sdc1 224G 1,5G 213G 1% /srv/osd.31
/dev/sdd1 224G 1,5G 213G 1% /srv/osd.32
On 06/15/2012 03:48 PM, Josh Durgin wrote:
Here's a draft of a patch to the docs outlining the rbd layering
design. Is anything unclear? Any suggestions for improvement?
Josh
I'm going to try to take into account the comments others have made
but I may end up duplicating--and if so, I
On Fri, Jun 22, 2012 at 7:43 AM, James Page james.p...@ubuntu.com wrote:
You can type faster than I can... I'm working on getting this
resolved in the current dev release of Ubuntu in the next few
days after which it will go through the normal SRU process for
Ubuntu 12.04.
Sweet, thanks!
Dan Mick dan.mick at inktank.com writes:
On 06/18/2012 11:01 AM, Sage Weil wrote:
On Mon, 18 Jun 2012, Josh Durgin wrote:
$ rbd copyup pool2/child1
disown and adopt? :) (actually I started as a joke, but really I
kinda like that; fits with the parent-child name)
The issue I see