osd crash after reboot

2012-12-14 Thread Stefan Priebe
Hello list, after a reboot of my node I see this on all OSDs of this node: 2012-12-14 09:03:20.393224 7f8e652f8780 -1 osd/OSD.cc: In function 'OSDMapRef OSDService::get_map(epoch_t)' thread 7f8e652f8780 time 2012-12-14 09:03:20.392528 osd/OSD.cc: 4385: FAILED

Re: osd crash after reboot

2012-12-14 Thread Stefan Priebe
same log more verbose: 11 ec=10 les/c 3307/3307 3306/3306/3306) [] r=0 lpr=0 lcod 0'0 mlcod 0'0 inactive] read_log done -11 2012-12-14 09:17:50.648572 7fb6e0d6b780 10 osd.3 pg_epoch: 3996 pg[3.44b( v 3988'3969 (1379'2968,3988'3969] local-les=3307 n=11 ec=10 les/c 3307/3307 3306/3306/3306)

Re: Debian packaging question

2012-12-14 Thread James Page
On 14/12/12 04:38, Gary Lowell wrote: I think that the --debbuildopts '-j8 -b' might be trouncing the --binary-arch flag - I'll get pbuilder set up and give it a test - I normally use sbuild (for which the packaging changes did have the

Re: [PATCH] implement librados aio_stat

2012-12-14 Thread Giannakos Filippos
Hi team, I forgot to include a description (also cc-ing correctly the synnefo-devel list). I am a member of the Synnefo team, where we are experimenting with RADOS as a storage backend to host blocks for our volume block storage named archipelago. In this patch I implement aio stat and

RE: Usage of CEPH FS versa HDFS for Hadoop: TeraSort benchmark performance comparison issue

2012-12-14 Thread Lachfeld, Jutta
Hi Noah, Gregory and Sage, first of all, thanks for your quick replies. Here are some answers to your questions. Gregory, I have got the output of ceph -s before and after this specific TeraSort run, and to me it looks ok; all 30 osds are up: health HEALTH_OK monmap e1: 1 mons at

Re: osd crash after reboot

2012-12-14 Thread Dennis Jacobfeuerborn
On 12/14/2012 10:14 AM, Stefan Priebe wrote: One more IMPORTANT note. This might happen due to the fact that a disk was missing (disk failure) after the reboot. fstab and mountpoint are working with UUIDs so they match but the journal block device: osd journal = /dev/sde1 didn't match

Re: Usage of CEPH FS versa HDFS for Hadoop: TeraSort benchmark performance comparison issue

2012-12-14 Thread Mark Nelson
On 12/13/2012 08:54 AM, Lachfeld, Jutta wrote: Hi all, Hi! Sorry to send this a bit late, it looks like the reply I authored yesterday from my phone got eaten by vger. I am currently doing some comparisons between CEPH FS and HDFS as a file system for Hadoop using Hadoop's integrated

Re: osd crash after reboot

2012-12-14 Thread Mark Nelson
On 12/14/2012 08:52 AM, Dennis Jacobfeuerborn wrote: On 12/14/2012 10:14 AM, Stefan Priebe wrote: One more IMPORTANT note. This might happen due to the fact that a disk was missing (disk failure) after the reboot. fstab and mountpoint are working with UUIDs so they match but the journal block

Re: osd crash after reboot

2012-12-14 Thread Stefan Priebe - Profihost AG
Hello Dennis, On 14.12.2012 15:52, Dennis Jacobfeuerborn wrote: didn't match anymore - as the numbers got renumbered due to the failed disk. Is there a way to use some kind of UUIDs here too for journal? You should be able to use /dev/disk/by-uuid/* instead. That should give you a stable view
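
The suggestion above, pointing the journal at a persistent udev symlink rather than a kernel name such as /dev/sde1, can be illustrated with a small Python sketch. It only lists the symlinks in a directory and the device nodes they currently resolve to; the function name `resolve_stable_links` is a hypothetical helper for illustration and is not part of Ceph.

```python
import os


def resolve_stable_links(link_dir="/dev/disk/by-uuid"):
    """Map each symlink name in link_dir to the path it currently resolves to.

    Hypothetical helper for illustration. link_dir defaults to the udev
    directory mentioned in the thread, but any directory of symlinks works.
    """
    mapping = {}
    if not os.path.isdir(link_dir):
        return mapping  # directory absent (e.g. non-Linux host): nothing to report
    for name in sorted(os.listdir(link_dir)):
        path = os.path.join(link_dir, name)
        if os.path.islink(path):
            mapping[name] = os.path.realpath(path)
    return mapping


if __name__ == "__main__":
    for name, target in resolve_stable_links().items():
        print(f"{name} -> {target}")
```

The symlink name stays the same even when the kernel assigns a different sdX letter after a disk failure, which is the property the journal setting needs. Note that by-uuid lists filesystem UUIDs, so a raw journal partition without a filesystem may only show up under other udev directories such as /dev/disk/by-partuuid or /dev/disk/by-partlabel.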

Re: osd crash after reboot

2012-12-14 Thread Mark Nelson
Hi Stefan, Here's what I often do when I have a journal and data partition sharing a disk: sudo parted -s -a optimal /dev/$DEV mklabel gpt sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-journal 0% 10G sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-data 10G 100% Mark On

Re: osd crash after reboot

2012-12-14 Thread Stefan Priebe - Profihost AG
Hi Mark, On 14.12.2012 16:20, Mark Nelson wrote: sudo parted -s -a optimal /dev/$DEV mklabel gpt sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-journal 0% 10G sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-data 10G 100% My disks are gpt too and I'm also using parted. But

OSDMonitor: don't allow creation of pools with 65535 pgs

2012-12-14 Thread Jim Schutt
Hi, I'm looking at commit e3ed28eb2 in the next branch, and I have a question. Shouldn't the limit be pg_num 65536, because PGs are numbered 0 thru pg_num-1? If not, what am I missing? FWIW, up through yesterday I've been using the next branch and this: ceph osd pool set data pg_num 65536
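
The arithmetic behind this question can be checked quickly. The sketch below assumes that the concern is a 16-bit PG index, which the thread subject suggests but the snippet does not confirm; the names `MAX_U16` and `pg_ids_fit_u16` are illustrative and not taken from the commit.

```python
MAX_U16 = 0xFFFF  # largest value a 16-bit unsigned field can hold (65535)


def pg_ids_fit_u16(pg_num):
    """Return True if every PG id 0..pg_num-1 fits in 16 bits (assumed width)."""
    if pg_num < 1:
        raise ValueError("pg_num must be positive")
    # PGs are numbered from 0, so the highest id is pg_num - 1.
    return pg_num - 1 <= MAX_U16


if __name__ == "__main__":
    for n in (65535, 65536, 65537):
        print(n, pg_ids_fit_u16(n))
```

Under that assumption pg_num = 65536 is still representable, because the highest id is 65535; 65537 is the first value that is not.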

Re: osd crash after reboot

2012-12-14 Thread Stefan Priebe - Profihost AG
Hello Mark, On 14.12.2012 16:20, Mark Nelson wrote: sudo parted -s -a optimal /dev/$DEV mklabel gpt sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-journal 0% 10G sudo parted -s -a optimal /dev/$DEV mkpart osd-device-$i-data 10G 100% Isn't that the part type you're using? mkpart

Re: rbd map command hangs for 15 minutes during system start up

2012-12-14 Thread Alex Elder
On 12/13/2012 01:00 PM, Nick Bartos wrote: Here's another log with the kernel debugging enabled: https://gist.github.com/raw/4278697/1c9e41d275e614783fbbdee8ca5842680f46c249/rbd-hang-1355424455.log Note that it hung on the 2nd try. Just to make sure I'm working with the right code base, can

Re: rbd map command hangs for 15 minutes during system start up

2012-12-14 Thread Nick Bartos
The kernel is 3.5.7 with the following patches applied (and in the order specified below): 001-libceph_eliminate_connection_state_DEAD_13_days_ago.patch 002-libceph_kill_bad_proto_ceph_connection_op_13_days_ago.patch 003-libceph_rename_socket_callbacks_13_days_ago.patch

Re: OSDMonitor: don't allow creation of pools with 65535 pgs

2012-12-14 Thread Joao Eduardo Luis
On 12/14/2012 03:41 PM, Jim Schutt wrote: Hi, I'm looking at commit e3ed28eb2 in the next branch, and I have a question. Shouldn't the limit be pg_num 65536, because PGs are numbered 0 thru pg_num-1? If not, what am I missing? FWIW, up through yesterday I've been using the next branch and

Re: [EXTERNAL] Re: OSDMonitor: don't allow creation of pools with 65535 pgs

2012-12-14 Thread Jim Schutt
On 12/14/2012 09:59 AM, Joao Eduardo Luis wrote: On 12/14/2012 03:41 PM, Jim Schutt wrote: Hi, I'm looking at commit e3ed28eb2 in the next branch, and I have a question. Shouldn't the limit be pg_num 65536, because PGs are numbered 0 thru pg_num-1? If not, what am I missing? FWIW, up

Re: osd crash after reboot

2012-12-14 Thread Sage Weil
On Fri, 14 Dec 2012, Stefan Priebe wrote: One more IMPORTANT note. This might happen due to the fact that a disk was missing (disk failure) after the reboot. fstab and mountpoint are working with UUIDs so they match but the journal block device: osd journal = /dev/sde1 didn't match

Re: osd crash after reboot

2012-12-14 Thread Stefan Priebe
Hi Sage, this was just an idea and I need to fix MY uuid problem. But then the crash is still a problem of ceph. Have you looked into my log? On 14.12.2012 20:42, Sage Weil wrote: On Fri, 14 Dec 2012, Stefan Priebe wrote: One more IMPORTANT note. This might happen due to the fact that a

ceph-client/testing branch force-updated again

2012-12-14 Thread Alex Elder
I have updated the testing branch in the ceph-client git repository again, and you'll find that a forced update is needed to bring your own repository up to date. This will probably be necessary again at some point once we get some reviews done on commits still in this branch, but we'll try not

Re: [PATCH 1/9] rbd: do not allow remove of mounted-on image

2012-12-14 Thread Sage Weil
Reviewed-by: Sage Weil s...@inktank.com On Thu, 13 Dec 2012, Alex Elder wrote: There is no check in rbd_remove() to see if anybody holds open the image being removed. That's not cool. Add a simple open count that goes up and down with opens and closes (releases) of the device, and don't
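
The open-count idea described in the patch summary above can be modelled in a few lines of Python. This is only a toy model under stated assumptions, not the rbd driver code: the class `Device` and its methods are hypothetical, and the exception raised on a busy device merely stands in for whatever error the real driver returns.

```python
class Device:
    """Toy model of a block device with an open count (not the rbd driver)."""

    def __init__(self, name):
        self.name = name
        self.open_count = 0
        self.removed = False

    def open(self):
        """Count one more holder of the device."""
        if self.removed:
            raise FileNotFoundError(self.name)
        self.open_count += 1

    def release(self):
        """Drop one holder; every release must match an earlier open."""
        if self.open_count == 0:
            raise RuntimeError("release without matching open")
        self.open_count -= 1

    def remove(self):
        """Refuse removal while anyone still holds the device open."""
        if self.open_count > 0:
            raise OSError("device busy")  # stand-in for a busy error
        self.removed = True
```

A removal attempted between `open()` and `release()` fails, and succeeds once the count is back to zero.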

Re: [PATCH 3/9] libceph: avoid using freed osd in __kick_osd_requests()

2012-12-14 Thread Sage Weil
Reviewed-by: Sage Weil s...@inktank.com On Thu, 13 Dec 2012, Alex Elder wrote: If an osd has no requests and no linger requests, __reset_osd() will just remove it with a call to __remove_osd(). That drops a reference to the osd, and therefore the osd may have been freed by the time

Re: [PATCH 4/9] rbd: get rid of RBD_MAX_SEG_NAME_LEN

2012-12-14 Thread Sage Weil
Reviewed-by: Sage Weil s...@inktank.com On Thu, 13 Dec 2012, Alex Elder wrote: RBD_MAX_SEG_NAME_LEN represents the maximum length of an rbd object name (i.e., one of the objects providing storage backing an rbd image). Another symbol, MAX_OBJ_NAME_SIZE, is used in the osd client code to

Re: [PATCH 5/9] libceph: init osd-o_node in create_osd()

2012-12-14 Thread Sage Weil
We should drop this one, I think. See upstream commit 4c199a93a2d36b277a9fd209a0f2793f8460a215. When we added the similar call on the request tree it caused some noise in linux-next and then got removed. sage On Thu, 13 Dec 2012, Alex Elder wrote: It turns out to be harmless but the

[PATCH] libceph: report connection fault with warning

2012-12-14 Thread Alex Elder
When a connection's socket disconnects, or if there's a protocol error of some kind on the connection, a fault is signaled and the connection is reset (closed and reopened, basically). We currently get an error message on the log whenever this occurs. A ceph connection will attempt to

Re: [PATCH] libceph: report connection fault with warning

2012-12-14 Thread Sage Weil
Reviewed-by: Sage Weil s...@inktank.com On Fri, 14 Dec 2012, Alex Elder wrote: When a connection's socket disconnects, or if there's a protocol error of some kind on the connection, a fault is signaled and the connection is reset (closed and reopened, basically). We currently get an error

Re: [PATCH 6/9] rbd: remove linger unconditionally

2012-12-14 Thread Sage Weil
Reviewed-by: Sage Weil s...@inktank.com On Thu, 13 Dec 2012, Alex Elder wrote: In __unregister_linger_request(), the request is being removed from the osd client's req_linger list only when the request has a non-null osd pointer. It should be done whether or not the request currently has an

Re: [PATCH 9/9] libceph: socket can close in any connection state

2012-12-14 Thread Sage Weil
Reviewed-by: Sage Weil s...@inktank.com On Thu, 13 Dec 2012, Alex Elder wrote: A connection's socket can close for any reason, independent of the state of the connection (and irrespective of the connection mutex). As a result, the connection can be in pretty much any state at the