Re: 0.55 init script Issue?

2012-12-10 Thread James Page

On 06/12/12 13:35, Sage Weil wrote:
 Yeah, this or something very similar is definitely the correct
 solution. Sage recently added the ceph upstart job, and
 we didn't put it through sufficient verification prior to
 release in order to notice this issue. Users who aren't
 using upstart (I expect that's all of them) should just
 delete the job after running the package install. We'll
 certainly sort this out prior to the next release; I'm not
 sure if we want to roll a v0.55.1 right away or not.
 
 Let's push it to the testing branch, but make sure any other
 fixes are there before rolling a .1.. maybe tomorrow?
 I've pushed this to the testing branch.  If someone wants to verify
 the packages built at
 
 http://gitbuilder.ceph.com/ceph-deb-precise-x86_64-basic/ref/testing/

  are fixed, that would be fabulous!

I think the radosgw init script and upstart configuration are going to
conflict in a similar way to the ceph one.

I've been working on integrating the upstart configurations into the
Ubuntu distro packaging of ceph; it is possible to install multiple
upstart configurations into a single binary package using dh_installinit:

dh_installinit --no-start
# Install the upstart configurations into each package using dh_installinit
# ceph: every ceph-*.conf job except the mds one
for conf in `ls -1 src/upstart/ceph-*.conf | grep -v mds`; do \
    name=`basename $$conf | cut -d . -f 1`; \
    cp $$conf debian/ceph.$$name.upstart; \
    dh_installinit -pceph --upstart-only --no-start --name=$$name; \
done
# ceph-mds
for conf in `ls -1 src/upstart/ceph-mds*.conf`; do \
    name=`basename $$conf | cut -d . -f 1`; \
    cp $$conf debian/ceph-mds.$$name.upstart; \
    dh_installinit -pceph-mds --upstart-only --no-start --name=$$name; \
done
# radosgw
for conf in `ls -1 src/upstart/radosgw*.conf`; do \
    name=`basename $$conf | cut -d . -f 1`; \
    cp $$conf debian/radosgw.$$name.upstart; \
    dh_installinit -pradosgw --upstart-only --no-start --name=$$name; \
done


-- 
James Page
Ubuntu Core Developer
Debian Maintainer
james.p...@ubuntu.com


Re: on disk encryption

2012-12-10 Thread James Page

On 19/09/12 02:53, Dustin Kirkland wrote:
 Looking forward, another option might be to implement
 encryption inside btrfs (placeholder fields are there in the
 disk format, introduced along with the compression code way
 back when).  This would let ceph-osd handle more of the key
 handling internally and do something like, say, only encrypt
 the current/ and snap_*/ subdirectories.
 
 Other ideas?  Thoughts?
 
 sage
 I love the idea of btrfs supporting encryption natively much like
 it does compression.  It may be some time before that happens, so
 in the meantime, I'd love to see Ceph support dm-crypt and/or
 eCryptfs beneath.

Has this discussion progressed into any sort of implementation yet?
It sounds like this is going to be a key feature for users who want
top-to-bottom encryption of data right down to the block level.

-- 
James Page
Ubuntu Core Developer
Debian Maintainer
james.p...@ubuntu.com


Re: A couple of OSD-crashes after serious network trouble

2012-12-10 Thread Oliver Francke

Hi Sam,

helpful input.. and... not so...

On 12/07/2012 10:18 PM, Samuel Just wrote:

Ah... unfortunately doing a repair in these 6 cases would probably
result in the wrong object surviving.  It should work, but it might
corrupt the rbd image contents.  If the images are expendable, you
could repair and then delete the images.

The red flag here is that the known size is smaller than the other
size.  This indicates that it most likely chose the wrong file as the
correct one since rbd image blocks usually get bigger over time.  To
fix this, you will need to manually copy the file for the larger of
the two object replicas to replace the smaller of the two object
replicas.

For the first, soid 87c96f10/rb.0.47d9b.1014b7b4.02df/head//65
in pg 65.10:
1) Find the object on the primary and the replica (from above, primary
is 12 and replica is 40).  You can use find in the primary and replica
current/65.10_head directories to look for a file matching
*rb.0.47d9b.1014b7b4.02df*).  The file name should be
'rb.0.47d9b.1014b7b4.02df__head_87C96F10__65' I think.
2) Stop the primary and replica osds
3) Compare the file sizes for the two files -- you should find that
the file sizes do not match.
4) Replace the smaller file with the larger one (you'll probably want
to keep a copy of the smaller one around just in case).
5) Restart the osds and scrub pg 65.10 -- the pg should come up clean
(possibly with a relatively harmless stat mismatch)
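
A rough shell sketch of steps 1-4, assuming the default OSD data path
/var/lib/ceph/osd/ceph-NN and the sysvinit script; the primary and replica
normally live on different hosts, so run the corresponding commands on each
node (illustrative only, not a tested procedure):

# 1) on each node, locate the object file for pg 65.10
find /var/lib/ceph/osd/ceph-12/current/65.10_head \
    -name '*rb.0.47d9b.1014b7b4.02df*' -ls
# 2) stop the osd on that node before touching anything
service ceph stop osd.12
# 3) compare the sizes reported on the two nodes, then
# 4) back up the smaller file and replace it with a copy of the larger one
cp -a <smaller-file> <smaller-file>.bak
scp <node-with-larger-copy>:<larger-file> <smaller-file>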


been there. on OSD.12 it's
-rw-r--r-- 1 root root 699904 Dec  9 06:25 
rb.0.47d9b.1014b7b4.02df__head_87C96F10__41


on OSD.40:
-rw-r--r-- 1 root root 4194304 Dec  9 06:25 
rb.0.47d9b.1014b7b4.02df__head_87C96F10__41


Going by a short glance into the files, there are some readable 
syslog entries in both. As bad luck would have it, in this example the 
shorter file contains the more recent entries?!


What exactly happens if I try to copy or export the file? Which block 
will be chosen?

The VM is running as I'm writing, so flexibility is reduced.

Regards,

Oliver.


If this worked out correctly, you can repeat for the other 5 cases.

Let me know if you have any questions.
-Sam

On Fri, Dec 7, 2012 at 11:09 AM, Oliver Francke oliver.fran...@filoo.de wrote:

Hi Sam,

Am 07.12.2012 um 19:37 schrieb Samuel Just sam.j...@inktank.com:


That is very likely to be one of the merge_log bugs fixed between 0.48
and 0.55.  I could confirm with a stacktrace from gdb with line
numbers or the remainder of the logging dumped when the daemon
crashed.

My understanding of your situation is that currently all pgs are
active+clean but you are missing some rbd image headers and some rbd
images appear to be corrupted.  Is that accurate?
-Sam


Thanks for dropping in.

Uhm, almost correct; there are now 6 pgs in state inconsistent:

HEALTH_WARN 6 pgs inconsistent
pg 65.da is active+clean+inconsistent, acting [1,33]
pg 65.d7 is active+clean+inconsistent, acting [13,42]
pg 65.10 is active+clean+inconsistent, acting [12,40]
pg 65.f is active+clean+inconsistent, acting [13,31]
pg 65.75 is active+clean+inconsistent, acting [1,33]
pg 65.6a is active+clean+inconsistent, acting [13,31]

I know which images are affected, but does a repair help?

0 log [ERR] : 65.10 osd.40: soid 
87c96f10/rb.0.47d9b.1014b7b4.02df/head//65 size 4194304 != known size 
699904
0 log [ERR] : 65.6a osd.31: soid 
19a2526a/rb.0.2dcf2.1da2a31e.0737/head//65 size 4191744 != known size 
2757632
0 log [ERR] : 65.75 osd.33: soid 
20550575/rb.0.2d520.5c17a6e3.0339/head//65 size 4194304 != known size 
1238016
0 log [ERR] : 65.d7 osd.42: soid 
fa3a5d7/rb.0.2c2a8.12ec359d.205c/head//65 size 4194304 != known size 
1382912
0 log [ERR] : 65.da osd.33: soid 
c2a344da/rb.0.2be17.cb4bd69.0081/head//65 size 4191744 != known size 
1815552
0 log [ERR] : 65.f osd.31: soid 
e8d2430f/rb.0.2d1e9.1339c5dd.0c41/head//65 size 2424832 != known size 
2331648

or make things worse?

I could only check 14 out of 20 OSDs so far, because on two older nodes a scrub leads to 
slow requests for a couple of minutes, so the VMs got stalled, customers pressed the 
reset button, and caches were lost…

Comments welcome,

Oliver.


On Fri, Dec 7, 2012 at 6:39 AM, Oliver Francke oliver.fran...@filoo.de wrote:

Hi,

is the following a known one, too? Would be good to get it out of my head:



/var/log/ceph/ceph-osd.40.log.1.gz: 1: /usr/bin/ceph-osd() [0x706c59]
/var/log/ceph/ceph-osd.40.log.1.gz: 2: (()+0xeff0) [0x7f7f306c0ff0]
/var/log/ceph/ceph-osd.40.log.1.gz: 3: (gsignal()+0x35) [0x7f7f2f35f1b5]
/var/log/ceph/ceph-osd.40.log.1.gz: 4: (abort()+0x180) [0x7f7f2f361fc0]
/var/log/ceph/ceph-osd.40.log.1.gz: 5:
(__gnu_cxx::__verbose_terminate_handler()+0x115) [0x7f7f2fbf3dc5]
/var/log/ceph/ceph-osd.40.log.1.gz: 6: (()+0xcb166) [0x7f7f2fbf2166]
/var/log/ceph/ceph-osd.40.log.1.gz: 7: (()+0xcb193) [0x7f7f2fbf2193]
/var/log/ceph/ceph-osd.40.log.1.gz: 8: (()+0xcb28e) [0x7f7f2fbf228e]
/var/log/ceph/ceph-osd.40.log.1.gz: 

Re: [Qemu-devel] [PATCHv6] rbd block driver fix race between aio completion and aio cancel

2012-12-10 Thread Kevin Wolf
Am 30.11.2012 14:50, schrieb Stefan Hajnoczi:
 On Fri, Nov 30, 2012 at 9:55 AM, Stefan Priebe s.pri...@profihost.ag wrote:
 This one fixes a race which qemu also had in the iscsi block driver
 between cancellation and I/O completion.

 qemu_rbd_aio_cancel was not synchronously waiting for the end of
 the command.

 To achieve this it introduces a new status flag which uses
 -EINPROGRESS.

 Changes since PATCHv5:
 - qemu_aio_release has to be done in qemu_rbd_aio_cancel if I/O
   was cancelled

 Changes since PATCHv4:
 - removed unnecessary qemu_vfree of acb->bounce as the BH will always
   run

 Changes since PATCHv3:
 - removed unnecessary if condition in rbd_start_aio as we
   haven't started I/O yet
 - moved acb->status = 0 to rbd_aio_bh_cb so qemu_aio_wait always
   waits until the BH has been executed

 Changes since PATCHv2:
 - fixed missing braces
 - added vfree for bounce

 Signed-off-by: Stefan Priebe s.pri...@profihost.ag

 ---
  block/rbd.c |   20 ++++++++++++--------
  1 file changed, 12 insertions(+), 8 deletions(-)
 
 Reviewed-by: Stefan Hajnoczi stefa...@gmail.com

Thanks, applied to the block branch.

For future patches, please put a --- line between the real commit
message (including the SoB, of course) and the changelog so that git am
automatically removes the changelog.
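
As a concrete illustration, the body of such a patch mail would be laid out
roughly like this (placeholder text; the SoB line is illustrative):

    rbd: fix race between aio completion and aio cancel

    <real commit message body>

    Signed-off-by: Author Name <address>
    ---
    Changes since PATCHv5:
    - qemu_aio_release has to be done in qemu_rbd_aio_cancel if I/O
      was cancelled

     block/rbd.c | 20 ++++++++++++--------
     1 file changed, 12 insertions(+), 8 deletions(-)

git am keeps everything above the first --- as the commit message and
discards the changelog between the --- and the diffstat.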

Kevin


Re: on disk encryption

2012-12-10 Thread Gregory Farnum
On Monday, December 10, 2012 at 1:17 AM, James Page wrote:
  
 On 19/09/12 02:53, Dustin Kirkland wrote:
Looking forward, another option might be to implement
encryption inside btrfs (placeholder fields are there in the
disk format, introduced along with the compression code way
back when). This would let ceph-osd handle more of the key
handling internally and do something like, say, only encrypt
the current/ and snap_*/ subdirectories.
 
Other ideas? Thoughts?
 
sage
  I love the idea of btrfs supporting encryption natively much like
  it does compression. It may be some time before that happens, so
  in the meantime, I'd love to see Ceph support dm-crypt and/or
  eCryptfs beneath.
  
  
  
 Has this discussion progressed into any sort of implementation yet?
 It sounds like this is going to be a key feature for users who want
 top-to-bottom encryption of data right down to the block level.


Peter is working on this now — I'll let him discuss the details. :)
-Greg



Re: [ceph-commit] [ceph/ceph] e6a154: osx: compile on OSX

2012-12-10 Thread Jimmy Tang

On 9 Dec 2012, at 18:22, Noah Watkins wrote:

 On Sun, Dec 9, 2012 at 10:05 AM, Gregory Farnum g...@inktank.com wrote:
 Oooh, very nice! Do you have a list of the dependencies that you actually 
 needed to install?
 
 I can put that together. They were boost, gperf, fuse4x, cryptopp. I
 think that might have been it.
 

Is libaio really needed to build ceph-fuse? I use macports on my system, and the 
last time I tried to make a change set to let ceph/ceph-fuse build on my laptop, 
it failed because I didn't have libaio, though I could just write a port for it.

 Apart from breaking this up into smaller patches, we'll also want to 
 reformat some of it. Rather than sticking an #if APPLE on top of every spin 
 lock, we should have utility functions that do this for us. ;)
 
 Definitely. OSX has spinlock implementations for user space, but it's
 going to take some reading. For example, spinlocks in Ceph are
 initialized for shared memory, rather than the default private. It
 isn't clear from documentation what the semantics are of OSX
 spinlocks, nor is it clear if the shared memory attribute is needed.
 
 Also, we should be able to find libatomic_ops for OS X (its parent project 
 works under OS X), and we can use that to construct a spin lock if we think 
 it'll be useful. I'm not too sure how effective its mutexes are at 
 spinlock-y workloads.
 
 This patch set uses the OSX atomic inc/dec ops, rather than spinlocks.
 
 Another fun fact:
 
 msg/Pipe.cc and common/pipe.c are compiled into libcommon_la-Pipe.o
 and libcommon_la-pipe.o, but HFS+ is case-insensitive by default.
 Result is duplicate symbols. That took a while to figure out :P
 

good catch, that might explain why my last look at ceph on osx failed so 
miserably.


Jimmy.

--
Senior Software Engineer, Digital Repository of Ireland (DRI)
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/ | jt...@tchpc.tcd.ie
Tel: +353-1-896-3847



Re: [ceph-commit] [ceph/ceph] e6a154: osx: compile on OSX

2012-12-10 Thread Sage Weil
On Mon, 10 Dec 2012, Jimmy Tang wrote:
 
 On 9 Dec 2012, at 18:22, Noah Watkins wrote:
 
  On Sun, Dec 9, 2012 at 10:05 AM, Gregory Farnum g...@inktank.com wrote:
  Oooh, very nice! Do you have a list of the dependencies that you actually 
  needed to install?
  
  I can put that together. They were boost, gperf, fuse4x, cryptopp. I
  think that might have been it.
  
 
 Is libaio really needed to build ceph-fuse? I use macports on my system 
 and the last time I tried to make a change set to let ceph/ceph-fuse 
 build on my laptop failed as I didn't have libaio, though I could just 
 write a port for it.

libaio is only used by ceph-osd.  Not needed by fuse.

sage


 


Re: [ceph-commit] [ceph/ceph] e6a154: osx: compile on OSX

2012-12-10 Thread Sam Lang

On 12/10/2012 07:01 AM, Sage Weil wrote:

On Mon, 10 Dec 2012, Jimmy Tang wrote:


On 9 Dec 2012, at 18:22, Noah Watkins wrote:


On Sun, Dec 9, 2012 at 10:05 AM, Gregory Farnum g...@inktank.com wrote:

Oooh, very nice! Do you have a list of the dependencies that you actually 
needed to install?


I can put that together. They were boost, gperf, fuse4x, cryptopp. I
think that might have been it.



Is libaio really needed to build ceph-fuse? I use macports on my system
and the last time I tried to make a change set to let ceph/ceph-fuse
build on my laptop failed as I didn't have libaio, though I could just
write a port for it.


libaio is only used by ceph-osd.  Not needed by fuse.


An alternative on OSX could be aio-lite: 
https://trac.mcs.anl.gov/projects/aio-lite


It might perform better on linux as well because of the request 
serialization there, although that library was implemented a few years 
ago, and the linux implementation may have improved significantly since 
then.  It also wouldn't be hard to do something similar with ceph thread 
structures instead of depending on an external library like this one.


-sam





[no subject]

2012-12-10 Thread Alexandre Maumené
subscribe ceph-devel


mds doesn't start after upgrade to 0.55

2012-12-10 Thread Soporte
Hi.

I can't start the mds server. The logs show:

--- begin dump of recent events ---
   -30 2012-12-09 16:49:52.181838 7f547966a780  5 asok(0x1d8c000) 
register_command perfcounters_dump hook 0x1d74010
   -29 2012-12-09 16:49:52.181884 7f547966a780  5 asok(0x1d8c000) 
register_command 1 hook 0x1d74010
   -28 2012-12-09 16:49:52.181890 7f547966a780  5 asok(0x1d8c000) 
register_command perf dump hook 0x1d74010
   -27 2012-12-09 16:49:52.181901 7f547966a780  5 asok(0x1d8c000) 
register_command perfcounters_schema hook 0x1d74010
   -26 2012-12-09 16:49:52.181907 7f547966a780  5 asok(0x1d8c000) 
register_command 2 hook 0x1d74010
   -25 2012-12-09 16:49:52.181910 7f547966a780  5 asok(0x1d8c000) 
register_command perf schema hook 0x1d74010
   -24 2012-12-09 16:49:52.181915 7f547966a780  5 asok(0x1d8c000) 
register_command config show hook 0x1d74010
   -23 2012-12-09 16:49:52.181919 7f547966a780  5 asok(0x1d8c000) 
register_command config set hook 0x1d74010
   -22 2012-12-09 16:49:52.181926 7f547966a780  5 asok(0x1d8c000) 
register_command log flush hook 0x1d74010
   -21 2012-12-09 16:49:52.181932 7f547966a780  5 asok(0x1d8c000) 
register_command log dump hook 0x1d74010
   -20 2012-12-09 16:49:52.181936 7f547966a780  5 asok(0x1d8c000) 
register_command log reopen hook 0x1d74010
   -19 2012-12-09 16:49:52.183484 7f547966a780  0 ceph version 0.55 
(690f8175606edf37a3177c27a3949c78fd37099f), process ceph-mds, pid 2400
   -18 2012-12-09 16:49:52.184629 7f547966a780  1 finished 
global_init_daemonize
   -17 2012-12-09 16:49:52.187153 7f547966a780  5 asok(0x1d8c000) init 
/var/run/ceph/ceph-mds.a.asok
   -16 2012-12-09 16:49:52.187209 7f547966a780  5 asok(0x1d8c000) 
bind_and_listen /var/run/ceph/ceph-mds.a.asok
   -15 2012-12-09 16:49:52.187274 7f547966a780  5 asok(0x1d8c000) 
register_command 0 hook 0x1d720b8
   -14 2012-12-09 16:49:52.187291 7f547966a780  5 asok(0x1d8c000) 
register_command version hook 0x1d720b8
   -13 2012-12-09 16:49:52.187306 7f547966a780  5 asok(0x1d8c000) 
register_command git_version hook 0x1d720b8
   -12 2012-12-09 16:49:52.187316 7f547966a780  5 asok(0x1d8c000) 
register_command help hook 0x1d740c0
   -11 2012-12-09 16:49:52.187369 7f547966a780 10 monclient(hunting): 
build_initial_monmap
   -10 2012-12-09 16:49:52.187697 7f547966a780 10 monclient(hunting): init
-9 2012-12-09 16:49:52.188025 7f547966a780 10 monclient(hunting): 
auth_supported 2
-8 2012-12-09 16:49:52.188049 7f547966a780 10 monclient(hunting): 
_reopen_session
-7 2012-12-09 16:49:52.188099 7f547966a780 10 monclient(hunting): 
_pick_new_mon picked mon.d con 0x1d9cf20 addr 10.0.1.244:6789/0
-6 2012-12-09 16:49:52.188129 7f547966a780 10 monclient(hunting): 
_send_mon_message to mon.d at 10.0.1.244:6789/0
-5 2012-12-09 16:49:52.188142 7f547966a780 10 monclient(hunting): 
renew_subs
-4 2012-12-09 16:49:52.188224 7f5475915700  5 asok(0x1d8c000) entry start
-3 2012-12-09 16:49:52.189164 7f5474112700  0 mds.-1.0 ms_handle_connect 
on 10.0.1.244:6789/0
-2 2012-12-09 16:49:52.189942 7f5474112700 10 monclient(hunting): no 
handler for protocol 0
-1 2012-12-09 16:49:52.189965 7f5474112700 10 monclient(hunting): none of 
our auth protocols are supported by the server
 0 2012-12-09 16:49:52.190925 7f547966a780 -1 *** Caught signal 
(Segmentation fault) **

Maybe an issue with cephx? How can I check this?
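
A hedged sketch of checks for the cephx mismatch the log suggests, assuming a
default /etc/ceph layout (the paths and the mds name "a" are only
illustrative):

# is auth configured consistently on the mds node and on the monitors?
grep -i auth /etc/ceph/ceph.conf
# does the mds have a keyring, and does the cluster know a key for it?
ls -l /var/lib/ceph/mds/ceph-a/keyring
ceph auth list | grep -A1 mds.a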

Cristian.



Re: OSDs don't actually delete files when two CephFS (or more) are in use

2012-12-10 Thread Geoffrey Hartz
Ok!

I'm not in production yet, only testing. In about 1/2 months I will use
it for real (waiting for v0.56 bobtail).

Thanks for your consideration! Really appreciated.


I have another strange behavior between the ceph-fuse and kernel mounts.

With the ceph-fuse command:
Client 1 creates 2000 files and does ls = 1/2 sec
Client 2 does ls = 1/2 sec

With mount -t ceph ...:
Client 1 creates 2000 files and does ls = a few ms
Client 2 does ls = 4-5 sec

2012/12/10 Sam Lang sam.l...@inktank.com:
 On 12/10/2012 10:34 AM, Geoffrey Hartz wrote:

 Hi.

 I waited more than 15 minutes. I was doing some benchmarks when I
 noticed that the space never goes back to normal.

 How can I disable this behavior? Is it on the Ceph side?

 With one client, the OSDs are cleaned after a few seconds, which sounds normal.


 Actually after discussing with Greg and Sage, it sounds like this is a bug.
 The issue is that the client that didn't remove the file is caching the
 dentry indefinitely.  I've created a ticket to track the issue here:
 http://tracker.newdream.net/issues/3601.

 Unfortunately, I don't think there's a good workaround for the time being.
 You can try to evict those cached dentries by creating/accessing a bunch of
 other files, but the default cache size on the client is 16384, which is a
 lot of files to touch just to free up the space for those removed files. :-)

 You can decrease the cache size with the config option client_cache_size:

 [client]
 client cache size = 128

 Then you only have to create/touch 128 files to evict the other files from
 the cache.  That's not ideal, because reducing the cache size will affect
 your overall performance, but if you know that you won't be accessing a lot
 of files anyway, it's probably your best bet.
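
 A tiny illustrative sketch of that workaround, assuming the mount point is
 /mnt/ceph and the client has been remounted with "client cache size = 128"
 (file names are arbitrary):

 # touching ~128 other files should push the removed files' dentries
 # out of the client cache so their space can be reclaimed
 for i in $(seq 1 128); do touch /mnt/ceph/evict-$i; done
 rm -f /mnt/ceph/evict-*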

 -sam



 2012/12/10 Sam Lang sam.l...@inktank.com:

 On 12/10/2012 08:29 AM, Geoffrey Hartz wrote:


 Hi!

 I'm new to Ceph and I have a strange behavior with CephFS

 Config is :

 Ubuntu 12.04
 Kernel 3.6.9
 Ceph V0.55

 2 OSD, 1 mon, 1 MDS, all on same host
 2 clients, separate Hosts

 Ceph.conf:

 http://paste.ubuntu.com/1423712/

 To mount the share I use : sudo ceph-fuse -m 192.168.80.139:6789 /mnt

 When I create a file on one client, the other sees the file, and it can be
 downloaded, etc.

 But when I delete the file, both clients no longer see the file,
 BUT the file is still there on the OSDs (using disk space).



 Removing a file removes the directory entry (as you've seen), but the
 inode
 itself doesn't get removed until all references to it are dropped.  The
 clients may cache the capability for those inodes for a period of time,
 so
 you're not seeing the references drop until they get evicted from the
 cache.
 Unmounting ensures that they get evicted from the client caches, so all
 references go to zero.

 Also, removal of the underlying objects is done lazily, so you may not
 see
 the space get freed up right away.

 -sam


 When I unmount from BOTH clients, the OSDs are updated and the file is actually
 deleted (same behavior with mount -t ceph).

 Am I missing something?

 Thanks!

 --
 Geoffrey HARTZ









-- 
Geoffrey HARTZ


Re: Few questions about Ceph

2012-12-10 Thread Samuel Just
3) Forgive the style, it'll be going into the docs shortly :)

It's possible to have multiple independent crush hierarchies within the same
crush map.  Suppose you want to have pools default to osds backed by large
spinning disks but have some pools mapped to osds backed by fast ssds:

  device 0 osd.0
  device 1 osd.1
  device 2 osd.2
  device 3 osd.3
  device 4 osd.4
  device 5 osd.5
  device 6 osd.6
  device 7 osd.7

   host ceph-osd-ssd-server-1 {
   id -1
   alg straw
   hash 0
   item osd.0 weight 1.00
   item osd.1 weight 1.00
   }

   host ceph-osd-ssd-server-2 {
   id -2
   alg straw
   hash 0
   item osd.2 weight 1.00
   item osd.3 weight 1.00
   }

   host ceph-osd-platter-server-1 {
   id -3
   alg straw
   hash 0
   item osd.4 weight 1.00
   item osd.5 weight 1.00
   }

   host ceph-osd-platter-server-2 {
   id -4
   alg straw
   hash 0
   item osd.6 weight 1.00
   item osd.7 weight 1.00
   }

   root platter {
   id -5
   alg straw
   hash 0
   item ceph-osd-platter-server-1 weight 2.00
   item ceph-osd-platter-server-2 weight 2.00
   }

   root ssd {
   id -6
   alg straw
   hash 0
   item ceph-osd-ssd-server-1 weight 2.00
   item ceph-osd-ssd-server-2 weight 2.00
   }

   rule data {
           ruleset 0
           type replicated
           min_size 2
           max_size 2
           step take platter
           step chooseleaf firstn 0 type host
           step emit
   }

   rule metadata {
           ruleset 1
           type replicated
           min_size 0
           max_size 10
           step take platter
           step chooseleaf firstn 0 type host
           step emit
   }

   rule rbd {
           ruleset 2
           type replicated
           min_size 0
           max_size 10
           step take platter
           step chooseleaf firstn 0 type host
           step emit
   }

   rule platter {
           ruleset 3
           type replicated
           min_size 0
           max_size 10
           step take platter
           step chooseleaf firstn 0 type host
           step emit
   }

   rule ssd {
           ruleset 4
           type replicated
           min_size 0
           max_size 10
           step take ssd
           step chooseleaf firstn 0 type host
           step emit
   }

   rule ssd-primary {
           ruleset 4
           type replicated
           min_size 0
           max_size 10
           step take ssd
           step chooseleaf firstn 1 type host
           step emit
           step take platter
           step chooseleaf firstn -1 type host
           step emit
   }

You can then set a pool to use the ssd rule by:
ceph osd pool set poolname crush_ruleset 4

Similarly, using the ssd-primary rule will cause
each pg in the pool to be placed with an ssd as
the primary and platters as the replicas.
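
For reference, a hedged sketch of how a map like this could be applied to a
running cluster (file names here are arbitrary):

ceph osd getcrushmap -o crushmap.bin        # grab the current map
crushtool -d crushmap.bin -o crushmap.txt   # decompile to text
# edit crushmap.txt to add the hierarchies and rules above, then:
crushtool -c crushmap.txt -o crushmap.new   # recompile
ceph osd setcrushmap -i crushmap.new        # inject the new map
ceph osd pool set <poolname> crush_ruleset 4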

-Sam

On Mon, Dec 10, 2012 at 11:17 AM, Alexandre Maumené
alexan...@maumene.org wrote:
 Hello all,

 I have a few questions about Ceph:

 1) Is it possible to run a cluster with some latency between monitor
 nodes? Latency will be 30ms at worst.

 2) When using RBD what are the best practices for a direct mount using
 XFS filesystem? And for a qemu/kvm devices? I'm thinking about
 writeback, rbd_cache, ...

 3) About the CRUSH map, how can I separate 2 pools on different OSD?
 I'd like to setup a cluster with different disks (like SATA/SAS) and I
 want to be able to specify on which disks (or OSD) my data are going
 to be write.

 Thanks in advance for any answer.

 Regards,


Re: Few questions about Ceph

2012-12-10 Thread Samuel Just
oops, ssd-primary should be ruleset 5
-Sam



Re: Segmentation fault on rbd client ceph version 0.48.2argonaut

2012-12-10 Thread Josh Durgin

On 12/10/2012 01:54 PM, Vladislav Gorbunov wrote:

but access to iscsi/seodo1 and iscsi/siri1 fail on every rbd client
hosts. Data completely inaccessible.

root@bender:~# rbd info iscsi/seodo1
*** Caught signal (Segmentation fault) **
  in thread 7fb8c93f5780
  ceph version 0.48.2argonaut (commit:3e02b2fad88c2a95d9c0c86878f10d1beb780bfe)
  1: rbd() [0x41dfea]
  2: (()+0xfcb0) [0x7fb8c796fcb0]
  3: (()+0x16244d) [0x7fb8c6ae444d]
  4: (librbd::read_header_bl(librados::IoCtx, std::string const,
ceph::buffer::list, unsigned long*)+0xf9) [0x7fb8c8fadb99]
  5: (librbd::read_header(librados::IoCtx, std::string const,
rbd_obj_header_ondisk*, unsigned long*)+0x82) [0x7fb8c8fadda2]
  6: (librbd::ictx_refresh(librbd::ImageCtx*)+0x90b) [0x7fb8c8fb05eb]
  7: (librbd::open_image(librbd::ImageCtx*)+0x1b5) [0x7fb8c8fb1165]
  8: (librbd::RBD::open(librados::IoCtx, librbd::Image, char const*,
char const*)+0x5f) [0x7fb8c8fb16af]
  9: (main()+0x73c) [0x41721c]
  10: (__libc_start_main()+0xed) [0x7fb8c69a376d]
  11: rbd() [0x41a0c9]
2012-12-11 09:33:14.264755 7fb8c93f5780 -1 *** Caught signal
(Segmentation fault) **
  in thread 7fb8c93f5780


It sounds like the header object (which rbd uses to determine the
prefix for data object names) is corrupted or otherwise inaccessible.

Could you save the header object to a file ('rados -p iscsi get 
seodo1.rbd seodo1.header') and put that file somewhere accessible?


Did anything happen to your cluster before this header became
unreadable? Any disk problems, or osds crashing?

Josh


Re: Errors attaching RBD image to a running VM

2012-12-10 Thread Michael Morgan
On Mon, Dec 10, 2012 at 01:12:45PM -0800, Josh Durgin wrote:
 
 There was a regression in 1.0.0 with attaching non-files, such as RBD.
 This is fixed by f0e72b2f5c675f927d04545dc5095f9e5998f171, which you
 could cherry-pick onto 1.0.0.
 
 If you'd rather just use a released version, 0.10.2 should be fine.
 

Libvirt 0.10.2 appears to have fixed my problems. Booting from RBD works as
well as attaching/detaching multiple additional volumes. Thanks so much Josh!

-Mike


Re: Segmentation fault on rbd client ceph version 0.48.2argonaut

2012-12-10 Thread Vladislav Gorbunov
Looks like the header object on broken images is empty.

root@bender:~# rados -p iscsi stat seodo1.rbd
iscsi/seodo1.rbd mtime 1354795057, size 0

root@bender:~# rados -p iscsi stat siri.rbd
iscsi/siri.rbd mtime 1355151093, size 0

On accessible image header size not empty:
root@bender:~# rados -p iscsi stat siri1.rbd
iscsi/siri1.rbd mtime 1355174156, size 112

and the header can't be saved:
root@bender:~# rados -p iscsi get seodo1.rbd seodo1.header
2012-12-11 11:34:06.044164 7fe732f52780  0 wrote 0 byte payload to seodo1.header

Before this header became unreadable, a new osd server was added and the cluster
was rebalanced. One of the mon servers (mon.0) crashed, and I restarted
it.



Re: mounting fuse from fstab

2012-12-10 Thread Sage Weil
On Mon, 10 Dec 2012, Sam Lang wrote:
 On 12/10/2012 10:33 AM, Sage Weil wrote:
  We put together a simple helper script for mounting ceph-fuse via fstab
 (below).  Some of the man pages indicate that the # syntax is deprecated,
 however, and it's not clear to me that whatever replaces it (mount.fuse)
 will let us accomplish the same thing (pass something along with the mount,
 control command-line options).  Also, it's unclear *when* it was
  deprecated; if we want this to work on, say, RHEL, the replacement
  might not be there.
 
 Would the fuse options fsname and subtype be more portable?

Looking at the subtype stuff a bit more, I finally understand.. if you set 
type to fuse.ceph, it will run /sbin/mount.fuse.ceph with the usual 
arguments (which include the device name).

...and it appears that that support is present in RHEL6, which is probably 
the oldest thing we care about.  I think that's a better route.  Something 
like:

 id=user,foo=bar   /foo   fuse.ceph   defaults   0 0

where the key/value pairs are passed by /sbin/mount.fuse.ceph to ceph-fuse 
on the command line?
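
A minimal sketch of what such a helper could look like, just to illustrate the
idea (this is not the real /sbin/mount.fuse.ceph; option handling is
simplified and untested):

#!/bin/sh
# invoked by mount as: mount.fuse.ceph <key=value,...> <mountpoint> [-o opts]
# turn the comma-separated key=value device string into ceph-fuse flags
spec="$1"; mountpoint="$2"; shift 2
args=""
for kv in $(echo "$spec" | tr ',' ' '); do
    args="$args --${kv%%=*} ${kv#*=}"
done
exec ceph-fuse $args "$mountpoint"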

sage


 -sam
 
  
  Anybody know if doing something like the below is a bad idea?
  
  Thanks!
  sage
  
  ---
  
  #!/bin/sh
  #
  # Helper to mount ceph-fuse from /etc/fstab.  To use, add an entry
  # like:
  #
  # # DEVICE PATH TYPE   OPTIONS
  # /sbin/ceph-fuse-mount#admin   /mnt/ceph   ceph   defaults   0 0
  #
  # where 'admin' can be replaced with the client id to use when
  # authenticating (if it is not client.admin).  This will also control
  # which section of ceph.conf will be applied to the ceph-fuse process.
  
  set -e
  id=$1
  shift
  exec ceph-fuse -i "$id" "$@"
  
  


Re: Behavior of librbd::Image::read_iterate() changed?

2012-12-10 Thread Henry C Chang
Thanks for your reply, Dan.

It really seems to be due to the new striping code. Below are the debug
messages of rbd export:

root@ceph1:~# rbd -p rbdtest export image1 out1 --debug_rbd 20
--debug_striper 20
2012-12-11 12:06:21.538932 7f50a5fea780 20 librbd: open_image: ictx =
0x27b33a0 name = 'image1' id = '' snap_name = ''
2012-12-11 12:06:21.542949 7f50a5fea780 20 librbd: detect format of image1 : old
2012-12-11 12:06:21.542970 7f50a5fea780 20 librbd: ictx_refresh 0x27b33a0
2012-12-11 12:06:21.547103 7f50a5fea780 10 librbd::ImageCtx:
init_layout stripe_unit 4194304 stripe_count 1 object_size 4194304
prefix rb.0.652e.6b8b4567 format rb.0.652e.6b8b4567.%012llx
2012-12-11 12:06:21.550876 7f50a5fea780 20 librbd::ImageCtx: watching
header object returned 0
2012-12-11 12:06:21.550916 7f50a5fea780 20 librbd: info 0x27b33a0
2012-12-11 12:06:21.550921 7f50a5fea780 20 librbd: ictx_check 0x27b33a0
2012-12-11 12:06:21.550983 7f50a5fea780 20 librbd: read_iterate
0x27b33a0 off = 0 len = 1073741824
2012-12-11 12:06:21.550988 7f50a5fea780 20 librbd: ictx_check 0x27b33a0
2012-12-11 12:06:21.551006 7f50a5fea780 20 librbd: aio_read 0x27b33a0
completion 0x27b4b10 [0,4194304]
2012-12-11 12:06:21.551009 7f50a5fea780 20 librbd: ictx_check 0x27b33a0
2012-12-11 12:06:21.551017 7f50a5fea780 10 striper file_to_extents
0~4194304 format rb.0.652e.6b8b4567.%012llx
2012-12-11 12:06:21.553971 7f50a5fea780 20 striper  su 4194304 sc 1 os
4194304 stripes_per_object 1
2012-12-11 12:06:21.553981 7f50a5fea780 20 striper  off 0 blockno 0
stripeno 0 stripepos 0 objectsetno 0 objectno 0 block_start 0
block_off 0 0~4194304
2012-12-11 12:06:21.554146 7f50a5fea780 20 striper  added new
extent(rb.0.652e.6b8b4567. (0) in @14 0~4194304 - [])
2012-12-11 12:06:21.554176 7f50a5fea780 15 striper file_to_extents
extent(rb.0.652e.6b8b4567. (0) in @14 0~4194304 -
[0,4194304]) in @14
2012-12-11 12:06:21.554180 7f50a5fea780 20 librbd:  oid
rb.0.652e.6b8b4567. 0~4194304 from [0,4194304]
2012-12-11 12:06:21.554194 7f50a5fea780 20 librbd::AioRequest: send
0x27b5530 rb.0.652e.6b8b4567. 0~4194304
2012-12-11 12:06:21.555059 7f50a5fea780 20 librbd::AioCompletion:
AioCompletion::finish_adding_requests 0x27b4b10 pending 1
2012-12-11 12:06:21.556193 7f50937fe700 20 librbd::AioRequest:
should_complete 0x27b5530 rb.0.652e.6b8b4567. 0~4194304 r
= -2
2012-12-11 12:06:21.556246 7f50937fe700 10 striper extent_to_file 0 0~4194304
2012-12-11 12:06:21.556248 7f50937fe700 20 striper  stripes_per_object 1
2012-12-11 12:06:21.556249 7f50937fe700 20 striper  object 0~4194304
- file 0~4194304
2012-12-11 12:06:21.556251 7f50937fe700 10 librbd::ImageCtx:
prune_parent_extents image overlap 0, object overlap 0 from image
extents []
2012-12-11 12:06:21.556254 7f50937fe700 10 librbd::AioCompletion:
C_AioRead::finish() 0x27b3fc0 r = -2
2012-12-11 12:06:21.556255 7f50937fe700 10 librbd::AioCompletion:  got
{} for [0,4194304] bl 0
2012-12-11 12:06:21.556264 7f50937fe700 10 striper
add_partial_sparse_result(0x27b4bf8) 0 covering {0=0} (offset 0) to
[0,4194304]
2012-12-11 12:06:21.556276 7f50937fe700 20 striper   t 0~4194304 bl has 0 off 0
2012-12-11 12:06:21.556277 7f50937fe700 20 striper   t 0~4194304 bl has 0 off 0
2012-12-11 12:06:21.556278 7f50937fe700 20 striper   s at end
2012-12-11 12:06:21.556282 7f50937fe700 20 librbd::AioCompletion:
AioCompletion::complete_request() 0x27b4b10 complete_cb=0x7f50a5b7e4c0
pending 1
2012-12-11 12:06:21.556284 7f50937fe700 20 librbd::AioCompletion:
AioCompletion::finalize() 0x27b4b10 rval 4194304 read_buf 0 read_bl
0x7fff8f12a9b0
2012-12-11 12:06:21.556285 7f50937fe700 10 striper
assemble_result(0x27b4bf8) zero_tail=1
2012-12-11 12:06:21.556290 7f50937fe700 20 striper
assemble_result(0x27b4bf8) 0~4194304 0 bytes
2012-12-11 12:06:21.576042 7f50937fe700 20 librbd::AioCompletion:
AioCompletion::finalize() moving resulting 4194304 bytes to bl
0x7fff8f12a9b0
writing 4194304 bytes at ofs 0

2012/12/8 Dan Mick dan.m...@inktank.com:
 I suspect, but have not figured out yet, that this is due to the new
 striping code (even on images that don't have advanced striping enabled).  I
 know we want to look at it further; it might be that this is a regression.


 On 12/07/2012 07:44 PM, Henry C Chang wrote:

 Hi,

 I am testing v0.55. I noticed that the behavior of
 librbd::Image::read_iterate() changed. With 0.48.2, when hitting a
 hole, the callback function is called with buf set to NULL.
 However, with v0.55, I get a zeroed buffer of the full length of the
 object (e.g., 4 MB).

 Is it the expected behavior or a bug?

 Henry

Debian packaging question

2012-12-10 Thread Gary Lowell
Hi -

I'm looking for advice on debian multiple architecture repositories.  To date 
we have been building ceph debian packages on two different machines for the 
i386 and amd64 platforms, rsyncing the results to a common directory on the 
build host, then putting the results together using the reprepro command to 
push out to ceph.com.  As all the packages are architecture=linux-any, the arch 
is embedded in the file names and we don't have any collisions.

The new libcephfs-java, which is architecture=all, ends up being built twice 
with the same resulting file name, but different checksums depending on where 
it was built. Not unexpectedly, reprepro complains about this.

I know just enough about debian packaging to be a danger to myself and others. 
I can see how to fix up the checksums after the fact, but what is the right way 
to fix the problem ?

Thanks,
Gary




Re: Debian packaging question

2012-12-10 Thread Wido den Hollander

Hi,

On 12/11/2012 01:19 PM, Gary Lowell wrote:

Hi -

I'm looking for advice on debian multiple architecture repositories.  To date 
we have been building ceph debian packages on two different machines for the 
i386 and amd64 platforms, rsyncing the results to a common directory on the 
build host, then putting the results together using the reprepro command to 
push out to ceph.com.  As all the packages are architecture=linux-any, the arch 
is embedded in the file names and we don't have any collisions.

The new libcephfs-java, which is architecture=all, ends up being built twice 
with the same resulting file name, but different checksums depending on where 
it was built. Not unexpectedly, reprepro complains about this.

I know just enough about debian packaging to be a danger to myself and others. 
I can see how to fix up the checksums after the fact, but what is the right way 
to fix the problem ?



I assume you are building with dpkg-buildpackage ?

The manpage shows:

-B Specifies a binary-only build, limited to architecture dependent 
packages.  Passed to dpkg-genchanges.


-A Specifies a binary-only build, limited to architecture 
independent packages. Passed to dpkg-genchanges.


So on the i386 and amd64 machines you'd run with -B and sync them to 
ceph.com


On one of the machines you'd also run with -A which should produce the 
architecture independent packages like libcephfs-java.


That's the theory, I haven't tested it :)
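
Concretely, that might look something like this (an untested sketch; the
repository path and distribution name are made up):

# on each architecture-specific builder (amd64 and i386):
dpkg-buildpackage -B -uc -us      # architecture-dependent .debs only
# on exactly one of the builders, additionally:
dpkg-buildpackage -A -uc -us      # arch-independent .debs (libcephfs-java, ...)
# then feed everything into the repository once, e.g.:
reprepro -b /srv/debian-repo includedeb precise *.deb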

Wido


Thanks,
Gary




Re: Debian packaging question

2012-12-10 Thread Sage Weil
On Mon, 10 Dec 2012, Gary Lowell wrote:
 Hi -
 
 I'm looking for advice on debian multiple architecture repositories.  
 To date we have been building ceph debian packages on two different 
 machines for the i386 and amd64 platforms, rsyncing the results to a 
 common directory on the build host, then putting the results together 
 using the reprepro command to push out to ceph.com.  As all the packages 
 are architecture=linux-any, the arch is embedded in the file names and 
 we don't have any collisions.
 
 The new libcephfs-java, which is architecture=all, ends up being built 
 twice with the same resulting file name, but different checksums 
 depending on where it was built. Not unexpectedly, reprepro complains 
 about this.

Can we just ignore the second attempt that fails?  Or only try to add the 
arch=all .dsc once?

sage


RBD using problem

2012-12-10 Thread Chuanyu Tsai
Hi cephers,

I know that we can set up cluster addr and public addr to let OSDs 
listen on different subnets for different purposes; however, I don't 
have enough public IPs to give every OSD its own.

So my question is: if I follow the link
http://www.spinics.net/lists/ceph-devel/msg10941.html
to set all my pools to use the ssd-primary rule or 
something like that, and then bind a public IP on every 
ceph-osd-ssd-server-* 
(meaning all my primary OSDs have a public IP, right?), 
will that let me use RBD correctly?

Or is there some method to let public-network clients use RBD 
without needing all OSDs to have a public IP?

Regards,
Chuanyu.


Re: Debian packaging question

2012-12-10 Thread Gary Lowell

On Dec 10, 2012, at 9:34 PM, Wido den Hollander wrote:

 Hi,
 
 On 12/11/2012 01:19 PM, Gary Lowell wrote:
 Hi -
 
 I'm looking for advice on debian multiple architecture repositories.  To 
 date we have been building ceph debian packages on two different machines 
 for the i386 and amd64 platforms, rsyncing the results to a common directory 
 on the build host, then putting the results together using the reprepro 
 command to push out to ceph.com.  As all the packages are 
 architecture=linux-any, the arch is embedded in the file names and we don't 
 have any collisions.
 
 The new libcephfs-java, which is architecture=all, ends up being built twice 
 with the same resulting file name, but different checksums depending on 
 where it was built. Not unexpectedly, reprepro complains about this.
 
 I know just enough about debian packaging to be a danger to myself and 
 others. I can see how to fix up the checksums after the fact, but what is 
 the right way to fix the problem ?
 
 
 I assume you are building with dpkg-buildpackage ?
 
 The manpage shows:
 
 -B Specifies a binary-only build, limited to architecture dependent 
 packages.  Passed to dpkg-genchanges.
 
 -A Specifies a binary-only build, limited to architecture independent 
 packages. Passed to dpkg-genchanges.
 
 So on the i386 and amd64 machines you'd run with -B and sync them to ceph.com
 
 On one of the machines you'd also run with -A which should produce the 
 architecture independent packages like libcephfs-java.
 
 That's the theory, I haven't tested it :)
 
 Wido

Thanks Wido.  We're using pbuilder, but it looks like it has similar options, 
or can pass an option string to dpkg-buildpackage.  I'll do some testing.

Cheers,
Gary


Re: Debian packaging question

2012-12-10 Thread Gary Lowell

On Dec 10, 2012, at 9:34 PM, Sage Weil wrote:

 On Mon, 10 Dec 2012, Gary Lowell wrote:
 Hi -
 
 I'm looking for advice on debian multiple architecture repositories.  
 To date we have been building ceph debian packages on two different 
 machines for the i386 and amd64 platforms, rsyncing the results to a 
 common directory on the build host, then putting the results together 
 using the reprepro command to push out to ceph.com.  As all the packages 
 are architecture=linux-any, the arch is embedded in the file names and 
 we don't have any collisions.
 
 The new libcephfs-java, which is architecture=all, ends up being built 
 twice with the same resulting file name, but different checksums 
 depending on where it was built. Not unexpectedly, reprepro complains 
 about this.
 
 Can we just ignore the second attempt that fails?  Or only try to add the 
 arch=all .dsc once?

For 0.55 that's pretty much what I did and it still required fixing up the 
changelog checksums before reprepro would run without error.  I was hoping for 
a cleaner solution.  Wido's suggestion looks like it will allow me to build 
just one version of libcephfs-java, which will help, and shouldn't require much 
change to the build scripts.

Cheers,
Gary