[ceph-users] Replace all monitors

2013-08-08 Thread Olivier Bonvalet
Hi,

right now I have 5 monitors which share a slow SSD with several OSD
journals. As a result, each data migration operation (reweight, recovery,
etc.) is very slow and the cluster is nearly down.

So I have to change that. I'm looking to replace these 5 monitors with 3
new monitors, which would still share a (very fast) SSD with several OSDs.
I suppose it's not a good idea, since monitors should have dedicated
storage. What do you think about that?
Is it better practice to have dedicated storage, but share CPU with
Xen VMs?

Second point: I'm not sure how to do that migration without downtime.
I was hoping to add the 3 new monitors, then progressively remove the 5
old monitors, but the doc [1] indicates a special procedure for an
unhealthy cluster, which seems to be meant for clusters with damaged
monitors, right? In my case I only have dead PGs [2] (#5226), from which
I can't recover, but the monitors are fine. Can I use the standard procedure?

Thanks,
Olivier

[1] 
http://ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-monitors
[2] http://tracker.ceph.com/issues/5226



Re: [ceph-users] error noticed while setting the Storage cluster

2013-08-08 Thread Suresh Sadhu
Thanks Wido, I have rectified it. I have created the Ceph cluster and created
the OSDs for CloudStack.

On the hypervisor (KVM host) side, do I need to install any Ceph packages to
communicate with the Ceph storage cluster, which exists on another host?

Regards
Sadhu


-Original Message-
From: ceph-users-boun...@lists.ceph.com 
[mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Wido den Hollander
Sent: 07 August 2013 00:21
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] error noticed while setting the Storage cluster

On 08/06/2013 08:31 PM, Suresh Sadhu wrote:
 Hi,

 I am getting the following error when I try to execute this command from the
 admin node.

 I followed the procedure mentioned in the document:

 http://ceph.com/docs/master/start/quick-ceph-deploy/

 Sadhu@ubuntu-2:~$ ceph-deploy install --stable cuttlefish ubuntu3

 sadhu@ubuntu3's password:

 Traceback (most recent call last):

File /usr/lib/python2.7/dist-packages/pushy/client.py, line 383, 
 in __init__

  self.modules = AutoImporter(self)

File /usr/lib/python2.7/dist-packages/pushy/client.py, line 236, 
 in __init__

  remote_compile = self.__client.eval(compile)

File /usr/lib/python2.7/dist-packages/pushy/client.py, line 478, 
 in eval

  return self.remote.eval(code, globals, locals)

File 
 /usr/lib/python2.7/dist-packages/pushy/protocol/connection.py,
 line 54, in eval

  return self.send_request(MessageType.evaluate, args)

File
 /usr/lib/python2.7/dist-packages/pushy/protocol/baseconnection.py,
 line 311, in send_request

  self.__send_message(message_type, args)

File
 /usr/lib/python2.7/dist-packages/pushy/protocol/baseconnection.py,
 line 560, in __send_message

  self.__ostream.send_message(m)

File
 /usr/lib/python2.7/dist-packages/pushy/protocol/baseconnection.py,
 line 97, in send_message

  self.__file.write(bytes_)

 IOError: [Errno 32] Broken pipe

 [remote] sudo:  /etc/sudoers.d/ceph: syntax error near line 1 

 [remote] sudo:  /etc/sudoers.d/ceph: syntax error near line 2 

 [remote] sudo: parse error in /etc/sudoers.d/ceph near line 1

 [remote] sudo: no valid sudoers sources found, quitting


Have you verified your sudoers file? Might be a copy/paste issue?
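
For comparison, a valid /etc/sudoers.d/ceph for a deploy user is only a line
or two; assuming the remote user is sadhu, something like:

sadhu ALL = (root) NOPASSWD:ALL
Defaults:sadhu !requiretty

visudo -cf /etc/sudoers.d/ceph will check the file's syntax without locking
you out, and the file needs to be mode 0440.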

Wido

 [remote] sudo: unable to initialize policy plugin

 Traceback (most recent call last):

File /usr/bin/ceph-deploy, line 21, in module

  main()

File /usr/lib/pymodules/python2.7/ceph_deploy/cli.py, line 112, 
 in main

  return args.func(args)

File /usr/lib/pymodules/python2.7/ceph_deploy/install.py, line 
 364, in install

  sudo = args.pushy(get_transport(hostname))

File /usr/lib/python2.7/dist-packages/pushy/client.py, line 583, 
 in connect

  return PushyClient(target, **kwargs)

File /usr/lib/python2.7/dist-packages/pushy/client.py, line 383, 
 in __init__

  self.modules = AutoImporter(self)

File /usr/lib/python2.7/dist-packages/pushy/client.py, line 236, 
 in __init__

  remote_compile = self.__client.eval(compile)

File /usr/lib/python2.7/dist-packages/pushy/client.py, line 478, 
 in eval

  return self.remote.eval(code, globals, locals)

File 
 /usr/lib/python2.7/dist-packages/pushy/protocol/connection.py,
 line 54, in eval

  return self.send_request(MessageType.evaluate, args)

File
 /usr/lib/python2.7/dist-packages/pushy/protocol/baseconnection.py,
 line 311, in send_request

  self.__send_message(message_type, args)

File
 /usr/lib/python2.7/dist-packages/pushy/protocol/baseconnection.py,
 line 560, in __send_message

  self.__ostream.send_message(m)

File
 /usr/lib/python2.7/dist-packages/pushy/protocol/baseconnection.py,
 line 97, in send_message

  self.__file.write(bytes_)

 IOError: [Errno 32] Broken pipe






--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on


Re: [ceph-users] qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Qemu-devel] [Bug 1207686]

2013-08-08 Thread Oliver Francke

Hi Josh,

I have a session logged with:

debug_ms=1:debug_rbd=20:debug_objectcacher=30

as you requested from Mike, even if I think we have another story
here anyway.


Host-kernel is: 3.10.0-rc7, qemu-client 1.6.0-rc2, client-kernel is 
3.2.0-51-amd...


Do you want me to open a ticket for that stuff? I have about 5MB 
compressed logfile waiting for you ;)
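
(For reference, one way to capture exactly that is via the [client] section
of ceph.conf on the qemu host; the log path below is just an example:

[client]
    debug ms = 1
    debug rbd = 20
    debug objectcacher = 30
    log file = /var/log/ceph/qemu-rbd.$pid.log

The same options can usually also be appended, colon-separated, to the rbd
drive string qemu is given, if editing ceph.conf is not convenient.)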


Thnx in advance,

Oliver.

On 08/05/2013 09:48 AM, Stefan Hajnoczi wrote:

On Sun, Aug 04, 2013 at 03:36:52PM +0200, Oliver Francke wrote:

On 02.08.2013 at 23:47, Mike Dawson mike.daw...@cloudapt.com wrote:

We can un-wedge the guest by opening a NoVNC session or running a 'virsh 
screenshot' command. After that, the guest resumes and runs as expected. At that point we 
can examine the guest. Each time we'll see:

If virsh screenshot works then this confirms that QEMU itself is still
responding.  Its main loop cannot be blocked since it was able to
process the screendump command.

This supports Josh's theory that a callback is not being invoked.  The
virtio-blk I/O request would be left in a pending state.

Now here is where the behavior varies between configurations:

On a Windows guest with 1 vCPU, you may see the symptom that the guest no
longer responds to ping.

On a Linux guest with multiple vCPUs, you may see the hung task message
from the guest kernel because other vCPUs are still making progress.
Just the vCPU that issued the I/O request and whose task is in
UNINTERRUPTIBLE state would really be stuck.

Basically, the symptoms depend not just on how QEMU is behaving but also
on the guest kernel and how many vCPUs you have configured.

I think this can explain how both problems you are observing, Oliver and
Mike, are a result of the same bug.  At least I hope they are :).

Stefan



--

Oliver Francke

filoo GmbH
Moltkestraße 25a
0 Gütersloh
HRB4355 AG Gütersloh

Geschäftsführer: J.Rehpöhler | C.Kunz

Folgen Sie uns auf Twitter: http://twitter.com/filoogmbh



[ceph-users] Openstack glance ceph rbd_store_user authentification problem

2013-08-08 Thread Steffen Thorhauer

Hi,
recently I had a problem with openstack glance and ceph.
I used the 
http://ceph.com/docs/master/rbd/rbd-openstack/#configuring-glance 
documentation and 
http://docs.openstack.org/developer/glance/configuring.html documentation
I'm using Ubuntu 12.04 LTS with Grizzly from the Ubuntu Cloud Archive and
ceph 0.61.7.


glance-api.conf had following config options

default_store = rbd
rbd_store_user=images
rbd_store_pool = images
rbd_store_ceph_conf = /etc/ceph/ceph.conf


Every time I do a glance image create I get errors. In the glance
API log I only found errors like:


2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images Traceback (most 
recent call last):
2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images   File 
/usr/lib/python2.7/dist-packages/glance/api/v1/images.py, line 444, in 
_upload

2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images image_meta['size'])
2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images   File 
/usr/lib/python2.7/dist-packages/glance/store/rbd.py, line 241, in add
2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images with 
rados.Rados(conffile=self.conf_file, rados_id=self.user) as conn:
2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images   File 
/usr/lib/python2.7/dist-packages/rados.py, line 134, in __enter__

2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images self.connect()
2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images   File 
/usr/lib/python2.7/dist-packages/rados.py, line 192, in connect
2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images raise 
make_ex(ret, error calling connect)
2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images ObjectNotFound: 
error calling connect


This trace message did not help me very much :-(
My Google search for glance.api.v1.images ObjectNotFound: error calling
connect only found
http://irclogs.ceph.widodh.nl/index.php?date=2012-10-26
This points me to a Ceph authentication problem. But the ceph tools
worked fine for me.
Then I tried the debug option in glance-api.conf and I found the following
entries.


DEBUG glance.common.config [-] rbd_store_pool = images 
log_opt_values /usr/lib/python2.7/dist-packages/oslo/config/cfg.py:1485
DEBUG glance.common.config [-] rbd_store_user = glance 
log_opt_values /usr/lib/python2.7/dist-packages/oslo/config/cfg.py:1485


The glance-api service did not use my rbd_store_user = images option!!
Then I configured a client.glance auth and it worked with the implicit
glance user!!!

Now my question: am I the only one with this problem?

Regards,
  Steffen Thorhauer


Re: [ceph-users] Openstack glance ceph rbd_store_user authentification problem

2013-08-08 Thread Mike Dawson

Steffen,

It works for me. I have:

user@node:/etc/ceph# cat /etc/glance/glance-api.conf | grep rbd
default_store = rbd
#   glance.store.rbd.Store,
rbd_store_ceph_conf = /etc/ceph/ceph.conf
rbd_store_user = images
rbd_store_pool = images
rbd_store_chunk_size = 4
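
The matching cephx side, per the rbd-openstack doc, would be something like
the following (assuming the user and pool are both called images and that
glance runs as the glance system user; adjust names to your setup):

ceph auth get-or-create client.images mon 'allow r' \
    osd 'allow class-read object_prefix rbd_children, allow rwx pool=images' \
    -o /etc/ceph/ceph.client.images.keyring
chown glance:glance /etc/ceph/ceph.client.images.keyring

If glance-api still logs rbd_store_user = glance afterwards, it is worth
double-checking that the option sits in the [DEFAULT] section of
glance-api.conf and that the service was restarted after the change.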


Thanks,
Mike Dawson


On 8/8/2013 9:01 AM, Steffen Thorhauer wrote:

Hi,
recently I had a problem with openstack glance and ceph.
I used the
http://ceph.com/docs/master/rbd/rbd-openstack/#configuring-glance
documentation and
http://docs.openstack.org/developer/glance/configuring.html documentation
I'm using ubuntu 12.04 LTS with grizzly from Ubuntu Cloud Archive and
ceph 61.7.

glance-api.conf had following config options

default_store = rbd
rbd_store_user=images
rbd_store_pool = images
rbd_store_ceph_conf = /etc/ceph/ceph.conf


All the time when doing glance image create I get errors. In the glance
api log I only found error like

2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images Traceback (most
recent call last):
2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images   File
/usr/lib/python2.7/dist-packages/glance/api/v1/images.py, line 444, in
_upload
2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images image_meta['size'])
2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images   File
/usr/lib/python2.7/dist-packages/glance/store/rbd.py, line 241, in add
2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images with
rados.Rados(conffile=self.conf_file, rados_id=self.user) as conn:
2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images   File
/usr/lib/python2.7/dist-packages/rados.py, line 134, in __enter__
2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images self.connect()
2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images   File
/usr/lib/python2.7/dist-packages/rados.py, line 192, in connect
2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images raise
make_ex(ret, error calling connect)
2013-08-08 10:25:38.021 5725 TRACE glance.api.v1.images ObjectNotFound:
error calling connect

This trace message helped me not very much :-(
My google search glance.api.v1.images ObjectNotFound: error calling
connect did only find
http://irclogs.ceph.widodh.nl/index.php?date=2012-10-26
This  points me to an ceph authentification problem. But the ceph tools
worked fine for me.
The I tried the debug option in glance-api.conf and I found following
entry .

DEBUG glance.common.config [-] rbd_store_pool = images
log_opt_values /usr/lib/python2.7/dist-packages/oslo/config/cfg.py:1485
DEBUG glance.common.config [-] rbd_store_user = glance
log_opt_values /usr/lib/python2.7/dist-packages/oslo/config/cfg.py:1485

The glance-api service  did not use my rbd_store_user = images option!!
Then I configured a client.glance auth and it worked with the
implicit glance user!!!

Now my question: Am I the only one with this problem??

Regards,
   Steffen Thorhauer


Re: [ceph-users] how to recover the osd.

2013-08-08 Thread Mike Dawson

Looks like you didn't get osd.0 deployed properly. Can you show:

- ls /var/lib/ceph/osd/ceph-0
- cat /etc/ceph/ceph.conf


Thanks,

Mike Dawson
Co-Founder  Director of Cloud Architecture
Cloudapt LLC
6330 East 75th Street, Suite 170
Indianapolis, IN 46250

On 8/8/2013 9:13 AM, Suresh Sadhu wrote:

HI,

My storage cluster's health is in a warning state; one of the OSDs is down,
and even if I try to start the OSD it fails to start.

sadhu@ubuntu3:~$ ceph osd stat

e22: 2 osds: 1 up, 1 in

sadhu@ubuntu3:~$ ls /var/lib/ceph/osd/

ceph-0  ceph-1

sadhu@ubuntu3:~$ ceph osd tree

# id    weight  type name       up/down reweight

-1      0.14    root default

-2      0.14            host ubuntu3

0       0.06999                 osd.0   down    0

1       0.06999                 osd.1   up      1

sadhu@ubuntu3:~$ sudo /etc/init.d/ceph -a start 0

/etc/init.d/ceph: 0. not found (/etc/ceph/ceph.conf defines ,
/var/lib/ceph defines )

sadhu@ubuntu3:~$ sudo /etc/init.d/ceph -a start osd.0

/etc/init.d/ceph: osd.0 not found (/etc/ceph/ceph.conf defines ,
/var/lib/ceph defines )

Ceph health status in warning mode.

pg 4.10 is active+degraded, acting [1]

pg 3.17 is active+degraded, acting [1]

pg 5.16 is active+degraded, acting [1]

pg 4.17 is active+degraded, acting [1]

pg 3.10 is active+degraded, acting [1]

recovery 62/124 degraded (50.000%)

mds.ceph@ubuntu3 at 10.147.41.3:6803/2148 is laggy/unresponsi

regards

sadhu





[ceph-users] How to set Object Size/Stripe Width/Stripe Count?

2013-08-08 Thread Da Chun
Hi list,
I saw the info about data striping in 
http://ceph.com/docs/master/architecture/#data-striping .
But I couldn't find the way to set these values.


Could you please tell me how to do that, or give me a link? Thanks!


Re: [ceph-users] minimum object size in ceph

2013-08-08 Thread Sage Weil
On Wed, 7 Aug 2013, Nulik Nol wrote:
 thanks Dan,
 i meant like PRIMARY KEY in a RDBMS, or Key for NoSQL (key-value pair)
 database to perform put() get() operations. Well, if it is string then
 it's ok, I can print binary keys in HEX or uuencode or something like
 that.
 Is there a limit on maximum string length for object name?

It is pretty long.. I think 4096 characters, although things are not quite 
as efficient on the backend when names are long.

sage
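
The same point is easy to see from the rados CLI, where the key is just
whatever name string you pass in (the pool name below is only an example):

rados -p testpool put some.arbitrary-object_name /etc/hostname
rados -p testpool stat some.arbitrary-object_name
rados -p testpool get some.arbitrary-object_name /tmp/out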

 
 Regards
 Nulik
 
 On Tue, Aug 6, 2013 at 4:08 PM, Dan Mick dan.m...@inktank.com wrote:
  No minimum object size.  As for key, not sure what you mean; the closest
  thing to an object 'key' is its name, but it's obvious from routines like
  rados_read() and rados_write() that that's a const char *.  Did you mean
  some other key?
 
 
  On 08/06/2013 12:13 PM, Nulik Nol wrote:
 
  Hi,
 
  when using the C api (RADOS) what is the minimum object size ? And
  what is the key type ? (uint64_t, char[], or something like that ?)
 
  TIA
  Nulik
 
 
  --
  Dan Mick, Filesystem Engineering
  Inktank Storage, Inc.   http://inktank.com
  Ceph docs: http://ceph.com/docs
 
 


Re: [ceph-users] How to set Object Size/Stripe Width/Stripe Count?

2013-08-08 Thread johnu
This can help you.
http://www.sebastien-han.fr/blog/2013/02/11/mount-a-specific-pool-with-cephfs/
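
For RBD images these are per-image creation options; a sketch with made-up
pool and image names (note that the stripe options need a format 2 image,
and on older clients the flag is --format 2 rather than --image-format 2):

# 4 MB objects (order 22), 64 KB stripe unit, 16 objects per stripe set
rbd create mypool/myimage --size 10240 --image-format 2 \
    --order 22 --stripe-unit 65536 --stripe-count 16
rbd info mypool/myimage    # check the resulting order and striping

For CephFS the layout is a per-file/per-directory property set from the
client (which is what the linked post touches on); plain RADOS objects are
not striped by RADOS itself, the striping is done by the client libraries as
the architecture doc describes.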


On Thu, Aug 8, 2013 at 7:48 AM, Da Chun ng...@qq.com wrote:

 Hi list,
 I saw the info about data striping in
 http://ceph.com/docs/master/architecture/#data-striping .
 But couldn't find the way to set these values.

 Could you please tell me how to that or give me a link? Thanks!



Re: [ceph-users] kernel BUG at net/ceph/osd_client.c:2103

2013-08-08 Thread Laurent Barbe

Hello,

I don't know if it's useful, but I can also reproduce this bug with :
rbd kernel 3.10.4
ceph osd 0.61.4
image format 2

rbd formatted with xfs; after some snapshots and mount/umount tests (no
writes on the file system), the xfs mount segfaults and the kernel shows the same log.


Cheers,

Laurent Barbe


On 05/08/2013 07:22, Olivier Bonvalet wrote:

Yes of course, thanks !

On Sunday 04 August 2013 at 20:59 -0700, Sage Weil wrote:

Hi Olivier,

This looks like http://tracker.ceph.com/issues/5760.  We should be able to
look at this more closely this week.  In the meantime, you might want to
go back to 3.9.x.  If we have a patch that addresses the bug, would you be
able to test it?

Thanks!
sage


On Mon, 5 Aug 2013, Olivier Bonvalet wrote:

Sorry, the dev list is probably a better place for that one.

On Monday 05 August 2013 at 03:07 +0200, Olivier Bonvalet wrote:

Hi,

I've just upgraded a Xen Dom0 (Debian Wheezy with Xen 4.2.2) from Linux
3.9.11 to Linux 3.10.5, and now I have kernel panic after launching some
VM which use RBD kernel client.


In kernel logs, I have :

Aug  5 02:51:22 murmillia kernel: [  289.205652] kernel BUG at 
net/ceph/osd_client.c:2103!
Aug  5 02:51:22 murmillia kernel: [  289.205725] invalid opcode:  [#1] SMP
Aug  5 02:51:22 murmillia kernel: [  289.205908] Modules linked in: cbc rbd 
libceph libcrc32c xen_gntdev ip6table_mangle ip6t_REJECT ip6table_filter 
ip6_tables xt_DSCP iptable_mangle xt_LOG xt_physdev ipt_REJECT xt_tcpudp 
iptable_filter ip_tables x_tables bridge loop coretemp ghash_clmulni_intel 
aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt 
iTCO_vendor_support gpio_ich microcode serio_raw sb_edac edac_core evdev 
lpc_ich i2c_i801 mfd_core wmi ac ioatdma shpchp button dm_mod hid_generic 
usbhid hid sg sd_mod crc_t10dif crc32c_intel isci megaraid_sas libsas ahci 
libahci ehci_pci ehci_hcd libata scsi_transport_sas igb scsi_mod i2c_algo_bit 
ixgbe usbcore i2c_core dca usb_common ptp pps_core mdio
Aug  5 02:51:22 murmillia kernel: [  289.210499] CPU: 2 PID: 5326 Comm: 
blkback.3.xvda Not tainted 3.10-dae-dom0 #1
Aug  5 02:51:22 murmillia kernel: [  289.210617] Hardware name: Supermicro 
X9DRW-7TPF+/X9DRW-7TPF+, BIOS 2.0a 03/11/2013
Aug  5 02:51:22 murmillia kernel: [  289.210738] task: 880037d01040 ti: 
88003803a000 task.ti: 88003803a000
Aug  5 02:51:22 murmillia kernel: [  289.210858] RIP: e030:[a02d21d0]  
[a02d21d0] ceph_osdc_build_request+0x2bb/0x3c6 [libceph]
Aug  5 02:51:22 murmillia kernel: [  289.211062] RSP: e02b:88003803b9f8  
EFLAGS: 00010212
Aug  5 02:51:22 murmillia kernel: [  289.211154] RAX: 880033a181c0 RBX: 
880033a182ec RCX: 
Aug  5 02:51:22 murmillia kernel: [  289.211251] RDX: 880033a182af RSI: 
8050 RDI: 880030d34888
Aug  5 02:51:22 murmillia kernel: [  289.211347] RBP: 2000 R08: 
88003803ba58 R09: 
Aug  5 02:51:22 murmillia kernel: [  289.211444] R10:  R11: 
 R12: 880033ba3500
Aug  5 02:51:22 murmillia kernel: [  289.211541] R13: 0001 R14: 
88003847aa78 R15: 88003847ab58
Aug  5 02:51:22 murmillia kernel: [  289.211644] FS:  7f775da8c700() 
GS:88003f84() knlGS:
Aug  5 02:51:22 murmillia kernel: [  289.211765] CS:  e033 DS:  ES:  
CR0: 80050033
Aug  5 02:51:22 murmillia kernel: [  289.211858] CR2: 7fa21ee2c000 CR3: 
2be14000 CR4: 00042660
Aug  5 02:51:22 murmillia kernel: [  289.211956] DR0:  DR1: 
 DR2: 
Aug  5 02:51:22 murmillia kernel: [  289.212052] DR3:  DR6: 
0ff0 DR7: 0400
Aug  5 02:51:22 murmillia kernel: [  289.212148] Stack:
Aug  5 02:51:22 murmillia kernel: [  289.212232]  2000 
00243847aa78  880039949b40
Aug  5 02:51:22 murmillia kernel: [  289.212577]  2201 
880033811d98 88003803ba80 88003847aa78
Aug  5 02:51:22 murmillia kernel: [  289.212921]  880030f24380 
880002a38400 2000 a029584c
Aug  5 02:51:22 murmillia kernel: [  289.213264] Call Trace:
Aug  5 02:51:22 murmillia kernel: [  289.213358]  [a029584c] ? 
rbd_osd_req_format_write+0x71/0x7c [rbd]
Aug  5 02:51:22 murmillia kernel: [  289.213459]  [a0296f05] ? 
rbd_img_request_fill+0x695/0x736 [rbd]
Aug  5 02:51:22 murmillia kernel: [  289.213562]  [810c96a7] ? 
arch_local_irq_restore+0x7/0x8
Aug  5 02:51:22 murmillia kernel: [  289.213667]  [81357ff8] ? 
down_read+0x9/0x19
Aug  5 02:51:22 murmillia kernel: [  289.213763]  [a029828a] ? 
rbd_request_fn+0x191/0x22e [rbd]
Aug  5 02:51:22 murmillia kernel: [  289.213864]  [8117ac9e] ? 
__blk_run_queue_uncond+0x1e/0x26
Aug  5 02:51:22 murmillia kernel: [  289.213962]  [8117b7aa] ? 
blk_flush_plug_list+0x1c1/0x1e4
Aug  5 02:51:22 murmillia kernel: [  

Re: [ceph-users] how to recover the osd.

2013-08-08 Thread Suresh Sadhu
Thanks Mike, please find the output of the two commands:

sadhu@ubuntu3:~$ ls /var/lib/ceph/osd/ceph-0
sadhu@ubuntu3:~$ cat /etc/ceph/ceph.conf
[global]
fsid = 593dac9e-ce55-4803-acb4-2d32b4e0d3be
mon_initial_members = ubuntu3
mon_host = 10.147.41.3
#auth_supported = cephx
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd_journal_size = 1024
filestore_xattr_use_omap = true

-Original Message-
From: Mike Dawson [mailto:mike.daw...@cloudapt.com] 
Sent: 08 August 2013 18:50
To: Suresh Sadhu
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] how to recover the osd.

Looks like you didn't get osd.0 deployed properly. Can you show:

- ls /var/lib/ceph/osd/ceph-0
- cat /etc/ceph/ceph.conf


Thanks,

Mike Dawson
Co-Founder  Director of Cloud Architecture Cloudapt LLC
6330 East 75th Street, Suite 170
Indianapolis, IN 46250

On 8/8/2013 9:13 AM, Suresh Sadhu wrote:
 HI,

 My storage health cluster is warning state , one of the osd is in down 
 state and even if I try to start the osd it fail to start

 sadhu@ubuntu3:~$ ceph osd stat

 e22: 2 osds: 1 up, 1 in

 sadhu@ubuntu3:~$ ls /var/lib/ceph/osd/

 ceph-0  ceph-1

 sadhu@ubuntu3:~$ ceph osd tree

 # idweight  type name   up/down reweight

 -1  0.14root default

 -2  0.14host ubuntu3

 0   0.06999 osd.0   down0

 1   0.06999 osd.1   up  1

 sadhu@ubuntu3:~$ sudo /etc/init.d/ceph -a start 0

 /etc/init.d/ceph: 0. not found (/etc/ceph/ceph.conf defines , 
 /var/lib/ceph defines )

 sadhu@ubuntu3:~$ sudo /etc/init.d/ceph -a start osd.0

 /etc/init.d/ceph: osd.0 not found (/etc/ceph/ceph.conf defines , 
 /var/lib/ceph defines )

 Ceph health status in warning mode.

 pg 4.10 is active+degraded, acting [1]

 pg 3.17 is active+degraded, acting [1]

 pg 5.16 is active+degraded, acting [1]

 pg 4.17 is active+degraded, acting [1]

 pg 3.10 is active+degraded, acting [1]

 recovery 62/124 degraded (50.000%)

 mds.ceph@ubuntu3 at 10.147.41.3:6803/2148 is laggy/unresponsi

 regards

 sadhu





Re: [ceph-users] Replace all monitors

2013-08-08 Thread Sage Weil
On Thu, 8 Aug 2013, Olivier Bonvalet wrote:
 Hi,
 
 from now I have 5 monitors which share slow SSD with several OSD
 journal. As a result, each data migration operation (reweight, recovery,
 etc) is very slow and the cluster is near down.
 
 So I have to change that. I'm looking to replace this 5 monitors by 3
 new monitors, which still share (very fast) SSD with several OSD.
 I suppose it's not a good idea, since monitors should have a dedicated
 storage. What do you think about that ?
 Is it a better practice to have dedicated storage, but share CPU with
 Xen VM ?

I think it's okay, as long as you aren't worried about the device filling
up and the monitors are on different hosts.

 Second point, I'm not sure how to do that migration, without downtime.
 I was hoping to add the 3 new monitors, then progressively remove the 5
 old monitors, but in the doc [1] indicate a special procedure for
 unhealthy cluster, which seem to be for clusters with damaged monitors,
 right ? In my case I only have dead PG [2] (#5226), from which I can't
 recover, but monitors are fine. Can I use the standard procedure ?

The 'healthy' caveat in this case is about the monitor cluster; the
special procedure is only needed if you don't have enough healthy mons to
form a quorum.  The normal procedure should work just fine.

sage
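
In practice the rolling procedure from that doc boils down to something like
the commands below, run one monitor at a time (the monitor names and the
address are placeholders for your own):

# add a new monitor
ceph auth get mon. -o /tmp/mon.keyring
ceph mon getmap -o /tmp/monmap
ceph-mon -i newmon1 --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
ceph mon add newmon1 192.0.2.11:6789
service ceph start mon.newmon1    # or 'start ceph-mon id=newmon1' with upstart

# once quorum includes the new monitors, retire an old one
service ceph stop mon.oldmon1
ceph mon remove oldmon1

Check ceph quorum_status between steps so the cluster never drops below a
majority of monitors.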


Re: [ceph-users] qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Qemu-devel] [Bug 1207686]

2013-08-08 Thread Josh Durgin

On 08/08/2013 05:40 AM, Oliver Francke wrote:

Hi Josh,

I have a session logged with:

 debug_ms=1:debug_rbd=20:debug_objectcacher=30

as you requested from Mike, even if I think, we do have another story
here, anyway.

Host-kernel is: 3.10.0-rc7, qemu-client 1.6.0-rc2, client-kernel is
3.2.0-51-amd...

Do you want me to open a ticket for that stuff? I have about 5MB
compressed logfile waiting for you ;)


Yes, that'd be great. If you could include the time when you saw the
guest hang, that'd be ideal. I'm not sure if this is one or two bugs,
but it seems likely it's a bug in rbd and not qemu.

Thanks!
Josh


Thnx in advance,

Oliver.

On 08/05/2013 09:48 AM, Stefan Hajnoczi wrote:

On Sun, Aug 04, 2013 at 03:36:52PM +0200, Oliver Francke wrote:

On 02.08.2013 at 23:47, Mike Dawson mike.daw...@cloudapt.com wrote:

We can un-wedge the guest by opening a NoVNC session or running a
'virsh screenshot' command. After that, the guest resumes and runs
as expected. At that point we can examine the guest. Each time we'll
see:

If virsh screenshot works then this confirms that QEMU itself is still
responding.  Its main loop cannot be blocked since it was able to
process the screendump command.

This supports Josh's theory that a callback is not being invoked.  The
virtio-blk I/O request would be left in a pending state.

Now here is where the behavior varies between configurations:

On a Windows guest with 1 vCPU, you may see the symptom that the guest no
longer responds to ping.

On a Linux guest with multiple vCPUs, you may see the hung task message
from the guest kernel because other vCPUs are still making progress.
Just the vCPU that issued the I/O request and whose task is in
UNINTERRUPTIBLE state would really be stuck.

Basically, the symptoms depend not just on how QEMU is behaving but also
on the guest kernel and how many vCPUs you have configured.

I think this can explain how both problems you are observing, Oliver and
Mike, are a result of the same bug.  At least I hope they are :).

Stefan







Re: [ceph-users] how to recover the osd.

2013-08-08 Thread Suresh Sadhu
Earlier it was created properly. After rebooting the host the mount points
were gone, which is why the ls command showed nothing earlier. Now I have
mounted them again and am able to see the same folder structure:

sadhu@ubuntu3:/var/lib/ceph$ ls /var/lib/ceph/osd/ceph-1
activate.monmap  active  ceph_fsid  current  fsid  journal  keyring  magic  
ready  store_version  upstart  whoami
sadhu@ubuntu3:/var/lib/ceph$ ls /var/lib/ceph/osd/ceph-0
activate.monmap  active  ceph_fsid  current  fsid  journal  keyring  magic  
ready  store_version  upstart  whoami
sadhu@ubuntu3:/var/lib/ceph$ mount


sadhu@ubuntu3:/var/lib/ceph$ ceph osd stat
e31: 2 osds: 2 up, 2 in

It still shows the ceph health status as a warning:

sadhu@ubuntu3:/var/lib/ceph$ ceph health
HEALTH_WARN 225 pgs degraded; 676 pgs stuck unclean; recovery 21/124 degraded 
(16.935%); mds ceph@ubuntu3 is laggy

Thanks
sadhu
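
To keep those data mounts from disappearing on the next reboot, a minimal
approach is to pin them in /etc/fstab; the device names below are only
examples, use whatever mount reported for ceph-0 and ceph-1 (or, better, the
filesystem UUIDs from blkid):

# /etc/fstab (example entries only)
/dev/sdb1   /var/lib/ceph/osd/ceph-0   xfs   noatime,inode64   0 0
/dev/sdc1   /var/lib/ceph/osd/ceph-1   xfs   noatime,inode64   0 0

Alternatively, if the disks were prepared with ceph-deploy/ceph-disk and
carry the GPT type codes, ceph-disk activate /dev/sdb1 (and the udev rules
shipped with the packages) can mount and start them automatically.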



-Original Message-
From: Mike Dawson [mailto:mike.daw...@cloudapt.com] 
Sent: 08 August 2013 22:08
To: Suresh Sadhu
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] how to recover the osd.


On 8/8/2013 12:30 PM, Suresh Sadhu wrote:
 Thanks Mike,Please find the output of two commands

 sadhu@ubuntu3:~$ ls /var/lib/ceph/osd/ceph-0

^^^ that is a problem. It appears that osd.0 didn't get deployed properly. To 
see an example of what structure should be there, do:

ls /var/lib/ceph/osd/ceph-1

ceph-0 should be similar to the apparently working ceph-1 on your cluster.

It should look similar to:

#ls /var/lib/ceph/osd/ceph-0
ceph_fsid
current
fsid
keyring
magic
ready
store_version
whoami

- Mike

 sadhu@ubuntu3:~$ cat /etc/ceph/ceph.conf [global] fsid = 
 593dac9e-ce55-4803-acb4-2d32b4e0d3be
 mon_initial_members = ubuntu3
 mon_host = 10.147.41.3
 #auth_supported = cephx
 auth cluster required = cephx
 auth service required = cephx
 auth client required = cephx
 osd_journal_size = 1024
 filestore_xattr_use_omap = true

 -Original Message-
 From: Mike Dawson [mailto:mike.daw...@cloudapt.com]
 Sent: 08 August 2013 18:50
 To: Suresh Sadhu
 Cc: ceph-users@lists.ceph.com
 Subject: Re: [ceph-users] how to recover the osd.

 Looks like you didn't get osd.0 deployed properly. Can you show:

 - ls /var/lib/ceph/osd/ceph-0
 - cat /etc/ceph/ceph.conf


 Thanks,

 Mike Dawson
 Co-Founder  Director of Cloud Architecture Cloudapt LLC
 6330 East 75th Street, Suite 170
 Indianapolis, IN 46250

 On 8/8/2013 9:13 AM, Suresh Sadhu wrote:
 HI,

 My storage health cluster is warning state , one of the osd is in 
 down state and even if I try to start the osd it fail to start

 sadhu@ubuntu3:~$ ceph osd stat

 e22: 2 osds: 1 up, 1 in

 sadhu@ubuntu3:~$ ls /var/lib/ceph/osd/

 ceph-0  ceph-1

 sadhu@ubuntu3:~$ ceph osd tree

 # idweight  type name   up/down reweight

 -1  0.14root default

 -2  0.14host ubuntu3

 0   0.06999 osd.0   down0

 1   0.06999 osd.1   up  1

 sadhu@ubuntu3:~$ sudo /etc/init.d/ceph -a start 0

 /etc/init.d/ceph: 0. not found (/etc/ceph/ceph.conf defines , 
 /var/lib/ceph defines )

 sadhu@ubuntu3:~$ sudo /etc/init.d/ceph -a start osd.0

 /etc/init.d/ceph: osd.0 not found (/etc/ceph/ceph.conf defines , 
 /var/lib/ceph defines )

 Ceph health status in warning mode.

 pg 4.10 is active+degraded, acting [1]

 pg 3.17 is active+degraded, acting [1]

 pg 5.16 is active+degraded, acting [1]

 pg 4.17 is active+degraded, acting [1]

 pg 3.10 is active+degraded, acting [1]

 recovery 62/124 degraded (50.000%)

 mds.ceph@ubuntu3 at 10.147.41.3:6803/2148 is laggy/unresponsi

 regards

 sadhu





[ceph-users] Chat Logs: Ceph Dev Summit

2013-08-08 Thread Ross Turk

Hey all - I just posted the IRC chat logs from the Ceph Developer Summit. 
You can find them on the wiki, one log for sessions 1-16 and another for
sessions 17-29:

http://wiki.ceph.com/01Planning/CDS/Emperor/Chat_Log%3A_Sessions_1-16
http://wiki.ceph.com/01Planning/CDS/Emperor/Chat_Log%3A_Sessions_17-29

Cheers,
Ross




[ceph-users] Backup monmap, osdmap, and crushmap

2013-08-08 Thread Craig Lewis
I've seen a couple posts here about broken clusters that had to be repaired
by modifying the monmap, osdmap, or the crush rules.

The old school sysadmin in me says it would be a good idea to make
backups of these 3 databases.  So far though, it seems like everybody
was able to repair their clusters by dumping the current map and
modifying it.

I'll probably do it, just to assuage my paranoia, but I was wondering
what you guys thought.



I'm thinking of cronning this on the MON servers:
#!/usr/bin/env bash

# Number of days to keep backups
cleanup_age=10

# Fetch the current timestamp, to use in the backup filenames
date=$(date +%Y-%m-%dT%H:%M:%S)

# Dump the current maps
cd /var/lib/ceph/backups/
ceph mon getmap -o ./monmap.${date}
ceph osd getmap -o ./osdmap.${date}
ceph osd getcrushmap -o ./crushmap.${date}

# Delete old maps
find . -type f -regextype posix-extended -regex \
    '\./(mon|osd|crush)map\..*' -mtime +${cleanup_age} -print0 | xargs -0 rm
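
and then a cron entry along these lines (the script path and schedule here
are arbitrary, of course):

# /etc/cron.d/ceph-map-backup
0 3 * * *   root   /var/lib/ceph/backups/backup-maps.sh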






Re: [ceph-users] Replace all monitors

2013-08-08 Thread Olivier Bonvalet
On Thursday 08 August 2013 at 09:43 -0700, Sage Weil wrote:
 On Thu, 8 Aug 2013, Olivier Bonvalet wrote:
  Hi,
  
  from now I have 5 monitors which share slow SSD with several OSD
  journal. As a result, each data migration operation (reweight, recovery,
  etc) is very slow and the cluster is near down.
  
  So I have to change that. I'm looking to replace this 5 monitors by 3
  new monitors, which still share (very fast) SSD with several OSD.
  I suppose it's not a good idea, since monitors should have a dedicated
  storage. What do you think about that ?
  Is it a better practice to have dedicated storage, but share CPU with
  Xen VM ?
 
 I think it's okay, as long as you aren't worried about the device filling
 up and the monitors are on different hosts.

Not sure I understand: by «dedicated storage», I was talking about the
monitors. Can I put monitors on a Xen «host», if they have dedicated
storage?

 
  Second point, I'm not sure how to do that migration, without downtime.
  I was hoping to add the 3 new monitors, then progressively remove the 5
  old monitors, but in the doc [1] indicate a special procedure for
  unhealthy cluster, which seem to be for clusters with damaged monitors,
  right ? In my case I only have dead PG [2] (#5226), from which I can't
  recover, but monitors are fine. Can I use the standard procedure ?
 
 The 'healthy' caveat in this case is about the monitor cluster; the
 special procedure is only needed if you don't have enough healthy mons to
 form a quorum.  The normal procedure should work just fine.
 

Great, thanks !


 sage
 




Re: [ceph-users] journal on ssd

2013-08-08 Thread Joao Pedras
Let me just clarify... the prepare process created all 10 partitions on sdg;
the thing is that only 2 (sdg1, sdg2) would be present in /dev. The partx
bit is just a hack, as I am not familiar with the entire sequence.
Initially I was deploying this test cluster on 5 nodes, each with 10
spinners, 1 OS spinner, and 1 SSD for journals. *All* nodes would only bring
up the first 2 OSDs.

From the start the partitions for journals are there:
~]# parted /dev/sdg
GNU Parted 2.1
Using /dev/sdg
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) p
Model: ATA Samsung SSD 840 (scsi)
Disk /dev/sdg: 512GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End SizeFile system  Name  Flags
 1  1049kB  4295MB  4294MB   ceph journal
 2  4296MB  8590MB  4294MB   ceph journal
 3  8591MB  12.9GB  4294MB   ceph journal
 4  12.9GB  17.2GB  4294MB   ceph journal
 5  17.2GB  21.5GB  4294MB   ceph journal
 6  21.5GB  25.8GB  4294MB   ceph journal
 7  25.8GB  30.1GB  4294MB   ceph journal
 8  30.1GB  34.4GB  4294MB   ceph journal
 9  34.4GB  38.7GB  4294MB   ceph journal
10  38.7GB  42.9GB  4294MB   ceph journal

After partx all the entries show up under /dev and I have been able to
install the cluster successfully.

The only weirdness happened with only one node. Not everything was entirely
active+clean. That got resolved after I added the 2nd node.

At the moment with 3 nodes:
2013-08-08 17:38:38.328991 mon.0 [INF] pgmap v412: 192 pgs: 192
active+clean; 9518 bytes data, 1153 MB used, 83793 GB / 83794 GB avail

Thanks,



On Thu, Aug 8, 2013 at 8:17 AM, Sage Weil s...@inktank.com wrote:

 On Wed, 7 Aug 2013, Tren Blackburn wrote:
  On Tue, Aug 6, 2013 at 11:14 AM, Joao Pedras jpped...@gmail.com wrote:
Greetings all.
  I am installing a test cluster using one ssd (/dev/sdg) to hold the
  journals. Ceph's version is 0.61.7 and I am using ceph-deploy obtained
  from ceph's git yesterday. This is on RHEL6.4, fresh install.
 
  When preparing the first 2 drives, sda and sdb, all goes well and the
  journals get created in sdg1 and sdg2:
 
  $ ceph-deploy osd prepare ceph00:sda:sdg ceph00:sdb:sdg
  [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks
  ceph00:/dev/sda:/dev/sdg ceph00:/dev/sdb:/dev/sdg
  [ceph_deploy.osd][DEBUG ] Deploying osd to ceph00
  [ceph_deploy.osd][DEBUG ] Host ceph00 is now ready for osd use.
  [ceph_deploy.osd][DEBUG ] Preparing host ceph00 disk /dev/sda journal
  /dev/sdg activate False
  [ceph_deploy.osd][DEBUG ] Preparing host ceph00 disk /dev/sdb journal
  /dev/sdg activate False
 
  When preparing sdc or any disk after the first 2 I get the following
  in that osd's log but no errors on ceph-deploy:
 
  # tail -f /var/log/ceph/ceph-osd.2.log
  2013-08-06 10:51:36.655053 7f5ba701a780  0 ceph version 0.61.7
  (8f010aff684e820ecc837c25ac77c7a05d7191ff), process ceph-osd, pid
  11596
  2013-08-06 10:51:36.658671 7f5ba701a780  1
  filestore(/var/lib/ceph/tmp/mnt.i2NK47) mkfs in
  /var/lib/ceph/tmp/mnt.i2NK47
  2013-08-06 10:51:36.658697 7f5ba701a780  1
  filestore(/var/lib/ceph/tmp/mnt.i2NK47) mkfs fsid is already set to
  5d1beb09-1f80-421d-a88c-57789e2fc33e
  2013-08-06 10:51:36.813783 7f5ba701a780  1
  filestore(/var/lib/ceph/tmp/mnt.i2NK47) leveldb db exists/created
  2013-08-06 10:51:36.813964 7f5ba701a780 -1 journal FileJournal::_open:
  disabling aio for non-block journal.  Use journal_force_aio to force
  use of aio anyway
  2013-08-06 10:51:36.813999 7f5ba701a780  1 journal _open
  /var/lib/ceph/tmp/mnt.i2NK47/journal fd 10: 0 bytes, block size 4096
  bytes, directio = 1, aio = 0
  2013-08-06 10:51:36.814035 7f5ba701a780 -1 journal check: ondisk fsid
  ---- doesn't match expected
  5d1beb09-1f80-421d-a88c-57789e2fc33e, invalid (someone else's?)
  journal
  2013-08-06 10:51:36.814093 7f5ba701a780 -1
  filestore(/var/lib/ceph/tmp/mnt.i2NK47) mkjournal error creating
  journal on /var/lib/ceph/tmp/mnt.i2NK47/journal: (22) Invalid argument
  2013-08-06 10:51:36.814125 7f5ba701a780 -1 OSD::mkfs: FileStore::mkfs
  failed with error -22
  2013-08-06 10:51:36.814185 7f5ba701a780 -1  ** ERROR: error creating
  empty object store in /var/lib/ceph/tmp/mnt.i2NK47: (22) Invalid
  argument
 
  I have cleaned the disks with dd, zapped them and so forth but this
  always occurs. If doing sdc/sdd first, for example, then sda or
  whatever follows fails with similar errors.
 
  Does anyone have any insight on this issue?

 Very strange!

  What does the partition table look like at this point?  Does the journal
  symlink in the osd data directory point to the right partition/device on
  the failing osd?

 sage




-- 
Joao Pedras

Re: [ceph-users] journal on ssd

2013-08-08 Thread Joao Pedras
I might be able to give that a shot tomorrow as I will probably reinstall
this set.
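
Concretely, the test Sage suggests below would look roughly like this,
assuming the journal SSD is /dev/sdg (the type code is the GUID ceph-disk
uses for journal partitions; double-check it against your ceph-disk version
if in doubt):

ceph-disk zap /dev/sdg
# create one 4 GB journal partition the same way ceph-disk prepare would
sgdisk --new=1:0:+4G --change-name=1:'ceph journal' \
       --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdg
ls /dev/sdg*        # does /dev/sdg1 show up on its own?
partx -a /dev/sdg   # the workaround used above, if it does not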


On Thu, Aug 8, 2013 at 6:19 PM, Sage Weil s...@inktank.com wrote:

 On Thu, 8 Aug 2013, Joao Pedras wrote:
  Let me just clarify... the prepare process created all 10 partitions in
 sdg
  the thing is that only 2 (sdg1, sdg2) would be present in /dev. The partx
  bit is just a hack as I am not familiar with the entire sequence.
 Initially
  I was deploying this test cluster in 5 nodes, each with 10 spinners, 1 OS
  spinner, 1 ssd for journal. *All* nodes would only bring up the first 2
  osds.
 
  From the start the partitions for journals are there:
  ~]# parted /dev/sdg
  GNU Parted 2.1
  Using /dev/sdg
  Welcome to GNU Parted! Type 'help' to view a list of commands.
  (parted) p
  Model: ATA Samsung SSD 840 (scsi)
  Disk /dev/sdg: 512GB
  Sector size (logical/physical): 512B/512B
  Partition Table: gpt
 
  Number  Start   End SizeFile system  Name  Flags
   1  1049kB  4295MB  4294MB   ceph journal
   2  4296MB  8590MB  4294MB   ceph journal
   3  8591MB  12.9GB  4294MB   ceph journal
   4  12.9GB  17.2GB  4294MB   ceph journal
   5  17.2GB  21.5GB  4294MB   ceph journal
   6  21.5GB  25.8GB  4294MB   ceph journal
   7  25.8GB  30.1GB  4294MB   ceph journal
   8  30.1GB  34.4GB  4294MB   ceph journal
   9  34.4GB  38.7GB  4294MB   ceph journal
  10  38.7GB  42.9GB  4294MB   ceph journal
 
  After partx all the entries show up under /dev and I have been able to
  install the cluster successfully.

 This really seems like something that udev should be doing.  I think the
 next step would be to reproduce the problem directly, by wiping the
 partition table (ceph-disk zap /dev/sdg) and running the sgdisk commands
 to create the partitions directly from the command line, and then
 verifying that the /dev entries are (not) present.

 It may be that our ugly ceph-disk-udev helper is throwing a wrench in
 things, but I'm not sure offhand how that would be.  Once you have a
  sequence that reproduces the problem, though, we can experiment (by e.g.
  disabling the ceph helper to rule that out).

 sage


 
  The only weirdness happened with only one node. Not everything was
 entirely
  active+clean. That got resolved after I added the 2nd node.
 
  At the moment with 3 nodes:
  2013-08-08 17:38:38.328991 mon.0 [INF] pgmap v412: 192 pgs: 192
  active+clean; 9518 bytes data, 1153 MB used, 83793 GB / 83794 GB avail
 
  Thanks,
 
 
 
  On Thu, Aug 8, 2013 at 8:17 AM, Sage Weil s...@inktank.com wrote:
On Wed, 7 Aug 2013, Tren Blackburn wrote:
 On Tue, Aug 6, 2013 at 11:14 AM, Joao Pedras
jpped...@gmail.com wrote:
   Greetings all.
 I am installing a test cluster using one ssd (/dev/sdg) to
hold the
 journals. Ceph's version is 0.61.7 and I am using ceph-deploy
obtained
 from ceph's git yesterday. This is on RHEL6.4, fresh install.

 When preparing the first 2 drives, sda and sdb, all goes well
and the
 journals get created in sdg1 and sdg2:

 $ ceph-deploy osd prepare ceph00:sda:sdg ceph00:sdb:sdg
 [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks
 ceph00:/dev/sda:/dev/sdg ceph00:/dev/sdb:/dev/sdg
 [ceph_deploy.osd][DEBUG ] Deploying osd to ceph00
 [ceph_deploy.osd][DEBUG ] Host ceph00 is now ready for osd
use.
 [ceph_deploy.osd][DEBUG ] Preparing host ceph00 disk /dev/sda
journal
 /dev/sdg activate False
 [ceph_deploy.osd][DEBUG ] Preparing host ceph00 disk /dev/sdb
journal
 /dev/sdg activate False

 When preparing sdc or any disk after the first 2 I get the
following
 in that osd's log but no errors on ceph-deploy:

 # tail -f /var/log/ceph/ceph-osd.2.log
 2013-08-06 10:51:36.655053 7f5ba701a780  0 ceph version 0.61.7
 (8f010aff684e820ecc837c25ac77c7a05d7191ff), process ceph-osd,
pid
 11596
 2013-08-06 10:51:36.658671 7f5ba701a780  1
 filestore(/var/lib/ceph/tmp/mnt.i2NK47) mkfs in
 /var/lib/ceph/tmp/mnt.i2NK47
 2013-08-06 10:51:36.658697 7f5ba701a780  1
 filestore(/var/lib/ceph/tmp/mnt.i2NK47) mkfs fsid is already
set to
 5d1beb09-1f80-421d-a88c-57789e2fc33e
 2013-08-06 10:51:36.813783 7f5ba701a780  1
 filestore(/var/lib/ceph/tmp/mnt.i2NK47) leveldb db
exists/created
 2013-08-06 10:51:36.813964 7f5ba701a780 -1 journal
FileJournal::_open:
 disabling aio for non-block journal.  Use journal_force_aio to
force
 use of aio anyway
 2013-08-06 10:51:36.813999 7f5ba701a780  1 journal _open
 /var/lib/ceph/tmp/mnt.i2NK47/journal fd 10: 0 bytes, 

[ceph-users] ceph-deploy behind corporate firewalls

2013-08-08 Thread Harvey Skinner
 hi all,

I am not sure if I am the only one having issues with ceph-deploy
behind a firewall or not.  I haven't seen any other reports of similar
issues yet.  With http proxies I am able to have apt-get working, but
wget is still an issue.

I am working to use the newer ceph-deploy mechanism to deploy my next POC
setup on four storage nodes.  The ceph-deploy install process
unfortunately uses wget to retrieve the Ceph release key, and that is
failing the install.  To get around this I can manually add the Ceph release
key on all my nodes and apt-get install all the Ceph packages.
The question, though, is whether there is anything else that ceph-deploy
does that I would need to do manually to have everything in a state
where ceph-deploy would work correctly for the rest of the cluster
setup and deployment, i.e. ceph-deploy new and ceph-deploy mon
create, etc.?

thank you,
Harvey


Re: [ceph-users] ceph-deploy behind corporate firewalls

2013-08-08 Thread Sage Weil
On Thu, 8 Aug 2013, Harvey Skinner wrote:
  hi all,
 
 I am not sure if I am the only one having issues with ceph-deploy
 behind a firewall or not.  I haven't seen any other reports of similar
 issues yet.  With http proxies I am able to have apt-get working, but
 wget is still an issue.

This is indeed a problem for many users.  It's on our list of things to 
add to the tool!

 Working to use the newer ceph-deploy mechanism to deploy my next POC
 set up on four storage nodes.   The ceph-deploy install process
 unfortunately uses wget to retrieve the Ceph release key and failing
 the install.   To get around this i can manually add the Ceph release
 key on all my nodes and apt-get install all the Ceph packages.
 Question though is whether there is anything else that ceph-deploy
 does that I would need to do manually to have everything in state
 where ceph-deploy would work correctly for the rest of the cluster
 setup and deployment, i.e. ceph-deploy new  -and- ceph-deploy mon
 create, etc.?

I'm pretty sure install is the only thing that needs apt or wget; you 
should be fine.

sage
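
Until that lands in the tool, one workaround is to point wget itself at the
proxy on every node, e.g. via /etc/wgetrc (the proxy host and port here are
placeholders):

# /etc/wgetrc
use_proxy = on
http_proxy  = http://proxy.example.com:3128/
https_proxy = http://proxy.example.com:3128/

With that in place, the release-key fetch that ceph-deploy runs over wget
should go through the proxy just like apt-get already does.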