Re: radosgw segfault in 0.56

2013-01-07 Thread Sylvain Munaut
Hi,

 It happened to me using the Ubuntu packages.
 Usually when you upgrade a package it pulls in all its dependencies; for
 Ceph you have to update them one by one.
 Did you try that?

All Ceph packages are up to date. The same happens with a custom-compiled
radosgw from git.

Cheers,

Sylvain


Re: radosgw segfault in 0.56

2013-01-07 Thread Sylvain Munaut
Ok, I tracked this down ...

I'm using lighttpd as a FastCGI front end and it doesn't set the
SCRIPT_URI environment variable.


So line 1123 in rgw/rgw_rest.cc:

s->script_uri = s->env->get("SCRIPT_URI");

tries to assign NULL to s->script_uri, which crashes with the
particularly unhelpful stack trace I pasted above ...
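
(For context, a minimal standalone reproduction of that failure mode -- this is
not the radosgw code, just a sketch of what assigning a NULL char pointer to a
std::string does with a 2013-era libstdc++:)

    #include <string>
    #include <cstddef>

    int main() {
      const char *v = NULL;     // what the env lookup returns when SCRIPT_URI is unset
      std::string script_uri;
      script_uri = v;           // undefined behaviour: libstdc++'s operator=(const char*)
                                // ends up calling strlen(NULL) and segfaults
      return 0;
    }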


Cheers,

Sylvain


Re: radosgw segfault in 0.56

2013-01-07 Thread Wido den Hollander

On 01/07/2013 12:04 PM, Sylvain Munaut wrote:

Ok, I tracked this down ...

I'm using lighttpd as a FastCGI front end and it doesn't set the
SCRIPT_URI environment variable.


So line 1123 in rgw/rgw_rest.cc:

s->script_uri = s->env->get("SCRIPT_URI");

tries to assign NULL to s->script_uri, which crashes with the
particularly unhelpful stack trace I pasted above ...



As far as I know relying on SCRIPT_URI is rather dangerous since it's 
not always there.


There should be an if/else statement around that code so that it
defaults to something else if SCRIPT_URI isn't available.


That's what I still remember from my PHP days, where SCRIPT_URI was
always hit-and-miss.


Wido



Cheers,

 Sylvain


Re: radosgw segfault in 0.56

2013-01-07 Thread Sylvain Munaut
Hi,

 As far as I know relying on SCRIPT_URI is rather dangerous since it's not
 always there.

 There should be an if/else statement around that code so that it defaults
 to something else if SCRIPT_URI isn't available.

I've opened a bug and proposed a patch setting the default value to the empty
string:

http://tracker.newdream.net/issues/3735

Cheers,

Sylvain


Re: radosgw segfault in 0.56

2013-01-07 Thread Caleb Miles
Hi all,

Created branch wip-3735 to capture Sylvain's patch.

On Mon, Jan 7, 2013 at 7:21 AM, Sylvain Munaut
s.mun...@whatever-company.com wrote:
 Hi,

 As far as I know relying on SCRIPT_URI is rather dangerous since it's not
 always there.

 There should be an if/else statement around that code so that it defaults
 to something else if SCRIPT_URI isn't available.

 I've opened a bug and proposed a patch setting the default value to the empty
 string:

 http://tracker.newdream.net/issues/3735

 Cheers,

 Sylvain


Re: [PATCH 0/2] Librados aio stat

2013-01-07 Thread Filippos Giannakos

Hi Josh,

On 01/05/2013 02:08 AM, Josh Durgin wrote:

On 01/04/2013 05:01 AM, Filippos Giannakos wrote:

Hi Team,

Is there any progress or any comments regarding the librados aio stat
patch ?


They look good to me. I put them in the wip-librados-aio-stat branch.
Can we add your signed-off-by to them?

Thanks,
Josh


Sorry for my late response. You can go ahead and add the Signed-off-by.

Best Regards

--
Filippos.
philipg...@grnet.gr


Re: radosgw segfault in 0.56

2013-01-07 Thread Sylvain Munaut
Hi,

 Yeah, it's missing a guard here. Strange, I remember fixing this and
 others, but I can't find any trace of that. I think setting it to
 empty string is ok, though we may want to explore other fixes
 (configurable?) -- it affects the Location field in S3 POST response.

Yes, I've seen it's used there. Since we have dedicated gateway machines,
they're at the root /, so it doesn't matter.

BTW, do you know why the stack trace sucks so much? I mean, it looks
especially unhelpful; the function where the crash happens isn't even
listed in there at all ...

Cheers,

Sylvain


Re: radosgw segfault in 0.56

2013-01-07 Thread Yehuda Sadeh
On Mon, Jan 7, 2013 at 1:04 PM, Sylvain Munaut
s.mun...@whatever-company.com wrote:
 Ok, I tracked this down ...

 I'm using lighttpd as a FastCGI front end and it doesn't set the
 SCRIPT_URI environment variable.


 So line 1123 in rgw/rgw_rest.cc:

 s->script_uri = s->env->get("SCRIPT_URI");

 tries to assign NULL to s->script_uri, which crashes with the
 particularly unhelpful stack trace I pasted above ...

Yeah, it's missing a guard here. Strange, I remember fixing this and
others, but I can't find any trace of that. I think setting it to
empty string is ok, though we may want to explore other fixes
(configurable?) -- it affects the Location field in S3 POST response.
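
(For readers following along, a sketch of the kind of guard being discussed;
get_fcgi_env() below is a hypothetical stand-in for the s->env->get() lookup,
not the real radosgw API:)

    #include <cstdlib>
    #include <string>

    // Hypothetical stand-in for the FastCGI env lookup; like the real one,
    // it can return NULL when lighttpd doesn't set the variable.
    static const char *get_fcgi_env(const char *name) {
      return std::getenv(name);
    }

    int main() {
      std::string script_uri;
      const char *v = get_fcgi_env("SCRIPT_URI");
      script_uri = v ? v : "";   // guard: fall back to the empty string instead of
                                 // crashing; note an empty value feeds into the
                                 // Location field of the S3 POST response
      return 0;
    }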

Yehuda


Re: Is Ceph recovery able to handle massive crash

2013-01-07 Thread Denis Fondras

Hello all,


I'm using Ceph 0.55.1 on Debian Wheezy (1 mon, 1 mds and 3 OSDs over
btrfs) and every once in a while an OSD process crashes (almost never
the same OSD).
This time I had 2 OSDs crash in a row, so I was down to a single replica. I
could bring the 2 crashed OSDs back up and the cluster started to recover.
Unfortunately, the source OSD crashed while recovering and now I have
some lost PGs.

If I manage to bring the primary OSD up again, can I expect the lost PGs
to be recovered too?



Ok, so it seems I can't bring my primary OSD back to life :-(

---8---
health HEALTH_WARN 72 pgs incomplete; 72 pgs stuck inactive; 72 pgs 
stuck unclean

monmap e1: 1 mons at {a=192.168.0.132:6789/0}, election epoch 1, quorum 0 a
osdmap e1130: 3 osds: 2 up, 2 in
 pgmap v1567492: 624 pgs: 552 active+clean, 72 incomplete; 1633 GB 
data, 4766 GB used, 3297 GB / 8383 GB avail

 mdsmap e127: 1/1/1 up {0=a=up:active}

2013-01-07 18:11:10.852673 mon.0 [INF] pgmap v1567492: 624 pgs: 552 
active+clean, 72 incomplete; 1633 GB data, 4766 GB used, 3297 GB / 8383 
GB avail

---8---

When I run rbd list, I can see all my images.
When I run rbd map, I can only map a few of them, and when I mount the
devices, none will mount (the mount process hangs and I cannot even ^C
it).


Is there something I can try ?

Thank you in advance,
Denis


Fwd: Interfaces proposed changes

2013-01-07 Thread David Zafman

I sent this proposal out to the developers that own the FSAL CEPH portion of 
Nfs-Ganesha.  They have changes to Ceph that expose additional interfaces for 
this.  This is our initial cut at improving the interfaces.

David Zafman
Senior Developer
david.zaf...@inktank.com


Begin forwarded message:

 From: David Zafman david.zaf...@inktank.com
 Subject: Interfaces proposed changes
 Date: January 4, 2013 5:50:49 PM PST
 To: Matthew W. Benjamin m...@linuxbox.com, Adam C. Emerson 
 aemer...@linuxbox.com
 
 
 Below is a patch that shows the newly proposed low-level interface.
 Obviously, the ceph_ll_* functions you created in libcephfs.cc will have the
 corresponding changes made to them.  An Fh * is used as an open file
 descriptor and needs a corresponding ll_release()/ceph_ll_close().  An Inode *
 returned by the various inode-creating functions and by ll_lookup_ino() is a
 referenced inode and needs a corresponding _ll_put(), exposed via something
 maybe named ceph_ll_put().
 
 The existing FSAL CEPH never calls ceph_ll_forget() even though
 references are taken on inodes at the Ceph ll_* operation level.  This interface
 creates a clearer model to be used by FSAL CEPH.  As I don't understand
 Ganesha's inode caching model, it isn't clear to me whether it can indirectly
 hold inodes below the FSAL.  Especially for NFS v3, where there is no
 open state, the code shouldn't keep doing a final release of an inode after
 every operation.
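 
 (For context, a rough usage sketch of how an FSAL-style caller might pair these
 calls, using the signatures from the patch below; error handling is trimmed, and
 the reference drop could equally be a ceph_ll_put() wrapper rather than
 ll_forget():)
 
     #include <sys/stat.h>
     #include "client/Client.h"
 
     // Look up an inode by vino, stat it, then drop the reference we took.
     static void stat_by_vino(Client *client, vinodeno_t vino, struct stat *st) {
       Inode *in = client->ll_lookup_ino(vino);   // takes a reference via _ll_get()
       if (!in)
         return;                                  // not present in the inode_map
       client->ll_getattr(in, st);                // operate on the referenced inode
       client->ll_forget(in, 1);                  // release the reference taken above
     }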
 
 diff --git a/src/client/Client.cc b/src/client/Client.cc
 index d876454..4d4d0f1 100644
 --- a/src/client/Client.cc
 +++ b/src/client/Client.cc
 @@ -6250,13 +6250,39 @@ bool Client::ll_forget(vinodeno_t vino, int num)
   return last;
 }
 
 +
 +inodeno_t Client::ll_get_ino(Inode *in)
 +{
 +  return in->ino;
 +}
 +
 +snapid_t Client::ll_get_snapid(Inode *in)
 +{
 +  return in->snapid;
 +}
 +
 +vinodeno_t Client::ll_get_vino(Inode *in)
 +{
 +  return vinodeno_t(in->ino, in->snapid);
 +}
 +
 +Inode *Client::ll_lookup_ino(vinodeno_t vino)
 +{
 +  Mutex::Locker lock(client_lock);
 +  hash_map<vinodeno_t,Inode*>::iterator p = inode_map.find(vino);
 +  if (p == inode_map.end())
 +    return NULL;
 +  Inode *in = p->second;
 +  _ll_get(in);
 +  return in;
 +}
 +
 Inode *Client::_ll_get_inode(vinodeno_t vino)
 {
   assert(inode_map.count(vino));
   return inode_map[vino];
 }
 
 -
 int Client::ll_getattr(vinodeno_t vino, struct stat *attr, int uid, int gid)
 {
   Mutex::Locker lock(client_lock);
 @@ -7219,7 +7245,7 @@ int Client::ll_release(Fh *fh)
   return 0;
 }
 
 -
 +// --
 
 
 
 diff --git a/src/client/Client.h b/src/client/Client.h
 index 9512a2d..0cfe8d9 100644
 --- a/src/client/Client.h
 +++ b/src/client/Client.h
 @@ -706,6 +706,32 @@ public:
   void ll_register_ino_invalidate_cb(client_ino_callback_t cb, void *handle);
 
   void ll_register_getgroups_cb(client_getgroups_callback_t cb, void *handle);
 +
 +  // low-level interface v2
 +  inodeno_t ll_get_ino(Inode *in);
 +  snapid_t ll_get_snapid(Inode *in);
 +  vinodeno_t ll_get_vino(Inode *in);
 +  Inode *ll_lookup_ino(vinodeno_t vino);
 +  int ll_lookup(Inode *parent, const char *name, struct stat *attr, Inode 
 **out, int uid = -1, int gid = -1);
 +  bool ll_forget(Inode *in, int count);
 +  int ll_getattr(Inode *in, struct stat *st, int uid = -1, int gid = -1);
 +  int ll_setattr(Inode *in, struct stat *st, int mask, int uid = -1, int gid 
 = -1);
 +  int ll_getxattr(Inode *in, const char *name, void *value, size_t size, int 
 uid=-1, int gid=-1);
 +  int ll_setxattr(Inode *in, const char *name, const void *value, size_t 
 size, int flags, int uid=-1, int gid=-1);
 +  int ll_removexattr(Inode *in, const char *name, int uid=-1, int gid=-1);
 +  int ll_listxattr(Inode *in, char *list, size_t size, int uid=-1, int 
 gid=-1);
 +  int ll_opendir(Inode *in, void **dirpp, int uid = -1, int gid = -1);
 +  int ll_readlink(Inode *in, const char **value, int uid = -1, int gid = -1);
 +  int ll_mknod(Inode *in, const char *name, mode_t mode, dev_t rdev, struct 
 stat *attr, Inode **out, int uid = -1, int gid = -1);
 +  int ll_mkdir(Inode *in, const char *name, mode_t mode, struct stat *attr, 
 Inode **out, int uid = -1, int gid = -1);
 +  int ll_symlink(Inode *in, const char *name, const char *value, struct stat 
 *attr, Inode **out, int uid = -1, int gid = -1);
 +  int ll_unlink(Inode *in, const char *name, int uid = -1, int gid = -1);
 +  int ll_rmdir(Inode *in, const char *name, int uid = -1, int gid = -1);
 +  int ll_rename(Inode *parent, const char *name, Inode *newparent, const 
 char *newname, int uid = -1, int gid = -1);
 +  int ll_link(Inode *in, Inode *newparent, const char *newname, struct stat 
 *attr, int uid = -1, int gid = -1);
 +  int ll_open(Inode *in, int flags, Fh **fh, int uid = -1, int gid = -1);
 +  int ll_create(Inode *parent, const char *name, mode_t mode, int flags, 
 struct stat *attr, Inode **out, int uid = -1, int gid = -1);
 +  int ll_statfs(Inode *in, struct statvfs 

Re: OSD memory leaks?

2013-01-07 Thread Samuel Just
Awesome!  What version are you running (ceph-osd -v, include the hash)?
-Sam

On Mon, Jan 7, 2013 at 11:03 AM, Dave Spano dsp...@optogenics.com wrote:
 This failed the first time I sent it, so I'm resending in plain text.

 Dave Spano
 Optogenics
 Systems Administrator



 - Original Message -

 From: Dave Spano dsp...@optogenics.com
 To: Sébastien Han han.sebast...@gmail.com
 Cc: ceph-devel ceph-devel@vger.kernel.org, Samuel Just 
 sam.j...@inktank.com
 Sent: Monday, January 7, 2013 12:40:06 PM
 Subject: Re: OSD memory leaks?


 Sam,

 Attached are some heaps that I collected today. 001 and 003 are from just after I
 started the profiler; 011 is the most recent. If you need more, or anything
 different, let me know. The OSD in question is already at 38% memory usage. As
 mentioned by Sébastien, restarting ceph-osd keeps things going.

 Not sure if this is helpful information, but of the two OSDs that I have
 running, the first one (osd.0) is the one that develops this problem the
 quickest. osd.1 has the same issue; it just takes much longer. Do the
 monitors hit the first OSD in the list first, when there's activity?


 Dave Spano
 Optogenics
 Systems Administrator


 - Original Message -

 From: Sébastien Han han.sebast...@gmail.com
 To: Samuel Just sam.j...@inktank.com
 Cc: ceph-devel ceph-devel@vger.kernel.org
 Sent: Friday, January 4, 2013 10:20:58 AM
 Subject: Re: OSD memory leaks?

 Hi Sam,

 Thanks for your answer and sorry the late reply.

 Unfortunately I can't get anything useful out of the profiler; actually I
 do get output, but I guess it doesn't show what it is supposed to show... I
 will keep trying. Anyway, yesterday I started thinking that the problem
 might be due to over-use of some OSDs. I was thinking that the
 distribution of primary OSDs might be uneven, which could have
 explained why the memory leaks are worse on some servers.
 In the end the distribution seems even, but while looking at the pg
 dump I found something interesting in the scrub column: timestamps
 from the last scrubbing operation matched the times shown on the
 graph.

 After this I did some calculations: I compared the total number of
 scrubbing operations with the time range where the memory leaks occurred.
 First of all, here is my setup:

 root@c2-ceph-01 ~ # ceph osd tree
 dumped osdmap tree epoch 859
 # id weight type name up/down reweight
 -1 12 pool default
 -3 12 rack lc2_rack33
 -2 3 host c2-ceph-01
 0 1 osd.0 up 1
 1 1 osd.1 up 1
 2 1 osd.2 up 1
 -4 3 host c2-ceph-04
 10 1 osd.10 up 1
 11 1 osd.11 up 1
 9 1 osd.9 up 1
 -5 3 host c2-ceph-02
 3 1 osd.3 up 1
 4 1 osd.4 up 1
 5 1 osd.5 up 1
 -6 3 host c2-ceph-03
 6 1 osd.6 up 1
 7 1 osd.7 up 1
 8 1 osd.8 up 1


 And here are the results:

 * Ceph node 1, which has the most significant memory leak, performed 1608
 scrubs in total, 1059 of them during the time range where the memory leaks occurred
 * Ceph node 2, 1168 in total and 776 during the time range where
 memory leaks occurred
 * Ceph node 3, 940 in total and 94 during the time range where memory
 leaks occurred
 * Ceph node 4, 899 in total and 191 during the time range where
 memory leaks occurred

 I'm still not entirely sure that the scrub operation causes the leak,
 but it's the only relevant correlation that I found...

 Could it be that the scrubbing process doesn't release memory? Btw, I
 was wondering how Ceph decides at what time it should run the
 scrubbing operation. I know that it's once a day and controlled by the
 following options:

 OPTION(osd_scrub_min_interval, OPT_FLOAT, 300)
 OPTION(osd_scrub_max_interval, OPT_FLOAT, 60*60*24)

 But how does Ceph determine the time at which the operation starts? Probably
 at cluster creation?

 I just checked the options that control OSD scrubbing and found that by 
 default:

 OPTION(osd_max_scrubs, OPT_INT, 1)

 So that might explain why only one OSD uses a lot of memory.

 My dirty workaround at the moment is to check the memory
 use of every OSD and restart it if it uses more than 25% of the total
 memory. Also note that on ceph nodes 1, 3 and 4 it's always one OSD that
 uses a lot of memory; on ceph node 2 the memory usage is high but almost
 the same for all the OSD processes.

 Thank you in advance.

 --
 Regards,
 Sébastien Han.


 On Wed, Dec 19, 2012 at 10:43 PM, Samuel Just sam.j...@inktank.com wrote:

 Sorry, it's been very busy. The next step would be to try to get a heap
 dump. You can start a heap profile on osd N with:

 ceph osd tell N heap start_profiler

 and you can get it to dump the collected profile using

 ceph osd tell N heap dump.

 The dumps should show up in the osd log directory.

 Assuming the heap profiler is working correctly, you can look at the
 dump using pprof in google-perftools.

 On Wed, Dec 19, 2012 at 8:37 AM, Sébastien Han han.sebast...@gmail.com 
 wrote:
  No more suggestions? :(
  --
  Regards,
  Sébastien Han.
 
 
  On Tue, Dec 18, 2012 at 6:21 PM, Sébastien Han han.sebast...@gmail.com 
  wrote:
  Nothing terrific...

Re: osd down (for 2 about 2 minutes) error after adding a new host to my cluster

2013-01-07 Thread Gregory Farnum
On Monday, January 7, 2013 at 1:00 PM, Isaac Otsiabah wrote:
 
 
 When I add a new host (with OSDs) to my existing cluster, 1 or 2 existing
 OSDs go down for about 2 minutes and then they come back up.
 
 
 [root@h1ct ~]# ceph osd tree
 
 # id weight type name up/down reweight
 -1 3 root default
 -3 3 rack unknownrack
 -2 3 host h1
 0 1 osd.0 up 1
 1 1 osd.1 up 1
 2 1 osd.2 up 1
 
 
 For example, after adding host h2 (with 3 new OSDs) to the above cluster and
 running the ceph osd tree command, I see this:
 
 
 [root@h1 ~]# ceph osd tree
 
 # id weight type name up/down reweight
 -1 6 root default
 -3 6 rack unknownrack
 -2 3 host h1
 0 1 osd.0 up 1
 1 1 osd.1 down 1
 2 1 osd.2 up 1
 -4 3 host h2
 3 1 osd.3 up 1
 4 1 osd.4 up 1
 5 1 osd.5 up 1
 
 
 The down OSDs always come back up after 2 minutes or less, and I see the
 following error messages in the respective OSD log file:
 2013-01-07 04:40:17.613028 7fec7f092760 1 journal _open /ceph_journal/journals/journal_2 fd 26: 1073741824 bytes, block size 4096 bytes, directio = 1, aio = 0
 2013-01-07 04:40:17.613122 7fec7f092760 1 journal _open /ceph_journal/journals/journal_2 fd 26: 1073741824 bytes, block size 4096 bytes, directio = 1, aio = 0
 2013-01-07 04:42:10.006533 7fec746f7710 0 -- 192.168.0.124:6808/19449 >> 192.168.1.123:6800/18287 pipe(0x7fec2e10 sd=31 :6808 pgs=0 cs=0 l=0).accept connect_seq 0 vs existing 0 state connecting
 2013-01-07 04:45:29.834341 7fec743f4710 0 -- 192.168.1.124:6808/19449 >> 192.168.1.122:6800/20072 pipe(0x7fec5402f320 sd=28 :45438 pgs=7 cs=1 l=0).fault, initiating reconnect
 2013-01-07 04:45:29.835748 7fec743f4710 0 -- 192.168.1.124:6808/19449 >> 192.168.1.122:6800/20072 pipe(0x7fec5402f320 sd=28 :45439 pgs=15 cs=3 l=0).fault, initiating reconnect
 2013-01-07 04:45:30.835219 7fec743f4710 0 -- 192.168.1.124:6808/19449 >> 192.168.1.122:6800/20072 pipe(0x7fec5402f320 sd=28 :45894 pgs=482 cs=903 l=0).fault, initiating reconnect
 2013-01-07 04:45:30.837318 7fec743f4710 0 -- 192.168.1.124:6808/19449 >> 192.168.1.122:6800/20072 pipe(0x7fec5402f320 sd=28 :45895 pgs=483 cs=905 l=0).fault, initiating reconnect
 2013-01-07 04:45:30.851984 7fec637fe710 0 log [ERR] : map e27 had wrong cluster addr (192.168.0.124:6808/19449 != my 192.168.1.124:6808/19449)
 
 Also, this only happens when the cluster IP address and the public IP
 address are different, for example:
 
 
 
 [osd.0]
 host = g8ct
 public address = 192.168.0.124
 cluster address = 192.168.1.124
 btrfs devs = /dev/sdb
 
 
 
 
 but does not happen when they are the same. Any idea what may be the issue?
 
This isn't familiar to me at first glance. What version of Ceph are you using?

If this is easy to reproduce, can you pastebin your ceph.conf and then add 
debug ms = 1 to your global config and gather up the logs from each daemon?
-Greg



Re: Is Ceph recovery able to handle massive crash

2013-01-07 Thread Gregory Farnum
On Monday, January 7, 2013 at 9:25 AM, Denis Fondras wrote:
 Hello all,
 
  I'm using Ceph 0.55.1 on Debian Wheezy (1 mon, 1 mds and 3 OSDs over
  btrfs) and every once in a while an OSD process crashes (almost never
  the same OSD).
  This time I had 2 OSDs crash in a row, so I was down to a single replica. I
  could bring the 2 crashed OSDs back up and the cluster started to recover.
  Unfortunately, the source OSD crashed while recovering and now I have
  some lost PGs.
  
  If I manage to bring the primary OSD up again, can I expect the lost PGs
  to be recovered too?
 
 
 
 Ok, so it seems I can't bring my primary OSD back to life :-(
 
 ---8---
 health HEALTH_WARN 72 pgs incomplete; 72 pgs stuck inactive; 72 pgs 
 stuck unclean
 monmap e1: 1 mons at {a=192.168.0.132:6789/0}, election epoch 1, quorum 0 a
 osdmap e1130: 3 osds: 2 up, 2 in
 pgmap v1567492: 624 pgs: 552 active+clean, 72 incomplete; 1633 GB 
 data, 4766 GB used, 3297 GB / 8383 GB avail
 mdsmap e127: 1/1/1 up {0=a=up:active}
 
 2013-01-07 18:11:10.852673 mon.0 [INF] pgmap v1567492: 624 pgs: 552 
 active+clean, 72 incomplete; 1633 GB data, 4766 GB used, 3297 GB / 8383 
 GB avail
 ---8---
 
 When I run rbd list, I can see all my images.
 When I run rbd map, I can only map a few of them, and when I mount the
 devices, none will mount (the mount process hangs and I cannot even ^C
 it).
 
 Is there something I can try ?

What's wrong with your primary OSD? In general they shouldn't really be 
crashing that frequently and if you've got a new bug we'd like to diagnose and 
fix it.

If that can't be done (or it's a hardware failure or something), you can mark 
the OSD lost, but that might lose data and then you will be sad.
-Greg



Re: v0.56.1 released

2013-01-07 Thread Dennis Jacobfeuerborn
FYI I just updated from 0.56 to 0.56.1 using the RPMs for CentOS 6 (osd and
mon only) and everything went perfectly fine. Thanks!

Regards,
  Dennis

On 01/08/2013 05:53 AM, Sage Weil wrote:
 We found a few critical problems with v0.56, and fixed a few outstanding 
 problems. v0.56.1 is ready, and we're pretty pleased with it!
 
 There are two critical fixes in this update: a fix for possible data loss 
 or corruption if power is lost, and a protocol compatibility problem that 
 was introduced in v0.56 (between v0.56 and any other version of ceph).
 
  * osd: fix commit sequence for XFS, ext4 (or any other non-btrfs) to 
prevent data loss on power cycle or kernel panic
  * osd: fix compatibility for CALL operation
  * osd: process old osdmaps prior to joining cluster (fixes slow startup)
  * osd: fix a couple of recovery-related crashes
  * osd: fix large io requests when journal is in (non-default) aio mode
  * log: fix possible deadlock in logging code
 
 This release will kick off the bobtail backport series, and will get a 
 shiny new URL for its home.
 
  * Git at git://github.com/ceph/ceph.git
  * Tarball at http://ceph.com/download/ceph-0.56.1.tar.gz
  * For Debian/Ubuntu packages, see http://ceph.com/docs/master/install/debian
  * For RPMs, see http://ceph.com/docs/master/install/rpm


Re: Windows port

2013-01-07 Thread Cesar Mello
Hi,

I have been playing with Ceph and reading the docs/thesis over the last
couple of nights, just to learn something during my vacation. I was not
expecting to find such an awesome, state-of-the-art project.
Congratulations on the great work!

I would like to know whether a Windows port is envisioned for the
future or whether that is a dead end. By Windows port I mean an abstraction
layer for hardware/sockets/threading/etc. and building with Visual C++
2012 Express, and then having this state-of-the-art object storage
cluster running on Windows nodes too.

Thank you very much for your attention!

Best regards
Mello


On Sat, Jan 5, 2013 at 1:20 AM, Cesar Mello cme...@gmail.com wrote:
 Hi,

 Is there interest in a Windows port?

 Thank you for the attention.

 Best regards
 Cesar


Re: v0.56.1 released

2013-01-07 Thread Sage Weil
On Tue, 8 Jan 2013, Dennis Jacobfeuerborn wrote:
 FYI I just updated from 0.56 to 0.56.1 using the RPMs for CentOS 6 (osd and
 mon only) and everything went perfectly fine. Thanks!

Great!

Make sure the client-side libraries (librbd, librados) are not v0.56 or 
you will likely have problems.  Upgrade to v0.56.1 and avoid v0.56.

Thanks!
sage

 
 Regards,
   Dennis
 
 On 01/08/2013 05:53 AM, Sage Weil wrote:
  We found a few critical problems with v0.56, and fixed a few outstanding 
  problems. v0.56.1 is ready, and we're pretty pleased with it!
  
  There are two critical fixes in this update: a fix for possible data loss 
  or corruption if power is lost, and a protocol compatibility problem that 
  was introduced in v0.56 (between v0.56 and any other version of ceph).
  
   * osd: fix commit sequence for XFS, ext4 (or any other non-btrfs) to 
 prevent data loss on power cycle or kernel panic
   * osd: fix compatibility for CALL operation
   * osd: process old osdmaps prior to joining cluster (fixes slow startup)
   * osd: fix a couple of recovery-related crashes
   * osd: fix large io requests when journal is in (non-default) aio mode
   * log: fix possible deadlock in logging code
  
  This release will kick off the bobtail backport series, and will get a 
  shiny new URL for its home.
  
   * Git at git://github.com/ceph/ceph.git
   * Tarball at http://ceph.com/download/ceph-0.56.1.tar.gz
   * For Debian/Ubuntu packages, see 
  http://ceph.com/docs/master/install/debian
   * For RPMs, see http://ceph.com/docs/master/install/rpm


Re: v0.56.1 released

2013-01-07 Thread Stefan Priebe - Profihost AG

Hi,

I cannot see any git tag or branch claiming to be 0.56.1. Which commit id
is this?


Greets
  Stefan


Am 08.01.2013 05:53, schrieb Sage Weil:

We found a few critical problems with v0.56, and fixed a few outstanding
problems. v0.56.1 is ready, and we're pretty pleased with it!

There are two critical fixes in this update: a fix for possible data loss
or corruption if power is lost, and a protocol compatibility problem that
was introduced in v0.56 (between v0.56 and any other version of ceph).

  * osd: fix commit sequence for XFS, ext4 (or any other non-btrfs) to
prevent data loss on power cycle or kernel panic
  * osd: fix compatibility for CALL operation
  * osd: process old osdmaps prior to joining cluster (fixes slow startup)
  * osd: fix a couple of recovery-related crashes
  * osd: fix large io requests when journal is in (non-default) aio mode
  * log: fix possible deadlock in logging code

This release will kick off the bobtail backport series, and will get a
shiny new URL for its home.

  * Git at git://github.com/ceph/ceph.git
  * Tarball at http://ceph.com/download/ceph-0.56.1.tar.gz
  * For Debian/Ubuntu packages, see http://ceph.com/docs/master/install/debian
  * For RPMs, see http://ceph.com/docs/master/install/rpm


Re: v0.56.1 released

2013-01-07 Thread Andrey Korolyov
On Tue, Jan 8, 2013 at 11:30 AM, Stefan Priebe - Profihost AG
s.pri...@profihost.ag wrote:
 Hi,

 I cannot see any git tag or branch claiming to be 0.56.1. Which commit id is
 this?

 Greets
   Stefan

Same for me; for some reason github simply did not send the new tag on a pull
to my local tree. Cloning the repository from scratch resolved
this :)



 Am 08.01.2013 05:53, schrieb Sage Weil:

 We found a few critical problems with v0.56, and fixed a few outstanding
 problems. v0.56.1 is ready, and we're pretty pleased with it!

 There are two critical fixes in this update: a fix for possible data loss
 or corruption if power is lost, and a protocol compatibility problem that
 was introduced in v0.56 (between v0.56 and any other version of ceph).

   * osd: fix commit sequence for XFS, ext4 (or any other non-btrfs) to
 prevent data loss on power cycle or kernel panic
   * osd: fix compatibility for CALL operation
   * osd: process old osdmaps prior to joining cluster (fixes slow startup)
   * osd: fix a couple of recovery-related crashes
   * osd: fix large io requests when journal is in (non-default) aio mode
   * log: fix possible deadlock in logging code

 This release will kick off the bobtail backport series, and will get a
  shiny new URL for its home.

   * Git at git://github.com/ceph/ceph.git
   * Tarball at http://ceph.com/download/ceph-0.56.1.tar.gz
   * For Debian/Ubuntu packages, see
 http://ceph.com/docs/master/install/debian
   * For RPMs, see http://ceph.com/docs/master/install/rpm


Re: v0.56.1 released

2013-01-07 Thread Amon Ott
Am 08.01.2013 08:40, schrieb Andrey Korolyov:
 On Tue, Jan 8, 2013 at 11:30 AM, Stefan Priebe - Profihost AG
 s.pri...@profihost.ag wrote:
 Hi,

 I cannot see any git tag or branch claiming to be 0.56.1. Which commit id is
 this?

 Greets
   Stefan
 
 Same for me, github simply does not sent a new tag in the pull to
 local tree by some reason. Repository cloning from scratch resolved
 this :)

This is normal git behaviour. git fetch -t is your friend.

Amon Ott
-- 
Dr. Amon Ott
m-privacy GmbH   Tel: +49 30 24342334
Am Köllnischen Park 1Fax: +49 30 99296856
10179 Berlin http://www.m-privacy.de

Amtsgericht Charlottenburg, HRB 84946

Geschäftsführer:
 Dipl.-Kfm. Holger Maczkowsky,
 Roman Maczkowsky

GnuPG-Key-ID: 0x2DD3A649
