Re: [ceph-users] inktank-mellanox webinar access ?

2014-09-16 Thread Georgios Dimitrakakis

Dear Karan and rest of the followers,

since I haven't received anything from Mellanox regarding this webinar,
I've decided to look for it myself.


You can find the webinar here:

http://www.mellanox.com/webinars/2014/inktank_ceph/


Best,

G.

On Mon, 14 Jul 2014 15:47:39 +0300, Karan Singh wrote:

Thanks Georgios

I will wait.

- Karan Singh -

On 14 Jul 2014, at 15:37, Georgios Dimitrakakis  wrote:


Hi Karan!

Due to the late reception of the login info I've also missed
a very big part of the webinar.

They did send me an e-mail, though, saying that they will let me know
as soon as a recording of the session is available.

I will let you know again then.

Best,

G.

On Mon, 14 Jul 2014 12:40:54 +0300, Karan Singh wrote:


Hey, I have missed the webinar; is it available for later review,
or are there slides?

- Karan -

On 10 Jul 2014, at 18:27, Georgios Dimitrakakis wrote:


That makes two of us...

G.

On Thu, 10 Jul 2014 17:12:08 +0200 (CEST), Alexandre DERUMIER
wrote:


OK, sorry, we have finally received the login, a bit late.

Sorry again for spamming the mailing list.
- Original message -

From: Alexandre DERUMIER
To: ceph-users
Sent: Thursday, 10 July 2014 16:55:22
Subject: [ceph-users] inktank-mellanox webinar access ?

Hi,

sorry to spam the mailing list,

but there is an Inktank-Mellanox webinar in 10 minutes,

and I haven't received access even though I registered
yesterday (same for my co-worker).

Also, the webinar Mellanox contact e-mail (conta...@mellanox.com) does
not exist.

Maybe somebody from Inktank or Mellanox could help us ?

Regards,

Alexandre





--






--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph Degraded

2014-11-29 Thread Georgios Dimitrakakis

Hi all!


I am setting up a new cluster with 10 OSDs
and the state is degraded!

# ceph health
HEALTH_WARN 940 pgs degraded; 1536 pgs stuck unclean
#


There are only the default pools

# ceph osd lspools
0 data,1 metadata,2 rbd,


with each one having 512 pg_num and 512 pgp_num

# ceph osd dump | grep replic
pool 0 'data' replicated size 2 min_size 2 crush_ruleset 0 object_hash 
rjenkins pg_num 512 pgp_num 512 last_change 286 flags hashpspool 
crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 2 min_size 2 crush_ruleset 0 
object_hash rjenkins pg_num 512 pgp_num 512 last_change 287 flags 
hashpspool stripe_width 0
pool 2 'rbd' replicated size 2 min_size 2 crush_ruleset 0 object_hash 
rjenkins pg_num 512 pgp_num 512 last_change 288 flags hashpspool 
stripe_width 0



There is no data yet, so is there something I can do to repair it as it is?


Best regards,


George
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph Degraded

2014-12-01 Thread Georgios Dimitrakakis

Hi Andrei!

I had a similar setting, with replicated size 2 and min_size also 2.

Changing that didn't change the status of the cluster.

I've also tried to remove the pools and recreate them, without success.

Removing and re-adding the OSDs also didn't have any influence!

Therefore, and since I didn't have any data at all, I performed a force
recreate of all PGs, and after that things went back to normal.
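
For reference, this is roughly what that looked like (a sketch only; force_create_pg throws away whatever was in a PG, so it is only an option on an empty cluster, and the PG list comes straight from ceph health detail):

# ceph health detail | grep '^pg' | awk '{print $2}' > stuck_pgs.txt
# while read pg; do ceph pg force_create_pg $pg; done < stuck_pgs.txt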


Thanks for your reply!


Best,


George

On Sat, 29 Nov 2014 11:39:51 + (GMT), Andrei Mikhailovsky wrote:
I think I had a similar issue recently when I added a new pool. All
pgs that corresponded to the new pool were shown as degraded/unclean.
After doing a bit of testing I realized that my issue was down to
this:

replicated size 2
min_size 2

The replicated size and min_size were the same. In my case, I've got 2
OSD servers, so a total replica count of 2. The min_size should be set
to 1, so that the cluster can still serve I/O with only one replica of a
PG available.

After I changed min_size to 1 the cluster sorted itself out.
Try doing this for your pools.
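
For the default pools that would be something along these lines (a sketch; pool names as in the listing above):

# ceph osd pool set data min_size 1
# ceph osd pool set metadata min_size 1
# ceph osd pool set rbd min_size 1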

Andrei

-


FROM: Georgios Dimitrakakis
TO: ceph-users@lists.ceph.com
SENT: Saturday, 29 November, 2014 11:13:05 AM
SUBJECT: [ceph-users] Ceph Degraded

Hi all!

I am setting UP a new cluster with 10 OSDs
and the state is degraded!

# ceph health
HEALTH_WARN 940 pgs degraded; 1536 pgs stuck unclean
#

There are only the default pools

# ceph osd lspools
0 data,1 metadata,2 rbd,

with each one having 512 pg_num and 512 pgp_num

# ceph osd dump | grep replic
pool 0 'data' replicated size 2 min_size 2 crush_ruleset 0
object_hash
rjenkins pg_num 512 pgp_num 512 last_change 286 flags hashpspool
crash_replay_interval 45 stripe_width 0
pool 1 'metadata' replicated size 2 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 512 pgp_num 512 last_change 287 flags
hashpspool stripe_width 0
pool 2 'rbd' replicated size 2 min_size 2 crush_ruleset 0
object_hash
rjenkins pg_num 512 pgp_num 512 last_change 288 flags hashpspool
stripe_width 0

No data yet so is there something I can do to repair it as it is?

Best regards,

George
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problems with pgs incomplete

2014-12-01 Thread Georgios Dimitrakakis

Hi!

I had a very similar issue a few days ago.

For me it wasn't too much of a problem, since the cluster was new,
without data, and I could force recreate the PGs. I really hope that in
your case it won't be necessary to do the same thing.


As a first step, try to reduce min_size from 2 to 1 for the
.rgw.buckets pool, as suggested in the health output, and see if this
can bring your cluster back to health.
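
Something along these lines (untested here, and only sensible while you are running with a single surviving replica):

# ceph osd pool get .rgw.buckets min_size
min_size: 2
# ceph osd pool set .rgw.buckets min_size 1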


Regards,

George

On Mon, 01 Dec 2014 17:09:31 +0300, Butkeev Stas wrote:

Hi all,
I have a Ceph cluster + rgw. Now I have a problem with one of the OSDs; it's
down now. I check ceph status and see this information:

[root@node-1 ceph-0]# ceph -s
cluster fc8c3ecc-ccb8-4065-876c-dc9fc992d62d
 health HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs
stuck unclean
 monmap e1: 3 mons at
{a=10.29.226.39:6789/0,b=10.29.226.29:6789/0,c=10.29.226.40:6789/0},
election epoch 294, quorum 0,1,2 b,a,c
 osdmap e418: 6 osds: 5 up, 5 in
  pgmap v23588: 312 pgs, 16 pools, 141 kB data, 594 objects
5241 MB used, 494 GB / 499 GB avail
 308 active+clean
   4 incomplete

Why do I have 4 pgs incomplete in the .rgw.buckets pool if I have
replicated size 2 and min_size 2?

My osd tree
[root@node-1 ceph-0]# ceph osd tree
# id    weight  type name           up/down reweight
-1  4   root croc
-2  4   region ru
-4  3   datacenter vol-5
-5  1   host node-1
0   1   osd.0   down0
-6  1   host node-2
1   1   osd.1   up  1
-7  1   host node-3
2   1   osd.2   up  1
-3  1   datacenter comp
-8  1   host node-4
3   1   osd.3   up  1
-9  1   host node-5
4   1   osd.4   up  1
-10 1   host node-6
5   1   osd.5   up  1

Addition information:

[root@node-1 ceph-0]# ceph health detail
HEALTH_WARN 4 pgs incomplete; 4 pgs stuck inactive; 4 pgs stuck 
unclean

pg 13.6 is stuck inactive for 1547.665758, current state incomplete,
last acting [1,3]
pg 13.4 is stuck inactive for 1547.652111, current state incomplete,
last acting [1,2]
pg 13.5 is stuck inactive for 4502.009928, current state incomplete,
last acting [1,3]
pg 13.2 is stuck inactive for 4501.979770, current state incomplete,
last acting [1,3]
pg 13.6 is stuck unclean for 4501.969914, current state incomplete,
last acting [1,3]
pg 13.4 is stuck unclean for 4502.001114, current state incomplete,
last acting [1,2]
pg 13.5 is stuck unclean for 4502.009942, current state incomplete,
last acting [1,3]
pg 13.2 is stuck unclean for 4501.979784, current state incomplete,
last acting [1,3]
pg 13.2 is incomplete, acting [1,3] (reducing pool .rgw.buckets
min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 13.6 is incomplete, acting [1,3] (reducing pool .rgw.buckets
min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 13.4 is incomplete, acting [1,2] (reducing pool .rgw.buckets
min_size from 2 may help; search ceph.com/docs for 'incomplete')
pg 13.5 is incomplete, acting [1,3] (reducing pool .rgw.buckets
min_size from 2 may help; search ceph.com/docs for 'incomplete')

[root@node-1 ceph-0]# ceph osd dump | grep 'pool'
pool 0 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash
rjenkins pg_num 64 pgp_num 64 last_change 1 flags hashpspool
stripe_width 0
pool 1 '.rgw.root' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 34 owner
18446744073709551615 flags hashpspool stripe_width 0
pool 2 '.rgw.control' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 36 owner
18446744073709551615 flags hashpspool stripe_width 0
pool 3 '.rgw' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 38 owner
18446744073709551615 flags hashpspool stripe_width 0
pool 4 '.rgw.gc' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 39 flags
hashpspool stripe_width 0
pool 5 '.users.uid' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 40 owner
18446744073709551615 flags hashpspool stripe_width 0
pool 6 '.log' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 42 owner
18446744073709551615 flags hashpspool stripe_width 0
pool 7 '.users' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins pg_num 8 pgp_num 8 last_change 44 flags
hashpspool stripe_width 0
pool 8 '.users.swift' replicated size 3 min_size 2 crush_ruleset 0
object_hash rjenkins 

[ceph-users] Empty Rados log

2014-12-04 Thread Georgios Dimitrakakis

Hi all!

I have a CEPH installation with radosgw and the radosgw.log in the 
/var/log/ceph directory is empty.


In the ceph.conf I have

log file = /var/log/ceph/radosgw.log
debug ms = 1
debug rgw = 20

under the [client.radosgw.gateway] section.



Any ideas?


Best,


George
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RadosGW and Apache Limits

2014-12-04 Thread Georgios Dimitrakakis

Hi!

On CentOS 6.6 I have installed CEPH and ceph-radosgw

When I try to (re)start the ceph-radosgw service I am getting the 
following:



# service ceph-radosgw restart
Stopping radosgw instance(s)...[  OK  ]
Starting radosgw instance(s)...
/usr/bin/dirname: extra operand `-n'
Try `/usr/bin/dirname --help' for more information.
bash: line 0: ulimit: open files: cannot modify limit: Operation not 
permitted

Starting client.radosgw.gateway... [  OK  ]
/usr/bin/radosgw is running.
#



Why is this happening? Is it normal?


If I change /etc/security/limits.conf and add

apache   hard    nofile  32768


then the ulimit error disappears.
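
A quick way to verify that the running gateway actually picked up the new limit after a restart (the PID below is just an example):

# service ceph-radosgw restart
# pgrep radosgw
2179
# grep 'Max open files' /proc/2179/limits
Max open files            32768                32768                files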


Is this the correct way? Should I do something else??


Regards,


George
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Virtual traffic on cluster network

2014-12-04 Thread Georgios Dimitrakakis

I was thinking the same thing for the following implementation:

I would like to have an RBD volume mounted and accessible at the same
time by different VMs (using OCFS2).


Therefore I was also thinking that I had to put the VMs on the internal
Ceph network by adding a second NIC and plugging it into this network.



Is this a bad idea?

Do you have something else to propose?


Regards,


George




On Thu, 04 Dec 2014 14:31:01 +0100, Thomas Lemarchand wrote:

Hi,

The Ceph cluster network is only useful for OSDs.
Your VMs only need access to the public network (or client network if
you prefer).

My cluster is also in a virtual environment. MONs and MDS are
virtual. OSDs are physical, of course.
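
For reference, the public/cluster split is just a ceph.conf setting on the Ceph side; a minimal sketch with illustrative addresses:

[global]
    public network  = 192.168.0.0/24   # clients, VMs, MONs and MDS talk to the cluster here
    cluster network = 10.0.0.0/24      # OSD-to-OSD replication and heartbeat traffic only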

--
Thomas Lemarchand
Cloud Solutions SAS - Responsable des systèmes d'information



On jeu., 2014-12-04 at 12:45 +, Peter wrote:

Hi,

I am wondering about running virtual environment (VM-to-Ceph) traffic
on the Ceph cluster network by plugging virtual hosts into this
network. Is this a good idea?
My thoughts are no, as VM-to-Ceph traffic would be client traffic from
Ceph's perspective.
Just want the community's thoughts on this.

thanks

- p
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] AWS SDK and MultiPart Problem

2014-12-05 Thread Georgios Dimitrakakis

Hi all!

I am using AWS SDK JS v2.0.29 to perform a multipart upload into
Radosgw with ceph version 0.80.7
(6c0127fcb58008793d3c8b62d925bc91963672a3) and I am getting a 403 error.



I believe that the upload id which is sent with all requests, and which
has been URL-encoded by the aws-sdk-js, doesn't match the one stored in
radosgw because the latter is not URL-encoded.


Is that the case? Can you confirm it?

Is there something I can do?


Regards,

George

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] AWS SDK and MultiPart Problem

2014-12-05 Thread Georgios Dimitrakakis
For example, if I try to perform the same multipart upload on an older
version, ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60),



I can see the upload ID in the apache log as:

PUT
/test/.dat?partNumber=25&uploadId=I3yihBFZmHx9CCqtcDjr8d-RhgfX8NW
HTTP/1.1 200 - - aws-sdk-nodejs/2.0.29 linux/v0.10.33


but when I try the same at ceph version 0.80.7 
(6c0127fcb58008793d3c8b62d925bc91963672a3)


I get the following:

PUT
/test/.dat?partNumber=12&uploadId=2%2Ff9UgnHhdK0VCnMlpT-XA8ttia1HjK36
HTTP/1.1 403 78 - aws-sdk-nodejs/2.0.29 linux/v0.10.33



and my guess is that the %2F in the latter is what is causing
the problem and hence the 403 error.
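
The encoding difference is easy to reproduce; for illustration, percent-encoding the raw id the same way the SDK does (a quick sketch using the system Python 2 on CentOS 6):

$ python -c 'import urllib; print urllib.quote("2/f9UgnHhdK0VCnMlpT-XA8ttia1HjK36", safe="")'
2%2Ff9UgnHhdK0VCnMlpT-XA8ttia1HjK36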




What do you think???


Best,

George




Hi all!

I am using AWS SDK JS v.2.0.29 to perform a multipart upload into
Radosgw with ceph version 0.80.7
(6c0127fcb58008793d3c8b62d925bc91963672a3) and I am getting a 403
error.


I believe that the id which is send to all requests and has been
urlencoded by the aws-sdk-js doesn't match with the one in rados
because it's not urlencoded.

Is that the case? Can you confirm it?

Is there something I can do?


Regards,

George

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] AWS SDK and MultiPart Problem

2014-12-05 Thread Georgios Dimitrakakis

It would be nice to see where and how uploadId

is being calculated...


Thanks,


George



For example if I try to perform the same multipart upload at an older
version ceph version 0.72.2 
(a913ded2ff138aefb8cb84d347d72164099cfd60)



I can see the upload ID in the apache log as:

PUT
/test/.dat?partNumber=25uploadId=I3yihBFZmHx9CCqtcDjr8d-RhgfX8NW
HTTP/1.1 200 - - aws-sdk-nodejs/2.0.29 linux/v0.10.33

but when I try the same at ceph version 0.80.7
(6c0127fcb58008793d3c8b62d925bc91963672a3)

I get the following:

PUT

/test/.dat?partNumber=12uploadId=2%2Ff9UgnHhdK0VCnMlpT-XA8ttia1HjK36
HTTP/1.1 403 78 - aws-sdk-nodejs/2.0.29 linux/v0.10.33


and my guess is that the %2F at the latter is the one that is
causing the problem and hence the 403 error.



What do you think???


Best,

George




Hi all!

I am using AWS SDK JS v.2.0.29 to perform a multipart upload into
Radosgw with ceph version 0.80.7
(6c0127fcb58008793d3c8b62d925bc91963672a3) and I am getting a 403
error.


I believe that the id which is send to all requests and has been
urlencoded by the aws-sdk-js doesn't match with the one in rados
because it's not urlencoded.

Is that the case? Can you confirm it?

Is there something I can do?


Regards,

George


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] AWS SDK and MultiPart Problem

2014-12-08 Thread Georgios Dimitrakakis

I've just created issue #10271.

Best,

George

On Fri, 5 Dec 2014 09:30:45 -0800, Yehuda Sadeh wrote:

It looks like a bug. Can you open an issue on tracker.ceph.com,
describing what you see?

Thanks,
Yehuda

On Fri, Dec 5, 2014 at 7:17 AM, Georgios Dimitrakakis
gior...@acmac.uoc.gr wrote:

It would be nice to see where and how uploadId

is being calculated...


Thanks,


George



For example if I try to perform the same multipart upload at an 
older
version ceph version 0.72.2 
(a913ded2ff138aefb8cb84d347d72164099cfd60)



I can see the upload ID in the apache log as:

PUT

/test/.dat?partNumber=25uploadId=I3yihBFZmHx9CCqtcDjr8d-RhgfX8NW
HTTP/1.1 200 - - aws-sdk-nodejs/2.0.29 linux/v0.10.33

but when I try the same at ceph version 0.80.7
(6c0127fcb58008793d3c8b62d925bc91963672a3)

I get the following:

PUT


/test/.dat?partNumber=12uploadId=2%2Ff9UgnHhdK0VCnMlpT-XA8ttia1HjK36
HTTP/1.1 403 78 - aws-sdk-nodejs/2.0.29 linux/v0.10.33


and my guess is that the %2F at the latter is the one that is
causing the problem and hence the 403 error.



What do you think???


Best,

George




Hi all!

I am using AWS SDK JS v.2.0.29 to perform a multipart upload into
Radosgw with ceph version 0.80.7
(6c0127fcb58008793d3c8b62d925bc91963672a3) and I am getting a 403
error.


I believe that the id which is send to all requests and has been
urlencoded by the aws-sdk-js doesn't match with the one in rados
because it's not urlencoded.

Is that the case? Can you confirm it?

Is there something I can do?


Regards,

George



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] AWS SDK and MultiPart Problem

2014-12-11 Thread Georgios Dimitrakakis

This issue seems very similar to these:

http://tracker.ceph.com/issues/8202
http://tracker.ceph.com/issues/8702


Would it make any difference if I try to build CEPH from sources?

I mean, is anyone aware of it having been fixed in any of the recent
commits and perhaps not having made it into the repositories yet?


Regards,

George




On Mon, 08 Dec 2014 19:47:59 +0200, Georgios Dimitrakakis wrote:

I 've just created issues #10271

Best,

George

On Fri, 5 Dec 2014 09:30:45 -0800, Yehuda Sadeh wrote:

It looks like a bug. Can you open an issue on tracker.ceph.com,
describing what you see?

Thanks,
Yehuda

On Fri, Dec 5, 2014 at 7:17 AM, Georgios Dimitrakakis
gior...@acmac.uoc.gr wrote:

It would be nice to see where and how uploadId

is being calculated...


Thanks,


George



For example if I try to perform the same multipart upload at an 
older
version ceph version 0.72.2 
(a913ded2ff138aefb8cb84d347d72164099cfd60)



I can see the upload ID in the apache log as:

PUT

/test/.dat?partNumber=25uploadId=I3yihBFZmHx9CCqtcDjr8d-RhgfX8NW
HTTP/1.1 200 - - aws-sdk-nodejs/2.0.29 linux/v0.10.33

but when I try the same at ceph version 0.80.7
(6c0127fcb58008793d3c8b62d925bc91963672a3)

I get the following:

PUT


/test/.dat?partNumber=12uploadId=2%2Ff9UgnHhdK0VCnMlpT-XA8ttia1HjK36
HTTP/1.1 403 78 - aws-sdk-nodejs/2.0.29 linux/v0.10.33


and my guess is that the %2F at the latter is the one that is
causing the problem and hence the 403 error.



What do you think???


Best,

George




Hi all!

I am using AWS SDK JS v.2.0.29 to perform a multipart upload into
Radosgw with ceph version 0.80.7
(6c0127fcb58008793d3c8b62d925bc91963672a3) and I am getting a 403
error.


I believe that the id which is send to all requests and has been
urlencoded by the aws-sdk-js doesn't match with the one in rados
because it's not urlencoded.

Is that the case? Can you confirm it?

Is there something I can do?


Regards,

George



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] AWS SDK and MultiPart Problem

2014-12-11 Thread Georgios Dimitrakakis

Hi again!

I have installed and enabled the development branch repositories as 
described here:


http://ceph.com/docs/master/install/get-packages/#add-ceph-development

and when I try to update the ceph-radosgw package I get the following:

Installed Packages
Name: ceph-radosgw
Arch: x86_64
Version : 0.80.7
Release : 0.el6
Size: 3.8 M
Repo: installed
From repo   : Ceph
Summary : Rados REST gateway
URL : http://ceph.com/
License : GPL-2.0
Description : radosgw is an S3 HTTP REST gateway for the RADOS object 
store. It is
: implemented as a FastCGI module using libfcgi, and can be 
used in

: conjunction with any FastCGI capable web server.

Available Packages
Name: ceph-radosgw
Arch: x86_64
Epoch   : 1
Version : 0.80.5
Release : 9.el6
Size: 1.3 M
Repo: epel
Summary : Rados REST gateway
URL : http://ceph.com/
License : GPL-2.0
Description : radosgw is an S3 HTTP REST gateway for the RADOS object 
store. It is
: implemented as a FastCGI module using libfcgi, and can be 
used in

: conjunction with any FastCGI capable web server.



Is this normal?

I am concerned because the installed version is 0.80.7 while the
available update package is 0.80.5.


Have I missed something?
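
In case the worry is that a plain yum update would pull in the older EPEL build, one way to handle it is to give the Ceph repositories precedence over EPEL (a sketch; it assumes the yum priorities plugin is acceptable on this box). Add priority=1 to each Ceph section in /etc/yum.repos.d/ceph.repo (or, alternatively, exclude=ceph* to epel.repo) and then:

# yum install yum-plugin-priorities
# yum --showduplicates list ceph-radosgw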

Regards,

George



Pushed a fix to wip-10271. Haven't tested it though, let me know if
you try it.

Thanks,
Yehuda

On Thu, Dec 11, 2014 at 8:38 AM, Yehuda Sadeh yeh...@redhat.com 
wrote:

I don't think it has been fixed recently. I'm looking at it now, and
not sure why it hasn't triggered before in other areas.

Yehuda

On Thu, Dec 11, 2014 at 5:55 AM, Georgios Dimitrakakis
gior...@acmac.uoc.gr wrote:

This issue seems very similar to these:

http://tracker.ceph.com/issues/8202
http://tracker.ceph.com/issues/8702


Would it make any difference if I try to build CEPH from sources?

I mean is someone aware of it been fixed on any of the recent 
commits and

probably hasn't passed yet to the repositories?

Regards,

George





On Mon, 08 Dec 2014 19:47:59 +0200, Georgios Dimitrakakis wrote:


I 've just created issues #10271

Best,

George

On Fri, 5 Dec 2014 09:30:45 -0800, Yehuda Sadeh wrote:


It looks like a bug. Can you open an issue on tracker.ceph.com,
describing what you see?

Thanks,
Yehuda

On Fri, Dec 5, 2014 at 7:17 AM, Georgios Dimitrakakis
gior...@acmac.uoc.gr wrote:


It would be nice to see where and how uploadId

is being calculated...


Thanks,


George



For example if I try to perform the same multipart upload at an 
older
version ceph version 0.72.2 
(a913ded2ff138aefb8cb84d347d72164099cfd60)



I can see the upload ID in the apache log as:

PUT


/test/.dat?partNumber=25uploadId=I3yihBFZmHx9CCqtcDjr8d-RhgfX8NW
HTTP/1.1 200 - - aws-sdk-nodejs/2.0.29 linux/v0.10.33

but when I try the same at ceph version 0.80.7
(6c0127fcb58008793d3c8b62d925bc91963672a3)

I get the following:

PUT




/test/.dat?partNumber=12uploadId=2%2Ff9UgnHhdK0VCnMlpT-XA8ttia1HjK36
HTTP/1.1 403 78 - aws-sdk-nodejs/2.0.29 linux/v0.10.33


and my guess is that the %2F at the latter is the one that is
causing the problem and hence the 403 error.



What do you think???


Best,

George




Hi all!

I am using AWS SDK JS v.2.0.29 to perform a multipart upload 
into

Radosgw with ceph version 0.80.7
(6c0127fcb58008793d3c8b62d925bc91963672a3) and I am getting a 
403

error.


I believe that the id which is send to all requests and has 
been
urlencoded by the aws-sdk-js doesn't match with the one in 
rados

because it's not urlencoded.

Is that the case? Can you confirm it?

Is there something I can do?


Regards,

George





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] AWS SDK and MultiPart Problem

2014-12-11 Thread Georgios Dimitrakakis

OK! I will give it some time and will try again later!

Thanks a lot for your help!

Warmest regards,

George


The branch I pushed earlier was based off recent development branch. 
I

just pushed one based off firefly (wip-10271-firefly). It will
probably take a bit to build.

Yehuda

On Thu, Dec 11, 2014 at 12:03 PM, Georgios Dimitrakakis
gior...@acmac.uoc.gr wrote:

Hi again!

I have installed and enabled the development branch repositories as
described here:


http://ceph.com/docs/master/install/get-packages/#add-ceph-development

and when I try to update the ceph-radosgw package I get the 
following:


Installed Packages
Name: ceph-radosgw
Arch: x86_64
Version : 0.80.7
Release : 0.el6
Size: 3.8 M
Repo: installed
From repo   : Ceph
Summary : Rados REST gateway
URL : http://ceph.com/
License : GPL-2.0
Description : radosgw is an S3 HTTP REST gateway for the RADOS 
object store.

It is
: implemented as a FastCGI module using libfcgi, and can 
be used

in
: conjunction with any FastCGI capable web server.

Available Packages
Name: ceph-radosgw
Arch: x86_64
Epoch   : 1
Version : 0.80.5
Release : 9.el6
Size: 1.3 M
Repo: epel
Summary : Rados REST gateway
URL : http://ceph.com/
License : GPL-2.0
Description : radosgw is an S3 HTTP REST gateway for the RADOS 
object store.

It is
: implemented as a FastCGI module using libfcgi, and can 
be used

in
: conjunction with any FastCGI capable web server.



Is this normal???

I am concerned because the installed version is 0.80.7 and the 
available

update package is 0.80.5

Have I missed something?

Regards,

George




Pushed a fix to wip-10271. Haven't tested it though, let me know if
you try it.

Thanks,
Yehuda

On Thu, Dec 11, 2014 at 8:38 AM, Yehuda Sadeh yeh...@redhat.com 
wrote:


I don't think it has been fixed recently. I'm looking at it now, 
and

not sure why it hasn't triggered before in other areas.

Yehuda

On Thu, Dec 11, 2014 at 5:55 AM, Georgios Dimitrakakis
gior...@acmac.uoc.gr wrote:


This issue seems very similar to these:

http://tracker.ceph.com/issues/8202
http://tracker.ceph.com/issues/8702


Would it make any difference if I try to build CEPH from sources?

I mean is someone aware of it been fixed on any of the recent 
commits

and
probably hasn't passed yet to the repositories?

Regards,

George





On Mon, 08 Dec 2014 19:47:59 +0200, Georgios Dimitrakakis wrote:



I 've just created issues #10271

Best,

George

On Fri, 5 Dec 2014 09:30:45 -0800, Yehuda Sadeh wrote:



It looks like a bug. Can you open an issue on tracker.ceph.com,
describing what you see?

Thanks,
Yehuda

On Fri, Dec 5, 2014 at 7:17 AM, Georgios Dimitrakakis
gior...@acmac.uoc.gr wrote:



It would be nice to see where and how uploadId

is being calculated...


Thanks,


George



For example if I try to perform the same multipart upload at 
an

older
version ceph version 0.72.2
(a913ded2ff138aefb8cb84d347d72164099cfd60)


I can see the upload ID in the apache log as:

PUT




/test/.dat?partNumber=25uploadId=I3yihBFZmHx9CCqtcDjr8d-RhgfX8NW
HTTP/1.1 200 - - aws-sdk-nodejs/2.0.29 linux/v0.10.33

but when I try the same at ceph version 0.80.7
(6c0127fcb58008793d3c8b62d925bc91963672a3)

I get the following:

PUT






/test/.dat?partNumber=12uploadId=2%2Ff9UgnHhdK0VCnMlpT-XA8ttia1HjK36
HTTP/1.1 403 78 - aws-sdk-nodejs/2.0.29 linux/v0.10.33


and my guess is that the %2F at the latter is the one that 
is

causing the problem and hence the 403 error.



What do you think???


Best,

George




Hi all!

I am using AWS SDK JS v.2.0.29 to perform a multipart upload 
into

Radosgw with ceph version 0.80.7
(6c0127fcb58008793d3c8b62d925bc91963672a3) and I am getting 
a 403

error.


I believe that the id which is send to all requests and has 
been
urlencoded by the aws-sdk-js doesn't match with the one in 
rados

because it's not urlencoded.

Is that the case? Can you confirm it?

Is there something I can do?


Regards,

George









___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] AWS SDK and MultiPart Problem

2014-12-12 Thread Georgios Dimitrakakis

Dear Yehuda,

I have installed the patched version as you can see:

$ radosgw --version
ceph version 0.80.7-1-gbd43759 
(bd43759f6e76fa827e2534fa4e61547779ee10a5)


$ ceph --version
ceph version 0.80.7-1-gbd43759 
(bd43759f6e76fa827e2534fa4e61547779ee10a5)


$ sudo yum info ceph-radosgw
Installed Packages
Name: ceph-radosgw
Arch: x86_64
Version : 0.80.7
Release : 1.gbd43759.el6
Size: 3.8 M
Repo: installed
From repo   : ceph-source
Summary : Rados REST gateway
URL : http://ceph.com/
License : GPL-2.0
Description : radosgw is an S3 HTTP REST gateway for the RADOS object 
store. It is
: implemented as a FastCGI module using libfcgi, and can be 
used in

: conjunction with any FastCGI capable web server.


Unfortunately the problem with the multipart upload via aws-sdk still
remains the same!



Here is a part of the apache log:


PUT
/clients-space/test/iip7.dmg?partNumber=3&uploadId=2%2F9rEUmdFcuW66VJfeH3_jbqqUz0jKvrO
HTTP/1.1 403 78 - aws-sdk-nodejs/2.1.0 darwin/v0.10.33


PUT
/clients-space/test/iip7.dmg?partNumber=1&uploadId=2%2F9rEUmdFcuW66VJfeH3_jbqqUz0jKvrO
HTTP/1.1 403 78 - aws-sdk-nodejs/2.1.0 darwin/v0.10.33


PUT
/clients-space/test/iip7.dmg?partNumber=2&uploadId=2%2F9rEUmdFcuW66VJfeH3_jbqqUz0jKvrO
HTTP/1.1 403 78 - aws-sdk-nodejs/2.1.0 darwin/v0.10.33




Directly modifying the binary so that the 2%2F is changed to
2- results in success, and here is the log:



PUT
/clients-space/test/iip7.dmg?partNumber=1&uploadId=2-R6bxv4TM2Brxn-w9aHOcbb8OSJ3-Vh2
HTTP/1.1 200 - - aws-sdk-nodejs/2.1.0 darwin/v0.10.33


PUT
/clients-space/test/iip7.dmg?partNumber=2&uploadId=2-R6bxv4TM2Brxn-w9aHOcbb8OSJ3-Vh2
HTTP/1.1 200 - - aws-sdk-nodejs/2.1.0 darwin/v0.10.33


PUT
/clients-space/test/iip7.dmg?partNumber=4&uploadId=2-R6bxv4TM2Brxn-w9aHOcbb8OSJ3-Vh2
HTTP/1.1 200 - - aws-sdk-nodejs/2.1.0 darwin/v0.10.33


POST 
/clients-space/test/iip7.dmg?uploadId=2-R6bxv4TM2Brxn-w9aHOcbb8OSJ3-Vh2 
HTTP/1.1 200 302 - aws-sdk-nodejs/2.1.0 darwin/v0.10.33





Can you think of something else??


Best regards,


George





OK! I will give it some time and will try again later!

Thanks a lot for your help!

Warmest regards,

George


The branch I pushed earlier was based off recent development branch. 
I

just pushed one based off firefly (wip-10271-firefly). It will
probably take a bit to build.

Yehuda

On Thu, Dec 11, 2014 at 12:03 PM, Georgios Dimitrakakis
gior...@acmac.uoc.gr wrote:

Hi again!

I have installed and enabled the development branch repositories as
described here:


http://ceph.com/docs/master/install/get-packages/#add-ceph-development

and when I try to update the ceph-radosgw package I get the 
following:


Installed Packages
Name: ceph-radosgw
Arch: x86_64
Version : 0.80.7
Release : 0.el6
Size: 3.8 M
Repo: installed
From repo   : Ceph
Summary : Rados REST gateway
URL : http://ceph.com/
License : GPL-2.0
Description : radosgw is an S3 HTTP REST gateway for the RADOS 
object store.

It is
: implemented as a FastCGI module using libfcgi, and 
can be used

in
: conjunction with any FastCGI capable web server.

Available Packages
Name: ceph-radosgw
Arch: x86_64
Epoch   : 1
Version : 0.80.5
Release : 9.el6
Size: 1.3 M
Repo: epel
Summary : Rados REST gateway
URL : http://ceph.com/
License : GPL-2.0
Description : radosgw is an S3 HTTP REST gateway for the RADOS 
object store.

It is
: implemented as a FastCGI module using libfcgi, and 
can be used

in
: conjunction with any FastCGI capable web server.



Is this normal???

I am concerned because the installed version is 0.80.7 and the 
available

update package is 0.80.5

Have I missed something?

Regards,

George



Pushed a fix to wip-10271. Haven't tested it though, let me know 
if

you try it.

Thanks,
Yehuda

On Thu, Dec 11, 2014 at 8:38 AM, Yehuda Sadeh yeh...@redhat.com 
wrote:


I don't think it has been fixed recently. I'm looking at it now, 
and

not sure why it hasn't triggered before in other areas.

Yehuda

On Thu, Dec 11, 2014 at 5:55 AM, Georgios Dimitrakakis
gior...@acmac.uoc.gr wrote:


This issue seems very similar to these:

http://tracker.ceph.com/issues/8202
http://tracker.ceph.com/issues/8702


Would it make any difference if I try to build CEPH from 
sources?


I mean is someone aware of it been fixed on any of the recent 
commits

and
probably hasn't passed yet to the repositories?

Regards,

George





On Mon, 08 Dec 2014 19:47:59 +0200, Georgios Dimitrakakis wrote:



I 've just created issues #10271

Best,

George

On Fri, 5 Dec 2014 09:30:45 -0800, Yehuda Sadeh wrote:



It looks like a bug. Can you open an issue on 
tracker.ceph.com,

describing what you see?

Thanks,
Yehuda

On Fri, Dec 5, 2014 at 7:17 AM, Georgios Dimitrakakis
gior...@acmac.uoc.gr wrote:



It would be nice to see where

Re: [ceph-users] AWS SDK and MultiPart Problem

2014-12-12 Thread Georgios Dimitrakakis
I'd be more than happy to provide you with all the info, but for some
unknown reason my radosgw.log is empty.


This is the part that I have in ceph.conf

[client.radosgw.gateway]
host = xxx
keyring = /etc/ceph/keyring.radosgw.gateway
rgw socket path = /tmp/radosgw.sock
rgw dns name = xxx.example.com
rgw enable usage log = true
rgw usage log tick interval = 30
rgw usage log flush threshold = 1024
rgw usage max shards = 32
rgw usage max user shards = 1
log file = /var/log/ceph/radosgw.log
debug ms = 1
debug rgw = 20



but no matter what I put in there the log is empty

$ pwd
/var/log/ceph
$ ls -l radosgw.log
-rw-r--r-- 1 root root 0 Nov 30 03:01 radosgw.log


I have already started another thread, titled "Empty Rados log", here
on the ceph-users list on December 4th, but I haven't heard from anyone
yet...


If I solve this I will be able to provide you with all the data.


Regards,


George



OK, I've been digging a bit more. I don't have full radosgw logs for
the issue, so if you could provide them (debug rgw = 20), it might
help.

However, as it is now, I think the issue is with the way the client
library is signing the requests. Instead of using the undecoded
uploadId, it uses the encoded version for the signature, which doesn't
sign correctly. The same would have happened if it had run
against Amazon S3 (just tested it).
The two solutions that I see are to fix the client library, and/or to
modify the character to one that does not require escaping. Sadly, the
dash character that you were using cannot be used safely in that
context. Maybe a tilde ('~') could work.

Yehuda

On Fri, Dec 12, 2014 at 2:41 AM, Georgios Dimitrakakis
gior...@acmac.uoc.gr wrote:

Dear Yehuda,

I have installed the patched version as you can see:

$ radosgw --version
ceph version 0.80.7-1-gbd43759 
(bd43759f6e76fa827e2534fa4e61547779ee10a5)


$ ceph --version
ceph version 0.80.7-1-gbd43759 
(bd43759f6e76fa827e2534fa4e61547779ee10a5)


$ sudo yum info ceph-radosgw
Installed Packages
Name: ceph-radosgw
Arch: x86_64
Version : 0.80.7
Release : 1.gbd43759.el6
Size: 3.8 M
Repo: installed
From repo   : ceph-source
Summary : Rados REST gateway
URL : http://ceph.com/
License : GPL-2.0
Description : radosgw is an S3 HTTP REST gateway for the RADOS 
object store.

It is
: implemented as a FastCGI module using libfcgi, and can 
be used

in
: conjunction with any FastCGI capable web server.


Unfortunately the problem on the multipart upload with aws-sdk still 
remains

the same!


Here is a part of the apache log:


PUT

/clients-space/test/iip7.dmg?partNumber=3uploadId=2%2F9rEUmdFcuW66VJfeH3_jbqqUz0jKvrO
HTTP/1.1 403 78 - aws-sdk-nodejs/2.1.0 darwin/v0.10.33

PUT

/clients-space/test/iip7.dmg?partNumber=1uploadId=2%2F9rEUmdFcuW66VJfeH3_jbqqUz0jKvrO
HTTP/1.1 403 78 - aws-sdk-nodejs/2.1.0 darwin/v0.10.33

PUT

/clients-space/test/iip7.dmg?partNumber=2uploadId=2%2F9rEUmdFcuW66VJfeH3_jbqqUz0jKvrO
HTTP/1.1 403 78 - aws-sdk-nodejs/2.1.0 darwin/v0.10.33



Directly modification of the binary so that the 2%2F be changed to 
2-

results in success and here is the log:


PUT

/clients-space/test/iip7.dmg?partNumber=1uploadId=2-R6bxv4TM2Brxn-w9aHOcbb8OSJ3-Vh2
HTTP/1.1 200 - - aws-sdk-nodejs/2.1.0 darwin/v0.10.33

PUT

/clients-space/test/iip7.dmg?partNumber=2uploadId=2-R6bxv4TM2Brxn-w9aHOcbb8OSJ3-Vh2
HTTP/1.1 200 - - aws-sdk-nodejs/2.1.0 darwin/v0.10.33

PUT

/clients-space/test/iip7.dmg?partNumber=4uploadId=2-R6bxv4TM2Brxn-w9aHOcbb8OSJ3-Vh2
HTTP/1.1 200 - - aws-sdk-nodejs/2.1.0 darwin/v0.10.33

POST

/clients-space/test/iip7.dmg?uploadId=2-R6bxv4TM2Brxn-w9aHOcbb8OSJ3-Vh2
HTTP/1.1 200 302 - aws-sdk-nodejs/2.1.0 darwin/v0.10.33




Can you think of something else??


Best regards,


George






OK! I will give it some time and will try again later!

Thanks a lot for your help!

Warmest regards,

George


The branch I pushed earlier was based off recent development 
branch. I

just pushed one based off firefly (wip-10271-firefly). It will
probably take a bit to build.

Yehuda

On Thu, Dec 11, 2014 at 12:03 PM, Georgios Dimitrakakis
gior...@acmac.uoc.gr wrote:


Hi again!

I have installed and enabled the development branch repositories 
as

described here:



http://ceph.com/docs/master/install/get-packages/#add-ceph-development

and when I try to update the ceph-radosgw package I get the 
following:


Installed Packages
Name: ceph-radosgw
Arch: x86_64
Version : 0.80.7
Release : 0.el6
Size: 3.8 M
Repo: installed
From repo   : Ceph
Summary : Rados REST gateway
URL : http://ceph.com/
License : GPL-2.0
Description : radosgw is an S3 HTTP REST gateway for the RADOS 
object

store.
It is
: implemented as a FastCGI module using libfcgi, and 
can be

used
in
: conjunction with any FastCGI capable web server.

Available Packages
Name: ceph-radosgw
Arch: x86_64
Epoch

Re: [ceph-users] Empty Rados log

2014-12-12 Thread Georgios Dimitrakakis

This is very silly of me...

The file wasn't writable by apache.

I am writing it down for future reference.
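
For the record, the fix was simply (assuming radosgw runs as the apache user, as it does with the Apache/FastCGI setup here):

# chown apache:apache /var/log/ceph/radosgw.log
# service ceph-radosgw restart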

G.


Hi all!

I have a CEPH installation with radosgw and the radosgw.log in the
/var/log/ceph directory is empty.

In the ceph.conf I have

log file = /var/log/ceph/radosgw.log
debug ms = 1
debug rgw = 20

under the: [client.radosgw.gateway]



Any ideas?


Best,


George
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] AWS SDK and MultiPart Problem

2014-12-12 Thread Georgios Dimitrakakis

How silly of me!

I've just noticed that the file isn't writable by apache!


I'll be back with the logs...


G.



I 'd be more than happy to provide to you all the info but for some
unknown reason my radosgw.log is empty.

This is the part that I have in ceph.conf

[client.radosgw.gateway]
host = xxx
keyring = /etc/ceph/keyring.radosgw.gateway
rgw socket path = /tmp/radosgw.sock
rgw dns name = xxx.example.com
rgw enable usage log = true
rgw usage log tick interval = 30
rgw usage log flush threshold = 1024
rgw usage max shards = 32
rgw usage max user shards = 1
log file = /var/log/ceph/radosgw.log
debug ms = 1
debug rgw = 20



but no matter what I put in there the log is empty

$ pwd
/var/log/ceph
$ ls -l radosgw.log
-rw-r--r-- 1 root root 0 Nov 30 03:01 radosgw.log


I have already started  another thread with title Empty Rados log
here in ceph-users list since December 4th but haven't heard from
anyone yet...

If I solve this I will be able to provide you with all the data.


Regards,


George



Ok, I've been digging a bit more. I don't have full radosgw logs for
the issue, so if you could provide it (debug rgw = 20), it might 
help.

However, as it is now, I think the issue is with the way the client
library is signing the requests. Instead of using the undecoded
uploadId, it uses the encoded version for the signature, which 
doesn't

sign correctly. The same would have happened if it would have run
against amazon S3 (just tested it).
The two solutions that I see are to fix the client library, and/or 
to
modify the character to one that does not require escaping. Sadly 
the

dash character that you were using cannot be used safely in that
context. Maybe tilde ('~') would could work.

Yehuda

On Fri, Dec 12, 2014 at 2:41 AM, Georgios Dimitrakakis
gior...@acmac.uoc.gr wrote:

Dear Yehuda,

I have installed the patched version as you can see:

$ radosgw --version
ceph version 0.80.7-1-gbd43759 
(bd43759f6e76fa827e2534fa4e61547779ee10a5)


$ ceph --version
ceph version 0.80.7-1-gbd43759 
(bd43759f6e76fa827e2534fa4e61547779ee10a5)


$ sudo yum info ceph-radosgw
Installed Packages
Name: ceph-radosgw
Arch: x86_64
Version : 0.80.7
Release : 1.gbd43759.el6
Size: 3.8 M
Repo: installed
From repo   : ceph-source
Summary : Rados REST gateway
URL : http://ceph.com/
License : GPL-2.0
Description : radosgw is an S3 HTTP REST gateway for the RADOS 
object store.

It is
: implemented as a FastCGI module using libfcgi, and 
can be used

in
: conjunction with any FastCGI capable web server.


Unfortunately the problem on the multipart upload with aws-sdk 
still remains

the same!


Here is a part of the apache log:


PUT

/clients-space/test/iip7.dmg?partNumber=3uploadId=2%2F9rEUmdFcuW66VJfeH3_jbqqUz0jKvrO
HTTP/1.1 403 78 - aws-sdk-nodejs/2.1.0 darwin/v0.10.33

PUT

/clients-space/test/iip7.dmg?partNumber=1uploadId=2%2F9rEUmdFcuW66VJfeH3_jbqqUz0jKvrO
HTTP/1.1 403 78 - aws-sdk-nodejs/2.1.0 darwin/v0.10.33

PUT

/clients-space/test/iip7.dmg?partNumber=2uploadId=2%2F9rEUmdFcuW66VJfeH3_jbqqUz0jKvrO
HTTP/1.1 403 78 - aws-sdk-nodejs/2.1.0 darwin/v0.10.33



Directly modification of the binary so that the 2%2F be changed 
to 2-

results in success and here is the log:


PUT

/clients-space/test/iip7.dmg?partNumber=1uploadId=2-R6bxv4TM2Brxn-w9aHOcbb8OSJ3-Vh2
HTTP/1.1 200 - - aws-sdk-nodejs/2.1.0 darwin/v0.10.33

PUT

/clients-space/test/iip7.dmg?partNumber=2uploadId=2-R6bxv4TM2Brxn-w9aHOcbb8OSJ3-Vh2
HTTP/1.1 200 - - aws-sdk-nodejs/2.1.0 darwin/v0.10.33

PUT

/clients-space/test/iip7.dmg?partNumber=4uploadId=2-R6bxv4TM2Brxn-w9aHOcbb8OSJ3-Vh2
HTTP/1.1 200 - - aws-sdk-nodejs/2.1.0 darwin/v0.10.33

POST

/clients-space/test/iip7.dmg?uploadId=2-R6bxv4TM2Brxn-w9aHOcbb8OSJ3-Vh2
HTTP/1.1 200 302 - aws-sdk-nodejs/2.1.0 darwin/v0.10.33




Can you think of something else??


Best regards,


George






OK! I will give it some time and will try again later!

Thanks a lot for your help!

Warmest regards,

George


The branch I pushed earlier was based off recent development 
branch. I

just pushed one based off firefly (wip-10271-firefly). It will
probably take a bit to build.

Yehuda

On Thu, Dec 11, 2014 at 12:03 PM, Georgios Dimitrakakis
gior...@acmac.uoc.gr wrote:


Hi again!

I have installed and enabled the development branch repositories 
as

described here:



http://ceph.com/docs/master/install/get-packages/#add-ceph-development

and when I try to update the ceph-radosgw package I get the 
following:


Installed Packages
Name: ceph-radosgw
Arch: x86_64
Version : 0.80.7
Release : 0.el6
Size: 3.8 M
Repo: installed
From repo   : Ceph
Summary : Rados REST gateway
URL : http://ceph.com/
License : GPL-2.0
Description : radosgw is an S3 HTTP REST gateway for the RADOS 
object

store.
It is
: implemented as a FastCGI module using libfcgi, and 
can be

used

[ceph-users] Dual RADOSGW Network

2014-12-15 Thread Georgios Dimitrakakis

Hi all!

I have a single Ceph node which has two network interfaces.

One is configured to be accessed directly from the internet (153.*) and
the other one is configured on an internal LAN (192.*).


For the moment radosgw is listening on the external (internet)
interface.


Can I configure radosgw to be accessible via both interfaces? What I
would like to do is save bandwidth and time for the machines on the
internal network and use the internal net for all RADOS communications.



Any ideas?


Best regards,


George
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] AWS SDK and MultiPart Problem

2014-12-15 Thread Georgios Dimitrakakis

Thanks a lot, Yehuda!

The one with the tilde seems to be working!

Fingers crossed that it will continue to in the future :-)


Warmest regards,


George


In any case, I pushed earlier today another fix to the same branch
that replaces the slash with a tilde. Let me know if that one works
for you.

Thanks,
Yehuda

On Fri, Dec 12, 2014 at 5:59 AM, Georgios Dimitrakakis
gior...@acmac.uoc.gr wrote:

How silly of me!!!

I 've just noticed that the file isn't writable by the apache!


I 'll be back with the logs...


G.




I 'd be more than happy to provide to you all the info but for some
unknown reason my radosgw.log is empty.

This is the part that I have in ceph.conf

[client.radosgw.gateway]
host = xxx
keyring = /etc/ceph/keyring.radosgw.gateway
rgw socket path = /tmp/radosgw.sock
rgw dns name = xxx.example.com
rgw enable usage log = true
rgw usage log tick interval = 30
rgw usage log flush threshold = 1024
rgw usage max shards = 32
rgw usage max user shards = 1
log file = /var/log/ceph/radosgw.log
debug ms = 1
debug rgw = 20



but no matter what I put in there the log is empty

$ pwd
/var/log/ceph
$ ls -l radosgw.log
-rw-r--r-- 1 root root 0 Nov 30 03:01 radosgw.log


I have already started  another thread with title Empty Rados log
here in ceph-users list since December 4th but haven't heard from
anyone yet...

If I solve this I will be able to provide you with all the data.


Regards,


George


Ok, I've been digging a bit more. I don't have full radosgw logs 
for
the issue, so if you could provide it (debug rgw = 20), it might 
help.
However, as it is now, I think the issue is with the way the 
client

library is signing the requests. Instead of using the undecoded
uploadId, it uses the encoded version for the signature, which 
doesn't

sign correctly. The same would have happened if it would have run
against amazon S3 (just tested it).
The two solutions that I see are to fix the client library, and/or 
to
modify the character to one that does not require escaping. Sadly 
the

dash character that you were using cannot be used safely in that
context. Maybe tilde ('~') would could work.

Yehuda

On Fri, Dec 12, 2014 at 2:41 AM, Georgios Dimitrakakis
gior...@acmac.uoc.gr wrote:


Dear Yehuda,

I have installed the patched version as you can see:

$ radosgw --version
ceph version 0.80.7-1-gbd43759
(bd43759f6e76fa827e2534fa4e61547779ee10a5)

$ ceph --version
ceph version 0.80.7-1-gbd43759
(bd43759f6e76fa827e2534fa4e61547779ee10a5)

$ sudo yum info ceph-radosgw
Installed Packages
Name: ceph-radosgw
Arch: x86_64
Version : 0.80.7
Release : 1.gbd43759.el6
Size: 3.8 M
Repo: installed
From repo   : ceph-source
Summary : Rados REST gateway
URL : http://ceph.com/
License : GPL-2.0
Description : radosgw is an S3 HTTP REST gateway for the RADOS 
object

store.
It is
: implemented as a FastCGI module using libfcgi, and 
can be

used
in
: conjunction with any FastCGI capable web server.


Unfortunately the problem on the multipart upload with aws-sdk 
still

remains
the same!


Here is a part of the apache log:


PUT



/clients-space/test/iip7.dmg?partNumber=3uploadId=2%2F9rEUmdFcuW66VJfeH3_jbqqUz0jKvrO
HTTP/1.1 403 78 - aws-sdk-nodejs/2.1.0 darwin/v0.10.33

PUT



/clients-space/test/iip7.dmg?partNumber=1uploadId=2%2F9rEUmdFcuW66VJfeH3_jbqqUz0jKvrO
HTTP/1.1 403 78 - aws-sdk-nodejs/2.1.0 darwin/v0.10.33

PUT



/clients-space/test/iip7.dmg?partNumber=2uploadId=2%2F9rEUmdFcuW66VJfeH3_jbqqUz0jKvrO
HTTP/1.1 403 78 - aws-sdk-nodejs/2.1.0 darwin/v0.10.33



Directly modification of the binary so that the 2%2F be changed 
to

2-
results in success and here is the log:


PUT



/clients-space/test/iip7.dmg?partNumber=1uploadId=2-R6bxv4TM2Brxn-w9aHOcbb8OSJ3-Vh2
HTTP/1.1 200 - - aws-sdk-nodejs/2.1.0 darwin/v0.10.33

PUT



/clients-space/test/iip7.dmg?partNumber=2uploadId=2-R6bxv4TM2Brxn-w9aHOcbb8OSJ3-Vh2
HTTP/1.1 200 - - aws-sdk-nodejs/2.1.0 darwin/v0.10.33

PUT



/clients-space/test/iip7.dmg?partNumber=4uploadId=2-R6bxv4TM2Brxn-w9aHOcbb8OSJ3-Vh2
HTTP/1.1 200 - - aws-sdk-nodejs/2.1.0 darwin/v0.10.33

POST


/clients-space/test/iip7.dmg?uploadId=2-R6bxv4TM2Brxn-w9aHOcbb8OSJ3-Vh2
HTTP/1.1 200 302 - aws-sdk-nodejs/2.1.0 darwin/v0.10.33




Can you think of something else??


Best regards,


George






OK! I will give it some time and will try again later!

Thanks a lot for your help!

Warmest regards,

George


The branch I pushed earlier was based off recent development 
branch. I

just pushed one based off firefly (wip-10271-firefly). It will
probably take a bit to build.

Yehuda

On Thu, Dec 11, 2014 at 12:03 PM, Georgios Dimitrakakis
gior...@acmac.uoc.gr wrote:



Hi again!

I have installed and enabled the development branch 
repositories as

described here:





http://ceph.com/docs/master/install/get-packages/#add-ceph-development

and when I try to update the ceph-radosgw package I get the
following:

Installed Packages
Name: ceph

Re: [ceph-users] Dual RADOSGW Network

2014-12-16 Thread Georgios Dimitrakakis

Thanks Craig.

I will try that!

I thought it was more complicated than that because of the entries for
the public_network and rgw dns name in the config file...


I will give it a try.
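
For reference, a sketch of what the Apache side could look like (directive values are illustrative; the existing FastCGI/rewrite rules stay as they are, and rgw dns name only matters for clients that address buckets as subdomains):

Listen 80
<VirtualHost *:80>
    ServerName  xxx.example.com
    ServerAlias 192.168.1.10 radosgw.internal
    # ... existing radosgw FastCGI / RewriteRule configuration unchanged ...
</VirtualHost>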

Best,


George




That shouldn't be a problem.  Just have Apache bind to all interfaces
instead of the external IP.

In my case, I only have Apache bound to the internal interface.  My
load balancer has an external and internal IP, and I'm able to talk to
it on both interfaces.

On Mon, Dec 15, 2014 at 2:00 PM, Georgios Dimitrakakis  wrote:


Hi all!

I have a single CEPH node which has two network interfaces.

One is configured to be accessed directly by the internet (153.*)
and the other one is configured on an internal LAN (192.*)

For the moment radosgw is listening on the external (internet)
interface.

Can I configure radosgw to be accessed by both interfaces? What I
would like to do is to save bandwidth and time for the machines on
the internal network and use the internal net for all rados
communications.

Any ideas?

Best regards,

George

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] can't get radosgw with apache work, 403 and 405 error

2014-04-03 Thread Georgios Dimitrakakis

Hi! I am facing the exact same problem!

I am also on a CentOS 6.5 64-bit system.

Does anyone have any suggestions? Where to look? What to check?

zhongku, did you manage to solve this problem?

On the other hand, if I use Python as shown here:
http://ceph.com/docs/master/radosgw/s3/python/ I can list the buckets
without a problem. Any ideas?


Best regards,

G.



Hello,


I'm trying to set up radosgw with Apache FastCGI on a CentOS 6.5 64-bit
system.

Unfortunately I can't get it to work right; some operations fail.
s3cmd ls succeeds
s3cmd ls s3://bucket-name returns 403 AccessDenied
s3cmd mb s3://bucket-name returns 405 MethodNotAllowed
Here is my apache config
http://pastebin.com/BWwQxVkD
Here is log of s3cmd
http://pastebin.com/DRtQjtvP

ceph version is 0.72.2
ceph-radosgw version is 0.72.2

Please help me, thanks.

---
zhongku


--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy

2014-04-04 Thread Georgios Dimitrakakis

On 03/04/2014 15:51, Brian Candler wrote:

On 03/04/2014 15:42, Georgios Dimitrakakis wrote:

Hi Brian,

try disabling requiretty in visudo on all nodes.

There is no requiretty in the sudoers file, or indeed any file 
under /etc.


The manpage says that requiretty is off by default, but I suppose 
Ubuntu could have broken that. So just to be sure, I created 
/etc/sudoers.d/norequiretty with:


Defaults !requiretty

on all nodes. It doesn't make any difference.
Actually, I missed something, as I found when trying to do a local 
sudo:


sudo: /etc/sudoers.d/norequiretty is mode 0644, but should be 0440

But fixing that doesn't prevent the original problem.

brian@ceph-admin:~/my-cluster$ ceph-deploy install node1 node2 node3
[ceph_deploy.cli][INFO  ] Invoked (1.4.0): /usr/bin/ceph-deploy
install node1 node2 node3
[ceph_deploy.install][DEBUG ] Installing stable version emperor on
cluster ceph hosts node1 node2 node3
[ceph_deploy.install][DEBUG ] Detecting platform for host node1 ...
sudo: no tty present and no askpass program specified
Sorry, try again.
sudo: no tty present and no askpass program specified
Sorry, try again.
sudo: no tty present and no askpass program specified
Sorry, try again.
sudo: 3 incorrect password attempts
[node1][DEBUG ] connected to host: node1

Regards,

Brian.


This definitely has to do with the TTY!

When ceph-deploy logs in over SSH there is no TTY on which sudo can ask
for a password, so sudo for that user needs to be passwordless.


For instance, I've found these links on the net:

http://stackoverflow.com/questions/21659637/sudo-no-tty-present-and-no-askpass-program-specified-netbeans

http://askubuntu.com/questions/281742/sudo-no-tty-present-and-no-askpass-program-specified

both saying the same thing!
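
The usual fix for ceph-deploy is passwordless sudo for the deploy user on every node, along these lines (the user name is illustrative; note the 0440 mode you ran into above):

# echo "brian ALL = (root) NOPASSWD:ALL" > /etc/sudoers.d/brian
# chmod 0440 /etc/sudoers.d/brian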

Best,

G.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] external monitoring tools for ceph

2014-07-01 Thread Georgios Dimitrakakis

Hi Craig,

I am also interested in the Zabbix templates and scripts, if you can
publish them.


Regards,

G.

On Mon, 30 Jun 2014 18:15:12 -0700, Craig Lewis wrote:

You should check out Calamari (https://github.com/ceph/calamari),
Inktank's monitoring and administration tool.

I started before Calamari was announced, so I rolled my own using
Zabbix.  It handles all the monitoring, graphing, and alerting
in one tool.  It's kind of a pain to set up, but works OK now that it's
going.
I don't know how to handle the cluster view though.  I'm monitoring
individual machines.  Whenever something happens, like an OSD stops
responding, I get an alert from every monitor.  Otherwise it's not a
big deal.

I'm in the middle of refactoring the data gathering from poll to
push.

If you're interested, I can publish my templates and scripts when I'm
done.

On Sun, Jun 29, 2014 at 1:17 AM, pragya jain  wrote:


Hello all,

I am working on ceph storage cluster with rados gateway for object
storage.
I am looking for external monitoring tools that can be used to
monitor ceph storage cluster and rados gateway interface.
I find various monitoring tools, such as nagios, collectd, ganglia,
diamond, sensu, logstash.
but I don't get details from anyone about what features these
monitoring tools monitor in Ceph.

Has somebody implemented anyone of these tools?

Can somebody help me in identifying the features provided by these
tools?

Is there any other tool which can also be used to monitor ceph
specially for object storage?

Regards
Pragya Jain
___
ceph-users mailing list
ceph-users@lists.ceph.com [1]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [2]




Links:
--
[1] mailto:ceph-users@lists.ceph.com
[2] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[3] https://github.com/ceph/calamari
[4] mailto:prag_2...@yahoo.co.in


--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] inktank-mellanox webinar access ?

2014-07-10 Thread Georgios Dimitrakakis

The same here...Neither do I or my colleagues
G.

On Thu, 10 Jul 2014 16:55:22 +0200 (CEST), Alexandre DERUMIER wrote:

Hi,

sorry to spam the mailing list,

but they are a inktank mellanox webinar  in 10minutes,

and I don't have receive access since I have been registered
yesterday (same for my co-worker).

and the webinar mellanox contact email (conta...@mellanox.com), does
not exist

Maybe somebody from Inktank or Mellanox could help us ?

Regards,

Alexandre



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] inktank-mellanox webinar access ?

2014-07-10 Thread Georgios Dimitrakakis

That makes two of us...

G.

On Thu, 10 Jul 2014 17:12:08 +0200 (CEST), Alexandre DERUMIER wrote:

Ok, sorry, we have finally receive the login a bit late.

Sorry again to have spam the mailing list
- Mail original -

De: Alexandre DERUMIER aderum...@odiso.com
À: ceph-users ceph-us...@ceph.com
Envoyé: Jeudi 10 Juillet 2014 16:55:22
Objet: [ceph-users] inktank-mellanox webinar access ?

Hi,

sorry to spam the mailing list,

but they are a inktank mellanox webinar in 10minutes,

and I don't have receive access since I have been registered
yesterday (same for my co-worker).

and the webinar mellanox contact email (conta...@mellanox.com), does
not exist

Maybe somebody from Inktank or Mellanox could help us ?

Regards,

Alexandre



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] inktank-mellanox webinar access ?

2014-07-14 Thread Georgios Dimitrakakis

Hi Karan!

Due to the late reception of the login info I 've also missed
a very big part of the webinar.

They did send me an e-mail though saying that they will let me know as 
soon as

a recording of the session will be available.

I will let you know again then.

Best,

G.

On Mon, 14 Jul 2014 12:40:54 +0300, Karan Singh wrote:

Hey i have missed the webinar , is this available for later review or
slides.

- Karan -

On 10 Jul 2014, at 18:27, Georgios Dimitrakakis  wrote:


That makes two of us...

G.

On Thu, 10 Jul 2014 17:12:08 +0200 (CEST), Alexandre DERUMIER wrote:


Ok, sorry, we have finally receive the login a bit late.

Sorry again to have spam the mailing list
- Mail original -

De: Alexandre DERUMIER
À: ceph-users
Envoyé: Jeudi 10 Juillet 2014 16:55:22
Objet: [ceph-users] inktank-mellanox webinar access ?

Hi,

sorry to spam the mailing list,

but they are a inktank mellanox webinar in 10minutes,

and I don't have receive access since I have been registered
yesterday (same for my co-worker).

and the webinar mellanox contact email (conta...@mellanox.com
[3]), does
not exist

Maybe somebody from Inktank or Mellanox could help us ?

Regards,

Alexandre

___
ceph-users mailing list
ceph-users@lists.ceph.com [4]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com [5]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




Links:
--
[1] mailto:aderum...@odiso.com
[2] mailto:ceph-us...@ceph.com
[3] mailto:conta...@mellanox.com
[4] mailto:ceph-users@lists.ceph.com
[5] mailto:ceph-users@lists.ceph.com
[6] mailto:gior...@acmac.uoc.gr


--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Problem starting RADOS Gateway

2014-02-15 Thread Georgios Dimitrakakis

Dear all,

I am following this guide http://ceph.com/docs/master/radosgw/config/ 
to setup Object Storage on CentOS 6.5.


My problem is that when I try to start the service as indicated here: 
http://ceph.com/docs/master/radosgw/config/#restart-services-and-start-the-gateway



I get nothing

# service ceph-radosgw start
Starting radosgw instance(s)...

and if I check if the service is running obviously it is not!

# service ceph-radosgw status
/usr/bin/radosgw is not running.


If I try to start it manually without using the service command I get 
the following:


# /usr/bin/radosgw -d -c /etc/ceph/ceph.conf --debug_ms 10
2014-02-15 16:03:38.709235 7fb65ba64820  0 ceph version 0.72.2 
(a913ded2ff138aefb8cb84d347d72164099cfd60), process radosgw, pid 24619
2014-02-15 16:03:38.709249 7fb65ba64820 -1 WARNING: libcurl doesn't 
support curl_multi_wait()
2014-02-15 16:03:38.709252 7fb65ba64820 -1 WARNING: cross zone / region 
transfer performance may be affected

2014-02-15 16:03:38.713898 7fb65ba64820 10 -- :/0 ready :/0
2014-02-15 16:03:38.714323 7fb65ba64820  1 -- :/0 messenger.start
2014-02-15 16:03:38.714434 7fb65ba64820 -1 monclient(hunting): ERROR: 
missing keyring, cannot use cephx for authentication
2014-02-15 16:03:38.714440 7fb65ba64820  0 librados: client.admin 
initialization error (2) No such file or directory
2014-02-15 16:03:38.714463 7fb65ba64820 10 -- :/1024619 shutdown 
:/1024619

2014-02-15 16:03:38.714468 7fb65ba64820  1 -- :/1024619 mark_down_all
2014-02-15 16:03:38.714477 7fb65ba64820 10 -- :/1024619 wait: waiting 
for dispatch queue
2014-02-15 16:03:38.714406 7fb64b5fe700 10 -- :/1024619 reaper_entry 
start

2014-02-15 16:03:38.714506 7fb64b5fe700 10 -- :/1024619 reaper
2014-02-15 16:03:38.714522 7fb64b5fe700 10 -- :/1024619 reaper done
2014-02-15 16:03:38.714764 7fb65ba64820 10 -- :/1024619 wait: dispatch 
queue is stopped
2014-02-15 16:03:38.714786 7fb64b5fe700 10 -- :/1024619 reaper_entry 
done
2014-02-15 16:03:38.714819 7fb65ba64820 10 -- :/1024619 wait: closing 
pipes

2014-02-15 16:03:38.714826 7fb65ba64820 10 -- :/1024619 reaper
2014-02-15 16:03:38.714828 7fb65ba64820 10 -- :/1024619 reaper done
2014-02-15 16:03:38.714830 7fb65ba64820 10 -- :/1024619 wait: waiting 
for pipes  to close

2014-02-15 16:03:38.714832 7fb65ba64820 10 -- :/1024619 wait: done.
2014-02-15 16:03:38.714833 7fb65ba64820  1 -- :/1024619 shutdown 
complete.
2014-02-15 16:03:38.714916 7fb65ba64820 -1 Couldn't init storage 
provider (RADOS)


Obviously the problem is some missing keyring, but which one, and how can 
I solve this? Furthermore, why is this happening since I am 
following the guide to the letter? Is something missing?
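
For reference, the keyring-related steps from that guide boil down to 
something like the following (names and paths as in my ceph.conf; I am 
listing them here mostly so someone can point out which one I got wrong):

# ceph-authtool --create-keyring /etc/ceph/keyring.radosgw.gateway
# chmod +r /etc/ceph/keyring.radosgw.gateway
# ceph-authtool /etc/ceph/keyring.radosgw.gateway -n client.radosgw.gateway --gen-key
# ceph-authtool -n client.radosgw.gateway --cap osd 'allow rwx' --cap mon 'allow rw' /etc/ceph/keyring.radosgw.gateway
# ceph -k /etc/ceph/ceph.client.admin.keyring auth add client.radosgw.gateway -i /etc/ceph/keyring.radosgw.gateway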


Best,

G.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problem starting RADOS Gateway

2014-02-15 Thread Georgios Dimitrakakis

1) ceph -s is working as expected

# ceph -s
cluster c465bdb2-e0a5-49c8-8305-efb4234ac88a
 health HEALTH_OK
 monmap e1: 1 mons at {master=192.168.0.10:6789/0}, election epoch 
1, quorum 0 master

 mdsmap e111: 1/1/1 up {0=master=up:active}
 osdmap e114: 2 osds: 2 up, 2 in
  pgmap v414: 1200 pgs, 14 pools, 10596 bytes data, 67 objects
500 GB used, 1134 GB / 1722 GB avail
1200 active+clean


2) In /etc/ceph I have the following files

# ls -l
total 20
-rw-r--r-- 1 root root  64 Feb 14 17:10 ceph.client.admin.keyring
-rw-r--r-- 1 root root 401 Feb 15 16:57 ceph.conf
-rw-r--r-- 1 root root 196 Feb 14 20:26 ceph.log
-rw-r--r-- 1 root root 120 Feb 15 11:08 keyring.radosgw.gateway
-rwxr-xr-x 1 root root  92 Dec 21 00:47 rbdmap

3) ceph.conf content is the following

# cat ceph.conf
[global]
auth_service_required = cephx
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = 192.168.0.10
mon_initial_members = master
fsid = c465bdb2-e0a5-49c8-8305-efb4234ac88a

[client.radosgw.gateway]
host = master
keyring = /etc/ceph/keyring.radosgw.gateway
rgw socket path = /tmp/radosgw.sock
log file = /var/log/ceph/radosgw.log


4) And all the keys that exist are the following:

# ceph auth list
installed auth entries:

mds.master
key: xx==
caps: [mds] allow
caps: [mon] allow profile mds
caps: [osd] allow rwx
osd.0
key: xx==
caps: [mon] allow profile osd
caps: [osd] allow *
osd.1
key: xx==
caps: [mon] allow profile osd
caps: [osd] allow *
client.admin
key: xx==
caps: [mds] allow
caps: [mon] allow *
caps: [osd] allow *
client.bootstrap-mds
key: xx==
caps: [mon] allow profile bootstrap-mds
client.bootstrap-osd
key: AQBWLf5SGBAyBRAAzLwi5OXsAuR5vdo8hs+2zw==
caps: [mon] allow profile bootstrap-osd
client.radosgw.gateway
key: xx==
caps: [mon] allow rw
caps: [osd] allow rwx



I still don't get what is wrong...

G.

On Sat, 15 Feb 2014 16:27:41 +0100, Udo Lembke wrote:

Hi,
does ceph -s also stuck on missing keyring?

Do you have an keyring like:
cat /etc/ceph/keyring
[client.admin]
key = AQCdkHZR2NBYMBAATe/rqIwCI96LTuyS3gmMXp==

Or do you have anothe defined keyring in ceph.conf?
global-section - keyring = /etc/ceph/keyring

The key is in ceph - see
ceph auth get-key client.admin
AQCdkHZR2NBYMBAATe/rqIwCI96LTuyS3gmMXp==

or ceph auth list for all keys.
Key-genaration is doing by get-or-create key like this (but in this 
case

for bootstap-osd):
ceph auth get-or-create-key client.bootstrap-osd mon allow profile
bootstrap-osd

Udo

On 15.02.2014 15:35, Georgios Dimitrakakis wrote:

Dear all,

I am following this guide 
http://ceph.com/docs/master/radosgw/config/

to setup Object Storage on CentOS 6.5.

My problem is that when I try to start the service as indicated 
here:


http://ceph.com/docs/master/radosgw/config/#restart-services-and-start-the-gateway


I get nothing

# service ceph-radosgw start
Starting radosgw instance(s)...

and if I check if the service is running obviously it is not!

# service ceph-radosgw status
/usr/bin/radosgw is not running.


If I try to start it manually without using the service command I 
get

the following:

# /usr/bin/radosgw -d -c /etc/ceph/ceph.conf --debug_ms 10
2014-02-15 16:03:38.709235 7fb65ba64820  0 ceph version 0.72.2
(a913ded2ff138aefb8cb84d347d72164099cfd60), process radosgw, pid 
24619

2014-02-15 16:03:38.709249 7fb65ba64820 -1 WARNING: libcurl doesn't
support curl_multi_wait()
2014-02-15 16:03:38.709252 7fb65ba64820 -1 WARNING: cross zone /
region transfer performance may be affected
2014-02-15 16:03:38.713898 7fb65ba64820 10 -- :/0 ready :/0
2014-02-15 16:03:38.714323 7fb65ba64820  1 -- :/0 messenger.start
2014-02-15 16:03:38.714434 7fb65ba64820 -1 monclient(hunting): 
ERROR:

missing keyring, cannot use cephx for authentication
2014-02-15 16:03:38.714440 7fb65ba64820  0 librados: client.admin
initialization error (2) No such file or directory
2014-02-15 16:03:38.714463 7fb65ba64820 10 -- :/1024619 shutdown
:/1024619
2014-02-15 16:03:38.714468 7fb65ba64820  1 -- :/1024619 
mark_down_all
2014-02-15 16:03:38.714477 7fb65ba64820 10 -- :/1024619 wait: 
waiting

for dispatch queue
2014-02-15 16:03:38.714406 7fb64b5fe700 10 -- :/1024619 reaper_entry
start
2014-02-15 16:03:38.714506 7fb64b5fe700 10 -- :/1024619 reaper
2014-02-15 16:03:38.714522 7fb64b5fe700 10 -- :/1024619 reaper done
2014-02-15 16:03:38.714764 7fb65ba64820 10 -- :/1024619 wait: 
dispatch

queue is stopped
2014-02-15 16:03:38.714786 7fb64b5fe700 10 -- :/1024619 reaper_entry 
done
2014-02-15 16:03:38.714819 7fb65ba64820 10 -- :/1024619 wait

[ceph-users] Error Adding Keyring Entries

2014-02-17 Thread Georgios Dimitrakakis
Could someone help me with the following error when I try to add 
keyring entries:


# ceph -k /etc/ceph/ceph.client.admin.keyring auth add 
client.radosgw.gateway -i /etc/ceph/keyring.radosgw.gateway
Error EINVAL: entity client.radosgw.gateway exists but key does not 
match

#

Best,

G.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Error Adding Keyring Entries

2014-02-17 Thread Georgios Dimitrakakis
I managed to solve my problem by deleting the key from the list and 
re-adding it!
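
In other words, roughly (same entity name and paths as in my earlier 
mails; a sketch from memory, the point being that the key registered in 
the cluster and the one in the keyring file have to match):

# ceph auth del client.radosgw.gateway
# ceph -k /etc/ceph/ceph.client.admin.keyring auth add client.radosgw.gateway -i /etc/ceph/keyring.radosgw.gateway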


Best,

G.

On Mon, 17 Feb 2014 10:46:36 +0200, Georgios Dimitrakakis wrote:

Could someone help me with the following error when I try to add
keyring entries:

# ceph -k /etc/ceph/ceph.client.admin.keyring auth add
client.radosgw.gateway -i /etc/ceph/keyring.radosgw.gateway
Error EINVAL: entity client.radosgw.gateway exists but key does not 
match

#

Best,

G.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problem starting RADOS Gateway

2014-02-17 Thread Georgios Dimitrakakis

Could someone check this: http://pastebin.com/DsCh5YPm

and let me know what I am doing wrong?


Best,

G.

On Sat, 15 Feb 2014 20:27:16 +0200, Georgios Dimitrakakis wrote:

1) ceph -s is working as expected

# ceph -s
cluster c465bdb2-e0a5-49c8-8305-efb4234ac88a
 health HEALTH_OK
 monmap e1: 1 mons at {master=192.168.0.10:6789/0}, election
epoch 1, quorum 0 master
 mdsmap e111: 1/1/1 up {0=master=up:active}
 osdmap e114: 2 osds: 2 up, 2 in
  pgmap v414: 1200 pgs, 14 pools, 10596 bytes data, 67 objects
500 GB used, 1134 GB / 1722 GB avail
1200 active+clean


2) In /etc/ceph I have the following files

# ls -l
total 20
-rw-r--r-- 1 root root  64 Feb 14 17:10 ceph.client.admin.keyring
-rw-r--r-- 1 root root 401 Feb 15 16:57 ceph.conf
-rw-r--r-- 1 root root 196 Feb 14 20:26 ceph.log
-rw-r--r-- 1 root root 120 Feb 15 11:08 keyring.radosgw.gateway
-rwxr-xr-x 1 root root  92 Dec 21 00:47 rbdmap

3) ceph.conf content is the following

# cat ceph.conf
[global]
auth_service_required = cephx
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = 192.168.0.10
mon_initial_members = master
fsid = c465bdb2-e0a5-49c8-8305-efb4234ac88a

[client.radosgw.gateway]
host = master
keyring = /etc/ceph/keyring.radosgw.gateway
rgw socket path = /tmp/radosgw.sock
log file = /var/log/ceph/radosgw.log


4) And all the keys that exist are the following:

# ceph auth list
installed auth entries:

mds.master
key: xx==
caps: [mds] allow
caps: [mon] allow profile mds
caps: [osd] allow rwx
osd.0
key: xx==
caps: [mon] allow profile osd
caps: [osd] allow *
osd.1
key: xx==
caps: [mon] allow profile osd
caps: [osd] allow *
client.admin
key: xx==
caps: [mds] allow
caps: [mon] allow *
caps: [osd] allow *
client.bootstrap-mds
key: xx==
caps: [mon] allow profile bootstrap-mds
client.bootstrap-osd
key: AQBWLf5SGBAyBRAAzLwi5OXsAuR5vdo8hs+2zw==
caps: [mon] allow profile bootstrap-osd
client.radosgw.gateway
key: xx==
caps: [mon] allow rw
caps: [osd] allow rwx



I still don't get what is wrong...

G.

On Sat, 15 Feb 2014 16:27:41 +0100, Udo Lembke wrote:

Hi,
does ceph -s also stuck on missing keyring?

Do you have an keyring like:
cat /etc/ceph/keyring
[client.admin]
key = AQCdkHZR2NBYMBAATe/rqIwCI96LTuyS3gmMXp==

Or do you have anothe defined keyring in ceph.conf?
global-section - keyring = /etc/ceph/keyring

The key is in ceph - see
ceph auth get-key client.admin
AQCdkHZR2NBYMBAATe/rqIwCI96LTuyS3gmMXp==

or ceph auth list for all keys.
Key-genaration is doing by get-or-create key like this (but in this 
case

for bootstap-osd):
ceph auth get-or-create-key client.bootstrap-osd mon allow profile
bootstrap-osd

Udo

On 15.02.2014 15:35, Georgios Dimitrakakis wrote:

Dear all,

I am following this guide 
http://ceph.com/docs/master/radosgw/config/

to setup Object Storage on CentOS 6.5.

My problem is that when I try to start the service as indicated 
here:


http://ceph.com/docs/master/radosgw/config/#restart-services-and-start-the-gateway


I get nothing

# service ceph-radosgw start
Starting radosgw instance(s)...

and if I check if the service is running obviously it is not!

# service ceph-radosgw status
/usr/bin/radosgw is not running.


If I try to start it manually without using the service command I 
get

the following:

# /usr/bin/radosgw -d -c /etc/ceph/ceph.conf --debug_ms 10
2014-02-15 16:03:38.709235 7fb65ba64820  0 ceph version 0.72.2
(a913ded2ff138aefb8cb84d347d72164099cfd60), process radosgw, pid 
24619

2014-02-15 16:03:38.709249 7fb65ba64820 -1 WARNING: libcurl doesn't
support curl_multi_wait()
2014-02-15 16:03:38.709252 7fb65ba64820 -1 WARNING: cross zone /
region transfer performance may be affected
2014-02-15 16:03:38.713898 7fb65ba64820 10 -- :/0 ready :/0
2014-02-15 16:03:38.714323 7fb65ba64820  1 -- :/0 messenger.start
2014-02-15 16:03:38.714434 7fb65ba64820 -1 monclient(hunting): 
ERROR:

missing keyring, cannot use cephx for authentication
2014-02-15 16:03:38.714440 7fb65ba64820  0 librados: client.admin
initialization error (2) No such file or directory
2014-02-15 16:03:38.714463 7fb65ba64820 10 -- :/1024619 shutdown
:/1024619
2014-02-15 16:03:38.714468 7fb65ba64820  1 -- :/1024619 
mark_down_all
2014-02-15 16:03:38.714477 7fb65ba64820 10 -- :/1024619 wait: 
waiting

for dispatch queue
2014-02-15 16:03:38.714406 7fb64b5fe700 10 -- :/1024619 
reaper_entry

start
2014-02-15 16:03:38.714506 7fb64b5fe700 10 -- :/1024619 reaper
2014-02-15 16:03:38.714522 7fb64b5fe700 10 -- :/1024619 reaper done
2014-02-15 16:03:38.714764 7fb65ba64820 10

Re: [ceph-users] Problem starting RADOS Gateway

2014-02-18 Thread Georgios Dimitrakakis
I did manage to find a solution for this, thanks to the people on the 
IRC channel!


Many thanks to all of them and especially to +andreask and josef_

In short, the problem was that for some reason the hostname command was 
not returning the machine's short name but the FQDN instead. Therefore, I 
had to change the host entry in the gateway section of ceph.conf to the 
FQDN. Furthermore, the socket problem was solved by removing the stale 
socket file under /tmp that had been created at some point.
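
For the record, the gateway section now looks roughly like this (the FQDN 
below is a placeholder for whatever the hostname command prints on the 
gateway machine):

[client.radosgw.gateway]
host = master.example.com
keyring = /etc/ceph/keyring.radosgw.gateway
rgw socket path = /tmp/radosgw.sock
log file = /var/log/ceph/radosgw.log

# rm -f /tmp/radosgw.sock     (remove the stale socket before restarting radosgw)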


Once again a million thanks to +andreask and josef_ from the IRC 
channel!


Best,


G.

On Mon, 17 Feb 2014 11:44:37 +0200, Georgios Dimitrakakis wrote:

Could someone check this: http://pastebin.com/DsCh5YPm

and let me know what am I doing wrong?


Best,

G.

On Sat, 15 Feb 2014 20:27:16 +0200, Georgios Dimitrakakis wrote:

1) ceph -s is working as expected

# ceph -s
cluster c465bdb2-e0a5-49c8-8305-efb4234ac88a
 health HEALTH_OK
 monmap e1: 1 mons at {master=192.168.0.10:6789/0}, election
epoch 1, quorum 0 master
 mdsmap e111: 1/1/1 up {0=master=up:active}
 osdmap e114: 2 osds: 2 up, 2 in
  pgmap v414: 1200 pgs, 14 pools, 10596 bytes data, 67 objects
500 GB used, 1134 GB / 1722 GB avail
1200 active+clean


2) In /etc/ceph I have the following files

# ls -l
total 20
-rw-r--r-- 1 root root  64 Feb 14 17:10 ceph.client.admin.keyring
-rw-r--r-- 1 root root 401 Feb 15 16:57 ceph.conf
-rw-r--r-- 1 root root 196 Feb 14 20:26 ceph.log
-rw-r--r-- 1 root root 120 Feb 15 11:08 keyring.radosgw.gateway
-rwxr-xr-x 1 root root  92 Dec 21 00:47 rbdmap

3) ceph.conf content is the following

# cat ceph.conf
[global]
auth_service_required = cephx
filestore_xattr_use_omap = true
auth_client_required = cephx
auth_cluster_required = cephx
mon_host = 192.168.0.10
mon_initial_members = master
fsid = c465bdb2-e0a5-49c8-8305-efb4234ac88a

[client.radosgw.gateway]
host = master
keyring = /etc/ceph/keyring.radosgw.gateway
rgw socket path = /tmp/radosgw.sock
log file = /var/log/ceph/radosgw.log


4) And all the keys that exist are the following:

# ceph auth list
installed auth entries:

mds.master
key: xx==
caps: [mds] allow
caps: [mon] allow profile mds
caps: [osd] allow rwx
osd.0
key: xx==
caps: [mon] allow profile osd
caps: [osd] allow *
osd.1
key: xx==
caps: [mon] allow profile osd
caps: [osd] allow *
client.admin
key: xx==
caps: [mds] allow
caps: [mon] allow *
caps: [osd] allow *
client.bootstrap-mds
key: xx==
caps: [mon] allow profile bootstrap-mds
client.bootstrap-osd
key: AQBWLf5SGBAyBRAAzLwi5OXsAuR5vdo8hs+2zw==
caps: [mon] allow profile bootstrap-osd
client.radosgw.gateway
key: xx==
caps: [mon] allow rw
caps: [osd] allow rwx



I still don't get what is wrong...

G.

On Sat, 15 Feb 2014 16:27:41 +0100, Udo Lembke wrote:

Hi,
does ceph -s also stuck on missing keyring?

Do you have an keyring like:
cat /etc/ceph/keyring
[client.admin]
key = AQCdkHZR2NBYMBAATe/rqIwCI96LTuyS3gmMXp==

Or do you have anothe defined keyring in ceph.conf?
global-section - keyring = /etc/ceph/keyring

The key is in ceph - see
ceph auth get-key client.admin
AQCdkHZR2NBYMBAATe/rqIwCI96LTuyS3gmMXp==

or ceph auth list for all keys.
Key-genaration is doing by get-or-create key like this (but in this 
case

for bootstap-osd):
ceph auth get-or-create-key client.bootstrap-osd mon allow profile
bootstrap-osd

Udo

On 15.02.2014 15:35, Georgios Dimitrakakis wrote:

Dear all,

I am following this guide 
http://ceph.com/docs/master/radosgw/config/

to setup Object Storage on CentOS 6.5.

My problem is that when I try to start the service as indicated 
here:


http://ceph.com/docs/master/radosgw/config/#restart-services-and-start-the-gateway


I get nothing

# service ceph-radosgw start
Starting radosgw instance(s)...

and if I check if the service is running obviously it is not!

# service ceph-radosgw status
/usr/bin/radosgw is not running.


If I try to start it manually without using the service command I 
get

the following:

# /usr/bin/radosgw -d -c /etc/ceph/ceph.conf --debug_ms 10
2014-02-15 16:03:38.709235 7fb65ba64820  0 ceph version 0.72.2
(a913ded2ff138aefb8cb84d347d72164099cfd60), process radosgw, pid 
24619
2014-02-15 16:03:38.709249 7fb65ba64820 -1 WARNING: libcurl 
doesn't

support curl_multi_wait()
2014-02-15 16:03:38.709252 7fb65ba64820 -1 WARNING: cross zone /
region transfer performance may be affected
2014-02-15 16:03:38.713898 7fb65ba64820 10 -- :/0 ready :/0
2014-02-15 16:03:38.714323 7fb65ba64820  1 -- :/0 messenger.start
2014-02-15 16:03:38.714434 7fb65ba64820 -1

[ceph-users] CORS and CEPH

2014-02-18 Thread Georgios Dimitrakakis
Dear all,

do I need to configure or do something special in order to enable CORS 
support in CEPH?

Are there any links on how to test it?
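
For instance, would something along these lines with boto be the right 
way to exercise it? (The host, keys and bucket name below are 
placeholders, and I am only guessing at the rule I would want.)

import boto
import boto.s3.connection
from boto.s3.cors import CORSConfiguration

conn = boto.connect_s3(
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
    host='objects.example.com', is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)
bucket = conn.get_bucket('my-bucket')     # an existing bucket

cors_cfg = CORSConfiguration()
cors_cfg.add_rule(['GET', 'PUT'], '*', allowed_header='*', max_age_seconds=3000)
bucket.set_cors(cors_cfg)                 # PUT the ?cors subresource
print bucket.get_cors_xml()               # read the rules back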

Thanks for your help!

G.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph stops responding

2014-03-05 Thread Georgios Dimitrakakis

Hi!

I have installed ceph and created two osds and was very happy with that 
but apparently not everything was correct.



Today, after a system reboot, the cluster comes up and for a few moments 
it seems to be OK (using the ceph health command), but after a few 
seconds the ceph health command doesn't produce any output at all.


It just stays there without anything on the screen...


ceph -w is doing the same as well...


If I restart the ceph services (service ceph restart) it works again for 
a few seconds, but after a few more it freezes again.


Initially I thought that this was a firewall problem but apparently it 
isn't.


Then I thought that this had to do with the

public_network

cluster_network

not defined in ceph.conf and changed that.

No matter what I do, the cluster works for a few seconds after the 
service restart and then it stops responding...


Any help much appreciated!!!


Best,


G.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph stops responding

2014-03-05 Thread Georgios Dimitrakakis

My setup consists of two nodes.

The first node (master) is running:

-mds
-mon
-osd.0



and the second node (CLIENT) is running:

-osd.1


Therefore I've restarted the ceph services on both nodes.


Leaving ceph -w running for as long as it can, after a few seconds the 
error that is produced is this:


2014-03-05 12:08:17.715699 7fba13fff700  0 monclient: hunting for new 
mon
2014-03-05 12:08:17.716108 7fba102f8700  0 -- 192.168.0.10:0/1008298 >> 
X.Y.Z.X:6789/0 pipe(0x7fba08008e50 sd=4 :0 s=1 pgs=0 cs=0 l=1 
c=0x7fba080090b0).fault



(where X.Y.Z.X is the public IP of the CLIENT node).

And it keeps going on...

ceph-health after a few minutes shows the following

2014-03-05 12:12:58.355677 7effc52fb700  0 monclient(hunting): 
authenticate timed out after 300
2014-03-05 12:12:58.355717 7effc52fb700  0 librados: client.admin 
authentication error (110) Connection timed out

Error connecting to cluster: TimedOut


Any ideas now??

Best,

G.

On Wed, 5 Mar 2014 15:10:25 +0530, Srinivasa Rao Ragolu wrote:

First try to start OSD nodes by restarting the ceph service on ceph
nodes. If it works file then you could able to see ceph-osd process
running in process list. And do not need to add any public or private
network in ceph.conf. If none of the OSDs run then you need to
reconfigure them from monitor node.

Please check ceph-mon process is running on monitor node or not?
ceph-mds should not run.

also check /etc/hosts file with valid ip address of cluster nodes

Finally check ceph.client.admin.keyring and 
ceph.bootstrap-osd.keyring

should be matched in all the cluster nodes.

Best of luck.
Srinivas.

On Wed, Mar 5, 2014 at 3:04 PM, Georgios Dimitrakakis  wrote:


Hi!

I have installed ceph and created two osds and was very happy with
that but apparently not everything was correct.

Today after a system reboot the cluster comes up and for a few
moments it seems that its ok (using the ceph health command) but
after a few seconds the ceph health command doesnt produce any
output at all.

It justs stays there without anything on the screen...

ceph -w is doing the same as well...

If I restart the ceph services (service ceph restart) again for a
few seconds is working but after a few more it stays frozen.

Initially I thought that this was a firewall problem but apparently
it isnt.

Then I though that this had to do with the

public_network

cluster_network

not defined in ceph.conf and changed that.

No matter whatever I do the cluster works for a few seconds after
the service restart and then it stops responding...

Any help much appreciated!!!

Best,

G.
___
ceph-users mailing list
ceph-users@lists.ceph.com [1]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [2]




Links:
--
[1] mailto:ceph-users@lists.ceph.com
[2] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[3] mailto:gior...@acmac.uoc.gr

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph stops responding

2014-03-05 Thread Georgios Dimitrakakis

Actually there are two monitors (my bad in the previous e-mail).
One at the MASTER and one at the CLIENT.

The monitor in CLIENT is failing with the following

2014-03-05 13:08:38.821135 7f76ba82b700  1 
mon.client1@0(leader).paxos(paxos active c 25603..26314) is_readable 
now=2014-03-05 13:08:38.821136 lease_expire=2014-03-05 13:08:40.845978 
has v0 lc 26314
2014-03-05 13:08:40.599287 7f76bb22c700  0 
mon.client1@0(leader).data_health(86) update_stats avail 4% total 
51606140 used 46645692 avail 2339008
2014-03-05 13:08:40.599527 7f76bb22c700 -1 
mon.client1@0(leader).data_health(86) reached critical levels of 
available space on data store -- shutdown!
2014-03-05 13:08:40.599530 7f76bb22c700  0 ** Shutdown via Data Health 
Service **
2014-03-05 13:08:40.599557 7f76b9328700 -1 mon.client1@0(leader) e2 *** 
Got Signal Interrupt ***
2014-03-05 13:08:40.599568 7f76b9328700  1 mon.client1@0(leader) e2 
shutdown

2014-03-05 13:08:40.599602 7f76b9328700  0 quorum service shutdown
2014-03-05 13:08:40.599609 7f76b9328700  0 
mon.client1@0(shutdown).health(86) HealthMonitor::service_shutdown 1 
services

2014-03-05 13:08:40.599613 7f76b9328700  0 quorum service shutdown


The thing is that there is plenty of space in that host (CLIENT)

# df -h
Filesystem Size  Used Avail Use% Mounted on
/dev/mapper/vg_one-lv_root 50G45G  2.3G  96% /
tmpfs  5.9G 0  5.9G   0% /dev/shm
/dev/sda1  485M   76M  384M  17% /boot
/dev/mapper/vg_one-lv_home 862G   249G 569G  31% /home


On the other hand the other host (MASTER) is running low on disk space 
(93% is full).


But why is the CLIENT failing while the MASTER is still running, even 
though the MASTER is the one running low on disk space?


I'll try to free some space and see what happens next...
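
To see what the monitor itself is looking at (it checks the free space of 
the filesystem holding its own data directory, not /home), something like 
the following should show it. The paths assume the defaults and the option 
names are from memory, so treat this as a sketch:

# du -sh /var/lib/ceph/mon/*
# df -h /var/lib/ceph/mon
# ceph --admin-daemon /var/run/ceph/ceph-mon.client1.asok config show | grep mon_data_avail

The shutdown in the log above should correspond to the mon_data_avail_crit 
threshold (5% by default, if I remember correctly).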

Best,

G.



On Wed, 05 Mar 2014 11:50:57 +0100, Wido den Hollander wrote:

On 03/05/2014 11:21 AM, Georgios Dimitrakakis wrote:

My setup consists of two nodes.

The first node (master) is running:

-mds
-mon
-osd.0



and the second node (CLIENT) is running:

-osd.1


Therefore I 've restarted ceph services on both nodes


Leaving the ceph -w running for as long as it can after a few 
seconds

the error that is produced is this:

2014-03-05 12:08:17.715699 7fba13fff700  0 monclient: hunting for 
new mon
2014-03-05 12:08:17.716108 7fba102f8700  0 -- 192.168.0.10:0/1008298 


X.Y.Z.X:6789/0 pipe(0x7fba08008e50 sd=4 :0 s=1 pgs=0 cs=0 l=1
c=0x7fba080090b0).fault


(where X.Y.Z.X is the public IP of the CLIENT node).

And it keep goes on...

ceph-health after a few minutes shows the following

2014-03-05 12:12:58.355677 7effc52fb700  0 monclient(hunting):
authenticate timed out after 300
2014-03-05 12:12:58.355717 7effc52fb700  0 librados: client.admin
authentication error (110) Connection timed out
Error connecting to cluster: TimedOut


Any ideas now??



Is the monitor actually running on the first node? If not, checked
the logs in /var/log/ceph as to why it isn't running.

Or maybe you just need to start it.

Wido


Best,

G.

On Wed, 5 Mar 2014 15:10:25 +0530, Srinivasa Rao Ragolu wrote:

First try to start OSD nodes by restarting the ceph service on ceph
nodes. If it works file then you could able to see ceph-osd process
running in process list. And do not need to add any public or 
private

network in ceph.conf. If none of the OSDs run then you need to
reconfigure them from monitor node.

Please check ceph-mon process is running on monitor node or not?
ceph-mds should not run.

also check /etc/hosts file with valid ip address of cluster nodes

Finally check ceph.client.admin.keyring and 
ceph.bootstrap-osd.keyring

should be matched in all the cluster nodes.

Best of luck.
Srinivas.

On Wed, Mar 5, 2014 at 3:04 PM, Georgios Dimitrakakis  wrote:


Hi!

I have installed ceph and created two osds and was very happy with
that but apparently not everything was correct.

Today after a system reboot the cluster comes up and for a few
moments it seems that its ok (using the ceph health command) but
after a few seconds the ceph health command doesnt produce any
output at all.

It justs stays there without anything on the screen...

ceph -w is doing the same as well...

If I restart the ceph services (service ceph restart) again for 
a

few seconds is working but after a few more it stays frozen.

Initially I thought that this was a firewall problem but 
apparently

it isnt.

Then I though that this had to do with the

public_network

cluster_network

not defined in ceph.conf and changed that.

No matter whatever I do the cluster works for a few seconds after
the service restart and then it stops responding...

Any help much appreciated!!!

Best,

G.
___
ceph-users mailing list
ceph-users@lists.ceph.com [1]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [2]




Links:
--
[1] mailto:ceph-users@lists.ceph.com
[2] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[3

Re: [ceph-users] {Disarmed} Re: Ceph stops responding

2014-03-05 Thread Georgios Dimitrakakis

 Can someone help me with this error:

 2014-03-05 14:54:27.253711 7f654fd3d700  0 
mon.client1@0(leader).data_health(96) update_stats avail 3% total 
51606140 used 47174264 avail 1810436
 2014-03-05 14:54:27.253916 7f654fd3d700 -1 
mon.client1@0(leader).data_health(96) reached critical levels of 
available space on data store -- shutdown!


 Why is it showing only 3% available when there is plenty of storage???


 Best,


 G.

On Wed, 5 Mar 2014 17:51:28 +0530, Srinivasa Rao Ragolu wrote:

Ideal setup is node1 for mon, node2 is for OSD1 and node3 is OSD2
(Nodes can be VMs also).

MDS is not required if you are not using file system storage with
ceph

 Please follow the blog for part 1 ,2 and 3 for detailed steps

http://karan-mj.blogspot.in/2013/12/what-is-ceph-ceph-is-open-source.html
[7]

Follow each and every instruction on this blog

Thanks,Srinivas.

On Wed, Mar 5, 2014 at 3:44 PM, Georgios Dimitrakakis  wrote:


My setup consists of two nodes.

The first node (master) is running:

-mds
-mon
-osd.0

and the second node (CLIENT) is running:

-osd.1

Therefore I ve restarted ceph services on both nodes

Leaving the ceph -w running for as long as it can after a few
seconds the error that is produced is this:

2014-03-05 12:08:17.715699 7fba13fff700  0 monclient: hunting for
new mon
2014-03-05 12:08:17.716108 7fba102f8700  0 -- 192.168.0.10:0/1008298 >> 
X.Y.Z.X:6789/0 pipe(0x7fba08008e50 sd=4 :0 s=1 pgs=0 cs=0 l=1
c=0x7fba080090b0).fault

(where X.Y.Z.X is the public IP of the CLIENT node).

And it keep goes on...

ceph-health after a few minutes shows the following

2014-03-05 12:12:58.355677 7effc52fb700  0 monclient(hunting):
authenticate timed out after 300
2014-03-05 12:12:58.355717 7effc52fb700  0 librados: client.admin
authentication error (110) Connection timed out
Error connecting to cluster: TimedOut

Any ideas now??

Best,

G.


First try to start OSD nodes by restarting the ceph service on
ceph
nodes. If it works file then you could able to see ceph-osd
process
running in process list. And do not need to add any public or
private
network in ceph.conf. If none of the OSDs run then you need to
reconfigure them from monitor node.

Please check ceph-mon process is running on monitor node or not?
ceph-mds should not run.

also check /etc/hosts file with valid ip address of cluster nodes

Finally check ceph.client.admin.keyring and
ceph.bootstrap-osd.keyring
should be matched in all the cluster nodes.

Best of luck.
Srinivas.

On Wed, Mar 5, 2014 at 3:04 PM, Georgios Dimitrakakis  wrote:


Hi!

I have installed ceph and created two osds and was very happy
with
that but apparently not everything was correct.

Today after a system reboot the cluster comes up and for a few
moments it seems that its ok (using the ceph health command)
but
after a few seconds the ceph health command doesnt produce
any

output at all.

It justs stays there without anything on the screen...

ceph -w is doing the same as well...

If I restart the ceph services (service ceph restart) again
for a
few seconds is working but after a few more it stays frozen.

Initially I thought that this was a firewall problem but
apparently
it isnt.

Then I though that this had to do with the

public_network

cluster_network

not defined in ceph.conf and changed that.

No matter whatever I do the cluster works for a few seconds
after
the service restart and then it stops responding...

Any help much appreciated!!!

Best,

G.
___
ceph-users mailing list
ceph-users@lists.ceph.com [1] [1]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [2] [2]


Links:
--
[1] mailto:ceph-users@lists.ceph.com [3]
[2] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [4]
[3] mailto:gior...@acmac.uoc.gr [5]


--




Links:
--
[1] mailto:ceph-users@lists.ceph.com
[2] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[3] mailto:ceph-users@lists.ceph.com
[4] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[5] mailto:gior...@acmac.uoc.gr
[6] http://192.168.0.10:0/1008298
[7] 
http://karan-mj.blogspot.in/2013/12/what-is-ceph-ceph-is-open-source.html

[8] mailto:gior...@acmac.uoc.gr


--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] {Disarmed} Re: Ceph stops responding

2014-03-06 Thread Georgios Dimitrakakis

Good spot!!!

The same problem here!!!

Best,

G.

On Thu, 6 Mar 2014 12:28:26 +0100 (CET), Jerker Nyberg wrote:

I had this error yesterday. I had run out of storage at
/var/lib/cepoh/mon/ at the local file system on the monitor.

Kind regards,
Jerker Nyberg


On Wed, 5 Mar 2014, Georgios Dimitrakakis wrote:


Can someone help me with this error:

2014-03-05 14:54:27.253711 7f654fd3d700  0 
mon.client1@0(leader).data_health(96) update_stats avail 3% total 
51606140 used 47174264 avail 1810436
2014-03-05 14:54:27.253916 7f654fd3d700 -1 
mon.client1@0(leader).data_health(96) reached critical levels of 
available space on data store -- shutdown!


Why is it showing only 3% available when there is plenty of 
storage???



Best,


G.

On Wed, 5 Mar 2014 17:51:28 +0530, Srinivasa Rao Ragolu wrote:

Ideal setup is node1 for mon, node2 is for OSD1 and node3 is OSD2
(Nodes can be VMs also).
MDS is not required if you are not using file system storage with
ceph

 Please follow the blog for part 1 ,2 and 3 for detailed steps

http://karan-mj.blogspot.in/2013/12/what-is-ceph-ceph-is-open-source.html
[7]
Follow each and every instruction on this blog
Thanks,Srinivas.
On Wed, Mar 5, 2014 at 3:44 PM, Georgios Dimitrakakis  wrote:


My setup consists of two nodes.
The first node (master) is running:
-mds
-mon
-osd.0
and the second node (CLIENT) is running:
-osd.1
Therefore I ve restarted ceph services on both nodes
Leaving the ceph -w running for as long as it can after a few
seconds the error that is produced is this:
2014-03-05 12:08:17.715699 7fba13fff700  0 monclient: hunting for
new mon
2014-03-05 12:08:17.716108 7fba102f8700  0 -- 192.168.0.10:0/1008298 >> 
X.Y.Z.X:6789/0 pipe(0x7fba08008e50 sd=4 :0 s=1 pgs=0 cs=0 l=1
c=0x7fba080090b0).fault
(where X.Y.Z.X is the public IP of the CLIENT node).
And it keep goes on...
ceph-health after a few minutes shows the following
2014-03-05 12:12:58.355677 7effc52fb700  0 monclient(hunting):
authenticate timed out after 300
2014-03-05 12:12:58.355717 7effc52fb700  0 librados: client.admin
authentication error (110) Connection timed out
Error connecting to cluster: TimedOut
Any ideas now??
Best,
G.


First try to start OSD nodes by restarting the ceph service on
ceph
nodes. If it works file then you could able to see ceph-osd
process
running in process list. And do not need to add any public or
private
network in ceph.conf. If none of the OSDs run then you need to
reconfigure them from monitor node.
Please check ceph-mon process is running on monitor node or not?
ceph-mds should not run.
also check /etc/hosts file with valid ip address of cluster nodes
Finally check ceph.client.admin.keyring and
ceph.bootstrap-osd.keyring
should be matched in all the cluster nodes.
Best of luck.
Srinivas.
On Wed, Mar 5, 2014 at 3:04 PM, Georgios Dimitrakakis  wrote:


Hi!
I have installed ceph and created two osds and was very happy
with
that but apparently not everything was correct.
Today after a system reboot the cluster comes up and for a few
moments it seems that its ok (using the ceph health command)
but
after a few seconds the ceph health command doesnt produce
any
output at all.
It justs stays there without anything on the screen...
ceph -w is doing the same as well...
If I restart the ceph services (service ceph restart) again
for a
few seconds is working but after a few more it stays frozen.
Initially I thought that this was a firewall problem but
apparently
it isnt.
Then I though that this had to do with the
public_network
cluster_network
not defined in ceph.conf and changed that.
No matter whatever I do the cluster works for a few seconds
after
the service restart and then it stops responding...
Any help much appreciated!!!
Best,
G.
___
ceph-users mailing list
ceph-users@lists.ceph.com [1] [1]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [2] [2]

Links:
--
[1] mailto:ceph-users@lists.ceph.com [3]
[2] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [4]
[3] mailto:gior...@acmac.uoc.gr [5]

--



Links:
--
[1] mailto:ceph-users@lists.ceph.com
[2] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[3] mailto:ceph-users@lists.ceph.com
[4] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[5] mailto:gior...@acmac.uoc.gr
[6] http://192.168.0.10:0/1008298
[7] 
http://karan-mj.blogspot.in/2013/12/what-is-ceph-ceph-is-open-source.html

[8] mailto:gior...@acmac.uoc.gr


-- ___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH BackUPs

2015-02-02 Thread Georgios Dimitrakakis

Hi Christian,


On Fri, 30 Jan 2015 01:22:53 +0200 Georgios Dimitrakakis wrote:

 Urged by a previous post by Mike Winfield where he suffered a 
leveldb

 loss
 I would like to know which files are critical for CEPH operation 
and

 must
 be backed-up regularly and how are you people doing it?


Aside from probably being quite hard/disruptive to back up a monitor
leveldb, it will also be quite pointless, as it constantly changes.



I can understand that about leveldb. But besides that are there any 
other files that can be backed up just in case?
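
What I had in mind as a starting point (just the small, slow-changing 
pieces, not the monitor store itself; /backup below is a placeholder path 
and this is an untested sketch) was something like:

# ceph mon getmap -o /backup/monmap.$(date +%F)
# ceph osd getmap -o /backup/osdmap.$(date +%F)
# ceph osd getcrushmap -o /backup/crushmap.$(date +%F)
# ceph auth export > /backup/ceph.auth.$(date +%F)
# cp -a /etc/ceph/ceph.conf /etc/ceph/*keyring* /backup/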


This is why one has at least 3 monitors on different machines that 
are on
different UPS backed circuits and storing things on SSDs that are 
also

power failure proof.
And if a monitor gets destroyed like that, the official fix suggested 
by
the Ceph developers is to re-create it from scratch and let it catch 
up to

the good monitors.

That being said, aside from a backup of the actual data on the 
cluster
(which is another challenge), one wonders if in Mike's case a RBD 
FSCK
of sorts can be created that is capable of restoring things based on 
the

actual data still on the OSDs.



So if I understand correctly, if one loses the monitors then the data 
cannot be accessed, even though it still exists on the OSDs?
Is there a feature that can recover the data that still exists on 
the OSDs?




Christian



All the best,

George
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH RBD and OpenStack

2015-02-07 Thread Georgios Dimitrakakis

Hi John,

I have already put these rules in the firewall but no luck.

Using iptraf I saw that each time it goes to a TCP port of 33000 plus 
something... different every time!


Best,

George

On Sat, 7 Feb 2015 18:40:38 +0100, John Spray wrote:

The relevant docs are here:

http://ceph.com/docs/master/start/quick-start-preflight/#open-required-ports

John

On Sat, Feb 7, 2015 at 4:33 PM, Georgios Dimitrakakis
gior...@acmac.uoc.gr wrote:

Hi all!

I am integrating my OpenStack Cluster with CEPH in order to be able 
to

provide volumes for the instances!

I have managed to perform all operations successfully with one catch 
only.


If firewall services (iptables) are running on the CEPH node then I 
am stack

at attaching state.

Therefore can you please elaborate on which ports should be open on 
the CEPH

node in order to function without a problem?


Best regards,


George
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH RBD and OpenStack

2015-02-08 Thread Georgios Dimitrakakis

By the way,

on the link that John sent I believe there is a typo.

At the very beginning of the Open Required Ports section the port 
range says 6800:7810, whereas below it is mentioned as 6800:7100.

I think the former is a typo based on previous documentation, where 
the ports were declared to be 6800:6810.


Furthermore, the same inconsistency in ports can be observed in the 
Network Configuration Reference here: 
http://docs.ceph.com/docs/master/rados/configuration/network-config-ref/


Sometimes they are mentioned as 6800:7100 and other times as 6800:6810.


Should I file a bug for this?
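
For reference, the rules that should correspond to the monitor port plus 
the wider 6800:7100 range quoted above look like this on the Ceph node (a 
sketch; whether this is actually sufficient is exactly what I am trying 
to confirm):

# iptables -A INPUT -p tcp --dport 6789 -j ACCEPT          (monitor)
# iptables -A INPUT -p tcp --dport 6800:7100 -j ACCEPT     (OSD/MDS daemons)
# service iptables save

My current guess is that the 33000-ish ports I saw with iptraf are just 
ephemeral source ports of outgoing connections rather than ports that 
need opening.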


Regards,


George




Hi John,

I have already put these rules in the firewall but no luck.

Using iptraf I saw that every time is going at a TCP port 33000
plus something...different every time!

Best,

George

On Sat, 7 Feb 2015 18:40:38 +0100, John Spray wrote:

The relevant docs are here:

http://ceph.com/docs/master/start/quick-start-preflight/#open-required-ports

John

On Sat, Feb 7, 2015 at 4:33 PM, Georgios Dimitrakakis
gior...@acmac.uoc.gr wrote:

Hi all!

I am integrating my OpenStack Cluster with CEPH in order to be able 
to

provide volumes for the instances!

I have managed to perform all operations successfully with one 
catch only.


If firewall services (iptables) are running on the CEPH node then I 
am stack

at attaching state.

Therefore can you please elaborate on which ports should be open on 
the CEPH

node in order to function without a problem?


Best regards,


George
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CEPH Expansion

2015-01-16 Thread Georgios Dimitrakakis

Hi all!

I would like to expand our CEPH Cluster and add a second OSD node.

In this node I will have ten 4TB disks dedicated to CEPH.

What is the proper way of adding them to the already running CEPH 
cluster?


I guess that the first thing to do is to prepare them with ceph-deploy 
and mark them as out at preparation.


I should then restart the services and add (mark as in) one of them. 
Afterwards, I have to wait for the rebalance
to occur and upon finishing I will add the second and so on. Is this 
safe enough?



How long do you expect the rebalancing procedure to take?


I already have ten more 4TB disks at another node and the amount of 
data is around 40GB with 2x replication factor.

The connection is over Gigabit.


Best,


George
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH Expansion

2015-01-18 Thread Georgios Dimitrakakis

Hi Jiri,

thanks for the feedback.

My main concern is whether it's better to add each OSD one by one and wait 
for the cluster to rebalance every time, or to do them all together at once.


Furthermore an estimate of the time to rebalance would be great!
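
In case the rebalance turns out to be too aggressive, I understand the 
recovery can be throttled while the new OSDs come in -- a sketch, with 
option names as of the releases of that era, so double-check before use:

# ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
# ceph -w        (watch the recovery rate and the remaining degraded objects)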

Regards,


George


Hi George,

 List disks available:
 # $ ceph-deploy disk list {node-name [node-name]...}

 Add OSD using osd create:
 # $ ceph-deploy osd create {node-name}:{disk}[:{path/to/journal}]

 Or you can use the manual steps to prepare and activate disk
described at

http://ceph.com/docs/master/start/quick-ceph-deploy/#expanding-your-cluster
[3]

 Jiri

On 15/01/2015 06:36, Georgios Dimitrakakis wrote:


Hi all!

I would like to expand our CEPH Cluster and add a second OSD node.

In this node I will have ten 4TB disks dedicated to CEPH.

What is the proper way of putting them in the already available
CEPH node?

I guess that the first thing to do is to prepare them with
ceph-deploy and mark them as out at preparation.

I should then restart the services and add (mark as in) one of
them. Afterwards, I have to wait for the rebalance
to occur and upon finishing I will add the second and so on. Is
this safe enough?

How long do you expect the rebalancing procedure to take?

I already have ten more 4TB disks at another node and the amount of
data is around 40GB with 2x replication factor.
The connection is over Gigabit.

Best,

George
___
ceph-users mailing list
ceph-users@lists.ceph.com [1]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [2]




Links:
--
[1] mailto:ceph-users@lists.ceph.com
[2] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[3]

http://ceph.com/docs/master/start/quick-ceph-deploy/#expanding-your-cluster


--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH Expansion

2015-01-25 Thread Georgios Dimitrakakis

Hi Craig!

Indeed I had reduced the replicated size to 2 instead of 3 while the 
minimum size is 1.


I hadn't touched the crushmap though.

I would like to keep going with a replicated size of 2. Do you 
think this would be a problem?


Please find below the output of the command:

$ ceph osd dump | grep ^pool
pool 3 'data' replicated size 2 min_size 1 crush_ruleset 0 object_hash 
rjenkins pg_num 512 pgp_num 512 last_change 524 flags hashpspool 
stripe_width 0
pool 4 'metadata' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 512 pgp_num 512 last_change 526 flags 
hashpspool stripe_width 0
pool 5 'rbd' replicated size 2 min_size 1 crush_ruleset 0 object_hash 
rjenkins pg_num 512 pgp_num 512 last_change 528 flags hashpspool 
stripe_width 0
pool 6 '.rgw' replicated size 2 min_size 1 crush_ruleset 0 object_hash 
rjenkins pg_num 512 pgp_num 512 last_change 618 flags hashpspool 
stripe_width 0
pool 7 '.rgw.control' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 512 pgp_num 512 last_change 616 flags 
hashpspool stripe_width 0
pool 8 '.rgw.gc' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 512 pgp_num 512 last_change 614 flags 
hashpspool stripe_width 0
pool 9 '.log' replicated size 2 min_size 1 crush_ruleset 0 object_hash 
rjenkins pg_num 512 pgp_num 512 last_change 612 flags hashpspool 
stripe_width 0
pool 10 '.intent-log' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 512 pgp_num 512 last_change 610 flags 
hashpspool stripe_width 0
pool 11 '.usage' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 512 pgp_num 512 last_change 608 flags 
hashpspool stripe_width 0
pool 12 '.users' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 512 pgp_num 512 last_change 606 flags 
hashpspool stripe_width 0
pool 13 '.users.email' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 512 pgp_num 512 last_change 604 flags 
hashpspool stripe_width 0
pool 14 '.users.swift' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 512 pgp_num 512 last_change 602 flags 
hashpspool stripe_width 0
pool 15 '.users.uid' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 512 pgp_num 512 last_change 600 flags 
hashpspool stripe_width 0
pool 16 '.rgw.root' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 512 pgp_num 512 last_change 598 flags 
hashpspool stripe_width 0
pool 17 '.rgw.buckets.index' replicated size 2 min_size 1 crush_ruleset 
0 object_hash rjenkins pg_num 512 pgp_num 512 last_change 596 flags 
hashpspool stripe_width 0
pool 18 '.rgw.buckets' replicated size 2 min_size 1 crush_ruleset 0 
object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 826 flags 
hashpspool stripe_width 0
pool 19 '.rgw.buckets.extra' replicated size 2 min_size 1 crush_ruleset 
0 object_hash rjenkins pg_num 512 pgp_num 512 last_change 722 owner 
18446744073709551615 flags hashpspool stripe_width 0
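
If I do decide to go back to the default of 3 once the second node is in, 
my understanding is that it is just a per-pool setting, e.g. for the data 
pool (a sketch, one pool shown; the same would have to be repeated for 
the other pools):

# ceph osd pool set data size 3
# ceph osd pool set data min_size 2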




Warmest regards,


George


You've either modified the crushmap, or changed the pool size to 1. 
The defaults create 3 replicas on different hosts.

What does `ceph osd dump | grep ^pool` output?  If the size param is
1, then you reduced the replica count.  If the size param is > 1, you
must've adjusted the crushmap.

Either way, after you add the second node would be the ideal time to
change that back to the default.

Given that you only have 40GB of data in the cluster, you shouldn't
have a problem adding the 2nd node.

On Fri, Jan 23, 2015 at 3:58 PM, Georgios Dimitrakakis  wrote:


Hi Craig!

For the moment I have only one node with 10 OSDs.
I want to add a second one with 10 more OSDs.

Each OSD in every node is a 4TB SATA drive. No SSD disks!

The data ara approximately 40GB and I will do my best to have zero
or at least very very low load during the expansion process.

To be honest I havent touched the crushmap. I wasnt aware that I
should have changed it. Therefore, it still is with the default
one.
Is that OK? Where can I read about the host level replication in
CRUSH map in order
to make sure that its applied or how can I find if this is already
enabled?

Any other things that I should be aware of?

All the best,

George


It depends.  There are a lot of variables, like how many nodes
and
disks you currently have.  Are you using journals on SSD.  How
much
data is already in the cluster.  What the client load is on the
cluster.

Since you only have 40 GB in the cluster, it shouldnt take long
to
backfill.  You may find that it finishes backfilling faster than
you
can format the new disks.

Since you only have a single OSD node, you mustve changed the
crushmap
to allow replication over OSDs instead of hosts.  After you get
the
new node in would be the best time to switch back to host level
replication.  The more data you have, the more painful that
change
will become.

On Sun, Jan 18, 2015 at 10:09 AM

Re: [ceph-users] CEPH Expansion

2015-01-23 Thread Georgios Dimitrakakis

Hi Craig!


For the moment I have only one node with 10 OSDs.
I want to add a second one with 10 more OSDs.

Each OSD in every node is a 4TB SATA drive. No SSD disks!

The data are approximately 40GB and I will do my best to have zero
or at least very very low load during the expansion process.

To be honest I haven't touched the crushmap. I wasn't aware that I
should have changed it. Therefore, it is still the default one.
Is that OK? Where can I read about host-level replication in the CRUSH 
map in order
to make sure that it's applied, or how can I find out whether this is 
already enabled?
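
Would something like the following be the right way to check? (Just 
dumping the rules and looking at the chooseleaf step -- a sketch on my 
part, so please correct me if not.)

# ceph osd crush rule dump
# ceph osd getcrushmap -o /tmp/crushmap
# crushtool -d /tmp/crushmap -o /tmp/crushmap.txt
# grep chooseleaf /tmp/crushmap.txt
        step chooseleaf firstn 0 type host     (host-level replication)
        step chooseleaf firstn 0 type osd      (OSD-level only)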


Any other things that I should be aware of?

All the best,


George



It depends.  There are a lot of variables, like how many nodes and
disks you currently have.  Are you using journals on SSD.  How much
data is already in the cluster.  What the client load is on the
cluster.

Since you only have 40 GB in the cluster, it shouldn't take long to
backfill.  You may find that it finishes backfilling faster than you
can format the new disks.

Since you only have a single OSD node, you must've changed the crushmap
to allow replication over OSDs instead of hosts.  After you get the
new node in would be the best time to switch back to host level
replication.  The more data you have, the more painful that change
will become.

On Sun, Jan 18, 2015 at 10:09 AM, Georgios Dimitrakakis  wrote:


Hi Jiri,

thanks for the feedback.

My main concern is if its better to add each OSD one-by-one and
wait for the cluster to rebalance every time or do it all-together
at once.

Furthermore an estimate of the time to rebalance would be great!

Regards,



Links:
--
[1] mailto:gior...@acmac.uoc.gr


--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] {Disarmed} Re: Adding Monitor

2015-03-14 Thread Georgios Dimitrakakis

My problem was that commands like ceph -s fail to connect

and therefore I couldn't extract monmap.

I could get it from the running pid though and I 've used it
along with the documentation and the example of how a monmap
looks like in order to create a new and inject it into the
second monitor.

I believe that this was the action that solved my problems. Not quite
confident though :-(



Thanks a lot to everyone that spent some time to deal with my problem!

All the best,

George


On Sat, 14 Mar 2015, Georgios Dimitrakakis wrote:

Sage,

correct me if I am wrong, but this is when you have a surviving
monitor!

right?


Yes.  By surviving I mean that the mon data directory has not been
deleted.


My problem is that I cannot extract the monmap from any!


Do you mean that the ceph -s or ceph health commands fail to connect 
(the

monitors cannot form quorum) or do you mean that when you follow the
instructions on that link and run the 'ceph-mon --extract-monmap ...'
command (NOT 'ceph mon getmap ...') you get some error?  If so, 
please

paste the output!

I have a suspicion, though, that we're just using different terms.  The
original
monitor's data is probably just fine, but something went wrong with 
the
configuration and it can't form a quorum with the one you tried to 
add, so
all of the commands are failing.  If so that's precisely the 
situation the

linked procedure will correct...

sage
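
For reference, a minimal sketch of that procedure on the surviving monitor's data directory (the mon names follow the logs in this thread and are only placeholders):

# service ceph stop mon.fu
# ceph-mon -i fu --extract-monmap /tmp/monmap
# monmaptool --print /tmp/monmap
# monmaptool --rm jin /tmp/monmap
# ceph-mon -i fu --inject-monmap /tmp/monmap
# service ceph start mon.fu

After that, mon.fu should again form a quorum of one and the usual ceph commands should respond.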

   Best,


George

 On Sat, 14 Mar 2015, Georgios Dimitrakakis wrote:
  Not a healthy monitor means that I cannot get a monmap from
any of them!


 If you look at the procedure at


 
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster


 you'll notice that you do not need any running monitors--it 
extracts the
 monmap from the data directory.  This procedure should let you 
remove all

 trace of the new monitor so that the original works as before.

 sage


  and none of the commands ceph health etc. are working.

 
  Best,
 
  George
 
   Yes Sage!
  
   Priority is to fix things!
  
   Right now I don't have a healthy monitor!
  
   Can I remove all of them and add the first one from scratch?
  
   What would that mean about the data??
  
   Best,
  
   George
  
On Sat, 14 Mar 2015, Georgios Dimitrakakis wrote:
 This is the message that is flooding the ceph-mon.log now:


  2015-03-14 08:16:39.286823 7f9f6920b700  1
  mon.fu@0(electing).elector(1) init, last seen epoch 1
  2015-03-14 08:16:42.736674 7f9f6880a700  1 
mon.fu@0(electing) e2

  adding peer 15.12.6.21:6789/0 to list of hints
  2015-03-14 08:16:42.737891 7f9f6880a700  1
  mon.fu@0(electing).elector(1) discarding election 
message:

  15.12.6.21:6789/0
  not in my monmap e2: 2 mons at
  {fu=192.168.1.100:6789/0,jin=192.168.1.101:6789/0}
   
It sounds like you need to follow some variation of this 
procedure:

   
   
   
   
  
http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster

   
..although it may be that simply killing the daemon running 
on

  15.12.6.21
and restarting the other mon daemons will be enough.  If 
not, the
procedure linked above will let you remove all traces of it
and get

things up again.
   
Not quite sure where things went awry but I assume the 
priority is to

  get
things working first and figure that out later!
   
sage
   



  George


  This is the log for monitor (ceph-mon.log) when I try to 
restart

  the
  monitor:
 
 
  2015-03-14 07:47:26.384561 7f1f1dc0f700 -1 
mon.fu@0(probing) e2

  ***
  Got Signal Terminated ***
  2015-03-14 07:47:26.384593 7f1f1dc0f700  1 
mon.fu@0(probing) e2

  shutdown
  2015-03-14 07:47:26.384654 7f1f1dc0f700  0 quorum 
service shutdown

  2015-03-14 07:47:26.384657 7f1f1dc0f700  0
  mon.fu@0(shutdown).health(0) 
HealthMonitor::service_shutdown 1

  services
  2015-03-14 07:47:26.384665 7f1f1dc0f700  0 quorum 
service shutdown
  2015-03-14 07:47:27.620670 7fc04b4437a0  0 ceph version 
0.80.9
  (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047), process 
ceph-mon, pid

  17050
  2015-03-14 07:47:27.703151 7fc04b4437a0  0 starting 
mon.fu rank 0

  at
  192.168.1.100:6789/0 mon_data /var/lib/ceph/mon/ceph-fu 
fsid

  a1132ec2-7104-4e8e-a3d5-95965cae9138
  2015-03-14 07:47:27.703421 7fc04b4437a0  1 
mon.fu@-1(probing) e2

  preinit fsid a1132ec2-7104-4e8e-a3d5-95965cae9138
  2015-03-14 07:47:27.704504 7fc04b4437a0  1
  mon.fu@-1(probing).paxosservice(pgmap 897493..898204) 
refresh

  upgraded, format 0 - 1
  2015-03-14 07:47:27.704525 7fc04b4437a0  1 
mon.fu@-1(probing).pg

  v0
  on_upgrade discarding in-core PGMap
  2015-03-14 07:47:27.837060 7fc04b4437a0  0 
mon.fu@-1(probing).mds

  e104 print_map
  epoch 104
  flags 0
  created   2014-11-30

Re: [ceph-users] {Disarmed} Re: {Disarmed} Re: {Disarmed} Re: Public Network Meaning

2015-03-14 Thread Georgios Dimitrakakis

It seems that both PUBLIC_NETWORK and CLUSTER_NETWORK

have to be defined in order to work.


Otherwise, if only PUBLIC_NETWORK is defined, a certain (quite large)
amount of traffic still uses the other interface.


All the best,

George
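
For a single 192.168.1.0/24 subnet like the one in this thread, a minimal ceph.conf sketch (the values are only examples matching the addresses above) would be:

[global]
    public network  = 192.168.1.0/24
    cluster network = 192.168.1.0/24

followed by restarting the daemons one at a time so that the OSDs re-register with the 192.168.1.* addresses instead of the external ones.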


In that case - yes...put everything on 1 card - or if both cards are
1G (or the same speed for that matter...) - then you might want to block
all external traffic except e.g. SSH and web, but allow ALL traffic
between all CEPH OSDs... so you can still use that network for
public/client traffic - not sure how you connect to/use CEPH - from the
internet??? Or do you have some more VMs/servers/clients on the 192.*
network...?

On 14 March 2015 at 19:38, Georgios Dimitrakakis  wrote:


Andrija,

I have two cards!

One on 15.12.* and one on 192.*

Obviously the 15.12.* is the external network (real public IP
address e.g used to access the node via SSH)

That's why I am saying that my public network for CEPH is the 192.*
one; should I use the cluster network for that as well?

Best,

George


Georgios,

no need to put ANYTHING if you don't plan to split client-to-OSD
vs
OSD-to-OSD replication on 2 different Network Cards/Networks - for
performance reasons.

if you have only 1 network - simply DON'T configure networks at
all
inside your CEPH.conf file...

if you have 2 x 1G cards in servers, then you may use first 1G
for
client traffic, and second 1G for OSD-to-OSD replication...

best

On 14 March 2015 at 19:33, Georgios Dimitrakakis  wrote:


Andrija,

Thanks for you help!

In my case I just have one 192.* network, so should I put that
for
both?

Besides monitors do I have to list OSDs as well?

Thanks again!

Best,

George


This is how I did it, and then restart each OSD one by one, but
monitor with ceph -s; when ceph is healthy, proceed with
next
OSD
restart...
Make sure the networks are fine on physical nodes, that you
can
ping
in between...

[global]
x
x
x
x
x
x

#
### REPLICATION NETWORK ON SEPARATE 10G NICs

# replication network
cluster network = 10.44.251.0/24

# public/client network
public network = 10.44.253.0/16

#

[mon.xx]
mon_addr = x.x.x.x:6789
host = xx

[mon.yy]
mon_addr = x.x.x.x:6789
host = yy

[mon.zz]
mon_addr = x.x.x.x:6789
host = zz
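
The restart-one-at-a-time part above can be scripted; a hedged sketch, assuming the sysvinit init script used elsewhere in this thread and OSD ids 0-9:

for id in 0 1 2 3 4 5 6 7 8 9; do
    service ceph restart osd.$id
    # wait until the cluster is healthy again before touching the next OSD
    while ! ceph health | grep -q HEALTH_OK; do sleep 10; done
done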

On 14 March 2015 at 19:14, Georgios Dimitrakakis  wrote:


I thought that it was easy but apparently it's not!

I have the following in my conf file

mon_host = 192.168.1.100,192.168.1.101,192.168.1.102
public_network = 192.168.1.0/24
mon_initial_members = fu,rai,jin

but still the 15.12.6.21 link is being saturated

Any ideas why???

Should I put cluster network as well??

Should I put each OSD in the CONF file???

Regards,

George


Andrija,

thanks a lot for the useful info!

I would also like to thank Kingrat at the IRC channel
for
his
useful advice!

I was under the wrong impression that public is the one
used
for
RADOS.

So I thought that public=external=internet and therefore
I
used
that
one in my conf.

I understand now that I should have specified in CEPH
Public's
Network what I call
internal and which is the one that all machines are
talking
directly to each other.

Thank you all for the feedback!

Regards,

George


Public network is clients-to-OSD traffic - and if you
have
NOT
explicitly defined cluster network, then also
OSD-to-OSD
replication
takes place over same network.

Otherwise, you can define public and cluster(private)
network -
so OSD
replication will happen over dedicated NICs (cluster
network)
and thus
speed up.

If i.e. replica count on pool is 3, that means, each
1GB of
data
written to some particular OSD, will generate 3 x 1GB of
more
writes,
to the replicas... - which ideally will take place over
separate NICs
to speed up things...

On 14 March 2015 at 17:43, Georgios Dimitrakakis 
wrote:


Hi all!!

What is the meaning of public_network in ceph.conf?

Is it the network that OSDs are talking and
transferring
data?

I have two nodes with two IP addresses each. One for
internal
network 192.168.1.0/24
and one external 15.12.6.*

I see the following in my logs:

osd.0 is down since epoch 2204, last address 15.12.6.21:6826/33094

Re: [ceph-users] Ceph release timeline

2015-03-15 Thread Georgios Dimitrakakis

Indeed it is!

Thanks!

George



Thanks, that's quite helpful.

On 16 March 2015 at 08:29, Loic Dachary  wrote:


Hi Ceph,

In an attempt to clarify what Ceph release is stable, LTS or
development. a new page was added to the documentation:
http://ceph.com/docs/master/releases/ [1] It is a matrix where each
cell is a release number linked to the release notes from
http://ceph.com/docs/master/release-notes/ [2]. One line per month
and one column per release.

Cheers

--
Loïc Dachary, Artisan Logiciel Libre

___
ceph-users mailing list
ceph-users@lists.ceph.com [3]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [4]


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW - Create bucket via admin API

2015-03-06 Thread Georgios Dimitrakakis

Hi Italo,

Check the S3 Bucket OPS at : 
http://ceph.com/docs/master/radosgw/s3/bucketops/


or use any of the examples provided in Python 
(http://ceph.com/docs/master/radosgw/s3/python/) or PHP 
(http://ceph.com/docs/master/radosgw/s3/php/) or JAVA 
(http://ceph.com/docs/master/radosgw/s3/java/) or anything else that is 
provided through S3 API (http://ceph.com/docs/master/radosgw/s3/)


Regards,


George
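
As a concrete illustration of the S3 route (a hedged sketch; it assumes s3cmd has already been configured with the RGW endpoint and the user's access/secret keys, and the bucket name is a placeholder):

# s3cmd mb s3://my-new-bucket

Bucket creation is simply an S3 PUT on the bucket name, which is why the admin ops API has no separate URI for it; any of the S3 clients linked above can do the same.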


Hello guys,

In the adminops documentation I saw how to remove a bucket, but I
can’t find the URI to create one; I’d like to know if this is
possible?

Regards.

ITALO SANTOS
http://italosantos.com.br/ [1]



Links:
--
[1] http://italosantos.com.br/


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Adding Monitor

2015-03-13 Thread Georgios Dimitrakakis

Not a firewall problem!! Firewall is disabled ...

Loic I 've tried mon create because of this: 
http://ceph.com/docs/v0.80.5/start/quick-ceph-deploy/#adding-monitors



Should I first create and then add?? What is the proper order??? Should 
I do it from the already existing monitor node or can I run it from the 
new one?


If I try add from the beginning I am getting this:

ceph_deploy.conf][DEBUG ] found configuration file at: 
/home/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.22): /usr/bin/ceph-deploy mon 
add jin
[ceph_deploy][ERROR ] RuntimeError: mon keyring not found; run 'new' to 
create a new cluster




Regards,


George
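
For what it is worth, a hedged sketch of the sequence that usually avoids the keyring error above; the directory name is a placeholder, the key point being that ceph-deploy is run from the directory created by 'ceph-deploy new', which holds ceph.conf and the *.keyring files:

# cd ~/my-cluster
# ceph-deploy mon add jin
# ceph mon stat

'mon create' is intended for the initial monitors of a new cluster, while 'mon add' grows an existing quorum, which matches the advice below.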



Hi,

I think ceph-deploy mon add (instead of create) is what you should be 
using.


Cheers

On 13/03/2015 22:25, Georgios Dimitrakakis wrote:

On an already available cluster I 've tried to add a new monitor!

I have used ceph-deploy mon create {NODE}

where {NODE}=the name of the node

and then I restarted the /etc/init.d/ceph service with a success at 
the node

where it showed that the monitor is running like:

# /etc/init.d/ceph restart
=== mon.jin ===
=== mon.jin ===
Stopping Ceph mon.jin on jin...kill 36388...done
=== mon.jin ===
Starting Ceph mon.jin on jin...
Starting ceph-create-keys on jin...



But checking the quorum it doesn't show the newly added monitor!

Plus ceph mon stat gives out only 1 monitor!!!

# ceph mon stat
e1: 1 mons at {fu=192.168.1.100:6789/0}, election epoch 1, quorum 0 
fu



Any ideas on what have I done wrong???


Regards,

George
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] {Disarmed} Re: Adding Monitor

2015-03-13 Thread Georgios Dimitrakakis

This is the message that is flooding the ceph-mon.log now:


2015-03-14 08:16:39.286823 7f9f6920b700  1 
mon.fu@0(electing).elector(1) init, last seen epoch 1
2015-03-14 08:16:42.736674 7f9f6880a700  1 mon.fu@0(electing) e2  
adding peer 15.12.6.21:6789/0 to list of hints
2015-03-14 08:16:42.737891 7f9f6880a700  1 
mon.fu@0(electing).elector(1) discarding election message: 
15.12.6.21:6789/0
not in my monmap e2: 2 mons at 
{fu=192.168.1.100:6789/0,jin=192.168.1.101:6789/0}




George


This is the log for monitor (ceph-mon.log) when I try to restart the 
monitor:



2015-03-14 07:47:26.384561 7f1f1dc0f700 -1 mon.fu@0(probing) e2 ***
Got Signal Terminated ***
2015-03-14 07:47:26.384593 7f1f1dc0f700  1 mon.fu@0(probing) e2 
shutdown

2015-03-14 07:47:26.384654 7f1f1dc0f700  0 quorum service shutdown
2015-03-14 07:47:26.384657 7f1f1dc0f700  0
mon.fu@0(shutdown).health(0) HealthMonitor::service_shutdown 1
services
2015-03-14 07:47:26.384665 7f1f1dc0f700  0 quorum service shutdown
2015-03-14 07:47:27.620670 7fc04b4437a0  0 ceph version 0.80.9
(b5a67f0e1d15385bc0d60a6da6e7fc810bde6047), process ceph-mon, pid
17050
2015-03-14 07:47:27.703151 7fc04b4437a0  0 starting mon.fu rank 0 at
192.168.1.100:6789/0 mon_data /var/lib/ceph/mon/ceph-fu fsid
a1132ec2-7104-4e8e-a3d5-95965cae9138
2015-03-14 07:47:27.703421 7fc04b4437a0  1 mon.fu@-1(probing) e2
preinit fsid a1132ec2-7104-4e8e-a3d5-95965cae9138
2015-03-14 07:47:27.704504 7fc04b4437a0  1
mon.fu@-1(probing).paxosservice(pgmap 897493..898204) refresh
upgraded, format 0 - 1
2015-03-14 07:47:27.704525 7fc04b4437a0  1 mon.fu@-1(probing).pg v0
on_upgrade discarding in-core PGMap
2015-03-14 07:47:27.837060 7fc04b4437a0  0 mon.fu@-1(probing).mds
e104 print_map
epoch   104
flags   0
created 2014-11-30 01:58:17.176938
modified2015-03-14 06:07:05.683239
tableserver 0
root0
session_timeout 60
session_autoclose   300
max_file_size   1099511627776
last_failure0
last_failure_osd_epoch  1760
compat  compat={},rocompat={},incompat={1=base v0.20,2=client
writeable ranges,3=default file layouts on dirs,4=dir inode in
separate object,5=mds uses versioned encoding,6=dirfrag is stored in
omap}
max_mds 1
in  0
up  {0=59315}
failed
stopped
data_pools  3
metadata_pool   4
inline_data disabled
59315:  15.12.6.21:6800/26628 'fu' mds.0.21 up:active seq 9

2015-03-14 07:47:27.837972 7fc04b4437a0  0 mon.fu@-1(probing).osd
e1768 crush map has features 1107558400, adjusting msgr requires
2015-03-14 07:47:27.837990 7fc04b4437a0  0 mon.fu@-1(probing).osd
e1768 crush map has features 1107558400, adjusting msgr requires
2015-03-14 07:47:27.837996 7fc04b4437a0  0 mon.fu@-1(probing).osd
e1768 crush map has features 1107558400, adjusting msgr requires
2015-03-14 07:47:27.838003 7fc04b4437a0  0 mon.fu@-1(probing).osd
e1768 crush map has features 1107558400, adjusting msgr requires
2015-03-14 07:47:27.839054 7fc04b4437a0  1
mon.fu@-1(probing).paxosservice(auth 2751..2829) refresh upgraded,
format 0 - 1
2015-03-14 07:47:27.840052 7fc04b4437a0  0 mon.fu@-1(probing) e2  my
rank is now 0 (was -1)
2015-03-14 07:47:27.840512 7fc045ef5700  0 -- 192.168.1.100:6789/0 
192.168.1.101:6789/0 pipe(0x3958780 sd=13 :0 s=1 pgs=0 cs=0 l=0
c=0x38c0dc0).fault






I can no longer start my OSDs :-@


failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf 
--name=osd.6
--keyring=/var/lib/ceph/osd/ceph-6/keyring osd crush create-or-move 
--

6 3.63 host=fu root=default'


Please help!!!

George


ceph mon add stops at this:


[jin][INFO  ] Running command: sudo ceph mon getmap -o
/var/lib/ceph/tmp/ceph.raijin.monmap


and never gets over it!


Any help??

Thanks,


George


Guys, any help much appreciated because my cluster is down :-(

After trying ceph mon add which didn't complete since it was stuck
for ever here:

[jin][WARNIN] 2015-03-14 07:07:14.964265 7fb4be6f5700  0 
monclient:

hunting for new mon
^CKilled by signal 2.
[ceph_deploy][ERROR ] KeyboardInterrupt


the previously healthy node is now down completely :-(

$ ceph mon stat
2015-03-14 07:21:37.782360 7ff2545b1700  0 -- 
192.168.1.100:0/1042061
 192.168.1.101:6789/0 pipe(0x7ff248000c00 sd=4 :0 s=1 pgs=0 cs=0 
l=1

c=0x7ff248000e90).fault
^CError connecting to cluster: InterruptedOrTimeoutError


Any ideas??


All the best,

George




Georgeos

, you need to have deployment server and cd into folder that 
you

used originally while deploying CEPH - in this folder you should
already have ceph.conf, admin.client keyring and other stuff - 
which
is required to connect to the cluster...and provision new MONs or
OSDs,

etc.

Message:
[ceph_deploy][ERROR ] RuntimeError: mon keyring not found; run 
new to

create a new cluster...

...means (if I'm not mistaken) that you are running ceph-deploy
from

NOT original folder...

On 13 March 2015 at 23:03, Georgios Dimitrakakis  wrote:


Not a firewall problem!! Firewall is disabled ...

Loic I ve tried mon create because of this:



http://ceph.com/docs/v0.80.5/start/quick-ceph

Re: [ceph-users] {Disarmed} Re: Adding Monitor

2015-03-13 Thread Georgios Dimitrakakis

This is the output from CEPH HEALTH


# ceph health
2015-03-14 09:16:54.435458 7f507843b700  0 -- :/1048223  
15.12.6.21:6789/0 pipe(0x7f5074022250 sd=3 :0 s=1 pgs=0 cs=0 l=1 
c=0x7f50740224e0).fault
2015-03-14 09:16:57.433435 7f507833a700  0 -- :/1048223  
192.168.1.100:6789/0 pipe(0x7f5068000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 
c=0x7f5068000e90).fault
2015-03-14 09:17:00.434317 7f507843b700  0 -- :/1048223  
15.12.6.21:6789/0 pipe(0x7f5068003010 sd=4 :0 s=1 pgs=0 cs=0 l=1 
c=0x7f50680032a0).fault
2015-03-14 09:17:03.434074 7f507833a700  0 -- :/1048223  
192.168.1.100:6789/0 pipe(0x7f5068003830 sd=4 :0 s=1 pgs=0 cs=0 l=1 
c=0x7f5068003ac0).fault
2015-03-14 09:17:06.434936 7f507843b700  0 -- :/1048223  
15.12.6.21:6789/0 pipe(0x7f5068002680 sd=4 :0 s=1 pgs=0 cs=0 l=1 
c=0x7f5068002910).fault



Why is it trying to listen on internal network IP address for one 
monitor and on the external for the other?



Best,

George
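
When the ceph commands hang like this, each monitor can still be queried locally through its admin socket, which is a hedged way to see what state and monmap every daemon actually has (the socket paths follow the default naming; adjust the mon id per host):

# ceph --admin-daemon /var/run/ceph/ceph-mon.fu.asok mon_status
# ceph --admin-daemon /var/run/ceph/ceph-mon.jin.asok mon_status

The monmap and state fields show whether the two daemons agree on the membership, which is what the election messages above keep complaining about.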


Not a healthy monitor means that I cannot get a monmap from any of
them!


and none of the commands ceph health etc. are working.

Best,

George


Yes Sage!

Priority is to fix things!

Right now I don't have a healthy monitor!

Can I remove all of them and add the first one from scratch?

What would that mean about the data??

Best,

George


On Sat, 14 Mar 2015, Georgios Dimitrakakis wrote:

This is the message that is flooding the ceph-mon.log now:


 2015-03-14 08:16:39.286823 7f9f6920b700  1
 mon.fu@0(electing).elector(1) init, last seen epoch 1
 2015-03-14 08:16:42.736674 7f9f6880a700  1 mon.fu@0(electing) e2
 adding peer 15.12.6.21:6789/0 to list of hints
 2015-03-14 08:16:42.737891 7f9f6880a700  1
 mon.fu@0(electing).elector(1) discarding election message:
 15.12.6.21:6789/0
 not in my monmap e2: 2 mons at
 {fu=192.168.1.100:6789/0,jin=192.168.1.101:6789/0}


It sounds like you need to follow some variation of this procedure:



http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster

..although it may be that simply killing the daemon running on 
15.12.6.21

and restarting the other mon daemons will be enough.  If not, the
procedure linked above will let you remove all traces of it and
get

things up again.

Not quite sure where things went awry but I assume the priority is 
to get

things working first and figure that out later!

sage





 George


 This is the log for monitor (ceph-mon.log) when I try to restart 
the

 monitor:


 2015-03-14 07:47:26.384561 7f1f1dc0f700 -1 mon.fu@0(probing) e2 
***

 Got Signal Terminated ***
 2015-03-14 07:47:26.384593 7f1f1dc0f700  1 mon.fu@0(probing) e2
 shutdown
 2015-03-14 07:47:26.384654 7f1f1dc0f700  0 quorum service 
shutdown

 2015-03-14 07:47:26.384657 7f1f1dc0f700  0
 mon.fu@0(shutdown).health(0) HealthMonitor::service_shutdown 1
 services
 2015-03-14 07:47:26.384665 7f1f1dc0f700  0 quorum service 
shutdown

 2015-03-14 07:47:27.620670 7fc04b4437a0  0 ceph version 0.80.9
 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047), process ceph-mon, 
pid

 17050
 2015-03-14 07:47:27.703151 7fc04b4437a0  0 starting mon.fu rank 
0 at

 192.168.1.100:6789/0 mon_data /var/lib/ceph/mon/ceph-fu fsid
 a1132ec2-7104-4e8e-a3d5-95965cae9138
 2015-03-14 07:47:27.703421 7fc04b4437a0  1 mon.fu@-1(probing) e2
 preinit fsid a1132ec2-7104-4e8e-a3d5-95965cae9138
 2015-03-14 07:47:27.704504 7fc04b4437a0  1
 mon.fu@-1(probing).paxosservice(pgmap 897493..898204) refresh
 upgraded, format 0 - 1
 2015-03-14 07:47:27.704525 7fc04b4437a0  1 mon.fu@-1(probing).pg 
v0

 on_upgrade discarding in-core PGMap
 2015-03-14 07:47:27.837060 7fc04b4437a0  0 
mon.fu@-1(probing).mds

 e104 print_map
 epoch  104
 flags  0
 created2014-11-30 01:58:17.176938
 modified   2015-03-14 06:07:05.683239
 tableserver0
 root   0
 session_timeout60
 session_autoclose  300
 max_file_size  1099511627776
 last_failure   0
 last_failure_osd_epoch 1760
 compat compat={},rocompat={},incompat={1=base v0.20,2=client
 writeable ranges,3=default file layouts on dirs,4=dir inode in
 separate object,5=mds uses versioned encoding,6=dirfrag is 
stored in

 omap}
 max_mds1
 in 0
 up {0=59315}
 failed
 stopped
 data_pools 3
 metadata_pool  4
 inline_datadisabled
 59315: 15.12.6.21:6800/26628 'fu' mds.0.21 up:active seq 9

 2015-03-14 07:47:27.837972 7fc04b4437a0  0 
mon.fu@-1(probing).osd

 e1768 crush map has features 1107558400, adjusting msgr requires
 2015-03-14 07:47:27.837990 7fc04b4437a0  0 
mon.fu@-1(probing).osd

 e1768 crush map has features 1107558400, adjusting msgr requires
 2015-03-14 07:47:27.837996 7fc04b4437a0  0 
mon.fu@-1(probing).osd

 e1768 crush map has features 1107558400, adjusting msgr requires
 2015-03-14 07:47:27.838003 7fc04b4437a0  0 
mon.fu@-1(probing).osd

 e1768 crush map has features 1107558400, adjusting msgr requires
 2015-03-14 07:47:27.839054 7fc04b4437a0  1
 mon.fu@-1(probing).paxosservice(auth 2751..2829) refresh 
upgraded,

 format 0 - 1
 2015-03-14 07:47

Re: [ceph-users] {Disarmed} Re: Adding Monitor

2015-03-13 Thread Georgios Dimitrakakis

I can no longer start my OSDs :-@


failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.6 
--keyring=/var/lib/ceph/osd/ceph-6/keyring osd crush create-or-move -- 6 
3.63 host=fu root=default'



Please help!!!

George


ceph mon add stops at this:


[jin][INFO  ] Running command: sudo ceph mon getmap -o
/var/lib/ceph/tmp/ceph.raijin.monmap


and never gets over it!


Any help??

Thanks,


George


Guys, any help much appreciated because my cluster is down :-(

After trying ceph mon add which didn't complete since it was stuck
for ever here:

[jin][WARNIN] 2015-03-14 07:07:14.964265 7fb4be6f5700  0 monclient:
hunting for new mon
^CKilled by signal 2.
[ceph_deploy][ERROR ] KeyboardInterrupt


the previously healthy node is now down completely :-(

$ ceph mon stat
2015-03-14 07:21:37.782360 7ff2545b1700  0 -- 
192.168.1.100:0/1042061
 192.168.1.101:6789/0 pipe(0x7ff248000c00 sd=4 :0 s=1 pgs=0 cs=0 
l=1

c=0x7ff248000e90).fault
^CError connecting to cluster: InterruptedOrTimeoutError


Any ideas??


All the best,

George




Georgeos

, you need to have deployment server and cd into folder that you
used originally while deploying CEPH - in this folder you should
already have ceph.conf, admin.client keyring and other stuff - 
which
is required to connect to the cluster...and provision new MONs or
OSDs,

etc.

Message:
[ceph_deploy][ERROR ] RuntimeError: mon keyring not found; run new 
to

create a new cluster...

...means (if I'm not mistaken) that you are running ceph-deploy from
NOT original folder...

On 13 March 2015 at 23:03, Georgios Dimitrakakis  wrote:


Not a firewall problem!! Firewall is disabled ...

Loic I ve tried mon create because of this:



http://ceph.com/docs/v0.80.5/start/quick-ceph-deploy/#adding-monitors

[4]

Should I first create and then add?? What is the proper order???
Should I do it from the already existing monitor node or can I run
it from the new one?

If I try add from the beginning I am getting this:

ceph_deploy.conf][DEBUG ] found configuration file at:
/home/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.22): /usr/bin/ceph-deploy
mon add jin
[ceph_deploy][ERROR ] RuntimeError: mon keyring not found; run new
to create a new cluster

Regards,

George


Hi,

I think ceph-deploy mon add (instead of create) is what you
should be using.

Cheers

On 13/03/2015 22:25, Georgios Dimitrakakis wrote:


On an already available cluster I ve tried to add a new monitor!

I have used ceph-deploy mon create {NODE}

where {NODE}=the name of the node

and then I restarted the /etc/init.d/ceph service with a
success at the node
where it showed that the monitor is running like:

# /etc/init.d/ceph restart
=== mon.jin ===
=== mon.jin ===
Stopping Ceph mon.jin on jin...kill 36388...done
=== mon.jin ===
Starting Ceph mon.jin on jin...
Starting ceph-create-keys on jin...

But checking the quorum it doesnt show the newly added monitor!

Plus ceph mon stat gives out only 1 monitor!!!

# ceph mon stat
e1: 1 mons at {fu=192.168.1.100:6789/0}, election epoch 1,
quorum 0 fu

Any ideas on what have I done wrong???

Regards,

George
___
ceph-users mailing list
ceph-users@lists.ceph.com [2]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [3]

___
ceph-users mailing list
ceph-users@lists.ceph.com [5]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [6]

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] {Disarmed} Re: Adding Monitor

2015-03-13 Thread Georgios Dimitrakakis
This is the log for monitor (ceph-mon.log) when I try to restart the 
monitor:



2015-03-14 07:47:26.384561 7f1f1dc0f700 -1 mon.fu@0(probing) e2 *** Got 
Signal Terminated ***
2015-03-14 07:47:26.384593 7f1f1dc0f700  1 mon.fu@0(probing) e2 
shutdown

2015-03-14 07:47:26.384654 7f1f1dc0f700  0 quorum service shutdown
2015-03-14 07:47:26.384657 7f1f1dc0f700  0 mon.fu@0(shutdown).health(0) 
HealthMonitor::service_shutdown 1 services

2015-03-14 07:47:26.384665 7f1f1dc0f700  0 quorum service shutdown
2015-03-14 07:47:27.620670 7fc04b4437a0  0 ceph version 0.80.9 
(b5a67f0e1d15385bc0d60a6da6e7fc810bde6047), process ceph-mon, pid 17050
2015-03-14 07:47:27.703151 7fc04b4437a0  0 starting mon.fu rank 0 at 
192.168.1.100:6789/0 mon_data /var/lib/ceph/mon/ceph-fu fsid 
a1132ec2-7104-4e8e-a3d5-95965cae9138
2015-03-14 07:47:27.703421 7fc04b4437a0  1 mon.fu@-1(probing) e2 
preinit fsid a1132ec2-7104-4e8e-a3d5-95965cae9138
2015-03-14 07:47:27.704504 7fc04b4437a0  1 
mon.fu@-1(probing).paxosservice(pgmap 897493..898204) refresh upgraded, 
format 0 - 1
2015-03-14 07:47:27.704525 7fc04b4437a0  1 mon.fu@-1(probing).pg v0 
on_upgrade discarding in-core PGMap
2015-03-14 07:47:27.837060 7fc04b4437a0  0 mon.fu@-1(probing).mds e104 
print_map

epoch   104
flags   0
created 2014-11-30 01:58:17.176938
modified2015-03-14 06:07:05.683239
tableserver 0
root0
session_timeout 60
session_autoclose   300
max_file_size   1099511627776
last_failure0
last_failure_osd_epoch  1760
compat	compat={},rocompat={},incompat={1=base v0.20,2=client writeable 
ranges,3=default file layouts on dirs,4=dir inode in separate 
object,5=mds uses versioned encoding,6=dirfrag is stored in omap}

max_mds 1
in  0
up  {0=59315}
failed
stopped
data_pools  3
metadata_pool   4
inline_data disabled
59315:  15.12.6.21:6800/26628 'fu' mds.0.21 up:active seq 9

2015-03-14 07:47:27.837972 7fc04b4437a0  0 mon.fu@-1(probing).osd e1768 
crush map has features 1107558400, adjusting msgr requires
2015-03-14 07:47:27.837990 7fc04b4437a0  0 mon.fu@-1(probing).osd e1768 
crush map has features 1107558400, adjusting msgr requires
2015-03-14 07:47:27.837996 7fc04b4437a0  0 mon.fu@-1(probing).osd e1768 
crush map has features 1107558400, adjusting msgr requires
2015-03-14 07:47:27.838003 7fc04b4437a0  0 mon.fu@-1(probing).osd e1768 
crush map has features 1107558400, adjusting msgr requires
2015-03-14 07:47:27.839054 7fc04b4437a0  1 
mon.fu@-1(probing).paxosservice(auth 2751..2829) refresh upgraded, 
format 0 - 1
2015-03-14 07:47:27.840052 7fc04b4437a0  0 mon.fu@-1(probing) e2  my 
rank is now 0 (was -1)
2015-03-14 07:47:27.840512 7fc045ef5700  0 -- 192.168.1.100:6789/0  
192.168.1.101:6789/0 pipe(0x3958780 sd=13 :0 s=1 pgs=0 cs=0 l=0 
c=0x38c0dc0).fault







I can no longer start my OSDs :-@


failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.6
--keyring=/var/lib/ceph/osd/ceph-6/keyring osd crush create-or-move 
--

6 3.63 host=fu root=default'


Please help!!!

George


ceph mon add stops at this:


[jin][INFO  ] Running command: sudo ceph mon getmap -o
/var/lib/ceph/tmp/ceph.raijin.monmap


and never gets over it!


Any help??

Thanks,


George


Guys, any help much appreciated because my cluster is down :-(

After trying ceph mon add which didn't complete since it was stuck
for ever here:

[jin][WARNIN] 2015-03-14 07:07:14.964265 7fb4be6f5700  0 monclient:
hunting for new mon
^CKilled by signal 2.
[ceph_deploy][ERROR ] KeyboardInterrupt


the previously healthy node is now down completely :-(

$ ceph mon stat
2015-03-14 07:21:37.782360 7ff2545b1700  0 -- 
192.168.1.100:0/1042061
 192.168.1.101:6789/0 pipe(0x7ff248000c00 sd=4 :0 s=1 pgs=0 cs=0 
l=1

c=0x7ff248000e90).fault
^CError connecting to cluster: InterruptedOrTimeoutError


Any ideas??


All the best,

George




Georgeos

, you need to have deployment server and cd into folder that you
used originally while deploying CEPH - in this folder you should
already have ceph.conf, admin.client keyring and other stuff - 
which
is required to connect to the cluster...and provision new MONs or
OSDs,

etc.

Message:
[ceph_deploy][ERROR ] RuntimeError: mon keyring not found; run new 
to

create a new cluster...

...means (if I'm not mistaken) that you are running ceph-deploy
from

NOT original folder...

On 13 March 2015 at 23:03, Georgios Dimitrakakis  wrote:


Not a firewall problem!! Firewall is disabled ...

Loic I ve tried mon create because of this:



http://ceph.com/docs/v0.80.5/start/quick-ceph-deploy/#adding-monitors

[4]

Should I first create and then add?? What is the proper order???
Should I do it from the already existing monitor node or can I 
run

it from the new one?

If I try add from the beginning I am getting this:

ceph_deploy.conf][DEBUG ] found configuration file at:
/home/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.22): /usr/bin/ceph-deploy
mon add jin
[ceph_deploy][ERROR ] RuntimeError: mon keyring not found; run 
new

to create

Re: [ceph-users] {Disarmed} Re: Adding Monitor

2015-03-13 Thread Georgios Dimitrakakis
Not a healthy monitor means that I cannot get a monmap from any of
them!


and none of the commands ceph health etc. are working.

Best,

George


Yes Sage!

Priority is to fix things!

Right now I don't have a healthy monitor!

Can I remove all of them and add the first one from scratch?

What would that mean about the data??

Best,

George


On Sat, 14 Mar 2015, Georgios Dimitrakakis wrote:

This is the message that is flooding the ceph-mon.log now:


 2015-03-14 08:16:39.286823 7f9f6920b700  1
 mon.fu@0(electing).elector(1) init, last seen epoch 1
 2015-03-14 08:16:42.736674 7f9f6880a700  1 mon.fu@0(electing) e2
 adding peer 15.12.6.21:6789/0 to list of hints
 2015-03-14 08:16:42.737891 7f9f6880a700  1
 mon.fu@0(electing).elector(1) discarding election message:
 15.12.6.21:6789/0
 not in my monmap e2: 2 mons at
 {fu=192.168.1.100:6789/0,jin=192.168.1.101:6789/0}


It sounds like you need to follow some variation of this procedure:



http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster

..although it may be that simply killing the daemon running on 
15.12.6.21

and restarting the other mon daemons will be enough.  If not, the
procedure linked above will let you remove all traces of it and get
things up again.

Not quite sure where things went awry but I assume the priority is 
to get

things working first and figure that out later!

sage





 George


 This is the log for monitor (ceph-mon.log) when I try to restart 
the

 monitor:


 2015-03-14 07:47:26.384561 7f1f1dc0f700 -1 mon.fu@0(probing) e2 
***

 Got Signal Terminated ***
 2015-03-14 07:47:26.384593 7f1f1dc0f700  1 mon.fu@0(probing) e2
 shutdown
 2015-03-14 07:47:26.384654 7f1f1dc0f700  0 quorum service 
shutdown

 2015-03-14 07:47:26.384657 7f1f1dc0f700  0
 mon.fu@0(shutdown).health(0) HealthMonitor::service_shutdown 1
 services
 2015-03-14 07:47:26.384665 7f1f1dc0f700  0 quorum service 
shutdown

 2015-03-14 07:47:27.620670 7fc04b4437a0  0 ceph version 0.80.9
 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047), process ceph-mon, pid
 17050
 2015-03-14 07:47:27.703151 7fc04b4437a0  0 starting mon.fu rank 0 
at

 192.168.1.100:6789/0 mon_data /var/lib/ceph/mon/ceph-fu fsid
 a1132ec2-7104-4e8e-a3d5-95965cae9138
 2015-03-14 07:47:27.703421 7fc04b4437a0  1 mon.fu@-1(probing) e2
 preinit fsid a1132ec2-7104-4e8e-a3d5-95965cae9138
 2015-03-14 07:47:27.704504 7fc04b4437a0  1
 mon.fu@-1(probing).paxosservice(pgmap 897493..898204) refresh
 upgraded, format 0 - 1
 2015-03-14 07:47:27.704525 7fc04b4437a0  1 mon.fu@-1(probing).pg 
v0

 on_upgrade discarding in-core PGMap
 2015-03-14 07:47:27.837060 7fc04b4437a0  0 mon.fu@-1(probing).mds
 e104 print_map
 epoch  104
 flags  0
 created2014-11-30 01:58:17.176938
 modified   2015-03-14 06:07:05.683239
 tableserver0
 root   0
 session_timeout60
 session_autoclose  300
 max_file_size  1099511627776
 last_failure   0
 last_failure_osd_epoch 1760
 compat compat={},rocompat={},incompat={1=base v0.20,2=client
 writeable ranges,3=default file layouts on dirs,4=dir inode in
 separate object,5=mds uses versioned encoding,6=dirfrag is stored 
in

 omap}
 max_mds1
 in 0
 up {0=59315}
 failed
 stopped
 data_pools 3
 metadata_pool  4
 inline_datadisabled
 59315: 15.12.6.21:6800/26628 'fu' mds.0.21 up:active seq 9

 2015-03-14 07:47:27.837972 7fc04b4437a0  0 mon.fu@-1(probing).osd
 e1768 crush map has features 1107558400, adjusting msgr requires
 2015-03-14 07:47:27.837990 7fc04b4437a0  0 mon.fu@-1(probing).osd
 e1768 crush map has features 1107558400, adjusting msgr requires
 2015-03-14 07:47:27.837996 7fc04b4437a0  0 mon.fu@-1(probing).osd
 e1768 crush map has features 1107558400, adjusting msgr requires
 2015-03-14 07:47:27.838003 7fc04b4437a0  0 mon.fu@-1(probing).osd
 e1768 crush map has features 1107558400, adjusting msgr requires
 2015-03-14 07:47:27.839054 7fc04b4437a0  1
 mon.fu@-1(probing).paxosservice(auth 2751..2829) refresh 
upgraded,

 format 0 - 1
 2015-03-14 07:47:27.840052 7fc04b4437a0  0 mon.fu@-1(probing) e2  
my

 rank is now 0 (was -1)
 2015-03-14 07:47:27.840512 7fc045ef5700  0 -- 
192.168.1.100:6789/0 

 192.168.1.101:6789/0 pipe(0x3958780 sd=13 :0 s=1 pgs=0 cs=0 l=0
 c=0x38c0dc0).fault





 I can no longer start my OSDs :-@


 failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf
 --name=osd.6
 --keyring=/var/lib/ceph/osd/ceph-6/keyring osd crush 
create-or-move

 --
 6 3.63 host=fu root=default'


 Please help!!!

 George

 ceph mon add stops at this:


 [jin][INFO  ] Running command: sudo ceph mon getmap -o
 /var/lib/ceph/tmp/ceph.raijin.monmap


 and never gets over it!


 Any help??

 Thanks,


 George

 Guys, any help much appreciated because my cluster is down :-(

 After trying ceph mon add which didn't complete since it was 
stuck

 for ever here:

 [jin][WARNIN] 2015-03-14 07:07:14.964265 7fb4be6f5700  0
 monclient:
 hunting for new mon
 ^CKilled by signal 2

Re: [ceph-users] {Disarmed} Re: Adding Monitor

2015-03-13 Thread Georgios Dimitrakakis

Guys, any help much appreciated because my cluster is down :-(

After trying ceph mon add which didn't complete since it was stuck for 
ever here:


[jin][WARNIN] 2015-03-14 07:07:14.964265 7fb4be6f5700  0 monclient: 
hunting for new mon

^CKilled by signal 2.
[ceph_deploy][ERROR ] KeyboardInterrupt


the previously healthy node is now down completely :-(

$ ceph mon stat
2015-03-14 07:21:37.782360 7ff2545b1700  0 -- 192.168.1.100:0/1042061 
 192.168.1.101:6789/0 pipe(0x7ff248000c00 sd=4 :0 s=1 pgs=0 cs=0 l=1 
c=0x7ff248000e90).fault

^CError connecting to cluster: InterruptedOrTimeoutError


Any ideas??


All the best,

George




Georgeos

, you need to have deployment server and cd into folder that you
used originally while deploying CEPH - in this folder you should
already have ceph.conf, admin.client keyring and other stuff - which
is required to connect to the cluster...and provision new MONs or
OSDs,

etc.

Message:
[ceph_deploy][ERROR ] RuntimeError: mon keyring not found; run new to
create a new cluster...

...means (if I'm not mistaken) that you are running ceph-deploy from
NOT original folder...

On 13 March 2015 at 23:03, Georgios Dimitrakakis  wrote:


Not a firewall problem!! Firewall is disabled ...

Loic I ve tried mon create because of this:


http://ceph.com/docs/v0.80.5/start/quick-ceph-deploy/#adding-monitors

[4]

Should I first create and then add?? What is the proper order???
Should I do it from the already existing monitor node or can I run
it from the new one?

If I try add from the beginning I am getting this:

ceph_deploy.conf][DEBUG ] found configuration file at:
/home/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.22): /usr/bin/ceph-deploy
mon add jin
[ceph_deploy][ERROR ] RuntimeError: mon keyring not found; run new
to create a new cluster

Regards,

George


Hi,

I think ceph-deploy mon add (instead of create) is what you
should be using.

Cheers

On 13/03/2015 22:25, Georgios Dimitrakakis wrote:


On an already available cluster I ve tried to add a new monitor!

I have used ceph-deploy mon create {NODE}

where {NODE}=the name of the node

and then I restarted the /etc/init.d/ceph service with a
success at the node
where it showed that the monitor is running like:

# /etc/init.d/ceph restart
=== mon.jin ===
=== mon.jin ===
Stopping Ceph mon.jin on jin...kill 36388...done
=== mon.jin ===
Starting Ceph mon.jin on jin...
Starting ceph-create-keys on jin...

But checking the quorum it doesnt show the newly added monitor!

Plus ceph mon stat gives out only 1 monitor!!!

# ceph mon stat
e1: 1 mons at {fu=192.168.1.100:6789/0}, election epoch 1,
quorum 0 fu

Any ideas on what have I done wrong???

Regards,

George
___
ceph-users mailing list
ceph-users@lists.ceph.com [2]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [3]

___
ceph-users mailing list
ceph-users@lists.ceph.com [5]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [6]

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] {Disarmed} Re: Adding Monitor

2015-03-13 Thread Georgios Dimitrakakis

ceph mon add stops at this:


[jin][INFO  ] Running command: sudo ceph mon getmap -o 
/var/lib/ceph/tmp/ceph.raijin.monmap



and never gets over it!


Any help??

Thanks,


George


Guys, any help much appreciated because my cluster is down :-(

After trying ceph mon add which didn't complete since it was stuck
for ever here:

[jin][WARNIN] 2015-03-14 07:07:14.964265 7fb4be6f5700  0 monclient:
hunting for new mon
^CKilled by signal 2.
[ceph_deploy][ERROR ] KeyboardInterrupt


the previously healthy node is now down completely :-(

$ ceph mon stat
2015-03-14 07:21:37.782360 7ff2545b1700  0 -- 192.168.1.100:0/1042061
 192.168.1.101:6789/0 pipe(0x7ff248000c00 sd=4 :0 s=1 pgs=0 cs=0 
l=1

c=0x7ff248000e90).fault
^CError connecting to cluster: InterruptedOrTimeoutError


Any ideas??


All the best,

George




Georgeos

, you need to have deployment server and cd into folder that you
used originally while deploying CEPH - in this folder you should
already have ceph.conf, admin.client keyring and other stuff - which
is required to connect to the cluster...and provision new MONs or
OSDs,

etc.

Message:
[ceph_deploy][ERROR ] RuntimeError: mon keyring not found; run new 
to

create a new cluster...

...means (if I'm not mistaken) that you are running ceph-deploy from
NOT original folder...

On 13 March 2015 at 23:03, Georgios Dimitrakakis  wrote:


Not a firewall problem!! Firewall is disabled ...

Loic I ve tried mon create because of this:



http://ceph.com/docs/v0.80.5/start/quick-ceph-deploy/#adding-monitors

[4]

Should I first create and then add?? What is the proper order???
Should I do it from the already existing monitor node or can I run
it from the new one?

If I try add from the beginning I am getting this:

ceph_deploy.conf][DEBUG ] found configuration file at:
/home/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.22): /usr/bin/ceph-deploy
mon add jin
[ceph_deploy][ERROR ] RuntimeError: mon keyring not found; run new
to create a new cluster

Regards,

George


Hi,

I think ceph-deploy mon add (instead of create) is what you
should be using.

Cheers

On 13/03/2015 22:25, Georgios Dimitrakakis wrote:


On an already available cluster I ve tried to add a new monitor!

I have used ceph-deploy mon create {NODE}

where {NODE}=the name of the node

and then I restarted the /etc/init.d/ceph service with a
success at the node
where it showed that the monitor is running like:

# /etc/init.d/ceph restart
=== mon.jin ===
=== mon.jin ===
Stopping Ceph mon.jin on jin...kill 36388...done
=== mon.jin ===
Starting Ceph mon.jin on jin...
Starting ceph-create-keys on jin...

But checking the quorum it doesnt show the newly added monitor!

Plus ceph mon stat gives out only 1 monitor!!!

# ceph mon stat
e1: 1 mons at {fu=192.168.1.100:6789/0}, election epoch 1,
quorum 0 fu

Any ideas on what have I done wrong???

Regards,

George
___
ceph-users mailing list
ceph-users@lists.ceph.com [2]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [3]

___
ceph-users mailing list
ceph-users@lists.ceph.com [5]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [6]

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] {Disarmed} Re: Adding Monitor

2015-03-13 Thread Georgios Dimitrakakis

Yes Sage!

Priority is to fix things!

Right now I don't have a healthy monitor!

Can I remove all of them and add the first one from scratch?

What would that mean about the data??

Best,

George


On Sat, 14 Mar 2015, Georgios Dimitrakakis wrote:

This is the message that is flooding the ceph-mon.log now:


 2015-03-14 08:16:39.286823 7f9f6920b700  1
 mon.fu@0(electing).elector(1) init, last seen epoch 1
 2015-03-14 08:16:42.736674 7f9f6880a700  1 mon.fu@0(electing) e2
 adding peer 15.12.6.21:6789/0 to list of hints
 2015-03-14 08:16:42.737891 7f9f6880a700  1
 mon.fu@0(electing).elector(1) discarding election message:
 15.12.6.21:6789/0
 not in my monmap e2: 2 mons at
 {fu=192.168.1.100:6789/0,jin=192.168.1.101:6789/0}


It sounds like you need to follow some variation of this procedure:



http://docs.ceph.com/docs/master/rados/operations/add-or-rm-mons/#removing-monitors-from-an-unhealthy-cluster

..although it may be that simply killing the daemon running on 
15.12.6.21

and restarting the other mon daemons will be enough.  If not, the
procedure linked above will let you remove all traces of it and get
things up again.

Not quite sure where things went awry but I assume the priority is to 
get

things working first and figure that out later!

sage





 George


 This is the log for monitor (ceph-mon.log) when I try to restart 
the

 monitor:


 2015-03-14 07:47:26.384561 7f1f1dc0f700 -1 mon.fu@0(probing) e2 
***

 Got Signal Terminated ***
 2015-03-14 07:47:26.384593 7f1f1dc0f700  1 mon.fu@0(probing) e2
 shutdown
 2015-03-14 07:47:26.384654 7f1f1dc0f700  0 quorum service shutdown
 2015-03-14 07:47:26.384657 7f1f1dc0f700  0
 mon.fu@0(shutdown).health(0) HealthMonitor::service_shutdown 1
 services
 2015-03-14 07:47:26.384665 7f1f1dc0f700  0 quorum service shutdown
 2015-03-14 07:47:27.620670 7fc04b4437a0  0 ceph version 0.80.9
 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047), process ceph-mon, pid
 17050
 2015-03-14 07:47:27.703151 7fc04b4437a0  0 starting mon.fu rank 0 
at

 192.168.1.100:6789/0 mon_data /var/lib/ceph/mon/ceph-fu fsid
 a1132ec2-7104-4e8e-a3d5-95965cae9138
 2015-03-14 07:47:27.703421 7fc04b4437a0  1 mon.fu@-1(probing) e2
 preinit fsid a1132ec2-7104-4e8e-a3d5-95965cae9138
 2015-03-14 07:47:27.704504 7fc04b4437a0  1
 mon.fu@-1(probing).paxosservice(pgmap 897493..898204) refresh
 upgraded, format 0 - 1
 2015-03-14 07:47:27.704525 7fc04b4437a0  1 mon.fu@-1(probing).pg 
v0

 on_upgrade discarding in-core PGMap
 2015-03-14 07:47:27.837060 7fc04b4437a0  0 mon.fu@-1(probing).mds
 e104 print_map
 epoch  104
 flags  0
 created2014-11-30 01:58:17.176938
 modified   2015-03-14 06:07:05.683239
 tableserver0
 root   0
 session_timeout60
 session_autoclose  300
 max_file_size  1099511627776
 last_failure   0
 last_failure_osd_epoch 1760
 compat compat={},rocompat={},incompat={1=base v0.20,2=client
 writeable ranges,3=default file layouts on dirs,4=dir inode in
 separate object,5=mds uses versioned encoding,6=dirfrag is stored 
in

 omap}
 max_mds1
 in 0
 up {0=59315}
 failed
 stopped
 data_pools 3
 metadata_pool  4
 inline_datadisabled
 59315: 15.12.6.21:6800/26628 'fu' mds.0.21 up:active seq 9

 2015-03-14 07:47:27.837972 7fc04b4437a0  0 mon.fu@-1(probing).osd
 e1768 crush map has features 1107558400, adjusting msgr requires
 2015-03-14 07:47:27.837990 7fc04b4437a0  0 mon.fu@-1(probing).osd
 e1768 crush map has features 1107558400, adjusting msgr requires
 2015-03-14 07:47:27.837996 7fc04b4437a0  0 mon.fu@-1(probing).osd
 e1768 crush map has features 1107558400, adjusting msgr requires
 2015-03-14 07:47:27.838003 7fc04b4437a0  0 mon.fu@-1(probing).osd
 e1768 crush map has features 1107558400, adjusting msgr requires
 2015-03-14 07:47:27.839054 7fc04b4437a0  1
 mon.fu@-1(probing).paxosservice(auth 2751..2829) refresh upgraded,
 format 0 - 1
 2015-03-14 07:47:27.840052 7fc04b4437a0  0 mon.fu@-1(probing) e2  
my

 rank is now 0 (was -1)
 2015-03-14 07:47:27.840512 7fc045ef5700  0 -- 192.168.1.100:6789/0 


 192.168.1.101:6789/0 pipe(0x3958780 sd=13 :0 s=1 pgs=0 cs=0 l=0
 c=0x38c0dc0).fault





 I can no longer start my OSDs :-@


 failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf
 --name=osd.6
 --keyring=/var/lib/ceph/osd/ceph-6/keyring osd crush 
create-or-move

 --
 6 3.63 host=fu root=default'


 Please help!!!

 George

 ceph mon add stops at this:


 [jin][INFO  ] Running command: sudo ceph mon getmap -o
 /var/lib/ceph/tmp/ceph.raijin.monmap


 and never gets over it!


 Any help??

 Thanks,


 George

 Guys, any help much appreciated because my cluster is down :-(

 After trying ceph mon add which didn't complete since it was 
stuck

 for ever here:

 [jin][WARNIN] 2015-03-14 07:07:14.964265 7fb4be6f5700  0
 monclient:
 hunting for new mon
 ^CKilled by signal 2.
 [ceph_deploy][ERROR ] KeyboardInterrupt


 the previously healthy node is now down completely :-(

 $ ceph mon stat
 2015-03-14 07:21:37.782360 7ff2545b1700  0

Re: [ceph-users] {Disarmed} Re: Public Network Meaning

2015-03-14 Thread Georgios Dimitrakakis

Andrija,

thanks a lot for the useful info!

I would also like to thank Kingrat at the IRC channel for his useful 
advice!



I was under the wrong impression that public is the one used for RADOS.

So I thought that public=external=internet and therefore I used that 
one in my conf.


I understand now that I should have specified in CEPH Public's Network 
what I call
internal and which is the one that all machines are talking directly 
to each other.



Thank you all for the feedback!


Regards,


George



Public network is clients-to-OSD traffic - and if you have NOT
explicitly defined cluster network, then also OSD-to-OSD replication
takes place over same network.

Otherwise, you can define public and cluster(private) network - so 
OSD
replication will happen over dedicated NICs (cluster network) and 
thus

speed up.

If i.e. replica count on pool is 3, that means, each 1GB of data
written to some particular OSD, will generate 3 x 1GB of more writes,
to the replicas... - which ideally will take place over separate NICs
to speed up things...

On 14 March 2015 at 17:43, Georgios Dimitrakakis  wrote:


Hi all!!

What is the meaning of public_network in ceph.conf?

Is it the network that OSDs are talking and transferring data?

I have two nodes with two IP addresses each. One for internal
network 192.168.1.0/24 [1]
and one external 15.12.6.*

I see the following in my logs:

osd.0 is down since epoch 2204, last address 15.12.6.21:6826/33094 [2]
osd.1 is down since epoch 2206, last address 15.12.6.21:6817/32463 [3]
osd.2 is down since epoch 2198, last address 15.12.6.21:6843/34921 [4]
osd.3 is down since epoch 2200, last address 15.12.6.21:6838/34208 [5]
osd.4 is down since epoch 2202, last address 15.12.6.21:6831/33610 [6]
osd.5 is down since epoch 2194, last address 15.12.6.21:6858/35948 [7]
osd.7 is down since epoch 2192, last address 15.12.6.21:6871/36720 [8]
osd.8 is down since epoch 2196, last address 15.12.6.21:6855/35354 [9]

I ve managed to add a second node and during rebalancing I see that
data is transferred through
the internal 192.* but the external link is also saturated!

What is being transferred over that link?

Any help much appreciated!

Regards,

George
___
ceph-users mailing list
ceph-users@lists.ceph.com [10]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [11]


--

Andrija Panić

Links:
--
[1] http://192.168.1.0/24
[2] http://15.12.6.21:6826/33094
[3] http://15.12.6.21:6817/32463
[4] http://15.12.6.21:6843/34921
[5] http://15.12.6.21:6838/34208
[6] http://15.12.6.21:6831/33610
[7] http://15.12.6.21:6858/35948
[8] http://15.12.6.21:6871/36720
[9] http://15.12.6.21:6855/35354
[10] mailto:ceph-users@lists.ceph.com
[11] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[12] mailto:gior...@acmac.uoc.gr


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Public Network Meaning

2015-03-14 Thread Georgios Dimitrakakis


Hi all!!

What is the meaning of public_network in ceph.conf?

Is it the network that OSDs are talking and transferring data?

I have two nodes with two IP addresses each. One for internal network 
192.168.1.0/24

and one external 15.12.6.*

I see the following in my logs:

osd.0 is down since epoch 2204, last address 15.12.6.21:6826/33094
osd.1 is down since epoch 2206, last address 15.12.6.21:6817/32463
osd.2 is down since epoch 2198, last address 15.12.6.21:6843/34921
osd.3 is down since epoch 2200, last address 15.12.6.21:6838/34208
osd.4 is down since epoch 2202, last address 15.12.6.21:6831/33610
osd.5 is down since epoch 2194, last address 15.12.6.21:6858/35948
osd.7 is down since epoch 2192, last address 15.12.6.21:6871/36720
osd.8 is down since epoch 2196, last address 15.12.6.21:6855/35354


I 've managed to add a second node and during rebalancing I see that 
data is transferred through

the internal 192.* but the external link is also saturated!

What is being transferred over that link?


Any help much appreciated!

Regards,

George
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
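
One way to see why the 15.12.6.* link carries traffic is to check which addresses the OSDs registered with the cluster; a minimal sketch, the grep pattern being only an example:

# ceph osd dump | grep "^osd\."

Each line lists the OSD's public and cluster addresses. If they show 15.12.6.21, the daemons bound to the external NIC because no public/cluster network was set when they started, and they will keep using those addresses until they are restarted with the corrected ceph.conf.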


Re: [ceph-users] Public Network Meaning

2015-03-14 Thread Georgios Dimitrakakis

I thought that it was easy but apparently it's not!

I have the following in my conf file


mon_host = 192.168.1.100,192.168.1.101,192.168.1.102
public_network = 192.168.1.0/24
mon_initial_members = fu,rai,jin


but still the 15.12.6.21 link is being saturated

Any ideas why???

Should I put cluster network as well??

Should I put each OSD in the CONF file???


Regards,


George






Andrija,

thanks a lot for the useful info!

I would also like to thank Kingrat at the IRC channel for his
useful advice!


I was under the wrong impression that public is the one used for 
RADOS.


So I thought that public=external=internet and therefore I used that
one in my conf.

I understand now that I should have specified in CEPH Public's
Network what I call
internal and which is the one that all machines are talking
directly to each other.


Thank you all for the feedback!


Regards,


George



Public network is clients-to-OSD traffic - and if you have NOT
explicitly defined cluster network, then also OSD-to-OSD
replication

takes place over same network.

Otherwise, you can define public and cluster(private) network - so 
OSD
replication will happen over dedicated NICs (cluster network) and 
thus

speed up.

If i.e. replica count on pool is 3, that means, each 1GB of data
written to some particular OSD, will generate 3 x 1GB of more writes,
to the replicas... - which ideally will take place over separate 
NICs

to speed up things...

On 14 March 2015 at 17:43, Georgios Dimitrakakis  wrote:


Hi all!!

What is the meaning of public_network in ceph.conf?

Is it the network that OSDs are talking and transferring data?

I have two nodes with two IP addresses each. One for internal
network MAILSCANNER WARNING: NUMERICAL LINKS ARE OFTEN MALICIOUS:
192.168.1.0/24 [1]
and one external 15.12.6.*

I see the following in my logs:

osd.0 is down since epoch 2204, last address MAILSCANNER WARNING:
NUMERICAL LINKS ARE OFTEN MALICIOUS: 15.12.6.21:6826/33094 [2]
osd.1 is down since epoch 2206, last address MAILSCANNER WARNING:
NUMERICAL LINKS ARE OFTEN MALICIOUS: 15.12.6.21:6817/32463 [3]
osd.2 is down since epoch 2198, last address MAILSCANNER WARNING:
NUMERICAL LINKS ARE OFTEN MALICIOUS: 15.12.6.21:6843/34921 [4]
osd.3 is down since epoch 2200, last address MAILSCANNER WARNING:
NUMERICAL LINKS ARE OFTEN MALICIOUS: 15.12.6.21:6838/34208 [5]
osd.4 is down since epoch 2202, last address MAILSCANNER WARNING:
NUMERICAL LINKS ARE OFTEN MALICIOUS: 15.12.6.21:6831/33610 [6]
osd.5 is down since epoch 2194, last address MAILSCANNER WARNING:
NUMERICAL LINKS ARE OFTEN MALICIOUS: 15.12.6.21:6858/35948 [7]
osd.7 is down since epoch 2192, last address MAILSCANNER WARNING:
NUMERICAL LINKS ARE OFTEN MALICIOUS: 15.12.6.21:6871/36720 [8]
osd.8 is down since epoch 2196, last address MAILSCANNER WARNING:
NUMERICAL LINKS ARE OFTEN MALICIOUS: 15.12.6.21:6855/35354 [9]

I ve managed to add a second node and during rebalancing I see that
data is transferred through
the internal 192.* but the external link is also saturated!

What is being transferred over that link?

Any help much appreciated!

Regards,

George
___
ceph-users mailing list
ceph-users@lists.ceph.com [10]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [11]


--

Andrija Panić

Links:
--
[1] http://192.168.1.0/24
[2] http://15.12.6.21:6826/33094
[3] http://15.12.6.21:6817/32463
[4] http://15.12.6.21:6843/34921
[5] http://15.12.6.21:6838/34208
[6] http://15.12.6.21:6831/33610
[7] http://15.12.6.21:6858/35948
[8] http://15.12.6.21:6871/36720
[9] http://15.12.6.21:6855/35354
[10] mailto:ceph-users@lists.ceph.com
[11] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[12] mailto:gior...@acmac.uoc.gr


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] {Disarmed} Re: Public Network Meaning

2015-03-14 Thread Georgios Dimitrakakis

Andrija,

Thanks for you help!

In my case I just have one 192.* network, so should I put that for 
both?


Besides monitors do I have to list OSDs as well?

Thanks again!

Best,

George


This is how I did it, and then restart each OSD one by one, but
monitor with ceph -s; when ceph is healthy, proceed with the next OSD
restart...
Make sure the networks are fine on the physical nodes, that you can ping
in between...

[global]
x
x
x
x
x
x

#
### REPLICATION NETWORK ON SEPARATE 10G NICs

# replication network
cluster network = 10.44.251.0/24 [29]

# public/client network
public network = 10.44.253.0/16 [30]

#

[mon.xx]
mon_addr = x.x.x.x:6789
host = xx

[mon.yy]
mon_addr = x.x.x.x:6789
host = yy

[mon.zz]
mon_addr = x.x.x.x:6789
host = zz
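
For the restart-one-OSD-at-a-time procedure mentioned above, a rough
shell sketch (it assumes the sysvinit script and that HEALTH_OK is the
state you are waiting for; adapt to your init system):

  for id in $(ceph osd ls); do
      /etc/init.d/ceph restart osd.$id      # restart a single OSD
      # wait for the cluster to settle before touching the next one
      until ceph health | grep -q HEALTH_OK; do sleep 10; done
  done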

On 14 March 2015 at 19:14, Georgios Dimitrakakis  wrote:


I thought that it was easy but apparently it's not!

I have the following in my conf file

mon_host = 192.168.1.100,192.168.1.101,192.168.1.102
public_network = 192.168.1.0/24 [26]
mon_initial_members = fu,rai,jin

but still the 15.12.6.21 link is being saturated

Any ideas why???

Should I put cluster network as well??

Should I put each OSD in the CONF file???

Regards,

George


Andrija,

thanks a lot for the useful info!

I would also like to thank Kingrat at the IRC channel for his
useful advice!

I was under the wrong impression that public is the one used for
RADOS.

So I thought that public=external=internet and therefore I used
that
one in my conf.

I understand now that I should have specified in CEPH Publics
Network what I call
internal and which is the one that all machines are talking
directly to each other.

Thanks you all for the feedback!

Regards,

George


Public network is clients-to-OSD traffic - and if you have NOT
explicitely defined cluster network, than also OSD-to-OSD
replication
takes place over same network.

Otherwise, you can define public and cluster(private) network -
so OSD
replication will happen over dedicated NICs (cluster network)
and thus
speed up.

If i.e. replica count on pool is 3, that means, each 1GB of
data
writen to some particualr OSD, will generate 3 x 1GB of more
writes,
to the replicas... - which ideally will take place over
separate NICs
to speed up things...

On 14 March 2015 at 17:43, Georgios Dimitrakakis  wrote:


Hi all!!

What is the meaning of public_network in ceph.conf?

Is it the network that OSDs are talking and transferring
data?

I have two nodes with two IP addresses each. One for internal
network 192.168.1.0/24 [1] [1]
and one external 15.12.6.*

I see the following in my logs:

osd.0 is down since epoch 2204, last address 15.12.6.21:6826/33094 [2] [2]
osd.1 is down since epoch 2206, last address 15.12.6.21:6817/32463 [3] [3]
osd.2 is down since epoch 2198, last address 15.12.6.21:6843/34921 [4] [4]
osd.3 is down since epoch 2200, last address 15.12.6.21:6838/34208 [5] [5]
osd.4 is down since epoch 2202, last address 15.12.6.21:6831/33610 [6] [6]
osd.5 is down since epoch 2194, last address 15.12.6.21:6858/35948 [7] [7]
osd.7 is down since epoch 2192, last address 15.12.6.21:6871/36720 [8] [8]
osd.8 is down since epoch 2196, last address 15.12.6.21:6855/35354 [9] [9]

I ve managed to add a second node and during rebalancing I
see that
data is transfered through
the internal 192.* but the external link is also saturated!

What is being transferred from that?

Any help much appreciated!

Regards,

George
___
ceph-users mailing list
ceph-users@lists.ceph.com [10] [10]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [11]
[11]


--

Andrija Panić

Links:
--
[1] http://192.168.1.0/24

Re: [ceph-users] {Disarmed} Re: {Disarmed} Re: Public Network Meaning

2015-03-14 Thread Georgios Dimitrakakis

Andrija,

I have two cards!

One on 15.12.* and one on 192.*

Obviously the 15.12.* is the external network (real public IP addresses,
e.g. used to access the node via SSH)


That's why I am saying that my public network for CEPH is the 192.* one;
should I use the cluster network setting for it as well?


Best,

George



Georgios,

no need to put ANYTHING if you don't plan to split client-to-OSD vs
OSD-to-OSD replication onto 2 different network cards/networks - for
performance reasons.

if you have only 1 network - simply DON'T configure networks at all
inside your CEPH.conf file...

if you have 2 x 1G cards in the servers, then you may use the first 1G for
client traffic, and the second 1G for OSD-to-OSD replication...

best

On 14 March 2015 at 19:33, Georgios Dimitrakakis  wrote:


Andrija,

Thanks for you help!

In my case I just have one 192.* network, so should I put that for
both?

Besides monitors do I have to list OSDs as well?

Thanks again!

Best,

George


This is how I did it, and then retart each OSD one by one, but
monritor with ceph -s, when ceph is healthy, proceed with next
OSD
restart...
Make sure the networks are fine on physical nodes, that you can
ping
in between...

[global]
x
x
x
x
x
x

#
### REPLICATION NETWORK ON SEPARATE 10G NICs

# replication network
cluster network = 10.44.251.0/24 [29] [29]

# public/client network
public network = 10.44.253.0/16 [30] [30]

#

[mon.xx]
mon_addr = x.x.x.x:6789
host = xx

[mon.yy]
mon_addr = x.x.x.x:6789
host = yy

[mon.zz]
mon_addr = x.x.x.x:6789
host = zz

On 14 March 2015 at 19:14, Georgios Dimitrakakis  wrote:


I thought that it was easy but apparently it's not!

I have the following in my conf file

mon_host = 192.168.1.100,192.168.1.101,192.168.1.102
public_network = 192.168.1.0/24 [26] [26]
mon_initial_members = fu,rai,jin

but still the 15.12.6.21 link is being saturated

Any ideas why???

Should I put cluster network as well??

Should I put each OSD in the CONF file???

Regards,

George


Andrija,

thanks a lot for the useful info!

I would also like to thank Kingrat at the IRC channel for
his
useful advice!

I was under the wrong impression that public is the one used
for
RADOS.

So I thought that public=external=internet and therefore I
used
that
one in my conf.

I understand now that I should have specified in CEPH Publics
Network what I call
internal and which is the one that all machines are talking
directly to each other.

Thanks you all for the feedback!

Regards,

George


Public network is clients-to-OSD traffic - and if you have
NOT
explicitely defined cluster network, than also OSD-to-OSD
replication
takes place over same network.

Otherwise, you can define public and cluster(private)
network -
so OSD
replication will happen over dedicated NICs (cluster
network)
and thus
speed up.

If i.e. replica count on pool is 3, that means, each 1GB of
data
writen to some particualr OSD, will generate 3 x 1GB of
more
writes,
to the replicas... - which ideally will take place over
separate NICs
to speed up things...

On 14 March 2015 at 17:43, Georgios Dimitrakakis  wrote:


Hi all!!

What is the meaning of public_network in ceph.conf?

Is it the network that OSDs are talking and transferring
data?

I have two nodes with two IP addresses each. One for
internal
network 192.168.1.0/24 [1] [1] [1]
and one external 15.12.6.*

I see the following in my logs:

osd.0 is down since epoch 2204, last address 15.12.6.21:6826/33094 [2] [2] [2]
osd.1 is down since epoch 2206, last address 15.12.6.21:6817/32463 [3] [3] [3]
osd.2 is down since epoch 2198, last address 15.12.6.21:6843/34921 [4] [4] [4]
osd.3 is down since epoch 2200, last address 15.12.6.21:6838/34208 [5] [5] [5]
osd.4 is down

Re: [ceph-users] Strange Monitor Appearance after Update

2015-03-13 Thread Georgios Dimitrakakis

I found out that there was a folder called

ceph-master_192.168.0.10

in

/var/lib/ceph/mon/


which was outdated!


I must have done something stupid in the configuration in the past
and it was created!

Strangely I hadn't seen it appearing any time before, and it only appeared
after the update from v0.80.8 to v0.80.9


Anyway, removing it and restarting the services seems to have solved 
it!


I hope that I haven't done anything stupid :-)
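
(For anyone hitting the same thing, a minimal sketch of how to spot such a
leftover monitor directory: compare what is on disk with what the monmap
actually contains. The path below is the default mon data location.)

  # monitor data directories present on this node
  ls /var/lib/ceph/mon/
  # monitors that are actually part of the cluster
  ceph mon dump
  # any directory that does not match an entry in the monmap is a leftover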


Regards,


George



Having two monitors should not be causing the problem you are seeing
like you say. What is in /var/log/ceph/ceph.mon.*.log?

Robert LeBlanc

Sent from a mobile device please excuse any typos.
On Mar 12, 2015 7:39 PM, Georgios Dimitrakakis  wrote:


Hi Robert!

Thanks for the feedback! I am aware of the fact that the number of
the monitors should be odd
but this is a very basic setup just to test CEPH functionality and
perform tasks there before
doing it to our production cluster.

So I am not concerned about that and I really dont believe that
this is why the problem has appeared!

What concerns me is how this new monitor that has the same name
followed by an underscore and
the IP address appeared out of nowhere and how to stop it!

Regards,

George


Two monitors don't work very well and really don't buy you anything. I
would either add another monitor or remove one. Paxos is most
effective with an odd number of monitors.

I don't know about the problem you are experiencing and how to help
you. An even number of monitors should work.

Robert LeBlanc

Sent from a mobile device please excuse any typos.
On Mar 12, 2015 7:19 PM, Georgios Dimitrakakis  wrote:


I forgot to say that the monitors form a quorum and the clusters
health is OK
so there arent any serious troubles other than the annoying
message.

Best,

George


Hi all!

I have updated from 0.80.8 to 0.80.9 and every time I try to
restart
CEPH a monitor a strange monitor is appearing!

Here is the output:

#/etc/init.d/ceph restart mon
=== mon.master ===
=== mon.master ===
Stopping Ceph mon.master on master...kill 10766...done
=== mon.master ===
Starting Ceph mon.master on master...
Starting ceph-create-keys on master...
=== mon.master_192.168.0.10 ===
=== mon.master_192.168.0.10 ===
Stopping Ceph mon.master_192.168.0.10 on master...done
=== mon.master_192.168.0.10 ===
Starting Ceph mon.master_192.168.0.10 on master...
2015-03-13 03:06:22.964493 7f06256fa7a0 -1
mon.master_192.168.0.10@-1(probing) e2 not in monmap and have
been in
a quorum before; must have been removed
2015-03-13 03:06:22.964497 7f06256fa7a0 -1
mon.master_192.168.0.10@-1(probing) e2 commit suicide!
2015-03-13 03:06:22.964499 7f06256fa7a0 -1 failed to
initialize
failed: ulimit -n 32768;  /usr/bin/ceph-mon -i
master_192.168.0.10
--pid-file /var/run/ceph/mon.master_192.168.0.10.pid -c
/etc/ceph/ceph.conf --cluster ceph

I have two monitors which are:

mon.master and mon.client1

and have defined them in ceph.conf as:

mon_initial_members = master,client1
mon_host = 192.168.0.10,192.168.0.11

Why is the mon.master_192.168.0.10 appearing and how can I
stop
it
from happening?

The above is the problem on one node. Obviously the problem
is
appearing on the other node as well but instead I have

mon.client1_192.168.0.11 appearing

Any ideas?

Regards,

George
___
ceph-users mailing list
ceph-users@lists.ceph.com [1] [1]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [2]
[2]

___
ceph-users mailing list
ceph-users@lists.ceph.com [3] [3]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [4] [4]


Links:
--
[1] mailto:ceph-users@lists.ceph.com [5]
[2] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [6]
[3] mailto:ceph-users@lists.ceph.com [7]
[4] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [8]
[5] mailto:gior...@acmac.uoc.gr [9]

___
ceph-users mailing list
ceph-users@lists.ceph.com [10]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [11]



Links:
--
[1] mailto:ceph-users@lists.ceph.com
[2] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[3] mailto:ceph-users@lists.ceph.com
[4] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[5] mailto:ceph-users@lists.ceph.com
[6] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[7] mailto:ceph-users@lists.ceph.com
[8] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[9] mailto:gior...@acmac.uoc.gr
[10] mailto:ceph-users@lists.ceph.com
[11] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[12] mailto:gior...@acmac.uoc.gr

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Strange Monitor Appearance after Update

2015-03-12 Thread Georgios Dimitrakakis

Hi all!

I have updated from 0.80.8 to 0.80.9 and every time I try to restart 
CEPH a monitor a strange monitor is appearing!


Here is the output:


#/etc/init.d/ceph restart mon
=== mon.master ===
=== mon.master ===
Stopping Ceph mon.master on master...kill 10766...done
=== mon.master ===
Starting Ceph mon.master on master...
Starting ceph-create-keys on master...
=== mon.master_192.168.0.10 ===
=== mon.master_192.168.0.10 ===
Stopping Ceph mon.master_192.168.0.10 on master...done
=== mon.master_192.168.0.10 ===
Starting Ceph mon.master_192.168.0.10 on master...
2015-03-13 03:06:22.964493 7f06256fa7a0 -1 
mon.master_192.168.0.10@-1(probing) e2 not in monmap and have been in a 
quorum before; must have been removed
2015-03-13 03:06:22.964497 7f06256fa7a0 -1 
mon.master_192.168.0.10@-1(probing) e2 commit suicide!

2015-03-13 03:06:22.964499 7f06256fa7a0 -1 failed to initialize
failed: 'ulimit -n 32768;  /usr/bin/ceph-mon -i master_192.168.0.10 
--pid-file /var/run/ceph/mon.master_192.168.0.10.pid -c 
/etc/ceph/ceph.conf --cluster ceph '



I have two monitors which are:

mon.master and mon.client1

and have defined them in ceph.conf as:

mon_initial_members = master,client1
mon_host = 192.168.0.10,192.168.0.11



Why is the mon.master_192.168.0.10 appearing and how can I stop it 
from happening?



The above is the problem on one node. Obviously the problem is 
appearing on the other node as well but instead I have


mon.client1_192.168.0.11 appearing



Any ideas?


Regards,


George
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Strange Monitor Appearance after Update

2015-03-12 Thread Georgios Dimitrakakis
I forgot to say that the monitors form a quorum and the cluster's 
health is OK

so there aren't any serious troubles other than the annoying message.

Best,

George


Hi all!

I have updated from 0.80.8 to 0.80.9 and every time I try to restart
CEPH a monitor a strange monitor is appearing!

Here is the output:


#/etc/init.d/ceph restart mon
=== mon.master ===
=== mon.master ===
Stopping Ceph mon.master on master...kill 10766...done
=== mon.master ===
Starting Ceph mon.master on master...
Starting ceph-create-keys on master...
=== mon.master_192.168.0.10 ===
=== mon.master_192.168.0.10 ===
Stopping Ceph mon.master_192.168.0.10 on master...done
=== mon.master_192.168.0.10 ===
Starting Ceph mon.master_192.168.0.10 on master...
2015-03-13 03:06:22.964493 7f06256fa7a0 -1
mon.master_192.168.0.10@-1(probing) e2 not in monmap and have been in
a quorum before; must have been removed
2015-03-13 03:06:22.964497 7f06256fa7a0 -1
mon.master_192.168.0.10@-1(probing) e2 commit suicide!
2015-03-13 03:06:22.964499 7f06256fa7a0 -1 failed to initialize
failed: 'ulimit -n 32768;  /usr/bin/ceph-mon -i master_192.168.0.10
--pid-file /var/run/ceph/mon.master_192.168.0.10.pid -c
/etc/ceph/ceph.conf --cluster ceph '


I have two monitors which are:

mon.master and mon.client1

and have defined them in ceph.conf as:

mon_initial_members = master,client1
mon_host = 192.168.0.10,192.168.0.11



Why is the mon.master_192.168.0.10 appearing and how can I stop it
from happening?


The above is the problem on one node. Obviously the problem is
appearing on the other node as well but instead I have

mon.client1_192.168.0.11 appearing



Any ideas?


Regards,


George
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Strange Monitor Appearance after Update

2015-03-12 Thread Georgios Dimitrakakis

Hi Robert!

Thanks for the feedback! I am aware of the fact that the number of the 
monitors should be odd
but this is a very basic setup just to test CEPH functionality and 
perform tasks there before

doing it to our production cluster.

So I am not concerned about that and I really don't believe that this 
is why the problem has appeared!


What concerns me is how this new monitor that has the same name 
followed by an underscore and

the IP address appeared out of nowhere and how to stop it!

Regards,

George


Two monitors don't work very well and really don't buy you anything. I
would either add another monitor or remove one. Paxos is most
effective with an odd number of monitors.

I don't know about the problem you are experiencing and how to help
you. An even number of monitors should work.

Robert LeBlanc

Sent from a mobile device please excuse any typos.
On Mar 12, 2015 7:19 PM, Georgios Dimitrakakis  wrote:


I forgot to say that the monitors form a quorum and the clusters
health is OK
so there arent any serious troubles other than the annoying
message.

Best,

George


Hi all!

I have updated from 0.80.8 to 0.80.9 and every time I try to
restart
CEPH a monitor a strange monitor is appearing!

Here is the output:

#/etc/init.d/ceph restart mon
=== mon.master ===
=== mon.master ===
Stopping Ceph mon.master on master...kill 10766...done
=== mon.master ===
Starting Ceph mon.master on master...
Starting ceph-create-keys on master...
=== mon.master_192.168.0.10 ===
=== mon.master_192.168.0.10 ===
Stopping Ceph mon.master_192.168.0.10 on master...done
=== mon.master_192.168.0.10 ===
Starting Ceph mon.master_192.168.0.10 on master...
2015-03-13 03:06:22.964493 7f06256fa7a0 -1
mon.master_192.168.0.10@-1(probing) e2 not in monmap and have
been in
a quorum before; must have been removed
2015-03-13 03:06:22.964497 7f06256fa7a0 -1
mon.master_192.168.0.10@-1(probing) e2 commit suicide!
2015-03-13 03:06:22.964499 7f06256fa7a0 -1 failed to initialize
failed: ulimit -n 32768;  /usr/bin/ceph-mon -i
master_192.168.0.10
--pid-file /var/run/ceph/mon.master_192.168.0.10.pid -c
/etc/ceph/ceph.conf --cluster ceph

I have two monitors which are:

mon.master and mon.client1

and have defined them in ceph.conf as:

mon_initial_members = master,client1
mon_host = 192.168.0.10,192.168.0.11

Why is the mon.master_192.168.0.10 appearing and how can I stop
it
from happening?

The above is the problem on one node. Obviously the problem is
appearing on the other node as well but instead I have

mon.client1_192.168.0.11 appearing

Any ideas?

Regards,

George
___
ceph-users mailing list
ceph-users@lists.ceph.com [1]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [2]

___
ceph-users mailing list
ceph-users@lists.ceph.com [3]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [4]



Links:
--
[1] mailto:ceph-users@lists.ceph.com
[2] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[3] mailto:ceph-users@lists.ceph.com
[4] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[5] mailto:gior...@acmac.uoc.gr

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW Log Rotation (firefly)

2015-03-02 Thread Georgios Dimitrakakis

Daniel,

on CentOS the logrotate script was not invoked correctly because the service
was referred to everywhere as radosgw:


e.g.
 service radosgw reload >/dev/null or
 initctl reload radosgw cluster=$cluster id=$id 2>/dev/null || :

but there isn't any radosgw service!

I had to change it into ceph-radosgw to make it work properly!
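
For reference, the change boils down to pointing the postrotate action at
the service name that actually exists on CentOS; a minimal sketch, assuming
the snippet lives in /etc/logrotate.d/radosgw:

  # inside the postrotate section:
  service ceph-radosgw reload >/dev/null 2>&1 || :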

Since you are using APT I guess that you are on Ubuntu/Debian but you 
may experience a relevant issue.


I was going to submit a bug for CentOS but had forgot it for some time 
now! I think now it's the time...Anyone has a different opinion on that?



Regards,


G.




On 2015-03-02 18:17:00 +, Gregory Farnum said:

I'm not very (well, at all, for rgw) familiar with these scripts, 
but

how are you starting up your RGW daemon? There's some way to have
Apache handle the process instead of Upstart, but Yehuda says you
don't want to do it.
-Greg


Well, we installed the packages via APT. That places the upstart
scripts into /etc/init. Nothing special. That will make Upstart
launch them in boot.

In the meantime I just placed

   /var/log/radosgw/*.log {
   rotate 7
   daily
   compress
   sharedscripts
   postrotate
   	start-stop-daemon --stop --signal HUP -x /usr/bin/radosgw 
--oknodo

   endscript
   missingok
   notifempty
   }

into the logrotate script, removing the more complicated (and not 
working :))

logic with the core piece from the regular init.d script.

Because the daemons were already running and using an already deleted 
script,
logrotate wouldn't see the need to rotate the (visible) ones, because 
they
had not changed. So I needed to manually execute the above 
start-stop-daemon
on all relevant nodes ones to force the gateway to start a new, 
non-deleted

logfile.

Daniel


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CEPH BackUPs

2015-01-29 Thread Georgios Dimitrakakis
Urged by a previous post by Mike Winfield where he suffered a leveldb loss,
I would like to know which files are critical for CEPH operation and must
be backed up regularly, and how are you people doing it?
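
(For concreteness, the sort of thing I have in mind is the config, the
keyrings and the monitor store; a rough sketch with the default paths,
assuming the mon id matches the hostname and stopping the monitor briefly
so leveldb is quiescent - an illustration of the question, not an
established procedure:)

  /etc/init.d/ceph stop mon
  tar czf /backup/ceph-mon-$(hostname)-$(date +%F).tgz \
      /etc/ceph/ceph.conf /etc/ceph/*.keyring \
      /var/lib/ceph/mon/ceph-$(hostname)
  /etc/init.d/ceph start mon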

Any points much appreciated!

Regards,

G.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CRUSH Map Adjustment for Node Replication

2015-03-23 Thread Georgios Dimitrakakis

Hi all!

I had a CEPH Cluster with 10x OSDs all of them in one node.

Since the cluster was built from the beginning with just one OSD node,
the crushmap had as a default
the replication to be across OSDs.

Here is the relevant part from my crushmap:


# rules
rule replicated_ruleset {
ruleset 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type osd
step emit
}

# end crush map
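
(The placement level is controlled by the chooseleaf step in the rule
above; for comparison, a sketch of the same rule with host-level
placement, only that one step differing:)

  rule replicated_ruleset {
      ruleset 0
      type replicated
      min_size 1
      max_size 10
      step take default
      step chooseleaf firstn 0 type host
      step emit
  }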


I have added a new node with 10x more identical OSDs thus the total 
OSDs nodes are now two.


I have changed the replication factor to be 2 on all pools and I would 
like to make sure that

I always keep each copy on a different node.

In order to do so do I have to change the CRUSH map?

Which part should I change?


After modifying the CRUSH map what procedure will take place before the 
cluster is ready again?


Is it going to start re-balancing and moving data around? Will a 
deep-scrub follow?


Does the time of the procedure depends on anything else except the 
amount of data and the available connection (bandwidth)?



Looking forward for your answers!


All the best,


George

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD in ceph.conf

2015-05-07 Thread Georgios Dimitrakakis
Indeed it is not necessary to have any OSD entries in the Ceph.conf 
file
but what happens in the event of a disk failure resulting in changing 
the mount device?


For what I can see is that OSDs are mounted from entries in /etc/mtab 
(I am on CentOS 6.6)

like this:

/dev/sdj1 /var/lib/ceph/osd/ceph-8 xfs rw,noatime,inode64 0 0
/dev/sdh1 /var/lib/ceph/osd/ceph-6 xfs rw,noatime,inode64 0 0
/dev/sdg1 /var/lib/ceph/osd/ceph-5 xfs rw,noatime,inode64 0 0
/dev/sde1 /var/lib/ceph/osd/ceph-3 xfs rw,noatime,inode64 0 0
/dev/sdi1 /var/lib/ceph/osd/ceph-7 xfs rw,noatime,inode64 0 0
/dev/sdf1 /var/lib/ceph/osd/ceph-4 xfs rw,noatime,inode64 0 0
/dev/sdd1 /var/lib/ceph/osd/ceph-2 xfs rw,noatime,inode64 0 0
/dev/sdk1 /var/lib/ceph/osd/ceph-9 xfs rw,noatime,inode64 0 0
/dev/sdb1 /var/lib/ceph/osd/ceph-0 xfs rw,noatime,inode64 0 0
/dev/sdc1 /var/lib/ceph/osd/ceph-1 xfs rw,noatime,inode64 0 0


So in the event of a disk failure (e.g. disk SDH fails) the next one in
order will take its place, meaning that
SDI will be seen as SDH upon the next reboot, thus it will be mounted as
CEPH-6 instead of CEPH-7 and so on... resulting in a problematic
configuration (I guess that lots of data will start moving around,
PGs will be misplaced etc.)



Correct me if I am wrong but the proper way to mount them would be by 
using the UUID of the partition.


Is it OK if I change the entries in /etc/mtab using the UUID=xx 
instead of /dev/sdX1??


Does CEPH try to mount them using a different config file and perhaps 
exports the entries at boot in /etc/mtab (in the latter case no 
modification in /etc/mtab will be taken into account)??


I have deployed the Ceph cluster using only the ceph-deploy command. 
Is there a parameter that I 've missed that must be used during 
deployment in order to specify the mount points using the UUIDs instead 
of the device names?
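
(A side note on the mechanics: /etc/mtab only reflects what is currently
mounted, so editing it changes nothing at boot. If one really wanted
device-name-independent mounts, the usual place would be /etc/fstab keyed
on the filesystem UUID; a minimal sketch with a made-up UUID:)

  # find the filesystem UUID of the OSD partition
  blkid /dev/sdh1
  # then an /etc/fstab entry along these lines:
  UUID=0f3e4c1a-EXAMPLE-UUID  /var/lib/ceph/osd/ceph-6  xfs  noatime,inode64  0 0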



Regards,


George




On Wed, 6 May 2015 22:36:14 -0600, Robert LeBlanc wrote:

We dont have OSD entries in our Ceph config. They are not needed if
you dont have specific configs for different OSDs.

Robert LeBlanc

Sent from a mobile device please excuse any typos.
On May 6, 2015 7:18 PM, Florent MONTHEL  wrote:


Hi teqm,

Is it necessary to indicate in ceph.conf all OSD that we have in the
cluster ?
we have today reboot a cluster (5 nodes RHEL 6.5) and some OSD seem
to have change ID so crush map not mapped with the reality
Thanks

FLORENT MONTHEL
___
ceph-users mailing list
ceph-users@lists.ceph.com [1]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [2]



Links:
--
[1] mailto:ceph-users@lists.ceph.com
[2] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[3] mailto:florent.mont...@flox-arts.net


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Crush rule freeze cluster

2015-05-11 Thread Georgios Dimitrakakis

Oops... to fast to answer...

G.

On Mon, 11 May 2015 12:13:48 +0300, Timofey Titovets wrote:

Hey! I caught it again. It's a kernel bug. The kernel crashed when I tried to
map an rbd device with a map like the one above!
Hooray!

2015-05-11 12:11 GMT+03:00 Timofey Titovets nefelim...@gmail.com:

FYI and history
Rule:
# rules
rule replicated_ruleset {
  ruleset 0
  type replicated
  min_size 1
  max_size 10
  step take default
  step choose firstn 0 type room
  step choose firstn 0 type rack
  step choose firstn 0 type host
  step chooseleaf firstn 0 type osd
  step emit
}

And after reset node, i can't find any usable info. Cluster works 
fine

and data just rebalanced by osd disks.
syslog:
May  9 19:30:02 srv-lab-ceph-node-01 systemd[1]: Reloading.
May  9 19:30:02 srv-lab-ceph-node-01 systemd[1]: Starting Network 
Time

Synchronization...
May  9 19:30:02 srv-lab-ceph-node-01 systemd[1]: Started Network 
Time

Synchronization.
May  9 19:30:02 srv-lab-ceph-node-01 systemd[1]: Reloading.
May  9 19:30:02 srv-lab-ceph-node-01 CRON[1731]: (CRON) info (No MTA
installed, discarding output)
May 11 11:54:57 srv-lab-ceph-node-01 rsyslogd: [origin
software=rsyslogd swVersion=7.4.4 x-pid=689
x-info=http://www.rsyslog.com;] start
May 11 11:54:56 srv-lab-ceph-node-01 rsyslogd: rsyslogd's groupid 
changed to 103
May 11 11:54:57 srv-lab-ceph-node-01 rsyslogd: rsyslogd's userid 
changed to 100


Sorry for noise, guys. Georgios, in any way, thanks for helping.

2015-05-10 12:44 GMT+03:00 Georgios Dimitrakakis 
gior...@acmac.uoc.gr:

Timofey,

may be your best chance is to connect directly at the server and 
see what is

going on.
Then you can try debug why the problem occurred. If you don't want 
to wait

until tomorrow
you may try to see what is going on using the server's direct 
remote console

access.
The majority of the servers provide you with that just with a 
different name
each (DELL calls it iDRAC, Fujitsu iRMC, etc.) so if you have it up 
and

running you can use that.

I think this should be your starting point and you can take it on 
from

there.

I am sorry I cannot help you further with the Crush rules and the 
reason why

it crashed since I am far from being an expert in the field :-(

Regards,

George


Georgios, oh, sorry for my poor english _-_, may be I poor 
expressed

what i want =]

i know how to write simple Crush rule and how use it, i want 
several

things things:
1. Understand why, after inject bad map, my test node make 
offline.

This is unexpected.
2. May be somebody can explain what and why happens with this map.
3. This is not a problem to write several crushmap or/and switch 
it

while cluster running.
But, in production, we have several nfs servers, i think about 
moving

it to ceph, but i can't disable more then 1 server for maintenance
simultaneously. I want avoid data disaster while setup and moving 
data

to ceph, case like Use local data replication, if only one node
exist looks usable as temporally solution, while i not add second
node _-_.
4. May be some one also have test cluster and can test that happen
with clients, if crushmap like it was injected.

2015-05-10 8:23 GMT+03:00 Georgios Dimitrakakis 
gior...@acmac.uoc.gr:


Hi Timofey,

assuming that you have more than one OSD hosts and that the 
replicator
factor is equal (or less) to the number of the hosts why don't 
you just

change the crushmap to host replication?

You just need to change the default CRUSHmap rule from

step chooseleaf firstn 0 type osd

to

step chooseleaf firstn 0 type host

I believe that this is the easiest way to do have replication 
across OSD

nodes unless you have a much more sophisticated setup.

Regards,

George




Hi list,
i had experiments with crush maps, and I've try to get raid1 
like
behaviour (if cluster have 1 working osd node, duplicate data 
across
local disk, for avoiding data lose in case local disk failure 
and

allow client working, because this is not a degraded state)
(
  in best case, i want dynamic rule, like:
  if has only one host - spread data over local disks;
  else if host count  1 - spread over hosts (rack o something 
else);

)

i write rule, like below:

rule test {
  ruleset 0
  type replicated
  min_size 0
  max_size 10
  step take default
  step choose firstn 0 type host
  step chooseleaf firstn 0 type osd
  step emit
}

I've inject it in cluster and client node, now looks like have 
get
kernel panic, I've lost my connection with it. No ssh, no ping, 
this

is remote node and i can't see what happens until Monday.
Yes, it looks like I've shoot in my foot.
This is just a test setup and cluster destruction, not a 
problem, but
i think, what broken rules, must not crush something else and in 
worst

case, must be just ignored by cluster/crushtool compiler.

May be someone can explain, how this rule can crush system? May 
be

this is a crazy mistake somewhere

Re: [ceph-users] Crush rule freeze cluster

2015-05-11 Thread Georgios Dimitrakakis

Timofey,

glad that you 've managed to get it working :-)

Best,

George


FYI and history
Rule:
# rules
rule replicated_ruleset {
  ruleset 0
  type replicated
  min_size 1
  max_size 10
  step take default
  step choose firstn 0 type room
  step choose firstn 0 type rack
  step choose firstn 0 type host
  step chooseleaf firstn 0 type osd
  step emit
}

And after reset node, i can't find any usable info. Cluster works 
fine

and data just rebalanced by osd disks.
syslog:
May  9 19:30:02 srv-lab-ceph-node-01 systemd[1]: Reloading.
May  9 19:30:02 srv-lab-ceph-node-01 systemd[1]: Starting Network 
Time

Synchronization...
May  9 19:30:02 srv-lab-ceph-node-01 systemd[1]: Started Network Time
Synchronization.
May  9 19:30:02 srv-lab-ceph-node-01 systemd[1]: Reloading.
May  9 19:30:02 srv-lab-ceph-node-01 CRON[1731]: (CRON) info (No MTA
installed, discarding output)
May 11 11:54:57 srv-lab-ceph-node-01 rsyslogd: [origin
software=rsyslogd swVersion=7.4.4 x-pid=689
x-info=http://www.rsyslog.com;] start
May 11 11:54:56 srv-lab-ceph-node-01 rsyslogd: rsyslogd's groupid
changed to 103
May 11 11:54:57 srv-lab-ceph-node-01 rsyslogd: rsyslogd's userid
changed to 100

Sorry for noise, guys. Georgios, in any way, thanks for helping.

2015-05-10 12:44 GMT+03:00 Georgios Dimitrakakis 
gior...@acmac.uoc.gr:

Timofey,

may be your best chance is to connect directly at the server and see 
what is

going on.
Then you can try debug why the problem occurred. If you don't want 
to wait

until tomorrow
you may try to see what is going on using the server's direct remote 
console

access.
The majority of the servers provide you with that just with a 
different name
each (DELL calls it iDRAC, Fujitsu iRMC, etc.) so if you have it up 
and

running you can use that.

I think this should be your starting point and you can take it on 
from

there.

I am sorry I cannot help you further with the Crush rules and the 
reason why

it crashed since I am far from being an expert in the field :-(

Regards,

George


Georgios, oh, sorry for my poor english _-_, may be I poor 
expressed

what i want =]

i know how to write simple Crush rule and how use it, i want 
several

things things:
1. Understand why, after inject bad map, my test node make offline.
This is unexpected.
2. May be somebody can explain what and why happens with this map.
3. This is not a problem to write several crushmap or/and switch it
while cluster running.
But, in production, we have several nfs servers, i think about 
moving

it to ceph, but i can't disable more then 1 server for maintenance
simultaneously. I want avoid data disaster while setup and moving 
data

to ceph, case like Use local data replication, if only one node
exist looks usable as temporally solution, while i not add second
node _-_.
4. May be some one also have test cluster and can test that happen
with clients, if crushmap like it was injected.

2015-05-10 8:23 GMT+03:00 Georgios Dimitrakakis 
gior...@acmac.uoc.gr:


Hi Timofey,

assuming that you have more than one OSD hosts and that the 
replicator
factor is equal (or less) to the number of the hosts why don't you 
just

change the crushmap to host replication?

You just need to change the default CRUSHmap rule from

step chooseleaf firstn 0 type osd

to

step chooseleaf firstn 0 type host

I believe that this is the easiest way to do have replication 
across OSD

nodes unless you have a much more sophisticated setup.

Regards,

George




Hi list,
i had experiments with crush maps, and I've try to get raid1 like
behaviour (if cluster have 1 working osd node, duplicate data 
across

local disk, for avoiding data lose in case local disk failure and
allow client working, because this is not a degraded state)
(
  in best case, i want dynamic rule, like:
  if has only one host - spread data over local disks;
  else if host count  1 - spread over hosts (rack o something 
else);

)

i write rule, like below:

rule test {
  ruleset 0
  type replicated
  min_size 0
  max_size 10
  step take default
  step choose firstn 0 type host
  step chooseleaf firstn 0 type osd
  step emit
}

I've inject it in cluster and client node, now looks like have 
get
kernel panic, I've lost my connection with it. No ssh, no ping, 
this

is remote node and i can't see what happens until Monday.
Yes, it looks like I've shoot in my foot.
This is just a test setup and cluster destruction, not a problem, 
but
i think, what broken rules, must not crush something else and in 
worst

case, must be just ignored by cluster/crushtool compiler.

May be someone can explain, how this rule can crush system? May 
be

this is a crazy mistake somewhere?




--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
___
ceph-users mailing list
ceph-users

Re: [ceph-users] OSD in ceph.conf

2015-05-11 Thread Georgios Dimitrakakis

Hi Robert,

just to make sure I got it correctly:

Do you mean that the /etc/mtab entries are completely ignored and no 
matter what the order
of the /dev/sdX device is Ceph will just mount correctly the osd/ceph-X 
by default?


In addition, assuming that an OSD node fails for a reason other than a 
disk problem (e.g. mobo/ram)
if I put its disks on another OSD node (all disks have their journals 
with) will Ceph be able to mount

them correctly and continue its operation?

Regards,

George


I have not used ceph-deploy, but it should use ceph-disk for the OSD
preparation.  Ceph-disk creates GPT partitions with specific
partition type UUIDs for data and journals. When udev or init starts the
OSD, it mounts it to a temp location, reads the whoami file and the
journal, then remounts it in the correct location. There is no need
for fstab entries or the like. This allows you to easily move OSD
disks between servers (if you take the journals with them). It's magic!
But I think I just gave away the secret.
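
(A quick way to see that mechanism for yourself, sketched under the
assumption that the OSD disks were prepared by ceph-disk with GPT
partitions:)

  # how ceph-disk classifies each partition (data, journal, which OSD)
  ceph-disk list
  # the partition type GUID that udev keys on, e.g. for partition 1 of /dev/sdh
  sgdisk -i 1 /dev/sdh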

Robert LeBlanc

Sent from a mobile device please excuse any typos.
On May 7, 2015 5:16 AM, Georgios Dimitrakakis  wrote:


Indeed it is not necessary to have any OSD entries in the Ceph.conf
file
but what happens in the event of a disk failure resulting in
changing the mount device?

For what I can see is that OSDs are mounted from entries in
/etc/mtab (I am on CentOS 6.6)
like this:

/dev/sdj1 /var/lib/ceph/osd/ceph-8 xfs rw,noatime,inode64 0 0
/dev/sdh1 /var/lib/ceph/osd/ceph-6 xfs rw,noatime,inode64 0 0
/dev/sdg1 /var/lib/ceph/osd/ceph-5 xfs rw,noatime,inode64 0 0
/dev/sde1 /var/lib/ceph/osd/ceph-3 xfs rw,noatime,inode64 0 0
/dev/sdi1 /var/lib/ceph/osd/ceph-7 xfs rw,noatime,inode64 0 0
/dev/sdf1 /var/lib/ceph/osd/ceph-4 xfs rw,noatime,inode64 0 0
/dev/sdd1 /var/lib/ceph/osd/ceph-2 xfs rw,noatime,inode64 0 0
/dev/sdk1 /var/lib/ceph/osd/ceph-9 xfs rw,noatime,inode64 0 0
/dev/sdb1 /var/lib/ceph/osd/ceph-0 xfs rw,noatime,inode64 0 0
/dev/sdc1 /var/lib/ceph/osd/ceph-1 xfs rw,noatime,inode64 0 0

So in the event of a disk failure (e.g. disk SDH fails) then in the
order the next one will take its place meaning that
SDI will be seen as SDH upon next reboot thus it will be mounted as
CEPH-6 instead of CEPH-7 and so on...resulting in a problematic
configuration (I guess that lots of data will be start moving
around, PGs will be misplaced etc.)

Correct me if I am wrong but the proper way to mount them would be
by using the UUID of the partition.

Is it OK if I change the entries in /etc/mtab using the UUID=xx
instead of /dev/sdX1??

Does CEPH try to mount them using a different config file and
perhaps exports the entries at boot in /etc/mtab (in the latter case
no modification in /etc/mtab will be taken into account)??

I have deployed the Ceph cluster using only the ceph-deploy
command. Is there a parameter that I ve missed that must be used
during deployment in order to specify the mount points using the
UUIDs instead of the device names?

Regards,

George

On Wed, 6 May 2015 22:36:14 -0600, Robert LeBlanc wrote:


We dont have OSD entries in our Ceph config. They are not needed
if
you dont have specific configs for different OSDs.

Robert LeBlanc

Sent from a mobile device please excuse any typos.
On May 6, 2015 7:18 PM, Florent MONTHEL  wrote:


Hi teqm,

Is it necessary to indicate in ceph.conf all OSD that we have
in the
cluster ?
we have today reboot a cluster (5 nodes RHEL 6.5) and some OSD
seem
to have change ID so crush map not mapped with the reality
Thanks

FLORENT MONTHEL
___
ceph-users mailing list
ceph-users@lists.ceph.com [1] [1]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [2] [2]


Links:
--
[1] mailto:ceph-users@lists.ceph.com [3]
[2] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [4]
[3] mailto:florent.mont...@flox-arts.net [5]


___
ceph-users mailing list
ceph-users@lists.ceph.com [6]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [7]



Links:
--
[1] mailto:ceph-users@lists.ceph.com
[2] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[3] mailto:ceph-users@lists.ceph.com
[4] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[5] mailto:florent.mont...@flox-arts.net
[6] mailto:ceph-users@lists.ceph.com
[7] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[8] mailto:gior...@acmac.uoc.gr

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Rename or Remove Pool

2015-05-05 Thread Georgios Dimitrakakis

Robert,

I did try that without success.
The error was:
Invalid command:  missing required parameter srcpool(poolname)


Upon debian112's recommendation on IRC channel and looking at this 
post: 
http://cephnotes.ksperis.com/blog/2014/10/29/remove-pool-without-name


I 've used the command:

rados rmpool   --yes-i-really-really-mean-it

which actually removed the problematic pool!
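
(For reference, the general form of that command repeats the pool name
twice plus the confirmation flag; a sketch with a placeholder name:)

  rados rmpool <pool-name> <pool-name> --yes-i-really-really-mean-it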


It would be a good idea for developers to also include a way to 
manipulate (rename, delete, etc.) pools using the ID which is definitely 
unique and in my opinion would be error-resistant or at least less 
susceptible to errors.



Best regards,

George


Can you try

ceph osd pool rename   new-name



On Tue, May 5, 2015 at 12:43 PM, Georgios Dimitrakakis
gior...@acmac.uoc.gr wrote:


Hi all!

Somehow I have a pool without a name...

$ ceph osd lspools
3 data,4 metadata,5 rbd,6 .rgw,7 .rgw.control,8 .rgw.gc,9 .log,10
.intent-log,11 .usage,12 .users,13 .users.email,14 .users.swift,15
.users.uid,16 .rgw.root,17 .rgw.buckets.index,18 .rgw.buckets,19
.rgw.buckets.extra,20 volumes,21 ,


which doesn't have any objects in there...

$ rados df
pool name   category KB  objects   
clones
degraded  unfound   rdrd KB   wr
wr KB
-  00
0

0   0000   0
.intent-log -  00
0

0   0000   0
.log-  00
0

0   0000   0
.rgw-  28
0

0   0  947  725   28   8
.rgw.buckets-  133181101   672223
0

0   0  3301538191937648  3668657179126038
.rgw.buckets.extra -  01 
  0

0   0 10681048817397   0
.rgw.buckets.index -  04 
  0

0   0 12680399 75160699 12381179   0
.rgw.control-  08
0

0   0000   0
.rgw.gc -  0   32
0

0   0  2164762  2263771  3688270   0
.rgw.root   -  13
0

0   0  450  2983   3
.usage  -  03
0

0   0202502025040500   0
.users  -  13
0

0   0  158   939   6
.users.email-  12
0

0   0326   4
.users.swift-  12
0

0   0426   4
.users.uid  -  14
0

0   01074110661 9985   10
data-  00
0

0   0002   1
metadata-  2   20
0

0   0  230  262   21   8
rbd -  00
0

0   0000   0
volumes -  691911002   172198
0

0   0  6726256607996917  7872745   1122764623


How can I either rename it so that I can modify the min_size and 
replication
level of that pool (I have some unclean pgs due to that) or delete 
it

completely?


Best,


George
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Rename or Remove Pool

2015-05-05 Thread Georgios Dimitrakakis


Hi all!

Somehow I have a pool without a name...

$ ceph osd lspools
3 data,4 metadata,5 rbd,6 .rgw,7 .rgw.control,8 .rgw.gc,9 .log,10 
.intent-log,11 .usage,12 .users,13 .users.email,14 .users.swift,15 
.users.uid,16 .rgw.root,17 .rgw.buckets.index,18 .rgw.buckets,19 
.rgw.buckets.extra,20 volumes,21 ,



which doesn't have any objects in there...

$ rados df
pool name   category KB  objects   clones   
 degraded  unfound   rdrd KB   wrwr 
KB
-  000  
 0   0000   
0
.intent-log -  000  
 0   0000   
0
.log-  000  
 0   0000   
0
.rgw-  280  
 0   0  947  725   28   
8
.rgw.buckets-  133181101   6722230  
 0   0  3301538191937648  3668657
179126038
.rgw.buckets.extra -  01
00   0 10681048817397
   0
.rgw.buckets.index -  04
00   0 12680399 75160699 12381179
   0
.rgw.control-  080  
 0   0000   
0
.rgw.gc -  0   320  
 0   0  2164762  2263771  3688270   
0
.rgw.root   -  130  
 0   0  450  2983   
3
.usage  -  030  
 0   0202502025040500   
0
.users  -  130  
 0   0  158   939   
6
.users.email-  120  
 0   0326   
4
.users.swift-  120  
 0   0426   
4
.users.uid  -  140  
 0   01074110661 9985   
10
data-  000  
 0   0002   
1
metadata-  2   200  
 0   0  230  262   21   
8
rbd -  000  
 0   0000   
0
volumes -  691911002   1721980  
 0   0  6726256607996917  7872745   
1122764623



How can I either rename it so that I can modify the min_size and 
replication level of that pool (I have some unclean pgs due to that) or 
delete it completely?



Best,


George
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Crush rule freeze cluster

2015-05-10 Thread Georgios Dimitrakakis

Timofey,

maybe your best chance is to connect directly to the server and see
what is going on.
Then you can try to debug why the problem occurred. If you don't want to
wait until tomorrow
you may try to see what is going on using the server's direct remote
console access.
Most servers provide you with that, just under a different
name each (DELL calls it iDRAC, Fujitsu iRMC, etc.), so if you have it up
and running you can use that.


I think this should be your starting point and you can take it on from 
there.


I am sorry I cannot help you further with the Crush rules and the 
reason why it crashed since I am far from being an expert in the field 
:-(


Regards,

George


Georgios, oh, sorry for my poor English _-_, maybe I expressed poorly
what I want =]

I know how to write a simple Crush rule and how to use it; I want several
things:
1. Understand why, after injecting the bad map, my test node went offline.
This is unexpected.
2. Maybe somebody can explain what happens with this map and why.
3. It is not a problem to write several crushmaps and/or switch them
while the cluster is running.
But, in production, we have several nfs servers and I am thinking about
moving them to ceph, but I can't disable more than 1 server for maintenance
simultaneously. I want to avoid a data disaster while setting up and moving
data
to ceph, so a case like "use local data replication if only one node
exists" looks usable as a temporary solution until I add the second
node _-_.
4. Maybe someone else who has a test cluster can test what happens
with clients if a crushmap like that is injected.

2015-05-10 8:23 GMT+03:00 Georgios Dimitrakakis 
gior...@acmac.uoc.gr:

Hi Timofey,

assuming that you have more than one OSD hosts and that the 
replicator
factor is equal (or less) to the number of the hosts why don't you 
just

change the crushmap to host replication?

You just need to change the default CRUSHmap rule from

step chooseleaf firstn 0 type osd

to

step chooseleaf firstn 0 type host

I believe that this is the easiest way to do have replication across 
OSD

nodes unless you have a much more sophisticated setup.

Regards,

George




Hi list,
i had experiments with crush maps, and I've try to get raid1 like
behaviour (if cluster have 1 working osd node, duplicate data 
across

local disk, for avoiding data lose in case local disk failure and
allow client working, because this is not a degraded state)
(
  in best case, i want dynamic rule, like:
  if has only one host - spread data over local disks;
  else if host count  1 - spread over hosts (rack o something 
else);

)

i write rule, like below:

rule test {
  ruleset 0
  type replicated
  min_size 0
  max_size 10
  step take default
  step choose firstn 0 type host
  step chooseleaf firstn 0 type osd
  step emit
}

I've inject it in cluster and client node, now looks like have get
kernel panic, I've lost my connection with it. No ssh, no ping, 
this

is remote node and i can't see what happens until Monday.
Yes, it looks like I've shoot in my foot.
This is just a test setup and cluster destruction, not a problem, 
but
i think, what broken rules, must not crush something else and in 
worst

case, must be just ignored by cluster/crushtool compiler.

May be someone can explain, how this rule can crush system? May be
this is a crazy mistake somewhere?



--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Crush rule freeze cluster

2015-05-09 Thread Georgios Dimitrakakis

Hi Timofey,

assuming that you have more than one OSD host and that the replication
factor is equal to (or less than) the number of the hosts, why don't you just
change the crushmap to host replication?


You just need to change the default CRUSHmap rule from

step chooseleaf firstn 0 type osd

to

step chooseleaf firstn 0 type host

I believe that this is the easiest way to have replication across
OSD nodes unless you have a much more sophisticated setup.
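
In practice the change is made by exporting, decompiling, editing and
re-injecting the CRUSH map; a minimal sketch of that round-trip (file
names are arbitrary):

  ceph osd getcrushmap -o crushmap.bin        # export the current map
  crushtool -d crushmap.bin -o crushmap.txt   # decompile to text
  # edit crushmap.txt: change "type osd" to "type host" in the chooseleaf step
  crushtool -c crushmap.txt -o crushmap.new   # recompile
  ceph osd setcrushmap -i crushmap.new        # inject the new map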


Regards,

George



Hi list,
I have been experimenting with crush maps, and I've tried to get raid1-like
behaviour (if the cluster has 1 working osd node, duplicate data across the
local disks, to avoid data loss in case of a local disk failure and to
allow clients to keep working, because this is not a degraded state)
(
  in the best case, i want a dynamic rule, like:
  if there is only one host - spread data over the local disks;
  else if host count > 1 - spread over hosts (rack or something
else);

)

i write rule, like below:

rule test {
  ruleset 0
  type replicated
  min_size 0
  max_size 10
  step take default
  step choose firstn 0 type host
  step chooseleaf firstn 0 type osd
  step emit
}

I've injected it on the cluster and client node, and now it looks like I've
got a kernel panic; I've lost my connection with it. No ssh, no ping, this
is a remote node and I can't see what happened until Monday.
Yes, it looks like I've shot myself in the foot.
This is just a test setup and destroying the cluster is not a problem, but
I think that broken rules must not crash anything else and in the worst
case must just be ignored by the cluster/crushtool compiler.

Maybe someone can explain how this rule could crash the system? Maybe
this is a crazy mistake somewhere?


--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD in ceph.conf

2015-05-12 Thread Georgios Dimitrakakis

Robert,

thanks a lot for the feedback!

I was very worried about the same thing! Glad to know that Ceph's
automagic takes care of everything :-P


Best regards,

George



If you use ceph-disk (and I believe ceph-deploy) to create your OSDs,
or you go through the manual steps to set up the partition UUIDs, then
yes, udev and the init script will do all the magic. Your disks can be
moved to another box without problems. I've moved disks to different
ports on controllers and it all worked just fine. I will be swapping
the disks between two boxes today to try to get to the bottom of some
problems we have been having; if it doesn't work I'll let you know.

The automagic of ceph OSDS has been refreshing for me because I was
worried about having to manage so many disks and mount points, but it
is much easier than I anticipated once I used ceph-disk.

Robert LeBlanc

Sent from a mobile device please excuse any typos.
On May 11, 2015 5:32 AM, Georgios Dimitrakakis  wrote:


Hi Robert,

just to make sure I got it correctly:

Do you mean that the /etc/mtab entries are completely ignored and
no matter what the order
of the /dev/sdX device is Ceph will just mount correctly the
osd/ceph-X by default?

In addition, assuming that an OSD node fails for a reason other
than a disk problem (e.g. mobo/ram)
if I put its disks on another OSD node (all disks have their
journals with) will Ceph be able to mount
them correctly and continue its operation?

Regards,

George


I have not used ceph-deploy, but it should use ceph-disk for the
OSD
preparation.  Ceph-disk creates GPT partitions with specific
partition UUIDS for data and journals. When udev or init starts
the
OSD, or mounts it to a temp location reads the whoami file and
the
journal, then remounts it in the correct location. There is no
need
for fstab entries or the like. This allows you to easily move OSD
disks between servers (if you take the journals with it). Its
magic! 
But I think I just gave away the secret.

Robert LeBlanc

Sent from a mobile device please excuse any typos.
On May 7, 2015 5:16 AM, Georgios Dimitrakakis  wrote:


Indeed it is not necessary to have any OSD entries in the
Ceph.conf
file
but what happens in the event of a disk failure resulting in
changing the mount device?

For what I can see is that OSDs are mounted from entries in
/etc/mtab (I am on CentOS 6.6)
like this:

/dev/sdj1 /var/lib/ceph/osd/ceph-8 xfs rw,noatime,inode64 0 0
/dev/sdh1 /var/lib/ceph/osd/ceph-6 xfs rw,noatime,inode64 0 0
/dev/sdg1 /var/lib/ceph/osd/ceph-5 xfs rw,noatime,inode64 0 0
/dev/sde1 /var/lib/ceph/osd/ceph-3 xfs rw,noatime,inode64 0 0
/dev/sdi1 /var/lib/ceph/osd/ceph-7 xfs rw,noatime,inode64 0 0
/dev/sdf1 /var/lib/ceph/osd/ceph-4 xfs rw,noatime,inode64 0 0
/dev/sdd1 /var/lib/ceph/osd/ceph-2 xfs rw,noatime,inode64 0 0
/dev/sdk1 /var/lib/ceph/osd/ceph-9 xfs rw,noatime,inode64 0 0
/dev/sdb1 /var/lib/ceph/osd/ceph-0 xfs rw,noatime,inode64 0 0
/dev/sdc1 /var/lib/ceph/osd/ceph-1 xfs rw,noatime,inode64 0 0

So in the event of a disk failure (e.g. disk SDH fails) then in
the
order the next one will take its place meaning that
SDI will be seen as SDH upon next reboot thus it will be
mounted as
CEPH-6 instead of CEPH-7 and so on...resulting in a problematic
configuration (I guess that lots of data will be start moving
around, PGs will be misplaced etc.)

Correct me if I am wrong but the proper way to mount them would
be
by using the UUID of the partition.

Is it OK if I change the entries in /etc/mtab using the
UUID=xx
instead of /dev/sdX1??

Does CEPH try to mount them using a different config file and
perhaps exports the entries at boot in /etc/mtab (in the latter
case
no modification in /etc/mtab will be taken into account)??

I have deployed the Ceph cluster using only the ceph-deploy
command. Is there a parameter that I ve missed that must be
used
during deployment in order to specify the mount points using
the
UUIDs instead of the device names?

Regards,

George

On Wed, 6 May 2015 22:36:14 -0600, Robert LeBlanc wrote:


We don't have OSD entries in our Ceph config. They are not needed
if you don't have specific configs for different OSDs.

Robert LeBlanc

Sent from a mobile device please excuse any typos.
On May 6, 2015 7:18 PM, Florent MONTHEL  wrote:


Hi teqm,

Is it necessary to list in ceph.conf all the OSDs that we have in
the cluster?
Today we rebooted a cluster (5 nodes, RHEL 6.5) and some OSDs seem to
have changed IDs, so the CRUSH map no longer matches reality.
Thanks

FLORENT MONTHEL
___
ceph-users mailing list
ceph-users@lists.ceph.com [1] [1] [1]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [2]
[2] [2]


Links:
--
[1] mailto:ceph-users@lists.ceph.com [3] [3]
[2] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[4] [4]
[3] mailto:florent.mont...@flox-arts.net [5] [5]


___
ceph-users mailing list
ceph-users@lists.ceph.com [6] [6]
http://lists.ceph.com

Re: [ceph-users] All pools have size=3 but MB data and MB used ratio is 1 to 5

2015-04-17 Thread Georgios Dimitrakakis

Hi!

Do you by any chance have your OSDs placed at a local directory path 
rather than on a non utilized physical disk?


If I remember correctly from a similar setup that I had performed in 
the past the ceph df command accounts for the entire disk and not just 
for the OSD data directory. I am not sure if this still applies since it 
was on an early Firefly release but it is something that it's easy to 
look for.


I don't know if the above makes sense, but what I mean is that if, for 
instance, your OSDs are at something like /var/lib/ceph/osd.X (or 
whatever) and this doesn't correspond to a mounted device (e.g. 
/dev/sdc1) but is local on the disk that provides the / or /var 
partition, then you should do a df -h to see what amount of data 
is on that partition and compare it with the ceph df output. It 
should be (more or less) the same.
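
(As a concrete illustration of that comparison -- the mount point below
is an example, adjust it to your own layout:)

# Space used on the partition that actually holds the OSD directory
df -h /var/lib/ceph/osd/ceph-0

# Cluster-wide usage as Ceph reports it
ceph df

# If the OSD directory lives on the / or /var partition, ceph df may
# account for the whole partition (OS files included) rather than just
# the OSD data, which would inflate the RAW USED figure.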


Best,

George



2015-03-27 18:27 GMT+01:00 Gregory Farnum g...@gregs42.com:
Ceph has per-pg and per-OSD metadata overhead. You currently have 
26000 PGs,
suitable for use on a cluster of the order of 260 OSDs. You have 
placed

almost 7GB of data into it (21GB replicated) and have about 7GB of
additional overhead.

You might try putting a suitable amount of data into the cluster 
before

worrying about the ratio of space used to data stored. :)
-Greg


Hello Greg,

I put a suitable amount of data now, and it looks like my ratio is
still 1 to 5.
The folder:
/var/lib/ceph/osd/ceph-N/current/meta/
did not grow, so it looks like that is not the problem.

Do you have any hint how to troubleshoot this issue ???


ansible@zrh-srv-m-cph02:~$ ceph osd pool get .rgw.buckets size
size: 3
ansible@zrh-srv-m-cph02:~$ ceph osd pool get .rgw.buckets min_size
min_size: 2


ansible@zrh-srv-m-cph02:~$ ceph -w
cluster 4179fcec-b336-41a1-a7fd-4a19a75420ea
 health HEALTH_WARN pool .rgw.buckets has too few pgs
 monmap e4: 4 mons at

{rml-srv-m-cph01=10.120.50.20:6789/0,rml-srv-m-cph02=10.120.50.21:6789/0,rml-srv-m-stk03=10.120.50.32:6789/0,zrh-srv-m-cph02=10.120.50.2:6789/0},
election epoch 668, quorum 0,1,2,3
zrh-srv-m-cph02,rml-srv-m-cph01,rml-srv-m-cph02,rml-srv-m-stk03
 osdmap e2170: 54 osds: 54 up, 54 in
  pgmap v619041: 28684 pgs, 15 pools, 109 GB data, 7358 kobjects
518 GB used, 49756 GB / 50275 GB avail
   28684 active+clean

ansible@zrh-srv-m-cph02:~$ ceph df
GLOBAL:
SIZE   AVAIL  RAW USED %RAW USED
50275G 49756G 518G  1.03
POOLS:
NAME               ID USED  %USED MAX AVAIL OBJECTS
rbd                0  155   0     16461G    2
gianfranco         7  156   0     16461G    2
images             8  257M  0     16461G    38
.rgw.root          9  840   0     16461G    3
.rgw.control       10 0     0     16461G    8
.rgw               11 21334 0     16461G    108
.rgw.gc            12 0     0     16461G    32
.users.uid         13 1575  0     16461G    6
.users             14 72    0     16461G    6
.rgw.buckets.index 15 0     0     16461G    30
.users.swift       17 36    0     16461G    3
.rgw.buckets       18 108G  0.22  16461G    7534745
.intent-log        19 0     0     16461G    0
.rgw.buckets.extra 20 0     0     16461G    0
volumes            21 512M  0     16461G    161

ansible@zrh-srv-m-cph02:~$
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


--
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] QEMU Venom Vulnerability

2015-05-19 Thread Georgios Dimitrakakis

I am trying to build the packages manually and I was wondering
is the flag --enable-rbd enough to have full Ceph functionality?

Does anybody know what else flags should I include in order to have the 
same functionality as the original CentOS package plus the RBD support?


Regards,

George

On Tue, 19 May 2015 13:45:50 +0300, Georgios Dimitrakakis wrote:

Hi!

The QEMU Venom vulnerability (http://venom.crowdstrike.com/) got my
attention and I would
like to know what are you people doing in order to have the latest
patched QEMU version
working with Ceph RBD?

In my case I am using the qemu-img and qemu-kvm packages provided by
Ceph (http://ceph.com/packages/ceph-extras/rpm/centos6/x86_64/) in
order to have RBD working on CentOS6 since the default repository
packages do not work!

If I want to update to the latest QEMU packages which ones are known
to work with Ceph RBD?
I have seen some people mentioning that Fedora packages are working
but I am not sure if they have the latest packages available and if
they are going to work eventually.

Is building manually the QEMU packages the only way???


Best regards,


George
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] QEMU Venom Vulnerability

2015-05-19 Thread Georgios Dimitrakakis

Erik,

are you talking about the ones here : 
http://ftp.redhat.com/redhat/linux/enterprise/6Server/en/RHEV/SRPMS/ ???


From what I see the version number is rather low (0.12.1.2-2.448).

How can one verify that it has been patched against the VENOM
vulnerability?


Additionally I only see the qemu-kvm package and not the qemu-img. Is 
it essential to update both in order to have a working CentOS system or 
can I just proceed with the qemu-kvm?


Robert, any ideas where can I find the latest and patched SRPMs...I 
have been building v.2.3.0 from source but I am very reluctant to use it 
in my system :-)


Best,

George



You can also just fetch the rhev SRPMs  and build those. They have
rbd enabled already.
On May 19, 2015 12:31 PM, Robert LeBlanc  wrote:


-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

You should be able to get the SRPM, extract the SPEC file and use
that
to build a new package. You should be able to tweak all the compile
options as well. Im still really new to building/rebuilding RPMs
but
Ive been able to do this for a couple of packages.
- 
Robert LeBlanc
GPG Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

On Tue, May 19, 2015 at 12:33 PM, Georgios Dimitrakakis  wrote:
 I am trying to build the packages manually and I was wondering
 is the flag --enable-rbd enough to have full Ceph functionality?

 Does anybody know what else flags should I include in order to
have the same
 functionality as the original CentOS package plus the RBD
support?

 Regards,

 George


 On Tue, 19 May 2015 13:45:50 +0300, Georgios Dimitrakakis wrote:

 Hi!

 The QEMU Venom vulnerability (http://venom.crowdstrike.com/ [1])
got my
 attention and I would
 like to know what are you people doing in order to have the
latest
 patched QEMU version
 working with Ceph RBD?

 In my case I am using the qemu-img and qemu-kvm packages
provided by
 Ceph (http://ceph.com/packages/ceph-extras/rpm/centos6/x86_64/
[2]) in
 order to have RBD working on CentOS6 since the default
repository
 packages do not work!

 If I want to update to the latest QEMU packages which ones are
known
 to work with Ceph RBD?
 I have seen some people mentioning that Fedora packages are
working
 but I am not sure if they have the latest packages available and
if
 they are going to work eventually.

 Is building manually the QEMU packages the only way???


 Best regards,


 George
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com [3]
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [4]

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com [5]
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [6]

-BEGIN PGP SIGNATURE-
Version: Mailvelope v0.13.1
Comment: https://www.mailvelope.com [7]

wsFcBAEBCAAQBQJVW4+RCRDmVDuy+mK58QAAg8AP/jqmQFYEwOeGRTJigk9M
pBhr34vyA3mky+BjjW9pt2tydECOH0p5PlYXBfhrQeg2B/yT0uVUKYbYkdBU
fY85UhS5NFdm7VyFyMPSGQwZlXIADF8YJw+Zbj1tpfRvbCi/sntbvGQk+9X8
usVSwBTbWKhYyMW8J5edppv72fMwoVjmoNXuE7wCUoqwxpQBUt0ouap6gDNd
Cu0ZMu+RKq+gfLGcIeSIhsDfV0/LHm2QBO/XjNZtMjyomOWNk9nYHp6HGJxH
MV/EoF4dYoCqHcODPjU2NvesQfYkmqfFoq/n9q/fMEV5JQ+mDfXqc2BcQUsx
40LDWDs+4BTw0KI+dNT0XUYTw+O0WnXFzgIn1wqXEs8pyOSJy1gCcnOGEavy
4PqYasm1g+5uzggaIddFPcWHJTw5FuFfjCnHX8Jo3EeQVDM6Vg8FPkkb5JQk
sqxVRQWsF89gGRUbHIQWdkgy3PZN0oTkBvUfflmE/cUq/r40sD4c25D+9Gti
Gj0IKG5uqMaHud3Hln++0ai5roOghoK0KxcDoBTmFLaQSNo9c4CIFCDf2kJ3
idH5tVozDSgvFpgBFLFatb7isctIYf4Luh/XpLXUzdjklGGzo9mhOjXsbm56
WCJZOkQ/OY1UFysMV5+tSSEn7TsF7Np9NagZB7AHhYuTKlOnbv3QJlhATOPp
u4wP
=SsM2
-END PGP SIGNATURE-
___
ceph-users mailing list
ceph-users@lists.ceph.com [8]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [9]



Links:
--
[1] http://venom.crowdstrike.com/
[2] http://ceph.com/packages/ceph-extras/rpm/centos6/x86_64/
[3] mailto:ceph-users@lists.ceph.com
[4] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[5] mailto:ceph-users@lists.ceph.com
[6] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[7] https://www.mailvelope.com
[8] mailto:ceph-users@lists.ceph.com
[9] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[10] mailto:rob...@leblancnet.us

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] QEMU Venom Vulnerability

2015-05-19 Thread Georgios Dimitrakakis

Erik,

thanks for the feedback. I am still on 6 so if someone else has a 
proposal please come forward...


Best,

George

Sorry, I made the assumption you were on 7. If youre on 6 then I 
defer

to someone else ;)

If youre on 7, go here.


http://ftp.redhat.com/pub/redhat/linux/enterprise/7Server/en/RHEV/SRPMS/
[21]

On May 19, 2015 2:47 PM, Georgios Dimitrakakis  wrote:


Erik,

are you talking about the ones here :
http://ftp.redhat.com/redhat/linux/enterprise/6Server/en/RHEV/SRPMS/
[20] ???

From what I see the version is rather small 0.12.1.2-2.448

How one can verify that it has been patched against venom
vulnerability?

Additionally I only see the qemu-kvm package and not the qemu-img.
Is it essential to update both in order to have a working CentOS
system or can I just proceed with the qemu-kvm?

Robert, any ideas where can I find the latest and patched SRPMs...I
have been building v.2.3.0 from source but I am very reluctant to
use it in my system :-)

Best,

George


You can also just fetch the rhev SRPMs  and build those. They
have
rbd enabled already.
On May 19, 2015 12:31 PM, Robert LeBlanc  wrote:


-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

You should be able to get the SRPM, extract the SPEC file and
use
that
to build a new package. You should be able to tweak all the
compile
options as well. Im still really new to building/rebuilding
RPMs
but
Ive been able to do this for a couple of packages.
- 
Robert LeBlanc
GPG Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62
B9F1

On Tue, May 19, 2015 at 12:33 PM, Georgios Dimitrakakis 
wrote:
 I am trying to build the packages manually and I was
wondering
 is the flag --enable-rbd enough to have full Ceph
functionality?

 Does anybody know what else flags should I include in order
to
have the same
 functionality as the original CentOS package plus the RBD
support?

 Regards,

 George


 On Tue, 19 May 2015 13:45:50 +0300, Georgios Dimitrakakis
wrote:

 Hi!

 The QEMU Venom vulnerability (http://venom.crowdstrike.com/
[1] [1])
got my
 attention and I would
 like to know what are you people doing in order to have the
latest
 patched QEMU version
 working with Ceph RBD?

 In my case I am using the qemu-img and qemu-kvm packages
provided by
 Ceph
(http://ceph.com/packages/ceph-extras/rpm/centos6/x86_64/ [2]
[2]) in
 order to have RBD working on CentOS6 since the default
repository
 packages do not work!

 If I want to update to the latest QEMU packages which ones
are
known
 to work with Ceph RBD?
 I have seen some people mentioning that Fedora packages are
working
 but I am not sure if they have the latest packages available
and
if
 they are going to work eventually.

 Is building manually the QEMU packages the only way???


 Best regards,


 George
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com [3] [3]
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [4]
[4]

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com [5] [5]
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [6]
[6]

-BEGIN PGP SIGNATURE-
Version: Mailvelope v0.13.1
Comment: https://www.mailvelope.com [7] [7]

wsFcBAEBCAAQBQJVW4+RCRDmVDuy+mK58QAAg8AP/jqmQFYEwOeGRTJigk9M
pBhr34vyA3mky+BjjW9pt2tydECOH0p5PlYXBfhrQeg2B/yT0uVUKYbYkdBU
fY85UhS5NFdm7VyFyMPSGQwZlXIADF8YJw+Zbj1tpfRvbCi/sntbvGQk+9X8
usVSwBTbWKhYyMW8J5edppv72fMwoVjmoNXuE7wCUoqwxpQBUt0ouap6gDNd
Cu0ZMu+RKq+gfLGcIeSIhsDfV0/LHm2QBO/XjNZtMjyomOWNk9nYHp6HGJxH
MV/EoF4dYoCqHcODPjU2NvesQfYkmqfFoq/n9q/fMEV5JQ+mDfXqc2BcQUsx
40LDWDs+4BTw0KI+dNT0XUYTw+O0WnXFzgIn1wqXEs8pyOSJy1gCcnOGEavy
4PqYasm1g+5uzggaIddFPcWHJTw5FuFfjCnHX8Jo3EeQVDM6Vg8FPkkb5JQk
sqxVRQWsF89gGRUbHIQWdkgy3PZN0oTkBvUfflmE/cUq/r40sD4c25D+9Gti
Gj0IKG5uqMaHud3Hln++0ai5roOghoK0KxcDoBTmFLaQSNo9c4CIFCDf2kJ3
idH5tVozDSgvFpgBFLFatb7isctIYf4Luh/XpLXUzdjklGGzo9mhOjXsbm56
WCJZOkQ/OY1UFysMV5+tSSEn7TsF7Np9NagZB7AHhYuTKlOnbv3QJlhATOPp
u4wP
=SsM2
-END PGP SIGNATURE-
___
ceph-users mailing list
ceph-users@lists.ceph.com [8] [8]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [9] [9]


Links:
--
[1] http://venom.crowdstrike.com/ [10]
[2] http://ceph.com/packages/ceph-extras/rpm/centos6/x86_64/ [11]
[3] mailto:ceph-users@lists.ceph.com [12]
[4] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [13]
[5] mailto:ceph-users@lists.ceph.com [14]
[6] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [15]
[7] https://www.mailvelope.com [16]
[8] mailto:ceph-users@lists.ceph.com [17]
[9] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [18]
[10] mailto:rob...@leblancnet.us [19]



Links:
--
[1] http://venom.crowdstrike.com/
[2] http://ceph.com/packages/ceph-extras/rpm/centos6/x86_64/
[3] mailto:ceph-users@lists.ceph.com
[4] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[5] mailto:ceph-users@lists.ceph.com
[6] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[7

Re: [ceph-users] QEMU Venom Vulnerability

2015-05-20 Thread Georgios Dimitrakakis

Hi Brad!

Thanks for pointing out that for CentOS 6 the fix is included! Good to 
know that!


But I think that the original package doesn't support RBD by default so 
it has to be built again, am I right?


If that's correct then starting from there and building a new RPM with 
RBD support is the proper way of updating. Correct?


Since I am very new at building RPMs, is there something else that I should be 
aware of or take care of? Any guidelines maybe?


Best regards,

George

On Thu, 21 May 2015 09:25:32 +1000, Brad Hubbard wrote:

On 05/21/2015 08:47 AM, Brad Hubbard wrote:

On 05/20/2015 11:02 AM, Robert LeBlanc wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

I've downloaded the new tarball, placed it in rpmbuild/SOURCES; then,
with the extracted spec file in rpmbuild/SPECS, I update it to the new
version and then rpmbuild -ba program.spec. If you install the SRPM
then it will install the RH patches that have been applied to the
package, and then you get to have the fun of figuring out which patches
are still needed and which ones need to be modified. You can probably
build the package without the patches, but some things may work a
little differently. That would get you the closest to the official
RPMs.
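
(A rough outline of that workflow, for anyone following along; the SRPM
file name is the CentOS 6 one quoted later in this thread, and the exact
spec change needed for RBD depends on the qemu-kvm version:)

# Install the source RPM; it unpacks into ~/rpmbuild/SOURCES and SPECS
rpm -ivh qemu-kvm-0.12.1.2-2.448.el6_6.3.src.rpm

# Pull in the build dependencies declared in the spec file
yum-builddep ~/rpmbuild/SPECS/qemu-kvm.spec

# Edit the spec so the configure step gets RBD support (e.g. add
# --enable-rbd to the ./configure options; where exactly differs per
# version), then rebuild:
rpmbuild -ba ~/rpmbuild/SPECS/qemu-kvm.spec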

As to where to find the SRPMs, I'm not really sure, I come from a
Debian background where access to source packages is really easy.



# yumdownloader --source qemu-kvm --source qemu-kvm-rhev

This assumes you have the correct source repos enabled. Something 
like;


# subscription-manager repos 
--enable=rhel-7-server-openstack-6.0-source-rpms 
--enable=rhel-7-server-source-rpms


Taken from https://access.redhat.com/solutions/1381603


Of course the above is for RHEL only and is unnecessary as there are 
errata
packages for rhel. I was just trying to explain how you can get 
access to the

source packages for rhel.

As for CentOS 6, although the version number may be low, it has 
the fix.



http://vault.centos.org/6.6/updates/Source/SPackages/qemu-kvm-0.12.1.2-2.448.el6_6.3.src.rpm

$ rpm -qp --changelog qemu-kvm-0.12.1.2-2.448.el6_6.3.src.rpm |head 
-5

warning: qemu-kvm-0.12.1.2-2.448.el6_6.3.src.rpm: Header V3 RSA/SHA1
Signature, key ID c105b9de: NOKEY
* Fri May 08 2015 Miroslav Rezanina mreza...@redhat.com -
0.12.1.2-2.448.el6_6.3
- kvm-fdc-force-the-fifo-access-to-be-in-bounds-of-the-all.patch 
[bz#1219267]

- Resolves: bz#1219267
  (EMBARGOED CVE-2015-3456 qemu-kvm: qemu: floppy disk controller
flaw [rhel-6.6.z])

HTH.


Cheers,
Brad



HTH.

Cheers,
Brad


- 
Robert LeBlanc
GPG Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Tue, May 19, 2015 at 3:47 PM, Georgios Dimitrakakis  wrote:

Erik,

are you talking about the ones here :

http://ftp.redhat.com/redhat/linux/enterprise/6Server/en/RHEV/SRPMS/ 
???


 From what I see the version is rather small 0.12.1.2-2.448

How one can verify that it has been patched against venom 
vulnerability?


Additionally I only see the qemu-kvm package and not the qemu-img. 
Is it
essential to update both in order to have a working CentOS system 
or can I

just proceed with the qemu-kvm?

Robert, any ideas where can I find the latest and patched 
SRPMs...I have
been building v.2.3.0 from source but I am very reluctant to use 
it in my

system :-)

Best,

George


You can also just fetch the rhev SRPMs  and build those. They 
have

rbd enabled already.
On May 19, 2015 12:31 PM, Robert LeBlanc  wrote:


-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

You should be able to get the SRPM, extract the SPEC file and 
use

that
to build a new package. You should be able to tweak all the 
compile

options as well. Im still really new to building/rebuilding RPMs
but
Ive been able to do this for a couple of packages.
- 
Robert LeBlanc
GPG Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 
B9F1


On Tue, May 19, 2015 at 12:33 PM, Georgios Dimitrakakis  wrote:

I am trying to build the packages manually and I was wondering
is the flag --enable-rbd enough to have full Ceph 
functionality?


Does anybody know what else flags should I include in order to

have the same

functionality as the original CentOS package plus the RBD

support?


Regards,

George


On Tue, 19 May 2015 13:45:50 +0300, Georgios Dimitrakakis 
wrote:


Hi!

The QEMU Venom vulnerability (http://venom.crowdstrike.com/ 
[1])

got my

attention and I would
like to know what are you people doing in order to have the

latest

patched QEMU version
working with Ceph RBD?

In my case I am using the qemu-img and qemu-kvm packages

provided by

Ceph (http://ceph.com/packages/ceph-extras/rpm/centos6/x86_64/

[2]) in

order to have RBD working on CentOS6 since the default

repository

packages do not work!

If I want to update to the latest QEMU packages which ones are

known

to work with Ceph RBD?
I have seen some people mentioning that Fedora packages are

working
but I am not sure if they have the latest packages available

Re: [ceph-users] Rename pool by id

2015-06-17 Thread Georgios Dimitrakakis

Pavel,

unfortunately there isn't a way to rename a pool using its ID, as I have 
learned myself the hard way since I faced the exact same issue a few 
months ago.


It would be a good idea for developers to also include a way to 
manipulate (rename, delete, etc.) pools using the ID which is definitely 
unique and in my opinion would be error-resistant or at least less 
susceptible to errors.


In order to achieve what you want, try the command:

rados rmpool <pool-name> <pool-name> --yes-i-really-really-mean-it

which will actually remove the problematic pool, as shown here : 
http://cephnotes.ksperis.com/blog/2014/10/29/remove-pool-without-name .
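
(For the empty-name pool in question, the trick from that post boils
down to passing the empty name twice -- a hedged sketch; double-check
your pool list before running anything destructive:)

# Remove the pool whose name is an empty string
rados rmpool "" "" --yes-i-really-really-mean-it

# Verify afterwards
ceph osd lspools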


To be fair and give credits everywhere this solution was also suggested 
to me at the IRC channel by debian112 at that time.



Best regards,

George



On Wed, 17 Jun 2015 17:17:55 +0600, pa...@gradient54.ru wrote:

Hi all, is any way to rename a pool by ID (pool number).
I have one pool with empty name, it is not used and just want delete
this, but can't do it, because pool name required.

ceph osd lspools
0 data,1 metadata,2 rbd,12 ,16 libvirt,

I want rename this: pool #12

Thanks,
Pavel


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] NFS interaction with RBD

2015-05-28 Thread Georgios Dimitrakakis

Thanks a million for the feedback Christian!

I've tried to recreate the issue with 10 RBD volumes mounted on a 
single server without success!


I've issued the mkfs.xfs commands simultaneously (or at least as fast 
as I could do it in different terminals) without noticing any problems. Can 
you please tell me what was the size of each one of the RBD volumes, 
because I have a feeling that mine were too small, and if so I have to 
test it on our bigger cluster.


I've also thought that besides the QEMU version the underlying OS might 
also be important, so what was your testbed?



All the best,

George


Hi George

In order to experience the error it was enough to simply run mkfs.xfs
on all the volumes.


In the meantime it became clear what the problem was:

 ~ ; cat /proc/183016/limits
...
Max open files            1024                 4096                 files
..

This can be changed by setting a decent value in
/etc/libvirt/qemu.conf for max_files.
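
(For reference, a minimal sketch of that change -- the limit value is
only an example, size it to your OSD count times attached volumes, and
the process name may be qemu-system-x86_64 depending on the distro:)

# /etc/libvirt/qemu.conf
max_files = 32768    # example value; the default effective limit was 1024

# restart libvirtd and the guest, then confirm the new limit took effect:
grep "open files" /proc/$(pidof -s qemu-kvm)/limits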

Regards
Christian



On 27 May 2015, at 16:23, Jens-Christian Fischer
jens-christian.fisc...@switch.ch wrote:


George,

I will let Christian provide you the details. As far as I know, it 
was enough to just do a ‘ls’ on all of the attached drives.


we are using Qemu 2.0:

$ dpkg -l | grep qemu
ii  ipxe-qemu   
1.0.0+git-2013.c3d1e78-2ubuntu1   all  PXE boot firmware - 
ROM images for qemu
ii  qemu-keymaps2.0.0+dfsg-2ubuntu1.11   
all  QEMU keyboard maps
ii  qemu-system 2.0.0+dfsg-2ubuntu1.11   
amd64QEMU full system emulation binaries
ii  qemu-system-arm 2.0.0+dfsg-2ubuntu1.11   
amd64QEMU full system emulation binaries (arm)
ii  qemu-system-common  2.0.0+dfsg-2ubuntu1.11   
amd64QEMU full system emulation binaries (common 
files)
ii  qemu-system-mips2.0.0+dfsg-2ubuntu1.11   
amd64QEMU full system emulation binaries (mips)
ii  qemu-system-misc2.0.0+dfsg-2ubuntu1.11   
amd64QEMU full system emulation binaries 
(miscelaneous)
ii  qemu-system-ppc 2.0.0+dfsg-2ubuntu1.11   
amd64QEMU full system emulation binaries (ppc)
ii  qemu-system-sparc   2.0.0+dfsg-2ubuntu1.11   
amd64QEMU full system emulation binaries (sparc)
ii  qemu-system-x86 2.0.0+dfsg-2ubuntu1.11   
amd64QEMU full system emulation binaries (x86)
ii  qemu-utils  2.0.0+dfsg-2ubuntu1.11   
amd64QEMU utilities


cheers
jc

--
SWITCH
Jens-Christian Fischer, Peta Solutions
Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
phone +41 44 268 15 15, direct +41 44 268 15 71
jens-christian.fisc...@switch.ch
http://www.switch.ch

http://www.switch.ch/stories

On 26.05.2015, at 19:12, Georgios Dimitrakakis 
gior...@acmac.uoc.gr wrote:



Jens-Christian,

how did you test that? Did you just try to write to them 
simultaneously? Any other tests that one can perform to verify that?


In our installation we have a VM with 30 RBD volumes mounted which 
are all exported via NFS to other VMs.
No one has complained for the moment, but the load/usage is very 
minimal.
If this problem really exists then, very soon when the trial phase 
is over, we will have millions of complaints :-(


What version of QEMU are you using? We are using the one provided 
by Ceph in qemu-kvm-0.12.1.2-2.415.el6.3ceph.x86_64.rpm


Best regards,

George


I think we (i.e. Christian) found the problem:

We created a test VM with 9 mounted RBD volumes (no NFS server). 
As
soon as he hit all disks, we started to experience these 120 
second

timeouts. We realized that the QEMU process on the hypervisor is
opening a TCP connection to every OSD for every mounted volume -
exceeding the 1024 FD limit.

So no deep scrubbing etc., simply too many connections…

cheers
jc

--
SWITCH
Jens-Christian Fischer, Peta Solutions
Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
phone +41 44 268 15 15, direct +41 44 268 15 71
jens-christian.fisc...@switch.ch [3]
http://www.switch.ch

http://www.switch.ch/stories

On 25.05.2015, at 06:02, Christian Balzer  wrote:


Hello,

lets compare your case with John-Paul's.

Different OS and Ceph versions (thus we can assume different NFS
versions
as well).
The only common thing is that both of you added OSDs and are 
likely

suffering from delays stemming from Ceph re-balancing or
deep-scrubbing.

Ceph logs will only pipe up when things have been blocked for 
more

than 30
seconds, NFS might take offense to lower values (or the 
accumulation

of
several distributed delays).

You added 23 OSDs, tell us more about your cluster, HW, network.
Were these added to the existing 16 nodes, are these on new 
storage

nodes
(so could there be something different with those nodes?), how 
busy

Re: [ceph-users] NFS interaction with RBD

2015-05-29 Thread Georgios Dimitrakakis

All,

I 've tried to recreate the issue without success!

My configuration is the following:

OS (Hypervisor + VM): CentOS 6.6 (2.6.32-504.1.3.el6.x86_64)
QEMU: qemu-kvm-0.12.1.2-2.415.el6.3ceph.x86_64
Ceph: ceph version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047), 
20x4TB OSDs equally distributed on two disk nodes, 3xMonitors



OpenStack Cinder has been configured to provide RBD Volumes from Ceph.

I have created 10x 500GB volumes which were then all attached to a 
single virtual machine.


All volumes were formatted twice for comparison reasons, once using 
mkfs.xfs and once using mkfs.ext4.
I did try to issue the commands all at the same time (or as close to 
that as possible).
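
(For anyone who wants to reproduce the test, the parallel formatting
was along these lines -- the virtio device names are an assumption,
adjust them to your own VM:)

# Format all attached volumes in parallel (double-check the device list!)
for dev in /dev/vd{b..k}; do
    mkfs.xfs -f "$dev" &
done
wait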


In both tests I didn't notice any interruption. It may have taken longer 
than just doing one at a time, but the system was continuously up and 
everything was responding without any problem.


At the time of these processes the open connections were 100 with one 
of the OSD nodes and 111 with the other one.
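
(A rough way to watch the same numbers on the hypervisor -- the process
name is an assumption, it may be qemu-system-x86_64 on other distros:)

# Open file descriptors held by the guest's QEMU process
ls /proc/$(pidof -s qemu-kvm)/fd | wc -l

# Established TCP connections belonging to QEMU (one per OSD per volume)
ss -tnp 2>/dev/null | grep -c qemu-kvm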


So I guess I am not experiencing the issue due to the low number of 
OSDs I am having. Is my assumption correct?



Best regards,

George




Thanks a million for the feedback Christian!

I 've tried to recreate the issue with 10RBD Volumes mounted on a
single server without success!

I 've issued the mkfs.xfs command simultaneously (or at least as
fast I could do it in different terminals) without noticing any
problems. Can you please tell me what was the size of each one of the
RBD Volumes cause I have a feeling that mine were two small, and if 
so

I have to test it on our bigger cluster.

I 've also thought that besides QEMU version it might also be
important the underlying OS, so what was your testbed?


All the best,

George


Hi George

In order to experience the error it was enough to simply run 
mkfs.xfs

on all the volumes.


In the meantime it became clear what the problem was:

 ~ ; cat /proc/183016/limits
...
Max open files            1024                 4096                 files
..

This can be changed by setting a decent value in
/etc/libvirt/qemu.conf for max_files.

Regards
Christian



On 27 May 2015, at 16:23, Jens-Christian Fischer
jens-christian.fisc...@switch.ch wrote:


George,

I will let Christian provide you the details. As far as I know, it 
was enough to just do a ‘ls’ on all of the attached drives.


we are using Qemu 2.0:

$ dpkg -l | grep qemu
ii  ipxe-qemu   
1.0.0+git-2013.c3d1e78-2ubuntu1   all  PXE boot firmware 
- ROM images for qemu
ii  qemu-keymaps2.0.0+dfsg-2ubuntu1.11  
all  QEMU keyboard maps
ii  qemu-system 2.0.0+dfsg-2ubuntu1.11  
amd64QEMU full system emulation binaries
ii  qemu-system-arm 2.0.0+dfsg-2ubuntu1.11  
amd64QEMU full system emulation binaries (arm)
ii  qemu-system-common  2.0.0+dfsg-2ubuntu1.11  
amd64QEMU full system emulation binaries (common files)
ii  qemu-system-mips2.0.0+dfsg-2ubuntu1.11  
amd64QEMU full system emulation binaries (mips)
ii  qemu-system-misc2.0.0+dfsg-2ubuntu1.11  
amd64QEMU full system emulation binaries (miscelaneous)
ii  qemu-system-ppc 2.0.0+dfsg-2ubuntu1.11  
amd64QEMU full system emulation binaries (ppc)
ii  qemu-system-sparc   2.0.0+dfsg-2ubuntu1.11  
amd64QEMU full system emulation binaries (sparc)
ii  qemu-system-x86 2.0.0+dfsg-2ubuntu1.11  
amd64QEMU full system emulation binaries (x86)
ii  qemu-utils  2.0.0+dfsg-2ubuntu1.11  
amd64QEMU utilities


cheers
jc

--
SWITCH
Jens-Christian Fischer, Peta Solutions
Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
phone +41 44 268 15 15, direct +41 44 268 15 71
jens-christian.fisc...@switch.ch
http://www.switch.ch

http://www.switch.ch/stories

On 26.05.2015, at 19:12, Georgios Dimitrakakis 
gior...@acmac.uoc.gr wrote:



Jens-Christian,

how did you test that? Did you just tried to write to them 
simultaneously? Any other tests that one can perform to verify that?


In our installation we have a VM with 30 RBD volumes mounted which 
are all exported via NFS to other VMs.
No one has complaint for the moment but the load/usage is very 
minimal.
If this problem really exists then very soon that the trial phase 
will be over we will have millions of complaints :-(


What version of QEMU are you using? We are using the one provided 
by Ceph in qemu-kvm-0.12.1.2-2.415.el6.3ceph.x86_64.rpm


Best regards,

George


I think we (i.e. Christian) found the problem:

We created a test VM with 9 mounted RBD volumes (no NFS server). 
As
soon as he hit all disks, we started to experience these 120 
second

timeouts. We realized that the QEMU process on the hypervisor is
opening a TCP connection to every OSD for every mounted volume -
exceeding the 1024 FD

Re: [ceph-users] NFS interaction with RBD

2015-05-26 Thread Georgios Dimitrakakis

Jens-Christian,

how did you test that? Did you just try to write to them 
simultaneously? Any other tests that one can perform to verify that?


In our installation we have a VM with 30 RBD volumes mounted which are 
all exported via NFS to other VMs.

No one has complained for the moment, but the load/usage is very minimal.
If this problem really exists then, very soon when the trial phase is 
over, we will have millions of complaints :-(


What version of QEMU are you using? We are using the one provided by 
Ceph in qemu-kvm-0.12.1.2-2.415.el6.3ceph.x86_64.rpm


Best regards,

George


I think we (i.e. Christian) found the problem:

We created a test VM with 9 mounted RBD volumes (no NFS server). As
soon as he hit all disks, we started to experience these 120 second
timeouts. We realized that the QEMU process on the hypervisor is
opening a TCP connection to every OSD for every mounted volume -
exceeding the 1024 FD limit.

So no deep scrubbing etc., simply too many connections…

cheers
jc

 --
SWITCH
Jens-Christian Fischer, Peta Solutions
Werdstrasse 2, P.O. Box, 8021 Zurich, Switzerland
phone +41 44 268 15 15, direct +41 44 268 15 71
jens-christian.fisc...@switch.ch [3]
http://www.switch.ch

http://www.switch.ch/stories

On 25.05.2015, at 06:02, Christian Balzer  wrote:


Hello,

lets compare your case with John-Paul's.

Different OS and Ceph versions (thus we can assume different NFS
versions
as well).
The only common thing is that both of you added OSDs and are likely
suffering from delays stemming from Ceph re-balancing or
deep-scrubbing.

Ceph logs will only pipe up when things have been blocked for more
than 30
seconds, NFS might take offense to lower values (or the accumulation
of
several distributed delays).

You added 23 OSDs, tell us more about your cluster, HW, network.
Were these added to the existing 16 nodes, are these on new storage
nodes
(so could there be something different with those nodes?), how busy
is your
network, CPU.
Running something like collectd to gather all ceph perf data and
other
data from the storage nodes and then feeding it to graphite (or
similar)
can be VERY helpful to identify if something is going wrong and what
it is
in particular.
Otherwise run atop on your storage nodes to identify if CPU,
network,
specific HDDs/OSDs are bottlenecks.

Deep scrubbing can be _very_ taxing; do your problems persist if you
inject an osd_scrub_sleep value of 0.5 into your running cluster
(lower that until it hurts again) or if you turn off deep scrubs
altogether for the moment?

Christian

On Sat, 23 May 2015 23:28:32 +0200 Jens-Christian Fischer wrote:


We see something very similar on our Ceph cluster, starting as of
today.

We use a 16 node, 102 OSD Ceph installation as the basis for an
Icehouse
OpenStack cluster (we applied the RBD patches for live migration
etc)

On this cluster we have a big ownCloud installation (Sync  Share)
that
stores its files on three NFS servers, each mounting 6 2TB RBD
volumes
and exposing them to around 10 web server VMs (we originally
started
with one NFS server with a 100TB volume, but that has become
unwieldy).
All of the servers (hypervisors, ceph storage nodes and VMs) are
using
Ubuntu 14.04

Yesterday evening we added 23 ODSs to the cluster bringing it up
to 125
OSDs (because we had 4 OSDs that were nearing the 90% full mark).
The
rebalancing process ended this morning (after around 12 hours) The
cluster has been clean since then:

cluster b1f3f4c8-x
health HEALTH_OK
monmap e2: 3 mons at





{zhdk0009=[:::1009]:6789/0,zhdk0013=[:::1013]:6789/0,zhdk0025=[:::1025]:6789/0},

election epoch 612, quorum 0,1,2 zhdk0009,zhdk0013,zhdk0025 osdmap
e43476: 125 osds: 125 up, 125 in pgmap v18928606: 3336 pgs, 17
pools,
82447 GB data, 22585 kobjects 266 TB used, 187 TB / 454 TB avail
3319
active+clean 17 active+clean+scrubbing+deep
client io 8186 kB/s rd, 7747 kB/s wr, 2288 op/s

At midnight, we run a script that creates an RBD snapshot of all
RBD
volumes that are attached to the NFS servers (for backup
purposes).
Looking at our monitoring, around that time, one of the NFS
servers
became unresponsive and took down the complete ownCloud
installation
(load on the web servers was > 200 and they had lost some of the
NFS
mounts)

Rebooting the NFS server solved that problem, but the NFS kernel
server
kept crashing all day long after having run between 10 to 90
minutes.

We initially suspected a corrupt rbd volume (as it seemed that we
could
trigger the kernel crash by just “ls -l” one of the volumes,
but
subsequent “xfs_repair -n” checks on those RBD volumes showed
no
problems.

We migrated the NFS server off of its hypervisor, suspecting a
problem
with RBD kernel modules, rebooted the hypervisor but the problem
persisted (both on the new hypervisor, and on the old one when we
migrated it back)

We changed the /etc/default/nfs-kernel-server to start up 256
servers
(even though the defaults had been working fine for over a year)

Only 

Re: [ceph-users] OSD startup causing slow requests - one tip from me

2015-07-31 Thread Georgios Dimitrakakis

Jan,

this is very handy to know! Thanks for sharing with us!

People, do you believe that it would be nice to have a place where we 
can gather either good practices or problem resolutions or tips from the 
community? We could have a voting system and those with the most votes 
(or above a threshold) could appear there.


Regards,

George


I know a few other people here were battling with the occasional
issue of OSD being extremely slow when starting.

I personally run OSDs mixed with KVM guests on the same nodes, and
was baffled by this issue occuring mostly on the most idle (empty)
machines.
Thought it was some kind of race condition where OSD started too fast
and disks couldn’t catch up, was investigating latency of CPUs and
cards on a mostly idle hardware etc. - with no improvement.

But in the end, most of my issues were caused by page cache using too
much memory. This doesn’t cause any problems when the OSDs have their
memory allocated and are running, but when the OSD is (re)started, OS
struggles to allocate contiguous blocks of memory for it and its
buffers.
This could also be why I’m seeing such an improvement with my NUMA
pinning script - cleaning memory on one node is probably easier and
doesn’t block allocations on other nodes.

How can you tell if this is your case? When restarting an OSD that
has this issue, look for CPU usage of “kswapd” processes. If it is > 0
then you have this issue and would benefit from setting this:

for i in $(mount |grep ceph/osd |cut -d' ' -f1 |cut -d'/' -f3 |tr
-d '[0-9]') ; do echo 1 > /sys/block/$i/bdi/max_ratio ; done
(another option is echo 1 > drop_caches before starting the OSD, but
that’s a bit brutal)

What this does is it limits the pagecache size for each block device
to 1% of physical memory. I’d like to limit it even further but it
doesn’t understand “0.3”...

Let me know if it helps, I’ve not been able to test if this cures the
problem completely, but there was no regression after setting it.

Jan

P.S. This is for RHEL 6 / CentOS 6 ancient 2.6.32 kernel, newer
kernels have tunables to limit the overall pagecache size. You can
also set the limits in cgroups but I’m afraid that won’t help in this
case as you can only set the whole memory footprint limit where it
will battle for allocations anyway.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Zenoss Integration

2015-09-30 Thread Georgios Dimitrakakis


All,

I was wondering if anyone has integrated his CEPH installation with 
Zenoss monitoring software and is willing to share his knowledge.


Best regards,

George
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recover Data from Deleted RBD Volume

2016-08-09 Thread Georgios Dimitrakakis


Hello!

Brad,

is that possible from the default logging or is verbose logging needed?

I 've managed to get the UUID of the deleted volume from OpenStack but 
don't really know how to get the offsets and OSD maps since "rbd info" 
doesn't provide any information for that volume.


Is it possible to somehow get them from leveldb?

Best,

G.


On Tue, Aug 9, 2016 at 7:39 AM, George Mihaiescu
<lmihaie...@gmail.com> wrote:
Look in the cinder db, the volumes table to find the Uuid of the 
deleted volume.


You could also look through the logs at the time of the delete and I
suspect you should
be able to see how the rbd image was prefixed/named at the time of
the delete.

HTH,
Brad



If you go through your OSDs and look for the directories for PG 
index 20, you might find some fragments from the deleted volume, but 
it's a long shot...


On Aug 8, 2016, at 4:39 PM, Georgios Dimitrakakis 
<gior...@acmac.uoc.gr> wrote:


Dear David (and all),

the data are considered very critical therefore all this attempt to 
recover them.


Although the cluster hasn't been fully stopped, all user actions 
have been. I mean services are running but users are not able to 
read/write/delete.


The deleted image was the exact same size of the example (500GB) 
but it wasn't the only one deleted today. Our user was trying to do a 
"massive" cleanup by deleting 11 volumes and unfortunately one of 
them was very important.


Let's assume that I "dd" all the drives what further actions should 
I do to recover the files? Could you please elaborate a bit more on 
the phrase "If you've never deleted any other rbd images and assuming 
you can recover data with names, you may be able to find the rbd 
objects"??


Do you mean that if I know the file names I can go through and 
check for them? How?
Do I have to know *all* file names or by searching for a few of 
them I can find all data that exist?


Thanks a lot for taking the time to answer my questions!

All the best,

G.

I don't think there's a way of getting the prefix from the cluster at
this point.

If the deleted image was a similar size to the example you've given,
you will likely have had objects on every OSD. If this data is
absolutely critical you need to stop your cluster immediately or make
copies of all the drives with something like dd. If you've never
deleted any other rbd images and assuming you can recover data with
names, you may be able to find the rbd objects.

On Mon, Aug 8, 2016 at 7:28 PM, Georgios Dimitrakakis  wrote:


Hi,

On 08.08.2016 10:50, Georgios Dimitrakakis wrote:


Hi,


On 08.08.2016 09:58, Georgios Dimitrakakis wrote:

Dear all,

I would like your help with an emergency issue but first
let me describe our environment.

Our environment consists of 2OSD nodes with 10x 2TB HDDs
each and 3MON nodes (2 of them are the OSD nodes as well)
all with ceph version 0.80.9
(b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)

This environment provides RBD volumes to an OpenStack
Icehouse installation.

Although not a state of the art environment is working
well and within our expectations.

The issue now is that one of our users accidentally
deleted one of the volumes without keeping its data first!

Is there any way (since the data are considered critical
and very important) to recover them from CEPH?


Short answer: no

Long answer: no, but

Consider the way Ceph stores data... each RBD is striped
into chunks
(RADOS objects with 4MB size by default); the chunks are
distributed
among the OSDs with the configured number of replicates
(probably two
in your case since you use 2 OSD hosts). RBD uses thin
provisioning,
so chunks are allocated upon first write access.
If an RBD is deleted all of its chunks are deleted on the
corresponding OSDs. If you want to recover a deleted RBD,
you need to
recover all individual chunks. Whether this is possible
depends on
your filesystem and whether the space of a former chunk is
already
assigned to other RADOS objects. The RADOS object names are
composed
of the RBD name and the offset position of the chunk, so if
an
undelete mechanism exists for the OSDs filesystem, you have
to be
able to recover file by their filename, otherwise you might
end up
mixing the content of various deleted RBDs. Due to the thin
provisioning there might be some chunks missing (e.g. never
allocated
before).

Given the fact that
- you probably use XFS on the OSDs since it is the
preferred
filesystem for OSDs (there is RDR-XFS, but Ive never had to
use it)
- you would need to stop the complete ceph cluster
(recovery tools do
not work on mounted filesystems)
- your cluster has been in use after the RBD was deleted
and thus
parts of its former space might already have been
overwritten
(replication might help you here, since there are two OSDs
to try)
- XFS undelete does not work well on fragmented files (and
OSDs tend
to introduce fragmentation...)

the answer is no, since it might not be feasible and the
chance of

Re: [ceph-users] what happen to the OSDs if the OS disk dies?

2016-08-13 Thread Georgios Dimitrakakis



On 13 Aug 2016 at 03:19, Bill Sharer wrote the following:


If all the system disk does is handle the o/s (ie osd journals are
on dedicated or osd drives as well), no problem. Just rebuild the
system and copy the ceph.conf back in when you re-install ceph.Â
Keep a spare copy of your original fstab to keep your osd filesystem
mounts straight.


With systems deployed with ceph-disk/ceph-deploy you no longer need a
fstab. Udev handles it.


Just keep in mind that you are down 11 osds while that system drive
gets rebuilt though. It's safer to do 10 osds and then have a
mirror set for the system disk.


In the years that I run Ceph I rarely see OS disks fail. Why bother?
Ceph is designed for failure.

I would not sacrifice a OSD slot for a OS disk. Also, let's say a
additional OS disk is €100.

If you put that disk in 20 machines that's €2.000. For that money
you can even buy a additional chassis.

No, I would run on a single OS disk. It fails? Let it fail. 
Re-install

and you're good again.

Ceph makes sure the data is safe.



Wido,

can you elaborate a little bit more on this? How does CEPH achieve 
that? Is it by redundant MONs?


To my understanding the OSD mapping is needed to have the cluster back. 
In our setup (I assume in others as well) that is stored on the OS 
disk. Furthermore, our MONs are running on the same hosts as the OSDs. So if 
the OS disk fails, not only do we lose the OSD host but we also lose the 
MON node. Is there another way to be protected against such a failure besides 
additional MONs?


We recently had a problem where a user accidentally deleted a volume. 
Of course this has nothing to do with OS disk failure itself but we 've 
been in the loop to start looking for other possible failures on our 
system that could jeopardize data and this thread got my attention.



Warmest regards,

George



Wido

 Bill Sharer

 On 08/12/2016 03:33 PM, Ronny Aasen wrote:


On 12.08.2016 13:41, Félix Barbeira wrote:


Hi,

I'm planning to make a ceph cluster but I have a serious doubt. At
this moment we have ~10 servers DELL R730xd with 12x4TB SATA
disks. The official ceph docs says:

"We recommend using a dedicated drive for the operating system and
software, and one drive for each Ceph OSD Daemon you run on the
host."

I could use for example 1 disk for the OS and 11 for OSD data. In
the operating system I would run 11 daemons to control the OSDs.
But... what happens to the cluster if the disk with the OS fails?
Maybe the cluster thinks that 11 OSDs failed and tries to replicate
all that data over the cluster... that sounds no good.

Should I use 2 disks for the OS in a RAID1? In this case I'm
"wasting" 8TB only for the ~10GB that the OS needs.

All the docs that I've been reading say Ceph has no
single point of failure, so I think that this scenario must have an
optimal solution; maybe somebody could help me.

Thanks in advance.

--

Félix Barbeira.

if you do not have dedicated slots on the back for OS disks, then I
would recommend using SATADOM flash modules plugged directly into an
internal SATA port in the machine. That saves you 2 slots for OSDs and
they are quite reliable. You could even use 2 SD cards if your machine
has the internal SD slot




http://www.dell.com/downloads/global/products/pedge/en/poweredge-idsdm-whitepaper-en.pdf

[1]

kind regards
Ronny Aasen

___
ceph-users mailing list
ceph-users@lists.ceph.com [2]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [3]

___
ceph-users mailing list
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Links:
--
[1]

http://www.dell.com/downloads/global/products/pedge/en/poweredge-idsdm-whitepaper-en.pdf
[2] mailto:ceph-users@lists.ceph.com
[3] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[4] mailto:bsha...@sharerland.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recover Data from Deleted RBD Volume

2016-08-08 Thread Georgios Dimitrakakis

Hi,


On 08.08.2016 10:50, Georgios Dimitrakakis wrote:

Hi,


On 08.08.2016 09:58, Georgios Dimitrakakis wrote:

Dear all,

I would like your help with an emergency issue but first let me 
describe our environment.


Our environment consists of 2OSD nodes with 10x 2TB HDDs each and 
3MON nodes (2 of them are the OSD nodes as well) all with ceph 
version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)


This environment provides RBD volumes to an OpenStack Icehouse 
installation.


Although not a state of the art environment is working well and 
within our expectations.


The issue now is that one of our users accidentally deleted one 
of the volumes without keeping its data first!


Is there any way (since the data are considered critical and very 
important) to recover them from CEPH?


Short answer: no

Long answer: no, but

Consider the way Ceph stores data... each RBD is striped into 
chunks
(RADOS objects with 4MB size by default); the chunks are 
distributed
among the OSDs with the configured number of replicates (probably 
two
in your case since you use 2 OSD hosts). RBD uses thin 
provisioning,

so chunks are allocated upon first write access.
If an RBD is deleted all of its chunks are deleted on the
corresponding OSDs. If you want to recover a deleted RBD, you need 
to

recover all individual chunks. Whether this is possible depends on
your filesystem and whether the space of a former chunk is already
assigned to other RADOS objects. The RADOS object names are 
composed

of the RBD name and the offset position of the chunk, so if an
undelete mechanism exists for the OSDs' filesystem, you have to be
able to recover file by their filename, otherwise you might end up
mixing the content of various deleted RBDs. Due to the thin
provisioning there might be some chunks missing (e.g. never 
allocated

before).

Given the fact that
- you probably use XFS on the OSDs since it is the preferred
filesystem for OSDs (there is RDR-XFS, but I've never had to use 
it)
- you would need to stop the complete ceph cluster (recovery tools 
do

not work on mounted filesystems)
- your cluster has been in use after the RBD was deleted and thus
parts of its former space might already have been overwritten
(replication might help you here, since there are two OSDs to try)
- XFS undelete does not work well on fragmented files (and OSDs 
tend

to introduce fragmentation...)

the answer is no, since it might not be feasible and the chance of
success are way too low.

If you want to spend time on it I would propose the stop the ceph
cluster as soon as possible, create copies of all involved OSDs, 
start

the cluster again and attempt the recovery on the copies.

Regards,
Burkhard


Hi! Thanks for the info...I understand that this is a very 
difficult and probably not feasible task but in case I need to try a 
recovery what other info should I need? Can I somehow find out on 
which OSDs the specific data were stored and minimize my search 
there?

Any ideas on how should I proceed?

First of all you need to know the exact object names for the RADOS
objects. As mentioned before, the name is composed of the RBD name 
and

an offset.

In case of OpenStack, there are three different patterns for RBD 
names:


<image uuid>, e.g. 50f2a0bd-15b1-4dbb-8d1f-fc43ce535f13
for glance images,
<instance uuid>_disk, e.g. 9aec1f45-9053-461e-b176-c65c25a48794_disk
for nova images,

volume-<volume uuid>, e.g. volume-0ca52f58-7e75-4b21-8b0f-39cbcd431c42
for cinder volumes

(not considering snapshots etc, which might use different patterns)

The RBD chunks are created using a certain prefix (using examples
from our openstack setup):

# rbd -p os-images info 8fa3d9eb-91ed-4c60-9550-a62f34aed014
rbd image '8fa3d9eb-91ed-4c60-9550-a62f34aed014':
size 446 MB in 56 objects
order 23 (8192 kB objects)
block_name_prefix: rbd_data.30e57d54dea573
format: 2
features: layering, striping
flags:
stripe unit: 8192 kB
stripe count: 1

# rados -p os-images ls | grep rbd_data.30e57d54dea573
rbd_data.30e57d54dea573.0015
rbd_data.30e57d54dea573.0008
rbd_data.30e57d54dea573.000a
rbd_data.30e57d54dea573.002d
rbd_data.30e57d54dea573.0032

I don't know whether the prefix is derived from some other
information, but to recover the RBD you definitely need it.

_If_ you are able to recover the prefix, you can use 'ceph osd map'
to find the OSDs for each chunk:

# ceph osd map os-images rbd_data.30e57d54dea573.001a
osdmap e418590 pool 'os-images' (38) object
'rbd_data.30e57d54dea573.001a' -> pg 38.d5d81d65 (38.65)
-> up ([45,17,108], p45) acting ([45,17,108], p45)

With 20 OSDs in your case you will likely have to process all of 
them

if the RBD has a size of several GBs.
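
(If the prefix can be recovered, mapping every chunk to its OSDs could
be scripted roughly as below. The pool name and chunk count are
placeholders, and the 16-hex-digit zero-padded offset suffix is an
assumption based on format-2 RBD naming; 'ceph osd map' only computes
the CRUSH placement, so it works even for objects that no longer
exist:)

pool=volumes                        # placeholder: the pool that held the RBD
prefix=rbd_data.30e57d54dea573      # example prefix from above
chunks=100                          # placeholder: image size / chunk size

for i in $(seq 0 $((chunks - 1))); do
    # build the object name from the prefix and the hex offset
    obj=$(printf '%s.%016x' "$prefix" "$i")
    ceph osd map "$pool" "$obj"
done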

Regards,
Burkhard



Is it possible to get the prefix if the RBD has been deleted
already? Is this info stored somewhere? Can I retrieve it in
another way besides "rbd info"? Because when I try to get it using

Re: [ceph-users] Recover Data from Deleted RBD Volume

2016-08-08 Thread Georgios Dimitrakakis

Dear David (and all),

the data are considered very critical therefore all this attempt to 
recover them.


Although the cluster hasn't been fully stopped, all user actions have been. 
I mean services are running but users are not able to read/write/delete.


The deleted image was the exact same size of the example (500GB) but it 
wasn't the only one deleted today. Our user was trying to do a "massive" 
cleanup by deleting 11 volumes and unfortunately one of them was very 
important.


Let's assume that I "dd" all the drives what further actions should I 
do to recover the files? Could you please elaborate a bit more on the 
phrase "If you've never deleted any other rbd images and assuming you 
can recover data with names, you may be able to find the rbd objects"??


Do you mean that if I know the file names I can go through and check 
for them? How?
Do I have to know *all* file names or by searching for a few of them I 
can find all data that exist?


Thanks a lot for taking the time to answer my questions!

All the best,

G.


I don't think there's a way of getting the prefix from the cluster at
this point.

If the deleted image was a similar size to the example you've given,
you will likely have had objects on every OSD. If this data is
absolutely critical you need to stop your cluster immediately or make
copies of all the drives with something like dd. If you've never
deleted any other rbd images and assuming you can recover data with
names, you may be able to find the rbd objects.

On Mon, Aug 8, 2016 at 7:28 PM, Georgios Dimitrakakis  wrote:


Hi,

On 08.08.2016 10:50, Georgios Dimitrakakis wrote:


Hi,

On 08.08.2016 09:58, Georgios Dimitrakakis wrote:


Dear all,

I would like your help with an emergency issue but first
let me describe our environment.

Our environment consists of 2OSD nodes with 10x 2TB HDDs
each and 3MON nodes (2 of them are the OSD nodes as well)
all with ceph version 0.80.9
(b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)

This environment provides RBD volumes to an OpenStack
Icehouse installation.

Although not a state of the art environment is working
well and within our expectations.

The issue now is that one of our users accidentally
deleted one of the volumes without keeping its data first!

Is there any way (since the data are considered critical
and very important) to recover them from CEPH?


Short answer: no

Long answer: no, but

Consider the way Ceph stores data... each RBD is striped
into chunks
(RADOS objects with 4MB size by default); the chunks are
distributed
among the OSDs with the configured number of replicates
(probably two
in your case since you use 2 OSD hosts). RBD uses thin
provisioning,
so chunks are allocated upon first write access.
If an RBD is deleted all of its chunks are deleted on the
corresponding OSDs. If you want to recover a deleted RBD,
you need to
recover all individual chunks. Whether this is possible
depends on
your filesystem and whether the space of a former chunk is
already
assigned to other RADOS objects. The RADOS object names are
composed
of the RBD name and the offset position of the chunk, so if
an
undelete mechanism exists for the OSDs filesystem, you have
to be
able to recover file by their filename, otherwise you might
end up
mixing the content of various deleted RBDs. Due to the thin
provisioning there might be some chunks missing (e.g. never
allocated
before).

Given the fact that
- you probably use XFS on the OSDs since it is the
preferred
filesystem for OSDs (there is RDR-XFS, but Ive never had to
use it)
- you would need to stop the complete ceph cluster
(recovery tools do
not work on mounted filesystems)
- your cluster has been in use after the RBD was deleted
and thus
parts of its former space might already have been
overwritten
(replication might help you here, since there are two OSDs
to try)
- XFS undelete does not work well on fragmented files (and
OSDs tend
to introduce fragmentation...)

the answer is no, since it might not be feasible and the
chance of
success are way too low.

If you want to spend time on it I would propose the stop
the ceph
cluster as soon as possible, create copies of all involved
OSDs, start
the cluster again and attempt the recovery on the copies.

Regards,
Burkhard


Hi! Thanks for the info... I understand that this is a very
difficult and probably not feasible task, but in case I need to
try a recovery, what other info would I need? Can I somehow
find out on which OSDs the specific data were stored and
minimize my search there?
Any ideas on how I should proceed?

First of all you need to know the exact object names for the
RADOS
objects. As mentioned before, the name is composed of the RBD
name and
an offset.

In case of OpenStack, there are three different patterns for
RBD names:

the image UUID, e.g. 50f2a0bd-15b1-4dbb-8d1f-fc43ce535f13,
for glance images,
the instance UUID plus a "_disk" suffix,
e.g. 9aec1f45-9053-461e-b176-c65c25a48794_disk, for nova images,
and "volume-" plus the volume UUID,
e.g. volume-0ca52f58-7e75-4b21-8b0f-39cbcd431c42, for cinder volumes

[ceph-users] Recover Data from Deleted RBD Volume

2016-08-08 Thread Georgios Dimitrakakis

Dear all,

I would like your help with an emergency issue but first let me 
describe our environment.


Our environment consists of 2 OSD nodes with 10x 2TB HDDs each and 3 MON 
nodes (2 of them are the OSD nodes as well), all with ceph version 0.80.9 
(b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)


This environment provides RBD volumes to an OpenStack Icehouse 
installation.


Although not a state-of-the-art environment, it is working well and within 
our expectations.


The issue now is that one of our users accidentally deleted one of the 
volumes without keeping its data first!


Is there any way (since the data are considered critical and very 
important) to recover them from CEPH?


Looking forward to your answers!


Best regards,

G.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recover Data from Deleted RBD Volume

2016-08-08 Thread Georgios Dimitrakakis

Hi,


On 08.08.2016 09:58, Georgios Dimitrakakis wrote:

Dear all,

I would like your help with an emergency issue but first let me 
describe our environment.


Our environment consists of 2 OSD nodes with 10x 2TB HDDs each and 
3 MON nodes (2 of them are the OSD nodes as well), all with ceph version 
0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)


This environment provides RBD volumes to an OpenStack Icehouse 
installation.


Although not a state-of-the-art environment, it is working well and 
within our expectations.


The issue now is that one of our users accidentally deleted one of 
the volumes without keeping its data first!


Is there any way (since the data are considered critical and very 
important) to recover them from CEPH?


Short answer: no

Long answer: no, but

Consider the way Ceph stores data... each RBD is striped into chunks
(RADOS objects with 4MB size by default); the chunks are distributed
among the OSDs with the configured number of replicas (probably two
in your case since you use 2 OSD hosts). RBD uses thin provisioning,
so chunks are allocated upon first write access.
If an RBD is deleted all of its chunks are deleted on the
corresponding OSDs. If you want to recover a deleted RBD, you need to
recover all individual chunks. Whether this is possible depends on
your filesystem and whether the space of a former chunk is already
assigned to other RADOS objects. The RADOS object names are composed
of the RBD name and the offset position of the chunk, so if an
undelete mechanism exists for the OSDs' filesystem, you have to be
able to recover files by their filenames; otherwise you might end up
mixing the content of various deleted RBDs. Due to the thin
provisioning there might be some chunks missing (e.g. never allocated
before).
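
To make the name composition concrete, a small sketch (the prefix below
is hypothetical, and this assumes the default 4 MB object size and
default striping; the chunk index is normally printed as 16 hex digits):

PREFIX=rbd_data.30e57d54dea573     # hypothetical block_name_prefix
OBJ_SIZE=$((4 * 1024 * 1024))      # default RBD object size (4 MB)

# Which RADOS object holds byte offset 10 GB into the image?
OFFSET=$((10 * 1024 * 1024 * 1024))
printf '%s.%016x\n' "$PREFIX" $((OFFSET / OBJ_SIZE))
# prints: rbd_data.30e57d54dea573.0000000000000a00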

Given the fact that
- you probably use XFS on the OSDs since it is the preferred
filesystem for OSDs (there is RDR-XFS, but I've never had to use it)
- you would need to stop the complete ceph cluster (recovery tools do
not work on mounted filesystems)
- your cluster has been in use after the RBD was deleted and thus
parts of its former space might already have been overwritten
(replication might help you here, since there are two OSDs to try)
- XFS undelete does not work well on fragmented files (and OSDs tend
to introduce fragmentation...)

the answer is no, since it might not be feasible and the chances of
success are way too low.

If you want to spend time on it, I would propose to stop the ceph
cluster as soon as possible, create copies of all involved OSDs, 
start

the cluster again and attempt the recovery on the copies.

Regards,
Burkhard


Hi! Thanks for the info... I understand that this is a very difficult 
and probably not feasible task, but in case I need to try a recovery, what 
other info would I need? Can I somehow find out on which OSDs the 
specific data were stored and minimize my search there?

Any ideas on how I should proceed?


Best,

G.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Recover Data from Deleted RBD Volume

2016-08-08 Thread Georgios Dimitrakakis

Hi,


On 08.08.2016 10:50, Georgios Dimitrakakis wrote:

Hi,


On 08.08.2016 09:58, Georgios Dimitrakakis wrote:

Dear all,

I would like your help with an emergency issue but first let me 
describe our environment.


Our environment consists of 2 OSD nodes with 10x 2TB HDDs each and 
3 MON nodes (2 of them are the OSD nodes as well), all with ceph 
version 0.80.9 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047)


This environment provides RBD volumes to an OpenStack Icehouse 
installation.


Although not a state-of-the-art environment, it is working well and 
within our expectations.


The issue now is that one of our users accidentally deleted one of 
the volumes without keeping its data first!


Is there any way (since the data are considered critical and very 
important) to recover them from CEPH?


Short answer: no

Long answer: no, but

Consider the way Ceph stores data... each RBD is striped into 
chunks
(RADOS objects with 4MB size by default); the chunks are 
distributed
among the OSDs with the configured number of replicas (probably 
two
in your case since you use 2 OSD hosts). RBD uses thin 
provisioning,

so chunks are allocated upon first write access.
If an RBD is deleted all of its chunks are deleted on the
corresponding OSDs. If you want to recover a deleted RBD, you need 
to

recover all individual chunks. Whether this is possible depends on
your filesystem and whether the space of a former chunk is already
assigned to other RADOS objects. The RADOS object names are 
composed

of the RBD name and the offset position of the chunk, so if an
undelete mechanism exists for the OSDs' filesystem, you have to be
able to recover files by their filenames; otherwise you might end up
mixing the content of various deleted RBDs. Due to the thin
provisioning there might be some chunks missing (e.g. never 
allocated

before).

Given the fact that
- you probably use XFS on the OSDs since it is the preferred
filesystem for OSDs (there is RDR-XFS, but I've never had to use 
it)
- you would need to stop the complete ceph cluster (recovery tools 
do

not work on mounted filesystems)
- your cluster has been in use after the RBD was deleted and thus
parts of its former space might already have been overwritten
(replication might help you here, since there are two OSDs to try)
- XFS undelete does not work well on fragmented files (and OSDs 
tend

to introduce fragmentation...)

the answer is no, since it might not be feasible and the chances of
success are way too low.

If you want to spend time on it, I would propose to stop the ceph
cluster as soon as possible, create copies of all involved OSDs, 
start

the cluster again and attempt the recovery on the copies.

Regards,
Burkhard


Hi! Thanks for the info... I understand that this is a very difficult 
and probably not feasible task, but in case I need to try a recovery, 
what other info would I need? Can I somehow find out on which OSDs 
the specific data were stored and minimize my search there?

Any ideas on how I should proceed?

First of all you need to know the exact object names for the RADOS
objects. As mentioned before, the name is composed of the RBD name 
and

an offset.

In case of OpenStack, there are three different patterns for RBD 
names:

the image UUID, e.g. 50f2a0bd-15b1-4dbb-8d1f-fc43ce535f13,
for glance images,
the instance UUID plus a "_disk" suffix,
e.g. 9aec1f45-9053-461e-b176-c65c25a48794_disk, for nova images,
and "volume-" plus the volume UUID,
e.g. volume-0ca52f58-7e75-4b21-8b0f-39cbcd431c42, for cinder volumes

(not considering snapshots etc, which might use different patterns)

The RBD chunks are created using a certain prefix (using examples
from our openstack setup):

# rbd -p os-images info 8fa3d9eb-91ed-4c60-9550-a62f34aed014
rbd image '8fa3d9eb-91ed-4c60-9550-a62f34aed014':
size 446 MB in 56 objects
order 23 (8192 kB objects)
block_name_prefix: rbd_data.30e57d54dea573
format: 2
features: layering, striping
flags:
stripe unit: 8192 kB
stripe count: 1

# rados -p os-images ls | grep rbd_data.30e57d54dea573
rbd_data.30e57d54dea573.0015
rbd_data.30e57d54dea573.0008
rbd_data.30e57d54dea573.000a
rbd_data.30e57d54dea573.002d
rbd_data.30e57d54dea573.0032

I don't know whether the prefix is derived from some other
information, but to recover the RBD you definitely need it.
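
For what it's worth, my understanding is that for format 2 images the
prefix is "rbd_data." followed by an internal image id, and that id
normally lives in two places in the pool, both of which are removed
together with the image (pool and image names below are placeholders):

# The image id is stored in an object named rbd_id.<image name>:
rados -p volumes get rbd_id.volume-0ca52f58-7e75-4b21-8b0f-39cbcd431c42 - | strings

# The pool-wide name <-> id mapping is kept in the omap of rbd_directory:
rados -p volumes listomapvals rbd_directory

So once the image has been deleted, neither object is left to read the
prefix from.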

_If_ you are able to recover the prefix, you can use 'ceph osd map'
to find the OSDs for each chunk:

# ceph osd map os-images rbd_data.30e57d54dea573.001a
osdmap e418590 pool 'os-images' (38) object
'rbd_data.30e57d54dea573.001a' -> pg 38.d5d81d65 (38.65)
-> up ([45,17,108], p45) acting ([45,17,108], p45)

With 20 OSDs in your case you will likely have to process all of them
if the RBD has a size of several GBs.
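
If the prefix were known, a rough sketch of narrowing the search down per
chunk might look like this (pool name, prefix and image size below are
placeholders; 'ceph osd map' only computes placement, so it works even
though the objects themselves are gone):

PREFIX=rbd_data.30e57d54dea573    # hypothetical prefix of the deleted RBD
CHUNKS=$((40 * 1024 / 4))         # e.g. a 40 GB image with 4 MB objects -> 10240 chunks

for i in $(seq 0 $((CHUNKS - 1))); do
    printf '%s.%016x\n' "$PREFIX" "$i"
done | while read -r OBJ; do
    ceph osd map rbd "$OBJ"       # "rbd" is a placeholder pool name
done > chunk_placement.txt        # one "up/acting" line per chunk

With only two OSD hosts and two replicas, though, the point above stands:
in practice every OSD will hold some of the chunks.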

Regards,
Burkhard



Is it possible to get the prefix if the RBD has been deleted already? 
Is this info stored somewhere? Can I retrieve it in another way 
besides "rbd info"? Because when I try to get it usi

Re: [ceph-users] Recover Data from Deleted RBD Volume

2016-08-21 Thread Georgios Dimitrakakis


As a closing note, I would like to thank everyone who contributed their 
knowledge to my problem, although the final decision was not to try 
any sort of recovery, since the effort required would have been 
tremendous, with uncertain results (to say the least).


Jason, Ilya, Brad, David, George, Burkhard thank you very much for your 
contribution


Kind regards,

G.


On Wed, Aug 10, 2016 at 10:55 AM, Ilya Dryomov  
wrote:

I think Jason meant to write "rbd_id." here.



Whoops -- thanks for the typo correction.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CentOS7 Mounting Problem

2017-03-23 Thread Georgios Dimitrakakis

Hello Ceph community!

I would like some help with a new CEPH installation.

I have installed Jewel on CentOS7, and after a reboot my OSDs are not 
mounted automatically; as a consequence ceph is not operating 
normally...


What can I do?

Could you please help me solve the problem?


Regards,

G.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
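
The usual first checks for this symptom on a Jewel/CentOS7 node, as a
rough sketch (this assumes a standard ceph-disk based deployment; the OSD
id below is a placeholder):

# Make sure the ceph units are enabled, and see why an OSD did not start:
systemctl enable ceph.target
systemctl status ceph-osd@0        # repeat for each OSD id on the node

# Jewel runs the daemons as the "ceph" user; wrong ownership after an
# upgrade or manual intervention prevents the OSDs from starting:
chown -R ceph:ceph /var/lib/ceph

# ceph-disk/udev is responsible for mounting the data partitions; it can
# be run by hand to surface errors:
ceph-disk list
ceph-disk activate-all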

