[ceph-users] v0.61.2 'ceph df' can't use

2013-05-20 Thread Kelvin_Huang
Hi all,
I upgraded Ceph to v0.62, but now the 'ceph df' command doesn't work; it
shows 'unrecognized command'. Why? Does it need other options?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Hardware Sizing

2013-05-20 Thread Bjorn Mork
Hi Team,

This is my first post to this community.

I have some basic queries to get started with the Ceph software... I found that
http://www.supermicro.com.tw/products/system/2U/6027/SSG-6027R-E1R12T.cfm is
being recommended as a starting storage server.

My target is to start with a 12 TB solution (production environment, high
performance) keeping three copies of my data. I am confused about the following:

1.   How many servers will be required, i.e. OSD, MON, MDS (with the above
mentioned chassis)?

2.   Should I assign a separate role to each server, or will a single server
be good enough?

3. How many RAID cards will be required in each server?
3.1   I mean, can separate cards be configured for reads and writes or not? I
need the best performance and throughput.

Can anyone suggest? Thanks in advance...

B~Mork
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Kernel panic on rbd map when cluster is out of monitor quorum

2013-05-20 Thread Joe Ryner
Hi,

My kernels are running:
3.8.11-200.fc18.x86_64 and 3.8.9-200.fc18.x86_64

My cephx settings are below

auth cluster required = cephx
auth service required = cephx
auth client required = cephx

I will be working on my test cluster later this week and will try to reproduce 
and will file a bug then.

Joe


- Original Message -
From: Sage Weil s...@inktank.com
To: Joe Ryner jry...@cait.org
Cc: ceph-users@lists.ceph.com
Sent: Friday, May 17, 2013 4:01:35 PM
Subject: Re: [ceph-users] Kernel panic on rbd map when cluster is out of 
monitor quorum

On Fri, 17 May 2013, Joe Ryner wrote:
 Hi All,
 
 I have had an issue recently while working on my ceph clusters.  The 
 following issue seems to be true on bobtail and cuttlefish.  I have two 
 production clusters in two different data centers and a test cluster.  We are 
 using ceph to run virtual machines.  I use rbd as block devices for sanlock.
 
 I am running Fedora 18.
 
 I have been moving monitors around and in the process I got the cluster 
 out of quorum, so ceph stopped responding.  During this time I decided 
 to reboot a ceph node that performs an rbd map during startup.  The 
 system boots ok but the service script that is performing the rbd map 
 doesn't finish and eventually the system will OOPS and then finally 
 panic.  I was able to disable the rbd map during boot and finally got 
 the cluster back in quorum and everything settled down nicely.

What kernel version?  Are you using cephx authentication?  If you could 
open a bug at tracker.ceph.com that would be most helpful!

 Question: has anyone seen this crashing/panic behavior?  I have seen it 
 happen on both of my production clusters.
 Secondly, the ceph command hangs when the cluster is out of quorum; is there 
 a timeout available?

Not currently.  You can do this yourself with 'timeout 120 ...' with any 
recent coreutils.
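For example, something along these lines (GNU timeout exits with status 124 
when the command times out; the 120 seconds is just an illustration):

timeout 120 ceph health
if [ $? -eq 124 ]; then
    echo "ceph did not respond within 120s (monitors may be out of quorum)" >&2
fi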

Thanks-
sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RGW

2013-05-20 Thread Gandalf Corvotempesta
Hi,
I'm receiving an EntityTooLarge error when trying to upload an object of 100MB.

I've already set LimitRequestBody to 0 in Apache. Anything else to check?
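If the gateway sits behind mod_fcgid rather than mod_fastcgi, its own body 
limit may also apply; something like this in the vhost might be worth checking 
(FcgidMaxRequestLen defaults to only 131072 bytes, the hostname is just an example):

<VirtualHost *:80>
    ServerName rgw.example.com
    # Apache core request body limit (0 = unlimited)
    LimitRequestBody 0
    # mod_fcgid's own limit; raise it above the largest expected upload
    FcgidMaxRequestLen 1073741824
</VirtualHost>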
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] planning ceph from admin perspective. Ceph calculator?

2013-05-20 Thread Ugis
Hi,

It seems more and more that Ceph is out in the wild - people are using it in
production, development is speeding up, etc.

Still, how does one pick the configuration that suits one's needs?

For example, we wish to replace older IBM/HP SAN storage with Ceph. We know
the IOPS and bandwidth capabilities of those arrays, but there is no Ceph
calculator to estimate how many OSDs/hosts we need to match the existing
storage performance.

We have done some tests internally with up to 7 OSDs (2-3 hosts), but
increasing the OSD count in such small steps does not influence Ceph
performance consistently or linearly enough to extrapolate up to the needed
performance. I have been following the performance figures on the mailing
lists, Mark's performance tests, and Inktank's reference architecture at
http://www.inktank.com/resource/ceph-reference-architecture/ (btw, that doc
mentions another Multi-Rack Object Storage Reference Architecture which I
cannot find - has anyone found it?). In the end I have come to the wild guess
that starting with ~24 spinning OSDs across 2-3 hosts should match our needed
starting performance. At this point it would be helpful to have an estimation
tool, or a reliable reference confirming that the estimate is realistic :).

Sure, Ceph is a dynamic creature and many parameters influence the resulting
performance (SSD vs. spinning HDD, network, filesystems, journals, replication
count, etc.), but still, when people face the question can we do it with
Ceph?, some methodology or a tool like a Ceph calculator could help estimate
the needed hardware and consequently the expected investment. Admins need to
convince management of a certain solution at times, right? :)

I have been thinking about solutions to this information gap and I see two
complementary options:
1) Create a publicly available reference list of Ceph configurations and their
performance, gathered from real deployments, where people can add their
clusters and compare. A standardized approach for conducting the tests must be
in place - to compare apples to apples. For example, people could specify
their OSD host count, OSD count, OSD filesystem, server model, CPU, RAM,
network interfaces (type, speed), Ceph version, replica count and the like,
plus provide standardized performance test results for their cluster, like
rados bench and fio tests with 4K and 4M block sizes, random/sequential,
read/write (see the example commands after this list).
Others could look for a matching working configuration and compare it to
their clusters. This would give newcomers real examples to start from, and
existing Ceph users ideas for possible tuning.

2) Develop a theoretical Ceph calculator, or a formula, where one can specify
the needed performance characteristics (IOPS, bandwidth, size) and the planned
HW parameters (if available) and get an estimated Ceph configuration (needed
hosts, CPUs, RAM, OSDs, network). This should take into consideration HDD
count, size, smallest IOPS per HDD, network latency, RAM, replica count,
connection type to Ceph (direct via the kernel client, userland, via an
FC/iSCSI proxy, etc.) and other influencing parameters. There will always be a
place for advanced know-how tuning; this would just be for rough estimates to
get started.
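As a rough illustration of what the standardized test set from point 1 could
look like (pool name, file path and run times are just placeholders):

# RADOS level: 4M writes then sequential reads, 16 concurrent ops
rados bench -p testpool 60 write -b 4194304 -t 16 --no-cleanup
rados bench -p testpool 60 seq -t 16

# block level (e.g. on a mounted RBD image): 4K random mixed read/write with fio
fio --name=randrw4k --filename=/mnt/rbd/testfile --size=4G \
    --rw=randrw --rwmixread=70 --bs=4k --ioengine=libaio \
    --iodepth=16 --direct=1 --runtime=60 --time_based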

Both things seem to naturally land at http://wiki.ceph.com/ and be hosted
there as the current central Ceph knowledge base, right? :)

What do you think about the chances of implementing both?

Ugis
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Hardware Sizing

2013-05-20 Thread Mark Nelson

On 05/20/2013 08:01 AM, Bjorn Mork wrote:

Hi Team,

This is my first post to this community.

I have some basic queries to get started with the Ceph software... I found that
http://www.supermicro.com.tw/products/system/2U/6027/SSG-6027R-E1R12T.cfm is
being recommended as a starting storage server.


This is a reasonable server for a basic Ceph POC using spinning disks 
with no SSD journals.  Since it's using on-board ethernet and RAID, it 
should be relatively inexpensive, but if any of the on-board components 
fail the whole motherboard has to be replaced.  It's a good starting 
point though.




My target is to start with a 12 TB solution (production environment,
high performance) keeping three copies of my data. I am confused about the following:

1.   How many servers will be required, i.e. OSD, MON, MDS (with the above
mentioned chassis)?


For production you should have at least 3 MONs.  You only need an MDS if 
you plan to use CephFS.  We tend to recommend 1 OSD per disk for most 
configurations.




2.   Should I assign a separate role to each server, or will a single server
be good enough?


You want each MON on a different server, and for a production deployment 
I really don't like seeing less than 5 servers for OSDs.  You can 
technically run a single mon and all of your OSDs on 1 server, but it's 
not really what Ceph was designed for.




3. How many RAID cards will be required in each server?
3.1   I mean, can separate cards be configured for reads and writes or not? I
need the best performance and throughput.


There are a lot of different ways you can configure Ceph servers with 
various trade-offs.  A general rule of thumb is that you want at least 
3-5 servers for OSDs (and preferably more), and for high performance, SSD 
journals or at the very least a controller with writeback cache, and 1 OSD per disk.
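For the SSD-journal variant, a minimal ceph.conf sketch (device paths and
host names here are just placeholders) simply points each OSD's journal at
its own SSD partition:

[osd.0]
        host = ceph-node1
        # partition on the shared journal SSD
        osd journal = /dev/sdb1

[osd.1]
        host = ceph-node1
        osd journal = /dev/sdb2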


You may be interested in some of our performance comparison tests:

http://ceph.com/community/ceph-performance-part-1-disk-controller-write-throughput/
http://ceph.com/community/ceph-performance-part-2-write-throughput-without-ssd-journals/
http://ceph.com/uncategorized/argonaut-vs-bobtail-performance-preview/

Mark



Can anyone suggest? Thanks in advance...

B~Mork


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] scrub error: found clone without head

2013-05-20 Thread Olivier Bonvalet
Great, thanks. I will follow this issue and add information if needed.

Le lundi 20 mai 2013 à 17:22 +0300, Dzianis Kahanovich a écrit :
 http://tracker.ceph.com/issues/4937
 
 For me it progressed all the way to reinstalling Ceph and restoring the data
 from backup (I helped Ceph die, but IMHO that was self-provocation to force a
 reinstall). Now (at least until my summer outdoors) I keep v0.62 (3 nodes) with
 every pool at size=3 min_size=2 (it was size=2 min_size=1).
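 For reference, those settings can be changed per pool at runtime with
 something like (the pool name is just an example):
 
 ceph osd pool set rbd size 3
 ceph osd pool set rbd min_size 2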
 
 But first try doing nothing and installing the latest version. And add your
 vote to issue #4937 to push the developers.
 
 Olivier Bonvalet пишет:
  Le mardi 07 mai 2013 à 15:51 +0300, Dzianis Kahanovich a écrit :
  I have 4 scrub errors (3 PGs with found clone without head) on one OSD. They
  are not repairing. How can I repair this without re-creating the OSD?
 
  Right now it is easy to wipe and re-create the OSD, but in theory - if
  multiple OSDs are affected - it could cause data loss.
 
  -- 
  WBR, Dzianis Kahanovich AKA Denis Kaganovich, 
  http://mahatma.bspu.unibel.by/
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
  
  
  Hi,
  
  I have the same problem: 8 objects (4 PGs) with the error found clone without
  head. How can I fix that?
  
  thanks,
  Olivier
  
  
  
 
 
 -- 
 WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Setting OSD weight

2013-05-20 Thread Alex Bligh
How do I set the weight for OSDs? I have 4 OSDs I want to create
with very low weight (1) so they are never used if any other OSDs
are added subsequently (and would like to avoid placement groups).

These OSDs have been created with default settings using the manual
OSD add procedure as per ceph docs. But (unless I am being stupid
which is quite possible), setting the weight (either to 0.0001 or
to 2) appears to have no effect per a ceph osd dump.

-- 
Alex Bligh



root@kvm:~# ceph osd dump
 
epoch 12
fsid ed0e2e56-bc17-4ef2-a1db-b030c77a8d45
created 2013-05-20 14:58:02.250461
modified 2013-05-20 14:59:54.580601
flags 

pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 
320 pgp_num 320 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash rjenkins 
pg_num 320 pgp_num 320 last_change 1 owner 0
pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 
320 pgp_num 320 last_change 1 owner 0

max_osd 4
osd.0 up   in  weight 1 up_from 2 up_thru 10 down_at 0 last_clean_interval 
[0,0) 10.161.208.1:6800/30687 10.161.208.1:6801/30687 10.161.208.1:6803/30687 
exists,up 9cc2a2cf-e79e-404b-9b49-55c8954b0684
osd.1 up   in  weight 1 up_from 4 up_thru 11 down_at 0 last_clean_interval 
[0,0) 10.161.208.1:6804/30800 10.161.208.1:6806/30800 10.161.208.1:6807/30800 
exists,up 11628f8d-8234-4329-bf6e-e130d76f18f5
osd.2 up   in  weight 1 up_from 3 up_thru 11 down_at 0 last_clean_interval 
[0,0) 10.161.208.1:6809/30913 10.161.208.1:6810/30913 10.161.208.1:6811/30913 
exists,up 050c8955-84aa-4025-961a-f9d9fe60a5b0
osd.3 up   in  weight 1 up_from 5 up_thru 11 down_at 0 last_clean_interval 
[0,0) 10.161.208.1:6812/31024 10.161.208.1:6813/31024 10.161.208.1:6814/31024 
exists,up bcd4ad0e-c0e4-4c46-95c2-e68906f8e69a


root@kvm:~# ceph osd crush set 0 2 root=default
set item id 0 name 'osd.0' weight 2 at location {root=default} to crush map
root@kvm:~# ceph osd dump
 
epoch 14
fsid ed0e2e56-bc17-4ef2-a1db-b030c77a8d45
created 2013-05-20 14:58:02.250461
modified 2013-05-20 15:13:21.009317
flags 

pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 
320 pgp_num 320 last_change 1 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash rjenkins 
pg_num 320 pgp_num 320 last_change 1 owner 0
pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins pg_num 
320 pgp_num 320 last_change 1 owner 0

max_osd 4
osd.0 up   in  weight 1 up_from 2 up_thru 13 down_at 0 last_clean_interval 
[0,0) 10.161.208.1:6800/30687 10.161.208.1:6801/30687 10.161.208.1:6803/30687 
exists,up 9cc2a2cf-e79e-404b-9b49-55c8954b0684
osd.1 up   in  weight 1 up_from 4 up_thru 13 down_at 0 last_clean_interval 
[0,0) 10.161.208.1:6804/30800 10.161.208.1:6806/30800 10.161.208.1:6807/30800 
exists,up 11628f8d-8234-4329-bf6e-e130d76f18f5
osd.2 up   in  weight 1 up_from 3 up_thru 13 down_at 0 last_clean_interval 
[0,0) 10.161.208.1:6809/30913 10.161.208.1:6810/30913 10.161.208.1:6811/30913 
exists,up 050c8955-84aa-4025-961a-f9d9fe60a5b0
osd.3 up   in  weight 1 up_from 5 up_thru 11 down_at 0 last_clean_interval 
[0,0) 10.161.208.1:6812/31024 10.161.208.1:6813/31024 10.161.208.1:6814/31024 
exists,up bcd4ad0e-c0e4-4c46-95c2-e68906f8e69a

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.61.2 'ceph df' can't use

2013-05-20 Thread Sage Weil
On Mon, 20 May 2013, kelvin_hu...@wiwynn.com wrote:
 Hi all,
 I upgraded Ceph to v0.62, but now the 'ceph df' command doesn't work; it 
 shows 'unrecognized command'. Why? Does it need other options?

Make sure the ceph-mon daemons have been restarted so that they are 
running the new version.
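For example, with the sysvinit script, something like this on each monitor 
host (the mon id 'a' is just an example):

service ceph restart mon.a     # or: /etc/init.d/ceph restart mon.a
ceph --version                 # the CLI you run 'ceph df' from should also be the new version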

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Setting OSD weight

2013-05-20 Thread Sage Weil
Look at 'ceph osd tree'.  The weight value in 'ceph osd dump' output is 
the in/out correction, not the crush weight.
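For example (the ids and weights here are only illustrative):

ceph osd tree                       # shows the CRUSH hierarchy and CRUSH weights
ceph osd crush reweight osd.0 2.0   # sets the CRUSH weight used for data placement
ceph osd reweight 0 0.5             # sets the in/out correction (0.0 - 1.0) shown by 'ceph osd dump'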

s

On Mon, 20 May 2013, Alex Bligh wrote:

 How do I set the weight for OSDs? I have 4 OSDs I want to create
 with very low weight (1) so they are never used if any other OSDs
 are added subsequently (and would like to avoid placement groups).
 
 These OSDs have been created with default settings using the manual
 OSD add procedure as per ceph docs. But (unless I am being stupid
 which is quite possible), setting the weight (either to 0.0001 or
 to 2) appears to have no effect per a ceph osd dump.
 
 -- 
 Alex Bligh
 
 
 
 root@kvm:~# ceph osd dump
  
 epoch 12
 fsid ed0e2e56-bc17-4ef2-a1db-b030c77a8d45
 created 2013-05-20 14:58:02.250461
 modified 2013-05-20 14:59:54.580601
 flags 
 
 pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins 
 pg_num 320 pgp_num 320 last_change 1 owner 0 crash_replay_interval 45
 pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash rjenkins 
 pg_num 320 pgp_num 320 last_change 1 owner 0
 pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins 
 pg_num 320 pgp_num 320 last_change 1 owner 0
 
 max_osd 4
 osd.0 up   in  weight 1 up_from 2 up_thru 10 down_at 0 last_clean_interval 
 [0,0) 10.161.208.1:6800/30687 10.161.208.1:6801/30687 10.161.208.1:6803/30687 
 exists,up 9cc2a2cf-e79e-404b-9b49-55c8954b0684
 osd.1 up   in  weight 1 up_from 4 up_thru 11 down_at 0 last_clean_interval 
 [0,0) 10.161.208.1:6804/30800 10.161.208.1:6806/30800 10.161.208.1:6807/30800 
 exists,up 11628f8d-8234-4329-bf6e-e130d76f18f5
 osd.2 up   in  weight 1 up_from 3 up_thru 11 down_at 0 last_clean_interval 
 [0,0) 10.161.208.1:6809/30913 10.161.208.1:6810/30913 10.161.208.1:6811/30913 
 exists,up 050c8955-84aa-4025-961a-f9d9fe60a5b0
 osd.3 up   in  weight 1 up_from 5 up_thru 11 down_at 0 last_clean_interval 
 [0,0) 10.161.208.1:6812/31024 10.161.208.1:6813/31024 10.161.208.1:6814/31024 
 exists,up bcd4ad0e-c0e4-4c46-95c2-e68906f8e69a
 
 
 root@kvm:~# ceph osd crush set 0 2 root=default
 set item id 0 name 'osd.0' weight 2 at location {root=default} to crush map
 root@kvm:~# ceph osd dump
  
 epoch 14
 fsid ed0e2e56-bc17-4ef2-a1db-b030c77a8d45
 created 2013-05-20 14:58:02.250461
 modified 2013-05-20 15:13:21.009317
 flags 
 
 pool 0 'data' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins 
 pg_num 320 pgp_num 320 last_change 1 owner 0 crash_replay_interval 45
 pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash rjenkins 
 pg_num 320 pgp_num 320 last_change 1 owner 0
 pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins 
 pg_num 320 pgp_num 320 last_change 1 owner 0
 
 max_osd 4
 osd.0 up   in  weight 1 up_from 2 up_thru 13 down_at 0 last_clean_interval 
 [0,0) 10.161.208.1:6800/30687 10.161.208.1:6801/30687 10.161.208.1:6803/30687 
 exists,up 9cc2a2cf-e79e-404b-9b49-55c8954b0684
 osd.1 up   in  weight 1 up_from 4 up_thru 13 down_at 0 last_clean_interval 
 [0,0) 10.161.208.1:6804/30800 10.161.208.1:6806/30800 10.161.208.1:6807/30800 
 exists,up 11628f8d-8234-4329-bf6e-e130d76f18f5
 osd.2 up   in  weight 1 up_from 3 up_thru 13 down_at 0 last_clean_interval 
 [0,0) 10.161.208.1:6809/30913 10.161.208.1:6810/30913 10.161.208.1:6811/30913 
 exists,up 050c8955-84aa-4025-961a-f9d9fe60a5b0
 osd.3 up   in  weight 1 up_from 5 up_thru 11 down_at 0 last_clean_interval 
 [0,0) 10.161.208.1:6812/31024 10.161.208.1:6813/31024 10.161.208.1:6814/31024 
 exists,up bcd4ad0e-c0e4-4c46-95c2-e68906f8e69a
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Setting OSD weight

2013-05-20 Thread Alex Bligh

On 20 May 2013, at 17:19, Sage Weil wrote:

 Look at 'ceph osd tree'.  The weight value in 'ceph osd dump' output is 
 the in/out correction, not the crush weight.

Doh. Thanks.

Is there a difference between:
  ceph osd crush set 0 2 root=default
and
  ceph osd crush reweight osd.0 2
?

-- 
Alex Bligh




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Growing the cluster.

2013-05-20 Thread Nicolas Fernandez
Hello,
I'm deploying a test cluster on version 0.61.2 across two nodes (OSD/MDS)
and another one (MON).
I have a problem growing my cluster: today I added an OSD to a node that
already had an OSD. I did a reweight and added a replica. The crushmap is up
to date, but now I'm getting some pgs stuck unclean. I've been checking the
tunables options but that hasn't solved the issue. How can I fix the health
of the cluster?

My cluster status:

# ceph -s
   health HEALTH_WARN 192 pgs degraded; 177 pgs stuck unclean; recovery
10910/32838 degraded (33.224%); clock skew detected on mon.b
   monmap e1: 3 mons at {a=
192.168.2.144:6789/0,b=192.168.2.194:6789/0,c=192.168.2.145:6789/0},
election epoch 148, quorum 0,1,2 a,b,c
   osdmap e576: 3 osds: 3 up, 3 in
pgmap v17715: 576 pgs: 79 active, 305 active+clean, 98 active+degraded,
94 active+clean+degraded; 1837 MB data, 6778 MB used, 440 GB / 446 GB
avail; 10910/32838 degraded (33.224%)
   mdsmap e136: 1/1/1 up {0=a=up:active}

The replica configuration is:

pool 0 'data' rep size 3 min_size 2 crush_ruleset 0 object_hash rjenkins
pg_num 192 pgp_num 192 last_change 576 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash
rjenkins pg_num 192 pgp_num 192 last_change 556 owner 0
pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins
pg_num 192 pgp_num 192 last_change 1 owner 0

OSD Tree:

#ceph osd tree

# id    weight  type name           up/down reweight
-1      3       root default
-3      3       rack unknownrack
-2      1       host ceph01
0       1       osd.0               up      1
-4      2       host ceph02
1       1       osd.1               up      1
2       1       osd.2               up      1

Thanks.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.61.2 'ceph df' can't use

2013-05-20 Thread Joao Eduardo Luis

On 05/20/2013 10:28 AM, kelvin_hu...@wiwynn.com wrote:

Hi all,
I upgraded Ceph to v0.62, but now the 'ceph df' command doesn't work; it
shows 'unrecognized command'. Why? Does it need other options?


Have you made sure both the client and the monitors have been upgraded 
accordingly?  Have you restarted your monitors if so?  All of them?


  -Joao

--
Joao Eduardo Luis
Software Engineer | http://inktank.com | http://ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW High Availability

2013-05-20 Thread Igor Laskovy
Hi all,

Well, it looks like DragonDisk (http://www.dragondisk.com/) deals with RRDNS
just fine - it simply runs against both RGWs ;)

But what I actually need to know now is why RGW does not start at boot time,
with an Initialization timeout, failed to initialize error in the logs. It
runs successfully when started by hand afterwards.


On Thu, May 9, 2013 at 7:28 PM, Dimitri Maziuk dmaz...@bmrb.wisc.eduwrote:

 On 05/09/2013 09:57 AM, Tyler Brekke wrote:
  For High availability RGW you would need a load balancer. HA Proxy is
  an example of a load balancer that has been used successfully with
  rados gateway endpoints.

 Strictly speaking for HA you need an HA solution. E.g. heartbeat. Main
 difference between that and load balancing is that one server serves the
 clients until it dies, then another takes over. With load balancing, all
 servers get a share of the requests. It can be configured to do HA: set
 main server's share to 100%, then the backup will get no requests as
 long as the main is up.
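 A minimal haproxy sketch of that active/backup idea (addresses and ports are
 just examples):
 
 listen radosgw
     bind *:80
     mode http
     server rgw1 10.0.0.11:80 check
     server rgw2 10.0.0.12:80 check backup   # only used while rgw1 is down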

 RRDNS is a load balancing solution. Dep. on the implementation it can
 simply return a list of IPs instead of a single IP for the host name,
 then it's up to the client to pick one. A simple stupid client may
 always pick the first one. A simple stupid server may always return the
 list in the same order. That could be how all your clients always pick
 the same server.

 --
 Dimitri Maziuk
 Programmer/sysadmin
 BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




-- 
Igor Laskovy
facebook.com/igor.laskovy
studiogrizzly.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Determining when an 'out' OSD is actually unused

2013-05-20 Thread Dan Mick



On 05/20/2013 01:33 PM, Alex Bligh wrote:

If I want to remove an OSD, I use 'ceph osd out' before taking it down, i.e.
stopping the OSD process and removing the disk.

How do I (preferably programmatically) tell when it is safe to stop the OSD
process? The documentation says 'ceph -w', which is not especially helpful (a)
if I want to do it programmatically, or (b) if there are other problems in the
cluster so Ceph was not reporting HEALTH_OK to start with.

Is there a better way?



We've had some discussions about this recently, but there's no great way 
of doing this right now.  We should probably have a query option that 
returns number of PGs on this OSD or some such.
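In the meantime, a rough approximation is to count the PGs whose up/acting 
sets still reference the OSD in 'ceph pg dump' output; the exact columns vary 
by version, so treat this as a sketch (osd id 3 is just an example):

osd=3
ceph pg dump 2>/dev/null | awk -v osd="$osd" '
    /^[0-9]+\./ {
        for (i = 1; i <= NF; i++)
            if ($i ~ ("^\\[([0-9]+,)*" osd "(,[0-9]+)*\\]$")) { n++; break }
    }
    END { print n+0 " PGs still reference osd." osd }'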



--
Dan Mick, Filesystem Engineering
Inktank Storage, Inc.   http://inktank.com
Ceph docs: http://ceph.com/docs
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Growing the cluster.

2013-05-20 Thread Dan Mick
What does your crushmap look like?  There's a good chance you're 
choosing first hosts, and then OSDs, which means you can't come up with 
3 replicas (because there are only two hosts).


Try:
ceph -o my.crush.map osd getcrushmap
crushtool -i my.crush.map --test --output-csv

and then look at the .csv files created in that directory; that 
simulates some random object placements, and will let you know which 
OSDs the crushmap chose.  I bet you'll see that the data pool isn't 
replicating to 3 OSDs.
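If it isn't, the usual fix is in the crush rule itself; roughly (file names 
are just examples):

crushtool -d my.crush.map -o my.crush.txt
# rule 0 (used by the 'data' pool) will typically contain
#     step chooseleaf firstn 0 type host
# which can never place 3 replicas across only 2 hosts.  Either add a third
# host, set the data pool back to size 2, or, for a test cluster only,
# change the rule to pick individual OSDs:
#     step chooseleaf firstn 0 type osd
crushtool -c my.crush.txt -o my.crush.new
ceph osd setcrushmap -i my.crush.new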


On 05/20/2013 11:51 AM, Nicolas Fernandez wrote:

Hello,
I'm deploying a test cluster on version 0.61.2 across two nodes (OSD/MDS)
and another one (MON).
I have a problem growing my cluster: today I added an OSD to a node that
already had an OSD. I did a reweight and added a replica. The crushmap is up
to date, but now I'm getting some pgs stuck unclean. I've been checking the
tunables options but that hasn't solved the issue. How can I fix the health
of the cluster?

My cluster status:

# ceph -s
health HEALTH_WARN 192 pgs degraded; 177 pgs stuck unclean; recovery
10910/32838 degraded (33.224%); clock skew detected on mon.b
monmap e1: 3 mons at
{a=192.168.2.144:6789/0,b=192.168.2.194:6789/0,c=192.168.2.145:6789/0},
election epoch 148, quorum 0,1,2 a,b,c
osdmap e576: 3 osds: 3 up, 3 in
 pgmap v17715: 576 pgs: 79 active, 305 active+clean, 98
active+degraded, 94 active+clean+degraded; 1837 MB data, 6778 MB used,
440 GB / 446 GB avail; 10910/32838 degraded (33.224%)
mdsmap e136: 1/1/1 up {0=a=up:active}

The replica configuration is:

pool 0 'data' rep size 3 min_size 2 crush_ruleset 0 object_hash rjenkins
pg_num 192 pgp_num 192 last_change 576 owner 0 crash_replay_interval 45
pool 1 'metadata' rep size 2 min_size 1 crush_ruleset 1 object_hash
rjenkins pg_num 192 pgp_num 192 last_change 556 owner 0
pool 2 'rbd' rep size 2 min_size 1 crush_ruleset 2 object_hash rjenkins
pg_num 192 pgp_num 192 last_change 1 owner 0

OSD Tree:

#ceph osd tree

# id    weight  type name           up/down reweight
-1      3       root default
-3      3       rack unknownrack
-2      1       host ceph01
0       1       osd.0               up      1
-4      2       host ceph02
1       1       osd.1               up      1
2       1       osd.2               up      1

Thanks.




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com