Re: [ceph-users] RGW hammer/master woes

2015-03-05 Thread Pavan Rallabhandi
Is there anyone who is hitting this? or any help on this is much appreciated.

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Pavan 
Rallabhandi
Sent: Saturday, February 28, 2015 11:42 PM
To: ceph-us...@ceph.com
Subject: [ceph-users] RGW hammer/master woes

I am struggling to get a basic PUT through via the swift client, with RGW and
Ceph binaries built from the Hammer/master codebase, whereas the same command
on the same setup goes through with RGW and Ceph binaries built from Giant.

Below are an RGW log snippet and the command that was run. Am I missing
anything obvious here?

The user info looks like this:

{ "user_id": "johndoe",
  "display_name": "John Doe",
  "email": "j...@example.com",
  "suspended": 0,
  "max_buckets": 1000,
  "auid": 0,
  "subusers": [
        { "id": "johndoe:swift",
          "permissions": "full-control"}],
  "keys": [
        { "user": "johndoe",
          "access_key": "7B39L2TUQ448LZW4RI3M",
          "secret_key": "lshKCoacSlbyVc7mBLLr4cJ26fEEM22Tcmp29hT3"},
        { "user": "johndoe:swift",
          "access_key": "SHZ64EF7CIB4V42I14AH",
          "secret_key": ""}],
  "swift_keys": [
        { "user": "johndoe:swift",
          "secret_key": "asdf"}],
  "caps": [],
  "op_mask": "read, write, delete",
  "default_placement": "",
  "placement_tags": [],
  "bucket_quota": { "enabled": false,
      "max_size_kb": -1,
      "max_objects": -1},
  "user_quota": { "enabled": false,
      "max_size_kb": -1,
      "max_objects": -1},
  "temp_url_keys": []}


The command that was run and the logs:

snip

swift -A http://localhost:8989/auth -U johndoe:swift -K asdf upload mycontainer 
ceph

2015-02-28 23:28:39.272897 7fb610ff9700  1 == starting new request 
req=0x7fb5f0009990 =
2015-02-28 23:28:39.272913 7fb610ff9700  2 req 0:0.16::PUT 
/swift/v1/mycontainer/ceph::initializing
2015-02-28 23:28:39.272918 7fb610ff9700 10 host=localhost:8989
2015-02-28 23:28:39.272921 7fb610ff9700 20 subdomain= domain= in_hosted_domain=0
2015-02-28 23:28:39.272938 7fb610ff9700 10 meta HTTP_X_OBJECT_META_MTIME
2015-02-28 23:28:39.272945 7fb610ff9700 10 x 
x-amz-meta-mtime:1425140933.648506
2015-02-28 23:28:39.272964 7fb610ff9700 10 ver=v1 first=mycontainer req=ceph
2015-02-28 23:28:39.272971 7fb610ff9700 10 s->object=ceph s->bucket=mycontainer
2015-02-28 23:28:39.272976 7fb610ff9700  2 req 0:0.79:swift:PUT 
/swift/v1/mycontainer/ceph::getting op
2015-02-28 23:28:39.272982 7fb610ff9700  2 req 0:0.85:swift:PUT 
/swift/v1/mycontainer/ceph:put_obj:authorizing
2015-02-28 23:28:39.273008 7fb610ff9700 10 swift_user=johndoe:swift
2015-02-28 23:28:39.273026 7fb610ff9700 20 build_token 
token=0d006a6f686e646f653a73776966744436beb90402b13c4f53f35472c2cf0f
2015-02-28 23:28:39.273057 7fb610ff9700  2 req 0:0.000160:swift:PUT 
/swift/v1/mycontainer/ceph:put_obj:reading permissions
2015-02-28 23:28:39.273100 7fb610ff9700 15 Read AccessControlPolicy<AccessControlPolicy xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>johndoe</ID><DisplayName>John Doe</DisplayName></Owner><AccessControlList><Grant><Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="CanonicalUser"><ID>johndoe</ID><DisplayName>John Doe</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy>
2015-02-28 23:28:39.273114 7fb610ff9700  2 req 0:0.000216:swift:PUT 
/swift/v1/mycontainer/ceph:put_obj:init op
2015-02-28 23:28:39.273120 7fb610ff9700  2 req 0:0.000223:swift:PUT 
/swift/v1/mycontainer/ceph:put_obj:verifying op mask
2015-02-28 23:28:39.273123 7fb610ff9700 20 required_mask= 2 user.op_mask=7
2015-02-28 23:28:39.273125 7fb610ff9700  2 req 0:0.000228:swift:PUT 
/swift/v1/mycontainer/ceph:put_obj:verifying op permissions
2015-02-28 23:28:39.273129 7fb610ff9700  5 Searching permissions for 
uid=johndoe mask=50
2015-02-28 23:28:39.273131 7fb610ff9700  5 Found permission: 15
2015-02-28 23:28:39.273133 7fb610ff9700  5 Searching permissions for group=1 
mask=50
2015-02-28 23:28:39.273135 7fb610ff9700  5 Permissions for group not found
2015-02-28 23:28:39.273136 7fb610ff9700  5 Searching permissions for group=2 
mask=50
2015-02-28 23:28:39.273137 7fb610ff9700  5 Permissions for group not found
2015-02-28 23:28:39.273138 7fb610ff9700  5 Getting permissions id=johndoe 
owner=johndoe perm=2
2015-02-28 23:28:39.273140 7fb610ff9700 10  uid=johndoe requested perm 
(type)=2, policy perm=2, user_perm_mask=2, acl perm=2
2015-02-28 23:28:39.273143 7fb610ff9700  2 req 0:0.000246:swift:PUT 
/swift/v1/mycontainer/ceph:put_obj:verifying op params
2015-02-28 23:28:39.273146 7fb610ff9700  2 req 0:0.000249:swift:PUT 
/swift/v1/mycontainer/ceph:put_obj:executing
2015-02-28 23:28:39.273279 7fb610ff9700 10 x 
x-amz-meta-mtime:1425140933.648506
2015-02-28 23:28:39.273313 7fb610ff9700 20 get_obj_state: rctx=0x7fb610ff41f0 obj=mycontainer:ceph state=0x7fb5f0016940 s->prefetch_data=0
2015-02-28 23:28:39.274354 7fb610ff9700 20 get_obj_state: rctx=0x7fb610ff41f0 obj=mycontainer:ceph state=0x7fb5f0016940 s->prefetch_data=0
2015-02-28 23:28:39.274394 
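As a side note on the log above: the "verifying op mask" step (required_mask= 2 user.op_mask=7) is a plain bitwise check. A small sketch of that logic, assuming the conventional RGW bit values (read=0x01, write=0x02, delete=0x04 — I believe these match rgw_common.h, but treat them as an assumption):

```python
# Sketch of the op-mask check behind "required_mask= 2 user.op_mask=7".
# Bit values assumed: read=0x01, write=0x02, delete=0x04.
OP_READ, OP_WRITE, OP_DELETE = 0x01, 0x02, 0x04

def parse_op_mask(spec):
    # Turn the user-info string "read, write, delete" into a bitmask.
    bits = {"read": OP_READ, "write": OP_WRITE, "delete": OP_DELETE}
    return sum(bits[tok.strip()] for tok in spec.split(","))

def op_allowed(required_mask, user_op_mask):
    # The op is allowed when every required bit is present in the user's mask.
    return (required_mask & user_op_mask) == required_mask

user_op_mask = parse_op_mask("read, write, delete")
print(user_op_mask)                        # 7, matching user.op_mask=7 in the log
print(op_allowed(OP_WRITE, user_op_mask))  # a PUT requires write (2): True
```

So the PUT passes the op-mask stage; whatever fails on Hammer/master happens later in the request pipeline.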

Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-05 Thread Andrija Panic
Hi Robert,

it seems I have not listened well to your advice - I set the osd to out
instead of stopping it - and now, instead of some ~3% of degraded objects,
there are 0.000% degraded and around 6% misplaced - and rebalancing is
happening again, but this is a small percentage..

Do you know if later, when I remove this OSD from the crush map, no more data
will be rebalanced (as per the official Ceph documentation) - since the
already misplaced objects are getting distributed away to all the other nodes?

(after "service ceph stop osd.0" there was 2.45% degraded data - but no
backfilling was happening for some reason... it just stayed degraded... so
this is the reason I started the OSD back up and then set it to out...)

Thanks

On 4 March 2015 at 17:54, Andrija Panic andrija.pa...@gmail.com wrote:

 Hi Robert,

 I already have this stuff set. Ceph is 0.87.0 now...

 Thanks, will schedule this for the weekend; 10G network and 36 OSDs - it
 should move the data in less than 8h, per my last experience, which was
 around 8h, but that included some 1G OSDs...

 Thx!

 On 4 March 2015 at 17:49, Robert LeBlanc rob...@leblancnet.us wrote:

 You will most likely have a very high relocation percentage. Backfills
 always are more impactful on smaller clusters, but osd max backfills
 should be what you need to help reduce the impact. The default is 10,
 you will want to use 1.

 I didn't catch which version of Ceph you are running, but I think
 there was some priority work done in firefly to help make backfills
 lower priority. I think it has gotten better in later versions.

 On Wed, Mar 4, 2015 at 1:35 AM, Andrija Panic andrija.pa...@gmail.com
 wrote:
  Thank you Robert - I'm wondering, when I remove a total of 7 OSDs from the
  crush map, whether that will cause more than 37% of the data to be moved
  (80% or whatever).

  I'm also wondering if the throttling that I applied is fine or not - I will
  introduce the osd_recovery_delay_start 10sec as Irek said.

  I'm just wondering how much the performance impact will be, because:
  - when stopping an OSD, the impact while backfilling was fine, more or
  less - I can live with this
  - when I removed an OSD from the crush map - the impact was tremendous for
  the first 1h or so, and later on during the recovery process the impact was
  much less, but still noticeable...

  Thanks for the tip of course !
  Andrija
 
  On 3 March 2015 at 18:34, Robert LeBlanc rob...@leblancnet.us wrote:
 
  I would be inclined to shut down both OSDs in a node, let the cluster
  recover. Once it is recovered, shut down the next two, let it recover.
  Repeat until all the OSDs are taken out of the cluster. Then I would
  set nobackfill and norecover. Then remove the hosts/disks from the
  CRUSH then unset nobackfill and norecover.
 
  That should give you a few small changes (when you shut down OSDs) and
  then one big one to get everything in the final place. If you are
  still adding new nodes, you can add them in while nobackfill and norecover
  are set, so that the one big relocation fills the new drives too.
 
  On Tue, Mar 3, 2015 at 5:58 AM, Andrija Panic andrija.pa...@gmail.com
 
  wrote:
   Thx Irek. Number of replicas is 3.
  
    I have 3 servers with 2 OSDs each on a 1G switch (1 OSD already
    decommissioned), which is further connected to a new 10G switch/network
    with 3 servers on it with 12 OSDs each.
    I'm decommissioning the old 3 nodes on the 1G network...

    So you suggest removing a whole node with its 2 OSDs manually from the
    crush map? To my knowledge, ceph never places 2 replicas on 1 node; all 3
    replicas were originally distributed over all 3 nodes. So it should anyway
    be safe to remove 2 OSDs at once together with the node itself... since
    the replica count is 3...
    ?
  
   Thx again for your time
  
   On Mar 3, 2015 1:35 PM, Irek Fasikhov malm...@gmail.com wrote:
  
    Since you have only three nodes in the cluster,
    I recommend you add the new nodes to the cluster first, and then delete
    the old ones.
  
   2015-03-03 15:28 GMT+03:00 Irek Fasikhov malm...@gmail.com:
  
    What is your replication count?
  
   2015-03-03 15:14 GMT+03:00 Andrija Panic andrija.pa...@gmail.com
 :
  
   Hi Irek,
  
    yes, stopping the OSD (or setting it to OUT) resulted in only 3% of data
    degraded and moved/recovered.
    When I afterwards removed it from the crush map ("ceph osd crush rm id"),
    that's when the stuff with 37% happened.

    And thanks Irek for the help - could you kindly just let me know the
    preferred steps for removing a whole node?
    Do you mean I first stop all the OSDs again, or just remove each OSD from
    the crush map, or perhaps just decompile the crush map, delete the node
    completely, compile it back in, and let it heal/recover?

    Do you think this would result in less data being misplaced and moved
    around?

    Sorry for bugging you, I really appreciate your help.
  
   Thanks
  
   On 3 March 2015 at 12:58, Irek Fasikhov malm...@gmail.com
 wrote:
  
   A large percentage of the rebuild of the cluster map (But low
   

Re: [ceph-users] Ceph repo - RSYNC?

2015-03-05 Thread Wido den Hollander
On 03/05/2015 07:14 PM, Brian Rak wrote:
 Do any of the Ceph repositories run rsync?  We generally mirror the
 repository locally so we don't encounter any unexpected upgrades.
 
 eu.ceph.com used to run this, but it seems to be down now.
 
 # rsync rsync://eu.ceph.com
 rsync: failed to connect to eu.ceph.com: Connection refused (111)
 rsync error: error in socket IO (code 10) at clientserver.c(124)
 [receiver=3.0.6]
 

Argh! That rsync daemon somehow sometimes dies and I don't notice.

I'll see if I can fix this.
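Since the daemon dies silently, a tiny liveness probe run from cron (the hostname below is just the mirror from this thread; adapt it to your own setup) is one way to notice before users do — a sketch, not a finished monitoring solution:

```python
import socket

def rsync_alive(host, port=873, timeout=5.0):
    # True if a TCP connection to the rsync daemon port (873) succeeds.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical usage -- probe the mirror and alert on failure:
# print(rsync_alive("eu.ceph.com"))
```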

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on


Re: [ceph-users] rgw admin api - users

2015-03-05 Thread Yehuda Sadeh-Weinraub
The metadata api can do it:

GET /admin/metadata/user


Yehuda
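For completeness, a hedged sketch of calling that endpoint from Python — the admin API authenticates with S3-style (v2) signatures; the endpoint and credentials below are placeholders, not values from this thread, and the exact canonicalization can vary per setup:

```python
import base64, hashlib, hmac
import urllib.request
from email.utils import formatdate

def sign_v2(secret_key, method, resource, date):
    # AWS v2-style signature: HMAC-SHA1 over
    # "VERB\nContent-MD5\nContent-Type\nDate\nCanonicalizedResource".
    string_to_sign = "%s\n\n\n%s\n%s" % (method, date, resource)
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

def list_user_metadata(endpoint, access_key, secret_key):
    # GET /admin/metadata/user lists the user metadata keys.
    resource = "/admin/metadata/user"
    date = formatdate(usegmt=True)
    req = urllib.request.Request(endpoint + resource)
    req.add_header("Date", date)
    req.add_header("Authorization",
                   "AWS %s:%s" % (access_key, sign_v2(secret_key, "GET", resource, date)))
    return urllib.request.urlopen(req).read()

# Placeholder endpoint/keys -- substitute an admin user's credentials:
# print(list_user_metadata("http://rgw.example.com", "ACCESSKEY", "SECRETKEY"))
```

The caller also needs the `metadata=read` admin cap on the user whose keys sign the request.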

- Original Message -
 From: Joshua Weaver joshua.wea...@ctl.io
 To: ceph-us...@ceph.com
 Sent: Thursday, March 5, 2015 1:43:33 PM
 Subject: [ceph-users] rgw admin api - users
 
 According to the docs at
 http://docs.ceph.com/docs/master/radosgw/adminops/#get-user-info
 I should be able to invoke /admin/user without a quid specified, and get a
 list of users.
 No matter what I try, I get a 403.
 After looking at the source at github (ceph/ceph), it appears that there
 isn’t any code path that would result in a collection of users to be
 generated from that resource.
 
 Am I missing something?
 
 TIA,
 _josh
 
 


[ceph-users] client-ceph [can not connect from client][connect protocol feature mismatch]

2015-03-05 Thread Sonal Dubey
Hi,

I am a newbie to ceph and the ceph-users group. Recently I have been working
on a ceph client. It worked in all environments, but when I tested it in
production it is not able to connect to ceph.

Below are the operating system details and the error. If someone has seen
this problem before, any help is really appreciated.

OS -

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04.2 LTS
Release: 12.04
Codename: precise

2015-03-05 13:37:16.816322 7f5191deb700 -- 10.8.25.112:0/2487 >> 10.138.23.241:6789/0 pipe(0x12489f0 sd=3 pgs=0 cs=0 l=0).connect protocol feature mismatch, my 1ffa < peer 42041ffa missing 4204
2015-03-05 13:37:17.635776 7f5191deb700 -- 10.8.25.112:0/2487 >> 10.138.23.241:6789/0 pipe(0x12489f0 sd=3 pgs=0 cs=0 l=0).connect protocol feature mismatch, my 1ffa < peer 42041ffa missing 4204
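The "missing" value is just peer & ~my (the 4204 printed above looks truncated). A quick decode — the feature-bit names are my assumption based on ceph_features.h of that era, so treat them as such:

```python
# Decode the "feature mismatch" line: my 1ffa < peer 42041ffa.
my, peer = 0x1ffa, 0x42041ffa
missing = peer & ~my          # features the peer requires that the client lacks
print(hex(missing))           # 0x42040000

# Bit names assumed from ceph_features.h (hedged, not verified here):
names = {18: "CEPH_FEATURE_CRUSH_TUNABLES",
         25: "CEPH_FEATURE_CRUSH_TUNABLES2",
         30: "CEPH_FEATURE_CRUSH_V2"}
bits = [b for b in range(64) if missing >> b & 1]
print(bits)                   # [18, 25, 30]
print([names.get(b, "bit %d" % b) for b in bits])
```

If those bits are indeed the CRUSH tunables/v2 features, the usual remedies are a newer kernel client or relaxing the cluster's crush tunables.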


Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-05 Thread Robert LeBlanc
Setting an OSD out will start the rebalance with the degraded object count.
The OSD is still alive and can participate in the relocation of the objects.
This is preferable so that you don't drop below min_size if a disk fails
during the rebalance, at which point I/O on the cluster would stop.

Because CRUSH is an algorithm, anything that changes its input will cause a
change in the output (location). When you set an OSD out or it fails, that
changes the CRUSH input, but the host and the weight of the host are still in
effect. When you remove the host or change the weight of the host (by
removing a single OSD), that is a change to the input which will also cause
some changes in how it computes the locations.

Disclaimer - I have not tried this

It may be possible to minimize the data movement by doing the following:

   1. set norecover and nobackfill on the cluster
   2. Set the OSDs to be removed to out
   3. Adjust the weight of the hosts in the CRUSH (if removing all OSDs for
   the host, set it to zero)
   4. If you have new OSDs to add, add them into the cluster now
   5. Once all OSDs changes have been entered, unset norecover and
   nobackfill
   6. This will migrate the data off the old OSDs and onto the new OSDs in
   one swoop.
   7. Once the data migration is complete, set norecover and nobackfill on
   the cluster again.
   8. Remove the old OSDs
   9. Unset norecover and nobackfill

The theory is that by setting the host weights to 0, removing the
OSDs/hosts later should minimize the data movement afterwards because the
algorithm should have already dropped it out as a candidate for placement.
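That zero-weight theory can be illustrated with a toy weighted rendezvous (HRW) hash — not CRUSH itself, just a stand-in with the same "highest pseudo-random draw wins" flavor — showing that deleting an item whose weight is already zero relocates nothing:

```python
import hashlib

def placement(obj, weights):
    # Toy weighted rendezvous hash: every positive-weight OSD gets a
    # deterministic pseudo-random draw for this object; highest draw wins.
    def draw(osd):
        h = hashlib.sha256(("%s/%s" % (obj, osd)).encode()).hexdigest()
        return int(h, 16) * weights[osd]
    return max((o for o in weights if weights[o] > 0), key=draw)

objs = ["obj.%d" % i for i in range(200)]
full = {"osd.0": 1, "osd.1": 1, "osd.2": 1, "osd.3": 1}

zeroed = dict(full)
zeroed["osd.3"] = 0                                          # weight the leaving OSD to zero
removed = {o: w for o, w in zeroed.items() if o != "osd.3"}  # later, delete it outright

before = [placement(o, zeroed) for o in objs]
after = [placement(o, removed) for o in objs]
print(before == after)  # True: deleting an already-zero-weight OSD moves nothing

moved = sum(placement(o, full) != placement(o, zeroed) for o in objs)
print("objects relocated by the zero-weighting itself:", moved)
```

All of the movement happens at the zero-weighting step; the later removal is a no-op for placement, which is the behavior the procedure above relies on.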

If this works right, then you basically queue up a bunch of small changes,
do one data movement, always keep all copies of your objects online and
minimize the impact of the data movement by leveraging both your old and
new hardware at the same time.

If you try this, please report back on your experience. I might try it in my
lab, but I'm really busy at the moment, so I don't know if I'll get to it
very soon.

On Thu, Mar 5, 2015 at 12:53 PM, Andrija Panic andrija.pa...@gmail.com
wrote:

 Hi Robert,

 it seems I have not listened well to your advice - I set the osd to out
 instead of stopping it - and now, instead of some ~3% of degraded objects,
 there are 0.000% degraded and around 6% misplaced - and rebalancing is
 happening again, but this is a small percentage..

 Do you know if later, when I remove this OSD from the crush map, no more data
 will be rebalanced (as per the official Ceph documentation) - since the
 already misplaced objects are getting distributed away to all the other nodes?

 (after "service ceph stop osd.0" there was 2.45% degraded data - but no
 backfilling was happening for some reason... it just stayed degraded... so
 this is the reason I started the OSD back up and then set it to out...)

 Thanks

 On 4 March 2015 at 17:54, Andrija Panic andrija.pa...@gmail.com wrote:

 Hi Robert,

 I already have this stuff set. Ceph is 0.87.0 now...

 Thanks, will schedule this for the weekend; 10G network and 36 OSDs - it
 should move the data in less than 8h, per my last experience, which was
 around 8h, but that included some 1G OSDs...

 Thx!

 On 4 March 2015 at 17:49, Robert LeBlanc rob...@leblancnet.us wrote:

 You will most likely have a very high relocation percentage. Backfills
 always are more impactful on smaller clusters, but osd max backfills
 should be what you need to help reduce the impact. The default is 10,
 you will want to use 1.

 I didn't catch which version of Ceph you are running, but I think
 there was some priority work done in firefly to help make backfills
 lower priority. I think it has gotten better in later versions.

 On Wed, Mar 4, 2015 at 1:35 AM, Andrija Panic andrija.pa...@gmail.com
 wrote:
  Thank you Robert - I'm wondering, when I remove a total of 7 OSDs from the
  crush map, whether that will cause more than 37% of the data to be moved
  (80% or whatever).

  I'm also wondering if the throttling that I applied is fine or not - I will
  introduce the osd_recovery_delay_start 10sec as Irek said.

  I'm just wondering how much the performance impact will be, because:
  - when stopping an OSD, the impact while backfilling was fine, more or
  less - I can live with this
  - when I removed an OSD from the crush map - the impact was tremendous for
  the first 1h or so, and later on during the recovery process the impact was
  much less, but still noticeable...

  Thanks for the tip of course !
  Andrija
 
  On 3 March 2015 at 18:34, Robert LeBlanc rob...@leblancnet.us wrote:
 
  I would be inclined to shut down both OSDs in a node, let the cluster
  recover. Once it is recovered, shut down the next two, let it recover.
  Repeat until all the OSDs are taken out of the cluster. Then I would
  set nobackfill and norecover. Then remove the hosts/disks from the
  CRUSH then unset nobackfill and norecover.
 
  That should give you a few small changes (when you shut down OSDs) and
  then one big one to get everything in the final place. If you 

Re: [ceph-users] Rebalance/Backfill Throtling - anything missing here?

2015-03-05 Thread Andrija Panic
Thanks a lot Robert.

I have actually already tried the following:

a) set one OSD to out (6% of data misplaced, Ceph recovered fine), stop the
OSD, remove the OSD from the crush map (again 36% of data misplaced !!!) -
then inserted the OSD back into the crush map - and those 36% misplaced
objects disappeared, of course - I've undone the crush removal...
so damage undone - the OSD is just out and the cluster healthy again.


b) set norecover, nobackfill, and then:
- Removed one OSD from crush (a running OSD, not the one from point a)
- only 18% of data misplaced !!! (no recovery was happening though, because
of norecover, nobackfill)
- Removed another OSD from the same node - a total of only 20% of objects
misplaced (with 2 OSDs on the same node removed from the crush map)
- So these 2 OSDs were still running, UP and IN, and I had just removed them
from the crush map, per the advice to avoid calculating the crush map twice -
from:
http://image.slidesharecdn.com/scalingcephatcern-140311134847-phpapp01/95/scaling-ceph-at-cern-ceph-day-frankfurt-19-638.jpg?cb=1394564547
- And I added these 2 OSDs back to the crush map; this was just a test...

So the algorithm is very funny in some aspects.. but it's all pseudo-random
stuff, so I kind of understand...

I will share my findings during the rest of the OSD demotion, after I demote
them...

Thanks for your detailed inputs !
Andrija


On 5 March 2015 at 22:51, Robert LeBlanc rob...@leblancnet.us wrote:

 Setting an OSD out will start the rebalance with the degraded object count.
 The OSD is still alive and can participate in the relocation of the objects.
 This is preferable so that you don't drop below min_size if a disk fails
 during the rebalance, at which point I/O on the cluster would stop.

 Because CRUSH is an algorithm, anything that changes its input will cause a
 change in the output (location). When you set an OSD out or it fails, that
 changes the CRUSH input, but the host and the weight of the host are still
 in effect. When you remove the host or change the weight of the host (by
 removing a single OSD), that is a change to the input which will also cause
 some changes in how it computes the locations.

 Disclaimer - I have not tried this

 It may be possible to minimize the data movement by doing the following:

1. set norecover and nobackfill on the cluster
2. Set the OSDs to be removed to out
3. Adjust the weight of the hosts in the CRUSH (if removing all OSDs
for the host, set it to zero)
4. If you have new OSDs to add, add them into the cluster now
5. Once all OSDs changes have been entered, unset norecover and
nobackfill
6. This will migrate the data off the old OSDs and onto the new OSDs
in one swoop.
7. Once the data migration is complete, set norecover and nobackfill
on the cluster again.
8. Remove the old OSDs
9. Unset norecover and nobackfill

 The theory is that by setting the host weights to 0, removing the
 OSDs/hosts later should minimize the data movement afterwards because the
 algorithm should have already dropped it out as a candidate for placement.

 If this works right, then you basically queue up a bunch of small changes,
 do one data movement, always keep all copies of your objects online and
 minimize the impact of the data movement by leveraging both your old and
 new hardware at the same time.

 If you try this, please report back on your experience. I might try it in
 my lab, but I'm really busy at the moment, so I don't know if I'll get to
 it very soon.

 On Thu, Mar 5, 2015 at 12:53 PM, Andrija Panic andrija.pa...@gmail.com
 wrote:

 Hi Robert,

 it seems I have not listened well to your advice - I set the osd to out
 instead of stopping it - and now, instead of some ~3% of degraded objects,
 there are 0.000% degraded and around 6% misplaced - and rebalancing is
 happening again, but this is a small percentage..

 Do you know if later, when I remove this OSD from the crush map, no more data
 will be rebalanced (as per the official Ceph documentation) - since the
 already misplaced objects are getting distributed away to all the other nodes?

 (after "service ceph stop osd.0" there was 2.45% degraded data - but no
 backfilling was happening for some reason... it just stayed degraded... so
 this is the reason I started the OSD back up and then set it to out...)

 Thanks

 On 4 March 2015 at 17:54, Andrija Panic andrija.pa...@gmail.com wrote:

 Hi Robert,

 I already have this stuff set. Ceph is 0.87.0 now...

 Thanks, will schedule this for the weekend; 10G network and 36 OSDs - it
 should move the data in less than 8h, per my last experience, which was
 around 8h, but that included some 1G OSDs...

 Thx!

 On 4 March 2015 at 17:49, Robert LeBlanc rob...@leblancnet.us wrote:

 You will most likely have a very high relocation percentage. Backfills
 always are more impactful on smaller clusters, but osd max backfills
 should be what you need to help reduce the impact. The default is 10,
 you will want to use 1.

 I didn't catch which version of Ceph you are running, but I 

Re: [ceph-users] pool distribution quality report script

2015-03-05 Thread Mark Nelson

Hi David,

Mind sending me the output of ceph pg dump -f json?

thanks!
Mark

On 03/05/2015 12:52 PM, David Burley wrote:

Mark,

It worked for me earlier this morning but the new rev is throwing a
traceback:

$ ceph pg dump -f json | python ./readpgdump.py > pgdump_analysis.txt
dumped all in format json
Traceback (most recent call last):
   File "./readpgdump.py", line 294, in <module>
     parse_json(data)
   File "./readpgdump.py", line 263, in parse_json
     print_report(pool_counts, total_counts, JSON)
   File "./readpgdump.py", line 119, in print_report
     print_data(data, pool_weights, total_weights)
   File "./readpgdump.py", line 161, in print_data
     print format_line("Efficiency score using optimal weights for pool %s: %.1f%%" % (pool, efficiency_score(data[name], weights['acting_totals'])))
   File "./readpgdump.py", line 71, in efficiency_score
     if weights and weights[osd]:
KeyError: 0

On Thu, Mar 5, 2015 at 1:46 PM, Mark Nelson mnel...@redhat.com wrote:

Hi Blair,

I've updated the script and it now (theoretically) computes optimal
crush weights based on both primary and secondary acting set OSDs.
It also attempts to show you the efficiency of equal weights vs
using weights optimized for different pools (or all pools).  This is
done by looking at the way weights would be assigned and how they
would affect the current pool distribution, then looking at the skew
for the heaviest weighted OSD vs the average.

Unfortunately the output has now become beastly and complex (granted
this is a large cluster with many pools!).  I think the trick now is
how to make the interface for this more manageable.  For instance
perhaps it's not interesting to know how one pool's weights affect
the efficiency of the acting primary OSDs for another pool.

I've included sample output, but it's huge (15K lines long!)

Mark


On 03/05/2015 01:52 AM, Blair Bethwaite wrote:

Hi Mark,

Cool, that looks handy. Though it'd be even better if it could go a
step further and recommend re-weighting values to balance things out
(or increased PG counts where needed).

Cheers,

On 5 March 2015 at 15:11, Mark Nelson mnel...@redhat.com wrote:

Hi All,

Recently some folks showed interest in gathering pool
distribution
statistics and I remembered I wrote a script to do that a
while back. It was
broken due to a change in the ceph pg dump output format
that was committed
a while back, so I cleaned the script up, added detection of
header fields,
automatic json support, and also added in calculation of
expected max and
min PGs per OSD and std deviation.

The script is available here:


https://github.com/ceph/ceph-tools/blob/master/cbt/tools/readpgdump.py

Some general comments:

1) Expected numbers are derived by treating PGs and OSDs as a
balls-in-buckets problem ala Raab & Steger:

http://www14.in.tum.de/personen/raab/publ/balls.pdf

2) You can invoke it either by passing it a file or stdout, ie:

ceph pg dump -f json | ./readpgdump.py

or

./readpgdump.py ~/pgdump.out


3) Here's a snippet of some of some sample output from a 210
OSD cluster.
Does this output make sense to people?  Is it useful?

[nhm@burnupiX tools]$ ./readpgdump.py ~/pgdump.out


+----------------------------------------------------------------------+
| Detected input as plain                                              |
+----------------------------------------------------------------------+

+----------------------------------------------------------------------+
| Pool ID: 681                                                         |
+----------------------------------------------------------------------+
| Participating OSDs: 210                                              |
| Participating PGs: 4096                                              |
+----------------------------------------------------------------------+
| OSDs in Primary Role (Acting)                                        |
| Expected PGs Per OSD: Min: 4, Max: 33, Mean: 19.5, Std Dev: 7.2      |
| Actual PGs Per OSD: Min: 7, Max: 43, Mean: 19.5, Std Dev: 6.5        |
| 5 Most Subscribed OSDs: 199(43), 175(36), 149(34), 167(32), 20(31)   |
| 5 Least
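The "Expected" figures in that report treat PG-to-OSD assignment as the classic balls-in-buckets problem (Raab & Steger, linked above). A seeded Monte-Carlo sketch — uniform placement, which is only an approximation of what CRUSH does — lands in the same neighborhood for the 4096-PG / 210-OSD pool:

```python
import random
from collections import Counter
from statistics import mean

def simulate(pgs, osds, trials=100, seed=42):
    # Throw `pgs` primary assignments uniformly at `osds` buckets and
    # average the per-trial min and max PG count landing on any single OSD.
    rng = random.Random(seed)
    mins, maxes = [], []
    for _ in range(trials):
        counts = Counter(rng.randrange(osds) for _ in range(pgs))
        per_osd = [counts.get(i, 0) for i in range(osds)]
        mins.append(min(per_osd))
        maxes.append(max(per_osd))
    return mean(mins), mean(maxes), pgs / osds

lo, hi, avg = simulate(4096, 210)
print(round(avg, 1))  # 19.5 -- the Mean in the report
print(round(lo, 1), round(hi, 1))  # compare with the report's Expected Min/Max of 4 and 33
```

The simulated max comes out close to the analytic expectation; the min is more sensitive to the tail approximation used, so some gap against the report's figure is expected.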

[ceph-users] RadosGW - Create bucket via admin API

2015-03-05 Thread Italo Santos
Hello guys,  

In the adminops documentation I saw how to remove a bucket, but I can't find
the URI to create one. I'd like to know if this is possible?

Regards.

Italo Santos
http://italosantos.com.br/



Re: [ceph-users] Firefly, cephfs issues: different unix rights depending on the client and ls are slow

2015-03-05 Thread Francois Lafont
Hi,

I'm sorry to revive my post, but I can't solve my problems
and I don't see anything in the logs. I have tried with the Hammer
version and I found the same phenomena.

In fact, first, I tried the same installation (ie the same
conf via puppet) of my cluster but in a virtualbox environment,
and I have the same phenomena. Second, I reinstalled my
virtualbox environment, but with the Hammer version of Ceph (ie
the testing version 0.93-1trusty), and I have the same issues too.

Le 04/03/2015 14:15, Francois Lafont wrote:

[...]

 ~# mkdir /cephfs
 ~# mount -t ceph 10.0.2.150,10.0.2.151,10.0.2.152:/ /cephfs/ -o 
 name=cephfs,secretfile=/etc/ceph/ceph.client.cephfs.secret
 
 Then in test-cephfs, I do:
 
 root@test-cephfs:~# mkdir /cephfs/d1
 root@test-cephfs:~# ll /cephfs/
 total 4
 drwxr-xr-x  1 root root0 Mar  4 11:45 ./
 drwxr-xr-x 24 root root 4096 Mar  4 11:42 ../
 drwxr-xr-x  1 root root0 Mar  4 11:45 d1/
 
 After, in test-cephfs2, I do:
 
 root@test-cephfs2:~# ll /cephfs/
 total 4
 drwxr-xr-x  1 root root0 Mar  4 11:45 ./
 drwxr-xr-x 24 root root 4096 Mar  4 11:42 ../
 drwxrwxrwx  1 root root0 Mar  4 11:45 d1/
 
 1) Why the unix rights of d1/ are different when I'm in test-cephfs
 and when I'm in test-cephfs2? It should be the same, isn't it?

This problem is not random and I can reproduce it indefinitely.

 2) If I create 100 files in /cephfs/d1/ with test-cephfs:
 
 for i in $(seq 100)
 do
 echo $(date +%s.%N) > /cephfs/d1/f_$i
 done
 
 sometimes, in test-cephfs2, when I do a simple:
 
 root@test-cephfs2:~# time \ls -la /cephfs

Sorry error of copy and paste, of course it was:

root@test-cephfs2:~# time \ls -la /cephfs/d1/

 the command can take 2 or 3 seconds which seems to me very long
 for a directory with just 100 files. Generally, if I repeat the
 command on test-cephfs2 just after, it's immediate but not always.
 I can not reproduce the problem in a determinist way. Sometimes,
 to reproduce the problem, I must remove all the files in /cephfs/
 on test-cepfs and recreate them. It's very strange. Sometimes and
 randomly, something seems to be stalled but I don't know what. I
 suspect a problem of mds tuning but, In fact, I don't know what
 to do.

I have the same problem with hammer too.
But can someone confirm for me that 3s (even if not always) for "ls -la" in
a cephfs directory which contains 100 files is pathological? After all,
maybe it is normal? I don't have much experience with cephfs.

Thanks for your help.

-- 
François Lafont


Re: [ceph-users] client-ceph [can not connect from client][connect protocol feature mismatch]

2015-03-05 Thread Kamil Kuramshin

What do you mean when you say "ceph client"?
The log snippet that you posted suggests that the kernel you are using does
not support some features of ceph. Try updating your kernel if your 'client'
is a Rados Block Device client.


06.03.2015 00:48, Sonal Dubey wrote:

Hi,

I am a newbie to ceph and the ceph-users group. Recently I have been
working on a ceph client. It worked in all environments, but when
I tested it in production it is not able to connect to ceph.


Below are the operating system details and the error. If someone has
seen this problem before, any help is really appreciated.


OS -

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04.2 LTS
Release: 12.04
Codename: precise

2015-03-05 13:37:16.816322 7f5191deb700 -- 10.8.25.112:0/2487 >> 10.138.23.241:6789/0 pipe(0x12489f0 sd=3 pgs=0 cs=0 l=0).connect protocol feature mismatch, my 1ffa < peer 42041ffa missing 4204
2015-03-05 13:37:17.635776 7f5191deb700 -- 10.8.25.112:0/2487 >> 10.138.23.241:6789/0 pipe(0x12489f0 sd=3 pgs=0 cs=0 l=0).connect protocol feature mismatch, my 1ffa < peer 42041ffa missing 4204







[ceph-users] rgw admin api - users

2015-03-05 Thread Joshua Weaver
According to the docs at 
http://docs.ceph.com/docs/master/radosgw/adminops/#get-user-info
I should be able to invoke /admin/user without a uid specified, and get a list
of users.
No matter what I try, I get a 403.
After looking at the source on GitHub (ceph/ceph), it appears that there isn't 
any code path that would result in a collection of users being generated from 
that resource.

Am I missing something?

TIA,
_josh


Re: [ceph-users] Understand RadosGW logs

2015-03-05 Thread Yehuda Sadeh-Weinraub


- Original Message -
 From: Daniel Schneller daniel.schnel...@centerdevice.com
 To: ceph-users@lists.ceph.com
 Sent: Tuesday, March 3, 2015 2:54:13 AM
 Subject: [ceph-users] Understand RadosGW logs
 
 Hi!
 
 After realizing the problem with log rotation (see
 http://thread.gmane.org/gmane.comp.file-systems.ceph.user/17708)
 and fixing it, I now for the first time have some
 meaningful (and recent) logs to look at.
 
 While from an application perspective there seem
 to be no issues, I would like to understand some
 messages I find with relatively high frequency in
 the logs:
 
 Exhibit 1
 -
 2015-03-03 11:14:53.685361 7fcf4bfef700  0 ERROR: flush_read_list():
 d->client_c->handle_data() returned -1
 2015-03-03 11:15:57.476059 7fcf39ff3700  0 ERROR: flush_read_list():
 d->client_c->handle_data() returned -1
 2015-03-03 11:17:43.570986 7fcf25fcb700  0 ERROR: flush_read_list():
 d->client_c->handle_data() returned -1
 2015-03-03 11:22:00.881640 7fcf39ff3700  0 ERROR: flush_read_list():
 d->client_c->handle_data() returned -1
 2015-03-03 11:22:48.147011 7fcf35feb700  0 ERROR: flush_read_list():
 d->client_c->handle_data() returned -1
 2015-03-03 11:27:40.572723 7fcf50ff9700  0 ERROR: flush_read_list():
 d->client_c->handle_data() returned -1
 2015-03-03 11:29:40.082954 7fcf36fed700  0 ERROR: flush_read_list():
 d->client_c->handle_data() returned -1
 2015-03-03 11:30:32.204492 7fcf4dff3700  0 ERROR: flush_read_list():
 d->client_c->handle_data() returned -1

It means that returning data to the client hit an error, which usually means that 
the client disconnected before the transfer completed.
 
 I cannot find anything relevant by Googling for
 that, apart from the actual line of code that
 produces this line.
 What does that mean? Is it an indication of data
 corruption or are there more benign reasons for
 this line?
 
 
 Exhibit 2
 --
 Several of these blocks
 
 2015-03-03 07:06:17.805772 7fcf36fed700  1 == starting new request
 req=0x7fcf5800f3b0 =
 2015-03-03 07:06:17.836671 7fcf36fed700  0
 RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592
 part_ofs=0 rule->part_size=0
 2015-03-03 07:06:17.836758 7fcf36fed700  0
 RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896
 part_ofs=0 rule->part_size=0
 2015-03-03 07:06:17.836918 7fcf36fed700  0
 RGWObjManifest::operator++(): result: ofs=13055243 stripe_ofs=13055243
 part_ofs=0 rule->part_size=0
 2015-03-03 07:06:18.263126 7fcf36fed700  1 == req done
 req=0x7fcf5800f3b0 http_status=200 ==
 ...
 2015-03-03 09:27:29.855001 7fcf28fd1700  1 == starting new request
 req=0x7fcf580102a0 =
 2015-03-03 09:27:29.866718 7fcf28fd1700  0
 RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592
 part_ofs=0 rule->part_size=0
 2015-03-03 09:27:29.866778 7fcf28fd1700  0
 RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896
 part_ofs=0 rule->part_size=0
 2015-03-03 09:27:29.866852 7fcf28fd1700  0
 RGWObjManifest::operator++(): result: ofs=13107200 stripe_ofs=13107200
 part_ofs=0 rule->part_size=0
 2015-03-03 09:27:29.866917 7fcf28fd1700  0
 RGWObjManifest::operator++(): result: ofs=17301504 stripe_ofs=17301504
 part_ofs=0 rule->part_size=0
 2015-03-03 09:27:29.875466 7fcf28fd1700  0
 RGWObjManifest::operator++(): result: ofs=21495808 stripe_ofs=21495808
 part_ofs=0 rule->part_size=0
 2015-03-03 09:27:29.884434 7fcf28fd1700  0
 RGWObjManifest::operator++(): result: ofs=25690112 stripe_ofs=25690112
 part_ofs=0 rule->part_size=0
 2015-03-03 09:27:29.906155 7fcf28fd1700  0
 RGWObjManifest::operator++(): result: ofs=29884416 stripe_ofs=29884416
 part_ofs=0 rule->part_size=0
 2015-03-03 09:27:29.914364 7fcf28fd1700  0
 RGWObjManifest::operator++(): result: ofs=34078720 stripe_ofs=34078720
 part_ofs=0 rule->part_size=0
 2015-03-03 09:27:29.940653 7fcf28fd1700  0
 RGWObjManifest::operator++(): result: ofs=38273024 stripe_ofs=38273024
 part_ofs=0 rule->part_size=0
 2015-03-03 09:27:30.272816 7fcf28fd1700  0
 RGWObjManifest::operator++(): result: ofs=42467328 stripe_ofs=42467328
 part_ofs=0 rule->part_size=0
 2015-03-03 09:27:31.125773 7fcf28fd1700  0
 RGWObjManifest::operator++(): result: ofs=46661632 stripe_ofs=46661632
 part_ofs=0 rule->part_size=0
 2015-03-03 09:27:31.192661 7fcf28fd1700  0 ERROR: flush_read_list():
 d->client_c->handle_data() returned -1
 2015-03-03 09:27:31.194481 7fcf28fd1700  1 == req done
 req=0x7fcf580102a0 http_status=200 ==
 ...
 2015-03-03 09:28:43.008517 7fcf2a7d4700  1 == starting new request
 req=0x7fcf580102a0 =
 2015-03-03 09:28:43.016414 7fcf2a7d4700  0
 RGWObjManifest::operator++(): result: ofs=887579 stripe_ofs=887579
 part_ofs=0 rule->part_size=0
 2015-03-03 09:28:43.022387 7fcf2a7d4700  1 == req done
 req=0x7fcf580102a0 http_status=200 ==
 
 First, what is the req= line? Is that a thread-id?
 I am asking, because the same id is used over and over
 in the same file over time.

It's the request id (within the current radosgw instance).

 
 More 

[ceph-users] tgt and krbd

2015-03-05 Thread Nick Fisk
Hi All,

 

Just a heads up after a day's experimentation.

 

I believe tgt with its default settings has a small write cache when
exporting a kernel mapped RBD. Doing some write tests I saw 4 times the
write throughput when using tgt aio + krbd compared to tgt with the builtin
librbd.

 

After running the following command against the LUN, which apparently
disables the write cache, performance dropped back to what I am seeing using
tgt+librbd, and is also the same as fio.

 

tgtadm --op update --mode logicalunit --tid 2 --lun 3 -P
mode_page=8:0:18:0x10:0:0xff:0xff:0:0:0xff:0xff:0xff:0xff:0x80:0x14:0:0:0:0:
0:0

 

From that I can only deduce that using tgt + krbd in its default state is
not 100% safe to use, especially in an HA environment.

 

Nick






Re: [ceph-users] qemu-kvm and cloned rbd image

2015-03-05 Thread koukou73gr

On 03/05/2015 03:40 AM, Josh Durgin wrote:


It looks like your libvirt rados user doesn't have access to whatever
pool the parent image is in:


librbd::AioRequest: write 0x7f1ec6ad6960 
rbd_data.24413d1b58ba.0186 1523712~4096 should_complete: r 
= -1


-1 is EPERM, for operation not permitted.

Check the libvirt user capabilities shown in 'ceph auth list' - it should
have at least r and class-read access to the pool storing the parent
image. You can update it via the 'ceph auth caps' command.


Josh,

All images (parent, snapshot and clone) reside in the same pool 
(libvirt-pool *) and the user (libvirt) seems to have the proper 
capabilities. See:


client.libvirt
key: 
caps: [mon] allow r
caps: [osd] allow class-read object_prefix rbd_children, allow rw 
class-read pool=rbd


This same pool contains other (flat) images used to back my production 
VMs. They are all accessed with this same user and there have been no 
problems so far. I just can't seem to be able to use cloned images.


-K.



* In my original email describing the problem I used 'rbd' instead of 
'libvirt-pool' for the pool name for simplicity. As more and more 
configuration items are requested, it makes more sense to use the real 
pool name to avoid causing any misconceptions.
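For what it's worth, the caps shown above grant 'allow rw class-read pool=rbd' while the clone and its parent live in libvirt-pool, so one hedged guess is that the parent reads are denied for that reason. A possible adjustment along Josh's suggestion (the exact cap string and pool name are assumptions for this setup, and note that 'ceph auth caps' replaces all existing caps, so the mon part must be restated):

```shell
# Hypothetical: re-grant client.libvirt's caps so the OSD rule covers
# libvirt-pool, keeping the rbd_children class-read used by cloned images.
ceph auth caps client.libvirt \
    mon 'allow r' \
    osd 'allow class-read object_prefix rbd_children, allow rwx pool=libvirt-pool'
```

Afterwards 'ceph auth list' should show the updated capability line for client.libvirt.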





Re: [ceph-users] The project of ceph client file system porting from Linux to AIX

2015-03-05 Thread Dennis Chen
Hello Ketor,

About a year ago, I needed a free DFS that could be used in an AIX
environment as a tiered storage solution for a bank data center; that is
why the project exists.
This project just ports the CephFS client in the Linux kernel to the AIX
kernel (maybe RBD in the future), so it is a kernel-mode AIX cephfs.
But I have multiple projects at hand now, so the AIX cephfs project is
in pending status... It's open; anyone can make changes to the
project if they want.

On Thu, Mar 5, 2015 at 3:11 PM, Ketor D d.ke...@gmail.com wrote:
 Hi Dennis,
   I am interested in your project.
   I wrote a Win32 cephfs client https://github.com/ceph/ceph-dokan.
   But ceph-dokan runs in user-mode. I see you port code from
 kernel cephfs, are you planning to write a kernel mode AIX-cephfs?

 Thanks!


 2015-03-04 17:59 GMT+08:00 Dennis Chen kernel.org@gmail.com:
 Hello,

 The ceph cluster can currently only be used from Linux systems AFAICT, so I
 planned to port the ceph client file system from Linux to AIX as a
 tiered storage solution on that platform. Below is the source code
 repository, which is still in progress. 3 important modules:

 1. aixker: maintains a uniform kernel API between Linux and AIX
 2. net: the data-transfer layer between the client and the cluster
 3. fs: an adaptor so that AIX can recognize the Linux file system.

 https://github.com/Dennis-Chen1977/aix-cephfs

 Welcome any comments or anything...

 --
 Den
 --
 To unsubscribe from this list: send the line unsubscribe ceph-devel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Den


Re: [ceph-users] CEPH hardware recommendations and cluster design questions

2015-03-05 Thread Adrian Sevcenco
Thank you all for all good advises and much needed documentation.
I have a lot to digest :)

Adrian

On 03/04/2015 08:17 PM, Stephen Mercier wrote:
 To expand upon this, the very nature and existence of Ceph is to replace
 RAID. The FS itself replicates data and handles the HA functionality
 that you're looking for. If you're going to build a single server with
 all those disks, backed by a ZFS RAID setup, you're going to be much
 better suited with an iSCSI setup. The idea of ceph is that it takes the
 place of all the ZFS bells and whistles. A CEPH cluster that only has
 one OSD backed by that huge ZFS setup becomes just a wire-protocol to
 speak to the server. The magic in ceph comes from the replication and
 distribution of the data across many OSDs, hopefully living in many
 hosts. My own setup for instance uses 96 OSDs that are spread across 4
 hosts (I know I know guys - CPU is a big deal with SSDs so 24 per host
 is a tall order - didn't know that when we built it - been working ok so
 far) that is then distributed between 2 cabinets on 2 separate
 cooling/power/data zones in our datacenter. My CRUSH map is currently
 setup for 3 copies of all data, and laid out so that at least one copy
 is located in each cabinet, and then the cab that gets the 2 copies also
 makes sure that each copy is on a different host. No RAID needed because
 ceph makes sure that I have a safe amount of copies of the data, in a
 distribution layout that allows us to sleep at night. In my opinion,
 ceph is much more pleasant, powerful, and versatile to deal with than
 both hardware RAID and ZFS (Both of which we have instances of deployed
 as well from previous iterations of infrastructure deployments). Now,
 you could always create small little zRAID clusters using ZFS, and then
 give an OSD to each of those, if you wanted even an additional layer of
 safety. Heck, you could even have hardware RAID behind the zRAID, for
 even another layer. Where YOU need to make the decision is the trade-off
 between HA functionality/peace of mind, performance, and
 useability/maintainability.
 
 Would be happy to answer any questions you still have...
 
 Cheers,
 -- 
 Stephen Mercier
 Senior Systems Architect
 Attainia, Inc.
 Phone: 866-288-2464 ext. 727
 Email: stephen.merc...@attainia.com
 Web: www.attainia.com
 
 Capital equipment lifecycle planning & budgeting solutions for healthcare
 
 
 
 
 
 
 On Mar 4, 2015, at 10:42 AM, Alexandre DERUMIER wrote:
 
 Hi for hardware, inktank have good guides here:

 http://www.inktank.com/resource/inktank-hardware-selection-guide/
 http://www.inktank.com/resource/inktank-hardware-configuration-guide/

 ceph works well with multiple osd daemons (1 OSD per disk),
 so you should not use RAID.

 (xfs is the recommended fs for osd daemons).

 you don't need spare disks either, just enough disk space to handle a
 disk failure.
 (data is replicated/rebalanced onto other disks/OSDs in case of disk
 failure)


 - Mail original -
 De: Adrian Sevcenco adrian.sevce...@cern.ch
 À: ceph-users ceph-users@lists.ceph.com
 Envoyé: Mercredi 4 Mars 2015 18:30:31
 Objet: [ceph-users] CEPH hardware recommendations and cluster
 designquestions

 Hi! I've seen the documentation at
 http://ceph.com/docs/master/start/hardware-recommendations/ but those
 minimum requirements without some recommendations don't tell me much ...

 So, from what I've seen, for mon and mds any cheap 6-core, 16+ GB RAM AMD
 box would do ... what puzzles me is the per-daemon construct ...
 Why would I need/require multiple daemons? With separate servers
 (3 mon + 1 mds - I understood that this is the requirement) I imagine
 that each will run a single type of daemon.. did I miss something?
 (besides that, maybe there is a relation between daemons and block devices,
 and for each block device there should be a daemon?)

 for mon and mds: would it help the clients if these are on 10 GbE?

 for osd: I plan to use a 36-disk server as an osd server (ZFS RAIDZ3 over all
 disks + 2 SSDs mirrored for ZIL and L2ARC) - that would give me ~132 TB.
 How much RAM would I really need? (128 GB would be way too much, I think)
 (that RAIDZ3 for 36 disks is just a thought - I also have choices like:
 2 x 18 RAIDZ2; 34 disks RAIDZ3 + 2 hot spares)

 Regarding journal and scrubbing: by using ZFS I would think that I can
 safely not use the CEPH ones ... is this ok?

 Do you have some other advice and recommendations for me? (the
 read:write ratio will be 10:1)

 Thank you!!
 Adrian






Re: [ceph-users] Ceph repo - RSYNC?

2015-03-05 Thread Michael Kuriger
I use reposync to keep mine updated when needed.


Something like:
cd ~/ceph/repos
reposync -r Ceph -c /etc/yum.repos.d/ceph.repo
reposync -r Ceph-noarch -c /etc/yum.repos.d/ceph.repo
reposync -r elrepo-kernel -c /etc/yum.repos.d/elrepo.repo



 
Michael Kuriger
Sr. Unix Systems Engineer
mk7...@yp.com | 818-649-7235


-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Brian 
Rak
Sent: Thursday, March 05, 2015 10:14 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Ceph repo - RSYNC?

Do any of the Ceph repositories run rsync?  We generally mirror the repository 
locally so we don't encounter any unexpected upgrades.

eu.ceph.com used to run this, but it seems to be down now.

# rsync rsync://eu.ceph.com
rsync: failed to connect to eu.ceph.com: Connection refused (111) rsync error: 
error in socket IO (code 10) at clientserver.c(124) [receiver=3.0.6]



Re: [ceph-users] pool distribution quality report script

2015-03-05 Thread Mark Nelson

Hi Blair,

I've updated the script and it now (theoretically) computes optimal 
crush weights based on both primary and secondary acting set OSDs.  It 
also attempts to show you the efficiency of equal weights vs using 
weights optimized for different pools (or all pools).  This is done by 
looking at the way weights would be assigned and how they would affect 
the current pool distribution, then looking at the skew for the heaviest 
weighted OSD vs the average.


Unfortunately the output has now become beastly and complex (granted 
this is a large cluster with many pools!).  I think the trick now is how 
to make the interface for this more manageable.  For instance perhaps 
it's not interesting to know how one pool's weights affect the 
efficiency of the acting primary OSDs for another pool.


I've included sample output, but it's huge (15K lines long!)

Mark

On 03/05/2015 01:52 AM, Blair Bethwaite wrote:

Hi Mark,

Cool, that looks handy. Though it'd be even better if it could go a
step further and recommend re-weighting values to balance things out
(or increased PG counts where needed).

Cheers,

On 5 March 2015 at 15:11, Mark Nelson mnel...@redhat.com wrote:

Hi All,

Recently some folks showed interest in gathering pool distribution
statistics and I remembered I wrote a script to do that a while back. It was
broken due to a change in the ceph pg dump output format that was committed
a while back, so I cleaned the script up, added detection of header fields,
automatic json support, and also added in calculation of expected max and
min PGs per OSD and std deviation.

The script is available here:

https://github.com/ceph/ceph-tools/blob/master/cbt/tools/readpgdump.py

Some general comments:

1) Expected numbers are derived by treating PGs and OSDs as a
balls-in-buckets problem a la Raab & Steger:

http://www14.in.tum.de/personen/raab/publ/balls.pdf

2) You can invoke it either by passing it a file or stdout, ie:

ceph pg dump -f json | ./readpgdump.py

or

./readpgdump.py ~/pgdump.out


3) Here's a snippet of some of some sample output from a 210 OSD cluster.
Does this output make sense to people?  Is it useful?


[nhm@burnupiX tools]$ ./readpgdump.py ~/pgdump.out

++
| Detected input as plain
|

++


++
| Pool ID: 681
|

++
| Participating OSDs: 210
|
| Participating PGs: 4096
|

++
| OSDs in Primary Role (Acting)
|
| Expected PGs Per OSD: Min: 4, Max: 33, Mean: 19.5, Std Dev: 7.2
|
| Actual PGs Per OSD: Min: 7, Max: 43, Mean: 19.5, Std Dev: 6.5
|
| 5 Most Subscribed OSDs: 199(43), 175(36), 149(34), 167(32), 20(31)
|
| 5 Least Subscribed OSDs: 121(7), 46(7), 70(8), 94(9), 122(9)
|
| Avg Deviation from Most Subscribed OSD: 54.6%
|

++
| OSDs in Secondary Role (Acting)
|
| Expected PGs Per OSD: Min: 18, Max: 59, Mean: 39.0, Std Dev: 10.2
|
| Actual PGs Per OSD: Min: 17, Max: 61, Mean: 39.0, Std Dev: 9.7
|
| 5 Most Subscribed OSDs: 44(61), 14(60), 2(59), 167(59), 164(57)
|
| 5 Least Subscribed OSDs: 35(17), 31(20), 37(20), 145(20), 16(20)
|
| Avg Deviation from Most Subscribed OSD: 36.0%
|

++
| OSDs in All Roles (Acting)
|
| Expected PGs Per OSD: Min: 32, Max: 83, Mean: 58.5, Std Dev: 12.5
|
| Actual PGs Per OSD: Min: 29, Max: 93, Mean: 58.5, Std Dev: 14.6
|
| 5 Most Subscribed OSDs: 199(93), 175(92), 44(92), 167(91), 14(91)
|
| 5 Least Subscribed OSDs: 121(29), 35(30), 47(30), 131(32), 145(32)
|
| Avg Deviation from Most Subscribed OSD: 37.1%
|

++
| OSDs in Primary Role (Up)
|
| Expected PGs Per OSD: Min: 4, Max: 33, Mean: 19.5, Std Dev: 7.2
|
| Actual PGs Per OSD: Min: 7, Max: 43, Mean: 19.5, Std Dev: 6.5
|
| 5 Most Subscribed OSDs: 199(43), 175(36), 149(34), 167(32), 20(31)
|
| 5 Least Subscribed OSDs: 121(7), 46(7), 70(8), 94(9), 122(9)
|
| Avg Deviation from Most Subscribed OSD: 54.6%
|

++
| OSDs in Secondary Role (Up)
|
| Expected PGs Per OSD: Min: 18, Max: 59, Mean: 39.0, Std Dev: 10.2
|
| Actual PGs Per OSD: Min: 17, Max: 61, Mean: 39.0, Std Dev: 9.7
|
| 5 Most Subscribed OSDs: 44(61), 14(60), 2(59), 167(59), 164(57)
|
| 5 Least Subscribed OSDs: 35(17), 31(20), 37(20), 145(20), 16(20)
|
| Avg Deviation from Most Subscribed OSD: 36.0%
|

++
| OSDs in All Roles (Up)
|
| Expected PGs Per OSD: Min: 32, Max: 83, Mean: 58.5, Std Dev: 12.5
|
| Actual PGs 

Re: [ceph-users] pool distribution quality report script

2015-03-05 Thread David Burley
Mark,

It worked for me earlier this morning but the new rev is throwing a
traceback:

$ ceph pg dump -f json | python ./readpgdump.py > pgdump_analysis.txt
dumped all in format json
Traceback (most recent call last):
  File "./readpgdump.py", line 294, in <module>
    parse_json(data)
  File "./readpgdump.py", line 263, in parse_json
    print_report(pool_counts, total_counts, JSON)
  File "./readpgdump.py", line 119, in print_report
    print_data(data, pool_weights, total_weights)
  File "./readpgdump.py", line 161, in print_data
    print format_line("Efficiency score using optimal weights for pool %s:
%.1f%%" % (pool, efficiency_score(data[name], weights['acting_totals'])))
  File "./readpgdump.py", line 71, in efficiency_score
    if weights and weights[osd]:
KeyError: 0
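From the traceback alone, the failure is the weights[osd] lookup at line 71 raising KeyError: 0, i.e. OSD id 0 appears in a pool's PG counts but has no entry in the weights dict. A guarded lookup would sidestep it; this is a sketch against a guessed data shape, not a tested patch to the script:

```python
def safe_weight(weights, osd):
    """Return an OSD's weight, treating ids absent from the dict as 0.0.

    weights[osd] raises KeyError when an OSD shows up in PG counts but
    not in the weight totals; dict.get() with a default does not.
    """
    return weights.get(osd, 0.0)

weights = {1: 1.0, 2: 2.5}
print(safe_weight(weights, 2))  # 2.5
print(safe_weight(weights, 0))  # 0.0 -- plain weights[0] would raise KeyError
```

Whether a missing OSD should count as weight 0 or be skipped entirely is a judgment call for the script's author, but either choice avoids the traceback.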

On Thu, Mar 5, 2015 at 1:46 PM, Mark Nelson mnel...@redhat.com wrote:

 Hi Blair,

 I've updated the script and it now (theoretically) computes optimal crush
 weights based on both primary and secondary acting set OSDs.  It also
 attempts to show you the efficiency of equal weights vs using weights
 optimized for different pools (or all pools).  This is done by looking at
 the way weights would be assigned and how they would affect the current
 pool distribution, then looking at the skew for the heaviest weighted OSD
 vs the average.

 Unfortunately the output has now become beastly and complex (granted this
 is a large cluster with many pools!).  I think the trick now is how to make
 the interface for this more manageable.  For instance perhaps it's not
 interesting to know how one pool's weights affect the efficiency of the
 acting primary OSDs for another pool.

 I've included sample output, but it's huge (15K lines long!)

 Mark


 On 03/05/2015 01:52 AM, Blair Bethwaite wrote:

 Hi Mark,

 Cool, that looks handy. Though it'd be even better if it could go a
 step further and recommend re-weighting values to balance things out
 (or increased PG counts where needed).

 Cheers,

 On 5 March 2015 at 15:11, Mark Nelson mnel...@redhat.com wrote:

 Hi All,

 Recently some folks showed interest in gathering pool distribution
 statistics and I remembered I wrote a script to do that a while back. It
 was
 broken due to a change in the ceph pg dump output format that was
 committed
 a while back, so I cleaned the script up, added detection of header
 fields,
 automatic json support, and also added in calculation of expected max and
 min PGs per OSD and std deviation.

 The script is available here:

 https://github.com/ceph/ceph-tools/blob/master/cbt/tools/readpgdump.py

 Some general comments:

 1) Expected numbers are derived by treating PGs and OSDs as a
  balls-in-buckets problem a la Raab & Steger:

 http://www14.in.tum.de/personen/raab/publ/balls.pdf

 2) You can invoke it either by passing it a file or stdout, ie:

 ceph pg dump -f json | ./readpgdump.py

 or

 ./readpgdump.py ~/pgdump.out


 3) Here's a snippet of some of some sample output from a 210 OSD cluster.
 Does this output make sense to people?  Is it useful?

  [nhm@burnupiX tools]$ ./readpgdump.py ~/pgdump.out

 +---
 -+
 | Detected input as plain
 |

 +---
 -+


 +---
 -+
 | Pool ID: 681
 |

 +---
 -+
 | Participating OSDs: 210
 |
 | Participating PGs: 4096
 |

 +---
 -+
 | OSDs in Primary Role (Acting)
 |
 | Expected PGs Per OSD: Min: 4, Max: 33, Mean: 19.5, Std Dev: 7.2
 |
 | Actual PGs Per OSD: Min: 7, Max: 43, Mean: 19.5, Std Dev: 6.5
 |
 | 5 Most Subscribed OSDs: 199(43), 175(36), 149(34), 167(32), 20(31)
 |
 | 5 Least Subscribed OSDs: 121(7), 46(7), 70(8), 94(9), 122(9)
 |
 | Avg Deviation from Most Subscribed OSD: 54.6%
 |

 +---
 -+
 | OSDs in Secondary Role (Acting)
 |
 | Expected PGs Per OSD: Min: 18, Max: 59, Mean: 39.0, Std Dev: 10.2
 |
 | Actual PGs Per OSD: Min: 17, Max: 61, Mean: 39.0, Std Dev: 9.7
 |
 | 5 Most Subscribed OSDs: 44(61), 14(60), 2(59), 167(59), 164(57)
 |
 | 5 Least Subscribed OSDs: 35(17), 31(20), 37(20), 145(20), 16(20)
 |
 | Avg Deviation from Most Subscribed OSD: 36.0%
 |

 +---
 -+
 | OSDs in All Roles (Acting)
 |
 | Expected PGs Per OSD: Min: 32, Max: 83, Mean: 58.5, Std Dev: 12.5
 |
 | Actual PGs Per OSD: Min: 29, Max: 93, Mean: 58.5, Std Dev: 14.6
 |
 | 5 Most Subscribed OSDs: 199(93), 175(92), 44(92), 167(91), 14(91)
 |
 | 5 Least Subscribed OSDs: 121(29), 35(30), 47(30), 131(32), 145(32)
 |
 | Avg Deviation from Most Subscribed OSD: 37.1%
 |

 +---
 -+
 | OSDs in Primary Role (Up)
 

[ceph-users] Ceph repo - RSYNC?

2015-03-05 Thread Brian Rak
Do any of the Ceph repositories run rsync?  We generally mirror the 
repository locally so we don't encounter any unexpected upgrades.


eu.ceph.com used to run this, but it seems to be down now.

# rsync rsync://eu.ceph.com
rsync: failed to connect to eu.ceph.com: Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(124) 
[receiver=3.0.6]




Re: [ceph-users] Understand RadosGW logs

2015-03-05 Thread Daniel Schneller

Bump...

On 2015-03-03 10:54:13 +, Daniel Schneller said:


Hi!

After realizing the problem with log rotation (see
http://thread.gmane.org/gmane.comp.file-systems.ceph.user/17708)
and fixing it, I now for the first time have some
meaningful (and recent) logs to look at.

While from an application perspective there seem
to be no issues, I would like to understand some
messages I find with relatively high frequency in
the logs:

Exhibit 1
-
2015-03-03 11:14:53.685361 7fcf4bfef700  0 ERROR: flush_read_list():
d->client_c->handle_data() returned -1
2015-03-03 11:15:57.476059 7fcf39ff3700  0 ERROR: flush_read_list():
d->client_c->handle_data() returned -1
2015-03-03 11:17:43.570986 7fcf25fcb700  0 ERROR: flush_read_list():
d->client_c->handle_data() returned -1
2015-03-03 11:22:00.881640 7fcf39ff3700  0 ERROR: flush_read_list():
d->client_c->handle_data() returned -1
2015-03-03 11:22:48.147011 7fcf35feb700  0 ERROR: flush_read_list():
d->client_c->handle_data() returned -1
2015-03-03 11:27:40.572723 7fcf50ff9700  0 ERROR: flush_read_list():
d->client_c->handle_data() returned -1
2015-03-03 11:29:40.082954 7fcf36fed700  0 ERROR: flush_read_list():
d->client_c->handle_data() returned -1
2015-03-03 11:30:32.204492 7fcf4dff3700  0 ERROR: flush_read_list():
d->client_c->handle_data() returned -1

I cannot find anything relevant by Googling for
that, apart from the actual line of code that
produces this line.
What does that mean? Is it an indication of data
corruption or are there more benign reasons for
this line?


Exhibit 2
--
Several of these blocks

2015-03-03 07:06:17.805772 7fcf36fed700  1 == starting new request
req=0x7fcf5800f3b0 =
2015-03-03 07:06:17.836671 7fcf36fed700  0
RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592
part_ofs=0 rule->part_size=0
2015-03-03 07:06:17.836758 7fcf36fed700  0
RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896
part_ofs=0 rule->part_size=0
2015-03-03 07:06:17.836918 7fcf36fed700  0
RGWObjManifest::operator++(): result: ofs=13055243 stripe_ofs=13055243
part_ofs=0 rule->part_size=0
2015-03-03 07:06:18.263126 7fcf36fed700  1 == req done
req=0x7fcf5800f3b0 http_status=200 ==
...
2015-03-03 09:27:29.855001 7fcf28fd1700  1 == starting new request
req=0x7fcf580102a0 =
2015-03-03 09:27:29.866718 7fcf28fd1700  0
RGWObjManifest::operator++(): result: ofs=4718592 stripe_ofs=4718592
part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.866778 7fcf28fd1700  0
RGWObjManifest::operator++(): result: ofs=8912896 stripe_ofs=8912896
part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.866852 7fcf28fd1700  0
RGWObjManifest::operator++(): result: ofs=13107200 stripe_ofs=13107200
part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.866917 7fcf28fd1700  0
RGWObjManifest::operator++(): result: ofs=17301504 stripe_ofs=17301504
part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.875466 7fcf28fd1700  0
RGWObjManifest::operator++(): result: ofs=21495808 stripe_ofs=21495808
part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.884434 7fcf28fd1700  0
RGWObjManifest::operator++(): result: ofs=25690112 stripe_ofs=25690112
part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.906155 7fcf28fd1700  0
RGWObjManifest::operator++(): result: ofs=29884416 stripe_ofs=29884416
part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.914364 7fcf28fd1700  0
RGWObjManifest::operator++(): result: ofs=34078720 stripe_ofs=34078720
part_ofs=0 rule->part_size=0
2015-03-03 09:27:29.940653 7fcf28fd1700  0
RGWObjManifest::operator++(): result: ofs=38273024 stripe_ofs=38273024
part_ofs=0 rule->part_size=0
2015-03-03 09:27:30.272816 7fcf28fd1700  0
RGWObjManifest::operator++(): result: ofs=42467328 stripe_ofs=42467328
part_ofs=0 rule->part_size=0
2015-03-03 09:27:31.125773 7fcf28fd1700  0
RGWObjManifest::operator++(): result: ofs=46661632 stripe_ofs=46661632
part_ofs=0 rule->part_size=0
2015-03-03 09:27:31.192661 7fcf28fd1700  0 ERROR: flush_read_list():
d->client_c->handle_data() returned -1
2015-03-03 09:27:31.194481 7fcf28fd1700  1 == req done
req=0x7fcf580102a0 http_status=200 ==
...
2015-03-03 09:28:43.008517 7fcf2a7d4700  1 == starting new request
req=0x7fcf580102a0 =
2015-03-03 09:28:43.016414 7fcf2a7d4700  0
RGWObjManifest::operator++(): result: ofs=887579 stripe_ofs=887579
part_ofs=0 rule->part_size=0
2015-03-03 09:28:43.022387 7fcf2a7d4700  1 == req done
req=0x7fcf580102a0 http_status=200 ==

First, what is the req= line? Is that a thread-id?
I am asking, because the same id is used over and over
in the same file over time.

More importantly, what do the RGWObjManifest::operator++():...
lines mean? In the middle case above the block even ends
with one of the ERROR lines mentioned before, but the HTTP
status is still 200, suggesting a successful operation.

Thanks in advance for shedding some light, because I would like
to know if I need to take some action or at least keep an
eye on these via monitoring?

Cheers,
Daniel





[ceph-users] Question about notification of OSD down in client side

2015-03-05 Thread Dennis Chen
Hello,

Is there some way to make the client (via the RADOS API or something like
that) get a notification of an event (for example, an OSD going down)
that happened in the cluster?
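As far as I know there is no push-style "OSD down" callback exposed to client applications; the usual workaround is to poll `ceph osd dump -f json` (or issue the equivalent mon command through the librados Python bindings) and diff the "up" flags between snapshots. A sketch of the diffing half - the JSON shape assumed here (an "osds" list with "osd" and "up" fields) should be checked against your release:

```python
def newly_down_osds(prev_dump, curr_dump):
    """Return OSD ids that were up in prev_dump but are down in curr_dump.

    Both arguments are parsed `ceph osd dump -f json` documents; only the
    "osds" list with its "osd" (id) and "up" (0/1) fields is used here.
    """
    prev_up = {o["osd"] for o in prev_dump["osds"] if o["up"]}
    curr_up = {o["osd"] for o in curr_dump["osds"] if o["up"]}
    return sorted(prev_up - curr_up)

# Two hand-made snapshots standing in for successive dumps:
prev = {"osds": [{"osd": 0, "up": 1}, {"osd": 1, "up": 1}]}
curr = {"osds": [{"osd": 0, "up": 1}, {"osd": 1, "up": 0}]}
print(newly_down_osds(prev, curr))  # [1]
```

Polling interval is a trade-off between detection latency and monitor load; the cluster log (`ceph -w`) is another place such events surface, though parsing it is more fragile.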

-- 
Den


Re: [ceph-users] Ceph User Teething Problems

2015-03-05 Thread Datatone Lists

Thank you all for such wonderful feedback.

Thank you to John Spray for putting me on the right track. I now see
that the cephfs aspect of the project is being de-emphasised, so that
the manual deployment instructions tell how to set up the object store,
and then the cephfs is a separate issue that needs to be explicitly set
up and configured in its own right. So that explains why the cephfs
pools are not created by default, and why the required cephfs pools are
now referred to, not as 'data' and 'metadata', but 'cephfs_data' and
'cephfs_metadata'. I have created these pools, and created a new cephfs
filesystem, and I can mount it without problem.

This confirms my suspicion that the manual deployment pages are in need
of review and revision. They still refer to three default pools. I am
happy that this section should deal with the object store setup only,
but I still think that the osd part is a bit confused and confusing,
particularly with respect to what is done on which machine. It would
then be useful to say something like "this completes the configuration
of the basic store. If you wish to use cephfs, you must set up a
metadata server, appropriate pools, and a cephfs filesystem. (See
http://...)".

I was not trying to be smart or obscure when I made a brief and
apparently dismissive reference to ceph-deploy. I railed against it and
the demise of mkcephfs on this list at the point that mkcephfs was
discontinued in the releases. That caused a few supportive responses at
the time, so I know that I'm not alone. I did not wish to trawl over
those arguments again unnecessarily.

There is a principle that is being missed. The 'ceph' code contains
everything required to set up and operate a ceph cluster. There should
be documentation detailing how this is done.

'Ceph-deploy' is a separate thing. It is one of several tools that
promise to make setting things up easy. However, my resistance is based
on two factors. If I recall correctly, it is one of those projects in
which the configuration needs to know what 'distribution' is being
used. (Presumably, this is to try to deduce where various things are
located). So if one is not using one of these 'distributions', one is
stuffed right from the start. Secondly, the challenge that we are
trying to overcome is learning what the various ceph components need,
and how they need to be set up and configured. I don't think that the
"don't worry your pretty little head about that, we have a natty tool
to do it for you" approach is particularly useful.

So I am not knocking ceph-deploy, Travis, it is just that I do not
believe that it is relevant or useful to me at this point in time.

I see that Lionel Bouton seems to share my views here.

In general, the ceph documentation (in my humble opinion) needs to be
draughted with a keen eye on the required scope. Deal with ceph; don't
let it get contaminated with 'ceph-deploy', 'upstart', 'systemd', or
anything else that is not actually part of ceph.

As an example, once you have configured your osd, you start it with:

ceph-osd -i {osd-number}

It is as simple as that! 

If it is required to start the osd automatically, then that will be
done using sysvinit, upstart, systemd, or whatever else is being used
to bring the system up in the first place. It is unnecessary and
confusing to try to second-guess the environment in which ceph may be
being used, and contaminate the documentation with such details.
(Having said that, I see no problem with adding separate, helpful,
sections such as "Suggestions for starting using 'upstart'", or
"Suggestions for starting using 'systemd'").
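As a concrete illustration of what such a separate section could contain, a minimal systemd unit for one osd might look like the sketch below. This is only an example of the shape of the thing, not the unit shipped by any package; the path and option choices are assumptions:

```ini
; /etc/systemd/system/ceph-osd@.service  (hypothetical path)
[Unit]
Description=Ceph object storage daemon osd.%i
After=network-online.target

[Service]
; -f keeps the daemon in the foreground so systemd can supervise it
ExecStart=/usr/bin/ceph-osd -f -i %i
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

With a unit like this, "systemctl enable --now ceph-osd@0" would start osd.0 at boot, without any of it leaking into the core ceph documentation.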

So I would reiterate the point that the really important documentation
is probably quite simple for an expert to produce. Just spell out what
each component needs in terms of keys, access to keys, files, and so
on. Spell out how to set everything up. Also how to change things after
the event, so that 'trial and error' does not have to contain really
expensive errors. Once we understand the fundamentals, getting fancy
and efficient is a completely separate further goal, and is not really
a responsibility of core ceph development.

I have an inexplicable emotional desire to see ceph working well with
btrfs, which I like very much and have been using since the very early
days. Despite all the 'not ready for production' warnings, I adopted it
with enthusiasm, and have never had cause to regret it, and only once
or twice experienced a failure that was painful to me. However, as I
have experimented with ceph over the years, it has been very clear that
ceph seems to be the most ruthless stress test for it, and it has
always broken quite quickly (I also used xfs for comparison). I have
seen evidence of much work going into btrfs in the kernel development
now that the lead developer has moved from Oracle to, I think, Facebook.

I now share the view that I think Robert LeBlanc has, that maybe btrfs
will now stand the ceph test.

Thanks, Lincoln Bryant, for confirming 

Re: [ceph-users] Ceph User Teething Problems

2015-03-05 Thread Robert LeBlanc
David,

You will need to raise the limit on open files in the Linux system. Check
/etc/security/limits.conf. It is explained somewhere in the docs, and the
autostart scripts 'fix' the issue for most people. When I did a manual
deploy for the same reasons you are, I ran into this too.
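For reference, a sketch of checking and raising the limit; the paths and numeric values here are illustrative only, and distributions (and PAM configurations) differ:

```shell
# Show the current soft limit on open files for this shell
ulimit -Sn

# Raise the soft limit up to the hard limit for this session
ulimit -Sn "$(ulimit -Hn)"

# To persist the change for daemons started via PAM sessions, add lines
# like these to /etc/security/limits.conf (example values, adjust to
# your osd count):
#   root  soft  nofile  65536
#   root  hard  nofile  65536
```

Daemons started directly by init scripts may bypass PAM, in which case the limit has to be set in the init script (or unit file) itself.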

Robert LeBlanc

Sent from a mobile device, please excuse any typos.
On Mar 5, 2015 3:14 AM, Datatone Lists li...@datatone.co.uk wrote:



[ceph-users] 【ceph-users】Journal size when use ceph-deploy to add new osd

2015-03-05 Thread Alexander Yang
Hello everyone,
Recently I have had a question about the ceph osd journal.
I use ceph-deploy (version 1.4.0) to add a new osd, and my ceph
version is 0.80.5.
/dev/sdb is a SATA disk and /dev/sdk is an SSD disk; the sdk1
partition size is 50G.


ceph-deploy osd prepare host1:/dev/sdb1:/dev/sdk1
ceph-deploy osd activate host1:/dev/sdb1:/dev/sdk1


After I ran these two commands, the new osd began to work.
In my ceph.conf I do not set the osd journal path or the osd journal
size; they are the defaults.

Then I use the command "ceph --admin-daemon /var/run/ceph/ceph-osd.*.asok
config show | grep osd_journal_size" to check via the osd admin socket.
I get the result: osd_journal_size: 5120.
That means the journal size is 5GB, i.e. ceph-deploy did not make the
journal fill my SSD partition.


OK! then I restart this osd, and get the osd log:
...
...
2015-03-06 11:51:30.451245 7fd6c39df7a0  0
filestore(/var/lib/ceph/osd/ceph-11) mount: WRITEAHEAD journal mode
explicitly enabled in conf
2015-03-06 11:51:30.454400 7fd6c39df7a0  1 journal _open
/var/lib/ceph/osd/ceph-11/journal fd 21: 53687091200 bytes, block size
4096 bytes, directio = 1, aio = 0
2015-03-06 11:51:30.454551 7fd6c39df7a0  1 journal _open
/var/lib/ceph/osd/ceph-11/journal fd 21: 53687091200 bytes, block size
4096 bytes, directio = 1, aio = 0
2015-03-06 11:51:30.505709 7fd6c39df7a0  0 cls
cls/hello/cls_hello.cc:271: loading cls_hello
...

This means the journal size is 50GB, i.e. ceph-deploy did make the
journal fill my SSD partition.


So which is correct?
And if I set osd_journal_size after using ceph-deploy to add a new
osd, will that setting take effect?

Thanks very much!
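If I recall the behaviour correctly, osd_journal_size only governs file-based journals; when the journal is a raw partition such as /dev/sdk1, the whole partition is used regardless of the setting, which would reconcile the two numbers above. A sketch of the relevant ceph.conf fragment, with illustrative values only:

```ini
; Illustrative ceph.conf fragment, not a recommendation.
[osd]
; Size in MB of a *file-based* journal, created at mkfs time.
; Ignored when "osd journal" points at a raw block device or
; partition, which is used in its entirety.
osd journal size = 10240
```

Note that changing this after the osd exists does not resize an existing journal; the journal would have to be flushed and recreated for a new size to take effect.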