Re: [ceph-users] Issue with free Inodes

2015-03-25 Thread Kamil Kuramshin

Maybe someone can shed new light on this:

1. Only SSD-cache OSDs are affected by this issue
2. Total cache OSD count is 12x60GiB; the backend filesystem is ext4
3. I have created 2 cache tier pools with replica size=3 on those OSDs, 
both with pg_num:400, pgp_num:400

4. There was a crush ruleset:
superuser@admin:~$ ceph osd crush rule dump ssd
{ rule_id: 3,
  rule_name: ssd,
  ruleset: 3,
  type: 1,
  min_size: 1,
  max_size: 10,
  steps: [
{ op: take,
  item: -21,
  item_name: ssd},
{ op: chooseleaf_firstn,
  num: 0,
  type: disktype},
{ op: emit}]}
which gathers all SSD OSDs from all nodes by *disktype*.

I guess a lot of *directories* were created on the filesystem for 
organizing placement groups; can that explain such a large number of 
inodes being occupied by directory records?
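One rough way to check that (a sketch only; the mount point is an example,
point it at one of the affected cache OSDs) is to compare directory and file
counts against what ext4 reports:

  df -i /var/lib/ceph/osd/ceph-NN                     # inode usage as ext4 sees it
  find /var/lib/ceph/osd/ceph-NN -type d | wc -l      # directories (PG subdir splits)
  find /var/lib/ceph/osd/ceph-NN -type f | wc -l      # regular files (objects, omap, metadata)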




On 24.03.2015 16:52, Gregory Farnum wrote:

On Tue, Mar 24, 2015 at 12:13 AM, Christian Balzer ch...@gol.com wrote:

On Tue, 24 Mar 2015 09:41:04 +0300 Kamil Kuramshin wrote:


Yes, I read it, but I do not understand what you mean by *verify
this*. All 3335808 inodes are definitely files and directories created by
the ceph OSD process:


What I mean is: how/why did Ceph create 3+ million files? Where in the tree
are they actually, and are they evenly distributed across the respective PG
sub-directories?

Or to ask it differently, how large is your cluster (how many OSDs,
objects), in short the output of ceph -s.
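A hedged way to answer both questions at once, assuming the standard FileStore
layout with one <pgid>_head directory per PG under current/ (the OSD path is an
example):

  # objects per PG directory on the affected cache OSD, largest last
  for d in /var/lib/ceph/osd/ceph-NN/current/*_head; do
      echo "$(find "$d" -type f | wc -l) $d"
  done | sort -n | tail
  ceph -s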

If cache-tiers actually are reserving each object that exists on the
backing store (even if there isn't data in it yet on the cache tier) and
your cluster is large enough, it might explain this.

Nope. As you've said, this doesn't make any sense unless the objects
are all ludicrously small (and you can't actually get 10-byte objects
in Ceph; the names alone tend to be bigger than that) or something
else is using up inodes.


Both of that should be mentioned, and precautions against running out of inodes
should be taken by the Ceph code.

If not, this may be a bug after all.

Would be nice if somebody from the Ceph devs could have a gander at this.

Christian


*tune2fs 1.42.5 (29-Jul-2012)*
Filesystem volume name:   none
Last mounted on:  /var/lib/ceph/tmp/mnt.05NAJ3
Filesystem UUID: e4dcca8a-7b68-4f60-9b10-c164dc7f9e33
Filesystem magic number:  0xEF53
Filesystem revision #:1 (dynamic)
Filesystem features:  has_journal ext_attr resize_inode dir_index
filetype extent flex_bg sparse_super large_file huge_file uninit_bg
dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options:user_xattr acl
Filesystem state: clean
Errors behavior:  Continue
Filesystem OS type:   Linux
*Inode count:  3335808*
Block count:  13342945
Reserved block count: 667147
Free blocks:  5674105
*Free inodes:  0*
First block:  0
Block size:   4096
Fragment size:4096
Reserved GDT blocks:  1020
Blocks per group: 32768
Fragments per group:  32768
Inodes per group: 8176
Inode blocks per group:   511
Flex block group size:16
Filesystem created:   Fri Feb 20 16:44:25 2015
Last mount time:  Tue Mar 24 09:33:19 2015
Last write time:  Tue Mar 24 09:33:27 2015
Mount count:  7
Maximum mount count:  -1
Last checked: Fri Feb 20 16:44:25 2015
Check interval:   0 (none)
Lifetime writes:  4116 GB
Reserved blocks uid:  0 (user root)
Reserved blocks gid:  0 (group root)
First inode:  11
Inode size:   256
Required extra isize: 28
Desired extra isize:  28
Journal inode:8
Default directory hash:   half_md4
Directory Hash Seed: 148ee5dd-7ee0-470c-a08a-b11c318ff90b
Journal backup:   inode blocks

*fsck.ext4 /dev/sda1*
e2fsck 1.42.5 (29-Jul-2012)
/dev/sda1: clean, 3335808/3335808 files, 7668840/13342945 blocks

On 23.03.2015 17:09, Christian Balzer wrote:

On Mon, 23 Mar 2015 15:26:07 +0300 Kamil Kuramshin wrote:


Yes, I understand that.

The initial purpose of my first email was just advice for newcomers.
My fault was selecting ext4 as the backend for the SSD disks.
But I did not foresee that the inode count could reach its limit before
the free space does :)

And maybe there should be some sort of warning, not only for free space
in MiB (GiB, TiB), but also a dedicated warning about free
inodes for filesystems with static inode allocation like ext4.
Because once an OSD reaches the inode limit it becomes totally unusable and
immediately goes down, and from that moment there is no way to start
it!
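Until something like that exists in Ceph itself, a minimal external check is
easy to script; a sketch (the 90% threshold is an arbitrary example):

  # warn when any OSD filesystem has more than 90% of its inodes used
  df -iP | awk '$NF ~ /\/var\/lib\/ceph\/osd/ {
      use=$5; sub("%","",use);
      if (use+0 > 90) print "WARNING: " $NF " has " use "% of inodes used"
  }'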


While all that is true and should probably be addressed, please re-read
what I wrote before.

With the 3.3 million inodes used and thus likely as many files (did you
verify this?) and 4MB objects that would make something in the 12TB
ballpark area.

Something very very strange and wrong is going on with your cache tier.

Christian


23.03.2015 

[ceph-users] ceph -w: Understanding MB data versus MB used

2015-03-25 Thread Saverio Proto
Hello there,

I started to push data into my ceph cluster. There is something I
cannot understand in the output of ceph -w.

When I run ceph -w I get this kind of output:

2015-03-25 09:11:36.785909 mon.0 [INF] pgmap v278788: 26056 pgs: 26056
active+clean; 2379 MB data, 19788 MB used, 33497 GB / 33516 GB avail


2379MB is actually the data I pushed into the cluster, I can see it
also in the ceph df output, and the numbers are consistent.

What I don't understand is the 19788 MB used. All my pools have size 3, so I
expected something like 2379 * 3. Instead this number is much bigger.

I really need to understand how MB used grows because I need to know
how many disks to buy.

Any hints ?

thank you.

Saverio
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Strange osd in PG with new EC-Pool - pgs: 2 active+undersized+degraded

2015-03-25 Thread Udo Lembke
Hi,
with two more hosts added (now 7 storage nodes) I want to create a new
ec-pool, and I get a strange effect:

ceph@admin:~$ ceph health detail
HEALTH_WARN 2 pgs degraded; 2 pgs stuck degraded; 2 pgs stuck unclean; 2
pgs stuck undersized; 2 pgs undersized
pg 22.3e5 is stuck unclean since forever, current state
active+undersized+degraded, last acting [76,15,82,11,57,29,2147483647]
pg 22.240 is stuck unclean since forever, current state
active+undersized+degraded, last acting [38,85,17,74,2147483647,10,58]
pg 22.3e5 is stuck undersized for 406.614447, current state
active+undersized+degraded, last acting [76,15,82,11,57,29,2147483647]
pg 22.240 is stuck undersized for 406.616563, current state
active+undersized+degraded, last acting [38,85,17,74,2147483647,10,58]
pg 22.3e5 is stuck degraded for 406.614566, current state
active+undersized+degraded, last acting [76,15,82,11,57,29,2147483647]
pg 22.240 is stuck degraded for 406.616679, current state
active+undersized+degraded, last acting [38,85,17,74,2147483647,10,58]
pg 22.3e5 is active+undersized+degraded, acting
[76,15,82,11,57,29,2147483647]
pg 22.240 is active+undersized+degraded, acting
[38,85,17,74,2147483647,10,58]

But I have only 91 OSDs (84 SATA + 7 SSDs), not 2147483647!
Where the heck did the 2147483647 come from?

I ran the following commands:
ceph osd erasure-code-profile set 7hostprofile k=5 m=2
ruleset-failure-domain=host
ceph osd pool create ec7archiv 1024 1024 erasure 7hostprofile

my version:
ceph -v
ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e)


I found an issue in my crush-map - one SSD was twice in the map:
host ceph-061-ssd {
id -16  # do not change unnecessarily
# weight 0.000
alg straw
hash 0  # rjenkins1
}
root ssd {
id -13  # do not change unnecessarily
# weight 0.780
alg straw
hash 0  # rjenkins1
item ceph-01-ssd weight 0.170
item ceph-02-ssd weight 0.170
item ceph-03-ssd weight 0.000
item ceph-04-ssd weight 0.170
item ceph-05-ssd weight 0.170
item ceph-06-ssd weight 0.050
item ceph-07-ssd weight 0.050
item ceph-061-ssd weight 0.000
}

Host ceph-061-ssd doesn't exist and osd-61 is the SSD from ceph-03-ssd,
but after fixing the crushmap the issue with the osd 2147483647 still exists.

Any idea how to fix that?

regards

Udo

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ERROR: missing keyring, cannot use cephx for authentication

2015-03-25 Thread oyym...@gmail.com
Hi Jesus,
I encountered a similar problem.
1. I shut down one of the nodes, but none of the osds on that node came back up after reboot.
2. Running service ceph restart manually, I got the same error message:
[root@storage4 ~]# /etc/init.d/ceph start 
=== osd.15 === 
2015-03-23 14:43:32.399811 7fed0fcf4700 -1 monclient(hunting): ERROR: missing 
keyring, cannot use cephx for authentication 
2015-03-23 14:43:32.399814 7fed0fcf4700 0 librados: osd.15 initialization error 
(2) No such file or directory 
Error connecting to cluster: ObjectNotFound 
failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.15 
--keyring=/var/lib/ceph/osd/ceph-15/keyring osd crush create-or-move -- 15 0.19 
host=storage4 root=default'
..
3.  ll /var/lib/ceph/osd/ceph-15/ 
total 0

All files have disappeared from /var/lib/ceph/osd/ceph-15/.






oyym...@gmail.com
 
From: Jesus Chavez (jeschave)
Date: 2015-03-24 05:09
To: ceph-users
Subject: [ceph-users] ERROR: missing keyring, cannot use cephx for 
authentication
Hi all, I did an HA failover test shutting down 1 node and I see that only 1 OSD 
came up after reboot: 

[root@geminis ceph]# df -h
Filesystem Size  Used Avail Use% Mounted on
/dev/mapper/rhel-root   50G  4.5G   46G   9% /
devtmpfs   126G 0  126G   0% /dev
tmpfs  126G   80K  126G   1% /dev/shm
tmpfs  126G  9.9M  126G   1% /run
tmpfs  126G 0  126G   0% /sys/fs/cgroup
/dev/sda1  494M  165M  330M  34% /boot
/dev/mapper/rhel-home   36G   44M   36G   1% /home
/dev/sdc1  3.7T  134M  3.7T   1% /var/lib/ceph/osd/ceph-14

If I run service ceph restart I get this error message…

Stopping Ceph osd.94 on geminis...done
=== osd.94 ===
2015-03-23 15:05:41.632505 7fe7b9941700 -1 monclient(hunting): ERROR: missing 
keyring, cannot use cephx for authentication
2015-03-23 15:05:41.632508 7fe7b9941700  0 librados: osd.94 initialization 
error (2) No such file or directory
Error connecting to cluster: ObjectNotFound
failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.94 
--keyring=/var/lib/ceph/osd/ceph-94/keyring osd crush create-or-move -- 94 0.05 
host=geminis root=default


I have ceph.conf and ceph.client.admin.keyring under /etc/ceph:


[root@geminis ceph]# ls /etc/ceph
ceph.client.admin.keyring  ceph.conf  rbdmap  tmp1OqNFi  tmptQ0a1P
[root@geminis ceph]#


does anybody know what could be wrong?

Thanks





Jesus Chavez
SYSTEMS ENGINEER-C.SALES

jesch...@cisco.com
Phone: +52 55 5267 3146
Mobile: +51 1 5538883255

CCIE - 44433

Cisco.com









___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Erasure coding

2015-03-25 Thread Tom Verdaat
Hi guys,

We've got a very small Ceph cluster (3 hosts, 5 OSDs each for cold data)
that we intend to grow later on as more storage is needed. We would very
much like to use Erasure Coding for some pools but are facing some
challenges regarding the optimal initial profile “replication” settings
given the limited number of initial hosts that we can use to spread the
chunks. Could somebody please help me with the following questions?

   1. Suppose we initially use replication instead of erasure coding. Can we
   convert a replicated pool to an erasure coded pool later on?

   2. Will Ceph gain the ability to change the K and N values for an existing
   pool in the near future?

   3. Can the failure domain be changed for an existing pool? E.g. can we
   start with failure domain OSD and then switch it to Host after adding more
   hosts?

   4. Where can I find a good comparison of the available erasure code plugins
   that allows me to properly decide which one suits our needs best?

 Many thanks for your help!

 Tom
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Snapshots and fstrim with cache tiers ?

2015-03-25 Thread Frédéric Nass


Hello, 

I have a few questions regarding snapshots and fstrim with cache tiers. 

In the cache tier and erasure coding FAQ related to ICE 1.2 (based on 
Firefly), Inktank says "Snapshots are not supported in conjunction with cache 
tiers." 

What are the risks of using snapshots with cache tiers? Would this "better not 
use it" recommendation still be true with Giant or Hammer? 

Regarding the fstrim command, it doesn't seem to work with cache tiers. The 
freed-up blocks don't come back into the ceph cluster. 
Can someone confirm this? Is there something we can do to get those freed-up 
blocks back into the cluster? 

Also, can we run an fstrim task from the cluster side? That is, without having 
to map and mount each rbd image or rely on the client to perform this task? 

Best regards, 

-- 

Frédéric Nass 

Sous-direction Infrastructures 
Direction du Numérique 
Université de Lorraine 

email : frederic.n...@univ-lorraine.fr 
Tél : +33 3 83 68 53 83 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Erasure coding

2015-03-25 Thread Loic Dachary
Hi Tom,

On 25/03/2015 11:31, Tom Verdaat wrote:
 Hi guys,
 
 We've got a very small Ceph cluster (3 hosts, 5 OSD's each for cold data) 
 that we intend to grow later on as more storage is needed. We would very much 
 like to use Erasure Coding for some pools but are facing some challenges 
 regarding the optimal initial profile “replication” settings given the 
 limited number of initial hosts that we can use to spread the chunks. Could 
 somebody please help me with the following questions?
 
  1.
 
 Suppose we initially use replication instead of erasure coding. Can we convert 
 a replicated pool to an erasure coded pool later on?

What you would do is create an erasure coded pool later and have the initial 
replicated pool as a cache in front of it. 

http://docs.ceph.com/docs/master/rados/operations/cache-tiering/

Objects from the replicated pool will move to the erasure coded pool if they 
are not used and it will save space. You don't need to create the erasure coded 
pool on your small cluster. You can do it when it grows larger or when it 
becomes full.
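For what it's worth, a minimal sketch of that setup with the standard
cache-tiering commands (pool names, PG count and profile are examples only):

  ceph osd pool create ecpool 1024 1024 erasure myprofile
  ceph osd tier add ecpool hotpool                # existing replicated pool becomes the cache
  ceph osd tier cache-mode hotpool writeback
  ceph osd tier set-overlay ecpool hotpool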

  2.
 
 Will Ceph gain the ability to change the K and N values for an existing 
 pool in the near future?

I don't think so.

  3.
 
 Can the failure domain be changed for an existing pool? E.g. can we start 
 with failure domain OSD and then switch it to Host after adding more hosts?

The failure domain, although listed in the erasure code profile for 
convenience, really belongs to the crush ruleset applied to the pool. It can 
therefore be changed after the pool is created. It is likely to result in 
objects moving a lot during the transition but it should work fine otherwise.
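A hedged sketch of what that looks like in practice (profile, rule and pool
names are examples; k and m should match the existing pool's profile):

  ceph osd erasure-code-profile set byhost_profile k=2 m=1 ruleset-failure-domain=host
  ceph osd crush rule create-erasure ec_by_host byhost_profile
  ceph osd crush rule dump ec_by_host             # note the ruleset id it reports
  ceph osd pool set ecpool crush_ruleset <ruleset-id>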

  4.
 
 Where can I find a good comparison of the available erasure code plugins 
 that allows me to properly decide which one suits our needs best?

In a nutshell: jerasure is flexible and is likely to be what you want; isa 
computes faster than jerasure but only works on Intel processors (note however 
that the erasure code computation does not make a significant difference 
overall); lrc and shec (to be published in hammer) minimize network usage 
during recovery but use more space than jerasure or isa.
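If it helps, the plugins can be compared by creating profiles side by side and
inspecting them before committing to one; a sketch (names and k/m/l values are
examples):

  ceph osd erasure-code-profile set jer_profile plugin=jerasure k=4 m=2 ruleset-failure-domain=host
  ceph osd erasure-code-profile set lrc_profile plugin=lrc k=4 m=2 l=3 ruleset-failure-domain=host
  ceph osd erasure-code-profile ls
  ceph osd erasure-code-profile get jer_profile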

Cheers

 Many thanks for your help!
 
 Tom
 
 
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

-- 
Loïc Dachary, Artisan Logiciel Libre



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ERROR: missing keyring, cannot use cephx for authentication

2015-03-25 Thread Robert LeBlanc
It doesn't look like your OSD is mounted. What do you have when you run
mount? How did you create your OSDs?
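As a hedged starting point (the device name is only an example guessed from the
df output below; ceph-disk sub-commands may vary slightly between releases):

  mount | grep /var/lib/ceph/osd      # which OSD data dirs are actually mounted
  ceph-disk list                      # what ceph-disk thinks each partition is for
  ceph-disk activate /dev/sdd1        # re-mount/start a prepared but unmounted OSD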

Robert LeBlanc

Sent from a mobile device please excuse any typos.
On Mar 25, 2015 1:31 AM, oyym...@gmail.com oyym...@gmail.com wrote:

 Hi,Jesus
 I encountered similar problem.
 *1.* shut down one of nodes, but all osds can't reactive on the node
 after reboot.
 *2.* run service ceph restart  manually, got the same error message:
 [root@storage4 ~]# /etc/init.d/ceph start
 === osd.15 ===
 2015-03-23 14:43:32.399811 7fed0fcf4700 -1 monclient(hunting): ERROR:
 missing keyring, cannot use cephx for authentication
 2015-03-23 14:43:32.399814 7fed0fcf4700 0 librados: osd.15 initialization
 error (2) No such file or directory
 Error connecting to cluster: ObjectNotFound
 failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.15
 --keyring=/var/lib/ceph/osd/ceph-15/keyring osd crush create-or-move -- 15
 0.19 host=storage4 root=default'
 ..
 3.  ll /var/lib/ceph/osd/ceph-15/
 total 0

 all files *disappeared* in the /var/lib/ceph/osd/ceph-15/




 --
 oyym...@gmail.com


 *From:* Jesus Chavez (jeschave) jesch...@cisco.com
 *Date:* 2015-03-24 05:09
 *To:* ceph-users ceph-users@lists.ceph.com
 *Subject:* [ceph-users] ERROR: missing keyring, cannot use cephx for
 authentication
 Hi all, I did HA failover test shutting down 1 node and I see that only 1
 OSD came up after reboot:

  [root@geminis ceph]# df -h
 Filesystem Size  Used Avail Use% Mounted on
 /dev/mapper/rhel-root   50G  4.5G   46G   9% /
 devtmpfs   126G 0  126G   0% /dev
 tmpfs  126G   80K  126G   1% /dev/shm
 tmpfs  126G  9.9M  126G   1% /run
 tmpfs  126G 0  126G   0% /sys/fs/cgroup
 /dev/sda1  494M  165M  330M  34% /boot
 /dev/mapper/rhel-home   36G   44M   36G   1% /home
 /dev/sdc1  3.7T  134M  3.7T   1% /var/lib/ceph/osd/ceph-14

  If I run service ceph restart I got this error message…

  Stopping Ceph osd.94 on geminis...done
 === osd.94 ===
 2015-03-23 15:05:41.632505 7fe7b9941700 -1 monclient(hunting): ERROR:
 missing keyring, cannot use cephx for authentication
 2015-03-23 15:05:41.632508 7fe7b9941700  0 librados: osd.94 initialization
 error (2) No such file or directory
 Error connecting to cluster: ObjectNotFound
 failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.94
 --keyring=/var/lib/ceph/osd/ceph-94/keyring osd crush create-or-move -- 94
 0.05 host=geminis root=default


  I have ceph.conf and ceph.client.admin.keyring under /etc/ceph:


  [root@geminis ceph]# ls /etc/ceph
 ceph.client.admin.keyring  ceph.conf  rbdmap  tmp1OqNFi  tmptQ0a1P
 [root@geminis ceph]#


  does anybody know what could be wrong?

  Thanks





 * Jesus Chavez*
 SYSTEMS ENGINEER-C.SALES

 jesch...@cisco.com
 Phone: *+52 55 5267 3146*
 Mobile: *+51 1 5538883255*

 CCIE - 44433


 Cisco.com http://www.cisco.com/










 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Strange osd in PG with new EC-Pool - pgs: 2 active+undersized+degraded

2015-03-25 Thread Gregory Farnum
On Wed, Mar 25, 2015 at 1:20 AM, Udo Lembke ulem...@polarzone.de wrote:
 Hi,
 due to two more hosts (now 7 storage nodes) I want to create an new
 ec-pool and get an strange effect:

 ceph@admin:~$ ceph health detail
 HEALTH_WARN 2 pgs degraded; 2 pgs stuck degraded; 2 pgs stuck unclean; 2
 pgs stuck undersized; 2 pgs undersized

This is the big clue: you have two undersized PGs!

 pg 22.3e5 is stuck unclean since forever, current state
 active+undersized+degraded, last acting [76,15,82,11,57,29,2147483647]

2147483647 is the largest number you can represent in a signed 32-bit
integer. There's an output error of some kind which is fixed
elsewhere; this should be -1.

So for whatever reason (in general it's hard on CRUSH trying to select
N entries out of N choices), CRUSH hasn't been able to map an OSD to
this slot for you. You'll want to figure out why that is and fix it.
-Greg
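One hedged way to reproduce this offline and see how often the rule fails is
to run the compiled crushmap through crushtool (the ruleset id below is a
placeholder; look it up with 'ceph osd crush rule dump' for the ec7archiv rule):

  ceph osd getcrushmap -o /tmp/crushmap
  crushtool -i /tmp/crushmap --test --rule <ruleset-id> --num-rep 7 --show-bad-mappings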

 pg 22.240 is stuck unclean since forever, current state
 active+undersized+degraded, last acting [38,85,17,74,2147483647,10,58]
 pg 22.3e5 is stuck undersized for 406.614447, current state
 active+undersized+degraded, last acting [76,15,82,11,57,29,2147483647]
 pg 22.240 is stuck undersized for 406.616563, current state
 active+undersized+degraded, last acting [38,85,17,74,2147483647,10,58]
 pg 22.3e5 is stuck degraded for 406.614566, current state
 active+undersized+degraded, last acting [76,15,82,11,57,29,2147483647]
 pg 22.240 is stuck degraded for 406.616679, current state
 active+undersized+degraded, last acting [38,85,17,74,2147483647,10,58]
 pg 22.3e5 is active+undersized+degraded, acting
 [76,15,82,11,57,29,2147483647]
 pg 22.240 is active+undersized+degraded, acting
 [38,85,17,74,2147483647,10,58]

 But I have only 91 OSDs (84 Sata + 7 SSDs) not 2147483647!
 Where the heck came the 2147483647 from?

 I do following commands:
 ceph osd erasure-code-profile set 7hostprofile k=5 m=2
 ruleset-failure-domain=host
 ceph osd pool create ec7archiv 1024 1024 erasure 7hostprofile

 my version:
 ceph -v
 ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e)


 I found an issue in my crush-map - one SSD was twice in the map:
 host ceph-061-ssd {
 id -16  # do not change unnecessarily
 # weight 0.000
 alg straw
 hash 0  # rjenkins1
 }
 root ssd {
 id -13  # do not change unnecessarily
 # weight 0.780
 alg straw
 hash 0  # rjenkins1
 item ceph-01-ssd weight 0.170
 item ceph-02-ssd weight 0.170
 item ceph-03-ssd weight 0.000
 item ceph-04-ssd weight 0.170
 item ceph-05-ssd weight 0.170
 item ceph-06-ssd weight 0.050
 item ceph-07-ssd weight 0.050
 item ceph-061-ssd weight 0.000
 }

 Host ceph-061-ssd don't excist and osd-61 is the SSD from ceph-03-ssd,
 but after fix the crusmap the issue with the osd 2147483647 still excist.

 Any idea how to fix that?

 regards

 Udo

 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] error creating image in rbd-erasure-pool

2015-03-25 Thread Gregory Farnum
Yes.

On Wed, Mar 25, 2015 at 4:13 AM, Frédéric Nass
frederic.n...@univ-lorraine.fr wrote:
 Hi Greg,

 Thank you for this clarification. It helps a lot.

 Does this "can't think of any issues" apply to both rbd and pool snapshots?

 Frederic.

 

 On Tue, Mar 24, 2015 at 12:09 PM, Brendan Moloney molo...@ohsu.edu wrote:

 Hi Loic and Markus,
 By the way, Inktank do not support snapshot of a pool with cache tiering
 :

 * https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding%20FAQ.pdf

 Hi,

 You seem to be talking about pool snapshots rather than RBD snapshots.
 But in the linked document it is not clear that there is a distinction:

 Can I use snapshots with a cache tier?
 Snapshots are not supported in conjunction with cache tiers.

 Can anyone clarify if this is just pool snapshots?

 I think that was just a decision based on the newness and complexity
 of the feature for product purposes. Snapshots against cache tiered
 pools certainly should be fine in Giant/Hammer and we can't think of
 any issues in Firefly off the tops of our heads.
 -Greg
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




 --

 Cordialement,

 Frédéric Nass.


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph -w: Understanding MB data versus MB used

2015-03-25 Thread Gregory Farnum
On Wed, Mar 25, 2015 at 1:24 AM, Saverio Proto ziopr...@gmail.com wrote:
 Hello there,

 I started to push data into my ceph cluster. There is something I
 cannot understand in the output of ceph -w.

 When I run ceph -w I get this kinkd of output:

 2015-03-25 09:11:36.785909 mon.0 [INF] pgmap v278788: 26056 pgs: 26056
 active+clean; 2379 MB data, 19788 MB used, 33497 GB / 33516 GB avail


 2379MB is actually the data I pushed into the cluster, I can see it
 also in the ceph df output, and the numbers are consistent.

 What I dont understand is 19788MB used. All my pools have size 3, so I
 expected something like 2379 * 3. Instead this number is very big.

 I really need to understand how MB used grows because I need to know
 how many disks to buy.

MB used is the summation of (the programmatic equivalent to) df
across all your nodes, whereas MB data is calculated by the OSDs
based on data they've written down. Depending on your configuration
MB used can include thing like the OSD journals, or even totally
unrelated data if the disks are shared with other applications.

MB used including the space used by the OSD journals is my first
guess about what you're seeing here, in which case you'll notice that
it won't grow any faster than MB data does once the journal is fully
allocated.
-Greg
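A rough way to check the journal theory on the OSD nodes (a sketch assuming
the default layout with journals and data under /var/lib/ceph/osd; if the
journal is a symlink to a raw partition, du won't report its real size):

  du -sh /var/lib/ceph/osd/ceph-*/journal 2>/dev/null   # journal files
  du -sh /var/lib/ceph/osd/ceph-*/current 2>/dev/null   # actual object data
  ceph df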
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Radosgw authorization failed

2015-03-25 Thread Yehuda Sadeh-Weinraub


- Original Message -
 From: Neville neville.tay...@hotmail.co.uk
 To: ceph-users@lists.ceph.com
 Sent: Wednesday, March 25, 2015 8:16:39 AM
 Subject: [ceph-users] Radosgw authorization failed
 
 Hi all,
 
 I'm testing backup product which supports Amazon S3 as target for Archive
 storage and I'm trying to setup a Ceph cluster configured with the S3 API to
 use as an internal target for backup archives instead of AWS.
 
 I've followed the online guide for setting up Radosgw and created a default
 region and zone based on the AWS naming convention US-East-1. I'm not sure
 if this is relevant but since I was having issues I thought it might need to
 be the same.
 
 I've tested the radosgw using boto.s3 and it seems to work ok i.e. I can
 create a bucket, create a folder, list buckets etc. The problem is when the
 backup software tries to create an object I get an authorization failure.
 It's using the same user/access/secret as I'm using from boto.s3 and I'm
 sure the creds are right as it lets me create the initial connection, it
 just fails when trying to create an object (backup folder).
 
 Here's the extract from the radosgw log:
 
 -
 2015-03-25 15:07:26.449227 7f1050dc7700 2 req 5:0.000419:s3:GET
 /:list_bucket:init op
 2015-03-25 15:07:26.449232 7f1050dc7700 2 req 5:0.000424:s3:GET
 /:list_bucket:verifying op mask
 2015-03-25 15:07:26.449234 7f1050dc7700 20 required_mask= 1 user.op_mask=7
 2015-03-25 15:07:26.449235 7f1050dc7700 2 req 5:0.000427:s3:GET
 /:list_bucket:verifying op permissions
 2015-03-25 15:07:26.449237 7f1050dc7700 5 Searching permissions for uid=test
 mask=49
 2015-03-25 15:07:26.449238 7f1050dc7700 5 Found permission: 15
 2015-03-25 15:07:26.449239 7f1050dc7700 5 Searching permissions for group=1
 mask=49
 2015-03-25 15:07:26.449240 7f1050dc7700 5 Found permission: 15
 2015-03-25 15:07:26.449241 7f1050dc7700 5 Searching permissions for group=2
 mask=49
 2015-03-25 15:07:26.449242 7f1050dc7700 5 Found permission: 15
 2015-03-25 15:07:26.449243 7f1050dc7700 5 Getting permissions id=test
 owner=test perm=1
 2015-03-25 15:07:26.449244 7f1050dc7700 10 uid=test requested perm (type)=1,
 policy perm=1, user_perm_mask=1, acl perm=1
 2015-03-25 15:07:26.449245 7f1050dc7700 2 req 5:0.000437:s3:GET
 /:list_bucket:verifying op params
 2015-03-25 15:07:26.449247 7f1050dc7700 2 req 5:0.000439:s3:GET
 /:list_bucket:executing
 2015-03-25 15:07:26.449252 7f1050dc7700 10 cls_bucket_list
 test1(@{i=.us-east.rgw.buckets.index}.us-east.rgw.buckets[us-east.280959.2])
 start num 1001
 2015-03-25 15:07:26.450828 7f1050dc7700 2 req 5:0.002020:s3:GET
 /:list_bucket:http status=200
 2015-03-25 15:07:26.450832 7f1050dc7700 1 == req done req=0x7f107000e2e0
 http_status=200 ==
 2015-03-25 15:07:26.516999 7f1069df9700 20 enqueued request
 req=0x7f107000f0e0
 2015-03-25 15:07:26.517006 7f1069df9700 20 RGWWQ:
 2015-03-25 15:07:26.517007 7f1069df9700 20 req: 0x7f107000f0e0
 2015-03-25 15:07:26.517010 7f1069df9700 10 allocated request
 req=0x7f107000f6b0
 2015-03-25 15:07:26.517021 7f1058dd7700 20 dequeued request
 req=0x7f107000f0e0
 2015-03-25 15:07:26.517023 7f1058dd7700 20 RGWWQ: empty
 2015-03-25 15:07:26.517081 7f1058dd7700 20 CONTENT_LENGTH=88
 2015-03-25 15:07:26.517084 7f1058dd7700 20
 CONTENT_TYPE=application/octet-stream
 2015-03-25 15:07:26.517085 7f1058dd7700 20 CONTEXT_DOCUMENT_ROOT=/var/www
 2015-03-25 15:07:26.517086 7f1058dd7700 20 CONTEXT_PREFIX=
 2015-03-25 15:07:26.517087 7f1058dd7700 20 DOCUMENT_ROOT=/var/www
 2015-03-25 15:07:26.517088 7f1058dd7700 20 FCGI_ROLE=RESPONDER
 2015-03-25 15:07:26.517089 7f1058dd7700 20 GATEWAY_INTERFACE=CGI/1.1
 2015-03-25 15:07:26.517090 7f1058dd7700 20 HTTP_AUTHORIZATION=AWS
 F79L68W19B3GCLOSE3F8:AcXqtvlBzBMpwdL+WuhDRoLT/Bs=
 2015-03-25 15:07:26.517091 7f1058dd7700 20 HTTP_CONNECTION=Keep-Alive
 2015-03-25 15:07:26.517092 7f1058dd7700 20 HTTP_DATE=Wed, 25 Mar 2015
 15:07:26 GMT
 2015-03-25 15:07:26.517092 7f1058dd7700 20 HTTP_EXPECT=100-continue
 2015-03-25 15:07:26.517093 7f1058dd7700 20
 HTTP_HOST=test1.devops-os-cog01.devops.local
 2015-03-25 15:07:26.517094 7f1058dd7700 20
 HTTP_USER_AGENT=aws-sdk-java/unknown-version Windows_Server_2008_R2/6.1
 Java_HotSpot(TM)_Client_VM/24.55-b03
 2015-03-25 15:07:26.517096 7f1058dd7700 20
 HTTP_X_AMZ_META_CREATIONTIME=2015-03-25T15:07:26
 2015-03-25 15:07:26.517097 7f1058dd7700 20 HTTP_X_AMZ_META_SIZE=88
 2015-03-25 15:07:26.517098 7f1058dd7700 20 HTTP_X_AMZ_STORAGE_CLASS=STANDARD
 2015-03-25 15:07:26.517099 7f1058dd7700 20 HTTPS=on
 2015-03-25 15:07:26.517100 7f1058dd7700 20
 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
 2015-03-25 15:07:26.517100 7f1058dd7700 20 QUERY_STRING=
 2015-03-25 15:07:26.517101 7f1058dd7700 20 REMOTE_ADDR=10.40.41.106
 2015-03-25 15:07:26.517102 7f1058dd7700 20 REMOTE_PORT=55439
 2015-03-25 15:07:26.517103 7f1058dd7700 20 

Re: [ceph-users] Strange osd in PG with new EC-Pool - pgs: 2 active+undersized+degraded

2015-03-25 Thread Don Doerner
Sorry all: my company's e-mail security got in the way there.  Try these 
references...

* http://tracker.ceph.com/issues/10350
* http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon

-don-

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Don 
Doerner
Sent: 25 March, 2015 08:01
To: Udo Lembke; ceph-us...@ceph.com
Subject: Re: [ceph-users] Strange osd in PG with new EC-Pool - pgs: 2 
active+undersized+degraded


Assuming you've calculated the number of PGs reasonably, see here 
(http://tracker.ceph.com/issues/10350) and here 
(http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon).
I'm guessing these will address your issue.  That weird number means that no 
OSD was found/assigned to the PG.



-don-





-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Udo 
Lembke
Sent: 25 March, 2015 01:21
To: ceph-us...@ceph.com
Subject: [ceph-users] Strange osd in PG with new EC-Pool - pgs: 2 
active+undersized+degraded



Hi,

due to two more hosts (now 7 storage nodes) I want to create an new ec-pool and 
get an strange effect:



ceph@admin:~$ ceph health detail

HEALTH_WARN 2 pgs degraded; 2 pgs stuck degraded; 2 pgs stuck unclean; 2 pgs 
stuck undersized; 2 pgs undersized pg 22.3e5 is stuck unclean since forever, 
current state

active+undersized+degraded, last acting [76,15,82,11,57,29,2147483647]

pg 22.240 is stuck unclean since forever, current state

active+undersized+degraded, last acting [38,85,17,74,2147483647,10,58]

pg 22.3e5 is stuck undersized for 406.614447, current state

active+undersized+degraded, last acting [76,15,82,11,57,29,2147483647]

pg 22.240 is stuck undersized for 406.616563, current state

active+undersized+degraded, last acting [38,85,17,74,2147483647,10,58]

pg 22.3e5 is stuck degraded for 406.614566, current state

active+undersized+degraded, last acting [76,15,82,11,57,29,2147483647]

pg 22.240 is stuck degraded for 406.616679, current state

active+undersized+degraded, last acting [38,85,17,74,2147483647,10,58]

pg 22.3e5 is active+undersized+degraded, acting [76,15,82,11,57,29,2147483647] 
pg 22.240 is active+undersized+degraded, acting [38,85,17,74,2147483647,10,58]



But I have only 91 OSDs (84 Sata + 7 SSDs) not 2147483647!

Where the heck came the 2147483647 from?



I do following commands:

ceph osd erasure-code-profile set 7hostprofile k=5 m=2 
ruleset-failure-domain=host ceph osd pool create ec7archiv 1024 1024 erasure 
7hostprofile



my version:

ceph -v

ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e)





I found an issue in my crush-map - one SSD was twice in the map:

host ceph-061-ssd {

id -16  # do not change unnecessarily

# weight 0.000

alg straw

hash 0  # rjenkins1

}

root ssd {

id -13  # do not change unnecessarily

# weight 0.780

alg straw

hash 0  # rjenkins1

item ceph-01-ssd weight 0.170

item ceph-02-ssd weight 0.170

item ceph-03-ssd weight 0.000

item ceph-04-ssd weight 0.170

item ceph-05-ssd weight 0.170

item ceph-06-ssd weight 0.050

item ceph-07-ssd weight 0.050

item ceph-061-ssd weight 0.000

}



Host ceph-061-ssd don't excist and osd-61 is the SSD from ceph-03-ssd, but 
after fix the crusmap the issue with the osd 2147483647 still excist.



Any idea how to fix that?



regards



Udo



___

ceph-users mailing list

ceph-users@lists.ceph.com

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] Strange osd in PG with new EC-Pool - pgs: 2 active+undersized+degraded

2015-03-25 Thread Udo Lembke
Hi Gregory,
thanks for the answer!

I have looked at which storage nodes are missing, and they are two different ones:
pg 22.240 is stuck undersized for 24437.862139, current state 
active+undersized+degraded, last acting
[38,85,17,74,2147483647,10,58]
pg 22.240 is stuck undersized for 24437.862139, current state 
active+undersized+degraded, last acting
[ceph-04,ceph-07,ceph-02,ceph-06,2147483647,ceph-01,ceph-05]
ceph-03 is missing

pg 22.3e5 is stuck undersized for 24437.860025, current state 
active+undersized+degraded, last acting
[76,15,82,11,57,29,2147483647]
pg 22.3e5 is stuck undersized for 24437.860025, current state 
active+undersized+degraded, last acting
[ceph-06,ceph-02,ceph-07,ceph-01,ceph-05,ceph-03,2147483647]
ceph-04 is missing

Perhaps I hit a PGs/OSD max?!

I look with the script from 
http://cephnotes.ksperis.com/blog/2015/02/23/get-the-number-of-placement-groups-per-osd

pool :    17   18   19    9   10   20   21   13   22   23   16 | SUM

...
host ceph-03:
osd.24     0   12    2    2    4   76   16    5   74    0   66 | 257
osd.25     0   17    3    4    4   89   16    4   82    0   60 | 279
osd.26     0   20    2    5    3   71   12    5   81    0   61 | 260
osd.27     0   18    2    4    3   73   21    3   76    0   61 | 261
osd.28     0   14    2    9    4   73   23    9   94    0   64 | 292
osd.29     0   19    3    3    4   54   25    4   89    0   62 | 263
osd.30     0   22    2    6    3   80   15    6   92    0   47 | 273
osd.31     0   25    4    2    3   87   20    3   76    0   62 | 282
osd.32     0   13    4    2    2   64   14    1   82    0   69 | 251
osd.33     0   12    2    5    5   89   25    7   83    0   68 | 296
osd.34     0   28    0    8    5   81   18    3   99    0   65 | 307
osd.35     0   17    3    2    4   74   21    3   95    0   58 | 277
host ceph-04:
osd.36     0   13    1    9    6   72   17    5   93    0   56 | 272
osd.37     0   21    2    5    6   83   20    4   78    0   71 | 290
osd.38     0   17    3    2    5   64   22    7   76    0   57 | 253
osd.39     0   21    3    7    6   79   27    4   80    0   68 | 295
osd.40     0   15    1    5    7   71   17    6   93    0   74 | 289
osd.41     0   16    5    5    6   76   18    6   95    0   70 | 297
osd.42     0   13    0    6    1   71   25    4   83    0   56 | 259
osd.43     0   20    2    2    6   81   23    4   89    0   59 | 286
osd.44     0   21    2    5    6   77    9    5   76    0   52 | 253
osd.45     0   11    4    8    3   76   24    6   82    0   49 | 263
osd.46     0   17    2    5    6   57   15    4   84    0   62 | 252
osd.47     0   19    3    2    3   84   19    5   94    0   48 | 277
...

SUM :    768 1536  192  384  384 6144 1536  384 7168   24 5120 |


Pool 22 is the new ec7archiv.

But on ceph-04 there aren't OSD with more than 300 PGs...

Udo

Am 25.03.2015 14:52, schrieb Gregory Farnum:
 On Wed, Mar 25, 2015 at 1:20 AM, Udo Lembke ulem...@polarzone.de wrote:
 Hi,
 due to two more hosts (now 7 storage nodes) I want to create an new
 ec-pool and get an strange effect:

 ceph@admin:~$ ceph health detail
 HEALTH_WARN 2 pgs degraded; 2 pgs stuck degraded; 2 pgs stuck unclean; 2
 pgs stuck undersized; 2 pgs undersized
 
 This is the big clue: you have two undersized PGs!
 
 pg 22.3e5 is stuck unclean since forever, current state
 active+undersized+degraded, last acting [76,15,82,11,57,29,2147483647]
 
 2147483647 is the largest number you can represent in a signed 32-bit
 integer. There's an output error of some kind which is fixed
 elsewhere; this should be -1.
 
 So for whatever reason (in general it's hard on CRUSH trying to select
 N entries out of N choices), CRUSH hasn't been able to map an OSD to
 this slot for you. You'll want to figure out why that is and fix it.
 -Greg
 
 pg 22.240 is stuck unclean since forever, current state
 active+undersized+degraded, last acting [38,85,17,74,2147483647,10,58]
 pg 

[ceph-users] Radosgw authorization failed

2015-03-25 Thread Neville
Hi all,
 
I'm testing a backup product which supports Amazon S3 as a target for archive 
storage, and I'm trying to set up a Ceph cluster configured with the S3 API to 
use as an internal target for backup archives instead of AWS.
 
I've followed the online guide for setting up Radosgw and created a default 
region and zone based on the AWS naming convention US-East-1. I'm not sure if 
this is relevant, but since I was having issues I thought it might need to be 
the same.
 
I've tested the radosgw using boto.s3 and it seems to work ok, i.e. I can create 
a bucket, create a folder, list buckets etc. The problem is that when the backup 
software tries to create an object I get an authorization failure. It's using 
the same user/access/secret as I'm using from boto.s3, and I'm sure the creds 
are right as it lets me create the initial connection; it just fails when 
trying to create an object (backup folder).
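One thing that may be worth double-checking, purely as a guess based on the
HTTP_HOST line in the log below (the backup software appears to use
bucket-as-hostname requests), is that the gateway knows its DNS name and that
the user's keys match; a sketch:

  radosgw-admin user info --uid=test      # confirm access/secret keys
  # in ceph.conf on the gateway (section name is an example):
  # [client.radosgw.gateway]
  #     rgw dns name = devops-os-cog01.devops.local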
 
Here's the extract from the radosgw log:
 
-
2015-03-25 15:07:26.449227 7f1050dc7700  2 req 5:0.000419:s3:GET 
/:list_bucket:init op
2015-03-25 15:07:26.449232 7f1050dc7700  2 req 5:0.000424:s3:GET 
/:list_bucket:verifying op mask
2015-03-25 15:07:26.449234 7f1050dc7700 20 required_mask= 1 user.op_mask=7
2015-03-25 15:07:26.449235 7f1050dc7700  2 req 5:0.000427:s3:GET 
/:list_bucket:verifying op permissions
2015-03-25 15:07:26.449237 7f1050dc7700  5 Searching permissions for uid=test 
mask=49
2015-03-25 15:07:26.449238 7f1050dc7700  5 Found permission: 15
2015-03-25 15:07:26.449239 7f1050dc7700  5 Searching permissions for group=1 
mask=49
2015-03-25 15:07:26.449240 7f1050dc7700  5 Found permission: 15
2015-03-25 15:07:26.449241 7f1050dc7700  5 Searching permissions for group=2 
mask=49
2015-03-25 15:07:26.449242 7f1050dc7700  5 Found permission: 15
2015-03-25 15:07:26.449243 7f1050dc7700  5 Getting permissions id=test 
owner=test perm=1
2015-03-25 15:07:26.449244 7f1050dc7700 10  uid=test requested perm (type)=1, 
policy perm=1, user_perm_mask=1, acl perm=1
2015-03-25 15:07:26.449245 7f1050dc7700  2 req 5:0.000437:s3:GET 
/:list_bucket:verifying op params
2015-03-25 15:07:26.449247 7f1050dc7700  2 req 5:0.000439:s3:GET 
/:list_bucket:executing
2015-03-25 15:07:26.449252 7f1050dc7700 10 cls_bucket_list 
test1(@{i=.us-east.rgw.buckets.index}.us-east.rgw.buckets[us-east.280959.2]) 
start  num 1001
2015-03-25 15:07:26.450828 7f1050dc7700  2 req 5:0.002020:s3:GET 
/:list_bucket:http status=200
2015-03-25 15:07:26.450832 7f1050dc7700  1 == req done req=0x7f107000e2e0 
http_status=200 ==
2015-03-25 15:07:26.516999 7f1069df9700 20 enqueued request req=0x7f107000f0e0
2015-03-25 15:07:26.517006 7f1069df9700 20 RGWWQ:
2015-03-25 15:07:26.517007 7f1069df9700 20 req: 0x7f107000f0e0
2015-03-25 15:07:26.517010 7f1069df9700 10 allocated request req=0x7f107000f6b0
2015-03-25 15:07:26.517021 7f1058dd7700 20 dequeued request req=0x7f107000f0e0
2015-03-25 15:07:26.517023 7f1058dd7700 20 RGWWQ: empty
2015-03-25 15:07:26.517081 7f1058dd7700 20 CONTENT_LENGTH=88
2015-03-25 15:07:26.517084 7f1058dd7700 20 CONTENT_TYPE=application/octet-stream
2015-03-25 15:07:26.517085 7f1058dd7700 20 CONTEXT_DOCUMENT_ROOT=/var/www
2015-03-25 15:07:26.517086 7f1058dd7700 20 CONTEXT_PREFIX=
2015-03-25 15:07:26.517087 7f1058dd7700 20 DOCUMENT_ROOT=/var/www
2015-03-25 15:07:26.517088 7f1058dd7700 20 FCGI_ROLE=RESPONDER
2015-03-25 15:07:26.517089 7f1058dd7700 20 GATEWAY_INTERFACE=CGI/1.1
2015-03-25 15:07:26.517090 7f1058dd7700 20 HTTP_AUTHORIZATION=AWS 
F79L68W19B3GCLOSE3F8:AcXqtvlBzBMpwdL+WuhDRoLT/Bs=
2015-03-25 15:07:26.517091 7f1058dd7700 20 HTTP_CONNECTION=Keep-Alive
2015-03-25 15:07:26.517092 7f1058dd7700 20 HTTP_DATE=Wed, 25 Mar 2015 15:07:26 
GMT
2015-03-25 15:07:26.517092 7f1058dd7700 20 HTTP_EXPECT=100-continue
2015-03-25 15:07:26.517093 7f1058dd7700 20 
HTTP_HOST=test1.devops-os-cog01.devops.local
2015-03-25 15:07:26.517094 7f1058dd7700 20 
HTTP_USER_AGENT=aws-sdk-java/unknown-version Windows_Server_2008_R2/6.1 
Java_HotSpot(TM)_Client_VM/24.55-b03
2015-03-25 15:07:26.517096 7f1058dd7700 20 
HTTP_X_AMZ_META_CREATIONTIME=2015-03-25T15:07:26
2015-03-25 15:07:26.517097 7f1058dd7700 20 HTTP_X_AMZ_META_SIZE=88
2015-03-25 15:07:26.517098 7f1058dd7700 20 HTTP_X_AMZ_STORAGE_CLASS=STANDARD
2015-03-25 15:07:26.517099 7f1058dd7700 20 HTTPS=on
2015-03-25 15:07:26.517100 7f1058dd7700 20 
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
2015-03-25 15:07:26.517100 7f1058dd7700 20 QUERY_STRING=
2015-03-25 15:07:26.517101 7f1058dd7700 20 REMOTE_ADDR=10.40.41.106
2015-03-25 15:07:26.517102 7f1058dd7700 20 REMOTE_PORT=55439
2015-03-25 15:07:26.517103 7f1058dd7700 20 REQUEST_METHOD=PUT
2015-03-25 15:07:26.517104 7f1058dd7700 20 REQUEST_SCHEME=https
2015-03-25 15:07:26.517105 7f1058dd7700 20 
REQUEST_URI=/ca_ccifs_c6dccf63-ec57-45b2-87e7-d9b14b971ca3
2015-03-25 15:07:26.517106 7f1058dd7700 20 

Re: [ceph-users] Strange osd in PG with new EC-Pool - pgs: 2 active+undersized+degraded

2015-03-25 Thread Udo Lembke
Hi Don,
thanks for the info!

looks like setting choose_tries to 200 does the trick.

But the setcrushmap takes a long, long time (alarming, but the clients still have 
IO)... hope it's finished soon ;-)
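For the archive, roughly what that change looks like, following the
crush-gives-up-too-soon procedure Don linked (file names are examples):

  ceph osd getcrushmap -o crush.bin
  crushtool -d crush.bin -o crush.txt
  # in the ec rule, raise the retry budget, e.g.:
  #   step set_chooseleaf_tries 5
  #   step set_choose_tries 200
  crushtool -c crush.txt -o crush.new
  ceph osd setcrushmap -i crush.new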


Udo

Am 25.03.2015 16:00, schrieb Don Doerner:
 Assuming you've calculated the number of PGs reasonably, see here 
 http://tracker.ceph.com/issues/10350 and here
 http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soonhttp://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/.
  
 I’m guessing these will address your issue.  That weird number means that no 
 OSD was found/assigned to the PG.
 
  
 
 -don-

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Uneven CPU usage on OSD nodes

2015-03-25 Thread f...@univ-lr.fr

Hi Somnath,

Thanks, the tcmalloc env variable trick definitely had an impact on 
FetchFromSpans calls.
   export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=1310851072; 
/etc/init.d/ceph stop; /etc/init.d/ceph start



Nevertheless, even though the FetchFromSpans library call activity is now even 
on all hosts, the CPU activity of the ceph-osd processes remains twice 
as high on 2 hosts:

http://www.4shared.com/photo/3IP8jGPWba/UnevenLoad4-perf.html
http://www.4shared.com/photo/XX4C9NHTba/UnevenLoad4-top.html

and this can be observed under benchmark load or when idling too:
http://www.4shared.com/photo/x2Fl_in-ce/UnevenLoad4-top-idle.html

I'm now almost doubting the values reported by the command 'top', as 
'perf top' doesn't reveal major differences in calls ...


Could you elaborate on your sentence "saw the node consuming more cpus 
has more memory pressure as well"? Do you mean on your site?
I can't see memory pressure on my hosts (~28GB available mem) but 
perhaps I'm not looking at the right thing. And no swap on the hosts.



Here is the osd tree leading to the linear distribution I mentioned:

ceph osd tree
# id    weight   type name       up/down  reweight
-1      217.8    root default
-2      54.45        host siggy
0       3.63             osd.0    up   1
1       3.63             osd.1    up   1
2       3.63             osd.2    up   1
3       3.63             osd.3    up   1
4       3.63             osd.4    up   1
5       3.63             osd.5    up   1
6       3.63             osd.6    up   1
7       3.63             osd.7    up   1
8       3.63             osd.8    up   1
9       3.63             osd.9    up   1
10      3.63             osd.10   up   1
11      3.63             osd.11   up   1
12      3.63             osd.12   up   1
13      3.63             osd.13   up   1
14      3.63             osd.14   up   1
-3      54.45        host horik
15      3.63             osd.15   up   1
16      3.63             osd.16   up   1
17      3.63             osd.17   up   1
18      3.63             osd.18   up   1
19      3.63             osd.19   up   1
20      3.63             osd.20   up   1
21      3.63             osd.21   up   1
22      3.63             osd.22   up   1
23      3.63             osd.23   up   1
24      3.63             osd.24   up   1
25      3.63             osd.25   up   1
26      3.63             osd.26   up   1
27      3.63             osd.27   up   1
28      3.63             osd.28   up   1
29      3.63             osd.29   up   1
-4      54.45        host floki
30      3.63             osd.30   up   1
31      3.63             osd.31   up   1
32      3.63             osd.32   up   1
33      3.63             osd.33   up   1
34      3.63             osd.34   up   1
35      3.63             osd.35   up   1
36      3.63             osd.36   up   1
37      3.63             osd.37   up   1
38      3.63             osd.38   up   1
39      3.63             osd.39   up   1
40      3.63             osd.40   up   1
41      3.63             osd.41   up   1
42      3.63             osd.42   up   1
43      3.63             osd.43   up   1
44      3.63             osd.44   up   1
-5      54.45        host borg
45      3.63             osd.45   up   1
46      3.63             osd.46   up   1
47      3.63             osd.47   up   1
48      3.63             osd.48   up   1
49      3.63             osd.49   up   1
50      3.63             osd.50   up   1
51      3.63             osd.51   up   1
52      3.63             osd.52   up   1
53      3.63             osd.53   up   1
54      3.63             osd.54   up   1
55      3.63             osd.55   up   1
56      3.63             osd.56   up   1
57      3.63             osd.57   up   1
58      3.63             osd.58   up   1
59      3.63             osd.59   up   1


Regards,
Frederic

Somnath Roy somnath@sandisk.com wrote on 23/03/15 17:33:


Yes, we are also facing a similar issue under load (and after running for some 
time). This is tcmalloc behavior.


You can try setting the following env variable to a bigger value say 
128MB or so.


 


TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES

 

This env variable is supposed to alleviate the issue, but what we found 
is that in the Ubuntu 14.04 version of tcmalloc this env variable is a noop. 
This was a bug in tcmalloc which has been fixed in the latest tcmalloc code 
base.


Not sure about RHEL though. In that case, you may want to try with the 
latest tcmalloc. Just pointing LD_LIBRARY_PATH to the new tcmalloc 
location should work fine.


 

Latest Ceph master has support for jemalloc and you may want to try 
with that if this is your test cluster.


 

Another point: I saw that the node consuming more cpus has more memory 
pressure as well (and that's why tcmalloc is also having that issue). Can 
you give us the output of 'ceph osd tree' to check whether the load 
distribution is even? Also, check whether those systems are swapping or not.


 


Hope 

Re: [ceph-users] Strange osd in PG with new EC-Pool - pgs: 2 active+undersized+degraded

2015-03-25 Thread Don Doerner
Assuming you've calculated the number of PGs reasonably, see here 
(http://tracker.ceph.com/issues/10350) and here 
(http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon).
I'm guessing these will address your issue.  That weird number means that no 
OSD was found/assigned to the PG.



-don-





-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Udo 
Lembke
Sent: 25 March, 2015 01:21
To: ceph-us...@ceph.com
Subject: [ceph-users] Strange osd in PG with new EC-Pool - pgs: 2 
active+undersized+degraded



Hi,

due to two more hosts (now 7 storage nodes) I want to create an new ec-pool and 
get an strange effect:



ceph@admin:~$ ceph health detail

HEALTH_WARN 2 pgs degraded; 2 pgs stuck degraded; 2 pgs stuck unclean; 2 pgs 
stuck undersized; 2 pgs undersized pg 22.3e5 is stuck unclean since forever, 
current state

active+undersized+degraded, last acting [76,15,82,11,57,29,2147483647]

pg 22.240 is stuck unclean since forever, current state

active+undersized+degraded, last acting [38,85,17,74,2147483647,10,58]

pg 22.3e5 is stuck undersized for 406.614447, current state

active+undersized+degraded, last acting [76,15,82,11,57,29,2147483647]

pg 22.240 is stuck undersized for 406.616563, current state

active+undersized+degraded, last acting [38,85,17,74,2147483647,10,58]

pg 22.3e5 is stuck degraded for 406.614566, current state

active+undersized+degraded, last acting [76,15,82,11,57,29,2147483647]

pg 22.240 is stuck degraded for 406.616679, current state

active+undersized+degraded, last acting [38,85,17,74,2147483647,10,58]

pg 22.3e5 is active+undersized+degraded, acting [76,15,82,11,57,29,2147483647] 
pg 22.240 is active+undersized+degraded, acting [38,85,17,74,2147483647,10,58]



But I have only 91 OSDs (84 Sata + 7 SSDs) not 2147483647!

Where the heck came the 2147483647 from?



I do following commands:

ceph osd erasure-code-profile set 7hostprofile k=5 m=2 
ruleset-failure-domain=host ceph osd pool create ec7archiv 1024 1024 erasure 
7hostprofile



my version:

ceph -v

ceph version 0.87.1 (283c2e7cfa2457799f534744d7d549f83ea1335e)





I found an issue in my crush-map - one SSD was twice in the map:

host ceph-061-ssd {

id -16  # do not change unnecessarily

# weight 0.000

alg straw

hash 0  # rjenkins1

}

root ssd {

id -13  # do not change unnecessarily

# weight 0.780

alg straw

hash 0  # rjenkins1

item ceph-01-ssd weight 0.170

item ceph-02-ssd weight 0.170

item ceph-03-ssd weight 0.000

item ceph-04-ssd weight 0.170

item ceph-05-ssd weight 0.170

item ceph-06-ssd weight 0.050

item ceph-07-ssd weight 0.050

item ceph-061-ssd weight 0.000

}



Host ceph-061-ssd don't excist and osd-61 is the SSD from ceph-03-ssd, but 
after fix the crusmap the issue with the osd 2147483647 still excist.



Any idea how to fix that?



regards



Udo



___

ceph-users mailing list

ceph-users@lists.ceph.com

http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Uneven CPU usage on OSD nodes

2015-03-25 Thread Somnath Roy
Hi Frederic,
See my response inline.

Thanks & Regards
Somnath

From: f...@univ-lr.fr [mailto:f...@univ-lr.fr]
Sent: Wednesday, March 25, 2015 8:07 AM
To: Somnath Roy
Cc: Ceph Users
Subject: Re: [ceph-users] Uneven CPU usage on OSD nodes

Hi Somnath,

Thanks, the tcmalloc env variable trick definitely had an impact on 
FetchFromSpans calls.
export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=1310851072; /etc/init.d/ceph 
stop; /etc/init.d/ceph start


Nevertheless, if these FetchFromSpans library calls activity is now even on all 
hosts, the CPU activity of the ceph-osd processes remains twice as high on 2 
hosts :
http://www.4shared.com/photo/3IP8jGPWba/UnevenLoad4-perf.html
http://www.4shared.com/photo/XX4C9NHTba/UnevenLoad4-top.html

and this can be observed under load of a benchmark or when idling too :
http://www.4shared.com/photo/x2Fl_in-ce/UnevenLoad4-top-idle.html

[Somnath] Hope you are using the latest tcmalloc; as I said, there is a bug in the 
tcmalloc that ships with Ubuntu 14.04. Not sure about RHEL though. Nevertheless, 
the tcmalloc stuff seems to have gone away. Now it is all about crc. As you can see 
(from perf top), the crc calculation is taking more cpu on 
the two nodes. I guess that's the difference now. Please turn off crc 
calculation by using the following config options.

# ms_nocrc = true   --- This is in Giant and prior
# The following two are for the latest master/hammer:
ms_crc_data = false
ms_crc_header = false

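For reference, a minimal ceph.conf sketch of that change (these are messenger
settings, so the OSDs presumably need a restart to pick them up):

  [osd]
      ms_crc_data = false
      ms_crc_header = false
      # or, on Giant and prior: ms_nocrc = true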
The idle time cpu difference is not that bad. Need ‘perf top’ to see what is 
going on in idle time.

I'm now almost doubting of the values reported by the command 'top' as 'perf 
top' doesn't reveal major differences in calls ...

Could you elaborate on your sentence saw the node consuming more cpus has more 
memory pressure as well  ? You mean on your site ?
I can't see memory pressure on my hosts (~28GB available mem) but perhaps I'm 
not looking at the right thing. And no swap on the hosts.

[Somnath] In your previous screenshots, the node having more cpu usage was 
using more memory. The mem% reported by top is higher for the ceph-osds there. 
That's what I was pointing at. But now it is similar in both cases.
Here is the osd tree leading to linear distribution I mentionned :

ceph osd tree
# idweighttype nameup/downreweight
-1217.8root default
-254.45host siggy
03.63osd.0up1
13.63osd.1up1
23.63osd.2up1
33.63osd.3up1
43.63osd.4up1
53.63osd.5up1
63.63osd.6up1
73.63osd.7up1
83.63osd.8up1
93.63osd.9up1
103.63osd.10up1
113.63osd.11up1
123.63osd.12up1
133.63osd.13up1
143.63osd.14up1
-354.45host horik
153.63osd.15up1
163.63osd.16up1
173.63osd.17up1
183.63osd.18up1
193.63osd.19up1
203.63osd.20up1
213.63osd.21up1
223.63osd.22up1
233.63osd.23up1
243.63osd.24up1
253.63osd.25up1
263.63osd.26up1
273.63osd.27up1
283.63osd.28up1
293.63osd.29up1
-454.45host floki
303.63osd.30up1
313.63osd.31up1
323.63osd.32up1
333.63osd.33up1
343.63osd.34up1
353.63osd.35up1
363.63osd.36up1
373.63osd.37up1
383.63osd.38up1
393.63osd.39up1
403.63osd.40up1
413.63osd.41up1
423.63osd.42up1
433.63osd.43up1
443.63osd.44up1
-554.45host borg
453.63osd.45up1
463.63osd.46up1
473.63osd.47up1
483.63osd.48up1
493.63osd.49up1
503.63osd.50up1
513.63osd.51up1
523.63osd.52up1
533.63osd.53up1
543.63osd.54up1
553.63osd.55up1
563.63osd.56up1
573.63osd.57up1
583.63osd.58up1
593.63osd.59up1

Regards,
Frederic

Somnath Roy somnath@sandisk.com wrote on 23/03/15 17:33:
Yes, we 

Re: [ceph-users] how do I destroy cephfs? (interested in cephfs + tiering + erasure coding)

2015-03-25 Thread Gregory Farnum
On Wed, Mar 25, 2015 at 10:36 AM, Jake Grimmett j...@mrc-lmb.cam.ac.uk wrote:
 Dear All,

 Please forgive this post if it's naive, I'm trying to familiarise myself
 with cephfs!

 I'm using Scientific Linux 6.6. with Ceph 0.87.1

 My first steps with cephfs using a replicated pool worked OK.

 Now trying to test cephfs via a replicated caching tier on top of an
 erasure pool. I've created an erasure pool, but cannot put it under the existing
 replicated pool.

 My thoughts were to delete the existing cephfs, and start again, however I
 cannot delete the existing cephfs:

 errors are as follows:

 [root@ceph1 ~]# ceph fs rm cephfs2
 Error EINVAL: all MDS daemons must be inactive before removing filesystem

 I've tried killing the ceph-mds process, but this does not prevent the above
 error.

 I've also tried this, which also errors:

 [root@ceph1 ~]# ceph mds stop 0
 Error EBUSY: must decrease max_mds or else MDS will immediately reactivate

Right, so did you run "ceph mds set_max_mds 0" and then repeat the
stop command? :)
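A sketch of the full sequence (the filesystem name is taken from your output; 
depending on the release, fs rm may also want a --yes-i-really-mean-it flag):

ceph mds set_max_mds 0
ceph mds stop 0
# once 'ceph mds dump' shows the rank as stopped:
ceph fs rm cephfs2 --yes-i-really-mean-it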


 This also fail...

 [root@ceph1 ~]# ceph-deploy mds destroy
 [ceph_deploy.conf][DEBUG ] found configuration file at:
 /root/.cephdeploy.conf
 [ceph_deploy.cli][INFO  ] Invoked (1.5.21): /usr/bin/ceph-deploy mds destroy
 [ceph_deploy.mds][ERROR ] subcommand destroy not implemented

 Am I doing the right thing in trying to wipe the original cephfs config
 before attempting to use an erasure cold tier? Or can I just redefine the
 cephfs?

Yeah, unfortunately you need to recreate it if you want to try and use
an EC pool with cache tiering, because CephFS knows what pools it
expects data to belong to. Things are unlikely to behave correctly if
you try and stick an EC pool under an existing one. :(

Sounds like this is all just testing, which is good because the
suitability of EC+cache is very dependent on how much hot data you
have, etc...good luck!
-Greg


 many thanks,

 Jake Grimmett
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] New deployment: errors starting OSDs: invalid (someone else's?) journal

2015-03-25 Thread Robert LeBlanc
I don't know much about ceph-deploy,  but I know that ceph-disk has
problems automatically adding an SSD OSD when there are journals of
other disks already on it. I've had to partition the disk ahead of
time and pass in the partitions to make ceph-disk work.

Also, unless you are sure that the dev devices will be deterministically
named the same each time, I'd recommend you not use /dev/sd* for
pointing to your journals. Instead, use something that will always be
the same: since Ceph partitions the disks with GPT, you can use
the partuuid to point to the journal partition and it will always be
right. A while back I used this to fix my journal links when I did
it wrong. You will want to double-check that it will work right for
you. No warranty and all that jazz...

#convert the /dev/sd* journal links into /dev/disk/by-partuuid links

for lnk in $(ls /var/lib/ceph/osd/); do
    OSD=/var/lib/ceph/osd/$lnk
    DEV=$(readlink $OSD/journal | cut -d'/' -f3)   # current target, e.g. "sdb2"
    echo $DEV
    PUUID=$(ls -lh /dev/disk/by-partuuid/ | grep $DEV | cut -d' ' -f 9)
    ln -sf /dev/disk/by-partuuid/$PUUID $OSD/journal   # repoint at the stable name
done

On Wed, Mar 25, 2015 at 10:46 AM, Antonio Messina
antonio.s.mess...@gmail.com wrote:
 Hi all,

 I'm trying to install ceph on a 7-nodes preproduction cluster. Each
 node has 24x 4TB SAS disks (2x dell md1400 enclosures) and 6x 800GB
 SSDs (for cache tiering, not journals). I'm using Ubuntu 14.04 and
 ceph-deploy to install the cluster, I've been trying both Firefly and
 Giant and getting the same error. However, the logs I'm reporting are
 relative to the Firefly installation.

 The installation seems to go fine until I try to install the last 2
 OSDs (they are SSD disks) of each host. All the OSDs from 0 to 195 are
 UP and IN, but when I try to deploy the next OSD (no matter what host)
 ceph-osd daemon won't start. The error I get is:

 2015-03-25 17:00:17.130937 7fe231312800  0 ceph version 0.80.9
 (b5a67f0e1d15385bc0d60a6da6e7fc810bde6047), process ceph-osd, pid
 20280
 2015-03-25 17:00:17.133601 7fe231312800 10
 filestore(/var/lib/ceph/osd/ceph-196) dump_stop
 2015-03-25 17:00:17.133694 7fe231312800  5
 filestore(/var/lib/ceph/osd/ceph-196) basedir
 /var/lib/ceph/osd/ceph-196 journal /var/lib/ceph/osd/ceph-196/journal
 2015-03-25 17:00:17.133725 7fe231312800 10
 filestore(/var/lib/ceph/osd/ceph-196) mount fsid is
 8c2fa707-750a-4773-8918-a368367d9cf5
 2015-03-25 17:00:17.133789 7fe231312800  0
 filestore(/var/lib/ceph/osd/ceph-196) mount detected xfs (libxfs)
 2015-03-25 17:00:17.133810 7fe231312800  1
 filestore(/var/lib/ceph/osd/ceph-196)  disabling 'filestore replica
 fadvise' due to known issues with fadvise(DONTNEED) on xfs
 2015-03-25 17:00:17.135882 7fe231312800  0
 genericfilestorebackend(/var/lib/ceph/osd/ceph-196) detect_features:
 FIEMAP ioctl is supported and appears to work
 2015-03-25 17:00:17.135892 7fe231312800  0
 genericfilestorebackend(/var/lib/ceph/osd/ceph-196) detect_features:
 FIEMAP ioctl is disabled via 'filestore fiemap' config option
 2015-03-25 17:00:17.136318 7fe231312800  0
 genericfilestorebackend(/var/lib/ceph/osd/ceph-196) detect_features:
 syncfs(2) syscall fully supported (by glibc and kernel)
 2015-03-25 17:00:17.136373 7fe231312800  0
 xfsfilestorebackend(/var/lib/ceph/osd/ceph-196) detect_feature:
 extsize is disabled by conf
 2015-03-25 17:00:17.136640 7fe231312800  5
 filestore(/var/lib/ceph/osd/ceph-196) mount op_seq is 1
 2015-03-25 17:00:17.137547 7fe231312800 20 filestore (init)dbobjectmap: seq 
 is 1
 2015-03-25 17:00:17.137560 7fe231312800 10
 filestore(/var/lib/ceph/osd/ceph-196) open_journal at
 /var/lib/ceph/osd/ceph-196/journal
 2015-03-25 17:00:17.137575 7fe231312800  0
 filestore(/var/lib/ceph/osd/ceph-196) mount: enabling WRITEAHEAD
 journal mode: checkpoint is not enabled
 2015-03-25 17:00:17.137580 7fe231312800 10
 filestore(/var/lib/ceph/osd/ceph-196) list_collections
 2015-03-25 17:00:17.137661 7fe231312800 10 journal journal_replay fs op_seq 1
 2015-03-25 17:00:17.137668 7fe231312800  2 journal open
 /var/lib/ceph/osd/ceph-196/journal fsid
 8c2fa707-750a-4773-8918-a368367d9cf5 fs_op_seq 1
 2015-03-25 17:00:17.137670 7fe22b8b1700 20
 filestore(/var/lib/ceph/osd/ceph-196) sync_entry waiting for
 max_interval 5.00
 2015-03-25 17:00:17.137690 7fe231312800 10 journal _open_block_device:
 ignoring osd journal size. We'll use the entire block device (size:
 5367661056)
 2015-03-25 17:00:17.162489 7fe231312800  1 journal _open
 /var/lib/ceph/osd/ceph-196/journal fd 20: 5367660544 bytes, block size
 4096 bytes, directio = 1, aio = 1
 2015-03-25 17:00:17.162502 7fe231312800 10 journal read_header
 2015-03-25 17:00:17.172249 7fe231312800 10 journal header: block_size
 4096 alignment 4096 max_size 5367660544
 2015-03-25 17:00:17.172256 7fe231312800 10 journal header: start 50987008
 2015-03-25 17:00:17.172257 7fe231312800 10 journal  write_pos 4096
 2015-03-25 17:00:17.172259 7fe231312800 10 journal open header.fsid =
 942f2d62-dd99-42a8-878a-feea443aaa61
 2015-03-25 17:00:17.172264 

Re: [ceph-users] New deployment: errors starting OSDs: invalid (someone else's?) journal

2015-03-25 Thread Antonio Messina
On Wed, Mar 25, 2015 at 6:06 PM, Robert LeBlanc rob...@leblancnet.us wrote:
 I don't know much about ceph-deploy,  but I know that ceph-disk has
 problems automatically adding an SSD OSD when there are journals of
 other disks already on it. I've had to partition the disk ahead of
 time and pass in the partitions to make ceph-disk work.

This is not my case: the journal is created automatically by
ceph-deploy on the same disk, so that for each disk, /dev/sdX1 is the
data partition and /dev/sdX2 is the journal partition. This is also
what I want: I know there is a performance drop, but I expect it to be
mitigated by the cache tier (and I plan to test both configurations
anyway).

 Also, unless you are sure that the dev devices will be deterministically
 named the same each time, I'd recommend you not use /dev/sd* for
 pointing to your journals. Instead, use something that will always be
 the same: since Ceph partitions the disks with GPT, you can use
 the partuuid to point to the journal partition and it will always be
 right. A while back I used this to fix my journal links when I did
 it wrong. You will want to double-check that it will work right for
 you. No warranty and all that jazz...

Thank you for pointing this out, it's an important point. However, the
links are actually created using the partuuid. The command I posted in
my previous email included the output of a pair of nested readlink
in order to get the /dev/sd* names, because in this way it's easier to
see if there are duplicates and where :)
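For what it's worth, this is roughly the loop I use to resolve them in one go (a 
sketch):

for j in /var/lib/ceph/osd/ceph-*/journal; do
    # readlink -f follows the by-partuuid symlink down to the real /dev/sd* node
    echo "$j -> $(readlink -f $j)"
done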

The output of ls -l /var/lib/ceph/osd/ceph-*/journal is actually:

lrwxrwxrwx 1 root root 58 Mar 25 11:38
/var/lib/ceph/osd/ceph-0/journal -
/dev/disk/by-partuuid/18305316-96b0-4654-aaad-7aeb891429f6
lrwxrwxrwx 1 root root 58 Mar 25 11:49
/var/lib/ceph/osd/ceph-7/journal -
/dev/disk/by-partuuid/a263b19a-cb0d-4b4c-bd81-314619d5755d
lrwxrwxrwx 1 root root 58 Mar 25 12:21
/var/lib/ceph/osd/ceph-14/journal -
/dev/disk/by-partuuid/79734e0e-87dd-40c7-ba83-0d49695a75fb
lrwxrwxrwx 1 root root 58 Mar 25 12:31
/var/lib/ceph/osd/ceph-21/journal -
/dev/disk/by-partuuid/73a504bc-3179-43fd-942c-13c6bd8633c5
lrwxrwxrwx 1 root root 58 Mar 25 12:42
/var/lib/ceph/osd/ceph-28/journal -
/dev/disk/by-partuuid/ecff10df-d757-4b1f-bef4-88dd84d84ef1
lrwxrwxrwx 1 root root 58 Mar 25 12:52
/var/lib/ceph/osd/ceph-35/journal -
/dev/disk/by-partuuid/5be30238-3f07-4950-b39f-f5e4c7305e4c
lrwxrwxrwx 1 root root 58 Mar 25 13:02
/var/lib/ceph/osd/ceph-42/journal -
/dev/disk/by-partuuid/3cdb65f2-474c-47fb-8d07-83e7518418ff
lrwxrwxrwx 1 root root 58 Mar 25 13:12
/var/lib/ceph/osd/ceph-49/journal -
/dev/disk/by-partuuid/a47fe2b7-e375-4eea-b7a9-0354a24548dc
lrwxrwxrwx 1 root root 58 Mar 25 13:22
/var/lib/ceph/osd/ceph-56/journal -
/dev/disk/by-partuuid/fb42b7d6-bc6c-4063-8b73-29beb1f65107
lrwxrwxrwx 1 root root 58 Mar 25 13:33
/var/lib/ceph/osd/ceph-63/journal -
/dev/disk/by-partuuid/72aff32b-ca56-4c25-b8ea-ff3aba8db507
lrwxrwxrwx 1 root root 58 Mar 25 13:43
/var/lib/ceph/osd/ceph-70/journal -
/dev/disk/by-partuuid/b7c17a75-47cd-401e-b963-afe910612bd6
lrwxrwxrwx 1 root root 58 Mar 25 13:53
/var/lib/ceph/osd/ceph-77/journal -
/dev/disk/by-partuuid/2c1c2501-fa82-4fc9-a586-03cc4d68faef
lrwxrwxrwx 1 root root 58 Mar 25 14:03
/var/lib/ceph/osd/ceph-84/journal -
/dev/disk/by-partuuid/46f619a5-3edf-44e9-99a6-24d98bcd174a
lrwxrwxrwx 1 root root 58 Mar 25 14:13
/var/lib/ceph/osd/ceph-91/journal -
/dev/disk/by-partuuid/5feef832-dd82-4aa0-9264-dc9496a3f93a
lrwxrwxrwx 1 root root 58 Mar 25 14:24
/var/lib/ceph/osd/ceph-98/journal -
/dev/disk/by-partuuid/055793a0-99d4-49c4-9698-bd8880c21d9c
lrwxrwxrwx 1 root root 58 Mar 25 14:34
/var/lib/ceph/osd/ceph-105/journal -
/dev/disk/by-partuuid/20547f26-6ef3-422b-9732-ad8b0b5b5379
lrwxrwxrwx 1 root root 58 Mar 25 14:44
/var/lib/ceph/osd/ceph-112/journal -
/dev/disk/by-partuuid/2abea809-59c4-41da-bb52-28ef1911ec43
lrwxrwxrwx 1 root root 58 Mar 25 14:54
/var/lib/ceph/osd/ceph-119/journal -
/dev/disk/by-partuuid/d8d15bb8-4b3d-4375-b6e1-62794971df7e
lrwxrwxrwx 1 root root 58 Mar 25 15:05
/var/lib/ceph/osd/ceph-126/journal -
/dev/disk/by-partuuid/ff6ee2b2-9c33-4902-a5e3-f6e9db5714e9
lrwxrwxrwx 1 root root 58 Mar 25 15:15
/var/lib/ceph/osd/ceph-133/journal -
/dev/disk/by-partuuid/9faccb6e-ada9-4742-aa31-eb1308769205
lrwxrwxrwx 1 root root 58 Mar 25 15:25
/var/lib/ceph/osd/ceph-140/journal -
/dev/disk/by-partuuid/2df13c88-ee58-4881-a373-a36a09fb6366
lrwxrwxrwx 1 root root 58 Mar 25 15:36
/var/lib/ceph/osd/ceph-147/journal -
/dev/disk/by-partuuid/13cda9d1-0fec-40cc-a6fc-7cc56f7ffb78
lrwxrwxrwx 1 root root 58 Mar 25 15:46
/var/lib/ceph/osd/ceph-154/journal -
/dev/disk/by-partuuid/5d37bfe9-c0f9-49e0-a951-b0ed04c5de51
lrwxrwxrwx 1 root root 58 Mar 25 15:57
/var/lib/ceph/osd/ceph-161/journal -
/dev/disk/by-partuuid/d34f3abb-3fb7-4875-90d3-d2d3836f6e4d
lrwxrwxrwx 1 root root 58 Mar 25 16:07
/var/lib/ceph/osd/ceph-168/journal -
/dev/disk/by-partuuid/02c3db3e-159c-47d9-8a63-0389ea89fad1
lrwxrwxrwx 1 root root 58 Mar 25 16:16

Re: [ceph-users] New deployment: errors starting OSDs: invalid (someone else's?) journal

2015-03-25 Thread Robert LeBlanc
Probably a case of trying to read too fast. Sorry about that.

As far as your theory on the cache pool goes, I haven't tried that, but my
gut feeling is that it won't help as much as having the journal on the
SSD. The cache tier isn't trying to collate writes the way the
journal does. On the spindle you are then having to write to two
very different parts of the drive for every piece of data; although
this is somewhat reduced by the journal, I feel it will still be
significant. When I see writes coming off my SSD journals to the
spindles, I'm still getting a lot of merged IO (at least during a
backfill/recovery). I'm interested in your results.

As far as the foreign journal goes, I would run dd over the journal
partition and try again. It sounds like something didn't get
cleaned up from a previous run.
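Something along these lines should do it (a sketch; /dev/sde2 and the OSD id are 
placeholders, so triple-check the device before pointing dd at it):

dd if=/dev/zero of=/dev/sde2 bs=1M count=100 oflag=direct   # wipe the stale journal header
ceph-osd -i 196 --mkjournal   # recreate the journal, or simply re-run the ceph-deploy osd create step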


On Wed, Mar 25, 2015 at 11:14 AM, Antonio Messina
antonio.s.mess...@gmail.com wrote:
 On Wed, Mar 25, 2015 at 6:06 PM, Robert LeBlanc rob...@leblancnet.us wrote:
 I don't know much about ceph-deploy,  but I know that ceph-disk has
 problems automatically adding an SSD OSD when there are journals of
 other disks already on it. I've had to partition the disk ahead of
 time and pass in the partitions to make ceph-disk work.

 This is not my case: the journal is created automatically by
 ceph-deploy on the same disk, so that for each disk, /dev/sdX1 is the
 data partition and /dev/sdX2 is the journal partition. This is also
 what I want: I know there is a performance drop, but I expect it to be
 mitigated by the cache tier (and I plan to test both configurations
 anyway).

 Also, unless you are sure that the dev devices will be deterministically
 named the same each time, I'd recommend you not use /dev/sd* for
 pointing to your journals. Instead, use something that will always be
 the same: since Ceph partitions the disks with GPT, you can use
 the partuuid to point to the journal partition and it will always be
 right. A while back I used this to fix my journal links when I did
 it wrong. You will want to double-check that it will work right for
 you. No warranty and all that jazz...

 Thank you for pointing this out, it's an important point. However, the
 links are actually created using the partuuid. The command I posted in
 my previous email included the output of a pair of nested readlink
 in order to get the /dev/sd* names, because in this way it's easier to
 see if there are duplicates and where :)

 The output of ls -l /var/lib/ceph/osd/ceph-*/journal is actually:

 lrwxrwxrwx 1 root root 58 Mar 25 11:38
 /var/lib/ceph/osd/ceph-0/journal -
 /dev/disk/by-partuuid/18305316-96b0-4654-aaad-7aeb891429f6
 lrwxrwxrwx 1 root root 58 Mar 25 11:49
 /var/lib/ceph/osd/ceph-7/journal -
 /dev/disk/by-partuuid/a263b19a-cb0d-4b4c-bd81-314619d5755d
 lrwxrwxrwx 1 root root 58 Mar 25 12:21
 /var/lib/ceph/osd/ceph-14/journal -
 /dev/disk/by-partuuid/79734e0e-87dd-40c7-ba83-0d49695a75fb
 lrwxrwxrwx 1 root root 58 Mar 25 12:31
 /var/lib/ceph/osd/ceph-21/journal -
 /dev/disk/by-partuuid/73a504bc-3179-43fd-942c-13c6bd8633c5
 lrwxrwxrwx 1 root root 58 Mar 25 12:42
 /var/lib/ceph/osd/ceph-28/journal -
 /dev/disk/by-partuuid/ecff10df-d757-4b1f-bef4-88dd84d84ef1
 lrwxrwxrwx 1 root root 58 Mar 25 12:52
 /var/lib/ceph/osd/ceph-35/journal -
 /dev/disk/by-partuuid/5be30238-3f07-4950-b39f-f5e4c7305e4c
 lrwxrwxrwx 1 root root 58 Mar 25 13:02
 /var/lib/ceph/osd/ceph-42/journal -
 /dev/disk/by-partuuid/3cdb65f2-474c-47fb-8d07-83e7518418ff
 lrwxrwxrwx 1 root root 58 Mar 25 13:12
 /var/lib/ceph/osd/ceph-49/journal -
 /dev/disk/by-partuuid/a47fe2b7-e375-4eea-b7a9-0354a24548dc
 lrwxrwxrwx 1 root root 58 Mar 25 13:22
 /var/lib/ceph/osd/ceph-56/journal -
 /dev/disk/by-partuuid/fb42b7d6-bc6c-4063-8b73-29beb1f65107
 lrwxrwxrwx 1 root root 58 Mar 25 13:33
 /var/lib/ceph/osd/ceph-63/journal -
 /dev/disk/by-partuuid/72aff32b-ca56-4c25-b8ea-ff3aba8db507
 lrwxrwxrwx 1 root root 58 Mar 25 13:43
 /var/lib/ceph/osd/ceph-70/journal -
 /dev/disk/by-partuuid/b7c17a75-47cd-401e-b963-afe910612bd6
 lrwxrwxrwx 1 root root 58 Mar 25 13:53
 /var/lib/ceph/osd/ceph-77/journal -
 /dev/disk/by-partuuid/2c1c2501-fa82-4fc9-a586-03cc4d68faef
 lrwxrwxrwx 1 root root 58 Mar 25 14:03
 /var/lib/ceph/osd/ceph-84/journal -
 /dev/disk/by-partuuid/46f619a5-3edf-44e9-99a6-24d98bcd174a
 lrwxrwxrwx 1 root root 58 Mar 25 14:13
 /var/lib/ceph/osd/ceph-91/journal -
 /dev/disk/by-partuuid/5feef832-dd82-4aa0-9264-dc9496a3f93a
 lrwxrwxrwx 1 root root 58 Mar 25 14:24
 /var/lib/ceph/osd/ceph-98/journal -
 /dev/disk/by-partuuid/055793a0-99d4-49c4-9698-bd8880c21d9c
 lrwxrwxrwx 1 root root 58 Mar 25 14:34
 /var/lib/ceph/osd/ceph-105/journal -
 /dev/disk/by-partuuid/20547f26-6ef3-422b-9732-ad8b0b5b5379
 lrwxrwxrwx 1 root root 58 Mar 25 14:44
 /var/lib/ceph/osd/ceph-112/journal -
 /dev/disk/by-partuuid/2abea809-59c4-41da-bb52-28ef1911ec43
 lrwxrwxrwx 1 root root 58 Mar 25 14:54
 /var/lib/ceph/osd/ceph-119/journal -
 /dev/disk/by-partuuid/d8d15bb8-4b3d-4375-b6e1-62794971df7e
 lrwxrwxrwx 1 

[ceph-users] won leader election with quorum during osd setcrushmap

2015-03-25 Thread Udo Lembke
Hi,
due to PG trouble with an EC pool I modified the crushmap (adding step
set_choose_tries 200) from

rule ec7archiv {
ruleset 6
type erasure
min_size 3
max_size 20
step set_chooseleaf_tries 5
step take default
step chooseleaf indep 0 type host
step emit
}

to

rule ec7archiv {
ruleset 6
type erasure
min_size 3
max_size 20
step set_chooseleaf_tries 5
step set_choose_tries 200
step take default
step chooseleaf indep 0 type host
step emit
}
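
For reference, the edit/apply cycle used here was roughly the following (a 
sketch; the file names are placeholders and --num-rep should match your k+m):

ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt      # decompile, then edit the rule
crushtool -c crush.txt -o crush.new      # recompile
crushtool -i crush.new --test --rule 6 --num-rep 9 --show-bad-mappings
ceph osd setcrushmap -i crush.new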

ceph osd setcrushmap has been running for an hour now, and ceph -w gives the following output:

2015-03-25 17:20:18.163295 mon.0 [INF] mdsmap e766: 1/1/1 up {0=b=up:active}, 1 
up:standby
2015-03-25 17:20:18.163370 mon.0 [INF] osdmap e130004: 91 osds: 91 up, 91 in
2015-03-25 17:20:28.525445 mon.0 [INF] from='client.? 172.20.2.1:0/1007537' 
entity='client.admin' cmd=[{prefix: osd
setcrushmap}]: dispatch
2015-03-25 17:20:28.525580 mon.0 [INF] mon.0 calling new monitor election
2015-03-25 17:20:28.526263 mon.0 [INF] mon.0@0 won leader election with quorum 
0,1,2


Fortunately the clients still have access to the cluster (kvm)!!

How long can such a setcrushmap take?? Normally it's done in a few seconds.
Does the setcrushmap have any chance of finishing?

Udo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] how do I destroy cephfs? (interested in cephfs + tiering + erasure coding)

2015-03-25 Thread Jake Grimmett

Dear All,

Please forgive this post if it's naive, I'm trying to familiarise myself 
with cephfs!


I'm using Scientific Linux 6.6. with Ceph 0.87.1

My first steps with cephfs using a replicated pool worked OK.

Now trying to test cephfs via a replicated caching tier on top of an 
erasure pool. I've created an erasure pool, but cannot put it under the 
existing replicated pool.


My thoughts were to delete the existing cephfs, and start again, however 
I cannot delete the existing cephfs:


errors are as follows:

[root@ceph1 ~]# ceph fs rm cephfs2
Error EINVAL: all MDS daemons must be inactive before removing filesystem

I've tried killing the ceph-mds process, but this does not prevent the 
above error.


I've also tried this, which also errors:

[root@ceph1 ~]# ceph mds stop 0
Error EBUSY: must decrease max_mds or else MDS will immediately reactivate

This also fail...

[root@ceph1 ~]# ceph-deploy mds destroy
[ceph_deploy.conf][DEBUG ] found configuration file at: 
/root/.cephdeploy.conf

[ceph_deploy.cli][INFO  ] Invoked (1.5.21): /usr/bin/ceph-deploy mds destroy
[ceph_deploy.mds][ERROR ] subcommand destroy not implemented

Am I doing the right thing in trying to wipe the original cephfs config 
before attempting to use an erasure cold tier? Or can I just redefine 
the cephfs?


many thanks,

Jake Grimmett
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] New deployment: errors starting OSDs: invalid (someone else's?) journal

2015-03-25 Thread Antonio Messina
Hi all,

I'm trying to install ceph on a 7-nodes preproduction cluster. Each
node has 24x 4TB SAS disks (2x dell md1400 enclosures) and 6x 800GB
SSDs (for cache tiering, not journals). I'm using Ubuntu 14.04 and
ceph-deploy to install the cluster, I've been trying both Firefly and
Giant and getting the same error. However, the logs I'm reporting are
relative to the Firefly installation.

The installation seems to go fine until I try to install the last 2
OSDs (they are SSD disks) of each host. All the OSDs from 0 to 195 are
UP and IN, but when I try to deploy the next OSD (no matter what host)
ceph-osd daemon won't start. The error I get is:

2015-03-25 17:00:17.130937 7fe231312800  0 ceph version 0.80.9
(b5a67f0e1d15385bc0d60a6da6e7fc810bde6047), process ceph-osd, pid
20280
2015-03-25 17:00:17.133601 7fe231312800 10
filestore(/var/lib/ceph/osd/ceph-196) dump_stop
2015-03-25 17:00:17.133694 7fe231312800  5
filestore(/var/lib/ceph/osd/ceph-196) basedir
/var/lib/ceph/osd/ceph-196 journal /var/lib/ceph/osd/ceph-196/journal
2015-03-25 17:00:17.133725 7fe231312800 10
filestore(/var/lib/ceph/osd/ceph-196) mount fsid is
8c2fa707-750a-4773-8918-a368367d9cf5
2015-03-25 17:00:17.133789 7fe231312800  0
filestore(/var/lib/ceph/osd/ceph-196) mount detected xfs (libxfs)
2015-03-25 17:00:17.133810 7fe231312800  1
filestore(/var/lib/ceph/osd/ceph-196)  disabling 'filestore replica
fadvise' due to known issues with fadvise(DONTNEED) on xfs
2015-03-25 17:00:17.135882 7fe231312800  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-196) detect_features:
FIEMAP ioctl is supported and appears to work
2015-03-25 17:00:17.135892 7fe231312800  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-196) detect_features:
FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-03-25 17:00:17.136318 7fe231312800  0
genericfilestorebackend(/var/lib/ceph/osd/ceph-196) detect_features:
syncfs(2) syscall fully supported (by glibc and kernel)
2015-03-25 17:00:17.136373 7fe231312800  0
xfsfilestorebackend(/var/lib/ceph/osd/ceph-196) detect_feature:
extsize is disabled by conf
2015-03-25 17:00:17.136640 7fe231312800  5
filestore(/var/lib/ceph/osd/ceph-196) mount op_seq is 1
2015-03-25 17:00:17.137547 7fe231312800 20 filestore (init)dbobjectmap: seq is 1
2015-03-25 17:00:17.137560 7fe231312800 10
filestore(/var/lib/ceph/osd/ceph-196) open_journal at
/var/lib/ceph/osd/ceph-196/journal
2015-03-25 17:00:17.137575 7fe231312800  0
filestore(/var/lib/ceph/osd/ceph-196) mount: enabling WRITEAHEAD
journal mode: checkpoint is not enabled
2015-03-25 17:00:17.137580 7fe231312800 10
filestore(/var/lib/ceph/osd/ceph-196) list_collections
2015-03-25 17:00:17.137661 7fe231312800 10 journal journal_replay fs op_seq 1
2015-03-25 17:00:17.137668 7fe231312800  2 journal open
/var/lib/ceph/osd/ceph-196/journal fsid
8c2fa707-750a-4773-8918-a368367d9cf5 fs_op_seq 1
2015-03-25 17:00:17.137670 7fe22b8b1700 20
filestore(/var/lib/ceph/osd/ceph-196) sync_entry waiting for
max_interval 5.00
2015-03-25 17:00:17.137690 7fe231312800 10 journal _open_block_device:
ignoring osd journal size. We'll use the entire block device (size:
5367661056)
2015-03-25 17:00:17.162489 7fe231312800  1 journal _open
/var/lib/ceph/osd/ceph-196/journal fd 20: 5367660544 bytes, block size
4096 bytes, directio = 1, aio = 1
2015-03-25 17:00:17.162502 7fe231312800 10 journal read_header
2015-03-25 17:00:17.172249 7fe231312800 10 journal header: block_size
4096 alignment 4096 max_size 5367660544
2015-03-25 17:00:17.172256 7fe231312800 10 journal header: start 50987008
2015-03-25 17:00:17.172257 7fe231312800 10 journal  write_pos 4096
2015-03-25 17:00:17.172259 7fe231312800 10 journal open header.fsid =
942f2d62-dd99-42a8-878a-feea443aaa61
2015-03-25 17:00:17.172264 7fe231312800 -1 journal FileJournal::open:
ondisk fsid 942f2d62-dd99-42a8-878a-feea443aaa61 doesn't match
expected 8c2fa707-750a-4773-8918-a368367d9cf5, invalid (someone
else's?) journal
2015-03-25 17:00:17.172268 7fe231312800  3 journal journal_replay open
failed with (22) Invalid argument
2015-03-25 17:00:17.172284 7fe231312800 -1
filestore(/var/lib/ceph/osd/ceph-196) mount failed to open journal
/var/lib/ceph/osd/ceph-196/journal: (22) Invalid argument
2015-03-25 17:00:17.172304 7fe22b8b1700 20
filestore(/var/lib/ceph/osd/ceph-196) sync_entry woke after 0.034632
2015-03-25 17:00:17.172330 7fe22b8b1700 10 journal commit_start
max_applied_seq 1, open_ops 0
2015-03-25 17:00:17.172333 7fe22b8b1700 10 journal commit_start
blocked, all open_ops have completed
2015-03-25 17:00:17.172334 7fe22b8b1700 10 journal commit_start nothing to do
2015-03-25 17:00:17.172465 7fe231312800 -1  ** ERROR: error converting
store /var/lib/ceph/osd/ceph-196: (22) Invalid argument

I'm attaching the full log of ceph-deploy osd create osd-l2-05:sde
and /var/log/ceph/ceph-osd.196.log after trying to re-start the
osd with increased verbosity, as well as the ceph.conf I'm using.

I've also checked if the journal symlinks were correct, and they all

Re: [ceph-users] Erasure coding

2015-03-25 Thread Tom Verdaat
Great info! Many thanks!

Tom

2015-03-25 13:30 GMT+01:00 Loic Dachary l...@dachary.org:

 Hi Tom,

 On 25/03/2015 11:31, Tom Verdaat wrote: Hi guys,
 
  We've got a very small Ceph cluster (3 hosts, 5 OSD's each for cold
 data) that we intend to grow later on as more storage is needed. We would
 very much like to use Erasure Coding for some pools but are facing some
 challenges regarding the optimal initial profile “replication” settings
 given the limited number of initial hosts that we can use to spread the
 chunks. Could somebody please help me with the following questions?
 
   1. Suppose we initially use replication instead of erasure. Can we
  convert a replicated pool to an erasure coded pool later on?

 What you would do is create an erasure coded pool later and have the
 initial replicated pool as a cache in front of it.

 http://docs.ceph.com/docs/master/rados/operations/cache-tiering/

 Objects from the replicated pool will move to the erasure coded pool if
 they are not used and it will save space. You don't need to create the
 erasure coded pool on your small cluster. You can do it when it grows
 larger or when it becomes full.
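 A rough sketch of that setup, with pool names and pg counts as placeholders:

 ceph osd pool create cold-ec 128 128 erasure
 ceph osd tier add cold-ec cold-replicated
 ceph osd tier cache-mode cold-replicated writeback
 ceph osd tier set-overlay cold-ec cold-replicated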

   2. Will Ceph gain the ability to change the K and N values for an
  existing pool in the near future?

 I don't think so.

   3. Can the failure domain be changed for an existing pool? E.g. can we
  start with failure domain OSD and then switch it to Host after adding more
  hosts?

 The failure domain, although listed in the erasure code profile for
 convenience, really belongs to the crush ruleset applied to the pool. It
 can therefore be changed after the pool is created. It is likely to result
 in objects moving a lot during the transition but it should work fine
 otherwise.
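 For example, once a ruleset with the new failure domain exists (a sketch; the
 pool name and ruleset id are placeholders):

 ceph osd pool set ecpool crush_ruleset 7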

   4. Where can I find a good comparison of the available erasure code
  plugins that allows me to properly decide which one suits our needs best?

 In a nutshell: jerasure is flexible and is likely to be what you want; isa
 computes faster than jerasure but only works on Intel processors (note,
 however, that the erasure code computation does not make a significant
 difference overall); lrc and shec (to be published in hammer) minimize
 network usage during recovery but use more space than jerasure or isa.
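 For example, a jerasure profile and pool could look like this (a sketch; the
 profile name and the k/m values are placeholders):

 ceph osd erasure-code-profile set example_profile plugin=jerasure k=4 m=2 ruleset-failure-domain=host
 ceph osd erasure-code-profile get example_profile
 ceph osd pool create ecpool 128 128 erasure example_profile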

 Cheers

  Many thanks for your help!
 
  Tom
 
 
 
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 

 --
 Loïc Dachary, Artisan Logiciel Libre


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] clients and monitors

2015-03-25 Thread Sage Weil
On Wed, 25 Mar 2015, Deneau, Tom wrote:
 A couple of client-monitor questions:
 
 1) When a client contacts a monitor to get the cluster map, how does it
decide which monitor to try to contact?

It picks a random monitor from the information it's seeded with at startup 
(via ceph.conf or the -m command line option).  Once it reaches one mon, 
it gets the MonMap which tells it who all of the mons are.

 2) Having gotten the cluster map, assuming a client wants to do multiple
 reads and writes, does the client have to re-contact the monitor to get the
 latest cluster map for each operation?

No, the mons are primarily needed for the initial startup/authentication 
step.  They are also queried as a last resort if the client thinks it 
has an out-of-date osdmap and hasn't gotten one from an OSD, and to 
renew authentication tickets.
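For example, the seed can also be given explicitly on the command line (a 
sketch; the address is a placeholder):

rados -m 192.168.0.10:6789 lspools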

sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RGW Ceph Tech Talk Tomorrow

2015-03-25 Thread Patrick McGarry
Hey cephers,

Just a reminder that the monthly Ceph Tech Talk tomorrow at 1p EDT
will be by Yehuda on the RADOS Gateway. Make sure you stop by to get a
deeper technical understanding of RGW if you're interested. It's an
open virtual meeting for those that wish to attend, and will also be
recorded and put on YouTube for those unable to make it.

http://ceph.com/ceph-tech-talks/

Please let me know if you have any questions. Thanks.


-- 

Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] clients and monitors

2015-03-25 Thread Deneau, Tom
A couple of client-monitor questions:

1) When a client contacts a monitor to get the cluster map, how does it
   decide which monitor to try to contact?

2) Having gotten the cluster map, assuming a client wants to do multiple reads
   and writes, does the client have to re-contact the monitor to get the latest
   cluster map for each operation?

-- Tom Deneau, AMD


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] error creating image in rbd-erasure-pool

2015-03-25 Thread Frédéric Nass
Hi Greg, 

Thank you for this clarification. It helps a lot. 

Does this "can't think of any issues" apply to both rbd and pool snapshots? 

Frederic. 

- Mail original -

 On Tue, Mar 24, 2015 at 12:09 PM, Brendan Moloney molo...@ohsu.edu wrote:
 
  Hi Loic and Markus,
  By the way, Inktank does not support snapshots of a pool with cache tiering:
 
  *
  https://download.inktank.com/docs/ICE%201.2%20-%20Cache%20and%20Erasure%20Coding%20FAQ.pdf
 
  Hi,
 
  You seem to be talking about pool snapshots rather than RBD snapshots. But
  in the linked document it is not clear that there is a distinction:
 
  Can I use snapshots with a cache tier?
  Snapshots are not supported in conjunction with cache tiers.
 
  Can anyone clarify if this is just pool snapshots?

 I think that was just a decision based on the newness and complexity
 of the feature for product purposes. Snapshots against cache tiered
 pools certainly should be fine in Giant/Hammer and we can't think of
 any issues in Firefly off the tops of our heads.
 -Greg
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

-- 

Cordialement, 

Frédéric Nass. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com