Re: [ceph-users] help me turn off "many more objects than average"

2018-09-12 Thread Chad William Seys

Hi Paul,
 Yes, all monitors have been restarted.

Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] help me turn off "many more objects than average"

2018-09-12 Thread Chad William Seys

Hi all,
  I'm having trouble turning off the warning "1 pools have many more 
objects per pg than average".


I've tried a lot of variations on the below, my current ceph.conf:

#...
[mon]

#...
mon_pg_warn_max_object_skew = 0

All of my monitors have been restarted.

Seems like I'm missing something.  Syntax error?  Wrong section? No 
vertical blank whitespace allowed?  Not supported in Luminous?
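
For reference, one way to confirm what value the running daemons actually 
picked up is to query the admin socket (a sketch; substitute your real 
mon/mgr ids, and note that on some Luminous releases this particular warning 
is generated by ceph-mgr rather than the mons, so the mgr's view of the 
option matters too):

# on a monitor host -- what the mon sees
ceph daemon mon.$(hostname -s) config get mon_pg_warn_max_object_skew
# on the active mgr host -- what the mgr sees
ceph daemon mgr.$(hostname -s) config get mon_pg_warn_max_object_skew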


Thanks!
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how can time machine know difference between cephfs fuse and kernel client?

2018-08-23 Thread Chad William Seys

Hi All,
  I think my problem was that I had quotas set at multiple levels of a 
subtree, and maybe some were conflicting.  (E.g. Parent said quota=1GB, 
child said quota=200GB.)  I could not reproduce the problem, but setting 
quotas only on the user's subdirectory and not elsewhere along the way 
to the root fixed the problem. :)
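
For anyone hitting the same thing, the working arrangement was a single 
quota on the per-user subdirectory only, e.g. (path and size below are just 
illustrative):

setfattr -n ceph.quota.max_bytes -v 214748364800 /srv/smb/timemachine/someuser  # 200 GB
getfattr -n ceph.quota.max_bytes /srv/smb/timemachine/someuser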
  I'm actually using the kernel cephfs client for Time Machine, plus a .plist 
file to tell Time Machine not to use too much space.  There is an 
example here: 
https://www.reddit.com/r/homelab/comments/83vkaz/howto_make_time_machine_backups_on_a_samba/
  For windows, I don't know of a way to give it a hint with a file, so 
I'm using cephfs quotas.
  As for AFP, there are a few reasons I decided not to use it.  We 
already have a Samba setup with authentication.  SMB Time Machine only 
works with macOS 10.12 and newer, so if we had more older clients (and 
the time), netatalk would also make sense.  Supposedly AFP is going 
away someday: 
https://apple.stackexchange.com/questions/285417/is-afp-slated-to-be-removed-from-future-versions-of-macos


Thanks for the responses!
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] luminous ceph-fuse with quotas breaks 'mount' and 'df'

2018-08-17 Thread Chad William Seys
0670 7f0e300b5140 15 inode.get on 0x556a3f128000 
0x12291c4.head now 2
2018-08-17 14:34:55.040672 7f0e300b5140 20 client.18814183 _ll_get 
0x556a3f128000 0x12291c4 -> 1
2018-08-17 14:34:55.041035 7f0e300b5140 10 client.18814183 
ll_register_callbacks cb 0x556a3ee82c80 invalidate_ino_cb 1 
invalidate_dentry_cb 1 switch_interrupt_cb 1 remount_cb 1
2018-08-17 14:34:55.048403 7f0e298a6700 10 client.18814183 put_inode on 
0x1.head(faked_ino=0 ref=3 ll_ref=0 cap_refs={} open={} mode=40755 
size=0/0 nlink=1 mtime=2018-05-16 08:33:31.388505 
caps=pAsLsXsFs(0=pAsLsXsFs) has_dir_layout 0x556a3f128c00)
2018-08-17 14:34:55.048421 7f0e298a6700 15 inode.put on 0x556a3f128c00 
0x1.head now 2
2018-08-17 14:34:55.048424 7f0e298a6700 10 client.18814183 put_inode on 
0x1.head(faked_ino=0 ref=2 ll_ref=0 cap_refs={} open={} mode=40755 
size=0/0 nlink=1 mtime=2018-05-16 08:33:31.388505 
caps=pAsLsXsFs(0=pAsLsXsFs) has_dir_layout 0x556a3f128c00)
2018-08-17 14:34:55.048429 7f0e298a6700 15 inode.put on 0x556a3f128c00 
0x1.head now 1

2018-08-17 14:34:55.051029 7f0e2589e700  1 client.18814183 using remount_cb
2018-08-17 14:34:55.055053 7f0e2509d700  3 client.18814183 ll_getattr 
0x12291c4.head
2018-08-17 14:34:55.055070 7f0e2509d700 10 client.18814183 _getattr mask 
pAsLsXsFs issued=1
2018-08-17 14:34:55.055074 7f0e2509d700 10 client.18814183 fill_stat on 
0x12291c4 snap/devhead mode 040555 mtime 2018-08-15 14:09:02.547890 
ctime 2018-08-17 14:28:09.654639
2018-08-17 14:34:55.055089 7f0e2509d700  3 client.18814183 ll_getattr 
0x12291c4.head = 0
2018-08-17 14:34:55.055100 7f0e2509d700  3 client.18814183 ll_forget 
0x12291c4 1
2018-08-17 14:34:55.055102 7f0e2509d700 20 client.18814183 _ll_put 
0x556a3f128000 0x12291c4 1 -> 1

2018-08-17 14:34:55.965416 7f0e2a8a8700 10 client.18814183 renew_caps()
2018-08-17 14:34:55.965432 7f0e2a8a8700 15 client.18814183 renew_caps 
requesting from mds.0

2018-08-17 14:34:55.965436 7f0e2a8a8700 10 client.18814183 renew_caps mds.0
2018-08-17 14:34:55.965504 7f0e2a8a8700 20 client.18814183 trim_cache 
size 0 max 16384
2018-08-17 14:34:55.967114 7f0e298a6700 10 client.18814183 
handle_client_session client_session(renewcaps seq 2) v1 from mds.0

ceph-fuse[30502]: fuse finished with error 0 and tester_r 0
*** Caught signal (Segmentation fault) **





On 07/09/2018 08:48 AM, John Spray wrote:

On Fri, Jul 6, 2018 at 6:30 PM Chad William Seys
 wrote:


Hi all,
I'm having a problem that when I mount cephfs with a quota in the
root mount point, no ceph-fuse appears in 'mount' and df reports:

Filesystem 1K-blocks  Used Available Use% Mounted on
ceph-fuse  0 0 0- /srv/smb

If I 'ls' I see the expected files:
# ls -alh
total 6.0K
drwxrwxr-x+ 1 root smbadmin  18G Jul  5 17:06 .
drwxr-xr-x  5 root smbadmin 4.0K Jun 16  2017 ..
drwxrwx---+ 1 smbadmin smbadmin 3.0G Jan 18 10:50 bigfix-relay-cache
drwxrwxr-x+ 1 smbadmin smbadmin  15G Jul  6 11:51 instr_files
drwxrwx---+ 1 smbadmin smbadmin0 Jul  6 11:50 mcdermott-group

Quotas are being used:
getfattr --only-values -n ceph.quota.max_bytes /srv/smb
1

Turning off the quota at the mountpoint allows df and mount to work
correctly.

I'm running 12.2.4 on the servers and 12.2.5 on the client.


That's pretty weird, not something I recall seeing before.  When
quotas are in use, Ceph is implementing the same statfs() hook to
report usage to the OS, but it's doing a getattr() call to the MDS
inside that function.  I wonder if something is going slowly, and
perhaps the OS is ignoring filesystems that don't return promptly, to
avoid hanging "df" on a misbehaving filesystem?

I'd debug this by setting "debug ms = 1", and finding the client's log
in /var/log/ceph.

John
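
(A minimal client-side ceph.conf sketch for that kind of debugging; the 
"debug client" line is an extra assumption on top of John's "debug ms = 1". 
Restart/remount ceph-fuse afterwards and look for the log under /var/log/ceph.)

[client]
debug ms = 1
debug client = 20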



Is there a bug report for this?
Thanks!
Chad.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how can time machine know difference between cephfs fuse and kernel client?

2018-08-17 Thread Chad William Seys
Also, when using the cephfs fuse client, Windows File History reports no 
free space.  Free Space: 0 bytes, Total Space: 186 GB.


C.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] how can time machine know difference between cephfs fuse and kernel client?

2018-08-17 Thread Chad William Seys

Hello all,
  I have used cephfs served over Samba to set up a "time capsule" 
server.  However, I could only get this to work using the cephfs kernel 
module.  Time machine would give errors if cephfs were mounted with 
fuse. (Sorry, I didn't write down the error messages!)
  Anyone have an idea how the two methods of mounting are detectable by 
time machine through Samba?
  Windows 10 File History behaved the same way.  Error messages are 
"Could not enable File History. There is not enough space on the disk". 
(Although it shows the correct amount of space.) And "File History 
doesn't recognize this drive."
  I'd like to use cephfs fuse for the quota support.  (The kernel 
client is said to support quotas with Mimic and kernel version >= 4.17, 
but that is too cutting edge for me ATM.)


Thanks!
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] cephfs fuse versus kernel performance

2018-08-15 Thread Chad William Seys

Hi all,
  Anyone know of benchmarks of cephfs through fuse versus kernel?

Thanks!
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] luminous ceph-fuse with quotas breaks 'mount' and 'df'

2018-07-09 Thread Chad William Seys

Hi Greg,

Am i reading this right that you've got a 1-*byte* quota but have 
gigabytes of data in the tree?
I have no idea what that might do to the system, but it wouldn't totally 
surprise me if that was messing something up. Since <10KB definitely 
rounds towards 0...


Yeah, that directory only contains subdirectories, and those subdirs 
have separate quotas set.


E.g. getfattr --only-values -n ceph.quota.max_bytes 
/srv/smb/mcdermott-group/

2

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] luminous ceph-fuse with quotas breaks 'mount' and 'df'

2018-07-06 Thread Chad William Seys

Hi all,
  I'm having a problem that when I mount cephfs with a quota in the 
root mount point, no ceph-fuse appears in 'mount' and df reports:


Filesystem 1K-blocks  Used Available Use% Mounted on
ceph-fuse  0 0 0- /srv/smb

If I 'ls' I see the expected files:
# ls -alh
total 6.0K
drwxrwxr-x+ 1 root smbadmin  18G Jul  5 17:06 .
drwxr-xr-x  5 root smbadmin 4.0K Jun 16  2017 ..
drwxrwx---+ 1 smbadmin smbadmin 3.0G Jan 18 10:50 bigfix-relay-cache
drwxrwxr-x+ 1 smbadmin smbadmin  15G Jul  6 11:51 instr_files
drwxrwx---+ 1 smbadmin smbadmin0 Jul  6 11:50 mcdermott-group

Quotas are being used:
getfattr --only-values -n ceph.quota.max_bytes /srv/smb
1

Turning off the quota at the mountpoint allows df and mount to work 
correctly.


I'm running 12.2.4 on the servers and 12.2.5 on the client.

Is there a bug report for this?
Thanks!
Chad.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] osds with different disk sizes may killing performance

2018-04-18 Thread Chad William Seys
You'll find it said time and time again on the ML... avoid disks of 
different sizes in the same cluster. It's a headache that sucks. It's 
not impossible, it's not even overly hard to pull off... but it's 
very easy to cause a mess and a lot of headaches. It will also make 
it harder to diagnose performance issues in the cluster.

Not very practical for clusters which aren't new.


There is no way to fill up all disks evenly with the same number of
Bytes and then stop filling the small disks when they're full and
only continue filling the larger disks.


This is possible with adjusting crush weights.  Initially the smaller 
drives are weighted more highly than larger drives.  As data gets added 
the weights are changed so that larger drives continue to fill while no 
drive becomes overfull.
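
A sketch of what "adjusting crush weights" looks like in practice (the osd 
id and weight below are made up; 'ceph osd df' shows per-OSD utilization so 
you can pick targets):

ceph osd df                         # check per-OSD %USE
ceph osd crush reweight osd.12 1.2  # raise/lower the crush weight of one osd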



What will happen if you are filling all disks evenly with Bytes
instead of % is that the small disks will get filled completely and
all writes to the cluster will block until you do something to reduce
the amount used on the full disks.
That means the crush weights were not adjusted correctly as the cluster 
filled.



but in this case you would have a steep drop off of performance. when
you reach the fill level where small drives do not accept more data,
suddenly you would have a performance cliff where only your larger disks
are doing new writes. and only larger disks doing reads on new data.


Good point!  Although if this is implemented by changing crush weights, 
adjusting the weights as the cluster fills will cause the data to churn 
and the new data will not only be assigned to larger drives. :)


Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] osds with different disk sizes may killing performance

2018-04-12 Thread Chad William Seys

Hello,
  I think your observations suggest that, to a first approximation, 
filling drives with bytes to the same absolute level is better for 
performance than filling drives to the same percentage full. Assuming 
random distribution of PGs, this would cause the smallest drives to be 
as active as the largest drives.
  E.g. if every drive had 1TB of data, each would be equally likely to 
contain the PG of interest.
  Of course, as more data was added the smallest drives could not hold 
more and the larger drives become more active, but at least the smaller 
drives would be as active as possible.


Thanks!
Chad.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rbd and cephfs (data) in one pool?

2017-12-27 Thread Chad William Seys

Hello,
  Is it possible to place rbd and cephfs data in the same pool?

Thanks!
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] what does associating ceph pool to application do?

2017-10-06 Thread Chad William Seys
Thanks John!  I see that a pool can have more than one "application". 
Should I feel free to combine uses (e.g. cephfs,rbd) or is this 
contraindicated?


Thanks!
Chad.


Just to stern this up a bit...

In the future, you may find that things stop working if you remove the
application tags.

For example, in Mimic, application tags will be involved in
authentication for CephFS, and if you removed the cephfs tags from
your data pool, your clients would stop being able to access it.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] what does associating ceph pool to application do?

2017-10-06 Thread Chad William Seys

Scrolled down a bit and found this blog post:
https://ceph.com/community/new-luminous-pool-tags/

If things haven't changed:

   Could someone tell me / link to what associating a ceph pool to an 
application does?


ATM it's a tag and does nothing to the pool/PG/etc structure
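
For what it's worth, the tag can be inspected and set with something like 
the following (pool name is an example; enabling a second application on an 
already-tagged pool may prompt for --yes-i-really-mean-it):

ceph osd pool application get mypool
ceph osd pool application enable mypool cephfs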

   I hope this info includes why "Disabling an application within a pool 
might result in loss of application functionality" when running 'ceph 
osd application disable  '


A stern warning to ignore.

C.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] what does associating ceph pool to application do?

2017-10-06 Thread Chad William Seys

Hi All,
  Could someone tell me / link to what associating a ceph pool to an 
application does?
  I hope this info includes why "Disabling an application within a pool 
might result in loss of application functionality" when running 'ceph 
osd application disable  '


Thanks!
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] erasure-coded with overwrites versus erasure-coded with cache tiering

2017-10-05 Thread Chad William Seys

Thanks David,
  When I convert to bluestore and the dust settles I hope to do a 
same-cluster comparison and post here!


Chad.

On 09/30/2017 07:29 PM, David Turner wrote:

 > In my case, the replica-3 and k2m2 are stored on the same spinning disks.

That is exactly what I meant by same pool.  The only way for a cache to 
make sense would be if the data being written or read will be modified 
or heavily read for X amount of time and then ignored.


If things are rarely read, and randomly so, then promoting them into a 
cache tier just makes you wait for the object to be promoted to cache 
before you read it once or twice, after which it sits in there until it's 
demoted again.  If you have random io and anything can really be read 
next, then a cache tier on the same disks as the EC pool will only cause 
things to be promoted and demoted for no apparent reason.


You can always test this for your use case and see if it helps enough to 
justify a pool and tier that you need to manage or not. I'm planning to 
remove my cephfs cache tier once I upgrade to Luminous, as I only have it 
because it was a requirement. It slows down my writes heavily, as 
eviction io is useless and wasteful of cluster io for me.  I haven't 
checked on the process for that yet, but I'm assuming it's a set command 
on the pool that will then allow me to disable and remove the cache 
tier.  I mention that because if it is that easy to enable/disable, then 
testing it should be simple and easy to compare.
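
For reference, tearing down a writeback cache tier usually looks roughly 
like this (pool names are placeholders and this is only a sketch, not a 
tested recipe; the flush/evict step can take a long time):

ceph osd tier cache-mode cachepool forward --yes-i-really-mean-it
rados -p cachepool cache-flush-evict-all
ceph osd tier remove-overlay basepool
ceph osd tier remove basepool cachepool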



On Sat, Sep 30, 2017, 8:10 PM Chad William Seys <cws...@physics.wisc.edu> wrote:


Hi David,
    Thanks for the clarification.  Reminded me of some details I forgot
to mention.
    In my case, the replica-3 and k2m2 are stored on the same spinning
disks. (Mainly using EC for "compression" b/c with the EC k2m2 setting
PG only takes up the same amount of space as a replica-2 while allowing
2 disks to fail like replica-3 without loss.)
    I'm using this setup as RBDs and cephfs to store things like local
mirrors of linux packages and drive images to be broadcast over network.
   Seems to be about as fast as a normal hard drive. :)
    So is this the situation where the "cache tier [is] on the same
root
of osds as the EC pool"?

Thanks for the advice!
Chad.

On 09/30/2017 12:32 PM, David Turner wrote:
 > I can only think of 1 type of cache tier usage that is faster if you are
 > using the cache tier on the same root of osds as the EC pool.  That is
 > cold storage where the file is written initially, modified and read during
 > the first X hours, and then remains in cold storage for the remainder of
 > its life with rare reads.
 >
 > Other than that there are a few use cases using a faster root of osds
 > that might make sense, but generally it's still better to utilize that
 > faster storage in the rest of the osd stack either as journals for
 > filestore or WAL/DB partitions for bluestore.
 >
 >
 > On Sat, Sep 30, 2017, 12:56 PM Chad William Seys
 > <cws...@physics.wisc.edu> wrote:
 >
 >     Hi all,
 >         Now that Luminous supports direct writing to EC pools I was
 >     wondering
 >     if one can get more performance out of an erasure-coded pool with
 >     overwrites or an erasure-coded pool with a cache tier?
 >         I currently have a 3 replica pool in front of a k2m2
erasure coded
 >     pool.  Luminous documentation on cache tiering
 >

http://docs.ceph.com/docs/luminous/rados/operations/cache-tiering/#a-word-of-caution
 >     makes it sound like cache tiering is usually not recommended.
 >
 >     Thanks!
 >     Chad.
 >     ___
 >     ceph-users mailing list
 > ceph-users@lists.ceph.com
 > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 >


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] erasure-coded with overwrites versus erasure-coded with cache tiering

2017-09-30 Thread Chad William Seys

Hi David,
  Thanks for the clarification.  Reminded me of some details I forgot 
to mention.
  In my case, the replica-3 and k2m2 are stored on the same spinning 
disks. (Mainly using EC for "compression" b/c with the EC k2m2 setting 
PG only takes up the same amount of space as a replica-2 while allowing 
2 disks to fail like replica-3 without loss.)
  I'm using this setup as RBDs and cephfs to store things like local 
mirrors of linux packages and drive images to be broadcast over network. 
 Seems to be about as fast as a normal hard drive. :)
  So is this the situation where the "cache tier [is] ont the same root 
of osds as the EC pool"?


Thanks for the advice!
Chad.

On 09/30/2017 12:32 PM, David Turner wrote:
I can only think of 1 type of cache tier usage that is faster if you are 
using the cache tier on the same root of osds as the EC pool.  That is 
cold storage where the file is written initially, modified and read during 
the first X hours, and then remains in cold storage for the remainder of 
its life with rare reads.


Other than that there are a few use cases using a faster root of osds 
that might make sense, but generally it's still better to utilize that 
faster storage in the rest of the osd stack either as journals for 
filestore or WAL/DB partitions for bluestore.



On Sat, Sep 30, 2017, 12:56 PM Chad William Seys <cws...@physics.wisc.edu> wrote:


Hi all,
    Now that Luminous supports direct writing to EC pools I was
wondering
if one can get more performance out of an erasure-coded pool with
overwrites or an erasure-coded pool with a cache tier?
    I currently have a 3 replica pool in front of a k2m2 erasure coded
pool.  Luminous documentation on cache tiering

http://docs.ceph.com/docs/luminous/rados/operations/cache-tiering/#a-word-of-caution
makes it sound like cache tiering is usually not recommended.

Thanks!
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] erasure-coded with overwrites versus erasure-coded with cache tiering

2017-09-30 Thread Chad William Seys

Hi all,
  Now that Luminous supports direct writing to EC pools I was wondering 
if one can get more performance out of an erasure-coded pool with 
overwrites or an erasure-coded pool with a cache tier?
  I currently have a 3 replica pool in front of a k2m2 erasure coded 
pool.  Luminous documentation on cache tiering 
http://docs.ceph.com/docs/luminous/rados/operations/cache-tiering/#a-word-of-caution 
makes it sound like cache tiering is usually not recommended.
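
(As a rough sketch of the EC-overwrite path being asked about -- names are 
made up, and allow_ec_overwrites is documented as requiring bluestore OSDs:)

ceph osd erasure-code-profile set k2m2profile k=2 m=2
ceph osd pool create ecpool 64 64 erasure k2m2profile
ceph osd pool set ecpool allow_ec_overwrites true
# rbd then keeps image metadata in a replicated pool and puts data in the EC pool:
rbd create rbd/testimage --size 10G --data-pool ecpool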


Thanks!
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] mds fails to start after upgrading to 10.2.6

2017-03-16 Thread Chad William Seys

Hi All,
  After upgrading to 10.2.6 on Debian Jessie, the MDS server fails to 
start.  Below is what is written to the log file from attempted start to 
failure:

  Any ideas?  I'll probably try rolling back to 10.2.5 in the meantime.

Thanks!
C.

On 03/16/2017 12:48 PM, r...@mds01.hep.wisc.edu wrote:

2017-03-16 12:46:38.063709 7f605e746180  0 set uid:gid to 64045:64045 
(ceph:ceph)
2017-03-16 12:46:38.063825 7f605e746180  0 ceph version 10.2.6 
(656b5b63ed7c43bd014bcafd81b001959d5f089f), process ceph-mds, pid 10858
2017-03-16 12:46:39.755982 7f6057b62700  1 mds.mds01.hep.wisc.edu 
handle_mds_map standby
2017-03-16 12:46:39.898430 7f6057b62700  1 mds.0.4072 handle_mds_map i am now 
mds.0.4072
2017-03-16 12:46:39.898437 7f6057b62700  1 mds.0.4072 handle_mds_map state change 
up:boot --> up:replay
2017-03-16 12:46:39.898459 7f6057b62700  1 mds.0.4072 replay_start
2017-03-16 12:46:39.898466 7f6057b62700  1 mds.0.4072  recovery set is
2017-03-16 12:46:39.898475 7f6057b62700  1 mds.0.4072  waiting for osdmap 
253396 (which blacklists prior instance)
2017-03-16 12:46:40.227204 7f6052956700  0 mds.0.cache creating system inode 
with ino:100
2017-03-16 12:46:40.227569 7f6052956700  0 mds.0.cache creating system inode 
with ino:1
2017-03-16 12:46:40.954494 7f6050d48700  1 mds.0.4072 replay_done
2017-03-16 12:46:40.954526 7f6050d48700  1 mds.0.4072 making mds journal 
writeable
2017-03-16 12:46:42.211070 7f6057b62700  1 mds.0.4072 handle_mds_map i am now 
mds.0.4072
2017-03-16 12:46:42.211074 7f6057b62700  1 mds.0.4072 handle_mds_map state change 
up:replay --> up:reconnect
2017-03-16 12:46:42.211094 7f6057b62700  1 mds.0.4072 reconnect_start
2017-03-16 12:46:42.211098 7f6057b62700  1 mds.0.4072 reopen_log
2017-03-16 12:46:42.211105 7f6057b62700  1 mds.0.server reconnect_clients -- 5 
sessions
2017-03-16 12:47:28.502417 7f605535d700  1 mds.0.server reconnect gave up on 
client.14384220 10.128.198.55:0/2012593454
2017-03-16 12:47:28.505126 7f605535d700 -1 ./include/interval_set.h: In function 
'void interval_set<T>::insert(T, T, T*, T*) [with T = inodeno_t]' thread 
7f605535d700 time 2017-03-16 12:47:28.502496
./include/interval_set.h: 355: FAILED assert(0)

 ceph version 10.2.6 (656b5b63ed7c43bd014bcafd81b001959d5f089f)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x82) 
[0x7f605e248ed2]
 2: (()+0x1ea5fe) [0x7f605de245fe]
 3: (InoTable::project_release_ids(interval_set&)+0x917) 
[0x7f605e065ad7]
 4: (Server::journal_close_session(Session*, int, Context*)+0x18e) 
[0x7f605de89f1e]
 5: (Server::kill_session(Session*, Context*)+0x133) [0x7f605de8bf23]
 6: (Server::reconnect_tick()+0x148) [0x7f605de8d378]
 7: (MDSRankDispatcher::tick()+0x389) [0x7f605de524d9]
 8: (Context::complete(int)+0x9) [0x7f605de3fcd9]
 9: (SafeTimer::timer_thread()+0x104) [0x7f605e239e84]
 10: (SafeTimerThread::entry()+0xd) [0x7f605e23ad2d]
 11: (()+0x8064) [0x7f605d53d064]
 12: (clone()+0x6d) [0x7f605ba8262d]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this.

--- begin dump of recent events ---
  -263> 2017-03-16 12:46:38.056353 7f605e746180  5 asok(0x7f6068a2a000) 
register_command perfcounters_dump hook 0x7f6068a06030
  -262> 2017-03-16 12:46:38.056425 7f605e746180  5 asok(0x7f6068a2a000) 
register_command 1 hook 0x7f6068a06030
  -261> 2017-03-16 12:46:38.056431 7f605e746180  5 asok(0x7f6068a2a000) 
register_command perf dump hook 0x7f6068a06030
  -260> 2017-03-16 12:46:38.056434 7f605e746180  5 asok(0x7f6068a2a000) 
register_command perfcounters_schema hook 0x7f6068a06030
  -259> 2017-03-16 12:46:38.056437 7f605e746180  5 asok(0x7f6068a2a000) 
register_command 2 hook 0x7f6068a06030
  -258> 2017-03-16 12:46:38.056440 7f605e746180  5 asok(0x7f6068a2a000) 
register_command perf schema hook 0x7f6068a06030
  -257> 2017-03-16 12:46:38.056444 7f605e746180  5 asok(0x7f6068a2a000) 
register_command perf reset hook 0x7f6068a06030
  -256> 2017-03-16 12:46:38.056448 7f605e746180  5 asok(0x7f6068a2a000) 
register_command config show hook 0x7f6068a06030
  -255> 2017-03-16 12:46:38.056457 7f605e746180  5 asok(0x7f6068a2a000) 
register_command config set hook 0x7f6068a06030
  -254> 2017-03-16 12:46:38.056461 7f605e746180  5 asok(0x7f6068a2a000) 
register_command config get hook 0x7f6068a06030
  -253> 2017-03-16 12:46:38.056464 7f605e746180  5 asok(0x7f6068a2a000) 
register_command config diff hook 0x7f6068a06030
  -252> 2017-03-16 12:46:38.056466 7f605e746180  5 asok(0x7f6068a2a000) 
register_command log flush hook 0x7f6068a06030
  -251> 2017-03-16 12:46:38.056469 7f605e746180  5 asok(0x7f6068a2a000) 
register_command log dump hook 0x7f6068a06030
  -250> 2017-03-16 12:46:38.056472 7f605e746180  5 asok(0x7f6068a2a000) 
register_command log reopen hook 0x7f6068a06030
  -249> 2017-03-16 12:46:38.063709 7f605e746180  0 set uid:gid to 64045:64045 
(ceph:ceph)
  -248> 2017-03-16 12:46:38.063825 7f605e746180  0 ceph version 10.2.6 
(656b5b63ed7c43bd014bcafd81b001959d5f089f), process ceph-mds, pid 10858
  

Re: [ceph-users] removing ceph.quota.max_bytes

2017-02-20 Thread Chad William Seys

Thanks!  Seems non-standard, but it works. :)

C.
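
(i.e., instead of 'setfattr -x', something like the following clears the quota:)

setfattr -n ceph.quota.max_bytes -v 0 /ceph/cephfs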


Anyone know what's wrong?


You can clear these by setting them to zero.

John


Everything is Jewel 10.2.5.

Thanks!
Chad.




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] removing ceph.quota.max_bytes

2017-02-16 Thread Chad William Seys

Hi All,
  I'm trying to remove the extended attribute "ceph.quota.max_bytes" on 
a cephfs directory.
  I've fuse mounted a subdirectory of a cephfs filesystem under 
/ceph/cephfs .

  Next I set "ceph.quota.max_bytes"
setfattr -n ceph.quota.max_bytes -v 123456 /ceph/cephfs
  And check the attribute:
getfattr  -n ceph.quota.max_bytes /ceph/cephfs
getfattr: Removing leading '/' from absolute path names
# file: ceph/cephfs
ceph.quota.max_bytes="123456"

Then I try to remove:
setfattr -x ceph.quota.max_bytes /ceph/cephfs/
setfattr: /ceph/cephfs/: No such attribute

Anyone know what's wrong?

Everything is Jewel 10.2.5.

Thanks!
Chad.




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 10.2.5 on Jessie?

2016-12-21 Thread Chad William Seys

Thanks ceph@jack and Alexandre for the reassurance!
C.

On 12/20/2016 08:37 PM, Alexandre DERUMIER wrote:

I have upgrade 3 jewel cluster on jessie to last 10.2.5, works fine.


- Original message -
From: "Chad William Seys" <cws...@physics.wisc.edu>
To: "ceph-users" <ceph-us...@ceph.com>
Sent: Tuesday, December 20, 2016 17:31:49
Subject: [ceph-users] 10.2.5 on Jessie?

Hi all,
Has anyone had success/problems with 10.2.5 on Jessie? I'm being a
little cautious before updating. ;)

Thanks!
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] 10.2.5 on Jessie?

2016-12-20 Thread Chad William Seys

Hi all,
  Has anyone had success/problems with 10.2.5 on Jessie?  I'm being a 
little cautious before updating.  ;)


Thanks!
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] new feature: auto removal of osds causing "stuck inactive"

2016-10-28 Thread Chad William Seys

Hi all,
  I recently encountered a situation where some partially removed OSDs 
caused my cluster to enter a "stuck inactive" state.  The eventually 
solution was to tell ceph the OSDs were "lost".  Because all the PGs 
were replicated elsewhere on the cluster, no data was lost.


  Would it make sense or be possible for Ceph to automatically detect 
this situation ("stuck inactive" and PGs replicated elsewhere) and 
automatically take action to un-stuck the cluster?  E.g. automatically 
mark the OSD as lost or cause the OSD be down and out to have the same 
effect?
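
For the record, the manual steps I mean are roughly the following (osd id is 
just an example; only safe when the data really is fully replicated elsewhere):

ceph osd lost 12 --yes-i-really-mean-it
ceph osd crush remove osd.12
ceph auth del osd.12
ceph osd rm 12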


  Ideally anything that can be safely automated should be.  :)

Thanks!
C.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Blocked ops, OSD consuming memory, hammer

2016-05-27 Thread Chad William Seys
Hi Heath,
My OSDs do the exact same thing - consume lots of RAM when the cluster is 
reshuffling PGs.
Try
ceph tell osd.* heap release

as a cron job.
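
Something along these lines (path and schedule are just an example):

# /etc/cron.d/ceph-heap-release
0 * * * *  root  /usr/bin/ceph tell 'osd.*' heap release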

Here's a bug:
http://tracker.ceph.com/issues/12681

Chad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] is 0.94.7 packaged well for Debian Jessie

2016-05-24 Thread Chad William Seys

Thanks!


Hammer don't use systemd unit files, so it's working fine.

(jewel/infernalis still missing systemd .target files)

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] is 0.94.7 packaged well for Debian Jessie

2016-05-24 Thread Chad William Seys
Hi All,
Has anyone tested 0.94.7 on Debian Jessie?  I've heard that the most 
recent Jewel releases for Jessie were missing pieces (systemd files) so I am a 
little more hesitant than usual.

Thanks!
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RE: upgraded to Ubuntu 16.04, getting assert failure

2016-04-11 Thread Chad William Seys
Hi Don,
I had a similar problem starting a mon.  In my case a computer failed and 
I removed and recreated the 3rd mon on a new computer.  It would start but 
never get added to the other mons' lists.
Restarting the other two mons caused them to add the third to their 
monmap.

Good luck!
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-deploy not in debian repo?

2015-11-09 Thread Chad William Seys
Hi all,
I cannot find ceph-deploy in the debian catalogs.  I have these in my 
sources:

deb http://ceph.com/debian-hammer/ jessie main 

# ceph-deploy not yet in jessie repo
deb http://ceph.com/debian-hammer wheezy main

I also see ceph-deploy in the repo.
http://download.ceph.com/debian/pool/main/c/ceph-deploy/

So, is it just not listed in the Contents files?

Thanks,
Chad.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] copying files from one pool to another results in more free space?

2015-10-26 Thread Chad William Seys
Hi All
I'm observing some weird behavior in the amount of space ceph reports 
while copying files from an rbd image in one pool to an rbd image in another.  
The  AVAIL number reported by 'ceph df' goes up as the copy proceeds rather 
than goes down!
The output of 'ceph df'  shows that the AVAIL space is 9219G initially 
and then after the copy has proceeded for some time shows 9927G.  (See below.)
Ceph is somehow reclaiming ~700GB of space when it should only be 
losing 
space!  I like it!  :)
Some details:
I ran fstrim on both the source and destination mount points.
The data is being copied from tibs/tibs-ecpool to 
3-replica/3-replica-ec. 
tibs is a 3 replica pool which is backed by tibs-ecpool  which is a k2m2 
erasure coded pool.  3-replica/3-replica-ec is the same arrangement, but with 
fewer PGs.
No data is being deleted.
I see that USED in 3-replica/3-replica-ec is going up as expected.
But the USED in both tibs/tibs-ecpool is going down.  This appears to 
be 
from where the space is being reclaimed.

Question: why is a read of data causing it to take up less space?

Thanks!
Chad.

# ceph df
GLOBAL:
SIZE   AVAIL RAW USED %RAW USED
22908G 9219G   13689G 59.76
POOLS:
NAMEID USED   %USED MAX AVAIL OBJECTS
rbd 13281 0 1990G   3
tibs22 72724M  0.31 1990G  612278
tibs-ecpool 23  4555G 19.88 2985G 1166644
cephfs_data 27  8 0 2985G   2
cephfs_metadata 28 34800k 0 1990G  28
3-replica   31   745G  3.25 1990G 5516809
3-replica-ec32   942G  4.12 2985G  241626

----- copy progresses -----

# ceph df
GLOBAL:
SIZE   AVAIL RAW USED %RAW USED 
22908G 9927G   12980G 56.66 
POOLS:
NAMEID USED   %USED MAX AVAIL OBJECTS 
rbd 13281 0 2284G   3 
tibs22 68456M  0.29 2284G   88561 
tibs-ecpool 23  4227G 18.45 3427G 1082734 
cephfs_data 27  8 0 3427G   2 
cephfs_metadata 28 34832k 0 2284G  28 
3-replica   31   745G  3.25 2284G 2676207 
3-replica-ec32   945G  4.13 3427G  242194 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Correct method to deploy on jessie

2015-10-06 Thread Chad William Seys
> Most users in the apt family have deployed on Ubuntu
> though, and that's what our tests run on, fyi.

That is good to know - I wouldn't be surprised if the same packages could be 
used in Ubuntu and Debian.  Especially if the release dates of the Ubuntu and 
Debian versions were similar.

Thanks!
Chad.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Correct method to deploy on jessie

2015-10-01 Thread Chad William Seys
Hi Dmitry,
You might try using the wheezy repos on jessie.  Often this will work.  
(I'm using wheezy for most of my ceph nodes, but not two of the three monitor 
nodes, which are jessie with wheezy repos.)

# Wheezy repos on Jessie
deb http://ceph.com/debian-hammer/ wheezy main

Alternatively Jessie repo:
deb http://gitbuilder.ceph.com/ceph-deb-jessie-x86_64-basic/ref/v0.94.3 jessie 
main

But read this thread for tips:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-September/004142.html
E.g.
> The thing is:  whatever I write into ceph.list, ceph-deploy just
> overwrites it with "deb http://ceph.com/debian-hammer/ jessie main"
> which does not exist :(
[...]
> you can specify any repository you like with 'ceph-deploy install
> --repo-url ', given you have the repo keys installed.

Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-11 Thread Chad William Seys
> note that I've only did it after most of pg were recovered

My guess / hope is that heap free would also help during the recovery process.  
Recovery causing failures does not seem like the best outcome.  :)

C.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-09 Thread Chad William Seys

> Going from 2GB to 8GB is not normal, although some slight bloating is
> expected. 

If I recall correctly, Mariusz's cluster had a period of flapping OSDs?

I experienced a similar situation using hammer. My OSDs went from 10GB in 
RAM in a Healthy state to 24GB RAM + 10GB swap in a recovering state.  I also 
could not re-add a node b/c every time I tried OOM killer would kill an OSD 
daemon somewhere before the cluster could become healthy again.

Therefore I propose we begin expecting bloating under these circumstances.  :) 

> In your case it just got much worse than usual for reasons yet
> unknown.

Not really unknown: B/c 'ceph tell osd.* heap release' freed RAM for Mariusz, 
I think we know the reason for so much RAM use is b/c of tcmalloc not freeing 
unused memory.   Right?

Here is a related "urgent" and "won't fix" bug to which applies 
http://tracker.ceph.com/issues/12681 .  Sage suggests making the heap release 
command a cron job .   :)

Have fun!
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-09 Thread Chad William Seys
On Tuesday, September 08, 2015 18:28:48 Shinobu Kinjo wrote:
> Have you ever?
> 
> http://ceph.com/docs/master/rados/troubleshooting/memory-profiling/

No.  But the command 'ceph tell osd.* heap release' did cause my OSDs to 
consume the "normal" amount of RAM.  ("normal" in this case means the same 
amount of RAM as before my cluster went through a recovery phase.

Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RAM usage only very slowly decreases after cluster recovery

2015-09-09 Thread Chad William Seys
Thanks Somnath!
I found a bug in the tracker to follow: http://tracker.ceph.com/issues/12681

Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Huge memory usage spike in OSD on hammer/giant

2015-09-08 Thread Chad William Seys
Does 'ceph tell osd.* heap release' help with OSD RAM usage?

From
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-August/003932.html

Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RAM usage only very slowly decreases after cluster recovery

2015-08-28 Thread Chad William Seys
Thanks! 'ceph tell osd.* heap release' seems to have worked!  Guess I'll 
sprinkle it around my maintenance scripts.

Somnath Is there a plan to make jemalloc standard in Ceph in the future?

Thanks!
Chad.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RAM usage only very slowly decreases after cluster recovery

2015-08-27 Thread Chad William Seys
Hi all,

It appears that OSD daemons only very slowly free RAM after an extended period 
of an unhealthy cluster (shuffling PGs around).

Prior to a power outage (and recovery) around July 25th, the amount of RAM 
used was fairly constant, at most 10GB (out of 24GB).  You can see in the 
attached PNG osd6_stack2.png (Week 30) that the amount of used RAM on 
osd06.physics.wisc.edu was holding steady around 7GB.

Around July 25th our Ceph cluster rebooted after a power outage.  Not all 
nodes booted successfully, so Ceph proceeded to shuffle PGs to attempt to 
return to health with the remaining nodes.  You can see in osd6_stack2.png two 
purplish spikes showing that the node used around 10GB swap space during the 
recovery period.

Finally the cluster recovered around July 31st.  During that period I had 
to take some osd daemons out of the pool b/c their nodes ran out of swap space 
and the daemons were killed by the out of memory (OOM) kernel feature.  (The 
recovery period was probably extended by me trying to add the daemons/drives 
back. If I recall correctly that is what was occurring during the second swap 
peak.)

This RAM usage pattern is in general the same for all the nodes in the cluster.

Almost three weeks later, the amount of RAM used on the node is still 
decreasing, but it has not returned to pre-power-outage levels: 15GB instead 
of 7GB.

Why is Ceph using 2x more RAM than it used to in steady state?

Thanks,
Chad.

(P.S.  It is really unfortunate that Ceph uses more RAM when recovering - it can 
lead to cascading failure!)
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] TRIM / DISCARD run at low priority by the OSDs?

2015-08-24 Thread Chad William Seys
Hi Alexandre,

Thanks for the note.
I was not clear enough.  The fstrim I was running was only on the krbd 
mountpoints.  The backend OSDs only have standard hard disks, not SSDs, so 
they don't need to be trimmed.

Instead I was reclaiming free space as reported by Ceph.  Running fstrim on 
the rbd mountpoints this caused the OSDs to become very busy, affecting all 
rbds, not just those being trimmed.

I was hoping someone had an idea of how to make the OSDs not become busy while 
running fstrim on the rbd mountpoints.  E.g. if Ceph made a distinction 
between trim operations on RBDs and other types, it could give those 
operations lower priority.

Thanks again!
Chad.


On Monday, August 24, 2015 18:26:30 you wrote:
 Hi,
 
 I'm not sure for krbd, but with librbd, using trim/discard on the client,
 
 don't do trim/discard on the osd physical disk.
 
 It's simply write zeroes in the rbd image.
 
 zeores write can be skipped since this commit (librbd related)
 https://github.com/xiaoxichen/ceph/commit/e7812b8416012141cf8faef577e7b27e1b
 29d5e3 +OPTION(rbd_skip_partial_discard, OPT_BOOL, false)
 
 
 Then you can still manage fstrim manually on the osd servers
 
 - Original message -
 From: Chad William Seys cws...@physics.wisc.edu
 To: ceph-users ceph-us...@ceph.com
 Sent: Saturday, August 22, 2015 04:26:38
 Subject: [ceph-users] TRIM / DISCARD run at low priority by the OSDs?
 
 Hi All,
 
 Is it possible to give TRIM / DISCARD initiated by krbd low priority on the
 OSDs?
 
 I know it is possible to run fstrim at Idle priority on the rbd mount point,
 e.g. ionice -c Idle fstrim -v $MOUNT .
 
 But this Idle priority (it appears) only is within the context of the node
 executing fstrim . If the node executing fstrim is Idle then the OSDs are
 very busy and performance suffers.
 
 Is it possible to tell the OSD daemons (or whatever) to perform the TRIMs at
 low priority also?
 
 Thanks!
 Chad.
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] TRIM / DISCARD run at low priority by the OSDs?

2015-08-21 Thread Chad William Seys
Hi All,

Is it possible to give TRIM / DISCARD initiated by krbd low priority on the 
OSDs?

I know it is possible to run fstrim at Idle priority on the rbd mount point, 
e.g. ionice -c Idle fstrim -v $MOUNT .  

But this Idle priority (it appears) only is within the context of the node 
executing fstrim .  If the node executing fstrim is Idle then the OSDs are 
very busy and performance suffers.

Is it possible to tell the OSD daemons (or whatever) to perform the TRIMs at 
low priority also?

Thanks!
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] why are there degraded PGs when adding OSDs?

2015-07-27 Thread Chad William Seys
Hi Sam,

 The pg might also be degraded right after a map change which changes the
 up/acting sets since the few objects updated right before the map change
 might be new on some replicas and old on the other replicas.  While in that
 state, those specific objects are degraded, and the pg would report
 degraded until they are recovered (which would happen asap, prior to
 backfilling the new replica). -Sam

That sounds like only a few PGs should be degraded.  I instead have about 45% 
(and higher earlier).

# ceph -s
cluster 7797e50e-f4b3-42f6-8454-2e2b19fa41d6
 health HEALTH_WARN
2081 pgs backfill
6745 pgs degraded
17 pgs recovering
6728 pgs recovery_wait
6745 pgs stuck degraded
8826 pgs stuck unclean
recovery 2530124/5557452 objects degraded (45.527%)
recovery 33594/5557452 objects misplaced (0.604%)
 monmap e5: 3 mons at 
{mon01=128.104.164.197:6789/0,mon02=128.104.164.198:6789/0,mon03=10.128.198.51:6789/0}
election epoch 16458, quorum 0,1,2 mon03,mon01,mon02
 mdsmap e3032: 1/1/1 up {0=mds01.hep.wisc.edu=up:active}
 osdmap e149761: 27 osds: 27 up, 27 in; 2083 remapped pgs
  pgmap v13464928: 18432 pgs, 9 pools, 5401 GB data, 1364 kobjects
11122 GB used, 11786 GB / 22908 GB avail
2530124/5557452 objects degraded (45.527%)
33594/5557452 objects misplaced (0.604%)
9606 active+clean
6726 active+recovery_wait+degraded
2081 active+remapped+wait_backfill
  17 active+recovering+degraded
   2 active+recovery_wait+degraded+remapped
recovery io 24861 kB/s, 6 objects/s

Chad.

 
 - Original Message -
 From: Chad William Seys cws...@physics.wisc.edu
 To: ceph-users ceph-us...@ceph.com
 Sent: Monday, July 27, 2015 12:27:26 PM
 Subject: [ceph-users] why are there degraded PGs when adding OSDs?
 
 Hi All,
 
 I recently added some OSDs to the Ceph cluster (0.94.2). I noticed that
 'ceph -s' reported both misplaced AND degraded PGs.
 
 Why should any PGs become degraded?  Seems as though Ceph should only be
 reporting misplaced PGs?
 
 From the Giant release notes:
 Degraded vs misplaced: the Ceph health reports from ‘ceph -s’ and related
 commands now make a distinction between data that is degraded (there are
 fewer than the desired number of copies) and data that is misplaced (stored
 in the wrong location in the cluster). The distinction is important because
 the latter does not compromise data safety.
 
 Does Ceph delete some replicas of the PGs (leading to degradation) before
 re- replicating on the new OSD?
 
 This does not seem to be the safest algorithm.
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] why are there degraded PGs when adding OSDs?

2015-07-27 Thread Chad William Seys
Hi All,

I recently added some OSDs to the Ceph cluster (0.94.2). I noticed that 'ceph 
-s' reported both misplaced AND degraded PGs.

Why should any PGs become degraded?  Seems as though Ceph should only be 
reporting misplaced PGs?

From the Giant release notes:
Degraded vs misplaced: the Ceph health reports from ‘ceph -s’ and related 
commands now make a distinction between data that is degraded (there are fewer 
than the desired number of copies) and data that is misplaced (stored in the 
wrong location in the cluster). The distinction is important because the 
latter does not compromise data safety.

Does Ceph delete some replicas of the PGs (leading to degradation) before re-
replicating on the new OSD?

This does not seem to be the safest algorithm.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] why are there degraded PGs when adding OSDs?

2015-07-27 Thread Chad William Seys
Hi Sam,
I'll need help getting the osdmap and pg dump prior to addition.
I can remove the OSDs and add again if the osdmap (etc.) is not logged 
somewhere.

Chad.

 Hmm, that's odd.  Can you attach the osdmap and ceph pg dump prior to the
 addition (with all pgs active+clean), then the osdmap and ceph pg dump
 afterwards? -Sam
 
 - Original Message -
 From: Chad William Seys cws...@physics.wisc.edu
 To: Samuel Just sj...@redhat.com, ceph-users ceph-us...@ceph.com
 Sent: Monday, July 27, 2015 12:57:23 PM
 Subject: Re: [ceph-users] why are there degraded PGs when adding OSDs?
 
 Hi Sam,
 
  The pg might also be degraded right after a map change which changes the
  up/acting sets since the few objects updated right before the map change
  might be new on some replicas and old on the other replicas.  While in
  that
  state, those specific objects are degraded, and the pg would report
  degraded until they are recovered (which would happen asap, prior to
  backfilling the new replica). -Sam
 
 That sounds like only a few PGs should be degraded.  I instead have about
 45% (and higher earlier).
 
 # ceph -s
 cluster 7797e50e-f4b3-42f6-8454-2e2b19fa41d6
  health HEALTH_WARN
 2081 pgs backfill
 6745 pgs degraded
 17 pgs recovering
 6728 pgs recovery_wait
 6745 pgs stuck degraded
 8826 pgs stuck unclean
 recovery 2530124/5557452 objects degraded (45.527%)
 recovery 33594/5557452 objects misplaced (0.604%)
  monmap e5: 3 mons at
 {mon01=128.104.164.197:6789/0,mon02=128.104.164.198:6789/0,mon03=10.128.198.
 51:6789/0} election epoch 16458, quorum 0,1,2 mon03,mon01,mon02
  mdsmap e3032: 1/1/1 up {0=mds01.hep.wisc.edu=up:active}
  osdmap e149761: 27 osds: 27 up, 27 in; 2083 remapped pgs
   pgmap v13464928: 18432 pgs, 9 pools, 5401 GB data, 1364 kobjects
 11122 GB used, 11786 GB / 22908 GB avail
 2530124/5557452 objects degraded (45.527%)
 33594/5557452 objects misplaced (0.604%)
 9606 active+clean
 6726 active+recovery_wait+degraded
 2081 active+remapped+wait_backfill
   17 active+recovering+degraded
2 active+recovery_wait+degraded+remapped
 recovery io 24861 kB/s, 6 objects/s
 
 Chad.
 
  - Original Message -
  From: Chad William Seys cws...@physics.wisc.edu
  To: ceph-users ceph-us...@ceph.com
  Sent: Monday, July 27, 2015 12:27:26 PM
  Subject: [ceph-users] why are there degraded PGs when adding OSDs?
  
  Hi All,
  
  I recently added some OSDs to the Ceph cluster (0.94.2). I noticed that
  'ceph -s' reported both misplaced AND degraded PGs.
  
  Why should any PGs become degraded?  Seems as though Ceph should only be
  reporting misplaced PGs?
  
  From the Giant release notes:
  Degraded vs misplaced: the Ceph health reports from ‘ceph -s’ and related
  commands now make a distinction between data that is degraded (there are
  fewer than the desired number of copies) and data that is misplaced
  (stored
  in the wrong location in the cluster). The distinction is important
  because
  the latter does not compromise data safety.
  
  Does Ceph delete some replicas of the PGs (leading to degradation) before
  re- replicating on the new OSD?
  
  This does not seem to be the safest algorithm.
  ___
  ceph-users mailing list
  ceph-users@lists.ceph.com
  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] why are there degraded PGs when adding OSDs?

2015-07-27 Thread Chad William Seys
Hi Sam,
I think I may have found the problem:  I noticed that the new host was created with 
straw2 instead of straw.  Would this account for 50% of PGs being degraded?

(I'm removing the OSDs on that host and will recreate with 'firefly' tunables.)

Thanks!
Chad.

On Monday, July 27, 2015 15:09:21 Chad William Seys wrote:
 Hi Sam,
   I'll need help getting the osdmap and pg dump prior to addition.
   I can remove the OSDs and add again if the osdmap (etc.) is not logged
 somewhere.
 
 Chad.
 
  Hmm, that's odd.  Can you attach the osdmap and ceph pg dump prior to the
  addition (with all pgs active+clean), then the osdmap and ceph pg dump
  afterwards? -Sam
  
  - Original Message -
  From: Chad William Seys cws...@physics.wisc.edu
  To: Samuel Just sj...@redhat.com, ceph-users ceph-us...@ceph.com
  Sent: Monday, July 27, 2015 12:57:23 PM
  Subject: Re: [ceph-users] why are there degraded PGs when adding OSDs?
  
  Hi Sam,
  
   The pg might also be degraded right after a map change which changes the
   up/acting sets since the few objects updated right before the map change
   might be new on some replicas and old on the other replicas.  While in
   that
   state, those specific objects are degraded, and the pg would report
   degraded until they are recovered (which would happen asap, prior to
   backfilling the new replica). -Sam
  
  That sounds like only a few PGs should be degraded.  I instead have about
  45% (and higher earlier).
  
  # ceph -s
  cluster 7797e50e-f4b3-42f6-8454-2e2b19fa41d6
   health HEALTH_WARN
  2081 pgs backfill
  6745 pgs degraded
  17 pgs recovering
  6728 pgs recovery_wait
  6745 pgs stuck degraded
  8826 pgs stuck unclean
  recovery 2530124/5557452 objects degraded (45.527%)
  recovery 33594/5557452 objects misplaced (0.604%)
   monmap e5: 3 mons at
  {mon01=128.104.164.197:6789/0,mon02=128.104.164.198:6789/0,mon03=10.128.198.51:6789/0}
  election epoch 16458, quorum 0,1,2 mon03,mon01,mon02
   mdsmap e3032: 1/1/1 up {0=mds01.hep.wisc.edu=up:active}
   osdmap e149761: 27 osds: 27 up, 27 in; 2083 remapped pgs
    pgmap v13464928: 18432 pgs, 9 pools, 5401 GB data, 1364 kobjects
  11122 GB used, 11786 GB / 22908 GB avail
  2530124/5557452 objects degraded (45.527%)
  33594/5557452 objects misplaced (0.604%)
  9606 active+clean
  6726 active+recovery_wait+degraded
  2081 active+remapped+wait_backfill
    17 active+recovering+degraded
     2 active+recovery_wait+degraded+remapped
  recovery io 24861 kB/s, 6 objects/s
  
  Chad.
  
   - Original Message -
   From: Chad William Seys cws...@physics.wisc.edu
   To: ceph-users ceph-us...@ceph.com
   Sent: Monday, July 27, 2015 12:27:26 PM
   Subject: [ceph-users] why are there degraded PGs when adding OSDs?
   
   Hi All,
   
   I recently added some OSDs to the Ceph cluster (0.94.2). I noticed that
   'ceph -s' reported both misplaced AND degraded PGs.
   
   Why should any PGs become degraded?  Seems as though Ceph should only be
   reporting misplaced PGs?
   
   From the Giant release notes:
   Degraded vs misplaced: the Ceph health reports from ‘ceph -s’ and
   related
   commands now make a distinction between data that is degraded (there are
   fewer than the desired number of copies) and data that is misplaced
   (stored
   in the wrong location in the cluster). The distinction is important
   because
   the latter does not compromise data safety.
   
   Does Ceph delete some replicas of the PGs (leading to degradation)
   before
   re- replicating on the new OSD?
   
   This does not seem to be the safest algorithm.
   ___
   ceph-users mailing list
   ceph-users@lists.ceph.com
   http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] kernel version for rbd client and hammer tunables

2015-05-12 Thread Chad William Seys
Hi Ilya and all,
Thanks for explaining.
I'm confused about what building a crushmap means.
After running
#ceph osd crush tunables hammer
data migrated around the cluster, so something changed.
I was expecting that 'straw' would be replaced by 'straw2'.  
(Unfortunately I did not dump the crushmap prior to setting tunables to 
hammer, so I don't know what change did occur.)
So I guess setting tunables to hammer is not building a crushmap.  
Could you give examples?  Would creating a new pool on the cluster now use 
straw2?
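
For reference, one way to see which algorithm each existing bucket uses is to dump
and decompile the crushmap -- a sketch, with placeholder file names:

# ceph osd getcrushmap -o /tmp/crushmap.bin
# crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt
# grep 'alg ' /tmp/crushmap.txt        # each bucket lists "alg straw" or "alg straw2"

Keeping a copy of the decompiled map from before and after a tunables change also
makes it easy to diff what actually changed.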

Thanks!
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] kernel version for rbd client and hammer tunables

2015-05-12 Thread Chad William Seys
 No, pools use crush rulesets.  straw and straw2 are bucket types
 (or algorithms).
 
 As an example, if you do ceph osd crush add-bucket foo rack on
 a cluster with firefly tunables, you will get a new straw bucket.  The
 same after doing ceph osd crush tunables hammer will get you a new
 straw2 bucket, with the rest of your buckets remaining unaffected.
 straw buckets are not going to be replaced with straw2 buckets, that's
 something you as an administrator can make a choice to do.

Ah, I see now that 'alg straw' in my crushmap is in the osdXX groups.
What happens if I run
#ceph osd crush tunables hammer
and then add a new OSD?

Thanks again!
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] kernel version for rbd client and hammer tunables

2015-05-12 Thread Chad William Seys
Hi Ilya and all,
Is it safe to use kernel 3.16.7 rbd with Hammer tunables?  I've tried 
this on a test Hammer cluster and the client seems to work fine.
I've also mounted cephfs on a Hammer cluster (and Hammer tunables) 
using 
kernel 3.16.  It seems to work fine (but not much testing).  I remember 
recently someone asking about mounting cephfs with Hammer tunables and it was 
stated that kernel 4.1 was needed for mounting.
Is the more correct statement that mounting is possible but problems 
will 
exist?
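
For what it's worth, a hedged way to check both sides: on the cluster,
'ceph osd crush show-tunables' prints what the current crushmap requires, and on a
too-old kernel client the mismatch usually shows up in dmesg after a failed map or
mount attempt:

# ceph osd crush show-tunables
# dmesg | grep -i 'feature set mismatch'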

Thanks!
Chad.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] how to display client io in hammer

2015-05-04 Thread Chad William Seys
Hi all,
Looks like in Hammer 'ceph -s' no longer displays client IO and ops.
How does one display that these days?
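
(For reference, 'ceph -s' and 'ceph -w' both still print a "client io ..." line in
Hammer, but only while there is actual client traffic.)  E.g.:

# ceph -s | grep 'client io'
# ceph -w        # streams status updates, including client io/ops, while traffic flows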

Thanks,
C.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to display client io in hammer

2015-05-04 Thread Chad William Seys
Ooops!
Turns out I forgot to mount the ceph rbd, so no client IO displayed!

C.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Kernel version for CephFS client ?

2015-05-04 Thread Chad William Seys
Hi Florent,
  Most likely Debian will release backported kernels for Jessie, as they 
have for Wheezy.
  E.g. Wheezy has had kernel 3.16 backported to it:

https://packages.debian.org/search?suite=wheezy-backports&searchon=names&keywords=linux-image-amd64
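
A minimal sketch of actually pulling the backported kernel in (the mirror URL is
only an example):

# echo 'deb http://http.debian.net/debian wheezy-backports main' >> /etc/apt/sources.list
# apt-get update && apt-get install -t wheezy-backports linux-image-amd64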

C.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph.com documentation suggestions

2015-04-21 Thread Chad William Seys
Hi,
   I've recently seen some confusion over the number of PGs per pool versus 
per cluster on the mailing list.  I also set too many PGs per pool b/c of this 
confusion.

IMO, it is fairly confusing to talk about PGs on the Pool page, but only 
vaguely talk about the number of PGs for the cluster.

Here are some examples of confusing statements with suggested alternatives 
from the online docs:

http://ceph.com/docs/master/rados/operations/pools/

A typical configuration uses approximately 100 placement groups per OSD to 
provide optimal balancing without using up too many computing resources.
-
A typical configuration uses approximately 100 placement groups per OSD for 
all pools in the cluster to provide optimal balancing without using up too 
many computing resources.


http://ceph.com/docs/master/rados/operations/placement-groups/

It is mandatory to choose the value of pg_num because it cannot be calculated 
automatically. Here are a few values commonly used:
-
It is mandatory to choose the value of pg_num.  pg_num depends on the 
planned number of pools in the cluster. It cannot be determined automatically 
on pool creation. Please use this calculator: http://ceph.com/pgcalc/
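
A worked example of the distinction, with made-up numbers:

    target total PGs ~= (100 * num_OSDs) / replica_count
                      = (100 * 30) / 3 = 1000      # shared by ALL pools
    so ten similarly sized pools should get ~100 PGs each, not 1000 each.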

Thanks!
C.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] advantages of multiple pools?

2015-04-17 Thread Chad William Seys
Hi All,
   What are the advantages of having multiple ceph pools (if they use the 
whole cluster)?
   Thanks!

C.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph on Debian Jessie stopped working

2015-04-17 Thread Chad William Seys
Hi Greg,
   Thanks for the reply.  After looking more closely at /etc/ceph/rbdmap I 
discovered it was corrupted.  That was the only problem.

I think the dmesg line
'rbd: no image name provided'
is also a clue to this!
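
For anyone else who hits this, a valid /etc/ceph/rbdmap entry is just pool/image
followed by map options, roughly like the following (names and keyring path are
placeholders):

# cat /etc/ceph/rbdmap
rbd/myimage    id=admin,keyring=/etc/ceph/ceph.client.admin.keyring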

Hope that helps any other newbies!  :)

Thanks again,
Chad.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upgrade from Giant 0.87-1 to Hammer 0.94-1

2015-04-17 Thread Chad William Seys
Now I also know I have too many PGs!  

It is fairly confusing to talk about PGs on the Pool page, but only vaguely 
talk about the number of PGs for the cluster.

Here are some examples of confusing statements with suggested alternatives 
from the online docs:

http://ceph.com/docs/master/rados/operations/pools/

A typical configuration uses approximately 100 placement groups per OSD to 
provide optimal balancing without using up too many computing resources.
-
A typical configuration uses approximately 100 placement groups per OSD for 
all pools to provide optimal balancing without using up too many computing 
resources.


http://ceph.com/docs/master/rados/operations/placement-groups/

It is mandatory to choose the value of pg_num because it cannot be calculated 
automatically. Here are a few values commonly used:
-
It is mandatory to choose the value of pg_num.  Because pg_num depends on the 
planned number of pools in the cluster, it cannot be determined automatically 
on pool creation. Please use this calculator: http://ceph.com/pgcalc/

Thanks!
C.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph on Debian Jessie stopped working

2015-04-15 Thread Chad William Seys
Hi All,
Earlier ceph on Debian Jessie was working.  Jessie is running 3.16.7.

Now when I modprobe rbd, no /dev/rbd devices appear.

# dmesg | grep -e rbd -e ceph
[   15.814423] Key type ceph registered
[   15.814461] libceph: loaded (mon/osd proto 15/24)
[   15.831092] rbd: loaded
[   22.084573] rbd: no image name provided
[   22.230176] rbd: no image name provided


Some files appear under /sys
ls /sys/devices/rbd
power  uevent

ceph-fuse /mnt/cephfs just hangs.

I haven't changed the ceph config, but possibly there were package updates.  I 
did install an earlier Jessie kernel from a machine which is still working and 
rebooted.  No luck.

Any ideas of what to check next?

Thanks,
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] adding a new pool causes old pool warning pool x has too few pgs

2015-03-27 Thread Chad William Seys
Weird:  After a few hours, the health check comes back OK without changing the 
number of PGs for any pool!

 Hi All,
 
   To a Healthy cluster I recently added two pools to ceph, 1 replicated and
   1
 
 ecpool.  Then I made the replicated pool into a cache for the ecpool.
 
   Afterwards ceph health check started complaining about a preexisting pool
 
 having too few pgs.  Previous to adding the new pools there was no warning.
 
   Why does adding new pools cause an old pool to have too few pgs?
 
 Thanks!
 Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] PG stuck unclean for long time

2015-02-05 Thread Chad William Seys
Anyone know what is going on with this PG?

# ceph health detail
HEALTH_WARN 1 pgs stuck unclean; recovery 735/4844641 objects degraded 
(0.015%); 245/1296706 unfound (0.019%)
pg 21.fd is stuck unclean for 349777.229468, current state active, last acting 
[19,5,15,25]
recovery 735/4844641 objects degraded (0.015%); 245/1296706 unfound (0.019%)

# ceph pg 21.fd query
(output attached)

Cluster history: two OSDs on the same host were lost.  The failure domain is host, 
so any PG with replicas > 1 should not have been lost.

The PG is from an erasure coded pool with k=2, m=2.
# ceph osd erasure-code-profile get k2m2
directory=/usr/lib/ceph/erasure-code
k=2
m=2
plugin=jerasure
ruleset-failure-domain=host
technique=reed_sol_van

Thanks for any insight!
Chad.

{ state: active+recovering,
  snap_trimq: [],
  epoch: 135269,
  up: [
19,
5,
15,
25],
  acting: [
19,
5,
15,
25],
  actingbackfill: [
5(1),
15(2),
19(0),
25(3)],
  info: { pgid: 21.fds0,
  last_update: 134698'529,
  last_complete: 0'0,
  log_tail: 0'0,
  last_user_version: 6075,
  last_backfill: MAX,
  purged_snaps: [],
  history: { epoch_created: 130437,
  last_epoch_started: 135269,
  last_epoch_clean: 131847,
  last_epoch_split: 0,
  same_up_since: 135260,
  same_interval_since: 135267,
  same_primary_since: 135267,
  last_scrub: 0'0,
  last_scrub_stamp: 2015-01-22 17:00:57.846599,
  last_deep_scrub: 0'0,
  last_deep_scrub_stamp: 2015-01-22 17:00:57.846599,
  last_clean_scrub_stamp: 0.00},
  stats: { version: 134698'529,
  reported_seq: 2245,
  reported_epoch: 135269,
  state: active,
  last_fresh: 2015-02-05 05:49:18.478995,
  last_change: 2015-02-05 05:49:18.478995,
  last_active: 2015-02-05 05:49:18.478995,
  last_clean: 2015-02-01 08:21:29.833949,
  last_became_active: 0.00,
  last_unstale: 2015-02-05 05:49:18.478995,
  mapping_epoch: 135266,
  log_start: 0'0,
  ondisk_log_start: 0'0,
  created: 130437,
  last_epoch_clean: 131847,
  parent: 0.0,
  parent_split_bits: 0,
  last_scrub: 0'0,
  last_scrub_stamp: 2015-01-22 17:00:57.846599,
  last_deep_scrub: 0'0,
  last_deep_scrub_stamp: 2015-01-22 17:00:57.846599,
  last_clean_scrub_stamp: 0.00,
  log_size: 529,
  ondisk_log_size: 529,
  stats_invalid: 0,
  stat_sum: { num_bytes: 2218786816,
  num_objects: 529,
  num_object_clones: 0,
  num_object_copies: 2116,
  num_objects_missing_on_primary: 245,
  num_objects_degraded: 735,
  num_objects_unfound: 245,
  num_objects_dirty: 529,
  num_whiteouts: 0,
  num_read: 0,
  num_read_kb: 0,
  num_write: 529,
  num_write_kb: 2166784,
  num_scrub_errors: 0,
  num_shallow_scrub_errors: 0,
  num_deep_scrub_errors: 0,
  num_objects_recovered: 0,
  num_bytes_recovered: 0,
  num_keys_recovered: 0,
  num_objects_omap: 0,
  num_objects_hit_set_archive: 0},
  stat_cat_sum: {},
  up: [
19,
5,
15,
25],
  acting: [
19,
5,
15,
25],
  up_primary: 19,
  acting_primary: 19},
  empty: 0,
  dne: 0,
  incomplete: 0,
  last_epoch_started: 135269,
  hit_set_history: { current_last_update: 0'0,
  current_last_stamp: 0.00,
  current_info: { begin: 0.00,
  end: 0.00,
  version: 0'0},
  history: []}},
  peer_info: [
{ peer: 5(1),
  pgid: 21.fds1,
  last_update: 134698'529,
  last_complete: 0'0,
  log_tail: 0'0,
  last_user_version: 6075,
  last_backfill: MAX,
  purged_snaps: [],
  history: { epoch_created: 130437,
  last_epoch_started: 135269,
  last_epoch_clean: 131847,
  last_epoch_split: 0,
  same_up_since: 135260,
  same_interval_since: 135267,
  same_primary_since: 135267,
  last_scrub: 0'0,
  last_scrub_stamp: 2015-01-22 17:00:57.846599,
  last_deep_scrub: 0'0,
  last_deep_scrub_stamp: 2015-01-22 17:00:57.846599,
  last_clean_scrub_stamp: 0.00},
  stats: { version: 134698'529,
  reported_seq: 1685,
  reported_epoch: 135255,
  state: peering,
  last_fresh: 2015-02-04 16:43:12.517163,
  last_change: 2015-02-04 

[ceph-users] PG to pool mapping?

2015-02-04 Thread Chad William Seys
Hi all,
   How do I determine which pool a PG belongs to?
   (Also, is it the case that all objects in a PG belong to one pool?)
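
(For the record: the integer before the dot in a PG id is the pool id, and yes, a
PG belongs to exactly one pool.)  A quick way to read the mapping off, with an
example pg id:

# ceph osd lspools          # prints "<id> <poolname>" pairs
# ceph pg map 21.fd         # pg 21.fd lives in pool 21; also shows its up/acting OSDs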

Thanks!
C.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] verifying tiered pool functioning

2015-01-27 Thread Chad William Seys
Hi Zhang,
  Thanks for the pointer.  That page looks like the commands to set up the 
cache, not how to verify that it is working.
  I think I have been able to see objects (not PGs I guess) moving from the 
cache pool to the storage pool using 'rados df' .  (I haven't run long enough 
to verify yet.)
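
A crude way to watch it, spelled out (pool names are placeholders):

# watch -n 5 'rados df | egrep "cachepool|ecpool"'    # per-pool object/byte counts over time
# ceph df detail                                      # per-pool usage; newer releases also show dirty objects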

Thanks again!
Chad.



On Tuesday, January 27, 2015 03:47:53 you wrote:
 Do you mean cache tiering?
 You can refer to http://ceph.com/docs/master/rados/operations/cache-tiering/
 for detail command line. PGs won't migrate from pool to pool.
 
 
 -Original Message-
 From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
 Chad William Seys Sent: Thursday, January 22, 2015 5:40 AM
 To: ceph-users@lists.ceph.com
 Subject: [ceph-users] verifying tiered pool functioning
 
 Hello,
   Could anyone provide a howto verify that a tiered pool is working
 correctly? E.g.
   Command to watch as PG migrate from one pool to another?  (Or determine
 which pool a PG is currently in.) Command to see how much data is in each
 pool (global view of number of PGs I guess)?
 
 Thanks!
 Chad.
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] cache pool and storage pool: possible to remove storage pool?

2015-01-27 Thread Chad William Seys
Hi all,
   Documentation explains how to remove the cache pool:
http://ceph.com/docs/master/rados/operations/cache-tiering/
   Anyone know how to remove the storage pool instead?  (E.g. the storage pool 
has wrong parameters.)
   I was hoping to push all the objects into the cache pool and then replace 
the storage pool.

Thanks!
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] erasure coded pool why ever k>1?

2015-01-22 Thread Chad William Seys
Hi Loic,
 The size of each chunk is object size / K. If you have K=1 and M=2 it will
 be the same as 3 replicas with none of the advantages ;-)

Interesting!  I did not see this explained so explicitly.

So is the general explanation of k and m something like:
k, m: fault tolerance of m+1 replicas, space of 1/k*(m+k) replicas,  plus 
slowness
?

So one should never bother with k=1 b/c:
k=1, m:  fault tolerance of m+1, space of m+1 replicas, plus slowness.
(therefore, just use m+1 replicas!)

but
k=2, m=1:
might be useful instead of 2 replicas b/c it has fault tolerance of 2 
replicas, space of 1/2*(1+2) = 3/2 = 1.5 replicas, plus slowness.

And
k=2, m=2:
which should be as tolerant as 3 replicas,  but take up as much space as 
(1/2)*(2+2)=2 replicas (right?).
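
A hedged sketch of setting up the k=2,m=2 case, with placeholder names and pg
counts; the raw-space multiple is (k+m)/k = (2+2)/2 = 2, and it tolerates losing
m = 2 hosts:

# ceph osd erasure-code-profile set k2m2 k=2 m=2 ruleset-failure-domain=host
# ceph osd pool create ecpool 128 128 erasure k2m2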

Thanks again!
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] how to remove storage tier

2015-01-22 Thread Chad William Seys
Hi all,
  I've got a tiered pool arrangement with a replicated pool and an erasure 
pool.  I set it up such that the replicated pool is in front of the erasure 
coded pool.  I now want to change the properties of the erasure coded pool.
  Is there a way of switching which erasure profile is used in the ec 
pool?  (It looks possible to change the properties of the erasure profile using 
the --force option, but that is noted to be EXTREMELY DANGEROUS.)
  Possibly safer would be to push all the objects from the ec pool into the 
replicated pool, de-tier the pools, delete the ec pool, create a new ec pool 
with new properties, then re-tier the pools.
  Unfortunately, the documentation I find talks about how to drain the cache 
pool (replicated pool) rather than the other way around.
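
For reference, the documented direction -- draining and detaching the cache pool --
looks roughly like this (pool names are placeholders; what I'm after is the
reverse, so treat it only as a sketch):

# ceph osd tier cache-mode cachepool forward     # stop admitting new objects to the cache
# rados -p cachepool cache-flush-evict-all       # flush/evict everything down to the ec pool
# ceph osd tier remove-overlay ecpool
# ceph osd tier remove ecpool cachepool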

Any ideas?
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] erasure coded pool why ever k>1?

2015-01-21 Thread Chad William Seys
Hello all,
  What reasons would one want k>1?
  I read that m determines the number of OSDs which can fail before data loss.  But 
I don't see it explained how to choose k.  Any benefits to choosing k>1?

Thanks!
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] verifying tiered pool functioning

2015-01-21 Thread Chad William Seys
Hello,
  Could anyone provide a howto for verifying that a tiered pool is working correctly?
E.g.
  Command to watch as PGs migrate from one pool to another?  (Or determine 
which pool a PG is currently in.)
  Command to see how much data is in each pool (global view of number of PGs I 
guess)?

Thanks!
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph PG Incomplete = Cluster unusable

2014-12-29 Thread Chad William Seys
Hi Christian,
   I had a similar problem about a month ago.
   After trying lots of helpful suggestions, I found none of it worked and
I could only delete the affected pools and start over.

  I opened a feature request in the tracker:
http://tracker.ceph.com/issues/10098

  If you find a way, let us know!

Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem

2014-11-06 Thread Chad William Seys
Hi Sam,

 Sounds like you needed osd 20.  You can mark osd 20 lost.
 -Sam

Does not work:

# ceph osd lost 20 --yes-i-really-mean-it   

osd.20 is not down or doesn't exist


Also, here is an interesting post from October which I will follow:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-October/044059.html


Hello, all. I got some advice from the IRC channel (thanks bloodice!) that I 
temporarily reduce the min_size of my cluster (size = 2) from 2 down to 1. 
That immediately caused all of my incomplete PGs to start recovering and 
everything seemed to come back OK. I was serving out an RBD from here and 
xfs_repair reported no problems. So... happy ending?

What started this all was that I was altering my CRUSH map causing significant 
rebalancing on my cluster which had size = 2. During this process I lost an 
OSD (osd.10) and eventually ended up with incomplete PGs. Knowing that I only 
lost 1 osd I was pretty sure that I hadn't lost any data I just couldn't get 
the PGs to recover without changing the min_size.


It is good that this worked for him, but it also seems like a bug that it 
worked!  (I.e. ceph should have been able to recover on its own without weird 
workarounds.)
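
For anyone searching the archives later, the knob in question (pool name is a
placeholder, and it should be set back once recovery finishes):

# ceph osd pool set mypool min_size 1     # temporarily let PGs go active with a single copy
# ceph osd pool set mypool min_size 2     # restore the safer setting afterwards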

I'll let you know if this works for me!

Thanks,
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] create multiple OSDs without changing CRUSH until one last step

2014-04-10 Thread Chad William Seys
Hi Greg,
  Looks promising...

  I added

[global]
...
mon osd auto mark new in = false

then pushed config to monitor
ceph-deploy  --overwrite-conf config push mon01

then restart monitor
/etc/init.d/ceph restart mon

then tried
ceph-deploy --overwrite-conf disk prepare --zap-disk osd02:sde /dev/null

but it still got added to the osd tree as 'up' and with a weight, which caused data 
redistribution.


Did I miss something?  Does ceph think this OSD is not new for some reason?  
(I have had OSDs with the same number before due to removes/adds...?)
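
One thing worth checking is whether the running monitor actually picked the option
up, e.g. via its admin socket (the socket path and mon id below are just the usual
defaults):

# ceph --admin-daemon /var/run/ceph/ceph-mon.mon01.asok config get mon_osd_auto_mark_new_in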

Thanks!
Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] fuse or kernel to mount rbd?

2014-04-05 Thread Chad William Seys
 Not to 3.2.  I would recommend running a more recent ubuntu kernel (which
 I *think* they support on 12.04 still) like 3.8 or 3.11.  Those kernels
 should be pretty stable provided the ubuntu kernel guys are keeping up
 with the mainline stable kernels at kernel.org (they generally do).

Thanks!
 How stable would a testing Debian kernel be? (3.13?)

Chad.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com