Re: [ceph-users] Bug in OSD Maps

2017-05-25 Thread Gregory Farnum
On Thu, May 25, 2017 at 8:39 AM Stuart Harland <
s.harl...@livelinktechnology.net> wrote:

> Has no-one any idea about this? If needed I can produce more information
> or diagnostics on request. I find it hard to believe that we are the only
> people experiencing this, and thus far we have lost about 40 OSDs to
> corruption due to this.
>
> Regards
>
> Stuart Harland
>
>
>
> On 24 May 2017, at 10:32, Stuart Harland 
> wrote:
>
> Hello
>
> I think I’m running into a bug that is described at
> http://tracker.ceph.com/issues/14213 for Hammer.
>
> However I’m running the latest version of Jewel 10.2.7, although I’m in
> the middle of upgrading the cluster (from 10.2.5). At first it was on a
> couple of nodes, but now it seems to be more pervasive.
>
> I have seen this issue with osd_map_cache_size set to 20 as well as 500,
> which I increased to try and compensate for it.
>
> My two questions are:
>
> 1) Is this fixed, and if so, in which version?
>
The only person who's reported this at all recently was working on a
FreeBSD port. Other than the one tracker bug you found, errors like this
are usually the result of failing disks, buggy local filesystems, or
incorrect configuration (like turning off barriers).
I assume you didn't just upgrade from a pre-Jewel release that might have
been susceptible to that tracker bug.



> 2) Is there a way to recover the damaged OSD metadata? I really don't
> want to keep having to rebuild large numbers of disks based on something
> arbitrary.
>
>
I saw somewhere (check the list archives?) that you may be able to get
around it by removing just the PG which is causing the crash, assuming it
has replicas elsewhere.
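A rough sketch of that approach with ceph-objectstore-tool, purely for
illustration (the paths and pgid below are taken from your log; check
"ceph-objectstore-tool --help" on your version for the exact flags, stop the
OSD first, and only remove a PG copy that is known to be healthy elsewhere):

  # with the OSD stopped, take a safety copy of the PG first
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/txc1-1908 \
      --journal-path /var/lib/ceph/osd/txc1-1908/journal \
      --pgid 11.3f5a --op export --file /root/pg.11.3f5a.export

  # then remove the local copy of that PG so the OSD can start again
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/txc1-1908 \
      --journal-path /var/lib/ceph/osd/txc1-1908/journal \
      --pgid 11.3f5a --op remove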

But more generally you want to figure out how this is happening. Either
you've got disk state which was previously broken and undetected (which, if
you've been running 10.2.5 on all your OSDs, I don't think is possible), or
you've experienced recent failures which are unlikely to be Ceph software bugs.
(They might be! But you'd be the only one to report them anywhere I can see.)
-Greg


>
>
> SEEK_HOLE is disabled via 'filestore seek data hole' config option
>    -31> 2017-05-24 10:23:10.152349 7f24035e2800  0 genericfilestorebackend(/var/lib/ceph/osd/txc1-1908) detect_features: splice is supported
>    -30> 2017-05-24 10:23:10.182065 7f24035e2800  0 genericfilestorebackend(/var/lib/ceph/osd/txc1-1908) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
>    -29> 2017-05-24 10:23:10.182112 7f24035e2800  0 xfsfilestorebackend(/var/lib/ceph/osd/txc1-1908) detect_feature: extsize is disabled by conf
>    -28> 2017-05-24 10:23:10.182839 7f24035e2800  1 leveldb: Recovering log #23079
>    -27> 2017-05-24 10:23:10.284173 7f24035e2800  1 leveldb: Delete type=0 #23079
>    -26> 2017-05-24 10:23:10.284223 7f24035e2800  1 leveldb: Delete type=3 #23078
>    -25> 2017-05-24 10:23:10.284807 7f24035e2800  0 filestore(/var/lib/ceph/osd/txc1-1908) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
>    -24> 2017-05-24 10:23:10.285581 7f24035e2800  2 journal open /var/lib/ceph/osd/txc1-1908/journal fsid 8dada68b-0d1c-4f2a-bc96-1d861577bc98 fs_op_seq 20363902
>    -23> 2017-05-24 10:23:10.289523 7f24035e2800  1 journal _open /var/lib/ceph/osd/txc1-1908/journal fd 18: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
>    -22> 2017-05-24 10:23:10.293733 7f24035e2800  2 journal open advancing committed_seq 20363681 to fs op_seq 20363902
>    -21> 2017-05-24 10:23:10.293743 7f24035e2800  2 journal read_entry -- not readable
>    -20> 2017-05-24 10:23:10.293744 7f24035e2800  2 journal read_entry -- not readable
>    -19> 2017-05-24 10:23:10.293745 7f24035e2800  3 journal journal_replay: end of journal, done.
>    -18> 2017-05-24 10:23:10.297605 7f24035e2800  1 journal _open /var/lib/ceph/osd/txc1-1908/journal fd 18: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
>    -17> 2017-05-24 10:23:10.298470 7f24035e2800  1 filestore(/var/lib/ceph/osd/txc1-1908) upgrade
>    -16> 2017-05-24 10:23:10.298509 7f24035e2800  2 osd.1908 0 boot
>    -15> 2017-05-24 10:23:10.300096 7f24035e2800  1 cls/replica_log/cls_replica_log.cc:141: Loaded replica log class!
>    -14> 2017-05-24 10:23:10.300384 7f24035e2800  1 cls/user/cls_user.cc:375: Loaded user class!
>    -13> 2017-05-24 10:23:10.300617 7f24035e2800  0 cls/hello/cls_hello.cc:305: loading cls_hello
>    -12> 2017-05-24 10:23:10.303748 7f24035e2800  1 cls/refcount/cls_refcount.cc:232: Loaded refcount class!
>    -11> 2017-05-24 10:23:10.304120 7f24035e2800  1 cls/version/cls_version.cc:228: Loaded version class!
>    -10> 2017-05-24 10:23:10.304439 7f24035e2800  1 cls/log/cls_log.cc:317: Loaded log class!
>     -9> 2017-05-24 10:23:10.307437 7f24035e2800  1 cls/rgw/cls_rgw.cc:3359: Loaded rgw class!
>     -8> 2017-05-24 10:23:10.307768 7f24035e2800  1
>

[ceph-users] Multi-Tenancy: Network Isolation

2017-05-25 Thread Deepak Naidu
I am trying to understand how multi-tenancy can be (or has been) solved for
network interfaces or isolation. I can run Ceph under a virtualized environment
and achieve the isolation, but my question is more about physical Ceph
deployments.

Is there a way we can have multiple networks (public interfaces) dedicated to
tenants, so that network isolation is guaranteed (as they will be on different
subnets) in Ceph? I am purely looking for network isolation on physical hardware
with a single Ceph cluster.
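
(For context, the network knobs that exist in ceph.conf today are cluster-wide
rather than per-tenant. A minimal sketch, with made-up subnets purely for
illustration; "public network" accepts a comma-delimited list of subnets, but
that only controls where the daemons bind and listen:

  [global]
      # client-facing traffic; more than one subnet may be listed
      public network  = 10.10.1.0/24, 10.10.2.0/24
      # OSD replication/heartbeat traffic on its own subnet
      cluster network = 192.168.100.0/24

Per-tenant separation beyond that would presumably have to be enforced at the
network layer, outside Ceph.)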

--
Deepak

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upper limit of MONs and MDSs in a Cluster

2017-05-25 Thread Gregory Farnum
You absolutely cannot do this with your monitors -- as David says every
node would have to participate in every monitor decision; the long tails
would be horrifying and I expect it would collapse in ignominious defeat
very quickly.

Your MDSes should be fine since they are indeed just a bunch of standby
daemons at that point. You'd want to consider how that fits with your RAM
requirements though; it's probably not a good deployment decision even
though it would work at the daemon level.
-Greg


On Thu, May 25, 2017 at 8:30 AM David Turner  wrote:

> For the MDS, the primary doesn't hold state data that needs to be replayed
> to a standby.  The information exists in the cluster.  Your setup would be
> 1 Active, 100 Standby.  If the active went down, one of the standbys would
> be promoted and read the information from the cluster.
>
> With Mons, it's interesting because of the quorum mechanics.  4 mons is
> worse than 3 mons because of the chance for split brain where 2 of them
> think something is right and the other 2 think it's wrong.  You have no tie
> breaking vote.  Odd numbers are always best and it seems like your proposal
> would regularly have an even number of Mons.  I haven't heard of a
> deployment with more than 5 mons.  I would imagine there are some with 7
> mons out there, but it's not worth the hardware expense in 99.999% of cases.
>
> I'm assuming your question comes from a place of wanting to have 1
> configuration to rule them all and not have multiple types of nodes in your
> ceph deployment scripts.  Just put in the time and do it right.  Have MDS
> servers, have Mons, have OSD nodes, etc.  Once you reach scale, your mons
> are going to need their resources, your OSDs are going to need theirs, your
> RGW will be using more bandwidth, ad infinitum.  That isn't to mention all
> of the RAM that the services will need during any recovery (assume 3x
> memory requirements for most Ceph services when recovering).
>
> Hyper converged clusters are not recommended for production deployments.
> Several people use them, but generally for smaller clusters.  By the time
> you reach dozens and hundreds of servers, you will only cause yourself
> headaches by becoming the special snowflake in the community.  Every time
> you have a problem, the first place to look will be your resource
> contention between Ceph daemons.
>
>
> Back to some of your direct questions.  Not having tested this, but using
> educated guesses... A possible complication of having 100's of Mons would
> be that they all have to agree on a new map causing a LOT more
> communication between your mons which could likely lead to a bottleneck for
> map updates (snapshot creation/deletion, osds going up/down, scrubs
> happening, anything that affects data in a map).  When an MDS fails, I
> don't know how the voting would go for choosing a new Active MDS among 100
> standbys.  That could either go very quickly or take quite a bit longer
> depending on the logic behind the choice.  100's of RGW servers behind an
> LB (I'm assuming) would negate any caching that is happening on the RGW
> servers as multiple accesses to the same file will not likely reach the
> same RGW.
>
> On Thu, May 25, 2017 at 10:40 AM Wes Dillingham <
> wes_dilling...@harvard.edu> wrote:
>
>> How much testing has there been / what are the implications of having a
>> large number of Monitor and Metadata daemons running in a cluster?
>>
>> Thus far I  have deployed all of our Ceph clusters as a single service
>> type per physical machine but I am interested in a use case where we deploy
>> dozens/hundreds? of boxes each of which would be a mon,mds,mgr,osd,rgw all
>> in one and all a single cluster. I do realize it is somewhat trivial (with
>> config mgmt and all) to dedicate a couple of lean boxes as MDS's and MONs
>> and only expand at the OSD level but I'm still curious.
>>
>> My use case in mind is for backup targets where pools span the entire
>> cluster and am looking to streamline the process for possible rack and
>> stack situations where boxes can just be added in place booted up and they
>> auto-join the cluster as a mon/mds/mgr/osd/rgw.
>>
>> So does anyone run clusters with dozens of MONs and/or MDSs, or is anyone aware of
>> any testing with very high numbers of each? At the MDS level I would just
>> be looking for 1 Active, 1 Standby-replay and X standby until multiple
>> active MDSs are production ready. Thanks!
>>
>> --
>> Respectfully,
>>
>> Wes Dillingham
>> wes_dilling...@harvard.edu
>> Research Computing | Infrastructure Engineer
>> Harvard University | 38 Oxford Street, Cambridge, Ma 02138 | Room 102
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

Re: [ceph-users] Bug in OSD Maps

2017-05-25 Thread Stuart Harland
Has no-one any idea about this? If needed I can produce more information or 
diagnostics on request. I find it hard to believe that we are the only people 
experiencing this, and thus far we have lost about 40 OSDs to corruption due to 
this.

Regards 

Stuart Harland


> On 24 May 2017, at 10:32, Stuart Harland  
> wrote:
> 
> Hello
> 
> I think I’m running into a bug that is described at 
> http://tracker.ceph.com/issues/14213  
> for Hammer.
> 
> However I’m running the latest version of Jewel 10.2.7, although I’m in the 
> middle of upgrading the cluster (from 10.2.5). At first it was on a couple of 
> nodes, but now it seems to be more pervasive.
> 
> I have seen this issue with osd_map_cache_size set to 20 as well as 500, 
> which I increased to try and compensate for it.
> 
> My two questions are:
> 
> 1) Is this fixed, and if so, in which version?
> 2) Is there a way to recover the damaged OSD metadata? I really don't want
> to keep having to rebuild large numbers of disks based on something arbitrary.
> 
> 
> SEEK_HOLE is disabled via 'filestore seek data hole' config option
>    -31> 2017-05-24 10:23:10.152349 7f24035e2800  0 genericfilestorebackend(/var/lib/ceph/osd/txc1-1908) detect_features: splice is supported
>    -30> 2017-05-24 10:23:10.182065 7f24035e2800  0 genericfilestorebackend(/var/lib/ceph/osd/txc1-1908) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
>    -29> 2017-05-24 10:23:10.182112 7f24035e2800  0 xfsfilestorebackend(/var/lib/ceph/osd/txc1-1908) detect_feature: extsize is disabled by conf
>    -28> 2017-05-24 10:23:10.182839 7f24035e2800  1 leveldb: Recovering log #23079
>    -27> 2017-05-24 10:23:10.284173 7f24035e2800  1 leveldb: Delete type=0 #23079
>    -26> 2017-05-24 10:23:10.284223 7f24035e2800  1 leveldb: Delete type=3 #23078
>    -25> 2017-05-24 10:23:10.284807 7f24035e2800  0 filestore(/var/lib/ceph/osd/txc1-1908) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
>    -24> 2017-05-24 10:23:10.285581 7f24035e2800  2 journal open /var/lib/ceph/osd/txc1-1908/journal fsid 8dada68b-0d1c-4f2a-bc96-1d861577bc98 fs_op_seq 20363902
>    -23> 2017-05-24 10:23:10.289523 7f24035e2800  1 journal _open /var/lib/ceph/osd/txc1-1908/journal fd 18: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
>    -22> 2017-05-24 10:23:10.293733 7f24035e2800  2 journal open advancing committed_seq 20363681 to fs op_seq 20363902
>    -21> 2017-05-24 10:23:10.293743 7f24035e2800  2 journal read_entry -- not readable
>    -20> 2017-05-24 10:23:10.293744 7f24035e2800  2 journal read_entry -- not readable
>    -19> 2017-05-24 10:23:10.293745 7f24035e2800  3 journal journal_replay: end of journal, done.
>    -18> 2017-05-24 10:23:10.297605 7f24035e2800  1 journal _open /var/lib/ceph/osd/txc1-1908/journal fd 18: 5367660544 bytes, block size 4096 bytes, directio = 1, aio = 1
>    -17> 2017-05-24 10:23:10.298470 7f24035e2800  1 filestore(/var/lib/ceph/osd/txc1-1908) upgrade
>    -16> 2017-05-24 10:23:10.298509 7f24035e2800  2 osd.1908 0 boot
>    -15> 2017-05-24 10:23:10.300096 7f24035e2800  1 cls/replica_log/cls_replica_log.cc:141: Loaded replica log class!
>    -14> 2017-05-24 10:23:10.300384 7f24035e2800  1 cls/user/cls_user.cc:375: Loaded user class!
>    -13> 2017-05-24 10:23:10.300617 7f24035e2800  0 cls/hello/cls_hello.cc:305: loading cls_hello
>    -12> 2017-05-24 10:23:10.303748 7f24035e2800  1 cls/refcount/cls_refcount.cc:232: Loaded refcount class!
>    -11> 2017-05-24 10:23:10.304120 7f24035e2800  1 cls/version/cls_version.cc:228: Loaded version class!
>    -10> 2017-05-24 10:23:10.304439 7f24035e2800  1 cls/log/cls_log.cc:317: Loaded log class!
>     -9> 2017-05-24 10:23:10.307437 7f24035e2800  1 cls/rgw/cls_rgw.cc:3359: Loaded rgw class!
>     -8> 2017-05-24 10:23:10.307768 7f24035e2800  1 cls/timeindex/cls_timeindex.cc:259: Loaded timeindex class!
>     -7> 2017-05-24 10:23:10.307927 7f24035e2800  0 cls/cephfs/cls_cephfs.cc:202: loading cephfs_size_scan
>     -6> 2017-05-24 10:23:10.308086 7f24035e2800  1 cls/statelog/cls_statelog.cc:306: Loaded log class!
>     -5> 2017-05-24 10:23:10.315241 7f24035e2800  0 osd.1908 863035 crush map has features 2234490552320, adjusting msgr requires for clients
>     -4> 2017-05-24 10:23:10.315258 7f24035e2800  0 osd.1908 863035 crush map has features 2234490552320 was 8705, adjusting msgr requires for mons
>     -3> 2017-05-24 10:23:10.315267 7f24035e2800  0 osd.1908 863035 crush map has features 2234490552320, adjusting msgr requires for osds
>     -2> 2017-05-24 10:23:10.441444 7f24035e2800  0 osd.1908 863035 load_pgs
>     -1> 2017-05-24 10:23:10.442608 7f24035e2800 -1 osd.1908 863035 load_pgs: have pgid 11.3f5a at epoch 863078, but missing map.  Crashing.
>      0> 2017-05-24 

Re: [ceph-users] cephfs file size limit 0f 1.1TB?

2017-05-25 Thread Jake Grimmett
Hi John,

Sorry, I'm not sure what the largest file is on our systems.

We have lots of data sets that are ~8TB uncompressed; these typically
compress 3:1. Thus if a user wants a single file, we hit ~3TB.

I'm rsyncing 360TB of data from an Isilon to cephfs; it'll be
interesting to see how cephfs copes with 400 million files...

thanks again for your help,

Jake

On 24/05/17 20:30, John Spray wrote:
> On Wed, May 24, 2017 at 8:17 PM, Jake Grimmett  wrote:
>> Hi John,
>> That's great, thank you so much for the advice.
>> Some of our users have massive files so this would have been a big block.
>>
>> Is there any particular reason for having a file size limit?
> 
> Without the size limit, a user can create a file of arbitrary size
> (without necessarily writing any data to it), such that when the MDS
> came to e.g. delete it, it would have to do a ridiculously large
> number of operations to check if any of the objects within the range
> that could exist (according to the file size) really existed.
> 
> The idea is that we don't want to prevent users creating files big
> enough to hold their data, but we don't want to let them just tell the
> system "oh hey this file that I never wrote anything to is totally an
> exabyte in size, have fun enumerating the objects when you try to
> delete it lol".
> 
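(A rough back-of-the-envelope of that cost, assuming the default 4 MB CephFS
object size: a file claiming to be 1 EiB would span

  $ echo $(( (1 << 60) / (4 * 1024 * 1024) ))
  274877906944

objects, i.e. roughly 275 billion object existence checks on delete.)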
> 1TB is a bit conservative these days -- that limit was probably set
> circa 10 years ago and maybe we should revisit it.  As a datapoint,
> what's your largest file?
> 
>> Would setting
>> max_file_size to 0 remove all limits?
> 
> Nope, it would limit you to only creating empty files :-)
> 
> It's a 64 bit field, so you can set it to something huge if you like.
> 
> John
> 
>>
>> Thanks again,
>>
>> Jake
>>
>> On 24 May 2017 19:45:52 BST, John Spray  wrote:
>>>
>>> On Wed, May 24, 2017 at 7:41 PM, Brady Deetz  wrote:

  Are there any repercussions to configuring this on an existing large fs?
>>>
>>>
>>> No.  It's just a limit that's enforced at the point of appending to
>>> files or setting their size, it doesn't affect how anything is stored.
>>>
>>> John
>>>
  On Wed, May 24, 2017 at 1:36 PM, John Spray  wrote:
>
>
>  On Wed, May 24, 2017 at 7:19 PM, Jake Grimmett 
>  wrote:
>>
>>  Dear All,
>>
>>  I've been testing out cephfs, and bumped into what appears to be an
>>  upper
>>  file size limit of ~1.1TB
>>
>>  e.g:
>>
>>  [root@cephfs1 ~]# time rsync --progress -av /ssd/isilon_melis.tar
>>  /ceph/isilon_melis.tar
>>  sending incremental file list
>>  isilon_melis.tar
>>  1099341824000  54%  237.51MB/s1:02:05
>>  rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]:
>>  Broken pipe (32)
>>  rsync: write failed on "/ceph/isilon_melis.tar": File too large (27)
>>  rsync error: error in file IO (code 11) at receiver.c(322)
>>  [receiver=3.0.9]
>>  rsync: connection unexpectedly closed (28 bytes received so far)
>>  [sender]
>>  rsync error: error in rsync protocol data stream (code 12) at
>> io.c(605)
>>  [sender=3.0.9]
>>
>>  Firstly, is this expected?
>
>
>  CephFS has a configurable maximum file size; it's 1TB by default.
>
>  Change it with:
>ceph fs set <fs name> max_file_size <size in bytes>
>
>  John
>
>
>
>
>
>>
>>  If not, then does anyone have any suggestions on where to start
>> digging?
>>
>>  I'm using erasure encoding (4+1, 50 x 8TB drives over 5 servers), with
>>  an
>>  nvme hot pool of 4 drives (2 x replication).
>>
>>  I've tried both Kraken (release), and the latest Luminous Dev.
>>
>>  many thanks,
>>
>>  Jake
>>  --
>>
>> 
>>
>>  ceph-users mailing list
>>  ceph-users@lists.ceph.com
>>  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> 
>
>  ceph-users mailing list
>  ceph-users@lists.ceph.com
>  http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>
>> --
>> Sent from my Android device with K-9 Mail. Please excuse my brevity.


-- 
Dr Jake Grimmett
Head Of Scientific Computing
MRC Laboratory of Molecular Biology
Francis Crick Avenue,
Cambridge CB2 0QH, UK.
Phone 01223 267019
Mobile 0776 9886539
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Upper limit of MONs and MDSs in a Cluster

2017-05-25 Thread David Turner
For the MDS, the primary doesn't hold state data that needs to be replayed
to a standby.  The information exists in the cluster.  Your setup would be
1 Active, 100 Standby.  If the active went down, one of the standbys would
be promoted and read the information from the cluster.

With Mons, it's interesting because of the quorum mechanics.  4 mons is
worse than 3 mons because of the chance for split brain where 2 of them
think something is right and the other 2 think it's wrong.  You have no tie
breaking vote.  Odd numbers are always best and it seems like your proposal
would regularly have an even number of Mons.  I haven't heard of a
deployment with more than 5 mons.  I would imagine there are some with 7
mons out there, but it's not worth the hardware expense in 99.999% of cases.
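
To put numbers on "odd numbers are always best": quorum needs a strict
majority, so a fourth mon adds no extra failure tolerance. A quick
illustration (plain shell arithmetic, nothing Ceph-specific):

  $ for n in 3 4 5 6 7; do echo "$n mons: quorum $((n/2 + 1)), tolerates $((n - n/2 - 1)) down"; done
  3 mons: quorum 2, tolerates 1 down
  4 mons: quorum 3, tolerates 1 down
  5 mons: quorum 3, tolerates 2 down
  6 mons: quorum 4, tolerates 2 down
  7 mons: quorum 4, tolerates 3 down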

I'm assuming your question comes from a place of wanting to have 1
configuration to rule them all and not have multiple types of nodes in your
ceph deployment scripts.  Just put in the time and do it right.  Have MDS
servers, have Mons, have OSD nodes, etc.  Once you reach scale, your mons
are going to need their resources, your OSDs are going to need theirs, your
RGW will be using more bandwidth, ad infinitum.  That isn't to mention all
of the RAM that the services will need during any recovery (assume 3x
memory requirements for most Ceph services when recovering).

Hyper converged clusters are not recommended for production deployments.
Several people use them, but generally for smaller clusters.  By the time
you reach dozens and hundreds of servers, you will only cause yourself
headaches by becoming the special snowflake in the community.  Every time
you have a problem, the first place to look will be your resource
contention between Ceph daemons.


Back to some of your direct questions.  Not having tested this, but using
educated guesses... A possible complication of having 100's of Mons would
be that they all have to agree on a new map causing a LOT more
communication between your mons which could likely lead to a bottleneck for
map updates (snapshot creation/deletion, osds going up/down, scrubs
happening, anything that affects data in a map).  When an MDS fails, I
don't know how the voting would go for choosing a new Active MDS among 100
standbys.  That could either go very quickly or take quite a bit longer
depending on the logic behind the choice.  100's of RGW servers behind an
LB (I'm assuming) would negate any caching that is happening on the RGW
servers as multiple accesses to the same file will not likely reach the
same RGW.

On Thu, May 25, 2017 at 10:40 AM Wes Dillingham 
wrote:

> How much testing has there been / what are the implications of having a
> large number of Monitor and Metadata daemons running in a cluster?
>
> Thus far I  have deployed all of our Ceph clusters as a single service
> type per physical machine but I am interested in a use case where we deploy
> dozens/hundreds? of boxes each of which would be a mon,mds,mgr,osd,rgw all
> in one and all a single cluster. I do realize it is somewhat trivial (with
> config mgmt and all) to dedicate a couple of lean boxes as MDS's and MONs
> and only expand at the OSD level but I'm still curious.
>
> My use case in mind is for backup targets where pools span the entire
> cluster and am looking to streamline the process for possible rack and
> stack situations where boxes can just be added in place booted up and they
> auto-join the cluster as a mon/mds/mgr/osd/rgw.
>
> So does anyone run clusters with dozens of MONs and/or MDSs, or is anyone aware of
> any testing with very high numbers of each? At the MDS level I would just
> be looking for 1 Active, 1 Standby-replay and X standby until multiple
> active MDSs are production ready. Thanks!
>
> --
> Respectfully,
>
> Wes Dillingham
> wes_dilling...@harvard.edu
> Research Computing | Infrastructure Engineer
> Harvard University | 38 Oxford Street, Cambridge, Ma 02138 | Room 102
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Upper limit of MONs and MDSs in a Cluster

2017-05-25 Thread Wes Dillingham
How much testing has there been / what are the implications of having a
large number of Monitor and Metadata daemons running in a cluster?

Thus far I  have deployed all of our Ceph clusters as a single service type
per physical machine but I am interested in a use case where we deploy
dozens/hundreds? of boxes each of which would be a mon,mds,mgr,osd,rgw all
in one and all a single cluster. I do realize it is somewhat trivial (with
config mgmt and all) to dedicate a couple of lean boxes as MDS's and MONs
and only expand at the OSD level but I'm still curious.

My use case in mind is for backup targets where pools span the entire
cluster and am looking to streamline the process for possible rack and
stack situations where boxes can just be added in place booted up and they
auto-join the cluster as a mon/mds/mgr/osd/rgw.

So does anyone run clusters with dozens of MONs and/or MDSs, or is anyone aware of
any testing with very high numbers of each? At the MDS level I would just
be looking for 1 Active, 1 Standby-replay and X standby until multiple
active MDSs are production ready. Thanks!

-- 
Respectfully,

Wes Dillingham
wes_dilling...@harvard.edu
Research Computing | Infrastructure Engineer
Harvard University | 38 Oxford Street, Cambridge, Ma 02138 | Room 102
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Prometheus RADOSGW usage exporter

2017-05-25 Thread Berant Lemmenes
Hello all,

I've created a Prometheus exporter that scrapes the RADOSGW Admin Ops API and
exports the usage information for all users and buckets. This is my first
Prometheus exporter, so if anyone has feedback I'd greatly appreciate it.
I've tested it against Hammer, and will shortly test against Jewel; looking
at the docs, it should work fine for Jewel as well.

https://github.com/blemmenes/radosgw_usage_exporter


Sample output:
radosgw_usage_successful_ops_total{bucket="shard0",category="create_bucket",owner="testuser"}
1.0
radosgw_usage_successful_ops_total{bucket="shard0",category="delete_obj",owner="testuser"}
1094978.0
radosgw_usage_successful_ops_total{bucket="shard0",category="list_bucket",owner="testuser"}
2276.0
radosgw_usage_successful_ops_total{bucket="shard0",category="put_obj",owner="testuser"}
1094978.0
radosgw_usage_successful_ops_total{bucket="shard0",category="stat_bucket",owner="testuser"}
20.0
radosgw_usage_received_bytes_total{bucket="shard0",category="create_bucket",owner="testuser"}
0.0
radosgw_usage_received_bytes_total{bucket="shard0",category="delete_obj",owner="testuser"}
0.0
radosgw_usage_received_bytes_total{bucket="shard0",category="list_bucket",owner="testuser"}
0.0
radosgw_usage_received_bytes_total{bucket="shard0",category="put_obj",owner="testuser"}
6352678.0
radosgw_usage_received_bytes_total{bucket="shard0",category="stat_bucket",owner="testuser"}
0.0
radosgw_usage_sent_bytes_total{bucket="shard0",category="create_bucket",owner="testuser"}
19.0
radosgw_usage_sent_bytes_total{bucket="shard0",category="delete_obj",owner="testuser"}
0.0
radosgw_usage_sent_bytes_total{bucket="shard0",category="list_bucket",owner="testuser"}
638339458.0
radosgw_usage_sent_bytes_total{bucket="shard0",category="put_obj",owner="testuser"}
79.0
radosgw_usage_sent_bytes_total{bucket="shard0",category="stat_bucket",owner="testuser"}
380.0
radosgw_usage_ops_total{bucket="shard0",category="create_bucket",owner="testuser"}
1.0
radosgw_usage_ops_total{bucket="shard0",category="delete_obj",owner="testuser"}
1094978.0
radosgw_usage_ops_total{bucket="shard0",category="list_bucket",owner="testuser"}
2276.0
radosgw_usage_ops_total{bucket="shard0",category="put_obj",owner="testuser"}
1094979.0
radosgw_usage_ops_total{bucket="shard0",category="stat_bucket",owner="testuser"}
20.0
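
(The counters above are in the standard Prometheus text exposition format, so a
quick sanity check against a running instance looks roughly like this; the host
and port are placeholders, and the actual listen address and flags are whatever
the project README documents:

  curl -s http://<exporter-host>:<port>/metrics | grep '^radosgw_usage'

A Prometheus scrape job can then be pointed at that same address.)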


Thanks,
Berant
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs file size limit 0f 1.1TB?

2017-05-25 Thread John Spray
On Thu, May 25, 2017 at 2:14 PM, Ken Dreyer  wrote:
> On Wed, May 24, 2017 at 12:36 PM, John Spray  wrote:
>>
>> CephFS has a configurable maximum file size; it's 1TB by default.
>>
>> Change it with:
>>   ceph fs set <fs name> max_file_size <size in bytes>
>
> How does this command relate to "ceph mds set max_file_size"? Is it
> different?

The "ceph mds set" command is the deprecated version, from before
there was more than one filesystem per cluster.  It operates on
whichever filesystem is marked as the default (see ceph fs
set-default)

John
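
(For illustration, the two forms side by side; the filesystem name and size
below are placeholders:

  # current, multi-filesystem-aware form
  ceph fs set <fs name> max_file_size <size in bytes>

  # deprecated form, acts on the default filesystem
  ceph mds set max_file_size <size in bytes>

e.g. "ceph fs set cephfs max_file_size 10995116277760" would raise the limit
to 10 TiB on a filesystem named "cephfs".)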

>
> I've put some of the information in this thread into a docs PR:
> https://github.com/ceph/ceph/pull/15287
>
> - Ken
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs file size limit 0f 1.1TB?

2017-05-25 Thread Ken Dreyer
On Wed, May 24, 2017 at 12:36 PM, John Spray  wrote:
>
> CephFS has a configurable maximum file size; it's 1TB by default.
>
> Change it with:
>   ceph fs set <fs name> max_file_size <size in bytes>

How does this command relate to "ceph mds set max_file_size"? Is it different?

I've put some of the information in this thread into a docs PR:
https://github.com/ceph/ceph/pull/15287

- Ken
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com