Re: [ceph-users] Cephfs no space on device error

2018-06-12 Thread Deepak Naidu
Not sure if you have been helped already, but this is a known issue if you have many
files/subfolders in a single directory. It depends on which CephFS version you are
running; this was resolved in the Red Hat version 3 of Ceph, which is based on Luminous.

http://tracker.ceph.com/issues/19438


https://access.redhat.com/solutions/3096041

Pasting the article info.

Issue
We need a folder containing more than 100,000 files.
Currently Ceph gives "No space left on device" when adding to a folder already
containing 99,999 items.

Resolution
Inject the increased value "mds_bal_fragment_size_max" into the running MDS daemon
with:
# ceph --admin-daemon /var/run/ceph/ceph-mds.<id>.asok config set mds_bal_fragment_size_max 500000  # use value as you need

and test create more files.
Add this parameter to the ceph.conf under tag [mds] to ensure the value will 
stay permanent after process restart:

[mds]
mds_bal_fragment_size_max = 500000  # use value of your choice


Diagnostic Steps
There is a parameter "mds_bal_fragment_size_max" to check. It limits the number
of entries that the MDS will create in a single directory fragment.

mds bal fragment size max
Description: The maximum size of a fragment before any new entries are
             rejected with ENOSPC.
Type:        32-bit Integer
Default:     100000  <---

Collect the run-time config of the MDS daemon, or grep for mds_bal_fragment_size_max:
# ceph --admin-daemon /var/run/ceph/ceph-mds.<id>.asok config show > mds.config.show
# ceph --admin-daemon /var/run/ceph/ceph-mds.<id>.asok config show | grep mds_bal_fragment_size_max
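
A quick way to see how close an existing directory already is to the limit is via the
CephFS virtual extended attributes (the mount path below is just an example):

# getfattr -n ceph.dir.entries /mnt/cephfs/some/large/dir
# getfattr -n ceph.dir.files   /mnt/cephfs/some/large/dir
# getfattr -n ceph.dir.subdirs /mnt/cephfs/some/large/dir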


--
Deepak



-Original Message-
From: ceph-users  On Behalf Of Doug Bell
Sent: Wednesday, May 30, 2018 5:36 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Cephfs no space on device error

I am new to Ceph and have built a small Ceph instance on 3 servers.  I realize 
the configuration is probably not ideal but I’d like to understand an error I’m 
getting.

Ceph hosts are cm1, cm2, cm3.  Cephfs is mounted with ceph.fuse on a server c1. 
 I am attempting to perform a simple cp-rp from one directory tree already in 
cephfs to another directory also inside of cephfs.  The directory tree is 2740 
files totaling 93G.  Approximately 3/4 of the way through the copy, the 
following error occurs:  "cp: failed to close ‘': No space left on 
device”  The odd thing is that it seems to finish the copy, as the final 
directory sizes are the same.  But scripts attached to the process see an error 
so it is causing a problem.

Any idea what is happening?  I have watched all of the ceph logs on one of the 
ceph servers and haven’t seen anything.

Here is some of the configuration.  The names actually aren’t obfuscated, they 
really are that generic.  IP Addresses are altered though.

# ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data ]

# ceph status
 cluster:
   id: c14e77f1-9898-48d8-8a52-cd1f1c5bf689
   health: HEALTH_WARN
   1 MDSs behind on trimming

 services:
   mon: 3 daemons, quorum cm1,cm3,cm2
   mgr: cm3(active), standbys: cm2, cm1
   mds: cephfs-1/1/1 up  {0=cm1=up:active}, 1 up:standby-replay, 1 up:standby
   osd: 7 osds: 7 up, 7 in

 data:
   pools:   2 pools, 256 pgs
   objects: 377k objects, 401 GB
   usage:   1228 GB used, 902 GB / 2131 GB avail
   pgs: 256 active+clean

 io:
   client:   852 B/s rd, 2 op/s rd, 0 op/s wr

# ceph osd status
+----+------+-------+-------+--------+---------+--------+---------+-----------+
| id | host |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
+----+------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | cm1  |  134G |  165G |    0   |     0   |    0   |     0   | exists,up |
| 1  | cm1  |  121G |  178G |    0   |     0   |    0   |     0   | exists,up |
| 2  | cm2  |  201G | 98.3G |    0   |     0   |    1   |    90   | exists,up |
| 3  | cm2  |  207G | 92.1G |    0   |     0   |    0   |     0   | exists,up |
| 4  | cm3  |  217G | 82.8G |    0   |     0   |    0   |     0   | exists,up |
| 5  | cm3  |  192G |  107G |    0   |     0   |    0   |     0   | exists,up |
| 6  | cm1  |  153G |  177G |    0   |     0   |    1   |    16   | exists,up |
+----+------+-------+-------+--------+---------+--------+---------+-----------+

# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE  USE   AVAIL    %USE  VAR  PGS
 0   ssd 0.29300  1.0     299G  134G  165G     44.74 0.78  79
 1   ssd 0.29300  1.0     299G  121G  178G     40.64 0.70  75
 6   ssd 0.32370  1.0     331G  153G  177G     46.36 0.80 102
 2   ssd 0.29300  1.0     299G  201G  100754M  67.20 1.17 129
 3   ssd 0.29300  1.0     299G  207G  94366M   69.28 1.20 127
 4   ssd 0.29300  1.0     299G  217G  84810M   72.39 1.26 131
 5   ssd 0.29300  1.0     299G  192G  107G     64.15 1.11 125
             TOTAL        2131G 1228G 902G     57.65
MIN/MAX VAR: 0.70/1.26  STDDEV: 12.36

# ceph fs get cephfs
Filesystem 'cephfs' (1)
fs_name cephfs
epoch   1047
flags   c
created 2018-03-20 13:58:51.860813
modified    2018-03-20 13:58:51.860813
tableserver 0
root        0

Re: [ceph-users] Cephfs MDS slow requests

2018-03-15 Thread Deepak Naidu
David, a few inputs based on my working experience with CephFS. They might or might not
be relevant to the current issue seen in your cluster.


  1.  Create the metadata pool on NVMe. Folks can claim it is not needed, but I have seen
worse performance with it on HDD, even though the metadata size is very small.
  2.  Ensure the MDS node has enough RAM allocated for the MDS cache (this will not
improve performance drastically, but to some extent). On a side note, the MDS has a bug
related to oversubscribed memory usage, regardless of the cache settings, if you have
more than 64GB of RAM. Take a look: http://tracker.ceph.com/issues/21402

http://tracker.ceph.com/issues/22599

https://bugzilla.redhat.com/show_bug.cgi?id=1531679

  3.  CephFS is not great for small files (KBs) but works great with large files (MBs or
GBs), so a filer-style (NFS/SMB) use case needs administrative attention.
  4.  Next, check whether you have a large number of inodes/files in CephFS. Ensure that
tunables such as dirfrag and active/active MDS are implemented on the Luminous version
you run on filestore, or ask users not to store multi-millions of small files in one
directory (it's a debatable scenario; not sure how much control you have over your
customer's use case).
  5.  Always use kernel mounts. ceph-fuse is super slow (3-5 times slower than kernel
mounts); I hope you already know this.
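
For reference, a minimal sketch of the dirfrag / multi-MDS / cache knobs mentioned
above, assuming Luminous 12.2.x (directory fragmentation is on by default there; on
Jewel the flags and cache option names differ):

# allow a second active MDS rank
ceph fs set cephfs allow_multimds true
ceph fs set cephfs max_mds 2

# ceph.conf on the MDS nodes -- Luminous takes a byte limit,
# older releases use mds_cache_size (an inode count) instead
[mds]
mds_cache_memory_limit = 17179869184   # ~16GB, size it to your RAM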


--
Deepak

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of David C
Sent: Wednesday, March 14, 2018 10:46 AM
To: John Spray 
Cc: ceph-users 
Subject: Re: [ceph-users] Cephfs MDS slow requests

Thanks, John. I'm pretty sure the root of my slow OSD issues is filestore 
subfolder splitting.
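
For what it's worth, the split behaviour is governed by two filestore options; a hedged
ceph.conf sketch (the values are only examples -- a subdirectory splits at roughly
filestore_split_multiple * abs(filestore_merge_threshold) * 16 objects):

[osd]
filestore_merge_threshold = 40
filestore_split_multiple = 8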

On Wed, Mar 14, 2018 at 2:17 PM, John Spray 
> wrote:
On Tue, Mar 13, 2018 at 7:17 PM, David C 
> wrote:
> Hi All
>
> I have a Samba server that is exporting directories from a Cephfs Kernel
> mount. Performance has been pretty good for the last year but users have
> recently been complaining of short "freezes", these seem to coincide with
> MDS related slow requests in the monitor ceph.log such as:
>
>> 2018-03-13 13:34:58.461030 osd.15 osd.15 
>> 10.10.10.211:6812/13367 5752 :
>> cluster [WRN] slow request 31.834418 seconds old, received at 2018-03-13
>> 13:34:26.626474: osd_repop(mds.0.5495:810644 3.3e e14085/14019
>> 3:7cea5bac:::10001a88b8f.:head v 14085'846936) currently commit_sent
>> 2018-03-13 13:34:59.461270 osd.15 osd.15 
>> 10.10.10.211:6812/13367 5754 :
>> cluster [WRN] slow request 32.832059 seconds old, received at 2018-03-13
>> 13:34:26.629151: osd_repop(mds.0.5495:810671 2.dc2 e14085/14020
>> 2:43bdcc3f:::10001e91a91.:head v 14085'21394) currently commit_sent
>> 2018-03-13 14:23:57.409427 osd.30 osd.30 
>> 10.10.10.212:6824/14997 5708 :
>> cluster [WRN] slow request 30.536832 seconds old, received at 2018-03-13
>> 14:23:26.872513: osd_repop(mds.0.5495:865403 2.fb6 e14085/14077
>> 2:6df955ef:::10001e93542.00c4:head v 14085'21296) currently commit_sent
>> 2018-03-13 14:23:57.409449 osd.30 osd.30 
>> 10.10.10.212:6824/14997 5709 :
>> cluster [WRN] slow request 30.529640 seconds old, received at 2018-03-13
>> 14:23:26.879704: osd_repop(mds.0.5495:865407 2.595 e14085/14019
>> 2:a9a56101:::10001e93542.00c8:head v 14085'20437) currently commit_sent
>> 2018-03-13 14:23:57.409453 osd.30 osd.30 
>> 10.10.10.212:6824/14997 5710 :
>> cluster [WRN] slow request 30.503138 seconds old, received at 2018-03-13
>> 14:23:26.906207: osd_repop(mds.0.5495:865423 2.ea e14085/14055
>> 2:57096bbf:::10001e93542.00d8:head v 14085'21147) currently commit_sent
>
>
> --
>
> Looking in the MDS log, with debug set to 4, it's full of "setfilelockrule
> 1" and "setfilelockrule 2":
>
>> 2018-03-13 14:23:00.446905 7fde43e73700  4 mds.0.server
>> handle_client_request client_request(client.9174621:141162337
>> setfilelockrule 1, type 4, owner 14971048052668053939, pid 7, start 120,
>> length 1, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521 caller_uid=1155,
>> caller_gid=1131{}) v2
>> 2018-03-13 14:23:00.447050 7fde43e73700  4 mds.0.server
>> handle_client_request client_request(client.9174621:141162338
>> setfilelockrule 2, type 4, owner 14971048137043556787, pid 4632, start 0,
>> length 0, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521 caller_uid=0,
>> caller_gid=0{}) v2
>> 2018-03-13 14:23:00.447258 7fde43e73700  4 mds.0.server
>> handle_client_request client_request(client.9174621:141162339
>> setfilelockrule 2, type 4, owner 14971048137043550643, pid 4632, start 0,
>> length 0, wait 0 #0x10001e8dc37 2018-03-13 14:22:58.838521 caller_uid=0,
>> caller_gid=0{}) v2
>> 2018-03-13 14:23:00.447393 7fde43e73700  4 mds.0.server
>> handle_client_request client_request(client.9174621:141162340
>> 

Re: [ceph-users] rm: cannot remove dir and files (cephfs)

2018-02-21 Thread Deepak Naidu
>> rm: cannot remove '/design/4695/8/6-50kb.jpg': No space left on device
A “No space left on device” error in CephFS is typically caused by having more than
about 1 million (1,000,000) files in a “single directory”. To mitigate this, try
increasing "mds_bal_fragment_size_max" to a higher value, for example 7 million
(7000000):

[mds]
mds_bal_fragment_size_max = 7000000
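
The same value can be applied to a running MDS without a restart; "mds.<name>" below
stands for your active MDS id:

# on the MDS host, via the admin socket:
ceph daemon mds.<name> config set mds_bal_fragment_size_max 7000000
# or remotely:
ceph tell mds.<name> injectargs '--mds_bal_fragment_size_max 7000000'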

I am not going into detail here; that said, there are many other tuning parameters,
including enabling dir frag, multiple MDS, etc. It seems your Ceph version is
10.2.10 (Jewel). Luminous (12.2.x) has better support for multiple MDS, dir_frag, etc.;
on Jewel some of these options might be experimental features, so you may need to do a
bit of reading on CephFS. Just a note of advice based on my experience: CephFS is ideal
for large files (MB/GB) and is not great for small files (KBs). Also split your files
into multiple sub-dirs to avoid similar issues.

--
Deepak



From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of ??
Sent: Friday, February 09, 2018 2:35 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] rm: cannot remove dir and files (cephfs)

ceph version 10.2.10

ceph -s
cluster 97f833aa-cc6f-41d5-bf82-bda5c09fd664
health HEALTH_OK
monmap e3: 3 mons at 
{web23=192.168.65.24:6789/0,web25=192.168.65.55:6789/0,web26=192.168.65.56:6789/0}
election epoch 1198, quorum 0,1,2 web23,web25,web26
fsmap e464: 1/1/1 up {0=web26=up:active}, 3 up:standby
osdmap e325: 4 osds: 4 up, 4 in
flags sortbitwise,require_jewel_osds
pgmap v42475: 128 pgs, 3 pools, 274 GB data, 1710 kobjects
854 GB used, 2939 GB / 3793 GB avail
128 active+clean
client io 181 kB/s wr, 0 op/s rd, 5 op/s wr

ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
3793G 2941G 851G 22.46
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
rbd 0 0 0 822G 0
cephfs_metadata 1 17968k 0 822G 101458
cephfs_data 2 273G 24.99 822G 1643059

kernel mount cephfs
grep design /etc/fstab
192.168.65.24:,192.168.65.55:,192.168.65.56:/ /design ceph 
rw,relatime,name=design,secret=...,_netdev,noatime 0 0

I can not delete same files and dir.
rm -rf /design/4*
rm: cannot remove '/design/4695/8/6-50kb.jpg': No space left on device
rm: cannot remove '/design/4695/8/9-300kb.png': No space left on device
rm: cannot remove '/design/4695/8/0-300kb.png': No space left on device

What ideas?
help




---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Large amount of files - cephfs?

2017-09-27 Thread Deepak Naidu
Josef, my comments are based on experience with the community (free) version of
CephFS (Jewel with 1 MDS).

* CephFS (Jewel, considering 1 stable MDS) performs horribly with millions of small,
KB-sized files, even after MDS cache and dir frag tuning, etc.
* CephFS (Jewel, considering 1 stable MDS) performs great for large GB/TB files, i.e.
large IO, but still with small inode counts.
* Your best bet is an object storage interface (S3/Swift API or librados API).
* Multiple MDS and dir frag in Jewel are considered unstable (experimental features).
* For testing, you can try Luminous (multiple active/active MDS) with dir frag enabled
by default, but it only went stable a couple of weeks back. So be cautious about
putting PROD data on "not battle tested" versions unless you have a backup strategy.
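
If it helps, a crude way to see the small-file behaviour on your own hardware before
committing (the mount path and file count are just placeholders):

mkdir -p /mnt/cephfs/smalltest
time bash -c 'for i in $(seq 1 10000); do
    dd if=/dev/zero of=/mnt/cephfs/smalltest/f$i bs=4k count=1 status=none
done'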

--
Deepak

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Josef 
Zelenka
Sent: Wednesday, September 27, 2017 4:57 AM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] Large amount of files - cephfs?

Hi,

we are currently working on a ceph solution for one of our customers. 
They run a file hosting and they need to store approximately 100 million of 
pictures(thumbnails). Their current code works with FTP, that they use as a 
storage. We thought that we could use cephfs for this, but i am not sure how it 
would behave with that many files, how would the performance be affected etc. 
Is cephfs useable in this scenario, or would radosgw+swift be better(they'd 
likely have to rewrite some of the code, so we'd prefer not to do this)? We 
already have some experience with cephfs for storing bigger files, streaming 
etc so i'm not completely new to this, but i thought it'd be better to ask more 
experiened users. Some advice on this would be greatly appreciated, thanks,

Josef

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph release cadence

2017-09-07 Thread Deepak Naidu
>> Maybe I missed something, but I think Ceph does not support LTS releases for 
>> 3 years.
Yes, you are correct, but it averages out to 18 months, and sometimes I see 20 months
(Hammer). Anything with a 1-year release cycle is not worth the time, and having a
near-3-year support model is best for PROD.

http://docs.ceph.com/docs/master/releases/

--
Deepak

-Original Message-
From: Henrik Korkuc [mailto:li...@kirneh.eu] 
Sent: Wednesday, September 06, 2017 10:50 PM
To: Deepak Naidu; Sage Weil; ceph-de...@vger.kernel.org; 
ceph-maintain...@ceph.com; ceph-us...@ceph.com
Subject: Re: [ceph-users] Ceph release cadence

On 17-09-07 02:42, Deepak Naidu wrote:
> Hope collective feedback helps. So here's one.
>
>>> - Not a lot of people seem to run the "odd" releases (e.g., infernalis, 
>>> kraken).
> I think the more obvious reason companies/users wanting to use CEPH will 
> stick with LTS versions as it models the 3yr  support cycle.
Maybe I missed something, but I think Ceph does not support LTS releases for 3 
years.

>>> * Drop the odd releases, and aim for a ~9 month cadence. This splits the 
>>> difference between the current even/odd pattern we've been doing.
> Yes, provided an easy upgrade process.
>
>
> --
> Deepak
>
>
>
>
> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf 
> Of Sage Weil
> Sent: Wednesday, September 06, 2017 8:24 AM
> To: ceph-de...@vger.kernel.org; ceph-maintain...@ceph.com; 
> ceph-us...@ceph.com
> Subject: [ceph-users] Ceph release cadence
>
> Hi everyone,
>
> Traditionally, we have done a major named "stable" release twice a year, and 
> every other such release has been an "LTS" release, with fixes backported for 
> 1-2 years.
>
> With kraken and luminous we missed our schedule by a lot: instead of 
> releasing in October and April we released in January and August.
>
> A few observations:
>
> - Not a lot of people seem to run the "odd" releases (e.g., infernalis, 
> kraken).  This limits the value of actually making them.  It also means that 
> those who *do* run them are running riskier code (fewer users -> more bugs).
>
> - The more recent requirement that upgrading clusters must make a stop
> at each LTS (e.g., hammer -> luminous not supported, must go hammer ->
> jewel -> luminous) has been hugely helpful on the development side by
> reducing the amount of cross-version compatibility code to maintain and
> reducing the number of upgrade combinations to test.
>
> - When we try to do a time-based "train" release cadence, there always seems 
> to be some "must-have" thing that delays the release a bit.  This doesn't 
> happen as much with the odd releases, but it definitely happens with the LTS 
> releases.  When the next LTS is a year away, it is hard to suck it up and 
> wait that long.
>
> A couple of options:
>
> * Keep even/odd pattern, and continue being flexible with release 
> dates
>
>+ flexible
>- unpredictable
>- odd releases of dubious value
>
> * Keep even/odd pattern, but force a 'train' model with a more regular 
> cadence
>
>+ predictable schedule
>- some features will miss the target and be delayed a year
>
> * Drop the odd releases but change nothing else (i.e., 12-month 
> release
> cadence)
>
>+ eliminate the confusing odd releases with dubious value
>   
> * Drop the odd releases, and aim for a ~9 month cadence. This splits the 
> difference between the current even/odd pattern we've been doing.
>
>+ eliminate the confusing odd releases with dubious value
>+ waiting for the next release isn't quite as bad
>- required upgrades every 9 months instead of ever 12 months
>
> * Drop the odd releases, but relax the "must upgrade through every LTS" to 
> allow upgrades across 2 versions (e.g., luminous -> mimic or luminous -> 
> nautilus).  Shorten release cycle (~6-9 months).
>
>+ more flexibility for users
>+ downstreams have greater choice in adopting an upstrema release
>- more LTS branches to maintain
>- more upgrade paths to consider
>
> Other options we should consider?  Other thoughts?
>
> Thanks!
> sage
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> --
> - This email message is for the sole use of the intended 
> recipient(s) and may contain confidential information.  Any 
> unauthorized review, use, disclosure or distribution is pro

Re: [ceph-users] Ceph release cadence

2017-09-06 Thread Deepak Naidu
Hope collective feedback helps. So here's one.

>>- Not a lot of people seem to run the "odd" releases (e.g., infernalis, 
>>kraken).  
I think the more obvious reason companies/users wanting to use CEPH will stick 
with LTS versions as it models the 3yr  support cycle.

>>* Drop the odd releases, and aim for a ~9 month cadence. This splits the 
>>difference between the current even/odd pattern we've been doing.
Yes, provided an easy upgrade process.


--
Deepak




-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Sage 
Weil
Sent: Wednesday, September 06, 2017 8:24 AM
To: ceph-de...@vger.kernel.org; ceph-maintain...@ceph.com; ceph-us...@ceph.com
Subject: [ceph-users] Ceph release cadence

Hi everyone,

Traditionally, we have done a major named "stable" release twice a year, and 
every other such release has been an "LTS" release, with fixes backported for 
1-2 years.

With kraken and luminous we missed our schedule by a lot: instead of releasing 
in October and April we released in January and August.

A few observations:

- Not a lot of people seem to run the "odd" releases (e.g., infernalis, 
kraken).  This limits the value of actually making them.  It also means that 
those who *do* run them are running riskier code (fewer users -> more bugs).

- The more recent requirement that upgrading clusters must make a stop at each
LTS (e.g., hammer -> luminous not supported, must go hammer -> jewel ->
luminous) has been hugely helpful on the development side by reducing
the amount of cross-version compatibility code to maintain and reducing the
number of upgrade combinations to test.

- When we try to do a time-based "train" release cadence, there always seems to 
be some "must-have" thing that delays the release a bit.  This doesn't happen 
as much with the odd releases, but it definitely happens with the LTS releases. 
 When the next LTS is a year away, it is hard to suck it up and wait that long.

A couple of options:

* Keep even/odd pattern, and continue being flexible with release dates

  + flexible
  - unpredictable
  - odd releases of dubious value

* Keep even/odd pattern, but force a 'train' model with a more regular cadence

  + predictable schedule
  - some features will miss the target and be delayed a year

* Drop the odd releases but change nothing else (i.e., 12-month release
cadence)

  + eliminate the confusing odd releases with dubious value
 
* Drop the odd releases, and aim for a ~9 month cadence. This splits the 
difference between the current even/odd pattern we've been doing.

  + eliminate the confusing odd releases with dubious value
  + waiting for the next release isn't quite as bad
  - required upgrades every 9 months instead of every 12 months

* Drop the odd releases, but relax the "must upgrade through every LTS" to 
allow upgrades across 2 versions (e.g., luminous -> mimic or luminous -> 
nautilus).  Shorten release cycle (~6-9 months).

  + more flexibility for users
  + downstreams have greater choice in adopting an upstream release
  - more LTS branches to maintain
  - more upgrade paths to consider

Other options we should consider?  Other thoughts?

Thanks!
sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v12.2.0 Luminous released

2017-08-30 Thread Deepak Naidu
Not sure how often http://docs.ceph.com/docs/master/releases/ gets updated; a
timeline roadmap helps.

--
Deepak

-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
Abhishek Lekshmanan
Sent: Tuesday, August 29, 2017 11:20 AM
To: ceph-de...@vger.kernel.org; ceph-us...@ceph.com; ceph-maintain...@ceph.com; 
ceph-annou...@ceph.com
Subject: [ceph-users] v12.2.0 Luminous released


We're glad to announce the first release of Luminous v12.2.x long term stable 
release series. There have been major changes since Kraken
(v11.2.z) and Jewel (v10.2.z), and the upgrade process is non-trivial.
Please read the release notes carefully.

For more details, links & changelog please refer to the complete release notes 
entry at the Ceph blog:
http://ceph.com/releases/v12-2-0-luminous-released/


Major Changes from Kraken
-

- *General*:
  * Ceph now has a simple, built-in web-based dashboard for monitoring cluster
status.

- *RADOS*:
  * *BlueStore*:
- The new *BlueStore* backend for *ceph-osd* is now stable and the
  new default for newly created OSDs.  BlueStore manages data
  stored by each OSD by directly managing the physical HDDs or
  SSDs without the use of an intervening file system like XFS.
  This provides greater performance and features.
- BlueStore supports full data and metadata checksums
  of all data stored by Ceph.
- BlueStore supports inline compression using zlib, snappy, or LZ4. (Ceph
  also supports zstd for RGW compression but zstd is not recommended for
  BlueStore for performance reasons.)

  * *Erasure coded* pools now have full support for overwrites
allowing them to be used with RBD and CephFS.

  * *ceph-mgr*:
- There is a new daemon, *ceph-mgr*, which is a required part of
  any Ceph deployment.  Although IO can continue when *ceph-mgr*
  is down, metrics will not refresh and some metrics-related calls
  (e.g., `ceph df`) may block.  We recommend deploying several
  instances of *ceph-mgr* for reliability.  See the notes on
  Upgrading below.
- The *ceph-mgr* daemon includes a REST-based management API.
  The API is still experimental and somewhat limited but
  will form the basis for API-based management of Ceph going forward.
- ceph-mgr also includes a Prometheus exporter plugin, which can provide 
Ceph
  perfcounters to Prometheus.
- ceph-mgr now has a Zabbix plugin. Using zabbix_sender it sends trapper
  events to a Zabbix server containing high-level information of the Ceph
  cluster. This makes it easy to monitor a Ceph cluster's status and send
  out notifications in case of a malfunction.

  * The overall *scalability* of the cluster has improved. We have
successfully tested clusters with up to 10,000 OSDs.
  * Each OSD can now have a device class associated with
it (e.g., `hdd` or `ssd`), allowing CRUSH rules to trivially map
data to a subset of devices in the system.  Manually writing CRUSH
rules or manual editing of the CRUSH is normally not required.
  * There is a new upmap exception mechanism that allows individual PGs to be 
moved around to achieve
a *perfect distribution* (this requires luminous clients).
  * Each OSD now adjusts its default configuration based on whether the
backing device is an HDD or SSD. Manual tuning generally not required.
  * The prototype mClock QoS queueing algorithm is now available.
  * There is now a *backoff* mechanism that prevents OSDs from being
overloaded by requests to objects or PGs that are not currently able to
process IO.
  * There is a simplified OSD replacement process that is more robust.
  * You can query the supported features and (apparent) releases of
all connected daemons and clients with `ceph features`
  * You can configure the oldest Ceph client version you wish to allow to
connect to the cluster via `ceph osd set-require-min-compat-client` and
Ceph will prevent you from enabling features that will break compatibility
with those clients.
  * Several `sleep` settings, include `osd_recovery_sleep`,
`osd_snap_trim_sleep`, and `osd_scrub_sleep` have been
reimplemented to work efficiently.  (These are used in some cases
to work around issues throttling background work.)
  * Pools are now expected to be associated with the application using them.
Upon completing the upgrade to Luminous, the cluster will attempt to 
associate
existing pools to known applications (i.e. CephFS, RBD, and RGW). In-use 
pools
that are not associated to an application will generate a health warning. 
Any
unassociated pools can be manually associated using the new
`ceph osd pool application enable` command. For more details see
`associate pool to application` in the documentation.
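
  A few illustrative commands for the items above (the pool name and OSD id are
  placeholders):

  # tag an existing pool with the application that uses it
  ceph osd pool application enable mypool rbd
  # inspect client/daemon feature levels and pin the minimum client release
  ceph features
  ceph osd set-require-min-compat-client luminous
  # device classes are detected automatically, but can also be set by hand
  ceph osd crush set-device-class ssd osd.7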

- *RGW*:

  * RGW *metadata search* backed by ElasticSearch now supports end
user requests service via RGW 

Re: [ceph-users] Mount CephFS with dedicated user fails: mount error 13 = Permission denied

2017-07-24 Thread Deepak Naidu
For a permanent fix, you need a patched kernel or an upgrade to kernel 4.9 or higher
(which has the patch): http://tracker.ceph.com/issues/17191

Using "[mds] allow r" gives users “read” permission to the “/” share, i.e. any
directories/files under “/”; for example “/dir1”, “/dir2” or “/MTY” can be read using
the KEY and USER (client.mtyadm). If this is not a concern to you, then I guess you
are fine; otherwise consider upgrading the kernel, or get your current kernel patched
with this CephFS kernel client fix.

caps: [mds] allow r,allow rw path=/MTY
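
Once the client kernel is 4.9 or newer, the blanket read on “/” can be dropped; a
hedged example reusing the pools from this thread:

# ceph auth caps client.mtyadm mds 'allow rw path=/MTY' mon 'allow r' osd 'allow rw pool=hdb-backup,allow rw pool=hdb-backup_metadata'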

--
Deepak

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of 
c.mo...@web.de
Sent: Monday, July 24, 2017 7:00 AM
To: Дмитрий Глушенок
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Mount CephFS with dedicated user fails: mount error 
13 = Permission denied

THX.
Mount is working now.

The auth list for user mtyadm is now:
client.mtyadm
key: AQAlyXVZEfsYNRAAM4jHuV1Br7lpRx1qaINO+A==
caps: [mds] allow r,allow rw path=/MTY
caps: [mon] allow r
caps: [osd] allow rw pool=hdb-backup,allow rw pool=hdb-backup_metadata




24 July 2017 13:25, "Дмитрий Глушенок"
>
 wrote:
Check your kernel version, prior to 4.9 it was needed to allow read on root 
path: 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-December/014804.html
On 24 July 2017, at 12:36, c.mo...@web.de wrote:
Hello!

I want to mount CephFS with a dedicated user in order to avoid putting the 
admin key on every client host.
Therefore I created a user account
ceph auth get-or-create client.mtyadm mon 'allow r' mds 'allow rw path=/MTY' 
osd 'allow rw pool=hdb-backup,allow rw pool=hdb-backup_metadata' -o 
/etc/ceph/ceph.client.mtyadm.keyring
and wrote out the keyring
ceph-authtool -p -n client.mtyadm ceph.client.mtyadm.keyring > 
ceph.client.mtyadm.key

This user is now displayed in auth list:
client.mtyadm
key: AQBYu3VZLg66LBAAGM1jW+cvNE6BoJWfsORZKA==
caps: [mds] allow rw path=/MTY
caps: [mon] allow r
caps: [osd] allow rw pool=hdb-backup,allow rw pool=hdb-backup_metadata

When I try to mount directory /MTY on the client host I get this error:
ld2398:/etc/ceph # mount -t ceph ldcephmon1,ldcephmon2,ldcephmon2:/MTY 
/mnt/cephfs -o name=mtyadm,secretfile=/etc/ceph/ceph.client.mtyadm.key
mount error 13 = Permission denied

The mount works using admin though:
ld2398:/etc/ceph # mount -t ceph ldcephmon1,ldcephmon2,ldcephmon2:/MTY 
/mnt/cephfs -o name=admin,secretfile=/etc/ceph/ceph.client.admin.key
ld2398:/etc/ceph # mount | grep cephfs
10.96.5.37,10.96.5.38,10.96.5.38:/MTY on /mnt/cephfs type ceph 
(rw,relatime,name=admin,secret=,acl)

What is causing this mount error?

THX
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
Dmitry Glushenok
Jet Infosystems


---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How's cephfs going?

2017-07-17 Thread Deepak Naidu
Based on my experience, it's really stable and yes, it is production ready. Most of
the use cases for CephFS depend on what you're trying to achieve. A few feedbacks:

1) The kernel client is nice/stable and can achieve higher bandwidth if you have a
40G or higher network.
2) ceph-fuse is very slow, as writes are cached in your client RAM regardless of
direct IO.
3) Look out for BlueStore for the long term. This holds true for Ceph in general, not
just CephFS.
4) If you want per-folder-based namespaces (for lack of a better word) you need to
ensure you are running the latest kernel, or backport the fixes to your running kernel.
5) Larger IO blocks will provide faster throughput. It will not be as good for smaller
IO blocks.
6) Use SSD for the CephFS metadata pool (it really helps). This is based on my
experience; folks can debate it. I guess eBay has a write-up where they didn't see any
advantage in using SSD.
7) Look up the experimental features below:
http://docs.ceph.com/docs/master/cephfs/experimental-features/?highlight=experimental
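
On point 6, a minimal sketch for pinning the metadata pool to SSD/NVMe OSDs, assuming
Luminous device classes (on Jewel you would edit the crushmap by hand instead; the rule
name is an example):

# ceph osd crush rule create-replicated ssd-rule default host ssd
# ceph osd pool set cephfs_metadata crush_rule ssd-rule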

--
Deepak


-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Blair 
Bethwaite
Sent: Sunday, July 16, 2017 8:14 PM
To: 许雪寒
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] How's cephfs going?

It works and can reasonably be called "production ready". However in Jewel 
there are still some features (e.g. directory sharding, multi active MDS, and 
some security constraints) that may limit widespread usage. Also note that 
userspace client support in e.g. nfs-ganesha and samba is a mixed bag across 
distros and you may find yourself having to resort to re-exporting ceph-fuse or 
kernel mounts in order to provide those gateway services. We haven't tried 
Luminous CephFS yet as still waiting for the first full (non-RC) release to 
drop, but things seem very positive there...

On 17 July 2017 at 12:59, 许雪寒  wrote:
> Hi, everyone.
>
>
>
> We intend to use cephfs of Jewel version, however, we don’t know its status.
> Is it production ready in Jewel? Does it still have lots of bugs? Is 
> it a major effort of the current ceph development? And who are using cephfs 
> now?
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



--
Cheers,
~Blairo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 300 active+undersized+degraded+remapped

2017-07-01 Thread Deepak Naidu
Thanks Max, yes the location hook is the ideal way. But as I have only a few NVMe
OSDs per node, I ended up using ceph.conf to add them to the correct location.
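
Roughly what that looks like -- a sketch only; the option spelling varies a little
between releases ("osd crush location" vs "crush location") and the bucket names
follow this thread's crushmap:

[osd.60]
crush location = rack=rack1-nvme host=OSD1-nvme

# alternatively, stop OSDs from updating their CRUSH location at startup:
# [osd]
# osd crush update on start = false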

--
Deepak

On Jul 1, 2017, at 11:52 AM, Maxime Guyot 
<max...@root314.com<mailto:max...@root314.com>> wrote:

Hi Deepak,

As Wido pointed out in the thread you linked, "osd crush update on start" and osd
crush location are quick ways to fix this. If you are doing custom locations (like for
tiering NVMe vs HDD), "osd crush location hook" (Doc:
http://docs.ceph.com/docs/master/rados/operations/crush-map/#custom-location-hooks
) is a good option as well: it allows you to configure the crush location of the OSD
based on a script, and it shouldn't be too hard to detect whether the OSD is NVMe or
SATA and set its location based on that. It's really nice when you add new OSDs to see
them arrive in the right location automatically.
Shameless plug: you can find an example in this blog post 
http://www.root314.com/2017/01/15/Ceph-storage-tiers/#tiered-crushmap I hope it 
helps
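
For reference, a hedged sketch of such a hook, wired up via "osd crush location hook =
/path/to/script" in ceph.conf. Ceph calls it once per OSD and places the OSD at the
key=value pairs printed on stdout; the tier-detection logic and bucket names below are
assumptions to adapt to your own layout:

#!/bin/bash
# invoked as: <script> --cluster <name> --id <osd-id> --type osd
while [ $# -gt 0 ]; do
    [ "$1" = "--id" ] && OSD_ID="$2"
    shift
done
# crude tier detection: is the OSD data dir backed by an NVMe device?
if df "/var/lib/ceph/osd/ceph-${OSD_ID}" | grep -q nvme; then
    echo "host=$(hostname -s)-nvme rack=rack1-nvme"
else
    echo "host=$(hostname -s) rack=rack1-sata root=default"
fi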

Cheers,
Maxime

On Sat, 1 Jul 2017 at 03:28 Deepak Naidu 
<dna...@nvidia.com<mailto:dna...@nvidia.com>> wrote:
OK, so looks like its ceph crushmap behavior 
http://docs.ceph.com/docs/master/rados/operations/crush-map/

--
Deepak

From: ceph-users 
[mailto:ceph-users-boun...@lists.ceph.com<mailto:ceph-users-boun...@lists.ceph.com>]
 On Behalf Of Deepak Naidu
Sent: Friday, June 30, 2017 7:06 PM
To: David Turner; ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>

Subject: Re: [ceph-users] 300 active+undersized+degraded+remapped

OK, I fixed the issue. But this is very weird. But will list them so its easy 
for other to check when there is similar issue.


1)  I had create rack aware osd tree

2)  I have SATA OSD’s and NVME OSD

3)  I created rack aware policy for both SATA and NVME OSD

4)  NVME OSD was used for CEPH FS Meta

5)  Recently: When I tried reboot of OSD node, it seemed that my journal 
volumes which were on NVME didn’t startup bcos of the UDEV rules and I had to 
create startup script to fix them.

6)  With that. I had rebooted all the OSD one by one monitoring the ceph 
status.

7)  I was at the 3rd last node, then I notice the pgstuck warning. Not sure 
when and what happened, but I started getting this PG stuck issue(which is 
listed in my original email)

8)  I wasted time to look at the issue/error, but then I found the pool 
100% used issue.

9)  Now when I tried ceph osd tree. It looks like my NVME OSD’s went back 
to the host level OSD’s rather than the newly created/mapped NVME rack level. 
Ie no OSD’s under nvme-host name. This was the issue.

10)   Luckily I had created the backup of compiled version. I imported them in 
crushmap rule and now pool status is OK.

But, my question is how did ceph re-map the CRUSH rule ?

I had to create “new host entry” for NVME in crushmap ie

host OSD1-nvme  -- This is just dummy entry in crushmap ie it 
doesn’t resolve to any hostname
host OSD1  -- This is the actual hostname and resolves 
to IP and has an hostname

Is that the issue ?

Current status

health HEALTH_OK
osdmap e5108: 610 osds: 610 up, 610 in
flags sortbitwise,require_jewel_osds
  pgmap v247114: 15450 pgs, 3 pools, 322 GB data, 86102 objects
1155 GB used, 5462 TB / 5463 TB avail
   15450 active+clean


Pool1  15   233M  0  1820T  
   3737
Pool2 16  00 1820T  
0
Pool Meta   17 34928k0   2357G  
  28


Partial list of my osd tree

-152.76392 rack rack1-nvme
-180.69098 host OSD1-nvme
 600.69098 osd.60 up  1.0  
1.0
-210.69098 host OSD2-nvme
2430.69098 osd.243up  1.0  
1.0
-240.69098 host OSD3-NGN1-nvme
4260.69098 osd.426up  1.0  
1.0
-1 5456.27734 root default
-12 2182.51099 rack rack1-sata
 -2  545.62775 host OSD1
  09.09380 osd.0  up  1.0  
1.0
  19.09380 osd.1  up  1.0  
1.0
  29.09380 osd.2  up  1.0  
1.0
  39.09380 osd.3  up  1.0  
1.0
-2  545.62775 host OSD2
  09.09380 osd.0  up  1.0  
1.0
  19.09380 osd.1  up  1.0  
1.0
  29.09380 osd.2  up  1.0  
1.0
  39.09380 osd.3  up  1.0  
1.0
-2  545.

Re: [ceph-users] 300 active+undersized+degraded+remapped

2017-06-30 Thread Deepak Naidu
Sorry for the spam, but here is a clearer way of doing a custom crushmap:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-April/038835.html

--
Deepak

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Deepak 
Naidu
Sent: Friday, June 30, 2017 7:22 PM
To: David Turner; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] 300 active+undersized+degraded+remapped

OK, so looks like its ceph crushmap behavior 
http://docs.ceph.com/docs/master/rados/operations/crush-map/

--
Deepak

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Deepak 
Naidu
Sent: Friday, June 30, 2017 7:06 PM
To: David Turner; ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] 300 active+undersized+degraded+remapped

OK, I fixed the issue. But this is very weird. But will list them so its easy 
for other to check when there is similar issue.


1)  I had create rack aware osd tree

2)  I have SATA OSD’s and NVME OSD

3)  I created rack aware policy for both SATA and NVME OSD

4)  NVME OSD was used for CEPH FS Meta

5)  Recently: When I tried reboot of OSD node, it seemed that my journal 
volumes which were on NVME didn’t startup bcos of the UDEV rules and I had to 
create startup script to fix them.

6)  With that. I had rebooted all the OSD one by one monitoring the ceph 
status.

7)  I was at the 3rd last node, then I notice the pgstuck warning. Not sure 
when and what happened, but I started getting this PG stuck issue(which is 
listed in my original email)

8)  I wasted time to look at the issue/error, but then I found the pool 
100% used issue.

9)  Now when I tried ceph osd tree. It looks like my NVME OSD’s went back 
to the host level OSD’s rather than the newly created/mapped NVME rack level. 
Ie no OSD’s under nvme-host name. This was the issue.

10)   Luckily I had created the backup of compiled version. I imported them in 
crushmap rule and now pool status is OK.

But, my question is how did ceph re-map the CRUSH rule ?

I had to create “new host entry” for NVME in crushmap ie

host OSD1-nvme  -- This is just dummy entry in crushmap ie it 
doesn’t resolve to any hostname
host OSD1  -- This is the actual hostname and resolves 
to IP and has an hostname

Is that the issue ?

Current status

health HEALTH_OK
osdmap e5108: 610 osds: 610 up, 610 in
flags sortbitwise,require_jewel_osds
  pgmap v247114: 15450 pgs, 3 pools, 322 GB data, 86102 objects
1155 GB used, 5462 TB / 5463 TB avail
   15450 active+clean


Pool1  15   233M  0  1820T  
   3737
Pool2 16  00 1820T  
0
Pool Meta   17 34928k0   2357G  
  28


Partial list of my osd tree

-152.76392 rack rack1-nvme
-180.69098 host OSD1-nvme
 600.69098 osd.60 up  1.0  
1.0
-210.69098 host OSD2-nvme
2430.69098 osd.243up  1.0  
1.0
-240.69098 host OSD3-NGN1-nvme
4260.69098 osd.426up  1.0  
1.0
-1 5456.27734 root default
-12 2182.51099 rack rack1-sata
 -2  545.62775 host OSD1
  09.09380 osd.0  up  1.0  
1.0
  19.09380 osd.1  up  1.0  
1.0
  29.09380 osd.2  up  1.0  
1.0
  39.09380 osd.3  up  1.0  
1.0
-2  545.62775 host OSD2
  09.09380 osd.0  up  1.0  
1.0
  19.09380 osd.1  up  1.0  
1.0
  29.09380 osd.2  up  1.0  
1.0
  39.09380 osd.3  up  1.0  
1.0
-2  545.62775 host OSD2
  09.09380 osd.0  up  1.0  
1.0
  19.09380 osd.1  up  1.0  
1.0
  29.09380 osd.2  up  1.0  
1.0
  39.09380 osd.3  up  1.0  
1.0


--
Deepak



From: David Turner [mailto:drakonst...@gmail.com]
Sent: Friday, June 30, 2017 6:36 PM
To: Deepak Naidu; ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] 300 active+undersized+degraded+remapped


ceph status
ceph osd tree

Is your meta pool on ssds instead of the same root and osds as the rest of the 
cluster?

On Fri, Jun 30, 2017, 9:29 PM Deepak Naidu 
<dna...@nvidia.com<mailto:dna...@nvidia.com>> wrote:
Hello,

Re: [ceph-users] 300 active+undersized+degraded+remapped

2017-06-30 Thread Deepak Naidu
OK, so it looks like this is Ceph crushmap behavior:
http://docs.ceph.com/docs/master/rados/operations/crush-map/

--
Deepak

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Deepak 
Naidu
Sent: Friday, June 30, 2017 7:06 PM
To: David Turner; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] 300 active+undersized+degraded+remapped

OK, I fixed the issue. But this is very weird. But will list them so its easy 
for other to check when there is similar issue.


1)  I had create rack aware osd tree

2)  I have SATA OSD’s and NVME OSD

3)  I created rack aware policy for both SATA and NVME OSD

4)  NVME OSD was used for CEPH FS Meta

5)  Recently: When I tried reboot of OSD node, it seemed that my journal 
volumes which were on NVME didn’t startup bcos of the UDEV rules and I had to 
create startup script to fix them.

6)  With that. I had rebooted all the OSD one by one monitoring the ceph 
status.

7)  I was at the 3rd last node, then I notice the pgstuck warning. Not sure 
when and what happened, but I started getting this PG stuck issue(which is 
listed in my original email)

8)  I wasted time to look at the issue/error, but then I found the pool 
100% used issue.

9)  Now when I tried ceph osd tree. It looks like my NVME OSD’s went back 
to the host level OSD’s rather than the newly created/mapped NVME rack level. 
Ie no OSD’s under nvme-host name. This was the issue.

10)   Luckily I had created the backup of compiled version. I imported them in 
crushmap rule and now pool status is OK.

But, my question is how did ceph re-map the CRUSH rule ?

I had to create “new host entry” for NVME in crushmap ie

host OSD1-nvme  -- This is just dummy entry in crushmap ie it 
doesn’t resolve to any hostname
host OSD1  -- This is the actual hostname and resolves 
to IP and has an hostname

Is that the issue ?

Current status

health HEALTH_OK
osdmap e5108: 610 osds: 610 up, 610 in
flags sortbitwise,require_jewel_osds
  pgmap v247114: 15450 pgs, 3 pools, 322 GB data, 86102 objects
1155 GB used, 5462 TB / 5463 TB avail
   15450 active+clean


Pool1  15   233M  0  1820T  
   3737
Pool2 16  00 1820T  
0
Pool Meta   17 34928k0   2357G  
  28


Partial list of my osd tree

-152.76392 rack rack1-nvme
-180.69098 host OSD1-nvme
 600.69098 osd.60 up  1.0  
1.0
-210.69098 host OSD2-nvme
2430.69098 osd.243up  1.0  
1.0
-240.69098 host OSD3-NGN1-nvme
4260.69098 osd.426up  1.0  
1.0
-1 5456.27734 root default
-12 2182.51099 rack rack1-sata
 -2  545.62775 host OSD1
  09.09380 osd.0  up  1.0  
1.0
  19.09380 osd.1  up  1.0  
1.0
  29.09380 osd.2  up  1.0  
1.0
  39.09380 osd.3  up  1.0  
1.0
-2  545.62775 host OSD2
  09.09380 osd.0  up  1.0  
1.0
  19.09380 osd.1  up  1.0  
1.0
  29.09380 osd.2  up  1.0  
1.0
  39.09380 osd.3  up  1.0  
1.0
-2  545.62775 host OSD2
  09.09380 osd.0  up  1.0  
1.0
  19.09380 osd.1  up  1.0  
1.0
  29.09380 osd.2  up  1.0  
1.0
  39.09380 osd.3  up  1.0  
1.0


--
Deepak



From: David Turner [mailto:drakonst...@gmail.com]
Sent: Friday, June 30, 2017 6:36 PM
To: Deepak Naidu; ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] 300 active+undersized+degraded+remapped


ceph status
ceph osd tree

Is your meta pool on ssds instead of the same root and osds as the rest of the 
cluster?

On Fri, Jun 30, 2017, 9:29 PM Deepak Naidu 
<dna...@nvidia.com<mailto:dna...@nvidia.com>> wrote:
Hello,

I am getting the below error and I am unable to get them resolved even after 
starting and stopping the OSD’s. All the OSD’s seems to be up.

How do I repair the OSD’s or fix them manually. I am using cephFS. But oddly 
the ceph df is showing 100% used(which is showing in KB). But the pool is 
1886G(with 3 copies). I can still write to the ceph FS without any issue. Not 
sure why is CEPH reporting the wrong info of 100% f

Re: [ceph-users] 300 active+undersized+degraded+remapped

2017-06-30 Thread Deepak Naidu
OK, I fixed the issue. But this is very weird. I will list the steps so it is easy
for others to check when there is a similar issue.


1)  I had created a rack-aware OSD tree.

2)  I have SATA OSDs and NVMe OSDs.

3)  I created a rack-aware policy for both the SATA and NVMe OSDs.

4)  The NVMe OSDs were used for the CephFS metadata pool.

5)  Recently, when I tried rebooting an OSD node, it seemed that my journal volumes,
which are on NVMe, didn't start up because of the udev rules, and I had to create a
startup script to fix them (a udev sketch for this follows the list).

6)  With that, I rebooted all the OSD nodes one by one, monitoring the ceph status.

7)  I was at the third-to-last node when I noticed the pg stuck warning. Not sure
when and what happened, but I started getting this PG stuck issue (which is listed
in my original email).

8)  I wasted time looking at the issue/error, but then I found the pool 100% used
issue.

9)  When I then ran ceph osd tree, it looked like my NVMe OSDs had gone back under
the host-level buckets rather than the newly created/mapped NVMe rack level, i.e.
there were no OSDs under the nvme host names. This was the issue.

10) Luckily I had created a backup of the compiled version. I imported it into the
crushmap and now the pool status is OK.
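
The udev sketch mentioned in step 5 -- example only; match it to your own journal
partition names so they come up owned by ceph:ceph after a reboot:

# /etc/udev/rules.d/99-ceph-nvme-journal.rules
KERNEL=="nvme0n1p*", SUBSYSTEM=="block", OWNER="ceph", GROUP="ceph", MODE="0660"

# reload rules without a reboot:
udevadm control --reload && udevadm trigger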

But my question is: how did Ceph re-map the CRUSH rule?

I had to create a "new host entry" for NVMe in the crushmap, i.e.

host OSD1-nvme  -- this is just a dummy entry in the crushmap, i.e. it doesn't
resolve to any hostname
host OSD1       -- this is the actual hostname and resolves to an IP

Is that the issue ?

Current status

health HEALTH_OK
osdmap e5108: 610 osds: 610 up, 610 in
flags sortbitwise,require_jewel_osds
  pgmap v247114: 15450 pgs, 3 pools, 322 GB data, 86102 objects
1155 GB used, 5462 TB / 5463 TB avail
   15450 active+clean


NAME       ID  USED    %USED  MAX AVAIL  OBJECTS
Pool1      15  233M    0      1820T      3737
Pool2      16  0       0      1820T      0
Pool Meta  17  34928k  0      2357G      28


Partial list of my osd tree

-15    2.76392 rack rack1-nvme
-18    0.69098     host OSD1-nvme
 60    0.69098         osd.60              up  1.00000  1.00000
-21    0.69098     host OSD2-nvme
243    0.69098         osd.243             up  1.00000  1.00000
-24    0.69098     host OSD3-NGN1-nvme
426    0.69098         osd.426             up  1.00000  1.00000
 -1 5456.27734 root default
-12 2182.51099     rack rack1-sata
 -2  545.62775         host OSD1
  0    9.09380             osd.0           up  1.00000  1.00000
  1    9.09380             osd.1           up  1.00000  1.00000
  2    9.09380             osd.2           up  1.00000  1.00000
  3    9.09380             osd.3           up  1.00000  1.00000
 -2  545.62775         host OSD2
  0    9.09380             osd.0           up  1.00000  1.00000
  1    9.09380             osd.1           up  1.00000  1.00000
  2    9.09380             osd.2           up  1.00000  1.00000
  3    9.09380             osd.3           up  1.00000  1.00000
 -2  545.62775         host OSD2
  0    9.09380             osd.0           up  1.00000  1.00000
  1    9.09380             osd.1           up  1.00000  1.00000
  2    9.09380             osd.2           up  1.00000  1.00000
  3    9.09380             osd.3           up  1.00000  1.00000


--
Deepak



From: David Turner [mailto:drakonst...@gmail.com]
Sent: Friday, June 30, 2017 6:36 PM
To: Deepak Naidu; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] 300 active+undersized+degraded+remapped


ceph status
ceph osd tree

Is your meta pool on ssds instead of the same root and osds as the rest of the 
cluster?

On Fri, Jun 30, 2017, 9:29 PM Deepak Naidu 
<dna...@nvidia.com<mailto:dna...@nvidia.com>> wrote:
Hello,

I am getting the below error and I am unable to get them resolved even after 
starting and stopping the OSD’s. All the OSD’s seems to be up.

How do I repair the OSD’s or fix them manually. I am using cephFS. But oddly 
the ceph df is showing 100% used(which is showing in KB). But the pool is 
1886G(with 3 copies). I can still write to the ceph FS without any issue. Not 
sure why is CEPH reporting the wrong info of 100% full


ceph version 10.2.7

 health HEALTH_WARN
300 pgs degraded
300 pgs stuck degraded
300 pgs stuck unclean
300 pgs stuck undersized
300 pgs undersized
recovery 28/19674 objects degraded (0.142%)
recovery 56/19674 objects misplaced (0.285%)



GLOBAL:
SIZE  AVAIL RAW USED %RAW US

[ceph-users] 300 active+undersized+degraded+remapped

2017-06-30 Thread Deepak Naidu
Hello,

I am getting the errors below and I am unable to get them resolved even after
starting and stopping the OSDs. All the OSDs seem to be up.

How do I repair the OSDs or fix them manually? I am using CephFS. Oddly, ceph df is
showing the metadata pool as 100% used (with usage shown in KB), but the pool is
1886G (with 3 copies). I can still write to the CephFS without any issue. Not sure
why Ceph is reporting the wrong info of 100% full.


ceph version 10.2.7

 health HEALTH_WARN
300 pgs degraded
300 pgs stuck degraded
300 pgs stuck unclean
300 pgs stuck undersized
300 pgs undersized
recovery 28/19674 objects degraded (0.142%)
recovery 56/19674 objects misplaced (0.285%)



GLOBAL:
SIZE   AVAIL  RAW USED  %RAW USED
5463T  5462T  187G      0
POOLS:
NAME      ID  USED    %USED   MAX AVAIL  OBJECTS
Pool1     15  233M    0       1820T      3737
Pool2     16  0       0       1820T      0
PoolMeta  17  34719k  100.00  0          28


Any help is appreciated

--
Deepak

---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-deploy , osd_journal_size and entire disk partiton for journal

2017-06-12 Thread Deepak Naidu
Thanks for the note; yes, I know them all. The journal SSD will be shared among 3-4
HDD OSD disks.

--
Deepak

On Jun 12, 2017, at 7:07 AM, David Turner 
<drakonst...@gmail.com<mailto:drakonst...@gmail.com>> wrote:

Why do you want a 70GB journal?  You linked to the documentation, so I'm 
assuming that you followed the formula stated to figure out how big your 
journal should be... "osd journal size = {2 * (expected throughput * filestore 
max sync interval)}".  I've never heard of a cluster that requires such a large 
journal size.  The default is there because it works for 99.999% of situations. 
 I actually can't think of a use case that would require a larger journal than 
10GB, especially on an SSD.  The vast majority of the time the space on the SSD 
is practically empty.  It doesn't fill up like a cache or anything.  It's just 
a place that writes happen quickly and then quickly flushes it to the disk.

Using 100% of your SSD size is also a bad idea based on how SSD's recover from 
unwritable sectors... they mark them as dead and move the data to an unused 
sector.  The manufacturer overprovisions the drive in the factory, but you can 
help out by not using 100% of your available size.  If you have a 70GB SSD and 
only use 5-10GB, then you will drastically increase the life of the SSD as a 
journal.

If you really want to get a 70GB journal partition, then stop the osd, flush 
the journal, set up the journal partition manually, and make sure that 
/var/lib/ceph/osd/ceph-##/journal is pointing to the proper journal before 
starting it back up.

Unless you REALLY NEED a 70GB journal partition... don't do it.
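
If you do go the manual route described above, a hedged sketch of the steps (osd.3 and
the by-partuuid path are examples):

systemctl stop ceph-osd@3
ceph-osd -i 3 --flush-journal
ln -sf /dev/disk/by-partuuid/<journal-partuuid> /var/lib/ceph/osd/ceph-3/journal
chown -h ceph:ceph /var/lib/ceph/osd/ceph-3/journal
ceph-osd -i 3 --mkjournal
systemctl start ceph-osd@3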

On Mon, Jun 12, 2017 at 1:07 AM Deepak Naidu 
<dna...@nvidia.com<mailto:dna...@nvidia.com>> wrote:
Hello folks,

I am trying to use an entire ssd partition for journal disk ie example 
/dev/sdf1 partition(70GB). But when I look up the osd config using below 
command I see ceph-deploy sets journal_size as 5GB. More confusing, I see the 
OSD logs showing the correct size in blocks in the /var/log/ceph/ceph-osd.x.log
So my question is, whether ceph is using the entire disk partition or just 
5GB(default value of ceph deploy) for my OSD journal ?

I know I can set per OSD or global OSD value for journal size in ceph.conf . I 
am using Jewel 10.2.7

ceph --admin-daemon /var/run/ceph/ceph-osd.3.asok config get osd_journal_size
{
"osd_journal_size": "5120"
}

I tried the below, but then "config get osd_journal_size" shows 0, which is what it
is set to, so I am even more confused.

http://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/
[inline screenshot of the osd journal size option from the docs page above]


Any info is appreciated.


PS: I search to find similar issue, but no response on that thread.

--
Deepak


This email message is for the sole use of the intended recipient(s) and may 
contain confidential information.  Any unauthorized review, use, disclosure or 
distribution is prohibited.  If you are not the intended recipient, please 
contact the sender by reply email and destroy all copies of the original 
message.

___
ceph-users mailing list
ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-deploy , osd_journal_size and entire disk partiton for journal

2017-06-11 Thread Deepak Naidu
Hello folks,

I am trying to use an entire SSD partition for the journal disk, e.g. the /dev/sdf1
partition (70GB). But when I look up the OSD config using the command below, I see
that ceph-deploy sets journal_size as 5GB. More confusing, I see the OSD log showing
the correct size in blocks in /var/log/ceph/ceph-osd.x.log.
So my question is whether Ceph is using the entire disk partition or just 5GB (the
ceph-deploy default) for my OSD journal?

I know I can set per OSD or global OSD value for journal size in ceph.conf . I 
am using Jewel 10.2.7

ceph --admin-daemon /var/run/ceph/ceph-osd.3.asok config get osd_journal_size
{
"osd_journal_size": "5120"
}

I tried the below, but then "config get osd_journal_size" shows 0, which is what it
is set to, so I am even more confused.

http://docs.ceph.com/docs/master/rados/configuration/osd-config-ref/
[inline screenshot of the osd journal size option from the docs page above]


Any info is appreciated.


PS: I search to find similar issue, but no response on that thread.

--
Deepak


---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD node type/count mixes in the cluster

2017-06-09 Thread Deepak Naidu
Thanks David for sharing your experience, appreciate it.

--
Deepak

From: David Turner [mailto:drakonst...@gmail.com]
Sent: Friday, June 09, 2017 5:38 AM
To: Deepak Naidu; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] OSD node type/count mixes in the cluster


I ran a cluster with 2 generations of the same vendor hardware. 24 osd 
supermicro and 32 osd supermicro (with faster/more RAM and CPU cores).  The 
cluster itself ran decently well, but the load difference was drastic between
the 2 types of nodes. It required me to run the cluster with 2 separate config 
files for each type of node and was an utter PITA when troubleshooting 
bottlenecks.

Ultimately I moved around hardware and created a legacy cluster on the old 
hardware and created a new cluster using the newer configuration.  In general 
it was very hard to diagnose certain bottlenecks due to everything just looking 
so different.  The primary one I encountered was snap trimming, due to deleting 
thousands of snapshots per day.

If you aren't pushing any limits of Ceph, you will probably be fine.  But if 
you have a really large cluster, use a lot of snapshots, or are pushing your 
cluster harder than the average user... Then I'd avoid mixing server 
configurations in a cluster.
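
If you do mix them, one quick way to keep an eye on how unevenly the two classes
of node end up loaded (just a sketch; "ceph osd df tree" needs a reasonably
recent release) is to compare per-host utilization and per-OSD latency:

# per-host / per-OSD utilization and CRUSH weights
ceph osd df tree

# rough per-OSD commit/apply latency
ceph osd perf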

On Fri, Jun 9, 2017, 1:36 AM Deepak Naidu 
<dna...@nvidia.com<mailto:dna...@nvidia.com>> wrote:
Wanted to check if anyone has a Ceph cluster with mixed-vendor servers, both with 
the same disk size (8TB) but different disk counts: for example, 10 OSD servers 
from Dell with 60 disks per server and another 10 OSD servers from HP with 26 
disks per server.

If so, does that change any performance dynamics, or is it not advisable?

--
Deepak
---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
___
ceph-users mailing list
ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] OSD node type/count mixes in the cluster

2017-06-08 Thread Deepak Naidu
Wanted to check if anyone has a Ceph cluster with mixed-vendor servers, both with 
the same disk size (8TB) but different disk counts: for example, 10 OSD servers 
from Dell with 60 disks per server and another 10 OSD servers from HP with 26 
disks per server.

If so, does that change any performance dynamics, or is it not advisable?

--
Deepak
---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Crushmap from Rack aware to Node aware

2017-06-01 Thread Deepak Naidu
>> If all 6 racks are tagged for Ceph storage nodes, I'd go ahead and just put 
>> the nodes in there now and configure the crush map accordingly
I just have 3 racks. That’s the max I have for now. 10 OSD Nodes.

--
Deepak

From: David Turner [mailto:drakonst...@gmail.com]
Sent: Thursday, June 01, 2017 2:05 PM
To: Deepak Naidu; ceph-users
Subject: Re: [ceph-users] Crushmap from Rack aware to Node aware

If all 6 racks are tagged for Ceph storage nodes, I'd go ahead and just put the 
nodes in there now and configure the crush map accordingly.  That way you can 
grow each of the racks while keeping each failure domain closer in size to the 
rest of the cluster.

On Thu, Jun 1, 2017 at 3:40 PM Deepak Naidu 
<dna...@nvidia.com<mailto:dna...@nvidia.com>> wrote:
Perfect, David, thanks for the detailed explanation. Appreciate it!

In my case I have 10 OSD servers with 60 disks each (ya I know…), i.e. 600 OSDs in 
total, and I have 3 racks to spare.

--
Deepak

From: David Turner [mailto:drakonst...@gmail.com<mailto:drakonst...@gmail.com>]
Sent: Thursday, June 01, 2017 12:23 PM
To: Deepak Naidu; ceph-users
Subject: Re: [ceph-users] Crushmap from Rack aware to Node aware

The way to do this is to download your crush map, modify it manually after 
decompiling it to text format or modify it using the crushtool.  Once you have 
your crush map with the rules in place that you want, you will upload the crush 
map to the cluster.  When you change your failure domain from host to rack, or 
any other change to failure domain, it will cause all of your PGs to peer at 
the same time.  You want to make sure that you have enough memory to handle 
this scenario.  After that point, your cluster will just backfill the PGs from 
where they currently are to their new location and then clean up after itself.  
It is recommended to monitor your cluster usage and modify osd_max_backfills 
during this process to optimize how fast you can finish your backfilling while 
keeping your cluster usable by the clients.
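
As a rough sketch of that workflow (file names are placeholders, and the rule
edit shown is just the usual host-to-rack change):

# grab and decompile the current crush map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# edit crushmap.txt, e.g. change the rule's failure domain:
#   step chooseleaf firstn 0 type host   ->   step chooseleaf firstn 0 type rack

# recompile and inject it
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new

# throttle (or later raise) backfill while the data moves, and watch progress
ceph tell osd.* injectargs '--osd_max_backfills 1'
ceph -s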

I generally recommend starting a cluster with at least n+2 failure domains so 
would recommend against going to a rack failure domain with only 3 racks.  As 
an alternative that I've done, I've set up 6 "racks" when I only have 3 racks 
with planned growth to a full 6 racks.  When I added servers and expanded to 
fill more racks, I moved the servers to where they are represented in the crush 
map.  So if it's physically in rack1 but it's set as rack4 in the crush map, 
then I would move those servers to the physical rack 4 and start filling out 
rack 1 and rack 4 to complete their capacity, then do the same for rack 2/5 
when I start into the 5th rack.

Another option to having full racks in your crush map is having half racks.  
I've also done this for clusters that wouldn't grow larger than 3 racks.  Have 
6 failure domains at half racks.  It lowers your chance of having random drives 
fail in different failure domains at the same time and gives you more servers 
that you can run maintenance on at a time over using a host failure domain.  It 
doesn't resolve the issue of using a single cross-link for the entire rack or a 
full power failure of the rack, but it's closer.

The problem with having 3 failure domains with replica 3 is that if you lose a 
complete failure domain, then you have nowhere for the 3rd replica to go.  If 
you have 4 failure domains with replica 3 and you lose an entire failure 
domain, then you over fill the remaining 3 failure domains and can only really 
use 55% of your cluster capacity.  If you have 5 failure domains, then you 
start normalizing and losing a failure domain doesn't impact as severely.  The 
more failure domains you get to, the less it affects you when you lose one.

Let's do another scenario with 3 failure domains and replica size 3.  Every OSD 
you lose inside of a failure domain gets backfilled directly onto the remaining 
OSDs in that failure domain.  There comes a point where a switch failure in a 
rack or losing a node in the rack could over-fill the remaining OSDs in that 
rack.  If you have enough servers and OSDs in the rack, then this becomes 
moot but if you have a smaller cluster with only 3 nodes and <4 drives in 
each... if you lose a drive in one of your nodes, then all of its data gets 
distributed to the other 3 drives in that node.  That means you either have to 
replace your storage ASAP when it fails or never fill your cluster up more than 
55% if you want to be able to automatically recover from a drive failure.

tl;dr . Make sure you calculate what your failure domain, replica size, drive 
size, etc means for how fast you have to replace storage when it fails and how 
full you can fill your cluster to afford a hardware loss.

On Thu, Jun 1, 2017 at 12:40 PM Deepak Naidu 
<dna...@nvidia.com<mailto:dna...@nvidia.com>> wrote:
Greetings Folks.

Wanted to understand how ceph works when we start with rack aware(rack level 
repli

Re: [ceph-users] Crushmap from Rack aware to Node aware

2017-06-01 Thread Deepak Naidu
Perfect, David, thanks for the detailed explanation. Appreciate it!

In my case I have 10 OSD servers with 60 disks each (ya I know…), i.e. 600 OSDs in 
total, and I have 3 racks to spare.

--
Deepak

From: David Turner [mailto:drakonst...@gmail.com]
Sent: Thursday, June 01, 2017 12:23 PM
To: Deepak Naidu; ceph-users
Subject: Re: [ceph-users] Crushmap from Rack aware to Node aware

The way to do this is to download your crush map, modify it manually after 
decompiling it to text format or modify it using the crushtool.  Once you have 
your crush map with the rules in place that you want, you will upload the crush 
map to the cluster.  When you change your failure domain from host to rack, or 
any other change to failure domain, it will cause all of your PGs to peer at 
the same time.  You want to make sure that you have enough memory to handle 
this scenario.  After that point, your cluster will just backfill the PGs from 
where they currently are to their new location and then clean up after itself.  
It is recommended to monitor your cluster usage and modify osd_max_backfills 
during this process to optimize how fast you can finish your backfilling while 
keeping your cluster usable by the clients.

I generally recommend starting a cluster with at least n+2 failure domains so 
would recommend against going to a rack failure domain with only 3 racks.  As 
an alternative that I've done, I've set up 6 "racks" when I only have 3 racks 
with planned growth to a full 6 racks.  When I added servers and expanded to 
fill more racks, I moved the servers to where they are represented in the crush 
map.  So if it's physically in rack1 but it's set as rack4 in the crush map, 
then I would move those servers to the physical rack 4 and start filling out 
rack 1 and rack 4 to complete their capacity, then do the same for rack 2/5 
when I start into the 5th rack.

Another option to having full racks in your crush map is having half racks.  
I've also done this for clusters that wouldn't grow larger than 3 racks.  Have 
6 failure domains at half racks.  It lowers your chance of having random drives 
fail in different failure domains at the same time and gives you more servers 
that you can run maintenance on at a time over using a host failure domain.  It 
doesn't resolve the issue of using a single cross-link for the entire rack or a 
full power failure of the rack, but it's closer.

The problem with having 3 failure domains with replica 3 is that if you lose a 
complete failure domain, then you have nowhere for the 3rd replica to go.  If 
you have 4 failure domains with replica 3 and you lose an entire failure 
domain, then you over fill the remaining 3 failure domains and can only really 
use 55% of your cluster capacity.  If you have 5 failure domains, then you 
start normalizing and losing a failure domain doesn't impact as severely.  The 
more failure domains you get to, the less it affects you when you lose one.

Let's do another scenario with 3 failure domains and replica size 3.  Every OSD 
you lose inside of a failure domain gets backfilled directly onto the remaining 
OSDs in that failure domain.  There comes a point where a switch failure in a 
rack or losing a node in the rack could over-fill the remaining OSDs in that 
rack.  If you have enough servers and OSDs in the rack, then this becomes 
moot but if you have a smaller cluster with only 3 nodes and <4 drives in 
each... if you lose a drive in one of your nodes, then all of its data gets 
distributed to the other 3 drives in that node.  That means you either have to 
replace your storage ASAP when it fails or never fill your cluster up more than 
55% if you want to be able to automatically recover from a drive failure.

tl;dr . Make sure you calculate what your failure domain, replica size, drive 
size, etc means for how fast you have to replace storage when it fails and how 
full you can fill your cluster to afford a hardware loss.

On Thu, Jun 1, 2017 at 12:40 PM Deepak Naidu 
<dna...@nvidia.com<mailto:dna...@nvidia.com>> wrote:
Greetings Folks.

Wanted to understand how Ceph behaves when we start with rack awareness (rack-level 
replicas), for example 3 racks and 3 replicas in the crushmap, and this is later 
replaced by node awareness (node-level replicas), i.e. 3 replicas spread across nodes.

This can also happen vice-versa. If this happens, how does Ceph rearrange the "old" 
data? Do I need to trigger any command to ensure the data placement follows the 
latest crushmap, or does Ceph take care of it automatically?

Thanks for your time.

--
Deepak

This email message is for the sole use of the intended recipient(s) and may 
contain confidential information.  Any unauthorized review, use, disclosure or 
distribution is prohibited.  If you are not the intended recipient, please 
contact the sender by reply email and destroy all copies of the original 
message.

___
cep

[ceph-users] Crushmap from Rack aware to Node aware

2017-06-01 Thread Deepak Naidu
Greetings Folks.

Wanted to understand how Ceph behaves when we start with rack awareness (rack-level 
replicas), for example 3 racks and 3 replicas in the crushmap, and this is later 
replaced by node awareness (node-level replicas), i.e. 3 replicas spread across nodes.

This can also happen vice-versa. If this happens, how does Ceph rearrange the "old" 
data? Do I need to trigger any command to ensure the data placement follows the 
latest crushmap, or does Ceph take care of it automatically?

Thanks for your time.

--
Deepak

---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multi-Tenancy: Network Isolation

2017-05-29 Thread Deepak Naidu
Thanks much Vlad and Dave for the suggestions, appreciate it!

--
Deepak

On May 29, 2017, at 1:04 AM, Дробышевский, Владимир 
<v...@itgorod.ru<mailto:v...@itgorod.ru>> wrote:

Hi, Deepak!

  The easiest way I can imagine is to use multiple VLANs, put all ceph host 
ports into every VLAN and use a wider subnet. For example, you can set 
192.168.0.0/16 for the public ceph network, use 192.168.0.1-254 IPs for the ceph 
hosts, 192.168.1.1-254/16 IPs for the first tenant, 192.168.2.1-254/16 for the 
second and so on. You'll have to make sure that no ceph host has any routing 
facilities running, and you then get a number of isolated L2 networks with a 
common part. Actually it's not a good way and leads to many errors (your tenants 
must carefully use the provided IPs and not cross into other tenants' IP space 
despite the /16 bitmask).
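
  In ceph.conf terms that first option is really just a wide public network; the 
subnets below are only the example values from above, and the cluster network 
line is optional, for a separate replication network if one is used:

[global]
public network  = 192.168.0.0/16
cluster network = 172.16.0.0/24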


  Another option is, like David said, an L3 routed network. In this case you 
will probably face network bandwidth problems: all your traffic will go 
through one interface. But if your switches have L3 functionality you can route 
packets there. And again, the problem would be bandwidth: switches usually 
don't have a lot of routing power, and routed bandwidth leaves a lot to be desired.


  And the craziest one :-). It is just a theory; I have never tried this in 
production or even in a lab.

  As with previous options you go with multiple per-tenant VLANs and ceph hosts 
ports in all of these VLANs.

  You need to choose a different network for public interfaces, for ex., 
10.0.0.0/24<http://10.0.0.0/24>. Then set loopback interface on each ceph host 
and attach a single unique IP to it, like 10.0.0.1/32<http://10.0.0.1/32>, 
10.0.0.2/32<http://10.0.0.2/32> and so on. Enable IP forwarding and start RIP 
routing daemon on each ceph host. Setup and configure ceph, use attached IP as 
MON IP.

  Create ceph VLAN with all ceph hosts and set a common network IP subnet (for 
ex, 172.16.0.0/24<http://172.16.0.0/24>), attach IP from this network to every 
ceph host. Check that you can reach any of the public (loopback) IPs from any 
ceph host.

  Now create multiple per-tenant VLANs and put ceph hosts ports into every one. 
Set isolated subnets for your tenant's networks, for example, 
192.168.0.0/23<http://192.168.0.0/23>, use 192.168.0.x IPs as the additional 
addresses for the ceph hosts, 192.168.1.x as tenant network. Start RIP routing 
daemon on every tenant host. Check that you can reach every ceph public IPs 
(10.0.0.x/32).

  I would also configure RIP daemon to advertise only 10.0.0.x/32 network on 
each ceph host and set RIP daemon on passive mode on client hosts. It's better 
to configure firewall on ceph hosts as well to prevent extra-subnets 
communications.

  In theory it should work but can't say much on how stable would it be.

Best regards,
Vladimir

2017-05-26 20:36 GMT+05:00 Deepak Naidu 
<dna...@nvidia.com<mailto:dna...@nvidia.com>>:
Hi Vlad,

Thanks for chiming in.

>>It's not clear what you want to achieve from the ceph point of view?
Multiple tenancy. We will have multiple tenants from different isolated 
subnet/network accessing single ceph cluster which can support multiple 
tenants. The only problem I see with ceph in a physical env setup is I cannot 
isolate public networks , example mon,mds for multiple subnet/network/tenants.

>>For example, for the network isolation you can use managed switches, set 
>>different VLANs and put ceph hosts to the every VLAN.
Yes, we have managed switches with VLANs. And if I add, for example, 2x public 
interfaces on Net1 (subnet 192.168.1.0/24) and Net2 (subnet 192.168.2.0/24), what 
does the ceph.conf look like? What does my mon and MDS server config look like? 
That's the challenge/question.

>>But it's a shoot in the dark as I don't know what exactly you need. For 
>>example, what services (block storage, object storage, API etc) you want to 
>>offer to your tenants and so on

CephFS and Object. I am familiar on how to get the ceph storage part "tenant 
friendly", it's just the network part I need to isolate.

--
Deepak

> On May 26, 2017, at 12:03 AM, Дробышевский, Владимир 
> <v...@itgorod.ru<mailto:v...@itgorod.ru>> wrote:
>
>   It's not clear what you want to achieve from the ceph point of view? For 
> example, for the network isolation you can use managed switches, set 
> different VLANs and put ceph hosts to the every VLAN. But it's a shoot in the 
> dark as I don't know what exactly you need. For example, what services (block 
> storage, object storage, API etc) you want to offer to your tenants and so on
---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized

Re: [ceph-users] Multi-Tenancy: Network Isolation

2017-05-28 Thread Deepak Naidu
Thanks David.

>>Every single one of the above needs to be able to access all of the mons and 
>>osds. I don't think you can have multiple subnets for this, 
Yes, hence this multi-tenancy question.

>>but you can do this via routing. Say your private osd network is 
>>xxx.xxx.10.0, your public ceph network is .11

I don't think routing is what I am looking for. I could even solve this using 
NAT to isolate tenants. What I am really looking for is multi-tenancy with 
network isolation that the application understands, i.e. either a virtual 
NIC/namespace per tenant, like virtual machines have, or physical isolation of 
the network with the ceph daemons inherently working with it.

--
Deepak

> On May 28, 2017, at 8:03 AM, David Turner  wrote:
> 
> Every single one of the above needs to be able to access all of the mons and 
> osds. I don't think you can have multiple subnets for this, but you can do 
> this via routing. Say your private osd network is xxx.xxx.10.0, your public 
> ceph network is .11
---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multi-Tenancy: Network Isolation

2017-05-26 Thread Deepak Naidu
Hi Vlad,

Thanks for chiming in.

>>It's not clear what you want to achieve from the ceph point of view? 
Multi-tenancy. We will have multiple tenants from different isolated 
subnets/networks accessing a single ceph cluster that can support multiple 
tenants. The only problem I see with ceph in a physical setup is that I cannot 
isolate the public networks (e.g. mon, mds) for multiple subnets/networks/tenants.

>>For example, for the network isolation you can use managed switches, set 
>>different VLANs and put ceph hosts to the every VLAN.
Yes, we have managed switches with VLANs. And if I add, for example, 2x public 
interfaces on Net1 (subnet 192.168.1.0/24) and Net2 (subnet 192.168.2.0/24), what 
does the ceph.conf look like? What does my mon and MDS server config look like? 
That's the challenge/question.

>>But it's a shoot in the dark as I don't know what exactly you need. For 
>>example, what services (block storage, object storage, API etc) you want to 
>>offer to your tenants and so on

CephFS and Object. I am familiar with how to get the ceph storage part "tenant 
friendly"; it's just the network part I need to isolate.

--
Deepak

> On May 26, 2017, at 12:03 AM, Дробышевский, Владимир  wrote:
> 
>   It's not clear what you want to achieve from the ceph point of view? For 
> example, for the network isolation you can use managed switches, set 
> different VLANs and put ceph hosts to the every VLAN. But it's a shoot in the 
> dark as I don't know what exactly you need. For example, what services (block 
> storage, object storage, API etc) you want to offer to your tenants and so on
---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Multi-Tenancy: Network Isolation

2017-05-25 Thread Deepak Naidu
I am trying to gather and understand how multi-tenancy can be, or has been, solved 
for network interfaces/isolation. I can run ceph in a virtualized environment and 
achieve the isolation, but my question is more about a physical ceph deployment.

Is there a way we can have multiple networks (public interfaces) dedicated to 
tenants, so that ceph can guarantee network isolation (as they will be on different 
subnets)? I am purely looking for network isolation on physical hardware with a 
single ceph cluster.

--
Deepak

---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] saving file on cephFS mount using vi takes pause/time

2017-04-13 Thread Deepak Naidu
Yes, vi creates a swap file and nano doesn't. But when I use fio to write, I 
don't see this happening.

--
Deepak

From: Chris Sarginson [mailto:csarg...@gmail.com]
Sent: Thursday, April 13, 2017 2:26 PM
To: Deepak Naidu; ceph-users
Subject: Re: [ceph-users] saving file on cephFS mount using vi takes pause/time

Is it related to the recovery behaviour of vim creating a swap file, which 
I think nano does not do?

http://vimdoc.sourceforge.net/htmldoc/recover.html

A sync into cephfs I think needs the write to get confirmed all the way down 
from the osds performing the write before it returns the confirmation to the 
client calling the sync, though I stand to be corrected on that.
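
One way to see whether fsync round-trips are really what the editor is waiting on
(just a sketch; the mount point and file name are examples) is to time the sync
calls directly, or to generate fsync-heavy small writes with fio:

# time spent inside each sync-family syscall while saving from vi
strace -f -T -e trace=fsync,fdatasync,sync vi /mnt/cephfs/testfile

# small synchronous writes with an fsync after every write
fio --name=fsync-test --directory=/mnt/cephfs --size=4m --bs=4k \
    --rw=write --fsync=1 --ioengine=sync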

On Thu, 13 Apr 2017 at 22:04 Deepak Naidu 
<dna...@nvidia.com<mailto:dna...@nvidia.com>> wrote:
Ok, I tried strace to check why vi slows or pauses. It seems to slow on fsync(3)

I didn’t see the issue with nano editor.

--
Deepak


From: Deepak Naidu
Sent: Wednesday, April 12, 2017 2:18 PM
To: 'ceph-users'
Subject: saving file on cephFS mount using vi takes pause/time

Folks,

This is bit weird issue. I am using the cephFS volume to read write files etc 
its quick less than seconds. But when editing a the file on cephFS volume using 
vi , when saving the file the save takes couple of seconds something like 
sync(flush). The same doesn’t happen on local filesystem.

Any pointers is appreciated.

--
Deepak

This email message is for the sole use of the intended recipient(s) and may 
contain confidential information.  Any unauthorized review, use, disclosure or 
distribution is prohibited.  If you are not the intended recipient, please 
contact the sender by reply email and destroy all copies of the original 
message.

___
ceph-users mailing list
ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] saving file on cephFS mount using vi takes pause/time

2017-04-13 Thread Deepak Naidu
Ok, I tried strace to check why vi slows or pauses. It seems to slow on fsync(3)

I didn't see the issue with nano editor.

--
Deepak


From: Deepak Naidu
Sent: Wednesday, April 12, 2017 2:18 PM
To: 'ceph-users'
Subject: saving file on cephFS mount using vi takes pause/time

Folks,

This is bit weird issue. I am using the cephFS volume to read write files etc 
its quick less than seconds. But when editing a the file on cephFS volume using 
vi , when saving the file the save takes couple of seconds something like 
sync(flush). The same doesn't happen on local filesystem.

Any pointers is appreciated.

--
Deepak

---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] saving file on cephFS mount using vi takes pause/time

2017-04-12 Thread Deepak Naidu
Folks,

This is a bit of a weird issue. I am using the cephFS volume to read and write 
files, and it is quick, taking less than a second. But when editing a file on the 
cephFS volume using vi, saving the file takes a couple of seconds, something like 
a sync (flush). The same doesn't happen on a local filesystem.

Any pointers is appreciated.

--
Deepak

---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph df space for rgw.buckets.data shows used even when files are deleted

2017-04-10 Thread Deepak Naidu
I still see the issue where the space is not getting freed. "gc process" works 
sometimes, but sometimes it does nothing to clean up, as there are no items 
in the GC list, yet the space is still used in the pool.

Any ideas on the ideal config for automatic deletion of these objects after 
the files are deleted? It is currently set to:

"rgw_gc_max_objs": "97",

--
Deepak

From: Deepak Naidu
Sent: Wednesday, April 05, 2017 2:56 PM
To: Ben Hines
Cc: ceph-users
Subject: RE: [ceph-users] ceph df space for rgw.buckets.data shows used even 
when files are deleted

Thanks Ben.

Is there a tuning param I can use to speed up the process?

"rgw_gc_max_objs": "32",
"rgw_gc_obj_min_wait": "7200",
"rgw_gc_processor_max_time": "3600",
"rgw_gc_processor_period": "3600",


--
Deepak



From: Ben Hines [mailto:bhi...@gmail.com]
Sent: Wednesday, April 05, 2017 2:41 PM
To: Deepak Naidu
Cc: ceph-users
Subject: Re: [ceph-users] ceph df space for rgw.buckets.data shows used even 
when files are deleted

Ceph's RadosGW uses garbage collection by default.

Try running 'radosgw-admin gc list' to list the objects to be garbage 
collected, or 'radosgw-admin gc process' to trigger them to be deleted.

-Ben

On Wed, Apr 5, 2017 at 12:15 PM, Deepak Naidu 
<dna...@nvidia.com<mailto:dna...@nvidia.com>> wrote:
Folks,

Trying to test the S3 object GW. When I try to upload any files the space is 
shown used(that’s normal behavior), but when the object is deleted it shows as 
used(don’t understand this).  Below example.

Currently there is no files in the entire S3 bucket, but it still shows space 
used. Any insight is appreciated.

ceph version 10.2.6

NAME                     ID USED   %USED MAX AVAIL OBJECTS
default.rgw.buckets.data 49 51200M 1.08  4598G     12800


--
Deepak

This email message is for the sole use of the intended recipient(s) and may 
contain confidential information.  Any unauthorized review, use, disclosure or 
distribution is prohibited.  If you are not the intended recipient, please 
contact the sender by reply email and destroy all copies of the original 
message.


___
ceph-users mailing list
ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] CephFS kernel driver is 10-15x slower than FUSE driver

2017-04-08 Thread Deepak Naidu
Hmm, pretty odd. When I ran tests of the ceph kernel client vs. ceph-fuse, FUSE 
was always slower for both reads and writes.

Try using a tool like fio to run an IO test with direct IO (bypassing the system cache).
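
Something like this (the file path and sizes are just examples) takes the page
cache out of the picture:

# sequential read with O_DIRECT against the cephfs kernel mount
fio --name=directread --filename=/mnt/ceph-kernel-driver-test/test.img \
    --rw=read --bs=4M --size=1G --direct=1 --ioengine=libaio --iodepth=4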

--
Deepak

On Apr 8, 2017, at 4:50 PM, Kyle Drake 
> wrote:

Pretty much says it all. 1GB test file copy to local:

$ time cp /mnt/ceph-kernel-driver-test/test.img .

real 2m50.063s
user 0m0.000s
sys 0m9.000s

$ time cp /mnt/ceph-fuse-test/test.img .

real 0m3.648s
user 0m0.000s
sys 0m1.872s

Yikes. The kernel driver averages ~5MB/s and the fuse driver averages ~150MB/s? 
Something crazy is happening here. It's not caching; I ran both tests fresh.

Ubuntu 16.04.2, 4.4.0-72-generic, ceph-fuse 10.2.6-1xenial, ceph-fs-common 
10.2.6-0ubuntu0.16.04.1 (I also tried the 16.04.2 one, same issue).

Anyone run into this? Did a lot of digging on the ML and didn't see anything. 
I was going to use FUSE for production, but it tends to lag more on a lot of 
small requests, so I had to fall back to the kernel driver.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph df space for rgw.buckets.data shows used even when files are deleted

2017-04-05 Thread Deepak Naidu
Thanks Ben.

Is there a tuning param I can use to speed up the process?

"rgw_gc_max_objs": "32",
"rgw_gc_obj_min_wait": "7200",
"rgw_gc_processor_max_time": "3600",
"rgw_gc_processor_period": "3600",


--
Deepak



From: Ben Hines [mailto:bhi...@gmail.com]
Sent: Wednesday, April 05, 2017 2:41 PM
To: Deepak Naidu
Cc: ceph-users
Subject: Re: [ceph-users] ceph df space for rgw.buckets.data shows used even 
when files are deleted

Ceph's RadosGW uses garbage collection by default.

Try running 'radosgw-admin gc list' to list the objects to be garbage 
collected, or 'radosgw-admin gc process' to trigger them to be deleted.

-Ben

On Wed, Apr 5, 2017 at 12:15 PM, Deepak Naidu 
<dna...@nvidia.com<mailto:dna...@nvidia.com>> wrote:
Folks,

Trying to test the S3 object GW. When I try to upload any files the space is 
shown used(that’s normal behavior), but when the object is deleted it shows as 
used(don’t understand this).  Below example.

Currently there is no files in the entire S3 bucket, but it still shows space 
used. Any insight is appreciated.

ceph version 10.2.6

NAME                     ID USED   %USED MAX AVAIL OBJECTS
default.rgw.buckets.data 49 51200M 1.08  4598G     12800


--
Deepak

This email message is for the sole use of the intended recipient(s) and may 
contain confidential information.  Any unauthorized review, use, disclosure or 
distribution is prohibited.  If you are not the intended recipient, please 
contact the sender by reply email and destroy all copies of the original 
message.


___
ceph-users mailing list
ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



[ceph-users] ceph df space for rgw.buckets.data shows used even when files are deleted

2017-04-05 Thread Deepak Naidu
Folks,

Trying to test the S3 object gateway. When I upload files, the space is shown as 
used (that's normal behavior), but when the objects are deleted it still shows as 
used (I don't understand this).  Example below.

Currently there are no files in the entire S3 bucket, but it still shows space 
used. Any insight is appreciated.

ceph version 10.2.6

NAME                     ID USED   %USED MAX AVAIL OBJECTS
default.rgw.buckets.data 49 51200M 1.08  4598G     12800


--
Deepak

---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to mount different ceph FS using ceph-fuse or kernel cephfs mount

2017-03-30 Thread Deepak Naidu
Hi John, any idea what's wrong? Any info is appreciated.

--
Deepak

-Original Message-
From: Deepak Naidu 
Sent: Thursday, March 23, 2017 2:20 PM
To: John Spray
Cc: ceph-users
Subject: RE: [ceph-users] How to mount different ceph FS using ceph-fuse or 
kernel cephfs mount

Fixing typo

>>>> What version of ceph-fuse?
ceph-fuse-10.2.6-0.el7.x86_64

--
Deepak

-Original Message-
From: Deepak Naidu
Sent: Thursday, March 23, 2017 9:49 AM
To: John Spray
Cc: ceph-users
Subject: Re: [ceph-users] How to mount different ceph FS using ceph-fuse or 
kernel cephfs mount

>> What version of ceph-fuse?

I have ceph-common-10.2.6-0 ( on CentOS 7.3.1611)
 
--
Deepak

>> On Mar 23, 2017, at 6:28 AM, John Spray <jsp...@redhat.com> wrote:
>> 
>> On Wed, Mar 22, 2017 at 3:30 PM, Deepak Naidu <dna...@nvidia.com> wrote:
>> Hi John,
>> 
>> 
>> 
>> I tried the below option for ceph-fuse & kernel mount. Below is what 
>> I see/error.
>> 
>> 
>> 
>> 1)  When trying using ceph-fuse, the mount command succeeds but I see
>> parse error setting 'client_mds_namespace' to 'dataX' .  Not sure if 
>> this is normal message or some error
> 
> What version of ceph-fuse?
> 
> John
> 
>> 
>> 2)  When trying the kernel mount, the mount command just hangs & after
>> few seconds I see mount error 5 = Input/output error. I am using 
>> 4.9.15-040915-generic kernel on Ubuntu 16.x
>> 
>> 
>> 
>> --
>> 
>> Deepak
>> 
>> 
>> 
>> -Original Message-
>> From: John Spray [mailto:jsp...@redhat.com]
>> Sent: Wednesday, March 22, 2017 6:16 AM
>> To: Deepak Naidu
>> Cc: ceph-users
>> Subject: Re: [ceph-users] How to mount different ceph FS using 
>> ceph-fuse or kernel cephfs mount
>> 
>> 
>> 
>>> On Tue, Mar 21, 2017 at 5:31 PM, Deepak Naidu <dna...@nvidia.com> wrote:
>>> 
>>> Greetings,
>> 
>> 
>> 
>> 
>>> I have below two cephFS "volumes/filesystem" created on my ceph
>> 
>>> cluster. Yes I used the "enable_multiple" flag to enable the 
>>> multiple
>> 
>>> cephFS feature. My question
>> 
>> 
>> 
>> 
>>> 1)  How do I mention the fs name ie dataX or data1 during cephFS mount
>> 
>>> either using kernel mount of ceph-fuse mount.
>> 
>> 
>> 
>> The option for ceph_fuse is --client_mds_namespace=dataX (you can do 
>> this on the command line or in your ceph.conf)
>> 
>> 
>> 
>> With the kernel client use "-o mds_namespace=DataX" (assuming you 
>> have a sufficiently recent kernel)
>> 
>> 
>> 
>> Cheers,
>> 
>> John
>> 
>> 
>> 
>> 
>>> 2)  When using kernel / ceph-fuse how do I mention dataX or data1
>>> during
>> 
>>> the fuse mount or kernel mount
>> 
>> 
>> 
>> 
>> 
>> 
>>> [root@Admin ~]# ceph fs ls
>> 
>> 
>>> name: dataX, metadata pool: rcpool_cepfsMeta, data pools:
>> 
>>> [rcpool_cepfsData ]
>> 
>> 
>>> name: data1, metadata pool: rcpool_cepfsMeta, data pools:
>> 
>>> [rcpool_cepfsData ]
>> 
>> 
>> 
>> 
>> 
>> 
>>> --
>> 
>> 
>>> Deepak
>> 
>> 
>>> 
>> 
>>> This email message is for the sole use of the intended recipient(s)
>> 
>>> and may contain confidential information.  Any unauthorized review,
>> 
>>> use, disclosure or distribution is prohibited.  If you are not the
>> 
>>> intended recipient, please contact the sender by reply email and
>> 
>>> destroy all copies of the original message.
>> 
>>> 
>> 
>> 
>>> ___
>> 
>>> ceph-users mailing list
>> 
>>> ceph-users@lists.ceph.com
>> 
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
>>> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] cephFS mounted on client shows space used -- when there is nothing used on the FS

2017-03-24 Thread Deepak Naidu
I have a cephFS cluster. Below is the df output from a client node.

The question is why the df command, when the filesystem is mounted using ceph-fuse 
or the kernel mount, shows "used space" when nothing is in use (empty: no files or 
directories).

[root@storage ~]# df -h
Filesystem               Size  Used  Avail  Use%  Mounted on
/dev/mapper/centos-root  45G   1.4G  44G    4%    /
devtmpfs                 28G   0     28G    0%    /dev
tmpfs                    28G   0     28G    0%    /dev/shm
tmpfs                    28G   17M   28G    1%    /run
tmpfs                    28G   0     28G    0%    /sys/fs/cgroup
/dev/xvda1               497M  168M  329M   34%   /boot
/dev/mapper/centos-home  22G   34M   22G    1%    /home
tmpfs                    5.5G  0     5.5G   0%    /run/user/0
ceph-fuse                4.7T  1.5G  4.7T   1%    /mnt/cephfs
[root@storage ~]#


[root@storage ~]# ls -larth /mnt/cephfs/
total 512
drwxr-xr-x. 3 root root 19 Mar 17 12:36 ..
drwxr-xr-x  1 root root  0 Mar 23 22:20 .
[root@storage ~]#


[root@storage ~]# du -shc /mnt/cephfs/
512 /mnt/cephfs/
512 total
[root@storage ~]#

--
Deepak

---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to mount different ceph FS using ceph-fuse or kernel cephfs mount

2017-03-23 Thread Deepak Naidu
Fixing typo

>>>> What version of ceph-fuse?
ceph-fuse-10.2.6-0.el7.x86_64

--
Deepak

-Original Message-
From: Deepak Naidu 
Sent: Thursday, March 23, 2017 9:49 AM
To: John Spray
Cc: ceph-users
Subject: Re: [ceph-users] How to mount different ceph FS using ceph-fuse or 
kernel cephfs mount

>> What version of ceph-fuse?

I have ceph-common-10.2.6-0 ( on CentOS 7.3.1611)
 
--
Deepak

>> On Mar 23, 2017, at 6:28 AM, John Spray <jsp...@redhat.com> wrote:
>> 
>> On Wed, Mar 22, 2017 at 3:30 PM, Deepak Naidu <dna...@nvidia.com> wrote:
>> Hi John,
>> 
>> 
>> 
>> I tried the below option for ceph-fuse & kernel mount. Below is what 
>> I see/error.
>> 
>> 
>> 
>> 1)  When trying using ceph-fuse, the mount command succeeds but I see
>> parse error setting 'client_mds_namespace' to 'dataX' .  Not sure if 
>> this is normal message or some error
> 
> What version of ceph-fuse?
> 
> John
> 
>> 
>> 2)  When trying the kernel mount, the mount command just hangs & after
>> few seconds I see mount error 5 = Input/output error. I am using 
>> 4.9.15-040915-generic kernel on Ubuntu 16.x
>> 
>> 
>> 
>> --
>> 
>> Deepak
>> 
>> 
>> 
>> -Original Message-
>> From: John Spray [mailto:jsp...@redhat.com]
>> Sent: Wednesday, March 22, 2017 6:16 AM
>> To: Deepak Naidu
>> Cc: ceph-users
>> Subject: Re: [ceph-users] How to mount different ceph FS using 
>> ceph-fuse or kernel cephfs mount
>> 
>> 
>> 
>>> On Tue, Mar 21, 2017 at 5:31 PM, Deepak Naidu <dna...@nvidia.com> wrote:
>>> 
>>> Greetings,
>> 
>> 
>> 
>> 
>>> I have below two cephFS "volumes/filesystem" created on my ceph
>> 
>>> cluster. Yes I used the "enable_multiple" flag to enable the 
>>> multiple
>> 
>>> cephFS feature. My question
>> 
>> 
>> 
>> 
>>> 1)  How do I mention the fs name ie dataX or data1 during cephFS mount
>> 
>>> either using kernel mount of ceph-fuse mount.
>> 
>> 
>> 
>> The option for ceph_fuse is --client_mds_namespace=dataX (you can do 
>> this on the command line or in your ceph.conf)
>> 
>> 
>> 
>> With the kernel client use "-o mds_namespace=DataX" (assuming you 
>> have a sufficiently recent kernel)
>> 
>> 
>> 
>> Cheers,
>> 
>> John
>> 
>> 
>> 
>> 
>>> 2)  When using kernel / ceph-fuse how do I mention dataX or data1
>>> during
>> 
>>> the fuse mount or kernel mount
>> 
>> 
>> 
>> 
>> 
>> 
>>> [root@Admin ~]# ceph fs ls
>> 
>> 
>>> name: dataX, metadata pool: rcpool_cepfsMeta, data pools:
>> 
>>> [rcpool_cepfsData ]
>> 
>> 
>>> name: data1, metadata pool: rcpool_cepfsMeta, data pools:
>> 
>>> [rcpool_cepfsData ]
>> 
>> 
>> 
>> 
>> 
>> 
>>> --
>> 
>> 
>>> Deepak
>> 
>> 
>>> 
>> 
>>> This email message is for the sole use of the intended recipient(s)
>> 
>>> and may contain confidential information.  Any unauthorized review,
>> 
>>> use, disclosure or distribution is prohibited.  If you are not the
>> 
>>> intended recipient, please contact the sender by reply email and
>> 
>>> destroy all copies of the original message.
>> 
>>> 
>> 
>> 
>>> ___
>> 
>>> ceph-users mailing list
>> 
>>> ceph-users@lists.ceph.com
>> 
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
>>> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] The performance of ceph with RDMA

2017-03-23 Thread Deepak Naidu
RDMA is of interest to me. So my below comment.

>> What surprised me is that the result of RDMA mode is almost the same as the 
>> basic mode, the iops, latency, throughput, etc.

Pardon my knowledge here, but if I read your ceph.conf and your notes correctly, it 
seems that you are using RDMA only for the "cluster/private network"? If so, how do 
you expect RDMA to improve client IOPS/latency/throughput?


--
Deepak


From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Haomai 
Wang
Sent: Thursday, March 23, 2017 4:34 AM
To: Hung-Wei Chiu (邱宏瑋)
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] The performance of ceph with RDMA



On Thu, Mar 23, 2017 at 5:49 AM, Hung-Wei Chiu (邱宏瑋) 
> wrote:
Hi,

I used the latest code (master branch, updated 2017/03/22) to build ceph with RDMA 
and used fio to test its iops/latency/throughput.

In my environment, I setup 3 hosts and list the detail of each host below.

OS: ubuntu 16.04
Storage: SSD * 4 (256G * 4)
Memory: 64GB.
NICs: two NICs, one (intel 1G) for public network and the other (mellanox 10G) 
for private network.

There are 3 monitors and 24 OSDs equally distributed across the 3 hosts, which 
means each host contains 1 mon and 8 OSDs.

For my experiment, I use two configs, basic and RDMA.

Basic
[global]
 fsid = 
0612cc7e-6239-456c-978b-b4df781fe831
mon initial members = ceph-1,ceph-2,ceph-3
mon host = 10.0.0.15,10.0.0.16,10.0.0.17
osd pool default size = 2
osd pool default pg num = 1024
osd pool default pgp num = 1024


RDMA
[global]
 fsid = 
0612cc7e-6239-456c-978b-b4df781fe831
mon initial members = ceph-1,ceph-2,ceph-3
mon host = 10.0.0.15,10.0.0.16,10.0.0.17
osd pool default size = 2
osd pool default pg num = 1024
osd pool default pgp num = 1024
ms_type=async+rdma
ms_async_rdma_device_name = mlx4_0


What surprised me is that the result of RDMA mode is almost the same as the 
basic mode, the iops, latency, throughput, etc.
I also tried different fio parameter patterns, such as the read/write ratio and 
random vs. sequential operations.
All the results are the same.

Yes, most of the latency comes from other components now, although we still want 
to avoid the extra copy on the rdma side.

So the current rdma backend only means it can be a choice compared to the tcp/ip 
network; more of the benefit needs to come from the other components.


In order to figure out what's going on, I did the following steps.

1. Follow this article (https://community.mellanox.com/docs/DOC-2086) to make 
sure my RDMA environment.
2. To make sure the network traffic is transmitted over RDMA, I dumped the traffic 
on the private network, and the answer is yes, it uses RDMA.
3. Modify the ms_async_rdma_buffer_size to (256 << 10), no change.
4. Modfiy the ms_async_rdma_send_buffers to 2048, no change.
5. Modify the ms_async_rdma_receive_buffers to 2048, no change.

After the above operations, I guess maybe my Ceph environment is not set up in a 
way that lets RDMA improve the performance.

Does anyone know what kind of ceph environment (replica size, # of OSDs, # of 
mons, etc.) is good for RDMA?

Thanks in advanced.



Best Regards,

Hung-Wei Chiu(邱宏瑋)
--
Computer Center, Department of Computer Science
National Chiao Tung University

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to mount different ceph FS using ceph-fuse or kernel cephfs mount

2017-03-23 Thread Deepak Naidu
>> What version of ceph-fuse?

I have ceph-common-10.2.6-0 ( on CentOS 7.3.1611)
 
--
Deepak

>> On Mar 23, 2017, at 6:28 AM, John Spray <jsp...@redhat.com> wrote:
>> 
>> On Wed, Mar 22, 2017 at 3:30 PM, Deepak Naidu <dna...@nvidia.com> wrote:
>> Hi John,
>> 
>> 
>> 
>> I tried the below option for ceph-fuse & kernel mount. Below is what I
>> see/error.
>> 
>> 
>> 
>> 1)  When trying using ceph-fuse, the mount command succeeds but I see
>> parse error setting 'client_mds_namespace' to 'dataX' .  Not sure if this is
>> normal message or some error
> 
> What version of ceph-fuse?
> 
> John
> 
>> 
>> 2)  When trying the kernel mount, the mount command just hangs & after
>> few seconds I see mount error 5 = Input/output error. I am using
>> 4.9.15-040915-generic kernel on Ubuntu 16.x
>> 
>> 
>> 
>> --
>> 
>> Deepak
>> 
>> 
>> 
>> -Original Message-
>> From: John Spray [mailto:jsp...@redhat.com]
>> Sent: Wednesday, March 22, 2017 6:16 AM
>> To: Deepak Naidu
>> Cc: ceph-users
>> Subject: Re: [ceph-users] How to mount different ceph FS using ceph-fuse or
>> kernel cephfs mount
>> 
>> 
>> 
>>> On Tue, Mar 21, 2017 at 5:31 PM, Deepak Naidu <dna...@nvidia.com> wrote:
>>> 
>>> Greetings,
>> 
>> 
>> 
>> 
>>> I have below two cephFS “volumes/filesystem” created on my ceph
>> 
>>> cluster. Yes I used the “enable_multiple” flag to enable the multiple
>> 
>>> cephFS feature. My question
>> 
>> 
>> 
>> 
>>> 1)  How do I mention the fs name ie dataX or data1 during cephFS mount
>> 
>>> either using kernel mount of ceph-fuse mount.
>> 
>> 
>> 
>> The option for ceph_fuse is --client_mds_namespace=dataX (you can do this on
>> the command line or in your ceph.conf)
>> 
>> 
>> 
>> With the kernel client use "-o mds_namespace=DataX" (assuming you have a
>> sufficiently recent kernel)
>> 
>> 
>> 
>> Cheers,
>> 
>> John
>> 
>> 
>> 
>> 
>>> 2)  When using kernel / ceph-fuse how do I mention dataX or data1
>>> during
>> 
>>> the fuse mount or kernel mount
>> 
>> 
>> 
>> 
>> 
>> 
>>> [root@Admin ~]# ceph fs ls
>> 
>> 
>>> name: dataX, metadata pool: rcpool_cepfsMeta, data pools:
>> 
>>> [rcpool_cepfsData ]
>> 
>> 
>>> name: data1, metadata pool: rcpool_cepfsMeta, data pools:
>> 
>>> [rcpool_cepfsData ]
>> 
>> 
>> 
>> 
>> 
>> 
>>> --
>> 
>> 
>>> Deepak
>> 
>> 
>>> 
>> 
>>> This email message is for the sole use of the intended recipient(s)
>> 
>>> and may contain confidential information.  Any unauthorized review,
>> 
>>> use, disclosure or distribution is prohibited.  If you are not the
>> 
>>> intended recipient, please contact the sender by reply email and
>> 
>>> destroy all copies of the original message.
>> 
>>> 
>> 
>> 
>>> ___
>> 
>>> ceph-users mailing list
>> 
>>> ceph-users@lists.ceph.com
>> 
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
>>> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to mount different ceph FS using ceph-fuse or kernel cephfs mount

2017-03-22 Thread Deepak Naidu
Hi John,



I tried the below option for ceph-fuse & kernel mount. Below is what I 
see/error.



1)  When trying ceph-fuse, the mount command succeeds but I see "parse error 
setting 'client_mds_namespace' to 'dataX'".  Not sure if this is a normal 
message or some error.

2)  When trying the kernel mount, the mount command just hangs, and after a few 
seconds I see "mount error 5 = Input/output error". I am using the 
4.9.15-040915-generic kernel on Ubuntu 16.x.
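
For reference, the two invocations being discussed look roughly like this (mount
points, monitor address and secret-file path are placeholders):

# ceph-fuse, selecting the "dataX" filesystem
ceph-fuse --client_mds_namespace=dataX /mnt/dataX

# kernel client (needs a kernel recent enough to support mds_namespace)
mount -t ceph mon1:6789:/ /mnt/dataX \
      -o name=admin,secretfile=/etc/ceph/admin.secret,mds_namespace=dataX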



--

Deepak



-Original Message-
From: John Spray [mailto:jsp...@redhat.com]
Sent: Wednesday, March 22, 2017 6:16 AM
To: Deepak Naidu
Cc: ceph-users
Subject: Re: [ceph-users] How to mount different ceph FS using ceph-fuse or 
kernel cephfs mount



On Tue, Mar 21, 2017 at 5:31 PM, Deepak Naidu 
<dna...@nvidia.com<mailto:dna...@nvidia.com>> wrote:

> Greetings,

>

>

>

> I have below two cephFS “volumes/filesystem” created on my ceph

> cluster. Yes I used the “enable_multiple” flag to enable the multiple

> cephFS feature. My question

>

>

>

> 1)  How do I mention the fs name ie dataX or data1 during cephFS mount

> either using kernel mount of ceph-fuse mount.



The option for ceph_fuse is --client_mds_namespace=dataX (you can do this on 
the command line or in your ceph.conf)



With the kernel client use "-o mds_namespace=DataX" (assuming you have a 
sufficiently recent kernel)



Cheers,

John



>

> 2)  When using kernel / ceph-fuse how do I mention dataX or data1 during

> the fuse mount or kernel mount

>

>

>

>

>

> [root@Admin ~]# ceph fs ls

>

> name: dataX, metadata pool: rcpool_cepfsMeta, data pools:

> [rcpool_cepfsData ]

>

> name: data1, metadata pool: rcpool_cepfsMeta, data pools:

> [rcpool_cepfsData ]

>

>

>

>

>

> --

>

> Deepak

>

> 

> This email message is for the sole use of the intended recipient(s)

> and may contain confidential information.  Any unauthorized review,

> use, disclosure or distribution is prohibited.  If you are not the

> intended recipient, please contact the sender by reply email and

> destroy all copies of the original message.

> 

>

> ___

> ceph-users mailing list

> ceph-users@lists.ceph.com<mailto:ceph-users@lists.ceph.com>

> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Do we know which version of ceph-client has this fix ? http://tracker.ceph.com/issues/17191

2017-03-21 Thread Deepak Naidu
Thanks Brad

--
Deepak

> On Mar 21, 2017, at 9:31 PM, Brad Hubbard <bhubb...@redhat.com> wrote:
> 
>> On Wed, Mar 22, 2017 at 10:55 AM, Deepak Naidu <dna...@nvidia.com> wrote:
>> Do we know which version of ceph client does this bug has a fix. Bug:
>> http://tracker.ceph.com/issues/17191
>> 
>> 
>> 
>> I have ceph-common-10.2.6-0 ( on CentOS 7.3.1611) & ceph-fs-common-
>> 10.2.6-1(Ubuntu 14.04.5)
> 
> ceph-client is the repository for the ceph kernel client (kernel modules).
> 
> The commits referenced in the tracker above went into upstream kernel 4.9-rc1.
> 
> https://lkml.org/lkml/2016/10/8/110
> 
> I doubt these are available in any CentOS 7.x kernel yet but you could
> check the source.
> 
>> 
>> 
>> 
>> --
>> 
>> Deepak
>> 
>> 
>> This email message is for the sole use of the intended recipient(s) and may
>> contain confidential information.  Any unauthorized review, use, disclosure
>> or distribution is prohibited.  If you are not the intended recipient,
>> please contact the sender by reply email and destroy all copies of the
>> original message.
>> 
>> 
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> 
> 
> 
> 
> -- 
> Cheers,
> Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Do we know which version of ceph-client has this fix ? http://tracker.ceph.com/issues/17191

2017-03-21 Thread Deepak Naidu
Do we know which version of the ceph client has a fix for this bug? Bug: 
http://tracker.ceph.com/issues/17191

I have ceph-common-10.2.6-0 ( on CentOS 7.3.1611) & ceph-fs-common- 
10.2.6-1(Ubuntu 14.04.5)

--
Deepak

---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to mount different ceph FS using ceph-fuse or kernel cephfs mount

2017-03-21 Thread Deepak Naidu
Greetings,

I have the below two cephFS "volumes/filesystems" created on my ceph cluster. Yes, 
I used the "enable_multiple" flag to enable the multiple-filesystems feature. My 
questions:


1)  How do I specify the fs name, i.e. dataX or data1, during the cephFS mount, 
either using the kernel mount or the ceph-fuse mount?

2)  When using the kernel client / ceph-fuse, how do I specify dataX or data1 
during the fuse mount or kernel mount?


[root@Admin ~]# ceph fs ls
name: dataX, metadata pool: rcpool_cepfsMeta, data pools: [rcpool_cepfsData ]
name: data1, metadata pool: rcpool_cepfsMeta, data pools: [rcpool_cepfsData ]


--
Deepak

---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS mount shows the entire cluster size as apposed to custom-cephfs-pool-size

2017-03-16 Thread Deepak Naidu
Not sure if this is still true with Jewel CephFS, i.e.:

"cephfs does not support any type of quota, df always reports entire cluster 
size."

https://www.spinics.net/lists/ceph-users/msg05623.html
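
If I remember right, Jewel CephFS does have per-directory quotas via xattrs, but
they are only enforced by ceph-fuse clients (not the kernel client at that time)
and may need "client quota = true" on the fuse client. A sketch, with the path
and size as examples:

# limit /mnt/cephfs/tenant1 to 100 GiB (enforced client-side by ceph-fuse)
setfattr -n ceph.quota.max_bytes -v 107374182400 /mnt/cephfs/tenant1

# read it back
getfattr -n ceph.quota.max_bytes /mnt/cephfs/tenant1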

--
Deepak

From: Deepak Naidu
Sent: Thursday, March 16, 2017 6:19 PM
To: 'ceph-users'
Subject: CephFS mount shows the entire cluster size as apposed to 
custom-cephfs-pool-size

Greetings,

I am trying to build a CephFS system. Currently I have created my crush map, which 
uses only certain OSDs, and I have pools created from them. But when I mount the 
cephFS, the mount size is my entire ceph cluster size; how is that?


Ceph cluster & pools

[ceph-admin@storageAdmin ~]$ ceph df
GLOBAL:
SIZE  AVAIL RAW USED %RAW USED
4722G 4721G 928M  0.02
POOLS:
NAME             ID USED %USED MAX AVAIL OBJECTS
ecpool_disk1     22 0    0     1199G     0
rcpool_disk2     24 0    0     1499G     0
rcpool_cepfsMeta 25 4420 0     76682M    20


CephFS volume & pool

Here data0 is the volume/filesystem name
rcpool_cepfsMeta - is the meta-data pool
rcpool_disk2 - is the data pool

[ceph-admin@storageAdmin ~]$ ceph fs ls
name: data0, metadata pool: rcpool_cepfsMeta, data pools: [rcpool_disk2 ]


Command to mount CephFS
sudo mount -t ceph mon1:6789:/ /mnt/cephfs/ -o 
name=admin,secretfile=admin.secret


Client host df -h output
192.168.1.101:6789:/ 4.7T  928M  4.7T   1% /mnt/cephfs



--
Deepak







[ceph-users] CephFS mount shows the entire cluster size as opposed to custom-cephfs-pool-size

2017-03-16 Thread Deepak Naidu
Greetings,

I am trying to build a CephFS system. I have created a CRUSH map that uses only
certain OSDs, and I have pools created from them. But when I mount CephFS, the
mount size is my entire Ceph cluster size. How is that?


Ceph cluster & pools

[ceph-admin@storageAdmin ~]$ ceph df
GLOBAL:
    SIZE      AVAIL     RAW USED     %RAW USED
    4722G     4721G     928M         0.02
POOLS:
    NAME                 ID     USED     %USED     MAX AVAIL     OBJECTS
    ecpool_disk1         22     0        0         1199G         0
    rcpool_disk2         24     0        0         1499G         0
    rcpool_cepfsMeta     25     4420     0         76682M        20


CephFS volume & pool

Here data0 is the volume/filesystem name,
rcpool_cepfsMeta is the metadata pool, and
rcpool_disk2 is the data pool.

[ceph-admin@storageAdmin ~]$ ceph fs ls
name: data0, metadata pool: rcpool_cepfsMeta, data pools: [rcpool_disk2 ]


Command to mount CephFS:
sudo mount -t ceph mon1:6789:/ /mnt/cephfs/ -o name=admin,secretfile=admin.secret


Client host df -h output
192.168.1.101:6789:/ 4.7T  928M  4.7T   1% /mnt/cephfs



--
Deepak







Re: [ceph-users] Creating Ceph Pools on different OSDs -- crushmap?

2017-03-15 Thread Deepak Naidu
OK, I found this tutorial on CRUSH maps from Sébastien Han. Hopefully I can get my
structure accomplished using the CRUSH map.

https://www.sebastien-han.fr/blog/2012/12/07/ceph-2-speed-storage-with-crush/
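
For the archives, here is the rough shape of what I am after based on that tutorial; the bucket/host names and rule IDs below are illustrative, and on pre-Luminous releases the pool setting is crush_ruleset rather than crush_rule:

# ceph osd crush add-bucket pool0-root root                      # one CRUSH root per pool
# ceph osd crush add-bucket pool1-root root
# ceph osd crush move node1 root=pool0-root                      # repeat for the hosts carrying OSD.0-29
# ceph osd crush move node11 root=pool1-root                     # repeat for the hosts carrying OSD.30-79
# ceph osd crush rule create-simple pool0-rule pool0-root host   # replicate across hosts under pool0-root
# ceph osd crush rule create-simple pool1-rule pool1-root host
# ceph osd pool set Pool0 crush_ruleset 1                        # rule IDs here are illustrative
# ceph osd pool set Pool1 crush_ruleset 2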

--
Deepak

From: Deepak Naidu
Sent: Wednesday, March 15, 2017 12:45 PM
To: ceph-users
Subject: Creating Ceph Pools on different OSDs -- crushmap?

Hello,

I am trying to address the failure domain and performance/isolation of pools
based on which OSDs they belong to. Let me give an example. Can I achieve this
with a CRUSH map ruleset or any other method, and if so, how?

Example:
10 storage servers with 3 OSDs each, i.e. OSD.0 through OSD.29 -- belong to
Pool0. This can be a replicated pool or an EC pool.

Similarly,

10 storage servers with 5 OSDs each, i.e. OSD.30 through OSD.79 -- belong to
Pool1. This can be a replicated pool or an EC pool.


Thanks for any info.

--
Deepak








[ceph-users] Creating Ceph Pools on different OSDs -- crushmap?

2017-03-15 Thread Deepak Naidu
Hello,

I am trying to address the failure domain and performance/isolation of pools
based on which OSDs they belong to. Let me give an example. Can I achieve this
with a CRUSH map ruleset or any other method, and if so, how?

Example:
10 storage servers with 3 OSDs each, i.e. OSD.0 through OSD.29 -- belong to
Pool0. This can be a replicated pool or an EC pool.

Similarly,

10 storage servers with 5 OSDs each, i.e. OSD.30 through OSD.79 -- belong to
Pool1. This can be a replicated pool or an EC pool.


Thanks for any info.

--
Deepak







Re: [ceph-users] Ceph-deploy and git.ceph.com

2017-03-15 Thread Deepak Naidu
>> because Jewel will be retired:
Hmm. Isn't Jewel an LTS?

Every other stable release is an LTS (Long Term Stable) and will receive
updates until two LTS are published.

--
Deepak

> On Mar 15, 2017, at 10:09 AM, Shinobu Kinjo  wrote:
> 
> It may be a bit of a challenge, but please consider Kraken (or
> later) because Jewel will be retired:
> 
> http://docs.ceph.com/docs/master/releases/
> 
>> On Thu, Mar 16, 2017 at 1:48 AM, Shain Miley  wrote:
>> No this is a production cluster that I have not had a chance to upgrade yet.
>> 
>> We had an issue with the OS on a node, so I am just trying to reinstall ceph and
>> hope that the OSD data is still intact.
>> 
>> Once I get things stable again I was planning on upgrading…but the upgrade
>> is a bit intensive by the looks of it so I need to set aside a decent amount
>> of time.
>> 
>> Thanks all!
>> 
>> Shain
>> 
>> On Mar 15, 2017, at 12:38 PM, Vasu Kulkarni  wrote:
>> 
>> Just curious, why do you still want to deploy new Hammer instead of stable
>> Jewel? Is this a test environment? The last .10 release was basically
>> bug fixes for 0.94.9.
>> 
>> 
>> 
>>> On Wed, Mar 15, 2017 at 9:16 AM, Shinobu Kinjo  wrote:
>>> 
>>> FYI:
>>> https://plus.google.com/+Cephstorage/posts/HuCaTi7Egg3
>>> 
 On Thu, Mar 16, 2017 at 1:05 AM, Shain Miley  wrote:
 Hello,
 I am trying to deploy ceph to a new server using ceph-deploy, which I have
 done in the past many times without issue.
 
 Right now I am seeing a timeout trying to connect to git.ceph.com:
 
 
 [hqosd6][INFO  ] Running command: env DEBIAN_FRONTEND=noninteractive
 apt-get
 -q install --assume-yes ca-certificates
 [hqosd6][DEBUG ] Reading package lists...
 [hqosd6][DEBUG ] Building dependency tree...
 [hqosd6][DEBUG ] Reading state information...
 [hqosd6][DEBUG ] ca-certificates is already the newest version.
 [hqosd6][DEBUG ] 0 upgraded, 0 newly installed, 0 to remove and 3 not
 upgraded.
 [hqosd6][INFO  ] Running command: wget -O release.asc
 https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
 [hqosd6][WARNIN] --2017-03-15 11:49:16--
 https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
 [hqosd6][WARNIN] Resolving ceph.com (ceph.com)... 158.69.68.141
 [hqosd6][WARNIN] Connecting to ceph.com (ceph.com)|158.69.68.141|:443...
 connected.
 [hqosd6][WARNIN] HTTP request sent, awaiting response... 301 Moved
 Permanently
 [hqosd6][WARNIN] Location:
 https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
 [following]
 [hqosd6][WARNIN] --2017-03-15 11:49:17--
 https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
 [hqosd6][WARNIN] Resolving git.ceph.com (git.ceph.com)... 8.43.84.132
 [hqosd6][WARNIN] Connecting to git.ceph.com
 (git.ceph.com)|8.43.84.132|:443... failed: Connection timed out.
 [hqosd6][WARNIN] Retrying.
 [hqosd6][WARNIN]
 [hqosd6][WARNIN] --2017-03-15 11:51:25--  (try: 2)
 https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
 [hqosd6][WARNIN] Connecting to git.ceph.com
 (git.ceph.com)|8.43.84.132|:443... failed: Connection timed out.
 [hqosd6][WARNIN] Retrying.
 [hqosd6][WARNIN]
 [hqosd6][WARNIN] --2017-03-15 11:53:34--  (try: 3)
 https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc
 [hqosd6][WARNIN] Connecting to git.ceph.com
 (git.ceph.com)|8.43.84.132|:443... failed: Connection timed out.
 [hqosd6][WARNIN] Retrying.
 
 
 I am wondering if this is a known issue.
 
 Just an FYI: I am using an older version of ceph-deploy (1.5.36) because in
 the past, after upgrading to a newer version, I was not able to install Hammer on
 the cluster…so the workaround was to use a slightly older version.
 
 Thanks in advance for any help you may be able to provide.
 
 Shain
 
 

Re: [ceph-users] Ceph-deploy and git.ceph.com

2017-03-15 Thread Deepak Naidu
I had a similar issue when using an older version of ceph-deploy. I see the URL
git.ceph.com doesn't work in a browser either.

To resolve this, I installed the latest version of ceph-deploy and it worked
fine. The new version wasn't using git.ceph.com.

With ceph-deploy you can specify which Ceph release you want, for example Jewel,
as shown below.
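
A quick sketch of that (the host name is borrowed from the thread above and is illustrative; check ceph-deploy install --help on your version for the exact flag):

# ceph-deploy install --release jewel hqosd6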


--
Deepak

> On Mar 15, 2017, at 9:06 AM, Shain Miley  wrote:
> 
> s


Re: [ceph-users] MySQL and ceph volumes

2017-03-07 Thread Deepak Naidu
I hope you ran iostat at a 1-minute interval. Based on your iostat and disk info:


· avgrq-sz is showing 750.49 and avgqu-sz is showing 17.39.

· 750.49 sectors x 512 bytes/sector ≈ 375.245 KB, so that is your average request (block) size.

· That said, your disk is showing a queue length of 17.39. Typically, a higher queue
length will increase your disk IO wait, whether it is reads or writes.

Hope you have the picture of your IO now & hope this info helps.

>> I tried fio with a 64k block size and various IO depths (1, 2, 4, 8, 16, ..., 128) and
>> I can’t reproduce the problem.
Try a block size of approx. 375 KB and a queue depth of 32, and see what your iostat
looks like; if it’s the same, then that’s what your disk can do.

Now, if you want to compare Ceph RBD performance, do the same on a normal block device.
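
A rough sketch of such a fio run, aiming for ~375 KB requests at queue depth 32 (the file path and size are illustrative, and libaio is assumed to be available):

fio --name=mysql-like --filename=/mnt/cephvol/fio.dat --size=10G \
    --rw=randwrite --bs=384k --iodepth=32 --ioengine=libaio --direct=1 \
    --runtime=60 --time_based --group_reporting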

--
Deepak



From: Matteo Dacrema [mailto:mdacr...@enter.eu]
Sent: Tuesday, March 07, 2017 1:17 PM
To: Deepak Naidu
Cc: ceph-users
Subject: Re: [ceph-users] MySQL and ceph volumes

Hi Deepak,

thank you.

Here an example of iostat

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   5.160.002.64   15.740.00   76.45

Device:  rrqm/s  wrqm/s    r/s     w/s    rkB/s      wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
vda        0.00    0.00   0.00    0.00     0.00       0.00      0.00      0.00   0.00     0.00     0.00   0.00   0.00
vdb        0.00    1.00  96.00  292.00  4944.00  140652.00    750.49     17.39  43.89    17.79    52.47   2.58 100.00

vdb is the ceph volumes with xfs fs.


Disk /dev/vdb: 2199.0 GB, 2199023255552 bytes
255 heads, 63 sectors/track, 267349 cylinders, total 4294967296 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x

   Device Boot  Start End  Blocks   Id  System
/dev/vdb1   1  4294967295  2147483647+  ee  GPT

Regards
Matteo

On 07 Mar 2017, at 22:08, Deepak Naidu
<dna...@nvidia.com<mailto:dna...@nvidia.com>> wrote:

My response is without any specific context of Ceph or any SDS; it is purely about how
to check for the IO bottleneck. You can then determine whether it is Ceph, some other
process, or the disk.

>> MySQL can reach only 150 iops both read or writes showing 30% of IOwait.
Lower IOPS is not an issue by itself, as your block size might be larger; whether
MySQL is actually issuing larger blocks, I am not sure. You can check the iostat
metrics below to see why the IO wait is higher.

*  avgqu-sz (average queue length) --> the higher the queue length, the higher the IO
wait.
*  avgrq-sz (the average request size, in sectors) --> shows the IO block size (check
this when running MySQL). [Convert it from 512-byte sectors into KB; don’t just use
the raw avgrq-sz number.]


--
Deepak



-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Matteo 
Dacrema
Sent: Tuesday, March 07, 2017 12:52 PM
To: ceph-users
Subject: [ceph-users] MySQL and ceph volumes

Hi All,

I have a galera cluster running on openstack with data on ceph volumes capped 
at 1500 iops for read and write ( 3000 total ).
I can’t understand why, with fio, I can reach 1500 IOPS without IO wait, while MySQL
can reach only 150 IOPS for both reads and writes, showing 30% IO wait.

I tried fio with a 64k block size and various IO depths (1, 2, 4, 8, 16, ..., 128) and
I can’t reproduce the problem.

Anyone can tell me where I’m wrong?

Thank you
Regards
Matteo



Re: [ceph-users] MySQL and ceph volumes

2017-03-07 Thread Deepak Naidu
My response is without any specific context of Ceph or any SDS; it is purely about how
to check for the IO bottleneck. You can then determine whether it is Ceph, some other
process, or the disk.



>> MySQL can reach only 150 iops both read or writes showing 30% of IOwait.

Lower IOPS is not an issue by itself, as your block size might be larger; whether
MySQL is actually issuing larger blocks, I am not sure. You can check the iostat
metrics below to see why the IO wait is higher.



*  avgqu-sz (average queue length) --> the higher the queue length, the higher the IO
wait.

*  avgrq-sz (the average request size, in sectors) --> shows the IO block size (check
this when running MySQL). [Convert it from 512-byte sectors into KB; don’t just use
the raw avgrq-sz number.]





--

Deepak







-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Matteo 
Dacrema
Sent: Tuesday, March 07, 2017 12:52 PM
To: ceph-users
Subject: [ceph-users] MySQL and ceph volumes



Hi All,



I have a galera cluster running on openstack with data on ceph volumes capped 
at 1500 iops for read and write ( 3000 total ).

I can’t understand why, with fio, I can reach 1500 IOPS without IO wait, while MySQL
can reach only 150 IOPS for both reads and writes, showing 30% IO wait.



I tried fio with a 64k block size and various IO depths (1, 2, 4, 8, 16, ..., 128) and
I can’t reproduce the problem.



Anyone can tell me where I’m wrong?



Thank you

Regards

Matteo





[ceph-users] Bluestore zetascale vs rocksdb

2017-02-13 Thread Deepak Naidu
Folks,

Has anyone been using BlueStore with CephFS? If so, did you test with ZetaScale vs
RocksDB? Any install steps/best practices are appreciated.

PS: I still see that BlueStore is an "experimental feature". Is there any timeline for
when it will be GA/stable?
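
For context, on a throwaway test cluster of this era the enablement looked roughly like the ceph.conf snippet below; the option names are my best recollection for Jewel/Kraken and should be checked against the release notes before use:

[global]
enable experimental unrecoverable data corrupting features = bluestore rocksdb

[osd]
osd objectstore = bluestore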

--
Deepak

