Re: [ceph-users] Any CEPH's iSCSI gateway users?

2019-06-11 Thread Glen Baars
Interesting performance increase! I'm running iSCSI at a few installations and now 
wonder what version of CentOS is required to get that improvement. Did the 
cluster go from Luminous to Mimic?

Glen

-Original Message-
From: ceph-users  On Behalf Of Heðin 
Ejdesgaard Møller
Sent: Saturday, 8 June 2019 8:00 AM
To: Paul Emmerich ; Igor Podlesny 
Cc: Ceph Users 
Subject: Re: [ceph-users] Any CEPH's iSCSI gateway users?


I recently upgraded a RHCS-3.0 cluster with 4 iGWs to RHCS-3.2 on top of 
RHEL-7.6.
Large-block performance went from ~350 MB/s to about 1100 MB/s per LUN, seen 
from a VM in vSphere 6.5, with data read from an SSD pool and written to an 
HDD pool, both 3/2 replica.
I have not experienced any hiccups since the upgrade.
You will always have some degree of performance hit when using the iGW, because 
it is both an extra layer between consumer and hardware and a potential 
choke point, just like any "traditional" iSCSI-based SAN solution.

If you are considering deploying the iGW on the upstream bits, then I would 
recommend sticking to CentOS, since a lot of its development has happened 
on the RHEL platform.

Regards
Heðin Ejdesgaard
Synack sp/f

On frí, 2019-06-07 at 12:44 +0200, Paul Emmerich wrote:
> Hi,
>
> ceph-iscsi 3.0 fixes a lot of problems and limitations of the older gateway.
>
> Best way to run it on Debian/Ubuntu is to build it yourself
>
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at
> https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
>
> On Tue, May 28, 2019 at 10:02 AM Igor Podlesny  wrote:
> > What is your experience?
> > Does it make sense to use it -- is it solid enough or beta quality
> > rather (both in terms of stability and performance)?
> >
> > I've read it was more or less packaged to work with RHEL. Does it
> > hold true still?
> > What's the best way to install it on, say, CentOS or Debian/Ubuntu?
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
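
(For completeness: building the upstream ceph-iscsi bits from source, as Paul 
suggests, is roughly the outline below. This is untested; the repository and 
package names are assumptions, so check each project's README.)

# rough outline only -- package names are guesses
apt-get install -y git python3-pip librados-dev librbd-dev
git clone https://github.com/open-iscsi/tcmu-runner.git   # LIO user-space RBD handler, build per its README
git clone https://github.com/open-iscsi/rtslib-fb.git     # python LIO bindings
git clone https://github.com/ceph/ceph-iscsi.git          # rbd-target-api / rbd-target-gw / gwcli
cd ceph-iscsi
python3 setup.py install
# the .service files ship in the repo; copy them to /lib/systemd/system, then:
systemctl daemon-reload && systemctl enable rbd-target-gw rbd-target-api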

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Learning rig, is it a good idea?

2019-06-11 Thread Inkatadoc
Hi all!

I'm thinking about building a learning rig for ceph. This is the parts list:

PCPartPicker Part List: https://pcpartpicker.com/list/s4vHXP

TL;DR: 8-core 3 GHz Ryzen CPU, 64 GB RAM, 5 x 2 TB HDDs, and one 240 GB SSD in a
tower case.

My plan is to build a KVM-based setup, both for ceph and workload testing.

These are the usage scenarios I want to achieve:

1. object storage. Have a web browser store data (photos + json metadata)
over the network.
2. posix block storage: mount (ro/rw) those photos to VMs
3. nfs / samba sharing: mount (ro/rw) those photos to external laptops
(osx, linux and windows)
4. bootable block storage: Provision a VM with a bootable disk
5. shared disks: provide an additional disk to a couple of VMs (one in ro,
one rw)
6. grow / shrink the usable storage by adding / removing disks and/or
partitions
7. test for HA by faulting / disabling disks, power cycling the machine.

All this is in preparation for a bigger (0.5 PB+) project. We need to PoC
this and see if it can be done with a reasonable amount of work.
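
For my own notes, the scenarios above seem to map to roughly the following
commands once a cluster is up (just a sketch; pool/image names are made up and
some commands are Nautilus-only):

# 1. object storage: start a radosgw daemon and talk S3/Swift to it
#    (its pools are created automatically on first use)
# 2./4. block storage: create an RBD image and attach it to a VM
ceph osd pool create rbd 64
rbd pool init rbd
rbd create rbd/photos --size 100G
rbd map rbd/photos                      # or attach via the qemu/libvirt rbd driver
# 3. NFS/Samba: mount CephFS on a gateway VM and re-export it
ceph fs volume create learnfs           # Nautilus+; older releases create the pools + fs by hand
mount -t ceph mon-host:/ /mnt/learnfs -o name=admin,secretfile=/etc/ceph/admin.secret
# 6. grow/shrink: add and remove OSDs, then watch the rebalance
ceph-volume lvm create --data /dev/vdX
ceph osd out 3 && ceph osd purge 3 --yes-i-really-mean-it
# 7. HA tests: set noout before planned power cycles so data isn't shuffled needlessly
ceph osd set noout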

Some questions about all this: Is this rig too much / too little? Should I
scrap the idea and try to do all this in a public cloud somewhere? Do you
think all those scenarios are possible in a single box or do I definitely
need more physical boxes? I've done small software RAID / LVM installs in
the past and some simple BTRFS desktop installations as of late so I'm a
complete ceph n00b.

Any ideas & comments will be greatly appreciated.

Thanks!
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Large OMAP object in RGW GC pool

2019-06-11 Thread J. Eric Ivancich
Hi Wido,

Interleaving below

On 6/11/19 3:10 AM, Wido den Hollander wrote:
> 
> I thought it was resolved, but it isn't.
> 
> I counted all the OMAP values for the GC objects and I got back:
> 
> gc.0: 0
> gc.11: 0
> gc.14: 0
> gc.15: 0
> gc.16: 0
> gc.18: 0
> gc.19: 0
> gc.1: 0
> gc.20: 0
> gc.21: 0
> gc.22: 0
> gc.23: 0
> gc.24: 0
> gc.25: 0
> gc.27: 0
> gc.29: 0
> gc.2: 0
> gc.30: 0
> gc.3: 0
> gc.4: 0
> gc.5: 0
> gc.6: 0
> gc.7: 0
> gc.8: 0
> gc.9: 0
> gc.13: 110996
> gc.10: 04
> gc.26: 42
> gc.28: 111292
> gc.17: 111314
> gc.12: 111534
> gc.31: 111956

Casey Bodley mentioned to me that he's seen similar behavior to what
you're describing when RGWs are upgraded but not all OSDs are upgraded
as well. Is it possible that the OSDs hosting gc.13, gc.10, and so forth
are running a different version of ceph?
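
If it helps, the per-object key counts and the per-daemon versions can be
gathered roughly like this (the GC objects normally live in the zone's log pool
under the "gc" namespace; adjust the pool/namespace to your zone configuration):

# count omap entries per RGW GC shard object
for i in $(seq 0 31); do
    echo -n "gc.$i: "
    rados -p default.rgw.log --namespace gc listomapkeys gc.$i | wc -l
done
# check whether all daemons run the same release (Luminous+)
ceph versions
# or per OSD:
ceph osd metadata 12 | grep ceph_version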

Eric

-- 
J. Eric Ivancich
he/him/his
Red Hat Storage
Ann Arbor, Michigan, USA
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] limitations to using iscsi rbd-target-api directly in lieu of gwcli

2019-06-11 Thread Jason Dillaman
On Tue, Jun 11, 2019 at 10:24 AM Wesley Dillingham
 wrote:
>
> Thanks Jason for the info! A few questions:
>
> "The current rbd-target-api doesn't really support single path LUNs."
>
> In our testing, using single path LUNs, listing the "owner" of a given LUN 
> and then connecting directly to that gateway yielded stable and 
> well-performing results, obviously, there was a SPOF, however for this use 
> case, that is acceptable (not a root fs of a vm, etc) If a SPOF is acceptable 
> is there a particular reason that single path would not be agreeable?

I should clarify: rbd-target-api will configure multiple paths to each
LUN regardless. If you only use the single active path, I guess that's
OK.

> "It currently doesn't have any RBAC style security so I would be weary
> about exposing the current REST API to arbitrary users since you would
> give them full access to do anything"
>
> This is also somewhat of a concern but this is a cluster for a single client 
> who already has full ability to manipulate storage on the legacy system and 
> have been okay. Was planning on network segregating the API so only the given 
> client could communicate with it and also having the gateways run a cephx 
> with permissions only to a particular pool (rbd) and implementing a backup 
> system to offload daily snapshots to a different pool or cluster client does 
> not have capabilities on.
>
> The dashboard feature looks very promising however client would need to 
> interact programmatically, I do intend on experimenting with giving them 
> iscsi role in the nautilus dashboard. I poked at that a bit and am having 
> some trouble getting the dashboard working with iscsi, wondering if the issue 
> is obvious to you:

Fair enough, that would be just using yet another REST API on top of
the other REST API.

> (running 14.2.0 and ceph-iscsi-3.0-57.g4ae)
>
> and configuring the dash as follows:
>
> ceph dashboard set-iscsi-api-ssl-verification false
> ceph dashboard iscsi-gateway-add http://admin:admin@${MY_HOSTNAME}:5000
> systemctl restart ceph-mgr@${MY_HOSTNAME_SHORT}.service
>
> in the dash block/iscsi/target shows:
>
> Unsupported `ceph-iscsi` config version. Expected 8 but found 9.
>

You will need this PR [1] to bump the version support in the
dashboard. It should have been backported to Nautilus as part of
v14.2.2.

> Thanks again.
>
>
>
>
>
>
>
> 
> From: Jason Dillaman 
> Sent: Tuesday, June 11, 2019 9:37 AM
> To: Wesley Dillingham
> Cc: ceph-users@lists.ceph.com
> Subject: Re: [ceph-users] limitations to using iscsi rbd-target-api directly 
> in lieu of gwcli
>
> Notice: This email is from an external sender.
>
>
>
> On Tue, Jun 11, 2019 at 9:29 AM Wesley Dillingham
>  wrote:
> >
> > Hello,
> >
> > I am hoping to expose a REST API to a remote client group who would like to 
> > do things like:
> >
> >
> > Create, List, and Delete RBDs and map them to gateway (make a LUN)
> > Create snapshots, list, delete, and rollback
> > Determine Owner / Active gateway of a given lun
>
> It currently doesn't have any RBAC style security so I would be weary
> about exposing the current REST API to arbitrary users since you would
> give them full access to do anything. The Ceph dashboard in Nautilus
> (and also improved in the master branch) has lots of hooks to
> configure LUNs via the rbd-target-api REST API as another alternative
> to look at.
>
> > I would run 2-4 nodes running rbd-target-gw and rbd-target-api however 
> > client wishes to not use multi-path, wants to connect directly and only to 
> > active gateway for that lun
>
> The current rbd-target-api doesn't really support single path LUNs.
>
> > In order to prevent re-inventing the wheel I was hoping to simply expose 
> > the rbd-target-api directly to client but am wondering if this is 
> > appropriate.
> >
> > My concern is that I am taking gwcli out off the picture by using 
> > rbd-target-api directly and am wondering if the rbd-target-api on its own 
> > is able to propagate changes in the config up to the RADOS configuration 
> > object and thus keep all the gateways in sync.
>
> gwcli just uses rbd-target-api to do the work, and rbd-target-api is
> responsible for keeping the gateways in-sync with each other.
>
> > My other thought was to build a simple and limited in scope api which on 
> > the backend runs gwcli commands.
> >
> > Thank you for clarification on the functionality and appropriate use.
> >
> > Respectfully,
> >
> > Wes Dillingham
> > wdilling...@godaddy.com
> > Site Reliability Engineer IV - Platform Storage / Ceph
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Jason

[1] https://github.com/ceph/ceph/pull/27448

-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] limitations to using iscsi rbd-target-api directly in lieu of gwcli

2019-06-11 Thread Paul Emmerich
On Tue, Jun 11, 2019 at 4:24 PM Wesley Dillingham 
wrote:

> (running 14.2.0 and ceph-iscsi-3.0-57.g4ae)
>
> and configuring the dash as follows:
>
> ceph dashboard set-iscsi-api-ssl-verification false
> ceph dashboard iscsi-gateway-add http://admin:admin@${MY_HOSTNAME}:5000
> systemctl restart ceph-mgr@${MY_HOSTNAME_SHORT}.service
>
> in the dash block/iscsi/target shows:
>
> Unsupported `ceph-iscsi` config version. Expected 8 but found 9.
>
>
> Thanks again.
>

this means your ceph-iscsi version is too new.

There isn't an officially supported way to downgrade it, though, so the
best way is to wait for this version to be supported in the dashboard.

(Or well, the config version 9 change is rather trivial, you can undo it
manually in gateway.conf:
https://github.com/ceph/ceph-iscsi/commit/2d2f26ac54149a0c0d598e0640bcfd75adb69432
)
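
Very roughly, and only with rbd-target-api/rbd-target-gw stopped on every
gateway, that manual edit would look like this (assuming the default rbd pool
and the gateway.conf object name; keep a backup):

rados -p rbd get gateway.conf /tmp/gateway.conf
cp /tmp/gateway.conf /tmp/gateway.conf.bak      # backup first
vi /tmp/gateway.conf                            # revert the "version" field / the changed keys
rados -p rbd put gateway.conf /tmp/gateway.conf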


Paul


>
>
>
>
>
>
>
> --
> *From:* Jason Dillaman 
> *Sent:* Tuesday, June 11, 2019 9:37 AM
> *To:* Wesley Dillingham
> *Cc:* ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] limitations to using iscsi rbd-target-api
> directly in lieu of gwcli
>
> Notice: This email is from an external sender.
>
>
>
> On Tue, Jun 11, 2019 at 9:29 AM Wesley Dillingham
>  wrote:
> >
> > Hello,
> >
> > I am hoping to expose a REST API to a remote client group who would like
> to do things like:
> >
> >
> > Create, List, and Delete RBDs and map them to gateway (make a LUN)
> > Create snapshots, list, delete, and rollback
> > Determine Owner / Active gateway of a given lun
>
> It currently doesn't have any RBAC style security so I would be weary
> about exposing the current REST API to arbitrary users since you would
> give them full access to do anything. The Ceph dashboard in Nautilus
> (and also improved in the master branch) has lots of hooks to
> configure LUNs via the rbd-target-api REST API as another alternative
> to look at.
>
> > I would run 2-4 nodes running rbd-target-gw and rbd-target-api however
> client wishes to not use multi-path, wants to connect directly and only to
> active gateway for that lun
>
> The current rbd-target-api doesn't really support single path LUNs.
>
> > In order to prevent re-inventing the wheel I was hoping to simply expose
> the rbd-target-api directly to client but am wondering if this is
> appropriate.
> >
> > My concern is that I am taking gwcli out off the picture by using
> rbd-target-api directly and am wondering if the rbd-target-api on its own
> is able to propagate changes in the config up to the RADOS configuration
> object and thus keep all the gateways in sync.
>
> gwcli just uses rbd-target-api to do the work, and rbd-target-api is
> responsible for keeping the gateways in-sync with each other.
>
> > My other thought was to build a simple and limited in scope api which on
> the backend runs gwcli commands.
> >
> > Thank you for clarification on the functionality and appropriate use.
> >
> > Respectfully,
> >
> > Wes Dillingham
> > wdilling...@godaddy.com
> > Site Reliability Engineer IV - Platform Storage / Ceph
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Jason
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] is rgw crypt default encryption key long term supported ?

2019-06-11 Thread Casey Bodley
The server side encryption features all require special x-amz headers on 
write, so they only apply to our S3 apis. But objects encrypted with 
SSE-KMS (or a default encryption key) can be read without any x-amz 
headers, so swift should be able to decrypt them too. I agree that this 
is a bug and opened http://tracker.ceph.com/issues/40257.
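
For anyone who wants to reproduce this, the setup under test is roughly the
following; the key, bucket and client names here are placeholders rather than
Francois' actual values:

# generate a 256-bit key and base64-encode it (placeholder, do not reuse)
openssl rand -base64 32
# ceph.conf on the RGW host (client section name is an example):
#   [client.rgw.myhost]
#   rgw crypt default encryption key = <base64 key from above>
# then write via S3 and read back via swift against the same object:
s3cmd put ./photo.jpg s3://testbucket/photo.jpg
swift download testbucket photo.jpg    # fails with an md5/etag mismatch, per the report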


On 6/7/19 7:03 AM, Scheurer François wrote:

Hello Casey


We found something weird during our testing of the 
rgw_crypt_default_encryption_key="xxx" parameter.

s3cmd behaves as expected:
s3cmd always writes encrypted objects
s3cmd can read encrypted and unencrypted objects

but swift does not support the encryption:
swift can read only unencrypted objects (encrypted objects return the error md5sum 
!= etag)
swift does not use encryption during writes (to demonstrate this, we can remove the 
rgw_crypt_default_encryption_key param and verify that the object is still 
readable).


Is that a bug?

Thank you .


Cheers
Francois



From: Scheurer François
Sent: Wednesday, May 29, 2019 9:28 AM
To: Casey Bodley; ceph-users@lists.ceph.com
Subject: Re: is rgw crypt default encryption key long term supported ?

Hello Casey


Thank you for your reply.
To close this subject, one last question.

Do you know if it is possible to rotate the key defined by 
"rgw_crypt_default_encryption_key=" ?


Best Regards
Francois Scheurer




From: Casey Bodley 
Sent: Tuesday, May 28, 2019 5:37 PM
To: Scheurer François; ceph-users@lists.ceph.com
Subject: Re: is rgw crypt default encryption key long term supported ?

On 5/28/19 11:17 AM, Scheurer François wrote:

Hi Casey


I greatly appreciate your quick and helpful answer :-)



It's unlikely that we'll do that, but if we do it would be announced with a 
long deprecation period and migration strategy.

Fine, just the answer we wanted to hear ;-)



However, I would still caution against using either as a strategy for
key management, especially when (as of mimic) the ceph configuration is
centralized in the ceph-mon database [1][2]. If there are gaps in our
sse-kms integration that makes it difficult to use in practice, I'd
really like to address those.

sse-kms is working great, no issue or gaps with it.
We already use it in our openstack (rocky) with barbican and ceph/radosgw 
(luminous).

But we have customers that want encryption by default, something like SSE-S3 
(cf. below).
Do you know if there are plans to implement something similar?

I would love to see support for sse-s3. We've talked about building
something around vault (which I think is what minio does?), but so far
nobody has taken it up as a project.

Using dm-crypt would cost too much time for the conversion (72x 8TB SATA 
disks...) .
And dm-crypt is also storing its key on the monitors (cf. 
https://www.spinics.net/lists/ceph-users/msg52402.html).


Best Regards
Francois Scheurer


Amazon SSE-3 description:

https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingServerSideEncryption.html
Protecting Data Using Server-Side Encryption with Amazon S3-Managed Encryption 
Keys (SSE-S3)
Server-side encryption protects data at rest. Amazon S3 encrypts each object 
with a unique key. As an additional safeguard, it encrypts the key itself with 
a master key that it rotates regularly. Amazon S3 server-side encryption uses 
one of the strongest block ciphers available, 256-bit Advanced Encryption 
Standard (AES-256), to encrypt your data.


https://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketPUTencryption.html
The following is an example of the request body for setting SSE-S3.
<ServerSideEncryptionConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Rule>
    <ApplyServerSideEncryptionByDefault>
      <SSEAlgorithm>AES256</SSEAlgorithm>
    </ApplyServerSideEncryptionByDefault>
  </Rule>
</ServerSideEncryptionConfiguration>

From: Casey Bodley 
Sent: Tuesday, May 28, 2019 3:55 PM
To: Scheurer François; ceph-users@lists.ceph.com
Subject: Re: is rgw crypt default encryption key long term supported ?

Hi François,


Removing support for either of rgw_crypt_default_encryption_key or
rgw_crypt_s3_kms_encryption_keys would mean that objects encrypted with
those keys would no longer be accessible. It's unlikely that we'll do
that, but if we do it would be announced with a long deprecation period
and migration strategy.


However, I would still caution against using either as a strategy for
key management, especially when (as of mimic) the ceph configuration is
centralized in the ceph-mon database [1][2]. If there are gaps in our
sse-kms integration that makes it difficult to use in practice, I'd
really like to address those.


Casey


[1]
https://ceph.com/community/new-mimic-centralized-configuration-management/

[2]
http://docs.ceph.com/docs/mimic/rados/configuration/ceph-conf/#monitor-configuration-database


On 5/28/19 6:39 AM, Scheurer François wrote:

Dear Casey, Dear Ceph Users

The following is written in the radosgw documentation
(http://docs.ceph.com/docs/luminous/radosgw/encryption/):
rgw crypt default encryption key = 

Re: [ceph-users] limitations to using iscsi rbd-target-api directly in lieu of gwcli

2019-06-11 Thread Wesley Dillingham
Thanks Jason for the info! A few questions:

"The current rbd-target-api doesn't really support single path LUNs."

In our testing, using single path LUNs, listing the "owner" of a given LUN and 
then connecting directly to that gateway yielded stable and well-performing 
results. Obviously there was a SPOF, but for this use case that is 
acceptable (not a root fs of a VM, etc.). If a SPOF is acceptable, is there a 
particular reason that single path would not be agreeable?

"It currently doesn't have any RBAC style security so I would be weary
about exposing the current REST API to arbitrary users since you would
give them full access to do anything"

This is also somewhat of a concern, but this is a cluster for a single client 
who already has full ability to manipulate storage on the legacy system and 
has been okay. I was planning on network-segregating the API so that only the given 
client could communicate with it, having the gateways run a cephx user with 
permissions only to a particular pool (rbd), and implementing a backup system to 
offload daily snapshots to a different pool or cluster the client does not have 
capabilities on.

The dashboard feature looks very promising, however the client would need to 
interact programmatically. I do intend to experiment with giving them the iscsi 
role in the Nautilus dashboard. I poked at that a bit and am having some 
trouble getting the dashboard working with iSCSI; I'm wondering if the issue is 
obvious to you:

(running 14.2.0 and ceph-iscsi-3.0-57.g4ae)

and configuring the dash as follows:

ceph dashboard set-iscsi-api-ssl-verification false
ceph dashboard iscsi-gateway-add http://admin:admin@${MY_HOSTNAME}:5000
systemctl restart ceph-mgr@${MY_HOSTNAME_SHORT}.service

in the dash block/iscsi/target shows:


Unsupported `ceph-iscsi` config version. Expected 8 but found 9.

Thanks again.








From: Jason Dillaman 
Sent: Tuesday, June 11, 2019 9:37 AM
To: Wesley Dillingham
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] limitations to using iscsi rbd-target-api directly in 
lieu of gwcli

Notice: This email is from an external sender.



On Tue, Jun 11, 2019 at 9:29 AM Wesley Dillingham
 wrote:
>
> Hello,
>
> I am hoping to expose a REST API to a remote client group who would like to 
> do things like:
>
>
> Create, List, and Delete RBDs and map them to gateway (make a LUN)
> Create snapshots, list, delete, and rollback
> Determine Owner / Active gateway of a given lun

It currently doesn't have any RBAC style security so I would be weary
about exposing the current REST API to arbitrary users since you would
give them full access to do anything. The Ceph dashboard in Nautilus
(and also improved in the master branch) has lots of hooks to
configure LUNs via the rbd-target-api REST API as another alternative
to look at.

> I would run 2-4 nodes running rbd-target-gw and rbd-target-api however client 
> wishes to not use multi-path, wants to connect directly and only to active 
> gateway for that lun

The current rbd-target-api doesn't really support single path LUNs.

> In order to prevent re-inventing the wheel I was hoping to simply expose the 
> rbd-target-api directly to client but am wondering if this is appropriate.
>
> My concern is that I am taking gwcli out off the picture by using 
> rbd-target-api directly and am wondering if the rbd-target-api on its own is 
> able to propagate changes in the config up to the RADOS configuration object 
> and thus keep all the gateways in sync.

gwcli just uses rbd-target-api to do the work, and rbd-target-api is
responsible for keeping the gateways in-sync with each other.

> My other thought was to build a simple and limited in scope api which on the 
> backend runs gwcli commands.
>
> Thank you for clarification on the functionality and appropriate use.
>
> Respectfully,
>
> Wes Dillingham
> wdilling...@godaddy.com
> Site Reliability Engineer IV - Platform Storage / Ceph
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Error when I compare hashes of export-diff / import-diff

2019-06-11 Thread ceph



On 6/11/19 3:24 PM, Rafael Diaz Maurin wrote:
> 3- I create a snapshot inside the source pool
> rbd snap create ${POOL-SOURCE}/${KVM-IMAGE}@${TODAY-SNAP}
> 
> 4- I export the snapshot from the source pool and I import the snapshot
> towards the destination pool (in the pipe)
> rbd export-diff --from-snap ${LAST-SNAP}
> ${POOL-SOURCE}/${KVM-IMAGE}@${TODAY-SNAP} - | rbd -c ${BACKUP-CLUSTER}
> import-diff - ${POOL-DESTINATION}/${KVM-IMAGE}

Here is where it goes wrong.
You have to export-diff (without --from-snap) to do the "first" export.
What you are doing is rebasing the image, on the backup cluster, on top of the
wrong base (the dummy snapshot you created in step 3).

So:
1) Create a snapshot of source image
2) Create a dest image if not exists
3) If dest was created, export the source snapshot and import it:
  rbd export-diff  --snap  | rbd import-diff - 
3b) If dest was not created this time (it already existed, so you have a shared
snapshot between the source and the dest), export-diff using --from-snap:
   rbd export-diff --from-snap   --snap  | rbd
import-diff - 

You can checkout Backurne's code, that does what you want:
https://github.com/JackSlateur/backurne/blob/master/ceph.py#L173
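
Spelled out as a shell sketch (same variables as in your script, with hyphens
replaced by underscores so they are valid shell; untested):

TODAY_SNAP=$(date +%Y%m%d)
rbd snap create ${POOL_SOURCE}/${KVM_IMAGE}@${TODAY_SNAP}

if ! rbd -c ${BACKUP_CLUSTER} info ${POOL_DESTINATION}/${KVM_IMAGE} >/dev/null 2>&1; then
    # first run: create the destination image and ship the full snapshot
    rbd -c ${BACKUP_CLUSTER} create ${POOL_DESTINATION}/${KVM_IMAGE} -s 1
    rbd export-diff ${POOL_SOURCE}/${KVM_IMAGE}@${TODAY_SNAP} - \
        | rbd -c ${BACKUP_CLUSTER} import-diff - ${POOL_DESTINATION}/${KVM_IMAGE}
else
    # later runs: both sides already share ${LAST_SNAP}, so ship only the delta
    rbd export-diff --from-snap ${LAST_SNAP} ${POOL_SOURCE}/${KVM_IMAGE}@${TODAY_SNAP} - \
        | rbd -c ${BACKUP_CLUSTER} import-diff - ${POOL_DESTINATION}/${KVM_IMAGE}
fi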

Best regards,

> 
> The problem occurs when I want to validate only the diff between the 2
> snapshots (in order to be more efficient). I note that those hashes are
> differents.
> 
> Here is how I calcultate the hashes :
> Source-hash : rbd diff --from-snap ${LAST-SNAP}
> ${POOL-SOURCE}/${KVM-IMAGE}@${TODAY-SNAP} --format json | md5sum | cut
> -d ' ' -f 1
> => bc56663b8ff01ec388598037a20861cf
> Destination-hash : rbd -c ${BACKUP-CLUSTER} diff --from-snap
> ${LAST-SNAP} ${POOL-DESTINATION}/${KVM-IMAGE}@${TODAY-SNAP} --format
> json | md5sum | cut -d ' ' -f 1
> => 3aa35362471419abe0a41f222c113096
> 
> In an other hand, if I compare the hashes of the export (between source
> and destination), they are the same :
> 
> rbd -p ${POOL-SOURCE} export ${KVM-IMAGE}@${TODAY-SNAP} - | md5sum
> => 2c4962870fdd67ca758c154760d9df83
> rbd -c ${BACKUP-CLUSTER} -p ${POOL-DESTINATION} export
> ${KVM-IMAGE}@${TODAY-SNAP} - | md5sum
> => 2c4962870fdd67ca758c154760d9df83
> 
> 
> Can someone has an idea of what's happenning ?
> 
> Can someone has a way to succeed in comparing the export-diff
> /import-diff ?
> 
> 
> 
> 
> Thank you,
> Rafael
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] limitations to using iscsi rbd-target-api directly in lieu of gwcli

2019-06-11 Thread Jason Dillaman
On Tue, Jun 11, 2019 at 9:29 AM Wesley Dillingham
 wrote:
>
> Hello,
>
> I am hoping to expose a REST API to a remote client group who would like to 
> do things like:
>
>
> Create, List, and Delete RBDs and map them to gateway (make a LUN)
> Create snapshots, list, delete, and rollback
> Determine Owner / Active gateway of a given lun

It currently doesn't have any RBAC-style security, so I would be wary of
exposing the current REST API to arbitrary users since you would
give them full access to do anything. The Ceph dashboard in Nautilus
(and also improved in the master branch) has lots of hooks to
configure LUNs via the rbd-target-api REST API as another alternative
to look at.

> I would run 2-4 nodes running rbd-target-gw and rbd-target-api however client 
> wishes to not use multi-path, wants to connect directly and only to active 
> gateway for that lun

The current rbd-target-api doesn't really support single path LUNs.

> In order to prevent re-inventing the wheel I was hoping to simply expose the 
> rbd-target-api directly to client but am wondering if this is appropriate.
>
> My concern is that I am taking gwcli out off the picture by using 
> rbd-target-api directly and am wondering if the rbd-target-api on its own is 
> able to propagate changes in the config up to the RADOS configuration object 
> and thus keep all the gateways in sync.

gwcli just uses rbd-target-api to do the work, and rbd-target-api is
responsible for keeping the gateways in-sync with each other.

> My other thought was to build a simple and limited in scope api which on the 
> backend runs gwcli commands.
>
> Thank you for clarification on the functionality and appropriate use.
>
> Respectfully,
>
> Wes Dillingham
> wdilling...@godaddy.com
> Site Reliability Engineer IV - Platform Storage / Ceph
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Error when I compare hashes of export-diff / import-diff

2019-06-11 Thread Jason Dillaman
On Tue, Jun 11, 2019 at 9:25 AM Rafael Diaz Maurin
 wrote:
>
> Hello,
>
> I have a problem when I want to validate (using md5 hashes) rbd
> export/import diff from a rbd source-pool (the production pool) towards
> another rbd destination-pool (the backup pool).
>
> Here is the algorythm :
> 1- First of all, I validate that the two hashes from lasts snapshots
> source and destination are the same :
> rbd -p ${POOL-SOURCE} export ${KVM-IMAGE}@${LAST-SNAP} - | md5sum
> => 3f54626da234730eefc27ef2a3b6ca83
> rbd -c ${BACKUP-CLUSTER} -p ${POOL-DESTINATION} export
> ${KVM-IMAGE}@${LAST-SNAP} - | md5sum
> => 3f54626da234730eefc27ef2a3b6ca83
>
>
> 2- If not exists, I create an empty image in the destination pool
> rbd -c ${BACKUP-CLUSTER} create ${POOL-DESTINATION}/${KVM-IMAGE} -s 1
>
> 3- I create a snapshot inside the source pool
> rbd snap create ${POOL-SOURCE}/${KVM-IMAGE}@${TODAY-SNAP}
>
> 4- I export the snapshot from the source pool and I import the snapshot
> towards the destination pool (in the pipe)
> rbd export-diff --from-snap ${LAST-SNAP}
> ${POOL-SOURCE}/${KVM-IMAGE}@${TODAY-SNAP} - | rbd -c ${BACKUP-CLUSTER}
> import-diff - ${POOL-DESTINATION}/${KVM-IMAGE}

What's the actual difference between the "rbd diff" outputs? There is
a known "issue" where object-map will flag an object as dirty if you
had run an fstrim/discard on the image, but it doesn't affect the
actual validity of the data.
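
To see what actually differs, rather than just that the md5sums differ, you can
diff the two listings directly, e.g. (a sketch):

# pretty-print both listings and diff them; extents flagged only by the object map
# (e.g. after an fstrim/discard) will show up here without any real data change
diff <(rbd diff --from-snap ${LAST_SNAP} ${POOL_SOURCE}/${KVM_IMAGE}@${TODAY_SNAP} --format json | python -m json.tool) \
     <(rbd -c ${BACKUP_CLUSTER} diff --from-snap ${LAST_SNAP} ${POOL_DESTINATION}/${KVM_IMAGE}@${TODAY_SNAP} --format json | python -m json.tool)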

> The problem occurs when I want to validate only the diff between the 2
> snapshots (in order to be more efficient). I note that those hashes are
> differents.
>
> Here is how I calcultate the hashes :
> Source-hash : rbd diff --from-snap ${LAST-SNAP}
> ${POOL-SOURCE}/${KVM-IMAGE}@${TODAY-SNAP} --format json | md5sum | cut
> -d ' ' -f 1
> => bc56663b8ff01ec388598037a20861cf
> Destination-hash : rbd -c ${BACKUP-CLUSTER} diff --from-snap
> ${LAST-SNAP} ${POOL-DESTINATION}/${KVM-IMAGE}@${TODAY-SNAP} --format
> json | md5sum | cut -d ' ' -f 1
> => 3aa35362471419abe0a41f222c113096
>
> In an other hand, if I compare the hashes of the export (between source
> and destination), they are the same :
>
> rbd -p ${POOL-SOURCE} export ${KVM-IMAGE}@${TODAY-SNAP} - | md5sum
> => 2c4962870fdd67ca758c154760d9df83
> rbd -c ${BACKUP-CLUSTER} -p ${POOL-DESTINATION} export
> ${KVM-IMAGE}@${TODAY-SNAP} - | md5sum
> => 2c4962870fdd67ca758c154760d9df83
>
>
> Can someone has an idea of what's happenning ?
>
> Can someone has a way to succeed in comparing the export-diff /import-diff ?
>
>
>
>
> Thank you,
> Rafael
>
> --
> Rafael Diaz Maurin
> DSI de l'Université de Rennes 1
> Pôle Infrastructures, équipe Systèmes
> 02 23 23 71 57
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] limitations to using iscsi rbd-target-api directly in lieu of gwcli

2019-06-11 Thread Wesley Dillingham
Hello,

I am hoping to expose a REST API to a remote client group who would like to do 
things like:


  *   Create, List, and Delete RBDs and map them to gateway (make a LUN)
  *   Create snapshots, list, delete, and rollback
  *   Determine Owner / Active gateway of a given lun

I would run 2-4 nodes running rbd-target-gw and rbd-target-api; however, the client 
wishes not to use multipath and wants to connect directly, and only, to the active 
gateway for that LUN.

In order to avoid reinventing the wheel I was hoping to simply expose the 
rbd-target-api directly to the client, but am wondering if this is appropriate.

My concern is that I am taking gwcli out of the picture by using 
rbd-target-api directly, and am wondering if rbd-target-api on its own is 
able to propagate changes in the config up to the RADOS configuration object 
and thus keep all the gateways in sync.

My other thought was to build a simple, limited-in-scope API which on the 
backend runs gwcli commands.

Thank you for clarification on the functionality and appropriate use.

Respectfully,

Wes Dillingham
wdilling...@godaddy.com
Site Reliability Engineer IV - Platform Storage / Ceph

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Error when I compare hashes of export-diff / import-diff

2019-06-11 Thread Rafael Diaz Maurin

Hello,

I have a problem when I want to validate (using md5 hashes) rbd 
export/import diff from a rbd source-pool (the production pool) towards 
another rbd destination-pool (the backup pool).


Here is the algorythm :
1- First of all, I validate that the two hashes from lasts snapshots 
source and destination are the same :

rbd -p ${POOL-SOURCE} export ${KVM-IMAGE}@${LAST-SNAP} - | md5sum
=> 3f54626da234730eefc27ef2a3b6ca83
rbd -c ${BACKUP-CLUSTER} -p ${POOL-DESTINATION} export 
${KVM-IMAGE}@${LAST-SNAP} - | md5sum

=> 3f54626da234730eefc27ef2a3b6ca83


2- If not exists, I create an empty image in the destination pool
rbd -c ${BACKUP-CLUSTER} create ${POOL-DESTINATION}/${KVM-IMAGE} -s 1

3- I create a snapshot inside the source pool
rbd snap create ${POOL-SOURCE}/${KVM-IMAGE}@${TODAY-SNAP}

4- I export the snapshot from the source pool and I import the snapshot 
towards the destination pool (in the pipe)
rbd export-diff --from-snap ${LAST-SNAP} 
${POOL-SOURCE}/${KVM-IMAGE}@${TODAY-SNAP} - | rbd -c ${BACKUP-CLUSTER} 
import-diff - ${POOL-DESTINATION}/${KVM-IMAGE}


The problem occurs when I want to validate only the diff between the 2 
snapshots (in order to be more efficient). I note that those hashes are 
differents.


Here is how I calcultate the hashes :
Source-hash : rbd diff --from-snap ${LAST-SNAP} 
${POOL-SOURCE}/${KVM-IMAGE}@${TODAY-SNAP} --format json | md5sum | cut 
-d ' ' -f 1

=> bc56663b8ff01ec388598037a20861cf
Destination-hash : rbd -c ${BACKUP-CLUSTER} diff --from-snap 
${LAST-SNAP} ${POOL-DESTINATION}/${KVM-IMAGE}@${TODAY-SNAP} --format 
json | md5sum | cut -d ' ' -f 1

=> 3aa35362471419abe0a41f222c113096

In an other hand, if I compare the hashes of the export (between source 
and destination), they are the same :


rbd -p ${POOL-SOURCE} export ${KVM-IMAGE}@${TODAY-SNAP} - | md5sum
=> 2c4962870fdd67ca758c154760d9df83
rbd -c ${BACKUP-CLUSTER} -p ${POOL-DESTINATION} export 
${KVM-IMAGE}@${TODAY-SNAP} - | md5sum

=> 2c4962870fdd67ca758c154760d9df83


Can someone has an idea of what's happenning ?

Can someone has a way to succeed in comparing the export-diff /import-diff ?




Thank you,
Rafael

--
Rafael Diaz Maurin
DSI de l'Université de Rennes 1
Pôle Infrastructures, équipe Systèmes
02 23 23 71 57




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Expected IO in luminous Ceph Cluster

2019-06-11 Thread John Petrini
I certainly would, particularly on your SSD's. I'm not familiar with
those Toshibas but disabling disk cache has improved performance on my
clusters and others on this list.

Does the LSI controller you're using provide read/write cache and do
you have it enabled? 72k spinners are likely to see a huge performance
gain from controller cache, especially in regards to latency. Only
enable caching if the controller has a battery and make sure to enable
force write-through in the event that the battery fails. If your
controller doesn't have cache you may want to seriously consider
upgrading to controllers that do otherwise those 72k disks are going
to be a major limiting factor in terms of performance.

Regarding your db partition, the latest advice seems to be that your
db should be 2x the biggest layer (at least 60GB) to avoid spillover
to the OSD during compaction. See:
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg54628.html.
With 72k disks you'll want to avoid small writes hitting them directly
if possible, especially if you have no controller cache.

It would be useful to see iowait on your cluster. iostat -x 2 and let
it run for a few cycles while the cluster is busy. If there's high
iowait on your SSD's disabling disk cache may show an improvement. If
there's high iowait on the HDD's, controller cache and/or increasing
your db size may help.
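
Roughly, the checks above look like this (device names are examples;
double-check the flags for your drive types):

iostat -x 2                              # watch await and %util on both the SSDs and the HDDs
# volatile write cache on the drives (SAS via sdparm, SATA via hdparm):
sdparm --get=WCE /dev/sdb
sdparm --clear=WCE /dev/sdb              # disable it
hdparm -W  /dev/sdc                      # query
hdparm -W0 /dev/sdc                      # disable
# check whether RocksDB is spilling from the SSD db partition onto the HDD:
ceph daemon osd.0 perf dump bluefs | egrep 'db_used_bytes|slow_used_bytes'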




John Petrini
Platforms Engineer
215.297.4400 x 232
www.coredial.com
751 Arbor Way, Hillcrest I, Suite 150 Blue Bell, PA 19422


On Tue, Jun 11, 2019 at 3:35 AM Stolte, Felix  wrote:
>
> Hi John,
>
> I have 9 HDDs and 3 SSDs behind a SAS3008 PCI-Express Fusion-MPT SAS-3 from 
> LSI. HDDs are HGST HUH721008AL (8TB, 7200k rpm), SSDs are Toshiba PX05SMB040 
> (400GB). OSDs are bluestore format, 3 HDDs have their wal and db on one SSD 
> (DB Size 50GB, wal 10 GB). I did not change any cache settings.
>
> I disabled cstates which improved performance slightly. Do you suggest to 
> turn off caching on disks?
>
> Regards
> Felix
>
> -
> Forschungszentrum Juelich GmbH
> 52425 Juelich
> Sitz der Gesellschaft: Juelich
> Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
> Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher
> Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
> Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
> Prof. Dr. Sebastian M. Schmidt
> -
> -
>
>
> Von: John Petrini 
> Datum: Freitag, 7. Juni 2019 um 15:49
> An: "Stolte, Felix" 
> Cc: Sinan Polat , ceph-users 
> Betreff: Re: [ceph-users] Expected IO in luminous Ceph Cluster
>
> How's iowait look on your disks?
>
> How have you configured your disks and what are your cache settings?
>
> Did you disable cstates?
>
> On Friday, June 7, 2019, Stolte, Felix  wrote:
> > Hi Sinan,
> >
> > thanks for the numbers. I am a little bit surprised that your SSD pool has 
> > nearly the same stats as you SAS pool.
> >
> > Nevertheless I would expect our pools to perform like your SAS pool, at 
> > least regarding to writes since all our write ops should be placed on our 
> > SSDs. But since I only achieve 10% of your numbers I need to figure out my 
> > bottle neck. For now I have no clue. According to our monitoring system 
> > network bandwith, ram or cpu usage is even close to be saturated.
> >
> > Could someone advice me on where to look?
> >
> > Regards Felix
> > -
> > -
> > Forschungszentrum Juelich GmbH
> > 52425 Juelich
> > Sitz der Gesellschaft: Juelich
> > Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
> > Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher
> > Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
> > Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
> > Prof. Dr. Sebastian M. Schmidt
> > -
> > -
> >
> >
> > Am 07.06.19, 13:33 schrieb "Sinan Polat" :
> >
> > Hi Felix,
> >
> >  

Re: [ceph-users] ceph monitor keep crash

2019-06-11 Thread Joao Eduardo Luis
On 06/04/2019 07:01 PM, Jianyu Li wrote:
> Hello,
> 
> I have a ceph cluster that has been running for over 2 years, and the monitors
> began crashing yesterday. I have had some OSDs flapping up and down occasionally;
> sometimes I needed to rebuild the OSD. I found 3 OSDs down yesterday;
> they may have caused this issue, or may not. 
> 
> Ceph Version: 12.2.12 (upgrading from 12.2.8 did not fix the issue)
> I have 5 mon nodes. When I start the mon service on the first 2 nodes, they
> are good. Once I start the service on the third node, all 3 nodes keep
> going up and down (flapping) due to an abort in
> OSDMonitor::build_incremental. I also tried to recover the monitor from 1
> node (removing the other 4 nodes) by injecting a monmap, but that node keeps
> crashing as well. 

Please increase debug levels to 'debug_mon = 10', 'debug_paxos = 10',
and send us the log once you have your next crash.
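
For example (either via ceph.conf before restarting the mon, or over the admin
socket if the daemon stays up long enough):

# ceph.conf on the mon hosts, then restart the daemon:
#   [mon]
#   debug mon = 10
#   debug paxos = 10
systemctl restart ceph-mon@$(hostname -s)
# or, on a monitor that is still running, via the admin socket:
ceph daemon mon.$(hostname -s) config set debug_mon 10/10
ceph daemon mon.$(hostname -s) config set debug_paxos 10/10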

This may be a few things, but I'm guessing your other monitors have a
corrupted store somehow. Were there any hardware failures recently
before the crashes started happening?

  -Joao


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Remove rbd image after interrupt of deletion command

2019-06-11 Thread Sakirnth Nagarasa
On 6/11/19 10:42 AM, Igor Podlesny wrote:
> On Tue, 11 Jun 2019 at 14:46, Sakirnth Nagarasa
>  wrote:
>> On 6/7/19 3:35 PM, Jason Dillaman wrote:
> [...]
>>> Can you run "rbd rm --log-to-stderr=true --debug-rbd=20
>>> ${POOLNAME}/${IMAGE}" and provide the logs via pastebin.com?
>>>
 Cheers,
 Sakirnth
>>
>> It is not necessary anymore the remove command worked. The problem was
>> only "rbd info" command. It took approximately one day to remove the
>> cloned image (50 TB) which was not flaten. Why it took so long? The
>> clone command completed within seconds.
>>
>> Thanks,
>> Sakirnth
> 
> Sakirnth,
> 
> previously you've said (statement A): "...
> rbd rm ${POOLNAME}/${IMAGE}
> rbd: error opening image ${IMAGE}: (2) No such file or directory
> ..."
> 
> Now you're saying (statement B): "rm worked and the only issue was
> info command".
> Obviously both statements can't be true at the same time.
> Can you elaborate(?) on that matter so that  themail list users would
> have better understanding.
Yes, of course.

When I tried to remove the image after the interrupt, I made a typo in the
image definition. I ran the "rbd rm" command only once after the
interrupt, until Jason told me to do it again. The removal worked, and I
found the typo in the history afterwards.

On the other hand, I ran "rbd info ${POOLNAME}/${IMAGE}" with the
correct image definition several times (also checked it in the history), and
the output was:

rbd: error opening image ${IMAGE}: (2) No such file or directory

Since the image was already in an inconsistent state it is not really an
issue in my opinion.

Cheers,
Sakirnth








___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Remove rbd image after interrupt of deletion command

2019-06-11 Thread Igor Podlesny
On Tue, 11 Jun 2019 at 14:46, Sakirnth Nagarasa
 wrote:
> On 6/7/19 3:35 PM, Jason Dillaman wrote:
[...]
> > Can you run "rbd rm --log-to-stderr=true --debug-rbd=20
> > ${POOLNAME}/${IMAGE}" and provide the logs via pastebin.com?
> >
> >> Cheers,
> >> Sakirnth
>
> It is not necessary anymore the remove command worked. The problem was
> only "rbd info" command. It took approximately one day to remove the
> cloned image (50 TB) which was not flaten. Why it took so long? The
> clone command completed within seconds.
>
> Thanks,
> Sakirnth

Sakirnth,

previously you've said (statement A): "...
rbd rm ${POOLNAME}/${IMAGE}
rbd: error opening image ${IMAGE}: (2) No such file or directory
..."

Now you're saying (statement B): "rm worked and the only issue was
info command".
Obviously both statements can't be true at the same time.
Can you elaborate on that matter so that the mailing-list users have a
better understanding?

-- 
End of message. Next message?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Remove rbd image after interrupt of deletion command

2019-06-11 Thread Sakirnth Nagarasa
On 6/7/19 3:35 PM, Jason Dillaman wrote:
> On Fri, Jun 7, 2019 at 7:22 AM Sakirnth Nagarasa
>  wrote:
>>
>> On 6/6/19 5:09 PM, Jason Dillaman wrote:
>>> On Thu, Jun 6, 2019 at 10:13 AM Sakirnth Nagarasa
>>>  wrote:

 On 6/6/19 3:46 PM, Jason Dillaman wrote:
> Can you run "rbd trash ls --all --long" and see if your image
> is listed?

 No, it is not listed.

 I did run:
 rbd trash ls --all --long ${POOLNAME_FROM_IMAGE}

 Cheers,
 Sakirnth
>>>
>>> Is it listed under "rbd ls ${POOLNAME_FROM_IMAGE}"?
>>
>> Yes that's the point the image is still listed under "rbd ls
>> ${POOLNAME_FROM_IMAGE}". But we can't do any operations with it like
>> showing info or deleting it. The error message is in the first mail.
> 
> Can you run "rbd rm --log-to-stderr=true --debug-rbd=20
> ${POOLNAME}/${IMAGE}" and provide the logs via pastebin.com?
> 
>> Cheers,
>> Sakirnth

It is not necessary anymore; the remove command worked. The problem was
only with the "rbd info" command. It took approximately one day to remove the
cloned image (50 TB), which was not flattened. Why did it take so long? The
clone command completed within seconds.

Thanks,
Sakirnth

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Expected IO in luminous Ceph Cluster

2019-06-11 Thread Stolte, Felix
Hi John,

I have 9 HDDs and 3 SSDs behind a SAS3008 PCI-Express Fusion-MPT SAS-3 from 
LSI. The HDDs are HGST HUH721008AL (8 TB, 7200 rpm), the SSDs are Toshiba PX05SMB040 
(400 GB). The OSDs are BlueStore; every 3 HDDs have their WAL and DB on one SSD (DB 
size 50 GB, WAL 10 GB). I did not change any cache settings. 

I disabled C-states, which improved performance slightly. Do you suggest turning 
off caching on the disks?

Regards
Felix

-
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher
Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
Prof. Dr. Sebastian M. Schmidt
-
-


Von: John Petrini 
Datum: Freitag, 7. Juni 2019 um 15:49
An: "Stolte, Felix" 
Cc: Sinan Polat , ceph-users 
Betreff: Re: [ceph-users] Expected IO in luminous Ceph Cluster

How's iowait look on your disks? 

How have you configured your disks and what are your cache settings? 

Did you disable cstates? 

On Friday, June 7, 2019, Stolte, Felix  wrote:
> Hi Sinan,
>
> thanks for the numbers. I am a little bit surprised that your SSD pool has 
> nearly the same stats as you SAS pool.
>
> Nevertheless I would expect our pools to perform like your SAS pool, at least 
> regarding to writes since all our write ops should be placed on our SSDs. But 
> since I only achieve 10% of your numbers I need to figure out my bottle neck. 
> For now I have no clue. According to our monitoring system network bandwith, 
> ram or cpu usage is even close to be saturated.
>
> Could someone advice me on where to look?
>
> Regards Felix
> -
> -
> Forschungszentrum Juelich GmbH
> 52425 Juelich
> Sitz der Gesellschaft: Juelich
> Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
> Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher
> Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
> Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
> Prof. Dr. Sebastian M. Schmidt
> -
> -
>
>
> Am 07.06.19, 13:33 schrieb "Sinan Polat" :
>
>     Hi Felix,
>
>     I have 2 Pools, a SSD only and a SAS only pool.
>
>     SSD pool is spread over 12 OSD servers.
>     SAS pool is spread over 6 OSD servers.
>
>
>     See results (SSD Only Pool):
>
>     # sysbench --file-fsync-freq=1 --threads=16 fileio --file-total-size=1G
>     --file-test-mode=rndrw --file-rw-ratio=2 run
>     sysbench 1.0.17 (using system LuaJIT 2.0.4)
>
>     Running the test with following options:
>     Number of threads: 16
>     Initializing random number generator from current time
>
>
>     Extra file open flags: (none)
>     128 files, 8MiB each
>     1GiB total file size
>     Block size 16KiB
>     Number of IO requests: 0
>     Read/Write ratio for combined random IO test: 2.00
>     Periodic FSYNC enabled, calling fsync() each 1 requests.
>     Calling fsync() at the end of test, Enabled.
>     Using synchronous I/O mode
>     Doing random r/w test
>     Initializing worker threads...
>
>     Threads started!
>
>
>     File operations:
>         reads/s:                      508.38
>         writes/s:                     254.19
>         fsyncs/s:                     32735.14
>
>     Throughput:
>         read, MiB/s:                  7.94
>         written, MiB/s:               3.97
>
>     General statistics:
>         total time:                          10.0103s
>         total number of events:              36
>
>     Latency (ms):
>              min:                                    0.00
>              avg:                                    0.48
>              max:                                   10.18
>              95th percentile:                        2.11
>              sum:                               159830.07
>
>     Threads fairness:
>         events (avg/stddev):           20833.5000/335.70
>         execution time (avg/stddev):   9.9894/0.00
>     #
>
>     See results (SAS Only Pool):
>     # sysbench --file-fsync-freq=1 --threads=16 fileio --file-total-size=1G
>     --file-test-mode=rndrw --file-rw-ratio=2 run
>     sysbench 1.0.17 (using system LuaJIT 2.0.4)
>
>     Running the test with following options:
>     Number of threads: 16
>     

Re: [ceph-users] slow requests are blocked > 32 sec. Implicated osds 0, 2, 3, 4, 5 (REQUEST_SLOW)

2019-06-11 Thread BASSAGET Cédric
Hello Robert,
I did not make any changes, so I'm still using the prio queue.
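(For reference, my understanding is that the switch Robert refers to would be
set like this on the OSD nodes, followed by an OSD restart; I have not applied
it:)

# ceph.conf on the OSD nodes (both settings, then restart the OSDs):
[osd]
    osd op queue = wpq
    osd op queue cut off = high
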
Regards

On Mon, 10 Jun 2019 at 17:44, Robert LeBlanc  wrote:

> I'm glad it's working, to be clear did you use wpq, or is it still the
> prio queue?
>
> Sent from a mobile device, please excuse any typos.
>
> On Mon, Jun 10, 2019, 4:45 AM BASSAGET Cédric <
> cedric.bassaget...@gmail.com> wrote:
>
>> an update from 12.2.9 to 12.2.12 seems to have fixed the problem !
>>
>> On Mon, 10 Jun 2019 at 12:25, BASSAGET Cédric <
>> cedric.bassaget...@gmail.com> wrote:
>>
>>> Hi Robert,
>>> Before doing anything on my prod env, I generate r/w on ceph cluster
>>> using fio .
>>> On my newest cluster, release 12.2.12, I did not manage to get
>>> the (REQUEST_SLOW) warning, even if my OSD disk usage goes above 95% (fio
>>> ran from 4 diffrent hosts)
>>>
>>> On my prod cluster, release 12.2.9, as soon as I run fio on a single
>>> host, I see a lot of REQUEST_SLOW warning messages, but "iostat -xd 1"
>>> does not show me a usage of more than 5-10% on the disks...
>>>
>>> On Mon, 10 Jun 2019 at 10:12, Robert LeBlanc  wrote:
>>>
 On Mon, Jun 10, 2019 at 1:00 AM BASSAGET Cédric <
 cedric.bassaget...@gmail.com> wrote:

> Hello Robert,
> My disks did not reach 100% on the last warning, they climb to 70-80%
> usage. But I see rrqm / wrqm counters increasing...
>
> Device: rrqm/s   wrqm/s r/s w/srkB/swkB/s
> avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
>
> sda   0.00 4.000.00   16.00 0.00   104.00
>  13.00 0.000.000.000.00   0.00   0.00
> sdb   0.00 2.001.00 3456.00 8.00 25996.00
>  15.04 5.761.670.001.67   0.03   9.20
> sdd   4.00 0.00 41462.00 1119.00 331272.00  7996.00
>  15.9419.890.470.480.21   0.02  66.00
>
> dm-0  0.00 0.00 6825.00  503.00 330856.00  7996.00
>  92.48 4.000.550.560.30   0.09  66.80
> dm-1  0.00 0.001.00 1129.00 8.00 25996.00
>  46.02 1.030.910.000.91   0.09  10.00
>
>
> sda is my system disk (SAMSUNG   MZILS480HEGR/007  GXL0), sdb and sdd
> are my OSDs
>
> would "osd op queue = wpq" help in this case ?
> Regards
>

 Your disk times look okay, just a lot more unbalanced than I would
 expect. I'd give wpq a try, I use it all the time, just be sure to also
 include the op_cutoff setting too or it doesn't have much effect. Let me
 know how it goes.
 
 Robert LeBlanc
 PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1

>>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Large OMAP object in RGW GC pool

2019-06-11 Thread Wido den Hollander



On 6/4/19 8:00 PM, J. Eric Ivancich wrote:
> On 6/4/19 7:37 AM, Wido den Hollander wrote:
>> I've set up a temporary machine next to the 13.2.5 cluster with the
>> 13.2.6 packages from Shaman.
>>
>> On that machine I'm running:
>>
>> $ radosgw-admin gc process
>>
>> That seems to work as intended! So the PR seems to have fixed it.
>>
>> Should be fixed permanently when 13.2.6 is officially released.
>>
>> Wido
> 
> Thank you, Wido, for sharing the results of your experiment. I'm happy
> to learn that it was successful. And v13.2.6 was just released about 2
> hours ago.
> 

I thought it was resolved, but it isn't.

I counted all the OMAP values for the GC objects and I got back:

gc.0: 0
gc.11: 0
gc.14: 0
gc.15: 0
gc.16: 0
gc.18: 0
gc.19: 0
gc.1: 0
gc.20: 0
gc.21: 0
gc.22: 0
gc.23: 0
gc.24: 0
gc.25: 0
gc.27: 0
gc.29: 0
gc.2: 0
gc.30: 0
gc.3: 0
gc.4: 0
gc.5: 0
gc.6: 0
gc.7: 0
gc.8: 0
gc.9: 0
gc.13: 110996
gc.10: 04
gc.26: 42
gc.28: 111292
gc.17: 111314
gc.12: 111534
gc.31: 111956

So as you can see a few remain.

I ran:

$ radosgw-admin gc process --debug-rados=10

That finishes within 10 seconds. Then I tried:

$ radosgw-admin gc process --debug-rados=10 --include-all

That also finishes within 10 seconds.

What I noticed in the logs was this:

2019-06-11 09:06:58.711 7f8ffb876240 10 librados: call oid=gc.17 nspace=
2019-06-11 09:06:58.717 7f8ffb876240 10 librados: Objecter returned from
call r=-16

The return value is '-16' for gc.17 where for gc.18 or any other object
with 0 OMAP values it is:

2019-06-11 09:06:58.717 7f8ffb876240 10 librados: call oid=gc.18 nspace=
2019-06-11 09:06:58.720 7f8ffb876240 10 librados: Objecter returned from
call r=0

So I set --debug-rgw=10

RGWGC::process failed to acquire lock on gc.17

I haven't tried stopping all the RGWs yet as that will impact the
services, but might that be the root-cause here?
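
The lock can at least be inspected with the rados CLI, something like this (the
"gc_process" lock name is an assumption based on the RGWGC error; pool and
namespace as for the earlier omap counts):

rados -p default.rgw.log --namespace gc lock list gc.17
rados -p default.rgw.log --namespace gc lock info gc.17 gc_process     # lock name is a guess
# with the radosgw that owns it stopped, the lock could also be broken explicitly:
rados -p default.rgw.log --namespace gc lock break gc.17 gc_process <locker-id>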

Wido

> Eric
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com