Re: [ceph-users] rgw gives MethodNotAllowed for OPTIONS?

2018-02-12 Thread Piers Haken
So I put an nginx proxy in front of rgw since I couldn't find any definitive 
answer on whether or not it allows OPTIONS. Now the browser is doing a POST, 
and it's still getting a MethodNotAllowed response. This is a fresh install of 
luminous. Is this an rgw error or a civetweb error? I found an option to allow 
civetweb to accept different methods, but it doesn't seem to be supported in 
1.8 (which is the version I think luminous is using). Do I need to update this 
somehow?

POST http://storage-test01:7480/ HTTP/1.1
Host: storage-test01:7480
Connection: keep-alive
Content-Length: 1877
Pragma: no-cache
Cache-Control: no-cache
Origin: http://localhost:3032
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 
(KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36
Content-Type: multipart/form-data; 
boundary=WebKitFormBoundaryG7wgmPygxDHGHx7F
Accept: */*
Referer: http://localhost:3032/Project/157/Assets
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9,es;q=0.8

--WebKitFormBoundaryG7wgmPygxDHGHx7F
Content-Disposition: form-data; name="key"

7276e4a8-dfa5-4bc1-8289-287a0524acf6.txt
--WebKitFormBoundaryG7wgmPygxDHGHx7F
Content-Disposition: form-data; name="Content-Type"

text/plain
--WebKitFormBoundaryG7wgmPygxDHGHx7F
Content-Disposition: form-data; name="success_action_status"

200
--WebKitFormBoundaryG7wgmPygxDHGHx7F
Content-Disposition: form-data; name="x-amz-server-side-encryption"

AES256
--WebKitFormBoundaryG7wgmPygxDHGHx7F
Content-Disposition: form-data; name="acl"

private
--WebKitFormBoundaryG7wgmPygxDHGHx7F
Content-Disposition: form-data; name="x-amz-meta-qqfilename"

asd.txt
--WebKitFormBoundaryG7wgmPygxDHGHx7F
Content-Disposition: form-data; name="AWSAccessKeyId"


--WebKitFormBoundaryG7wgmPygxDHGHx7F
Content-Disposition: form-data; name="policy"

eyAgImV4cGlyYXRpb24iOiAiMjAxOC0wMi0xM1QwMjo0NToxMC4wNjJaIiwgICJjb25kaXRpb25zIjogWyAgICB7ICAgICAgImFjbCI6ICJwcml2YXRlIiAgICB9LCAgICB7ICAgICAgImJ1Y2tldCI6ICJkZmNuLXVwbG9hZHMtZGV2IiAgICB9LCAgICB7ICAgICAgIkNvbnRlbnQtVHlwZSI6ICJ0ZXh0L3BsYWluIiAgICB9LCAgICB7ICAgICAgInN1Y2Nlc3NfYWN0aW9uX3N0YXR1cyI6ICIyMDAiICAgIH0sICAgIHsgICAgICAieC1hbXotc2VydmVyLXNpZGUtZW5jcnlwdGlvbiI6ICJBRVMyNTYiICAgIH0sICAgIHsgICAgICAia2V5IjogIjcyNzZlNGE4LWRmYTUtNGJjMS04Mjg5LTI4N2EwNTI0YWNmNi50eHQiICAgIH0sICAgIHsgICAgICAieC1hbXotbWV0YS1xcWZpbGVuYW1lIjogImFzZC50eHQiICAgIH0sICAgIFsgICAgICAiY29udGVudC1sZW5ndGgtcmFuZ2UiLCAgICAgICIwIiwgICAgICAiMTA0ODU3NjAwIiAgICBdICBdfQ==
--WebKitFormBoundaryG7wgmPygxDHGHx7F
Content-Disposition: form-data; name="signature"

ERz2TI4ZwMK9d6uzQwoRJNPAnGY=
--WebKitFormBoundaryG7wgmPygxDHGHx7F
Content-Disposition: form-data; name="file"; filename="asd.txt"
Content-Type: text/plain

aspdjaopsdjkoijasd
asodijaosdijoaisd

--WebKitFormBoundaryG7wgmPygxDHGHx7F--







HTTP/1.1 405 Method Not Allowed
Server: nginx/1.10.3
Date: Tue, 13 Feb 2018 02:40:17 GMT
Content-Type: application/xml
Content-Length: 191
Connection: keep-alive
x-amz-request-id: tx4-005a825011-1489c-default
Accept-Ranges: bytes
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET, POST, PUT, OPTIONS
Access-Control-Allow-Headers: 
authorization,x-amz-content-sha256,x-amz-date,content-md5,content-type
Access-Control-Expose-Headers: ETag

<?xml version="1.0" encoding="UTF-8"?><Error><Code>MethodNotAllowed</Code><RequestId>tx4-005a825011-1489c-default</RequestId><HostId>1489c-default-default</HostId></Error>
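One detail worth noting in the trace above: the form POST goes to the service root ("/"), while the signed policy (decoded from the base64 above) names the bucket dfcn-uploads-dev. S3 browser-based POST uploads are normally addressed to the bucket, either in the path or in the hostname, and here neither carries one, so that alone could explain the 405. For comparison, a minimal boto3 sketch (endpoint and credentials are placeholders, not values from this thread) that builds the policy fields and also returns the URL the form should post to:

#!/usr/bin/env python
# Sketch only: build the browser-POST policy and signature with boto3 instead
# of assembling them by hand.  Endpoint and credentials are placeholders; the
# bucket and key names are the ones visible in the policy/form above.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://storage-test01:7480",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

post = s3.generate_presigned_post(
    Bucket="dfcn-uploads-dev",
    Key="7276e4a8-dfa5-4bc1-8289-287a0524acf6.txt",
    Fields={"acl": "private", "Content-Type": "text/plain"},
    Conditions=[
        {"acl": "private"},
        {"Content-Type": "text/plain"},
        ["content-length-range", 0, 104857600],
    ],
    ExpiresIn=3600,
)

print(post["url"])     # the bucket endpoint the form must POST to
print(post["fields"])  # the hidden form fields (key, policy, signature, ...)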


From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Piers 
Haken
Sent: Monday, February 12, 2018 5:17 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] rgw gives MethodNotAllowed for OPTIONS?

I'm trying to do direct-from-browser upload to rgw using pre-signed urls, and 
I'm getting stuck because the browser is doing a pre-flight OPTIONS request and 
rgw is giving me a MethodNotAllowed response.

Is this supported?

OPTIONS http://storage-test01:7480/ HTTP/1.1
Host: storage-test01:7480
Connection: keep-alive
Origin: http://localhost:3032
Access-Control-Request-Method: POST
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 
(KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36
Accept: */*
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9,es;q=0.8


HTTP/1.1 405 Method Not Allowed
Content-Length: 189
x-amz-request-id: txf-005a823b66-5e3e-default
Accept-Ranges: bytes
Content-Type: application/xml
Date: Tue, 13 Feb 2018 01:12:06 GMT
Connection: Keep-Alive

<?xml version="1.0" encoding="UTF-8"?><Error><Code>MethodNotAllowed</Code><RequestId>txf-005a823b66-5e3e-default</RequestId><HostId>5e3e-default-default</HostId></Error>



[ceph-users] rgw gives MethodNotAllowed for OPTIONS?

2018-02-12 Thread Piers Haken
I'm trying to do direct-from-browser upload to rgw using pre-signed urls, and 
I'm getting stuck because the browser is doing a pre-flight OPTIONS request and 
rgw is giving me a MethodNotAllowed response.

Is this supported?

OPTIONS http://storage-test01:7480/ HTTP/1.1
Host: storage-test01:7480
Connection: keep-alive
Origin: http://localhost:3032
Access-Control-Request-Method: POST
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 
(KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36
Accept: */*
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9,es;q=0.8


HTTP/1.1 405 Method Not Allowed
Content-Length: 189
x-amz-request-id: txf-005a823b66-5e3e-default
Accept-Ranges: bytes
Content-Type: application/xml
Date: Tue, 13 Feb 2018 01:12:06 GMT
Connection: Keep-Alive

<?xml version="1.0" encoding="UTF-8"?><Error><Code>MethodNotAllowed</Code><RequestId>txf-005a823b66-5e3e-default</RequestId><HostId>5e3e-default-default</HostId></Error>
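rgw handles CORS per bucket, so the preflight generally only succeeds against a bucket URL once that bucket carries a CORS configuration. A minimal boto3 sketch for attaching one (endpoint, credentials and the rule values are assumptions to adapt):

#!/usr/bin/env python
# Sketch only: attach a CORS configuration to the bucket so rgw can answer the
# browser's preflight OPTIONS itself.  Endpoint, credentials and the rule
# values below are placeholders to adapt.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://storage-test01:7480",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

s3.put_bucket_cors(
    Bucket="dfcn-uploads-dev",
    CORSConfiguration={
        "CORSRules": [
            {
                "AllowedOrigins": ["http://localhost:3032"],
                "AllowedMethods": ["GET", "PUT", "POST"],
                "AllowedHeaders": ["*"],
                "ExposeHeaders": ["ETag"],
                "MaxAgeSeconds": 3000,
            }
        ]
    },
)

print(s3.get_bucket_cors(Bucket="dfcn-uploads-dev"))

With a rule like this in place, the browser's preflight against the bucket URL (http://storage-test01:7480/<bucket-name>/) should then be answered by rgw itself rather than returning MethodNotAllowed.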



Re: [ceph-users] ceph luminous source packages

2018-02-12 Thread Mike O'Connor
On 13/02/2018 11:19 AM, Brad Hubbard wrote:
> On Tue, Feb 13, 2018 at 10:23 AM, Mike O'Connor  wrote:
>> Hi All
>>
>> Where can I find the source packages that the Proxmox Ceph Luminous was
>> built from ?
> You can find any source packages we release on http://download.ceph.com/
>
> You'd have to ask Proxmox which one they used and whether they modified it.
>
>>
>> Mike
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
Oops, I meant to ask this question on the Proxmox list, not Ceph :)

Mike



Re: [ceph-users] ceph luminous source packages

2018-02-12 Thread Brad Hubbard
On Tue, Feb 13, 2018 at 10:23 AM, Mike O'Connor  wrote:
> Hi All
>
> Where can I find the source packages that the Proxmox Ceph Luminous was
> built from ?

You can find any source packages we release on http://download.ceph.com/

You'd have to ask Proxmox which one they used and whether they modified it.

>
>
> Mike
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



-- 
Cheers,
Brad


[ceph-users] Understanding/correcting sudden onslaught of unfound objects

2018-02-12 Thread Graham Allan

Hi,

For the past few weeks I've been seeing a large number of pgs on our 
main erasure coded pool being flagged inconsistent, followed by them 
becoming active+recovery_wait+inconsistent with unfound objects. The 
cluster is currently running luminous 12.2.2 but has in the past also 
run its way through firefly, hammer and jewel.


Here's a sample object from "ceph pg list_missing" (there are 150 
unfound objects in this particular pg):


ceph health detail shows:

pg 70.467 is stuck unclean for 1004525.715896, current state 
active+recovery_wait+inconsistent, last acting [449,233,336,323,259,193]


ceph pg 70.467 list_missing:

{
"oid": {
"oid": 
"default.323253.6_20150226/Downloads/linux-nvme-HEAD-5aa2ffa/include/config/via/fir.h",
"key": "",
"snapid": -2,
"hash": 628294759,
"max": 0,
"pool": 70,
"namespace": ""
},
"need": "73222'132227",
"have": "0'0",
"flags": "none",
"locations": [
"193(5)",
"259(4)",
"449(0)"
]
},


When I trace through the filesystem on each OSD, I find the associated 
file present on each OSD but with size 0 bytes.


Interestingly, for the 3 OSDs for which "list_missing" shows locations 
above (193,259,449), the timestamp of the 0-byte file is recent (within 
last few weeks). For the other 3 OSDs (233,336,323), it's in the distant 
past (08/2015 and 02/2016).


All the unfound objects I've checked on this pg show the same pattern, 
along with the "have" epoch showing as "0'0".


Other than the potential data loss being disturbing, I wonder why this 
showed up so suddenly?


It seems to have been triggered by one OSD host failing over a long 
weekend. By the time we looked at it on Monday, the cluster had 
re-balanced enough data that I decided to simply leave it - we had long 
wanted to evacuate a first host to convert to a newer OS release, as 
well as Bluestore. Perhaps this was a bad choice, but the cluster 
recovery appeared to be proceeding normally, and was apparently complete 
a few days later. It was only around a week later that the unfound 
objects started.


All the unfound object file fragments I've tracked down so far have 
their older members with timestamps in the same mid-2015 to mid-2016 
period. I could be wrong but this really seems like a long-standing 
problem has just been unearthed. I wonder if it could be connected to 
this thread from early 2016, concerning a problem on the same cluster:


http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008120.html

It's a long thread, but the 0-byte files sound very like the "orphaned 
files" in that thread - related to performing a directory split while 
handling links on a filename with the special long filename handling...


http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008317.html

However unlike that thread, I'm not finding any other files with 
duplicate names in the hierarchy.


I'm not sure there's much else I can do besides record the names of any 
unfound objects before resorting to "mark_unfound_lost delete" - any 
suggestions for further research?
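For what it's worth, a small sketch for recording the unfound object names of a PG before any "mark_unfound_lost delete" (the PG id is the one from above; note that list_missing can paginate for very large counts):

#!/usr/bin/env python
# Sketch: save the unfound object names of one PG to a file before running
# "ceph pg <pgid> mark_unfound_lost delete".  The PG id is the one from above;
# for very large counts list_missing paginates (see its "more" field).
import json
import subprocess

pgid = "70.467"
out = subprocess.check_output(
    ["ceph", "pg", pgid, "list_missing", "--format", "json"])
missing = json.loads(out)

objects = missing.get("objects", [])
with open("unfound-%s.txt" % pgid, "w") as f:
    for obj in objects:
        f.write("%s need=%s have=%s locations=%s\n" % (
            obj["oid"]["oid"], obj["need"], obj["have"], obj.get("locations")))

print("%d unfound objects recorded for pg %s" % (len(objects), pgid))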


Thanks,

Graham
--
Graham Allan
Minnesota Supercomputing Institute - g...@umn.edu


Re: [ceph-users] rbd feature overheads

2018-02-12 Thread Blair Bethwaite
Thanks Ilya,

We can probably handle ~6.2MB for a 100TB volume. Is it reasonable to
expect a librbd client such as QEMU to only hold one object-map per guest?
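That figure lines up with the 2-bits-per-object rule quoted below; a quick sketch of the arithmetic, assuming the default 4 MiB object size:

# Quick check of the ~6.2MB figure (sketch; assumes the default 4 MiB object size):
object_size = 4 * 2**20            # default RBD object size
volume_size = 100 * 2**40          # 100 TiB volume

objects = volume_size // object_size           # 26,214,400 objects
overhead_bytes = objects * 2 / 8.0             # 2 bits per object
print("%.2f MiB" % (overhead_bytes / 2**20))   # ~6.25 MiB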

Cheers,

On 12 February 2018 at 21:01, Ilya Dryomov  wrote:

> On Mon, Feb 12, 2018 at 6:25 AM, Blair Bethwaite
>  wrote:
> > Hi all,
> >
> > Wondering if anyone can clarify whether there are any significant
> overheads
> > from rbd features like object-map, fast-diff, etc. I'm interested in both
> > performance overheads from a latency and space perspective, e.g., can
> > object-map be sanely deployed on a 100TB volume or does the client try to
> > read the whole thing into memory...?
>
> Yes, it does.  Enabling object-map on images larger than 1PB isn't
> allowed for exactly that reason.  The memory overhead is 2 bits per
> object, i.e. 64K per 1TB assuming the default object size.
>
> object-map also depends on exclusive-lock, which is bad for use cases
> where sharing the same image between multiple clients is a requirement.
>
> Once object-map is enabled, fast-diff is virtually no overhead.
>
> Thanks,
>
> Ilya
>



-- 
Cheers,
~Blairo


[ceph-users] ceph luminous source packages

2018-02-12 Thread Mike O'Connor
Hi All

Where can I find the source packages that the Proxmox Ceph Luminous was
built from ?


Mike



Re: [ceph-users] mgr[influx] Cannot transmit statistics: influxdb python module not found.

2018-02-12 Thread Marc Roos
Why not use collectd? CentOS 7 RPMs should do fine.



On Feb 12, 2018 9:50 PM, Benjeman Meekhof  wrote:
>
> In our case I think we grabbed the SRPM from Fedora and rebuilt it on 
> Scientific Linux (another RHEL derivative).  Presumably the binary 
> didn't work or I would have installed it directly.  I'm not quite sure 
> why it hasn't migrated to EPEL yet. 
>
> I haven't tried the SRPM for latest releases, we're actually quite far 
> behind the current python-influx version since I built it a while back 
> but if I were you I'd grab whatever SRPM gets you the latest 
> python-influxdb release and give it a try. 
>
> http://rpmfind.net/linux/rpm2html/search.php?query=python-influxdb 
>
> thanks, 
> Ben 
>
> On Mon, Feb 12, 2018 at 11:03 AM,   wrote: 
> > Dear all, 
> > 
> > I'd like to store ceph luminous metrics into influxdb. It seems like influx 
> > plugin has been already backported for lumious: 
> > rpm -ql ceph-mgr-12.2.2-0.el7.x86_64|grep -i influx 
> > /usr/lib64/ceph/mgr/influx 
> > /usr/lib64/ceph/mgr/influx/__init__.py 
> > /usr/lib64/ceph/mgr/influx/__init__.pyc 
> > /usr/lib64/ceph/mgr/influx/__init__.pyo 
> > /usr/lib64/ceph/mgr/influx/module.py 
> > /usr/lib64/ceph/mgr/influx/module.pyc 
> > /usr/lib64/ceph/mgr/influx/module.pyo 
> > 
> > So following http://docs.ceph.com/docs/master/mgr/influx/ doc I enabled 
> > influx plugin by executing the following command on mgr node: 
> > ceph mgr module enable influx 
> > 
> > but in ceph log I see the following error: 
> > 2018-02-12 15:51:31.241854 7f95e7942600  0 ceph version 12.2.2 
> > (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), process 
> > (unknown), pid 96425 
> > [] 
> > 2018-02-12 15:51:31.422414 7f95dec29700  1 mgr init Loading python module 
> > 'influx' 
> > [] 
> > 2018-02-12 15:51:32.227206 7f95c36ec700  1 mgr load Constructed class from 
> > module: influx 
> > [] 
> > 2018-02-12 15:51:32.228163 7f95c0ee7700  0 mgr[influx] Cannot transmit 
> > statistics: influxdb python module not found.  Did you install it? 
> > 
> > Indeed there is no python-influxdb module install on my mgr node (CentOS 7 
> > x64) but yum search can't find it with the following repos enabled: 
> > repo id            repo name                                        status
> > Ceph/x86_64        Ceph packages for x86_64
> > Ceph-noarch        Ceph noarch packages
> > base/7/x86_64      CentOS-7 - Base
> > ceph-source        Ceph source packages
> > epel/x86_64        Extra Packages for Enterprise Linux 7 - x86_64
> > extras/7/x86_64    CentOS-7 - Extras
> > updates/7/x86_64   CentOS-7 - Updates
> > 
> > Python version is 2.7.5. 
> > 
> > Is 'pip install' the only way to go or there is still some option to have 
> > required python module via rpm? I wonder how other people deals with that 
> > issue? 
> > ___ 
> > ceph-users mailing list 
> > ceph-users@lists.ceph.com 
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> ___ 
> ceph-users mailing list 
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 


Re: [ceph-users] Help rebalancing OSD usage, Luminous 12.2.2

2018-02-12 Thread Bryan Banister
Hi Janne and others,

We used the “ceph osd reweight-by-utilization” command to move a small amount 
of data off of the top four OSDs by utilization.  Then we updated the pg_num 
and pgp_num on the pool from 512 to 1024 which started moving roughly 50% of 
the objects around as a result.  The unfortunate issue is that the weights on 
the OSDs are still roughly equivalent and the OSDs that are nearfull were still 
getting allocated objects during the rebalance backfill operations.

At this point I have made some massive changes to the weights of the OSDs in an 
attempt to stop Ceph from allocating any more data to OSDs that are getting 
close to full.  Basically the OSD with the lowest utilization remains weighted 
at 1 and the rest of the OSDs are now reduced in weight based on the percent 
usage of the OSD + the %usage of the OSD with the amount of data (21% at the 
time).  This means the OSD that is at the most full at this time at 86% full 
now has a weight of only .33 (it was at 89% when reweight was applied).  I’m 
not sure this is a good idea, but it seemed like the only option I had.  Please 
let me know if I’m making a bad situation worse!
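As an aside, a sketch of how such a utilization-based pass can be computed and reviewed before anything is applied (the 20% threshold is an arbitrary assumption, and the "ceph osd df" JSON field names are worth double-checking on your release):

#!/usr/bin/env python
# Sketch: print (rather than run) gentle "ceph osd reweight" commands for OSDs
# sitting well above the average utilization.  The 1.2 threshold is arbitrary
# and the "ceph osd df" JSON field names should be verified on your release.
import json
import subprocess

df = json.loads(subprocess.check_output(["ceph", "osd", "df", "--format", "json"]))
nodes = [n for n in df["nodes"] if n.get("kb", 0) > 0]
avg = sum(n["utilization"] for n in nodes) / float(len(nodes))

for n in sorted(nodes, key=lambda x: x["utilization"], reverse=True):
    if n["utilization"] > avg * 1.2:
        new_weight = max(0.1, round(n["reweight"] * avg / n["utilization"], 2))
        print("ceph osd reweight osd.%d %.2f" % (n["id"], new_weight))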

I still have the question on how this happened in the first place and how to 
prevent it from happening going forward without a lot of monitoring and 
reweighting on weekends/etc to keep things balanced.  It sounds like Ceph is 
really expecting that objects stored into a pool will roughly have the same 
size, is that right?

Our backups going into this pool have very large variation in size, so would it 
be better to create multiple pools based on expected size of objects and then 
put backups of similar size into each pool?

The backups also have basically the same names with the only difference being 
the date which it was taken (e.g. backup name difference in subsequent days can 
be one digit at times), so does this mean that large backups with basically the 
same name will end up being placed in the same PGs based on the CRUSH 
calculation using the object name?

Thanks,
-Bryan

From: Janne Johansson [mailto:icepic...@gmail.com]
Sent: Wednesday, January 31, 2018 9:34 AM
To: Bryan Banister 
Cc: Ceph Users 
Subject: Re: [ceph-users] Help rebalancing OSD usage, Luminous 12.2.2

Note: External Email



2018-01-31 15:58 GMT+01:00 Bryan Banister 
>:


Given that this will move data around (I think), should we increase the pg_num 
and pgp_num first and then see how it looks?


I guess adding pgs and pgps will move stuff around too, but if the PGCALC 
formula says you should have more then that would still be a good
start. Still, a few manual reweights first to take the 85-90% ones down might 
be good, some move operations are going to refuse adding things
to too-full OSDs, so you would not want to get accidentally bumped above such a 
limit due to some temp-data being created during moves.

Also, dont bump pgs like crazy, you can never move down. Aim for getting ~100 
per OSD at most, and perhaps even then in smaller steps so
that the creation (and evening out of data to the new empty PGs) doesn't kill 
normal client I/O perf in the meantime.

--
May the most significant bit of your life be positive.



Note: This email is for the confidential use of the named addressee(s) only and 
may contain proprietary, confidential or privileged information. If you are not 
the intended recipient, you are hereby notified that any review, dissemination 
or copying of this email is strictly prohibited, and to please notify the 
sender immediately and destroy this email and any attachments. Email 
transmission cannot be guaranteed to be secure or error-free. The Company, 
therefore, does not make any guarantees as to the completeness or accuracy of 
this email or any attachments. This email is for informational purposes only 
and does not constitute a recommendation, offer, request or solicitation of any 
kind to buy, sell, subscribe, redeem or perform any type of transaction of a 
financial product.


[ceph-users] [rgw] Underscore at the beginning of access key not works after upgrade Jewel->Luminous

2018-02-12 Thread Rudenko Aleksandr
Hi friends,

I have an rgw user (_sc) whose access key is the same string (_sc):

radosgw-admin metadata user info --uid _sc
{
"user_id": "_sc",
"display_name": "_sc",
"email": "",
"suspended": 0,
"max_buckets": 0,
"auid": 0,
"subusers": [],
"keys": [
{
"user": "_sc",
"access_key": "_sc",
"secret_key": "Sbg6C7wkSK+jO2t3D\/719A"
}
],
….


Everything works fine on Jewel.
But after upgrading to Luminous, this user receives “InvalidAccessKeyId”.


Radosgw-admin says:

radosgw-admin metadata user info --uid _sc
could not fetch user info: no user info saved

Why does this happen?



Re: [ceph-users] mgr[influx] Cannot transmit statistics: influxdb python module not found.

2018-02-12 Thread Benjeman Meekhof
In our case I think we grabbed the SRPM from Fedora and rebuilt it on
Scientific Linux (another RHEL derivative).  Presumably the binary
didn't work or I would have installed it directly.  I'm not quite sure
why it hasn't migrated to EPEL yet.

I haven't tried the SRPM for latest releases, we're actually quite far
behind the current python-influx version since I built it a while back
but if I were you I'd grab whatever SRPM gets you the latest
python-influxdb release and give it a try.

http://rpmfind.net/linux/rpm2html/search.php?query=python-influxdb

thanks,
Ben

On Mon, Feb 12, 2018 at 11:03 AM,   wrote:
> Dear all,
>
> I'd like to store ceph luminous metrics into influxdb. It seems like influx
> plugin has been already backported for lumious:
> rpm -ql ceph-mgr-12.2.2-0.el7.x86_64|grep -i influx
> /usr/lib64/ceph/mgr/influx
> /usr/lib64/ceph/mgr/influx/__init__.py
> /usr/lib64/ceph/mgr/influx/__init__.pyc
> /usr/lib64/ceph/mgr/influx/__init__.pyo
> /usr/lib64/ceph/mgr/influx/module.py
> /usr/lib64/ceph/mgr/influx/module.pyc
> /usr/lib64/ceph/mgr/influx/module.pyo
>
> So following http://docs.ceph.com/docs/master/mgr/influx/ doc I enabled
> influx plugin by executing the following command on mgr node:
> ceph mgr module enable influx
>
> but in ceph log I see the following error:
> 2018-02-12 15:51:31.241854 7f95e7942600  0 ceph version 12.2.2
> (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), process
> (unknown), pid 96425
> []
> 2018-02-12 15:51:31.422414 7f95dec29700  1 mgr init Loading python module
> 'influx'
> []
> 2018-02-12 15:51:32.227206 7f95c36ec700  1 mgr load Constructed class from
> module: influx
> []
> 2018-02-12 15:51:32.228163 7f95c0ee7700  0 mgr[influx] Cannot transmit
> statistics: influxdb python module not found.  Did you install it?
>
> Indeed there is no python-influxdb module install on my mgr node (CentOS 7
> x64) but yum search can't find it with the following repos enabled:
> repo id            repo name                                        status
> Ceph/x86_64        Ceph packages for x86_64
> Ceph-noarch        Ceph noarch packages
> base/7/x86_64      CentOS-7 - Base
> ceph-source        Ceph source packages
> epel/x86_64        Extra Packages for Enterprise Linux 7 - x86_64
> extras/7/x86_64    CentOS-7 - Extras
> updates/7/x86_64   CentOS-7 - Updates
>
> Python version is 2.7.5.
>
> Is 'pip install' the only way to go or there is still some option to have
> required python module via rpm? I wonder how other people deals with that
> issue?
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSDs with primary affinity 0 still used for primary PG

2018-02-12 Thread David Turner
If you look at the PGs that are primary on an OSD that has primary affinity
0, you'll find that they are only on OSDs with primary affinity of 0, so 1
of them has to take the reins or nobody would be responsible for the PG.
To prevent this from happening, you would need to configure your crush map
in a way where all PGs are guaranteed to land on at least 1 OSD that
doesn't have a primary affinity of 0.
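A quick sketch that cross-references the OSD and PG dumps to list exactly those PGs (the JSON field names, and how "ceph pg dump" wraps its output, are assumptions worth verifying on your release):

#!/usr/bin/env python
# Sketch: list PGs whose candidate OSDs all have primary affinity 0, i.e. the
# PGs that still show up under PRIMARY_PG_SUM for affinity-0 OSDs.  Field names
# and the exact JSON layout of "ceph pg dump" should be verified on your release.
import json
import subprocess

def ceph_json(*args):
    out = subprocess.check_output(("ceph",) + args + ("--format", "json"))
    return json.loads(out)

affinity = {o["osd"]: o["primary_affinity"] for o in ceph_json("osd", "dump")["osds"]}

pg_dump = ceph_json("pg", "dump", "pgs_brief")
pgs = pg_dump.get("pg_stats", []) if isinstance(pg_dump, dict) else pg_dump

for pg in pgs:
    acting = pg["acting"]
    if acting and all(affinity.get(osd, 1.0) == 0 for osd in acting):
        print("%s acting=%s: every candidate has primary affinity 0" % (pg["pgid"], acting))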

On Mon, Feb 12, 2018 at 2:45 PM Teun Docter 
wrote:

> Hi,
>
> I'm looking into storing the primary copy on SSDs, and replicas on
> spinners.
> One way to achieve this should be the primary affinity setting, as
> outlined in this post:
>
>
> https://www.sebastien-han.fr/blog/2015/08/06/ceph-get-the-best-of-your-ssd-with-primary-affinity
>
> So I've deployed a small test cluster and set the affinity to 0 for half
> the OSDs and to 1 for the rest:
>
> # ceph osd tree
> ID CLASS WEIGHT  TYPE NAME   STATUS REWEIGHT PRI-AFF
> -1   0.07751 root default
> -3   0.01938 host osd001
>  1   hdd 0.00969 osd.1   up  1.0 1.0
>  4   hdd 0.00969 osd.4   up  1.0   0
> -7   0.01938 host osd002
>  2   hdd 0.00969 osd.2   up  1.0 1.0
>  6   hdd 0.00969 osd.6   up  1.0   0
> -9   0.01938 host osd003
>  3   hdd 0.00969 osd.3   up  1.0 1.0
>  7   hdd 0.00969 osd.7   up  1.0   0
> -5   0.01938 host osd004
>  0   hdd 0.00969 osd.0   up  1.0 1.0
>  5   hdd 0.00969 osd.5   up  1.0   0
>
> Then I've created a pool. The summary at the end of "ceph pg dump" looks
> like this:
>
> sum 0 0 0 0 0 0 0 0
> OSD_STAT USED   AVAIL  TOTAL   HB_PEERS         PG_SUM PRIMARY_PG_SUM
> 7        1071M  9067M  10138M  [0,1,2,3,4,5,6]  192    26
> 6        1072M  9066M  10138M  [0,1,2,3,4,5,7]  198    18
> 5        1071M  9067M  10138M  [0,1,2,3,4,6,7]  192    21
> 4        1076M  9062M  10138M  [0,1,2,3,5,6,7]  202    15
> 3        1072M  9066M  10138M  [0,1,2,4,5,6,7]  202    121
> 2        1072M  9066M  10138M  [0,1,3,4,5,6,7]  195    114
> 1        1076M  9062M  10138M  [0,2,3,4,5,6,7]  161    95
> 0        1071M  9067M  10138M  [1,2,3,4,5,6,7]  194    102
> sum      8587M  72524M 8M
>
> Now, the OSDs for which the primary affinity is set to zero are acting as
> primary a lot less than the others.
>
> But what I'm wondering about is this:
>
> For those OSDs that have primary affinity set to zero, why is the
> PRIMARY_PG_SUM column not zero?
>
> # ceph -v
> ceph version 12.2.2 (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous
> (stable)
>
> Note that I've created the pool after setting the primary affinity, and no
> data is stored yet.
>
> Thanks,
> Teun
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


Re: [ceph-users] Ceph-fuse : unmounted but ceph-fuse process not killed

2018-02-12 Thread David Turner
Why are you using force and lazy options to umount cephfs?  Those should
only be done if there are problems unmounting the volume.  Lazy umount will
indeed leave things running, but just quickly remove the FS from its mount
point.  You should rarely use umount -f and even more rarely use umount
-l.  From the help for umount is the following excerpt.  Having orphaned
processes after using a lazy umount is the purpose of the option.

-l --lazy
Lazy unmount. Detach the filesystem from the filesystem hierarchy
now, and cleanup all references to the filesystem as soon as
it is not busy anymore. (Requires kernel 2.4.11 or later.)

On Mon, Feb 12, 2018 at 2:44 PM Florent B  wrote:

> Hi,
>
> I use Ceph Luminous last version on Debian Jessie.
>
> Sometimes, when I unmount my ceph-fuse FS using "umount -f -l
> /mnt/point", unmounting is fine but "ceph-fuse" process continues running.
>
> Is that expected ? How to fix this ?
>
> Thank you.
>
> Florent
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


Re: [ceph-users] Is there a "set pool readonly" command?

2018-02-12 Thread Reed Dier
I do know that there is a pause flag in Ceph.

What I do not know is if that also pauses recovery traffic, in addition to 
client traffic.

Also worth mentioning, this is a cluster-wide flag, not a pool level flag.

Reed

> On Feb 11, 2018, at 11:45 AM, David Turner  wrote:
> 
> If you set min_size to 2 or more, it will disable reads and writes to the 
> pool by blocking requests. Min_size is the minimum copies of a PG that need 
> to be online to allow IO to the data. If you only have 1 copy, then it will 
> prevent io. It's not a flag you can set on the pool, but it should work out. 
> If you have size=3, then min_size=3 should block most io until the pool is 
> almost fully backfilled.
> 
> 
> On Sun, Feb 11, 2018, 9:46 AM Nico Schottelius  > wrote:
> 
> Hello,
> 
> we have one pool, in which about 10 disks failed last week (fortunately
> mostly sequentially), which now has now some pgs that are only left on
> one disk.
> 
> Is there a command to set one pool into "read-only" mode or even
> "recovery io-only" mode so that the only thing it is doing is
> recovering and no client i/o will disturb that process?
> 
> Best,
> 
> Nico
> 
> 
> 
> --
> Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



Re: [ceph-users] Luminous 12.2.3 release date?

2018-02-12 Thread Abhishek Lekshmanan
Hans van den Bogert  writes:

> Hi Wido,
>
> Did you ever get an answer? I'm eager to know as well.

We're currently testing 12.2.3; once the QE process completes we can
publish the packages, hopefully by the end of this week
>
>
> Hans
>
> On Tue, Jan 30, 2018 at 10:35 AM, Wido den Hollander  wrote:
>> Hi,
>>
>> Is there a ETA yet for 12.2.3? Looking at the tracker there aren't that many
>> outstanding issues: http://tracker.ceph.com/projects/ceph/roadmap
>>
>> On Github we have more outstanding PRs though for the Luminous milestone:
>> https://github.com/ceph/ceph/milestone/10
>>
>> Are we expecting 12.2.3 in Feb? I'm asking because there are some Mgr
>> related fixes I'm backporting now for a few people which are in 12.2.3.
>>
>> Wido
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

-- 
Abhishek Lekshmanan
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
HRB 21284 (AG Nürnberg)


[ceph-users] mgr[influx] Cannot transmit statistics: influxdb python module not found.

2018-02-12 Thread knawnd

Dear all,

I'd like to store ceph luminous metrics into influxdb. It seems like the influx plugin has already been 
backported for luminous:

rpm -ql ceph-mgr-12.2.2-0.el7.x86_64|grep -i influx
/usr/lib64/ceph/mgr/influx
/usr/lib64/ceph/mgr/influx/__init__.py
/usr/lib64/ceph/mgr/influx/__init__.pyc
/usr/lib64/ceph/mgr/influx/__init__.pyo
/usr/lib64/ceph/mgr/influx/module.py
/usr/lib64/ceph/mgr/influx/module.pyc
/usr/lib64/ceph/mgr/influx/module.pyo

So, following the http://docs.ceph.com/docs/master/mgr/influx/ doc, I enabled the influx plugin by executing 
the following command on the mgr node:

ceph mgr module enable influx

but in ceph log I see the following error:
2018-02-12 15:51:31.241854 7f95e7942600  0 ceph version 12.2.2 
(cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable), process (unknown), pid 96425

[]
2018-02-12 15:51:31.422414 7f95dec29700  1 mgr init Loading python module 
'influx'
[]
2018-02-12 15:51:32.227206 7f95c36ec700  1 mgr load Constructed class from 
module: influx
[]
2018-02-12 15:51:32.228163 7f95c0ee7700  0 mgr[influx] Cannot transmit statistics: influxdb python 
module not found.  Did you install it?


Indeed there is no python-influxdb module installed on my mgr node (CentOS 7 x64), but yum search can't 
find it with the following repos enabled:
repo id            repo name                                        status
Ceph/x86_64        Ceph packages for x86_64
Ceph-noarch        Ceph noarch packages
base/7/x86_64      CentOS-7 - Base
ceph-source        Ceph source packages
epel/x86_64        Extra Packages for Enterprise Linux 7 - x86_64
extras/7/x86_64    CentOS-7 - Extras
updates/7/x86_64   CentOS-7 - Updates


Python version is 2.7.5.

Is 'pip install' the only way to go, or is there still some option to get the required python module via 
rpm? I wonder how other people deal with this issue?
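A trivial sketch for checking whether the module is visible to the interpreter ceph-mgr runs under:

# Sketch: run with the same Python 2.7 interpreter ceph-mgr uses to see whether
# the influxdb client library is importable at all.
try:
    import influxdb
    print("influxdb module found, version %s" % getattr(influxdb, "__version__", "unknown"))
except ImportError:
    print("influxdb module missing: install python-influxdb (rpm) or 'pip install influxdb'")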



[ceph-users] OSDs with primary affinity 0 still used for primary PG

2018-02-12 Thread Teun Docter
Hi,

I'm looking into storing the primary copy on SSDs, and replicas on spinners.
One way to achieve this should be the primary affinity setting, as outlined in 
this post:

https://www.sebastien-han.fr/blog/2015/08/06/ceph-get-the-best-of-your-ssd-with-primary-affinity

So I've deployed a small test cluster and set the affinity to 0 for half the 
OSDs and to 1 for the rest:

# ceph osd tree
ID CLASS WEIGHT  TYPE NAME   STATUS REWEIGHT PRI-AFF 
-1   0.07751 root default
-3   0.01938 host osd001 
 1   hdd 0.00969 osd.1   up  1.0 1.0 
 4   hdd 0.00969 osd.4   up  1.0   0 
-7   0.01938 host osd002 
 2   hdd 0.00969 osd.2   up  1.0 1.0 
 6   hdd 0.00969 osd.6   up  1.0   0 
-9   0.01938 host osd003 
 3   hdd 0.00969 osd.3   up  1.0 1.0 
 7   hdd 0.00969 osd.7   up  1.0   0 
-5   0.01938 host osd004 
 0   hdd 0.00969 osd.0   up  1.0 1.0 
 5   hdd 0.00969 osd.5   up  1.0   0 

Then I've created a pool. The summary at the end of "ceph pg dump" looks like 
this:

sum 0 0 0 0 0 0 0 0
OSD_STAT USED   AVAIL  TOTAL   HB_PEERS         PG_SUM PRIMARY_PG_SUM
7        1071M  9067M  10138M  [0,1,2,3,4,5,6]  192    26
6        1072M  9066M  10138M  [0,1,2,3,4,5,7]  198    18
5        1071M  9067M  10138M  [0,1,2,3,4,6,7]  192    21
4        1076M  9062M  10138M  [0,1,2,3,5,6,7]  202    15
3        1072M  9066M  10138M  [0,1,2,4,5,6,7]  202    121
2        1072M  9066M  10138M  [0,1,3,4,5,6,7]  195    114
1        1076M  9062M  10138M  [0,2,3,4,5,6,7]  161    95
0        1071M  9067M  10138M  [1,2,3,4,5,6,7]  194    102
sum      8587M  72524M 8M

Now, the OSDs for which the primary affinity is set to zero are acting as 
primary a lot less than the others.

But what I'm wondering about is this:

For those OSDs that have primary affinity set to zero, why is the 
PRIMARY_PG_SUM column not zero?

# ceph -v
ceph version 12.2.2 (cf0baba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)

Note that I've created the pool after setting the primary affinity, and no data 
is stored yet.

Thanks,
Teun



[ceph-users] PG replication issues

2018-02-12 Thread Alexandru Cucu
Hello,

Warning, this is a long story! There's a TL;DR; close to the end.

We are replacing some of our spinning drives with SSDs. We have 14 OSD
nodes with 12 drives each. We are replacing 4 drives from each node
with SSDs. The cluster is running Ceph Jewel (10.2.7). The affected pool
had min_size=2 and size=3.

After removing some of the drives (from a single host) we noticed the
rebalancing/recovering process got stuck and we had 1 PG with 2 unfound
objects.

Most of our Openstack VMs were having issues - were unresponsive or
had other i/o issues.

We tried quering the PG but had no response after hours of waiting.
Trying to recover or delete the unfound objects did the same thing:
absolutely nothing.

One of the two remaining OSD nodes that had the PG was experiencing
huge load spikes correlated with disk IO spikes: https://imgur.com/a/7g0eI

We had this OSD removed and after a while the other OSD started doing
the same thing - huge load spikes.

Tried doing a query on the affected PG and deleting the unfound objects.
Nothing had changed.

The OSDs this PG was supposed to be replicated to only had an empty
folder.

We removed the last OSD that had the PG with unfound objects. Now we had
an incomplete PG. Recovered the data from the OSD we removed before all
this has started and tried exporting and importing the PG using the
Ceph Object Store Tool. Unfortunately nothing happened.

Also tried using the Ceph Object Store Tool to find and delete the
unfound objects from the last two OSDs we had removed and re-import the
PG but this also didn't work.

*TL;DR;* we had 2 unfound objects on a PG after removing an OSD, cluster
status was healthy before this, pool has min_size=2 and size=3.
Had to delete the entire pool and recreate all the virtual machines.

If you have any idea why the PG was not being replicated on the other
two OSDs, please let me know. Any suggestions on how to avoid this?
Just want to make sure this never happens again.

Our story is similar to this one:

http://ceph-users.ceph.narkive.com/bWszhgi1/ceph-pg-incomplete-cluster-unusable#post19

---
Alex Cucu


Re: [ceph-users] Bluestore with so many small files

2018-02-12 Thread Wido den Hollander



On 02/12/2018 03:16 PM, Behnam Loghmani wrote:
So you mean that rocksdb and the osdmap filled about 40G of disk for only 800k 
files?

I think that's not reasonable; it's too high.


Could you check the output of the OSDs using a 'perf dump' on their 
admin socket?


The 'bluestore' and 'bluefs' sections should tell you:

- db_used_bytes
- onodes

using those values you can figure out how much data the DB is using and 
how many objects you have in the OSD.


Wido
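A small sketch of pulling those counters from an OSD's admin socket, run on the OSD host itself (the osd id is an assumption, and exact counter names vary a bit between releases, so it just prints the bluefs section plus any onode counters):

#!/usr/bin/env python
# Sketch: pull the bluefs/bluestore counters mentioned above from one OSD's
# admin socket.  Run it on the OSD host itself; the osd id is an assumption and
# counter names differ slightly between releases, so this prints the whole
# bluefs section plus any bluestore counter with "onode" in its name.
import json
import subprocess

osd_id = 0
out = subprocess.check_output(["ceph", "daemon", "osd.%d" % osd_id, "perf", "dump"])
perf = json.loads(out)

print(json.dumps(perf.get("bluefs", {}), indent=2))        # includes db_used_bytes
for name, value in sorted(perf.get("bluestore", {}).items()):
    if "onode" in name:
        print("%s = %s" % (name, value))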



On Mon, Feb 12, 2018 at 5:06 PM, David Turner > wrote:


Some of your overhead is the Wal and rocksdb that are on the OSDs.
The Wal is pretty static in size, but rocksdb grows with the amount
of objects you have. You also have copies of the osdmap on each osd.
There's just overhead that adds up. The biggest is going to be
rocksdb with how many objects you have.


On Mon, Feb 12, 2018, 8:06 AM Behnam Loghmani
> wrote:

Hi there,

I am using ceph Luminous 12.2.2 with:

3 osds (each osd is 100G) - no WAL/DB separation.
3 mons
1 rgw
cluster size 3

I stored lots of thumbnails with very small size on ceph with
radosgw.

Actual size of files is something about 32G but it filled 70G of
each osd.

what's the reason of this high disk usage?
should I change "bluestore_min_alloc_size_hdd"? and If I change
it and set it to smaller size, does it impact on performance?

what is the best practice for storing small files on bluestore?

Best regards,
Behnam Loghmani
___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




Re: [ceph-users] Ceph Day Germany :)

2018-02-12 Thread Kai Wagner
Hi Wido,

how do you know about that beforehand? There's no official upcoming
event on the ceph.com page?

Just because I'm curious :)

Thanks

Kai


On 12.02.2018 10:39, Wido den Hollander wrote:
> The next one is in London on April 19th 

-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)






Re: [ceph-users] Is there a "set pool readonly" command?

2018-02-12 Thread David Turner
The pause flag also pauses recovery traffic.  It is literally a flag to
stop anything and everything in the cluster so you can get an expert in to
prevent something even worse from happening.
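For the min_size approach quoted below, a sketch of toggling it on a single pool (the pool name and the original min_size value are assumptions):

#!/usr/bin/env python
# Sketch: raise min_size to the pool's size so under-replicated PGs stop
# serving client I/O until recovery catches up, then put it back.
# Pool name and the original min_size value below are assumptions.
import subprocess

pool = "mypool"

def pool_set(var, value):
    subprocess.check_call(["ceph", "osd", "pool", "set", pool, var, str(value)])

subprocess.check_call(["ceph", "osd", "pool", "get", pool, "size"])      # e.g. "size: 3"
subprocess.check_call(["ceph", "osd", "pool", "get", pool, "min_size"])  # note the current value

pool_set("min_size", 3)   # PGs with fewer than 3 complete copies now block I/O
# ... wait for backfill/recovery to finish ...
pool_set("min_size", 2)   # restore the original value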

On Mon, Feb 12, 2018 at 1:56 PM Reed Dier  wrote:

> I do know that there is a pause flag in Ceph.
>
> What I do not know is if that also pauses recovery traffic, in addition to
> client traffic.
>
> Also worth mentioning, this is a cluster-wide flag, not a pool level flag.
>
> Reed
>
> On Feb 11, 2018, at 11:45 AM, David Turner  wrote:
>
> If you set min_size to 2 or more, it will disable reads and writes to the
> pool by blocking requests. Min_size is the minimum copies of a PG that need
> to be online to allow IO to the data. If you only have 1 copy, then it will
> prevent io. It's not a flag you can set on the pool, but it should work
> out. If you have size=3, then min_size=3 should block most io until the
> pool is almost fully backfilled.
>
> On Sun, Feb 11, 2018, 9:46 AM Nico Schottelius <
> nico.schottel...@ungleich.ch> wrote:
>
>>
>> Hello,
>>
>> we have one pool, in which about 10 disks failed last week (fortunately
>> mostly sequentially), which now has now some pgs that are only left on
>> one disk.
>>
>> Is there a command to set one pool into "read-only" mode or even
>> "recovery io-only" mode so that the only thing it is doing is
>> recovering and no client i/o will disturb that process?
>>
>> Best,
>>
>> Nico
>>
>>
>>
>> --
>> Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>


Re: [ceph-users] ceph mons de-synced from rest of cluster?

2018-02-12 Thread Gregory Farnum
On Sun, Feb 11, 2018 at 8:19 PM Chris Apsey  wrote:

> All,
>
> Recently doubled the number of OSDs in our cluster, and towards the end
> of the rebalancing, I noticed that recovery IO fell to nothing and that
> the ceph mons eventually looked like this when I ran ceph -s
>
>cluster:
>  id: 6a65c3d0-b84e-4c89-bbf7-a38a1966d780
>  health: HEALTH_WARN
>  34922/4329975 objects misplaced (0.807%)
>  Reduced data availability: 542 pgs inactive, 49 pgs
> peering, 13502 pgs stale
>  Degraded data redundancy: 248778/4329975 objects
> degraded (5.745%), 7319 pgs unclean, 2224 pgs degraded, 1817 pgs
> undersized
>
>services:
>  mon: 3 daemons, quorum cephmon-0,cephmon-1,cephmon-2
>  mgr: cephmon-0(active), standbys: cephmon-1, cephmon-2
>  osd: 376 osds: 376 up, 376 in
>
>data:
>  pools:   9 pools, 13952 pgs
>  objects: 1409k objects, 5992 GB
>  usage:   31528 GB used, 1673 TB / 1704 TB avail
>  pgs: 3.225% pgs unknown
>   0.659% pgs not active
>   248778/4329975 objects degraded (5.745%)
>   34922/4329975 objects misplaced (0.807%)
>   6141 stale+active+clean
>   4537 stale+active+remapped+backfilling
>   1575 stale+active+undersized+degraded
>   489  stale+active+clean+remapped
>   450  unknown
>   396  stale+active+recovery_wait+degraded
>   216
> stale+active+undersized+degraded+remapped+backfilling
>   40   stale+peering
>   30   stale+activating
>   24   stale+active+undersized+remapped
>   22   stale+active+recovering+degraded
>   13   stale+activating+degraded
>   9stale+remapped+peering
>   4stale+active+remapped+backfill_wait
>   3stale+active+clean+scrubbing+deep
>   2
> stale+active+undersized+degraded+remapped+backfill_wait
>   1stale+active+remapped
>
> The problem is, everything works fine.  If I run ceph health detail and
> do a pg query against one of the 'degraded' placement groups, it reports
> back as active-clean.  All clients in the cluster can write and read at
> normal speeds, but not IO information is ever reported in ceph -s.
>
>  From what I can see, everything in the cluster is working properly
> except the actual reporting on the status of the cluster.  Has anyone
> seen this before/know how to sync the mons up to what the OSDs are
> actually reporting?  I see no connectivity errors in the logs of the
> mons or the osds.
>

It sounds like the manager has gone stale somehow. You can probably fix it
by restarting, though if you have logs it would be good to file a bug
report at tracker.ceph.com.
-Greg


>
> Thanks,
>
> ---
> v/r
>
> Chris Apsey
> bitskr...@bitskrieg.net
> https://www.bitskrieg.net
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


Re: [ceph-users] Rocksdb: Try to delete WAL files size....

2018-02-12 Thread Dietmar Rieder
Anyone?

On 9 February 2018 09:59:54 CET, Dietmar Rieder wrote:
>Hi,
>
>we are running ceph version 12.2.2 (10 nodes, 240 OSDs, 3 mon). While
>monitoring the WAL db used bytes we noticed that there are some OSDs
>that use proportionally more WAL db bytes than others (800Mb vs 300Mb).
>These OSDs eventually exceed the WAL db size (1GB in our case) and
>spill
>over to the HDD data device. So it seems flushing the WAL db does not
>free space.
>
>We looked for some hints in the logs of the OSDs in question and
>spotted
>the following entries:
>
>[...]
>2018-02-08 16:17:27.496695 7f0ffce55700  4 rocksdb:
>[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/db_impl_write.cc:684]
>reusing log 152 from recycle list
>2018-02-08 16:17:27.496768 7f0ffce55700  4 rocksdb:
>[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/db_impl_write.cc:725]
>[default] New memtable created with log file: #162. Immutable
>memtables: 0.
>2018-02-08 16:17:27.496976 7f0fe7e2b700  4 rocksdb: (Original Log Time
>2018/02/08-16:17:27.496841)
>[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/db_impl_compaction_flush.cc:1158]
>Calling FlushMemTableToOutputFile with column family [default], flush
>slots available 1, compaction slots allowed 1, compaction slots
>scheduled 1
>2018-02-08 16:17:27.496983 7f0fe7e2b700  4 rocksdb:
>[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/flush_job.cc:264]
>[default] [JOB 6] Flushing memtable with next log file: 162
>2018-02-08 16:17:27.497001 7f0fe7e2b700  4 rocksdb: EVENT_LOG_v1
>{"time_micros": 1518103047496990, "job": 6, "event": "flush_started",
>"num_memtables": 1, "num_entries": 328542, "num_deletes": 66632,
>"memory_usage": 260058032}
>2018-02-08 16:17:27.497006 7f0fe7e2b700  4 rocksdb:
>[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/flush_job.cc:293]
>[default] [JOB 6] Level-0 flush table #163: started
>2018-02-08 16:17:27.627110 7f0fe7e2b700  4 rocksdb: EVENT_LOG_v1
>{"time_micros": 1518103047627094, "cf_name": "default", "job": 6,
>"event": "table_file_creation", "file_number": 163, "file_size":
>5502182, "table_properties": {"data_size": 5160167, "index_size":
>81548,
>"filter_size": 259478, "raw_key_size": 5138655, "raw_average_key_size":
>51, "raw_value_size": 3606384, "raw_average_value_size": 36,
>"num_data_blocks": 1287, "num_entries": 98984, "filter_policy_name":
>"rocksdb.BuiltinBloomFilter", "kDeletedKeys": "66093",
>"kMergeOperands":
>"192"}}
>2018-02-08 16:17:27.627127 7f0fe7e2b700  4 rocksdb:
>[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/flush_job.cc:319]
>[default] [JOB 6] Level-0 flush table #163: 5502182 bytes OK
>2018-02-08 16:17:27.627449 7f0fe7e2b700  4 rocksdb:
>[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/db_impl_files.cc:242]
>adding log 155 to recycle list
>2018-02-08 16:17:27.627457 7f0fe7e2b700  4 rocksdb: (Original Log Time
>2018/02/08-16:17:27.627136)
>[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/memtable_list.cc:360]
>[default] Level-0 commit table #163 started
>2018-02-08 16:17:27.627461 7f0fe7e2b700  4 rocksdb: (Original Log Time
>2018/02/08-16:17:27.627402)
>[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/memtable_list.cc:383]
>[default] Level-0 commit table #163: memtable #1 done
>2018-02-08 16:17:27.627474 7f0fe7e2b700  4 rocksdb: (Original Log Time
>2018/02/08-16:17:27.627415) EVENT_LOG_v1 {"time_micros":
>1518103047627409, "job": 6, "event": "flush_finished", "lsm_state": [1,
>2, 3, 0, 0, 0, 0], "immutable_memtables": 0}
>2018-02-08 16:17:27.627476 7f0fe7e2b700  4 rocksdb: (Original Log Time
>2018/02/08-16:17:27.627435)

Re: [ceph-users] Luminous 12.2.3 release date?

2018-02-12 Thread Hans van den Bogert
Hi Wido,

Did you ever get an answer? I'm eager to know as well.


Hans

On Tue, Jan 30, 2018 at 10:35 AM, Wido den Hollander  wrote:
> Hi,
>
> Is there a ETA yet for 12.2.3? Looking at the tracker there aren't that many
> outstanding issues: http://tracker.ceph.com/projects/ceph/roadmap
>
> On Github we have more outstanding PRs though for the Luminous milestone:
> https://github.com/ceph/ceph/milestone/10
>
> Are we expecting 12.2.3 in Feb? I'm asking because there are some Mgr
> related fixes I'm backporting now for a few people which are in 12.2.3.
>
> Wido
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore with so many small files

2018-02-12 Thread Behnam Loghmani
So you mean that rocksdb and the osdmap filled about 40G of disk for only 800k
files?
I think that's not reasonable; it's too high.

On Mon, Feb 12, 2018 at 5:06 PM, David Turner  wrote:

> Some of your overhead is the Wal and rocksdb that are on the OSDs. The Wal
> is pretty static in size, but rocksdb grows with the amount of objects you
> have. You also have copies of the osdmap on each osd. There's just overhead
> that adds up. The biggest is going to be rocksdb with how many objects you
> have.
>
> On Mon, Feb 12, 2018, 8:06 AM Behnam Loghmani 
> wrote:
>
>> Hi there,
>>
>> I am using ceph Luminous 12.2.2 with:
>>
>> 3 osds (each osd is 100G) - no WAL/DB separation.
>> 3 mons
>> 1 rgw
>> cluster size 3
>>
>> I stored lots of thumbnails with very small size on ceph with radosgw.
>>
>> Actual size of files is something about 32G but it filled 70G of each osd.
>>
>> what's the reason of this high disk usage?
>> should I change "bluestore_min_alloc_size_hdd"? and If I change it and
>> set it to smaller size, does it impact on performance?
>>
>> what is the best practice for storing small files on bluestore?
>>
>> Best regards,
>> Behnam Loghmani
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>


Re: [ceph-users] Bluestore with so many small files

2018-02-12 Thread David Turner
Some of your overhead is the WAL and RocksDB that are on the OSDs. The WAL
is pretty static in size, but RocksDB grows with the number of objects you
have. You also have copies of the osdmap on each OSD. There's just overhead
that adds up. The biggest part is going to be RocksDB, given how many objects
you have.

On Mon, Feb 12, 2018, 8:06 AM Behnam Loghmani 
wrote:

> Hi there,
>
> I am using ceph Luminous 12.2.2 with:
>
> 3 osds (each osd is 100G) - no WAL/DB separation.
> 3 mons
> 1 rgw
> cluster size 3
>
> I stored lots of thumbnails with very small size on ceph with radosgw.
>
> Actual size of files is something about 32G but it filled 70G of each osd.
>
> what's the reason of this high disk usage?
> should I change "bluestore_min_alloc_size_hdd"? and If I change it and set
> it to smaller size, does it impact on performance?
>
> what is the best practice for storing small files on bluestore?
>
> Best regards,
> Behnam Loghmani
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


[ceph-users] Bluestore with so many small files

2018-02-12 Thread Behnam Loghmani
Hi there,

I am using ceph Luminous 12.2.2 with:

3 osds (each osd is 100G) - no WAL/DB separation.
3 mons
1 rgw
cluster size 3

I stored lots of thumbnails with very small size on ceph with radosgw.

Actual size of files is something about 32G but it filled 70G of each osd.

what's the reason of this high disk usage?
should I change "bluestore_min_alloc_size_hdd"? and If I change it and set
it to smaller size, does it impact on performance?

what is the best practice for storing small files on bluestore?
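A back-of-the-envelope sketch of where the space can go, using the ~32G / ~800k-object figures from this thread and the Luminous default bluestore_min_alloc_size_hdd of 64 KiB:

# Back-of-the-envelope sketch: BlueStore rounds each object up to
# bluestore_min_alloc_size_hdd (64 KiB by default on Luminous HDD OSDs).
# The ~32G of data and ~800k objects are the figures from this thread.
min_alloc = 64 * 1024
objects = 800000
logical = 32 * 2**30

allocated = objects * min_alloc                       # lower bound if most objects fit one unit
print("logical   %.1f GiB" % (logical / 2.0**30))     # ~32.0 GiB
print("allocated %.1f GiB" % (allocated / 2.0**30))   # ~48.8 GiB, before RocksDB/WAL/metadata

That is roughly 48.8 GiB allocated for about 32 GiB of data; add RocksDB, the WAL and per-object metadata on the same 100G device and the ~70G observed per OSD is not far off. A smaller bluestore_min_alloc_size_hdd reduces that rounding loss, generally at the cost of more allocator and metadata overhead.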

Best regards,
Behnam Loghmani


[ceph-users] NFS-Ganesha: Files disappearing?

2018-02-12 Thread Martin Emrich

Hi!

I am trying out NFS-Ganesha-RGW (2.5.4 and also Git V2.5-stable) with 
Ceph 12.2.2.


Mounting the RGW works fine, but if I try to archive all files, some 
paths seem to "disappear":


...

tar: /store/testbucket/nhxYgfUgFivgzRxw: File removed before we read it
tar: /store/testbucket/nlkijFwqnXCGkRca: File removed before we read it
tar: /store/testbucket/oObmYVuGnoJPvgjM: File removed before we read it
tar: /store/testbucket/orQnXMVdoqPfJwnw: File removed before we read it
tar: /store/testbucket/piavKoyCJgaXjbjR: File removed before we read it
tar: /store/testbucket/pnCqpyMdiAtlSVOh: File removed before we read it
tar: /store/testbucket/pzKYPQCqzDBGbaZf: File removed before we read it
tar: /store/testbucket/qtjkhjXKyPzfcKFU: File removed before we read it
tar: /store/testbucket/sGlMWjFlfPLmcbmd: File removed before we read it
tar: /store/testbucket/uZAKQkQeXTRyLVrr: File removed before we read it
tar: /store/testbucket/ucIqmGmthjHBQWqn: File removed before we read it
tar: /store/testbucket/uiBBNTyQxwnqnKBO: File removed before we read it
tar: /store/testbucket/vRfZJRHfXdnfGGgx: File removed before we read it
tar: /store/testbucket/vbifbMjyIqOFtaFO: File removed before we read it
tar: /store/testbucket/viKmlbRBXFeEdAsh: File removed before we read it
tar: /store/testbucket/wYmlGYkjvgeDRSnh: File removed before we read it
tar: /store/testbucket/wZgjLHrQbwOSikLI: File removed before we read it
tar: /store/testbucket/xaJNasKXqMomIpeK: File removed before we read it
...

* all paths here are actually "directories" (with more objects "in them")
* Multiple runs yield the same paths, so it appears not to be random.

Anyone seen this? Could this be some timeout issue?


Thanks

Martin



Re: [ceph-users] rbd feature overheads

2018-02-12 Thread Ilya Dryomov
On Mon, Feb 12, 2018 at 6:25 AM, Blair Bethwaite
 wrote:
> Hi all,
>
> Wondering if anyone can clarify whether there are any significant overheads
> from rbd features like object-map, fast-diff, etc. I'm interested in both
> performance overheads from a latency and space perspective, e.g., can
> object-map be sanely deployed on a 100TB volume or does the client try to
> read the whole thing into memory...?

Yes, it does.  Enabling object-map on images larger than 1PB isn't
allowed for exactly that reason.  The memory overhead is 2 bits per
object, i.e. 64K per 1TB assuming the default object size.

object-map also depends on exclusive-lock, which is bad for use cases
where sharing the same image between multiple clients is a requirement.

Once object-map is enabled, fast-diff is virtually no overhead.

Thanks,

Ilya


Re: [ceph-users] Ceph Day Germany :)

2018-02-12 Thread Kai Wagner
Sometimes I'm just blind. Way too little ML reading :D

Thanks!


On 12.02.2018 10:51, Wido den Hollander wrote:
> Because I'm co-organizing it! :) I sent out a Call for Papers last
> week to this list. 

-- 
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)






Re: [ceph-users] Ceph Day Germany :)

2018-02-12 Thread Wido den Hollander



On 02/12/2018 10:42 AM, Kai Wagner wrote:

Hi Wido,

how do you know about that beforehand? There's no official upcoming
event on the ceph.com page?



Because I'm co-organizing it! :) I sent out a Call for Papers last week 
to this list.


Waiting for the page to come online on ceph.com, but tickets can be 
found here: 
https://www.eventbrite.co.uk/e/cloudstack-european-user-group-ceph-day-tickets-42670526694


This will be a full Ceph Day combined with the Apache CloudStack 
project: a dual-tracked day with talks about both projects and their 
combination.


Wido


Just because I'm curious :)

Thanks

Kai


On 12.02.2018 10:39, Wido den Hollander wrote:

The next one is in London on April 19th





Re: [ceph-users] Ceph Day Germany :)

2018-02-12 Thread Wido den Hollander



On 02/12/2018 12:33 AM, c...@elchaka.de wrote:



On 9 February 2018 11:51:08 CET, Lenz Grimmer wrote:

Hi all,

On 02/08/2018 11:23 AM, Martin Emrich wrote:


I just want to thank all organizers and speakers for the awesome Ceph
Day at Darmstadt, Germany yesterday.

I learned of some cool stuff I'm eager to try out (NFS-Ganesha for

RGW,

openATTIC,...), Organization and food were great, too.


I agree - thanks a lot to Danny Al-Gaaf and Leonardo for the overall
organization, and of course the sponsors and speakers who made it
happen! I too learned a lot.

Lenz


I absolutely agree, too. This was really great! It would be fantastic if the Ceph 
Days happened again in Darmstadt - or Düsseldorf ;)



There will be a Ceph Day in Germany again in the future, but to allow 
everybody to visit Ceph Days, the location rotates.


The next one is in London on April 19th, while the previous one was in 
September in the Netherlands.


Ceph Days will come back to Germany again.

Wido


Btw. will the slides and perhaps videos of the presentations be available online?

Thanks again Guys - Great day
- Mehmet

