Re: [ceph-users] "ERROR: rgw_obj_remove(): cls_cxx_remove returned -2" on OSDs since Hammer upgrade

2015-07-10 Thread Sylvain Munaut
Hi,

> Some of our users have experienced this as well:
> https://github.com/deis/deis/issues/3969
>
> One of our other users suggested performing a deep scrub of all PGs - the
> suspicion is that this is caused by a corrupt file on the filesystem.

That would have had to appear right at the moment I upgraded to Hammer,
and on all OSDs at the same time?
Seems doubtful ...

This message also shows up whenever a DELETE operation is performed
on S3 (though AFAICT the DELETE is performed without issue), but it
doesn't happen on all DELETEs (I couldn't find any definitive criteria
for when it happens and when it doesn't).


Cheers,

   Sylvain
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CephFS kernel client reboots on write

2015-07-10 Thread Jan Pekař

Hi all,

I think I found a bug in the cephfs kernel client.
When I create a directory in cephfs and set its layout to

ceph.dir.layout="stripe_unit=1073741824 stripe_count=1 
object_size=1073741824 pool=somepool"


attempts to write a larger file cause a kernel hang or a reboot.
When I'm using the FUSE-based cephfs client, it works (though I now have 
some issues with FUSE and concurrent writes too, but that is a different 
kind of problem).


I think 1073741824 is the maximum value for object_size and stripe_unit, 
or can I set them higher?


The default values "stripe_unit=4194304 stripe_count=1 object_size=4194304" 
work without problems on write.


My goal was not to split the file across OSDs every 4 MB of its size, but 
to save it in one piece.
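
(For completeness: such a layout is typically set through the virtual 
xattr, e.g. as below; the mount point and directory name here are 
examples:)

    setfattr -n ceph.dir.layout \
             -v "stripe_unit=1073741824 stripe_count=1 object_size=1073741824 pool=somepool" \
             /mnt/cephfs/bigdir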


With regards
Jan Pekar
Imatic
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] "ERROR: rgw_obj_remove(): cls_cxx_remove returned -2" on OSDs since Hammer upgrade

2015-07-10 Thread Chris Armstrong
Some of our users have experienced this as well:
https://github.com/deis/deis/issues/3969

One of our other users suggested performing a deep scrub of all PGs - the
suspicion is that this is caused by a corrupt file on the filesystem.

On Thu, Jul 9, 2015 at 12:53 AM, Sylvain Munaut <
s.mun...@whatever-company.com> wrote:

> Hi,
>
>
> Since I upgraded to Hammer last weekend, I see errors such as
>
> 7eff5322d700  0  cls/rgw/cls_rgw.cc:1947: ERROR:
> rgw_obj_remove(): cls_cxx_remove returned -2
>
> in the logs.
>
> What's going on?
>
>
> Can this be related to the unexplained write activity I see on my OSDs?
>
>
> Cheers,
>
>
>Sylvain
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
*Chris Armstrong* | Deis Team Lead | *Engine Yard* | t: @carmstrong_afk
 | gh: carmstrong


Deis project: github.com/deis/deis | docs.deis.io | #deis

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] 403 return code on S3 Gateway for remove keys or change key.

2015-07-10 Thread Tyler Bishop


The Ceph Admin REST API is producing SignatureDoesNotMatch access denied errors 
when attempting to make a request for the user's key sub-resource. Both PUT and 
DELETE actions for the /admin/user?key resource are failing even though the 
string to sign on the client and the one returned by the server are identical. 
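
For reference, the client-side V2 signature is computed roughly like this 
(a minimal sketch, assuming SECRET and ACCESS_KEY hold the user's S3 
credentials): 

    date='Fri, 10 Jul 2015 17:42:48 GMT' 
    string_to_sign=$(printf 'DELETE\n\napplication/x-www-form-urlencoded\n%s\n/admin/user?key' "$date") 
    sig=$(printf '%s' "$string_to_sign" | openssl dgst -sha1 -hmac "$SECRET" -binary | base64) 
    echo "Authorization: AWS $ACCESS_KEY:$sig" 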

### 
# Requesting: GET /admin/user?uid=C1 
### 

### START String To Sign from Request ### 
GET 

application/x-www-form-urlencoded 
Fri, 10 Jul 2015 17:42:47 GMT 
/admin/user 
### END String to Sign ### 

### START CURL VERBOSE ### 
* Trying 1.2.3.4... 
* Connected to s3.example.com (1.2.3.4) port 443 (#0) 
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384 
* Server certificate: 
* subject: OU=Domain Control Validated; OU=COMODO SSL Wildcard; 
CN=*.s3.example.com 
* start date: 2015-06-22 00:00:00 GMT 
* expire date: 2016-06-21 23:59:59 GMT 
* issuer: C=GB; ST=Greater Manchester; L=Salford; O=COMODO CA Limited; 
CN=COMODO RSA Domain Validation Secure Ser 
ver CA 
* SSL certificate verify result: unable to get local issuer certificate (20), 
continuing anyway. 
> GET /admin/user?uid=C1 HTTP/1.1 
User-Agent: aws-sdk-php/1.6.2 PHP/5.6.8 curl/7.40.0 openssl/1.0.1m 
Host: s3.example.com 
Accept: */* 
Accept-Encoding: gzip, deflate 
Referer: https://s3.example.com/admin/user?uid=C1 
Content-Type: application/x-www-form-urlencoded 
Date: Fri, 10 Jul 2015 17:42:47 GMT 
Authorization: AWS 27K8RGLQBN8K6G5PV3RS:Y8hxsK3lsVsXIBVsECY6iiMXQok= 
Content-Length: 0 

< HTTP/1.1 200 OK 
< Server: Tengine/2.1.0 
< Date: Fri, 10 Jul 2015 17:42:44 GMT 
< Content-Type: application/json 
< Transfer-Encoding: chunked 
< Connection: keep-alive 
< 
* Connection #0 to host s3.example.com left intact 
### END CURL VERBOSE ### 

### START Response Dump ### 
CFResponse Object 
( 
[header] => Array 
( 
[server] => Tengine/2.1.0 
[date] => Fri, 10 Jul 2015 17:42:44 GMT 
[content-type] => application/json 
[transfer-encoding] => chunked 
[connection] => keep-alive 
[_info] => Array 
( 
[url] => https://s3.example.com/admin/user?uid=C1 
[content_type] => application/json 
[http_code] => 200 
[header_size] => 163 
[request_size] => 422 
[filetime] => -1 
[ssl_verify_result] => 20 
[redirect_count] => 0 
[total_time] => 1.341 
[namelookup_time] => 0 
[connect_time] => 0.046 
[pretransfer_time] => 1.279 
[size_upload] => 0 
[size_download] => 341 
[speed_download] => 254 
[speed_upload] => 0 
[download_content_length] => -1 
[upload_content_length] => 0 
[starttransfer_time] => 1.341 
[redirect_time] => 0 
[redirect_url] => 
[primary_ip] => 1.2.3.4 
[certinfo] => Array 
( 
) 

[primary_port] => 443 
[local_ip] => 192.168.2.12 
[local_port] => 64078 
[method] => GET 
) 

[x-aws-request-url] => https://s3.example.com/admin/user?uid=C1 
[x-aws-redirects] => 0 
[x-aws-stringtosign] => GET 

application/x-www-form-urlencoded 
Fri, 10 Jul 2015 17:42:47 GMT 
/admin/user 
[x-aws-requestheaders] => Array 
( 
[Content-Type] => application/x-www-form-urlencoded 
[Date] => Fri, 10 Jul 2015 17:42:47 GMT 
[Authorization] => AWS 27K8RGLQBN8K6G5PV3RS:Y8hxsK3lsVsXIBVsECY6iiMXQok= 
[Expect] => 
) 

) 

[body] => CFSimpleXML Object 
( 
[user_id] => C1 
[display_name] => C1 
[email] => CFSimpleXML Object 
( 
) 

[suspended] => 0 
[max_buckets] => 1000 
[subusers] => CFSimpleXML Object 
( 
) 

[keys] => Array 
( 
[0] => CFSimpleXML Object 
( 
[user] => C1 
[access_key] => ANNMJKDEZ2RN60I03GI9 
[secret_key] => E5ACgu28+AP1u7z4+qbKeIfEtsaAFVrBKSgTAupE 
) 

[1] => CFSimpleXML Object 
( 
[user] => C1 
[access_key] => IQAEY8F8CFIR7XG4CAGB 
[secret_key] => hfr89xH5C01VCNNwv3wkMT5+JmsXrSwjXnB55ttS 
) 

) 

[swift_keys] => CFSimpleXML Object 
( 
) 

[caps] => CFSimpleXML Object 
( 
) 

) 

[status] => 200 
) 
### END Response Dump ### 






### 
# Requesting: DELETE /admin/user?key&uid=C1&access-key=ANNMJKDEZ2RN60I03GI9 
### 



### START String To Sign from Request ### 
DELETE 

application/x-www-form-urlencoded 
Fri, 10 Jul 2015 17:42:48 GMT 
/admin/user?key 
### END String to Sign ### 

### START CURL VERBOSE ### 
* Hostname s3.example.com was found in DNS cache 
* Trying 1.2.3.4... 
* Connected to s3.example.com (1.2.3.4) port 443 (#0) 
* SSL connection using TLSv1.2 / ECDHE-RSA-AES256-GCM-SHA384 
* Server certificate: 
* subject: OU=Domain Control Validated; OU=COMODO SSL Wildcard; 
CN=*.s3.example.com 
* start date: 2015-06-22 00:00:00 GMT 
* expire date: 2016-06-21 23:59:59 GMT 
* issuer: C=GB; ST=Greater Manchester; L=Salford; O=COMODO CA Limited; 
CN=COMODO RSA Domain Validation Secure Ser 
ver CA 
* SSL certificate verify result: unable to get local issuer certificate (20), 
continuing anyway. 
> DELETE /admin/user?key&uid=C1&access-key=ANNMJKDEZ2RN60I03GI9 HTTP/1.1 
User-Agent: aws-sdk-php/1.6.2 PHP/5.6.8 curl/7.40.0 openssl/1.0.1m 
Host: s3.example.com 
Accept: */* 
Accept-Encoding: gzip, deflate 
Referer: 
https://s3.example.com

Re: [ceph-users] Monitor questions

2015-07-10 Thread Quentin Hartman
For very small values of production. I never had more than a couple clients
hitting either of them, but they were doing "real work". Ultimately though,
we decided to just use NFS exports from a VM to do what we were trying to
do with rgw and mds.

QH

On Fri, Jul 10, 2015 at 9:47 AM, Nate Curry  wrote:

> Yes that was what I meant.  Thanks.  Was that in a production environment?
>
> Nate Curry
> On Jul 10, 2015 11:21 AM, "Quentin Hartman" 
> wrote:
>
>> You mean the hardware config? They are older Core2-based servers with 4GB
>> of RAM. Nothing special. I have one running mon and rgw, one running mon
>> and mds, and one running just a mon.
>>
>> QH
>>
>> On Fri, Jul 10, 2015 at 8:58 AM, Nate Curry  wrote:
>>
>>> What was your monitor node's configuration when you had multiple ceph
>>> daemons running on them?
>>>
>>> *Nate Curry*
>>> IT Manager
>>> ISSM
>>> *Mosaic ATM*
>>> mobile: 240.285.7341
>>> office: 571.223.7036 x226
>>> cu...@mosaicatm.com
>>>
>>> On Thu, Jul 9, 2015 at 5:36 PM, Quentin Hartman <
>>> qhart...@direwolfdigital.com> wrote:
>>>
 I have my mons sharing the ceph network, and while I currently do not
 run mds or rgw, I have run those on my mon hosts in the past with no
 perceptible ill effects.

 On Thu, Jul 9, 2015 at 3:20 PM, Nate Curry  wrote:

> I have a question in regards to monitor nodes and network layout.  It's
> my understanding that there should be two networks; a ceph only network 
> for
> comms between the various ceph nodes, and a separate storage network where
> other systems will interface with the ceph nodes.  Are the monitor nodes
> supposed to straddle both the ceph only network and the storage network or
> just in the ceph network?
>
> Another question is can I run multiple things on the monitor nodes?
> Like the RADOS GW and the MDS?
>
>
> Thanks,
>
> *Nate Curry*
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>

>>>
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 0.80.10 released ?

2015-07-10 Thread Loic Dachary
The release notes have not yet been published.

On 10/07/2015 17:31, Pierre BLONDEAU wrote:
> Hi,
> 
> I can update my Ceph packages to 0.80.10,
> but I can't find any information about this version (website, mailing list).
> Does anyone know where I can find this information?
> 
> Regards
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Monitor questions

2015-07-10 Thread Nate Curry
Yes that was what I meant.  Thanks.  Was that in a production environment?

Nate Curry
On Jul 10, 2015 11:21 AM, "Quentin Hartman" 
wrote:

> You mean the hardware config? They are older Core2-based servers with 4GB
> of RAM. Nothing special. I have one running mon and rgw, one running mon
> and mds, and one running just a mon.
>
> QH
>
> On Fri, Jul 10, 2015 at 8:58 AM, Nate Curry  wrote:
>
>> What was your monitor node's configuration when you had multiple ceph
>> daemons running on them?
>>
>> *Nate Curry*
>> IT Manager
>> ISSM
>> *Mosaic ATM*
>> mobile: 240.285.7341
>> office: 571.223.7036 x226
>> cu...@mosaicatm.com
>>
>> On Thu, Jul 9, 2015 at 5:36 PM, Quentin Hartman <
>> qhart...@direwolfdigital.com> wrote:
>>
>>> I have my mons sharing the ceph network, and while I currently do not
>>> run mds or rgw, I have run those on my mon hosts in the past with no
>>> perceptible ill effects.
>>>
>>> On Thu, Jul 9, 2015 at 3:20 PM, Nate Curry  wrote:
>>>
 I have a question in regards to monitor nodes and network layout.  It's
 my understanding that there should be two networks; a ceph only network for
 comms between the various ceph nodes, and a separate storage network where
 other systems will interface with the ceph nodes.  Are the monitor nodes
 supposed to straddle both the ceph only network and the storage network or
 just in the ceph network?

 Another question is can I run multiple things on the monitor nodes?
 Like the RADOS GW and the MDS?


 Thanks,

 *Nate Curry*


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


>>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] 0.80.10 released ?

2015-07-10 Thread Pierre BLONDEAU
Hi,

I can update my Ceph packages to 0.80.10,
but I can't find any information about this version (website, mailing list).
Does anyone know where I can find this information?

Regards

-- 
--
Pierre BLONDEAU
Systems & Network Administrator
Université de Caen
Laboratoire GREYC, Département d'informatique

tel: 02 31 56 75 42
office: Campus 2, Science 3, 406
--



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Monitor questions

2015-07-10 Thread Quentin Hartman
You mean the hardware config? They are older Core2-based servers with 4GB
of RAM. Nothing special. I have one running mon and rgw, one running mon
and mds, and one running just a mon.

QH

On Fri, Jul 10, 2015 at 8:58 AM, Nate Curry  wrote:

> What was your monitor node's configuration when you had multiple ceph
> daemons running on them?
>
> *Nate Curry*
> IT Manager
> ISSM
> *Mosaic ATM*
> mobile: 240.285.7341
> office: 571.223.7036 x226
> cu...@mosaicatm.com
>
> On Thu, Jul 9, 2015 at 5:36 PM, Quentin Hartman <
> qhart...@direwolfdigital.com> wrote:
>
>> I have my mons sharing the ceph network, and while I currently do not run
>> mds or rgw, I have run those on my mon hosts in the past with no
>> perceptible ill effects.
>>
>> On Thu, Jul 9, 2015 at 3:20 PM, Nate Curry  wrote:
>>
>>> I have a question in regards to monitor nodes and network layout.  It's
>>> my understanding that there should be two networks; a ceph only network for
>>> comms between the various ceph nodes, and a separate storage network where
>>> other systems will interface with the ceph nodes.  Are the monitor nodes
>>> supposed to straddle both the ceph only network and the storage network or
>>> just in the ceph network?
>>>
>>> Another question is can I run multiple things on the monitor nodes?
>>> Like the RADOS GW and the MDS?
>>>
>>>
>>> Thanks,
>>>
>>> *Nate Curry*
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph in a shared environment

2015-07-10 Thread Nathan Stratton
We do the same, so far no problems.


><>
nathan stratton | vp technology | broadsoft, inc | +1-240-404-6580 |
www.broadsoft.com

On Fri, Jul 10, 2015 at 6:51 AM, Jan Schermer  wrote:

> We run CEPH OSDs on the same hosts as QEMU/KVM with OpenStack. You need to
> segregate the processes so the OSDs have their dedicated cores and memory,
> other than that it works fine. Our MONs also run on the same hosts as the
> OpenStack controller nodes (L3 agents and such) - no problem here, you just
> need dedicated drives for their data.
>
> Jan
>
> On 10 Jul 2015, at 12:28, Kris Gillespie  wrote:
>
>  Hi All,
>
>  So this may have been asked but I’ve googled the crap out of this so
> maybe my google-fu needs work. Does anyone have any experience running a
> Ceph cluster with the Ceph daemons (mons/osds/rgw) running on the same
> hosts as other services (so say Docker containers, or really anything
> generating load). What has been your experience? Used cgroups or seen any
> reason to? Any performance issues? Troubleshooting a pain? Any other
> general observations?
>
>  Just curious if anyone out there has done it and to what scale and what
> issues they’ve encountered.
>
>  Cheers everyone
>
>  Kris Gillespie| System Engineer | bol.com
>
>
> The information contained in this communication is confidential and may be
> legally privileged. It is intended solely for the use of the individual or
> entity to whom it is addressed and others authorised to receive it. If you
> are not the intended recipient please notify the sender and destroy this
> message. Any disclosure, copying, distribution or taking any action in
> reliance on the contents of this information is strictly prohibited and may
> be unlawful. Bol.com  b.v. is neither liable for the
> proper and complete transmission of the information contained in this
> communication nor for delay in its receipt.
>
>  ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Monitor questions

2015-07-10 Thread Nate Curry
What was your monitor node's configuration when you had multiple ceph
daemons running on them?

*Nate Curry*
IT Manager
ISSM
*Mosaic ATM*
mobile: 240.285.7341
office: 571.223.7036 x226
cu...@mosaicatm.com

On Thu, Jul 9, 2015 at 5:36 PM, Quentin Hartman <
qhart...@direwolfdigital.com> wrote:

> I have my mons sharing the ceph network, and while I currently do not run
> mds or rgw, I have run those on my mon hosts in the past with no
> perceptible ill effects.
>
> On Thu, Jul 9, 2015 at 3:20 PM, Nate Curry  wrote:
>
>> I have a question in regards to monitor nodes and network layout.  It's my
>> understanding that there should be two networks; a ceph only network for
>> comms between the various ceph nodes, and a separate storage network where
>> other systems will interface with the ceph nodes.  Are the monitor nodes
>> supposed to straddle both the ceph only network and the storage network or
>> just in the ceph network?
>>
>> Another question is can I run multiple things on the monitor nodes?  Like
>> the RADOS GW and the MDS?
>>
>>
>> Thanks,
>>
>> *Nate Curry*
>>
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Help with pgs undersized+degraded+peered

2015-07-10 Thread alberto ayllon
I have installed Ceph (0.94.2) using the ceph-deploy utility. I have created
three VMs with Ubuntu 14.04 (ceph01, ceph02 and ceph03); each one has 3 OSD
daemons and 1 mon, and ceph01 also has ceph-deploy.


I need help: I have read the online docs and tried many things, but I
couldn't find out why my cluster status is always WARNING. Regardless of the
number of PGs defined, they are in state undersized+degraded+peered.


Here is how I built the cluster:

*root@ceph01:~# *mkdir /opt/ceph
r*oot@ceph01:~#* cd /opt/ceph
*root@ceph01:/opt/ceph#* ceph-deploy new ceph01
*root@ceph01:/opt/ceph# *ceph-deploy install ceph01 ceph02 ceph03
*root@ceph01:/opt/ceph#* ceph-deploy mon create-initial
*root@ceph01:/opt/ceph#* ceph-deploy disk zap ceph01:vdc ceph02:vdc
ceph03:vdc ceph01:vdd ceph02:vdd ceph03:vdd  ceph01:vde ceph02:vde
ceph03:vde
*root@ceph01:/opt/ceph#* ceph-deploy osd create ceph01:vdc ceph02:vdc
ceph03:vdc ceph01:vdd ceph02:vdd ceph03:vdd  ceph01:vde ceph02:vde
ceph03:vde

*root@ceph01:/opt/ceph#* ceph-deploy mon add ceph02
*root@ceph01:/opt/ceph#* ceph-deploy mon add ceph03


*root@ceph01:/opt/ceph# *ceph status
cluster d54a2216-b522-4744-a7cc-a2106e1281b6
 health HEALTH_WARN
64 pgs degraded
64 pgs stuck degraded
64 pgs stuck inactive
64 pgs stuck unclean
64 pgs stuck undersized
64 pgs undersized
too few PGs per OSD (7 < min 30)
 monmap e3: 3 mons at {ceph01=
172.16.70.158:6789/0,ceph02=172.16.70.159:6789/0,ceph03=172.16.70.160:6789/0
}
election epoch 8, quorum 0,1,2 ceph01,ceph02,ceph03
 osdmap e29: 9 osds: 9 up, 9 in
  pgmap v49: 64 pgs, 1 pools, 0 bytes data, 0 objects
296 MB used, 45684 MB / 45980 MB avail
  64 undersized+degraded+peered

*root@ceph01:/opt/ceph#* ceph osd lspools
0 rbd,

*root@ceph01:/opt/ceph# *ceph osd pool get rbd size
size: 3

*root@ceph01:/opt/ceph#* ceph osd pool set rbd size 2
set pool 0 size to 2

*root@ceph01:/opt/ceph# *ceph osd pool set rbd min_size 1
set pool 0 min_size to 1

*root@ceph01:/opt/ceph#* ceph status
cluster d54a2216-b522-4744-a7cc-a2106e1281b6
 health HEALTH_WARN
64 pgs degraded
64 pgs stuck degraded
64 pgs stuck inactive
64 pgs stuck unclean
64 pgs stuck undersized
64 pgs undersized
too few PGs per OSD (7 < min 30)
 monmap e3: 3 mons at {ceph01=
172.16.70.158:6789/0,ceph02=172.16.70.159:6789/0,ceph03=172.16.70.160:6789/0
}
election epoch 8, quorum 0,1,2 ceph01,ceph02,ceph03
 osdmap e30: 9 osds: 9 up, 9 in
  pgmap v52: 64 pgs, 1 pools, 0 bytes data, 0 objects
296 MB used, 45684 MB / 45980 MB avail
  64 undersized+degraded+peered


If I try to increase pg_num as the documentation recommends:

*root@ceph01:/opt/ceph#* ceph osd pool set rbd pg_num 512
Error E2BIG: specified pg_num 512 is too large (creating 448 new PGs on ~9
OSDs exceeds per-OSD max of 32)

Then I set pg_num = 280:

*root@ceph01:/opt/ceph#* ceph osd pool set rbd pg_num 280
set pool 0 pg_num to 280

*root@ceph01:/opt/ceph# *ceph osd pool set rbd pgp_num 280
set pool 0 pgp_num to 280


*root@ceph01:/opt/ceph#* ceph status
cluster d54a2216-b522-4744-a7cc-a2106e1281b6
 health HEALTH_WARN
280 pgs degraded
280 pgs stuck unclean
280 pgs undersized
 monmap e3: 3 mons at {ceph01=
172.16.70.158:6789/0,ceph02=172.16.70.159:6789/0,ceph03=172.16.70.160:6789/0
}
election epoch 8, quorum 0,1,2 ceph01,ceph02,ceph03
 osdmap e37: 9 osds: 9 up, 9 in
  pgmap v100: 280 pgs, 1 pools, 0 bytes data, 0 objects
301 MB used, 45679 MB / 45980 MB avail
 280 active+undersized+degraded




How can I get the PGs into the active+clean state? Or maybe the online
documentation is too old?
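
(A couple of commands that usually help narrow down a state like this; a
sketch, and note that on very small test disks such as these ~5 GB volumes,
the CRUSH weights can end up at 0.00, which prevents placement:)

    ceph osd tree                # CRUSH hierarchy and per-OSD weights
    ceph pg dump_stuck unclean   # list the stuck PGs
    ceph pg map 0.0              # which OSDs a sample PG maps to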
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] External XFS Filesystem Journal on OSD

2015-07-10 Thread Lars Marowsky-Bree
On 2015-07-10T15:20:23, Jacek Jarosiewicz  wrote:

> We have tried both - you can see performance gain, but we finally went
> toward ceph cache tier. It's much more flexible and gives similar gains in
> terms of performance.
> 
> Downside to bcache is that you can't use it on a drive that already has data
> - only new, clean partitions can be added - and (although I've read that
> bcache is quite resilient) you cannot access the raw filesystem once bcache is
> added to your partition (data is only accessible through bcache, so
> potentially if bcache goes corrupt, your data goes corrupt).
> 
> Downside to flashcache is that you can only combine partition on ssd with
> another partition on spinning drive, so you have to think ahead when
> planning your disc layout, ie.: if you partition your ssd with `n'
> partitions so that it can cache your `n' spinning drives, and then you want
> to add another spinning drive you either had to have left some space on the
> original ssd, or you have to add a new one. And if you have left some space
> - it's been just sitting there waiting for a new spinning drive.
> 
> With cache tier you can have your cake and eat it too :) - add/remove ssd's
> on demand, and add/remove spinning drives as you wish - just tune the pool
> sizes after you change your drive layout.

Great feedback, too.

So the point about bcache is very valid. But then, a cache layer does
require a lot more tuning and has many more moving parts, requires more
memory, and a more complex ceph setup.

(I was specifically wondering if a bcache could help in front of SMR
drives, actually.)

But it's really useful to know you're seeing similar speed-ups with the
cache tiering.


Regards,
Lars

-- 
Architect Storage/HA
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Dilip Upmanyu, Graham 
Norton, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] External XFS Filesystem Journal on OSD

2015-07-10 Thread Jacek Jarosiewicz
We have tried both - you can see performance gain, but we finally went 
toward ceph cache tier. It's much more flexible and gives similar gains 
in terms of performance.


Downside to bcache is that you can't use it on a drive that already has 
data - only new, clean partitions can be added - and (although I've read 
that bcache is quite resilient) you cannot access the raw filesystem once 
bcache is added to your partition (data is only accessible through 
bcache, so potentially if bcache goes corrupt, your data goes corrupt).


Downside to flashcache is that you can only combine partition on ssd 
with another partition on spinning drive, so you have to think ahead 
when planning your disc layout, ie.: if you partition your ssd with `n' 
partitions so that it can cache your `n' spinning drives, and then you 
want to add another spinning drive you either had to have left some 
space on the original ssd, or you have to add a new one. And if you have 
left some space - it's been just sitting there waiting for a new 
spinning drive.


With cache tier you can have your cake and eat it too :) - add/remove 
ssd's on demand, and add/remove spinning drives as you wish - just tune 
the pool sizes after you change your drive layout.


J


On 07/10/2015 02:07 PM, David Burley wrote:



In a similar direction, one could try using bcache on top of the actual
spinner. Have you tried that, too?


We haven't tried bcache/flashcache/...
--
David Burley
NOC Manager, Sr. Systems Programmer/Analyst
Slashdot Media

e: da...@slashdotmedia.com 


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--
Jacek Jarosiewicz
IT Systems Administrator


SUPERMEDIA Sp. z o.o., registered in Warsaw
ul. Senatorska 13/15, 00-075 Warszawa
District Court for the capital city of Warsaw, XII Commercial Division of 
the National Court Register,

KRS no. 029537; share capital PLN 42,756,000
NIP: 957-05-49-503
Correspondence address: ul. Jubilerska 10, 04-190 Warszawa


SUPERMEDIA ->   http://www.supermedia.pl
internet access - hosting - colocation - links - telephony
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Where does 130IOPS come from?

2015-07-10 Thread Jan Schermer
Every HDD has a “mean access time” number (related to the rotation speed, 
number of heads, etc.). With an 8 ms access time this gives you
1000/8 = 125 seeks per second.

This is where it comes from :-)
Of course the best case will be better; I generally calculate 150 IOPS for any 
SATA drive, 200 for SAS. Some high-end FC disks with 15K RPM can have as high 
as 300 IOPS, but that’s probably not what we want to use here.
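
As a back-of-envelope check of that figure: a 7,200 RPM drive completes a 
half rotation in (60 / 7200) / 2 s = ~4.2 ms, and a typical SATA average 
seek is roughly 4 ms, so one random IO costs ~8 ms, giving 1000 / 8 = ~125 
IOPS.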

Jan

> On 02 Jul 2015, at 17:53, Steffen Tilsch  wrote:
> 
> Hello Cephers,
> 
> Whenever I read about HDDs for OSDs, it is said that "they will deliver 
> around 130 IOPS". 
> Where does this number come from, and how was it measured (random/seq, how 
> big were the IOs, at which queue depth, at what latency)? Or is it more a 
> general number depending on disk seek times? 
> 
> Regards and thanks for clarifying, 
> Steffen
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Nova with Ceph generate error

2015-07-10 Thread Sebastien Han
Which request generated this trace?
Is it from the nova-compute log?

> On 10 Jul 2015, at 07:13, Mario Codeniera  wrote:
> 
> Hi,
> 
> It is my first time here. I am having an issue with my OpenStack 
> configuration, which works perfectly for Cinder and Glance, based on the 
> Kilo release on CentOS 7. I based my setup on the rbd-openstack manual. 
> 
> 
> If I enable rbd in nova.conf, it generates an error like the following in 
> the dashboard, while the logs don't show any errors: 
> 
> Internal Server Error (HTTP 500) (Request-ID: 
> req-231347dd-f14c-4f97-8a1d-851a149b037c)
> Code
> 500
> Details
> File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 343, in decorated_function
>     return function(self, context, *args, **kwargs)
> File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2737, in terminate_instance
>     do_terminate_instance(instance, bdms)
> File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 445, in inner
>     return f(*args, **kwargs)
> File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2735, in do_terminate_instance
>     self._set_instance_error_state(context, instance)
> File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
>     six.reraise(self.type_, self.value, self.tb)
> File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2725, in do_terminate_instance
>     self._delete_instance(context, instance, bdms, quotas)
> File "/usr/lib/python2.7/site-packages/nova/hooks.py", line 149, in inner
>     rv = f(*args, **kwargs)
> File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2694, in _delete_instance
>     quotas.rollback()
> File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 85, in __exit__
>     six.reraise(self.type_, self.value, self.tb)
> File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2664, in _delete_instance
>     self._shutdown_instance(context, instance, bdms)
> File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2604, in _shutdown_instance
>     self.volume_api.detach(context, bdm.volume_id)
> File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 214, in wrapper
>     res = method(self, ctx, volume_id, *args, **kwargs)
> File "/usr/lib/python2.7/site-packages/nova/volume/cinder.py", line 365, in detach
>     cinderclient(context).volumes.detach(volume_id)
> File "/usr/lib/python2.7/site-packages/cinderclient/v2/volumes.py", line 334, in detach
>     return self._action('os-detach', volume)
> File "/usr/lib/python2.7/site-packages/cinderclient/v2/volumes.py", line 311, in _action
>     return self.api.client.post(url, body=body)
> File "/usr/lib/python2.7/site-packages/cinderclient/client.py", line 91, in post
>     return self._cs_request(url, 'POST', **kwargs)
> File "/usr/lib/python2.7/site-packages/cinderclient/client.py", line 85, in _cs_request
>     return self.request(url, method, **kwargs)
> File "/usr/lib/python2.7/site-packages/cinderclient/client.py", line 80, in request
>     return super(SessionClient, self).request(*args, **kwargs)
> File "/usr/lib/python2.7/site-packages/keystoneclient/adapter.py", line 206, in request
>     resp = super(LegacyJsonAdapter, self).request(*args, **kwargs)
> File "/usr/lib/python2.7/site-packages/keystoneclient/adapter.py", line 95, in request
>     return self.session.request(url, method, **kwargs)
> File "/usr/lib/python2.7/site-packages/keystoneclient/utils.py", line 318, in inner
>     return func(*args, **kwargs)
> File "/usr/lib/python2.7/site-packages/keystoneclient/session.py", line 397, in request
>     raise exceptions.from_response(resp, method, url)
> Created
> 10 Jul 2015, 4:40 a.m.
> 
> 
> Again, if I disable it, everything works, but the error is generated on the 
> compute node. I also observe that the hypervisor of the compute nodes isn't 
> displayed, which may be related. 
> 
> It was working on Juno before, but there was an unexpected rework when the 
> network infrastructure changed; when I reran the script I found lots of 
> conflicts. I had previously been running with qemu-img-rhev and qemu-kvm-rhev 
> from oVirt, but the new Hammer (Ceph repository) packages seem to solve that 
> issue. 
> 
> Hope someone can enlighten.
> 
> Thanks,
> Mario
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Cheers.

Sébastien Han
Senior Cloud Architect

"Always give 100%. Unless you're giving blood."

Mail: s...@redhat.com
Address: 11 bis, rue Roquépine - 75008 Paris



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] External XFS Filesystem Journal on OSD

2015-07-10 Thread David Burley
>
> In a similar direction, one could try using bcache on top of the actual
> spinner. Have you tried that, too?
>
>
We haven't tried bcache/flashcache/...

-- 
David Burley
NOC Manager, Sr. Systems Programmer/Analyst
Slashdot Media

e: da...@slashdotmedia.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] git.ceph.com seems to lack IPv6 address

2015-07-10 Thread Jaakko Hämäläinen

Hey,

I noticed this today when trying to upgrade one of our clusters from Giant
to Hammer with ceph-deploy: I'm not able to retrieve release.asc.

This happens because the wget to ceph.com/git redirects to git.ceph.com,
which doesn't have an IPv6 address at all.

The servers of this cluster use only IPv6 addresses. The obvious workaround
would be to put IPv4 addresses on the servers, but I'm not keen on that, as
it has worked with IPv6 before.
Would it be possible to enable IPv6 on git.ceph.com, or is there some
alternate way of receiving the GPG key with ceph-deploy?
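
(One manual alternative, a sketch not verified against this setup: fetch the
key on any host that can still reach git.ceph.com and add it by hand, e.g.

    wget -O release.asc 'https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc'
    scp release.asc backup01:
    ssh backup01 apt-key add release.asc

where backup01 is the node being upgraded.)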

Relevant parts of ceph-deploy:
[backup01][WARNIN] --2015-07-10 14:02:33-- 
https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
[backup01][WARNIN] Resolving ceph.com (ceph.com)... 
2607:f298:4:147::b05:fe2a, 208.113.241.137
[backup01][WARNIN] Connecting to ceph.com 
(ceph.com)|2607:f298:4:147::b05:fe2a|:443... connected.
[backup01][WARNIN] HTTP request sent, awaiting response... 301 Moved 
Permanently
[backup01][WARNIN] Location: 
https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc [following]
[backup01][WARNIN] --2015-07-10 14:02:34-- 
https://git.ceph.com/?p=ceph.git;a=blob_plain;f=keys/release.asc

[backup01][WARNIN] Resolving git.ceph.com (git.ceph.com)... 67.205.20.229
[backup01][WARNIN] Connecting to git.ceph.com 
(git.ceph.com)|67.205.20.229|:443... failed: Network is unreachable.



Brgds,
Jaakko
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph in a shared environment

2015-07-10 Thread Jan Schermer
We run CEPH OSDs on the same hosts as QEMU/KVM with OpenStack. You need to 
segregate the processes so the OSDs have their dedicated cores and memory, 
other than that it works fine. Our MONs also run on the same hosts as the 
OpenStack controller nodes (L3 agents and such) - no problem here, you just 
need dedicated drives for their data.
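
A minimal sketch of that kind of segregation with cgroups (the core range, 
NUMA node and memory limit are examples; assumes the libcgroup tools are 
installed): 

    cgcreate -g cpuset,memory:ceph-osd
    cgset -r cpuset.cpus=0-7 ceph-osd      # cores reserved for the OSDs
    cgset -r cpuset.mems=0 ceph-osd        # NUMA node backing those cores
    cgset -r memory.limit_in_bytes=32G ceph-osd
    cgclassify -g cpuset,memory:ceph-osd $(pidof ceph-osd)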

Jan

> On 10 Jul 2015, at 12:28, Kris Gillespie  wrote:
> 
> Hi All,
> 
> So this may have been asked but I’ve googled the crap out of this so maybe my 
> google-fu needs work. Does anyone have any experience running a Ceph cluster 
> with the Ceph daemons (mons/osds/rgw) running on the same hosts as other 
> services (so say Docker containers, or really anything generating load). What 
> has been your experience? Used cgroups or seen any reason too? Any 
> performance issues? Troubleshooting a pain? Any other general observations?
> 
> Just curious if anyone out there has done it and to what scale and what 
> issues they’ve encountered.
> 
> Cheers everyone
> 
> Kris Gillespie| System Engineer | bol.com 
> 
> 
> The information contained in this communication is confidential and may be 
> legally privileged. It is intended solely for the use of the individual or 
> entity to whom it is addressed and others authorised to receive it. If you 
> are not the intended recipient please notify the sender and destroy this 
> message. Any disclosure, copying, distribution or taking any action in 
> reliance on the contents of this information is strictly prohibited and may 
> be unlawful. Bol.com b.v. is neither liable for the proper and complete 
> transmission of the information contained in this communication nor for delay 
> in its receipt.
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph in a shared environment

2015-07-10 Thread Kris Gillespie
Hi All,

So this may have been asked but I’ve googled the crap out of this so maybe my 
google-fu needs work. Does anyone have any experience running a Ceph cluster 
with the Ceph daemons (mons/osds/rgw) running on the same hosts as other 
services (so say Docker containers, or really anything generating load). What 
has been your experience? Used cgroups or seen any reason too? Any performance 
issues? Troubleshooting a pain? Any other general observations?

Just curious if anyone out there has done it and to what scale and what issues 
they’ve encountered.

Cheers everyone

Kris Gillespie| System Engineer | bol.com


The information contained in this communication is confidential and may be 
legally privileged. It is intended solely for the use of the individual or 
entity to whom it is addressed and others authorised to receive it. If you are 
not the intended recipient please notify the sender and destroy this message. 
Any disclosure, copying, distribution or taking any action in reliance on the 
contents of this information is strictly prohibited and may be unlawful. 
Bol.com b.v. is neither liable for the proper and complete transmission of the 
information contained in this communication nor for delay in its receipt.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to prefer faster disks in same pool

2015-07-10 Thread Lionel Bouton
On 07/10/15 02:13, Christoph Adomeit wrote:
> Hi Guys,
>
> I have a ceph pool that is mixed with 10k rpm disks and 7.2 k rpm disks.
>
> There are 85 osds and 10 of them are 10k
> Size is not an issue, the pool is filled only 20%
>
> I want to somehow prefer the 10 k rpm disks so that they get more i/o
>
> What is the most intelligent way to prefer the faster disks?
> Just give them another weight, or are there other methods?

If your cluster is read-intensive you can use primary affinity to
redirect reads to your 10k drives. Add

mon osd allow primary affinity = true

in your ceph.conf, restart your monitors and for each OSD on 7.2k use :

ceph osd primary-affinity <7.2k_id> 0

For every pg with at least one 10k OSD, this will make one of the 10k
drive OSD primary and will perform reads on it.
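
For example, something like this for all of the 7.2k OSDs (the id range is
purely illustrative):

    for id in $(seq 10 84); do ceph osd primary-affinity $id 0; done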

But with only 10 OSDs being 10k and 75 OSDs being 7.2k, I'm not sure
what will happen: most pgs clearly will be only on 7.2k OSDs so you may
not gain much.

It's worth a try if you don't want to reorganize your storage, though, and
it's by far the least time-consuming option if you want to revert your
changes later.

Another way with better predictability would be to define a 10k root and
use a custom rule for your pool which would take the primary from this
new root and switch to the default root for the next OSDs, but you don't
have enough of them to keep the data balanced (for a size=3 pool, you'd
need 1/3 of 10k OSD and 2/3 of 7.2k OSD). This would create a bottleneck
on your 10k drives.
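
A sketch of what such a rule could look like in the decompiled crushmap
(assuming a separate root named "fast" was created for the 10k OSDs):

    rule prefer-fast {
            ruleset 1
            type replicated
            min_size 1
            max_size 10
            step take fast
            step chooseleaf firstn 1 type host
            step emit
            step take default
            step chooseleaf firstn -1 type host
            step emit
    }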

I fear there's no gain in creating a separate 10k pool: you don't have
enough drives to get as much performance from the new 10k pool as you
can from the resulting 7.2k-only pool. Maybe with some specific data
access pattern this could work but I'm not sure what those would be (you
might get more useful suggestions if you describe how the current pool
is used).

Lionel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mds0: Client failing to respond to cache pressure

2015-07-10 Thread 谷枫
Thank you John,
All my servers are Ubuntu 14.04 with the 3.16 kernel.
Not all clients show this problem, and the cluster seems to be functioning
well now.
As you suggest, I will change the mds_cache_size to 50 from 10 as a
test. Thanks again!
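
(For reference, the value can also be changed at runtime through the MDS
admin socket, avoiding a restart; the new size below is a placeholder:)

    ceph daemon mds.tree01 config set mds_cache_size <new-size>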

2015-07-10 17:00 GMT+08:00 John Spray :

>
> This is usually caused by use of older kernel clients.  I don't remember
> exactly what version it was fixed in, but iirc we've seen the problem with
> 3.14 and seen it go away with 3.18.
>
> If your system is otherwise functioning well, this is not a critical error
> -- it just means that the MDS might not be able to fully control its memory
> usage (i.e. it can exceed mds_cache_size).
>
> John
>
> On 10/07/2015 05:25, 谷枫 wrote:
>
>> hi,
>> I use CephFS in a production environment with 7 OSDs, 1 MDS, 3 mons now.
>> So far so good, but I have a problem with it today.
>> The ceph status reports this:
>> cluster ad3421a43-9fd4-4b7a-92ba-09asde3b1a228
>>   health HEALTH_WARN
>>  mds0: Client 34271 failing to respond to cache pressure
>>  mds0: Client 74175 failing to respond to cache pressure
>>  mds0: Client 74181 failing to respond to cache pressure
>>  mds0: Client 34247 failing to respond to cache pressure
>>  mds0: Client 64162 failing to respond to cache pressure
>>  mds0: Client 136744 failing to respond to cache pressure
>>   monmap e2: 3 mons at {node01=
>> 10.3.1.2:6789/0,node02=10.3.1.3:6789/0,node03=10.3.1.4:6789/0}
>>
>>  election epoch 186, quorum 0,1,2 node01,node02,node03
>>   mdsmap e46: 1/1/1 up {0=tree01=up:active}
>>   osdmap e717: 7 osds: 7 up, 7 in
>>pgmap v995836: 264 pgs, 3 pools, 51544 MB data, 118 kobjects
>>  138 GB used, 1364 GB / 1502 GB avail
>>   264 active+clean
>>client io 1018 B/s rd, 1273 B/s wr, 0 op/s
>>
>> I added two OSDs with version 0.94.2 yesterday; the other, older OSDs are
>> 0.94.1.
>> So the question is: does this matter?
>> What does the warning mean, and how can I solve this problem? Thanks!
>> This is my cluster config message with mds:
>>  "name": "mds.tree01",
>>  "debug_mds": "1\/5",
>>  "debug_mds_balancer": "1\/5",
>>  "debug_mds_locker": "1\/5",
>>  "debug_mds_log": "1\/5",
>>  "debug_mds_log_expire": "1\/5",
>>  "debug_mds_migrator": "1\/5",
>>  "admin_socket": "\/var\/run\/ceph\/ceph-mds.tree01.asok",
>>  "log_file": "\/var\/log\/ceph\/ceph-mds.tree01.log",
>>  "keyring": "\/var\/lib\/ceph\/mds\/ceph-tree01\/keyring",
>>  "mon_max_mdsmap_epochs": "500",
>>  "mon_mds_force_trim_to": "0",
>>  "mon_debug_dump_location": "\/var\/log\/ceph\/ceph-mds.tree01.tdump",
>>  "client_use_random_mds": "false",
>>  "mds_data": "\/var\/lib\/ceph\/mds\/ceph-tree01",
>>  "mds_max_file_size": "1099511627776",
>>  "mds_cache_size": "10",
>>  "mds_cache_mid": "0.7",
>>  "mds_max_file_recover": "32",
>>  "mds_mem_max": "1048576",
>>  "mds_dir_max_commit_size": "10",
>>  "mds_decay_halflife": "5",
>>  "mds_beacon_interval": "4",
>>  "mds_beacon_grace": "15",
>>  "mds_enforce_unique_name": "true",
>>  "mds_blacklist_interval": "1440",
>>  "mds_session_timeout": "120",
>>  "mds_revoke_cap_timeout": "60",
>>  "mds_recall_state_timeout": "60",
>>  "mds_freeze_tree_timeout": "30",
>>  "mds_session_autoclose": "600",
>>  "mds_health_summarize_threshold": "10",
>>  "mds_reconnect_timeout": "45",
>>  "mds_tick_interval": "5",
>>  "mds_dirstat_min_interval": "1",
>>  "mds_scatter_nudge_interval": "5",
>>  "mds_client_prealloc_inos": "1000",
>>  "mds_early_reply": "true",
>>  "mds_default_dir_hash": "2",
>>  "mds_log": "true",
>>  "mds_log_skip_corrupt_events": "false",
>>  "mds_log_max_events": "-1",
>>  "mds_log_events_per_segment": "1024",
>>  "mds_log_segment_size": "0",
>>  "mds_log_max_segments": "30",
>>  "mds_log_max_expiring": "20",
>>  "mds_bal_sample_interval": "3",
>>  "mds_bal_replicate_threshold": "8000",
>>  "mds_bal_unreplicate_threshold": "0",
>>  "mds_bal_frag": "false",
>>  "mds_bal_split_size": "1",
>>  "mds_bal_split_rd": "25000",
>>  "mds_bal_split_wr": "1",
>>  "mds_bal_split_bits": "3",
>>  "mds_bal_merge_size": "50",
>>  "mds_bal_merge_rd": "1000",
>>  "mds_bal_merge_wr": "1000",
>>  "mds_bal_interval": "10",
>>  "mds_bal_fragment_interval": "5",
>>  "mds_bal_idle_threshold": "0",
>>  "mds_bal_max": "-1",
>>  "mds_bal_max_until": "-1",
>>  "mds_bal_mode": "0",
>>  "mds_bal_min_rebalance": "0.1",
>>  "mds_bal_min_start": "0.2",
>>  "mds_bal_need_min": "0.8",
>>  "mds_bal_need_max": "1.2",
>>  "mds_bal_midchunk": "0.3",
>>  "mds_bal_minchunk": "0.001",
>>  "mds_bal_target_removal_min": "5",
>>  "md

Re: [ceph-users] mds0: Client failing to respond to cache pressure

2015-07-10 Thread John Spray


This is usually caused by use of older kernel clients.  I don't remember 
exactly what version it was fixed in, but iirc we've seen the problem 
with 3.14 and seen it go away with 3.18.


If your system is otherwise functioning well, this is not a critical 
error -- it just means that the MDS might not be able to fully control 
its memory usage (i.e. it can exceed mds_cache_size).


John

On 10/07/2015 05:25, 谷枫 wrote:

hi,
I use CephFS in a production environment with 7 OSDs, 1 MDS, 3 mons now.
So far so good, but I have a problem with it today.
The ceph status reports this:
cluster ad3421a43-9fd4-4b7a-92ba-09asde3b1a228
  health HEALTH_WARN
 mds0: Client 34271 failing to respond to cache pressure
 mds0: Client 74175 failing to respond to cache pressure
 mds0: Client 74181 failing to respond to cache pressure
 mds0: Client 34247 failing to respond to cache pressure
 mds0: Client 64162 failing to respond to cache pressure
 mds0: Client 136744 failing to respond to cache pressure
  monmap e2: 3 mons at 
{node01=10.3.1.2:6789/0,node02=10.3.1.3:6789/0,node03=10.3.1.4:6789/0}
 election epoch 186, quorum 0,1,2 node01,node02,node03
  mdsmap e46: 1/1/1 up {0=tree01=up:active}
  osdmap e717: 7 osds: 7 up, 7 in
   pgmap v995836: 264 pgs, 3 pools, 51544 MB data, 118 kobjects
 138 GB used, 1364 GB / 1502 GB avail
  264 active+clean
   client io 1018 B/s rd, 1273 B/s wr, 0 op/s

I added two OSDs with version 0.94.2 yesterday; the other, older OSDs are 0.94.1.
So the question is: does this matter?
What does the warning mean, and how can I solve this problem? Thanks!
This is my cluster config message with mds:
 "name": "mds.tree01",
 "debug_mds": "1\/5",
 "debug_mds_balancer": "1\/5",
 "debug_mds_locker": "1\/5",
 "debug_mds_log": "1\/5",
 "debug_mds_log_expire": "1\/5",
 "debug_mds_migrator": "1\/5",
 "admin_socket": "\/var\/run\/ceph\/ceph-mds.tree01.asok",
 "log_file": "\/var\/log\/ceph\/ceph-mds.tree01.log",
 "keyring": "\/var\/lib\/ceph\/mds\/ceph-tree01\/keyring",
 "mon_max_mdsmap_epochs": "500",
 "mon_mds_force_trim_to": "0",
 "mon_debug_dump_location": "\/var\/log\/ceph\/ceph-mds.tree01.tdump",
 "client_use_random_mds": "false",
 "mds_data": "\/var\/lib\/ceph\/mds\/ceph-tree01",
 "mds_max_file_size": "1099511627776",
 "mds_cache_size": "10",
 "mds_cache_mid": "0.7",
 "mds_max_file_recover": "32",
 "mds_mem_max": "1048576",
 "mds_dir_max_commit_size": "10",
 "mds_decay_halflife": "5",
 "mds_beacon_interval": "4",
 "mds_beacon_grace": "15",
 "mds_enforce_unique_name": "true",
 "mds_blacklist_interval": "1440",
 "mds_session_timeout": "120",
 "mds_revoke_cap_timeout": "60",
 "mds_recall_state_timeout": "60",
 "mds_freeze_tree_timeout": "30",
 "mds_session_autoclose": "600",
 "mds_health_summarize_threshold": "10",
 "mds_reconnect_timeout": "45",
 "mds_tick_interval": "5",
 "mds_dirstat_min_interval": "1",
 "mds_scatter_nudge_interval": "5",
 "mds_client_prealloc_inos": "1000",
 "mds_early_reply": "true",
 "mds_default_dir_hash": "2",
 "mds_log": "true",
 "mds_log_skip_corrupt_events": "false",
 "mds_log_max_events": "-1",
 "mds_log_events_per_segment": "1024",
 "mds_log_segment_size": "0",
 "mds_log_max_segments": "30",
 "mds_log_max_expiring": "20",
 "mds_bal_sample_interval": "3",
 "mds_bal_replicate_threshold": "8000",
 "mds_bal_unreplicate_threshold": "0",
 "mds_bal_frag": "false",
 "mds_bal_split_size": "1",
 "mds_bal_split_rd": "25000",
 "mds_bal_split_wr": "1",
 "mds_bal_split_bits": "3",
 "mds_bal_merge_size": "50",
 "mds_bal_merge_rd": "1000",
 "mds_bal_merge_wr": "1000",
 "mds_bal_interval": "10",
 "mds_bal_fragment_interval": "5",
 "mds_bal_idle_threshold": "0",
 "mds_bal_max": "-1",
 "mds_bal_max_until": "-1",
 "mds_bal_mode": "0",
 "mds_bal_min_rebalance": "0.1",
 "mds_bal_min_start": "0.2",
 "mds_bal_need_min": "0.8",
 "mds_bal_need_max": "1.2",
 "mds_bal_midchunk": "0.3",
 "mds_bal_minchunk": "0.001",
 "mds_bal_target_removal_min": "5",
 "mds_bal_target_removal_max": "10",
 "mds_replay_interval": "1",
 "mds_shutdown_check": "0",
 "mds_thrash_exports": "0",
 "mds_thrash_fragments": "0",
 "mds_dump_cache_on_map": "false",
 "mds_dump_cache_after_rejoin": "false",
 "mds_verify_scatter": "false",
 "mds_debug_scatterstat": "false",
 "mds_debug_frag": "false",
 "mds_debug_auth_pins": "false",
 "mds_debug_subtrees": "false",
 "mds_kill_mdstable_at": "0",
 "mds_kill_export_at": "0",
 "mds_kill_import_at": "0",
 "mds_kill_link_at": "0",
 "mds_kill_rename_at": "0",
 "

Re: [ceph-users] External XFS Filesystem Journal on OSD

2015-07-10 Thread Lars Marowsky-Bree
On 2015-07-09T14:05:55, David Burley  wrote:

> Converted a few of our OSD's (spinners) over to a config where the OSD
> journal and XFS journal both live on an NVMe drive (Intel P3700). The XFS
> journal might have provided some very minimal performance gains (3%,
> maybe). Given the low gains, we're going to reject this as something to dig
> into deeper and stick with the simpler configuration of just using the NVMe
> drives for OSD journaling and leave the XFS journals on the partition.
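
For reference, the external-XFS-log layout David tested is created roughly
like this (a sketch; device names, the log size and the OSD path are
examples):

    mkfs.xfs -l logdev=/dev/nvme0n1p3,size=128m /dev/sdb1
    mount -o logdev=/dev/nvme0n1p3 /dev/sdb1 /var/lib/ceph/osd/ceph-12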

Thanks, those numbers are very useful.

In a similar direction, one could try using bcache on top of the actual
spinner. Have you tried that, too?


Regards,
Lars

-- 
Architect Storage/HA
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Dilip Upmanyu, Graham 
Norton, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] openstack + ceph volume mount to vm

2015-07-10 Thread vida ahmadi
Hi Cephers,
I have the same problem as described here:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-December/035999.html
Is there any solution for it?

thanks
Vida
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com