Re: [ceph-users] two keys for one single uid

2017-11-23 Thread Henrik Korkuc
radosgw-admin key create --key-type s3 --uid user_uuid 
--access-key=some_access_key --secret-key=some_secret_key


Or you can instruct it to generate the access/secret keys for you.
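For example, something along these lines (from memory, adjust the uid; radosgw-admin has flags to generate the keys):

radosgw-admin key create --uid=user_uuid --key-type=s3 --gen-access-key --gen-secret
radosgw-admin user info --uid=user_uuid     # both key pairs should show up under "keys"
radosgw-admin key rm --uid=user_uuid --key-type=s3 --access-key=some_access_key     # to drop one later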

On 17-11-23 01:25, Daniel Picolli Biazus wrote:

Hey Guys,

Is it possible to generate two keys for one single user/uid on RGW S3?

Something like that:

radosgw-admin user info --uid=0001
{
    "user_id": "0001",
    "display_name": "0001",
    "email": "",
    "suspended": 0,
    "max_buckets": 10,
    "auid": 0,
    "subusers": [],
    "keys": [
        {
            "user": "0001",
            "access_key": "foo1",
            "secret_key": "bar1"
        },
        {
            "user": "0001",
            "access_key": "foo2",
            "secret_key": "bar2"
        }
    ],
    "swift_keys": [],
    "caps": [
        {
            "type": "usage",
            "perm": "*"
        },
        {
            "type": "users",
            "perm": "read"
        }
    ],
    "op_mask": "read, write, delete",
    "default_placement": "",
    "placement_tags": [],
    "bucket_quota": {
        "enabled": false,
        "max_size_kb": -1,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": true,
        "max_size_kb": 104857600,
        "max_objects": -1
    },
    "temp_url_keys": []
}


Best Regards,
Biazus


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS desync

2017-11-03 Thread Henrik Korkuc

On 17-11-03 09:29, Andrey Klimentyev wrote:

Thanks for a swift response.

We are using 10.2.10.

They all share the same set of permissions (and one key, too). Haven't 
found anything incriminating in the logs either.


caps: [mon] allow r
caps: [osd] allow class-read object_prefix rbd_children, allow rwx 
pool=rbd


Are you sure you pasted the correct user permissions? It looks like you are 
using RBD permissions for CephFS, and that seems to be the problem.
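A CephFS client key usually looks more like this (just a sketch; the client name and the pool name "cephfs_data" are examples, adjust to your setup):

ceph auth get-or-create client.cephfs_user mon 'allow r' mds 'allow rw' osd 'allow rwx pool=cephfs_data'
ceph auth get client.cephfs_user     # compare with what your clients currently have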


On 3 November 2017 at 00:56, Gregory Farnum wrote:


On Thu, Nov 2, 2017 at 9:05 AM Andrey Klimentyev wrote:

Hi,

we've recently hit a problem in a production cluster. The gist
of it is that sometimes a file will be changed on one machine,
but only the "change time" propagates to the others. The
checksum is different. Contents, obviously, differ as well.
How can I debug this?

In other words, how would I approach such a problem with "stuck
files"? Haven't found anything on Google or in the troubleshooting docs.


What versions are you running?
The only way I can think of this happening is if one of the
clients had permission to access the CephFS namespace on the MDS,
but not to write to the OSDs which store the file data. Have you
checked that the clients all have the same caps? ("ceph auth list"
or one of the related more-specific commands will let you compare.)
-Greg


-- 
Andrey Klimentyev,

DevOps engineer @ JSC «Flant»
http://flant.com/ 
___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





--
Andrey Klimentyev,
DevOps engineer @ JSC «Flant»
http://flant.com/



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What's about release-note for 10.2.10?

2017-10-06 Thread Henrik Korkuc

On 17-10-06 11:25, ulem...@polarzone.de wrote:

Hi,
again an update is available without release notes...
http://ceph.com/releases/v10-2-10-jewel-released/ isn't found.
No announcement on the mailing list (perhaps I missed something).

While I do not see a v10.2.10 tag in the repo yet, it looks like the packages 
were built less than two days ago, so maybe there hasn't been time to send 
the release notes yet?


Building takes time, and sending the notes before packages are available is 
not a good idea either.



I know, normally it's safe to update Ceph, but two releases ago it wasn't.



Udo
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Large amount of files - cephfs?

2017-09-28 Thread Henrik Korkuc

On 17-09-27 14:57, Josef Zelenka wrote:

Hi,

we are currently working on a Ceph solution for one of our customers. 
They run a file-hosting service and need to store approximately 100 
million pictures (thumbnails). Their current code works with FTP, 
which they use as storage. We thought that we could use CephFS for 
this, but I am not sure how it would behave with that many files, how 
the performance would be affected, etc. Is CephFS usable in this 
scenario, or would radosgw+Swift be better (they'd likely have to 
rewrite some of the code, so we'd prefer not to do this)? We already 
have some experience with CephFS for storing bigger files, streaming, 
etc., so I'm not completely new to this, but I thought it'd be better to 
ask more experienced users. Some advice on this would be greatly 
appreciated, thanks,


Josef

Depending on your OSD count, you should be able to put 100 million files 
there. As others mentioned, depending on your workload, metadata may be 
a bottleneck.


If metadata is not a concern, then you just need enough OSDs to 
distribute the RADOS objects. You should be fine with a few million objects 
per OSD; going to tens of millions per OSD may be more problematic, as 
memory usage grows, OSDs get slower, and backfill/recovery slows down.
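For a rough sanity check of where you stand, something like this should do (sketch):

ceph df detail     # per-pool object counts
ceph osd df        # per-OSD utilization; pool objects divided by OSD count gives a rough objects-per-OSD figure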



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v12.2.0 bluestore - OSD down/crash " internal heartbeat not healthy, dropping ping reques "

2017-09-20 Thread Henrik Korkuc

On 17-09-20 08:06, nokia ceph wrote:

Hello,

Env:- RHEL 7.2 , 3.10.0-327.el7.x86_64 , EC 4+1 , bluestore

We are writing to Ceph via the librados C API. Testing with rados shows no 
issues.


We tested the same with Jewel/Kraken without any issue. We need your view 
on how to debug this issue.
Maybe similar to http://tracker.ceph.com/issues/21180? It seems it was 
resolved for me with the fix mentioned there. You could apply the config 
options mentioned in that ticket and see if it helps (and build a newer version, if able).



OSD.log
==

~~~

2017-09-18 14:51:59.895746 7f1e744e0700  0 log_channel(cluster) log 
[WRN] : slow request 60.068824 seconds old, received at 2017-09-18 
14:50:59.826849: MOSDECSubOpWriteReply(1.132s0 1350/1344 
ECSubWriteReply(tid=971, last_complete=1350'153, committed=1, 
applied=0)) currently queued_for_pg
2017-09-18 14:51:59.895749 7f1e744e0700  0 log_channel(cluster) log 
[WRN] : slow request 60.068737 seconds old, received at 2017-09-18 
14:50:59.826936: MOSDECSubOpWriteReply(1.132s0 1350/1344 
ECSubWriteReply(tid=971, last_complete=0'0, committed=0, applied=1)) 
currently queued_for_pg
2017-09-18 14:51:59.895754 7f1e744e0700  0 log_channel(cluster) log 
[WRN] : slow request 60.067539 seconds old, received at 2017-09-18 
14:50:59.828134: MOSDECSubOpWriteReply(1.132s0 1350/1344 
ECSubWriteReply(tid=971, last_complete=1350'153, committed=1, 
applied=0)) currently queued_for_pg
2017-09-18 14:51:59.923825 7f1e71cdb700 10 trim shard target 102 M 
meta/data ratios 0.5 + 0 (52428 k + 0 ),  current 1359 k (1083 k + 276 k)
2017-09-18 14:51:59.923835 7f1e71cdb700 10 trim shard target 102 M 
meta/data ratios 0.5 + 0 (52428 k + 0 ),  current 1066 k (1066 k + 0 )
2017-09-18 14:51:59.923837 7f1e71cdb700 10 trim shard target 102 M 
meta/data ratios 0.5 + 0 (52428 k + 0 ),  current 643 k (643 k + 0 )
2017-09-18 14:51:59.923840 7f1e71cdb700 10 trim shard target 102 M 
meta/data ratios 0.5 + 0 (52428 k + 0 ),  current 1049 k (1049 k + 0 )
2017-09-18 14:51:59.923842 7f1e71cdb700 10 trim shard target 102 M 
meta/data ratios 0.5 + 0 (52428 k + 0 ),  current 896 k (896 k + 0 )
2017-09-18 14:51:59.940780 7f1e77ca5700 20 osd.181 1350 share_map_peer 
0x7f1e8dbf2800 already has epoch 1350
2017-09-18 14:51:59.940855 7f1e78ca7700 20 osd.181 1350 share_map_peer 
0x7f1e8dbf2800 already has epoch 1350
2017-09-18 14:52:00.081390 7f1e6f572700 20 osd.181 1350 
OSD::ms_dispatch: ping magic: 0 v1
2017-09-18 14:52:00.081393 7f1e6f572700 10 osd.181 1350 do_waiters -- 
start
2017-09-18 14:52:00.081394 7f1e6f572700 10 osd.181 1350 do_waiters -- 
finish
2017-09-18 14:52:00.081395 7f1e6f572700 20 osd.181 1350 _dispatch 
0x7f1e90923a40 ping magic: 0 v1
2017-09-18 14:52:00.081397 7f1e6f572700 10 osd.181 1350 ping from 
client.414556
2017-09-18 14:52:00.123908 7f1e71cdb700 10 trim shard target 102 M 
meta/data ratios 0.5 + 0 (52428 k + 0 ),  current 1359 k (1083 k + 276 k)
2017-09-18 14:52:00.123926 7f1e71cdb700 10 trim shard target 102 M 
meta/data ratios 0.5 + 0 (52428 k + 0 ),  current 1066 k (1066 k + 0 )
2017-09-18 14:52:00.123932 7f1e71cdb700 10 trim shard target 102 M 
meta/data ratios 0.5 + 0 (52428 k + 0 ),  current 643 k (643 k + 0 )
2017-09-18 14:52:00.123937 7f1e71cdb700 10 trim shard target 102 M 
meta/data ratios 0.5 + 0 (52428 k + 0 ),  current 1049 k (1049 k + 0 )
2017-09-18 14:52:00.123942 7f1e71cdb700 10 trim shard target 102 M 
meta/data ratios 0.5 + 0 (52428 k + 0 ),  current 896 k (896 k + 0 )
2017-09-18 14:52:00.145445 7f1e784a6700  1 heartbeat_map is_healthy 
'OSD::osd_op_tp thread 0x7f1e61cbb700' had timed out after 60
2017-09-18 14:52:00.145450 7f1e784a6700  1 heartbeat_map is_healthy 
'OSD::osd_op_tp thread 0x7f1e624bc700' had timed out after 60
2017-09-18 14:52:00.145496 7f1e784a6700  1 heartbeat_map is_healthy 
'OSD::osd_op_tp thread 0x7f1e63cbf700' had timed out after 60
2017-09-18 14:52:00.145534 7f1e784a6700 10 osd.181 1350 internal 
heartbeat not healthy, dropping ping request
2017-09-18 14:52:00.146224 7f1e78ca7700  1 heartbeat_map is_healthy 
'OSD::osd_op_tp thread 0x7f1e61cbb700' had timed out after 60
2017-09-18 14:52:00.146226 7f1e78ca7700  1 heartbeat_map is_healthy 
'OSD::osd_op_tp thread 0x7f1e624bc700' had timed out after 60


~~~

 thread apply all bt

Thread 54 (LWP 479360):
#0  0x7f1e7b5606d5 in ?? ()
#1  0x in ?? ()

Thread 53 (LWP 484888):
#0  0x7f1e7a644b7d in ?? ()
#1  0x in ?? ()

Thread 52 (LWP 484177):
#0  0x7f1e7b5606d5 in ?? ()
#1  0x000a in ?? ()
#2  0x7f1e88d8df98 in ?? ()
#3  0x7f1e88d8df48 in ?? ()
#4  0x000a in ?? ()
#5  0x7f1e5ccaf7f8 in ?? ()
#6  0x7f1e7e45b9ee in ?? ()
#7  0x7f1e88d8d860 in ?? ()
#8  0x7f1e8e6e5500 in ?? ()
#9  0x7f1e889881c0 in ?? ()
#10 0x7f1e7e3e9ea0 in ?? ()
#11 0x in ?? ()

Thread 51 (LWP 484176):
#0  0x7f1e7b5606d5 in ?? ()
#1  0x in ?? ()

Thread 50 (LWP 484175):
#0  0x7f1e7b5606d5 in ?? ()
#1  0x in ?? ()

Thread 49 

Re: [ceph-users] Ceph release cadence

2017-09-07 Thread Henrik Korkuc

On 17-09-06 18:23, Sage Weil wrote:

Hi everyone,

Traditionally, we have done a major named "stable" release twice a year,
and every other such release has been an "LTS" release, with fixes
backported for 1-2 years.

With kraken and luminous we missed our schedule by a lot: instead of
releasing in October and April we released in January and August.

A few observations:

- Not a lot of people seem to run the "odd" releases (e.g., infernalis,
kraken).  This limits the value of actually making them.  It also means
that those who *do* run them are running riskier code (fewer users -> more
bugs).

- The more recent requirement that upgrading clusters must make a stop at
each LTS (e.g., hammer -> luminous not supported, must go hammer -> jewel
-> luminous) has been hugely helpful on the development side by reducing
the amount of cross-version compatibility code to maintain and reducing
the number of upgrade combinations to test.

- When we try to do a time-based "train" release cadence, there always
seems to be some "must-have" thing that delays the release a bit.  This
doesn't happen as much with the odd releases, but it definitely happens
with the LTS releases.  When the next LTS is a year away, it is hard to
suck it up and wait that long.

A couple of options:

* Keep even/odd pattern, and continue being flexible with release dates

   + flexible
   - unpredictable
   - odd releases of dubious value

* Keep even/odd pattern, but force a 'train' model with a more regular
cadence

   + predictable schedule
   - some features will miss the target and be delayed a year

* Drop the odd releases but change nothing else (i.e., 12-month release
cadence)

   + eliminate the confusing odd releases with dubious value
  
* Drop the odd releases, and aim for a ~9 month cadence. This splits the

difference between the current even/odd pattern we've been doing.

   + eliminate the confusing odd releases with dubious value
   + waiting for the next release isn't quite as bad
   - required upgrades every 9 months instead of every 12 months

* Drop the odd releases, but relax the "must upgrade through every LTS" to
allow upgrades across 2 versions (e.g., luminous -> mimic or luminous ->
nautilus).  Shorten release cycle (~6-9 months).

   + more flexibility for users
   + downstreams have greater choice in adopting an upstream release
   - more LTS branches to maintain
   - more upgrade paths to consider

Other options we should consider?  Other thoughts?

What about this:
* drop odd releases
* have ~9 months release schedule
* no version jumping
* bugfix support for 2 release cycles
* list of major incoming features with their status, disabled by feature 
flag.
* have more QA-passed dev releases so that people waiting for new 
features would be able to try them out


This way we trade a shorter release cycle for longer bugfix support but 
no version jumping. Stable folks could then upgrade from 
"legacy-stable" to "old-stable" with multiple minor fixes already in 
both releases.


And bleeding-edge people waiting for some features would know the current 
status of new features (e.g. multi-active MDS going stable in L was a 
surprise for me), with dev releases to run in dev/staging environments for testing.


Shorter releases would potentially have fewer features in them, so they should 
be less risky to use, and of course there would be a shorter wait before a new release.



Thanks!
sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph release cadence

2017-09-06 Thread Henrik Korkuc

On 17-09-07 02:42, Deepak Naidu wrote:

Hope collective feedback helps. So here's one.


- Not a lot of people seem to run the "odd" releases (e.g., infernalis, kraken).

I think the more obvious reason is that companies/users wanting to use Ceph will stick 
with LTS versions, as that models the 3-year support cycle.
Maybe I missed something, but I think Ceph does not support LTS releases 
for 3 years.



* Drop the odd releases, and aim for a ~9 month cadence. This splits the 
difference between the current even/odd pattern we've been doing.

Yes, provided an easy upgrade process.


--
Deepak




-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Sage 
Weil
Sent: Wednesday, September 06, 2017 8:24 AM
To: ceph-de...@vger.kernel.org; ceph-maintain...@ceph.com; ceph-us...@ceph.com
Subject: [ceph-users] Ceph release cadence

Hi everyone,

Traditionally, we have done a major named "stable" release twice a year, and every other 
such release has been an "LTS" release, with fixes backported for 1-2 years.

With kraken and luminous we missed our schedule by a lot: instead of releasing 
in October and April we released in January and August.

A few observations:

- Not a lot of people seem to run the "odd" releases (e.g., infernalis, kraken).  
This limits the value of actually making them.  It also means that those who *do* run them 
are running riskier code (fewer users -> more bugs).

- The more recent requirement that upgrading clusters must make a stop at each LTS 
(e.g., hammer -> luminous not supported, must go hammer -> jewel -> luminous) 
has been hugely helpful on the development side by reducing
the amount of cross-version compatibility code to maintain and reducing the 
number of upgrade combinations to test.

- When we try to do a time-based "train" release cadence, there always seems to be some 
"must-have" thing that delays the release a bit.  This doesn't happen as much with the 
odd releases, but it definitely happens with the LTS releases.  When the next LTS is a year away, 
it is hard to suck it up and wait that long.

A couple of options:

* Keep even/odd pattern, and continue being flexible with release dates

   + flexible
   - unpredictable
   - odd releases of dubious value

* Keep even/odd pattern, but force a 'train' model with a more regular cadence

   + predictable schedule
   - some features will miss the target and be delayed a year

* Drop the odd releases but change nothing else (i.e., 12-month release
cadence)

   + eliminate the confusing odd releases with dubious value
  
* Drop the odd releases, and aim for a ~9 month cadence. This splits the difference between the current even/odd pattern we've been doing.


   + eliminate the confusing odd releases with dubious value
   + waiting for the next release isn't quite as bad
   - required upgrades every 9 months instead of every 12 months

* Drop the odd releases, but relax the "must upgrade through every LTS" to allow 
upgrades across 2 versions (e.g., luminous -> mimic or luminous -> nautilus).  Shorten 
release cycle (~6-9 months).

   + more flexibility for users
   + downstreams have greater choice in adopting an upstream release
   - more LTS branches to maintain
   - more upgrade paths to consider

Other options we should consider?  Other thoughts?

Thanks!
sage
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
---
This email message is for the sole use of the intended recipient(s) and may 
contain
confidential information.  Any unauthorized review, use, disclosure or 
distribution
is prohibited.  If you are not the intended recipient, please contact the 
sender by
reply email and destroy all copies of the original message.
---
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Multiple OSD crashing on 12.2.0. Bluestore / EC pool / rbd

2017-09-06 Thread Henrik Korkuc

On 17-09-06 16:24, Jean-Francois Nadeau wrote:

Hi,

On a 4-node / 48-OSD Luminous cluster I'm giving RBD on EC 
pools + BlueStore a try.


Setup went fine, but after a few bench runs several OSDs are failing and 
many won't even restart.


ceph osd erasure-code-profile set myprofile \
   k=2 \
   m=1 \
   crush-failure-domain=host
ceph osd pool create mypool 1024 1024 erasure myprofile
ceph osd pool set mypool allow_ec_overwrites true
rbd pool init mypool
ceph -s
ceph health detail
ceph osd pool create metapool 1024 1024 replicated
rbd create --size 1024G --data-pool mypool --image metapool/test1
rbd bench -p metapool test1 --io-type write --io-size 8192 
--io-pattern rand --io-total 10G

...


One of many OSD failing logs

Sep 05 17:02:54 r72-k7-06-01.k8s.ash1.cloudsys.tmcs systemd[1]: 
Started Ceph object storage daemon osd.12.
Sep 05 17:02:54 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 
starting osd.12 at - osd_data /var/lib/ceph/osd/ceph-12 
/var/lib/ceph/osd/ceph-12/journal
Sep 05 17:02:56 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 
2017-09-05 17:02:56.627301 7fe1a2e42d00 -1 osd.12 2219 log_to_monitors 
{default=true}
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 
2017-09-05 17:02:58.686723 7fe1871ac700 -1 
bluestore(/var/lib/ceph/osd/ceph-12) _txc_add_transac
tion error (2) No such file or directory not handled on operation 15 
(op 0, counting from 0)
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 
2017-09-05 17:02:58.686742 7fe1871ac700 -1 
bluestore(/var/lib/ceph/osd/ceph-12) unexpected error

 code
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/
centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/os/bluestore/BlueStore.cc: 
In function 'void BlueStore::_txc_add_transaction(Blu
eStore::TransContext*, ObjectStore::Transaction*)' thread 7fe1871ac700 
time 2017-09-05 17:02:58.686821
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/
centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.0/rpm/el7/BUILD/ceph-12.2.0/src/os/bluestore/BlueStore.cc: 
9282: FAILED assert(0 == "unexpected error")
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 
ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) 
luminous (rc)
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 1: 
(ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x110) [0x7fe1a38bf510]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 2: 
(BlueStore::_txc_add_transaction(BlueStore::TransContext*, 
ObjectStore::Transaction*)+0x1487)

 [0x7fe1a3796057]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 3: 
(BlueStore::queue_transactions(ObjectStore::Sequencer*, 
std::vector&, 
boost::intrusive_ptr, ThreadPool::TPHandle*)+0x3a0) 
[0x7fe1a37970a0]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 4: 
(PrimaryLogPG::queue_transactions(std::vector >&, boost::intrusive_ptr)+0x65) 
[0x7fe1a3508745]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 5: 
(ECBackend::handle_sub_write(pg_shard_t, 
boost::intrusive_ptr, ECSubWrite&, ZTrace

r::Trace const&, Context*)+0x631) [0x7fe1a3628711]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 6: 
(ECBackend::_handle_message(boost::intrusive_ptr)+0x327) 
[0x7fe1a36392b7]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 7: 
(PGBackend::handle_message(boost::intrusive_ptr)+0x50) 
[0x7fe1a353da10]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 8: 
(PrimaryLogPG::do_request(boost::intrusive_ptr&, 
ThreadPool::TPHandle&)+0x58e) [0x

7fe1a34a9a7e]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 9: 
(OSD::dequeue_op(boost::intrusive_ptr, 
boost::intrusive_ptr, ThreadPool::TPHan

dle&)+0x3f9) [0x7fe1a333c729]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 
10: (PGQueueable::RunVis::operator()(boost::intrusive_ptr 
const&)+0x57) [0x7fe1a35ac1

97]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 
11: (OSD::ShardedOpWQ::_process(unsigned int, 
ceph::heartbeat_handle_d*)+0xfce) [0x7fe1a3367c8e]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 
12: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x839) 
[0x7fe1a38c5029]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 
13: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7fe1a38c6fc0]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 
14: (()+0x7dc5) [0x7fe1a0484dc5]
Sep 05 17:02:58 r72-k7-06-01.k8s.ash1.cloudsys.tmcs ceph-osd[4775]: 
15: 

Re: [ceph-users] Luminous Upgrade KRBD

2017-09-06 Thread Henrik Korkuc

On 17-09-06 09:10, Ashley Merrick wrote:


I was just going by : docs.ceph.com/docs/master/start/os-recommendations/


Which states 4.9


docs.ceph.com/docs/master/rados/operations/crush-map


Only goes as far as Jewel and states 4.5


Not sure where else I can find a concrete answer as to whether 4.10 is new enough.

Well, it looks like the docs may need to be revisited, as I was unable to use 
kcephfs on 4.9 with Luminous before downgrading tunables; not sure about 
4.10.




,Ashley


*From:* Henrik Korkuc <li...@kirneh.eu>
*Sent:* 06 September 2017 06:58:52
*To:* Ashley Merrick; ceph-us...@ceph.com
*Subject:* Re: [ceph-users] Luminous Upgrade KRBD
On 17-09-06 07:33, Ashley Merrick wrote:

Hello,

I have recently upgraded a cluster to Luminous (running Proxmox), and at 
the same time I have upgraded the compute cluster to 5.x, meaning we 
now run the latest kernel version (Linux 4.10.15-1). I am looking to do the 
following:


ceph osd set-require-min-compat-client luminous

Does the 4.10 kernel support Luminous features? I am afraid (though I do not 
have info to back it up) that 4.10 is too old for Luminous features.


Below is the output of ceph features. The 4 next to the last 
row (luminous) is as expected for the 4 compute nodes; are the other 
4, spread across hammer & jewel, logs of when the nodes last 
connected before they were upgraded to Proxmox 5.0? Am I safe to run 
the above command? No other RBD resources are connected to this cluster.


  "client": {
        "group": {
            "features": "0x106b84a842a42",
            "release": "hammer",
            "num": 1
        },
        "group": {
            "features": "0x40106b84a842a52",
            "release": "jewel",
            "num": 3
        },
        "group": {
            "features": "0x1ffddff8eea4fffb",
            "release": "luminous",
            "num": 4
        }




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous Upgrade KRBD

2017-09-05 Thread Henrik Korkuc

On 17-09-06 07:33, Ashley Merrick wrote:

Hello,

I have recently upgraded a cluster to Luminous (running Proxmox), and at the 
same time I have upgraded the compute cluster to 5.x, meaning we now 
run the latest kernel version (Linux 4.10.15-1). I am looking to do the 
following:


ceph osd set-require-min-compat-client luminous

Does the 4.10 kernel support Luminous features? I am afraid (though I do not have 
info to back it up) that 4.10 is too old for Luminous features.


Below is the output of ceph features. The 4 next to the last 
row (luminous) is as expected for the 4 compute nodes; are the other 
4, spread across hammer & jewel, logs of when the nodes last connected 
before they were upgraded to Proxmox 5.0? Am I safe to run the above 
command? No other RBD resources are connected to this cluster.


  "client": {
        "group": {
            "features": "0x106b84a842a42",
            "release": "hammer",
            "num": 1
        },
        "group": {
            "features": "0x40106b84a842a52",
            "release": "jewel",
            "num": 3
        },
        "group": {
            "features": "0x1ffddff8eea4fffb",
            "release": "luminous",
            "num": 4
        }




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Reply: How to enable ceph-mgr dashboard

2017-09-05 Thread Henrik Korkuc

What is the output of "netstat -anp | grep 7000"?
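If it turns out to be a bind problem, the dashboard address/port can be changed, roughly like this (a sketch from memory of the Luminous dashboard docs, so double-check the key names):

ceph config-key put mgr/dashboard/server_addr 0.0.0.0
ceph config-key put mgr/dashboard/server_port 7000
ceph mgr module disable dashboard && ceph mgr module enable dashboard     # restart the module to pick it up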

On 17-09-05 14:19, 许雪寒 wrote:

Sorry for the bad formatting; here is the right one:

Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/servers.py", line 187, in 
_start_http_thread
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: self.httpserver.start()
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/wsgiserver/wsgiserver2.py", line 
1824, in start
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: raise socket.error(msg)
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: error: No socket could be created
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus STOPPING
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE HTTP Server 
cherrypy._cpwsgi_server.CPWSGIServer(('::', 7000)) already shut down
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Stopped 
thread '_TimeoutMonitor'.
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus STOPPED
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus EXITING
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus EXITED
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: Exception in thread HTTPServer Thread-3:
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: Traceback (most recent call last):
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File "/usr/lib64/python2.7/threading.py", 
line 811, in __bootstrap_inner
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: self.run()
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File "/usr/lib64/python2.7/threading.py", 
line 764, in run
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: self.__target(*self.__args, **self.__kwargs)
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/servers.py", line 201, in 
_start_http_thread
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: self.bus.exit()
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/wspbus.py", line 276, in exit
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: os._exit(70) # EX_SOFTWARE
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: TypeError: os_exit_noop() takes no 
arguments (1 given)
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Error in 'start' listener 
>
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: Traceback (most recent call last):
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/wspbus.py", line 197, in 
publish
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: output.append(listener(*args, **kwargs))
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/_cpserver.py", line 151, in start
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: ServerAdapter.start(self)
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/servers.py", line 174, in 
start
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: self.wait()
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/servers.py", line 208, in 
wait
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: raise self.interrupt
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: error: No socket could be created
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Shutting down 
due to error in start listener:
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: Traceback (most recent call last):
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/wspbus.py", line 235, in 
start
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: self.publish('start')
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/wspbus.py", line 215, in 
publish
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: raise exc
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: ChannelFailures: error('No socket could be 
created',)
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus STOPPING
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE HTTP Server 
cherrypy._cpwsgi_server.CPWSGIServer(('::', 7000)) already shut down
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE No thread 
running for None.
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus STOPPED
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus EXITING
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: [05/Sep/2017:19:01:56] ENGINE Bus EXITED
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: 2017-09-05 19:01:56.858240 7f01a634e700 -1 
mgr serve dashboard.serve:
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: 2017-09-05 19:01:56.858266 7f01a634e700 -1 
mgr serve Traceback (most recent call last):
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib64/ceph/mgr/dashboard/module.py", line 989, in serve
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: cherrypy.engine.start()
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: File 
"/usr/lib/python2.7/site-packages/cherrypy/process/wspbus.py", line 250, in 
start
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: raise e_info
Sep  5 19:01:56 rg1-ceph7 ceph-mgr: ChannelFailures: error('No socket could be 
created',)

-----Original Message-----
From: ceph-users 

[ceph-users] EC pool as a tier/cache pool

2017-08-25 Thread Henrik Korkuc

Hello,

I tried creating tiering with EC pools (an EC pool as a cache for another 
EC pool) and ended up with "Error ENOTSUP: tier pool 'ecpool' is an ec 
pool, which cannot be a tier". Now that EC pools have overwrite support and 
can be used directly by RBD and CephFS, it may be worth supporting tiering 
with EC pools (e.g. on SSDs). What do others think about it? Maybe it is 
already planned?
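For reference, this is roughly the sequence that hits the error (a sketch; pool names and PG counts are just examples):

ceph osd pool create ecbase 64 64 erasure
ceph osd pool create ecpool 64 64 erasure
ceph osd tier add ecbase ecpool
# -> Error ENOTSUP: tier pool 'ecpool' is an ec pool, which cannot be a tier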


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS billions of files and inline_data?

2017-08-16 Thread Henrik Korkuc

On 17-08-16 19:40, John Spray wrote:

On Wed, Aug 16, 2017 at 3:27 PM, Henrik Korkuc <li...@kirneh.eu> wrote:

Hello,

I have a use case for billions of small files (~1KB) on CephFS, and since in my
experience having billions of objects in a pool is not a very good idea (ops
slow down, large memory usage, etc.), I decided to test CephFS inline_data.
After activating this feature and starting the copy process I noticed that
objects are still created in the data pool, but their size is 0. Is this
expected behavior? Maybe someone can share tips on using a large amount of
small objects? I am on 12.1.3, already using a decreased min block size for
bluestore.

Couple of thoughts:
  - Frequently when someone has a "billions of small files" workload
they really want an object store, not a filesystem

In this case I need POSIX, to replace the current system.


  - In many cases the major per-file overhead is MDS CPU req/s rather
than the OSD ops, so inline data may be efficient but not result in
overall speedup
  - If you do need to get rid of the overhead of writing objects to the
data pool, you could work on creating a special backtraceless flag
(per-filesystem), where the filesystem cannot do lookups by inode (no
NFS, no hardlinks, limited disaster recovery), but it doesn't write
backtraces either.
It looks like I may need backtraces, so I will need to put a bunch of 
objects into the pools.


Maybe you can suggest recommendations on how to scale Ceph for billions 
of objects? More PGs per OSD, more OSDs, more pools? Somewhere on the 
list it was mentioned that OSDs need to keep the object list in memory; is 
that still valid for bluestore?


Also setting bluestore_min_alloc_size_* to 1024 results in 
"/tmp/buildd/ceph-12.1.3/src/common/Checksummer.h: 219: FAILED 
assert(csum_data->length() >= (offset + length) / csum_block_size * 
sizeof(typename Alg::value_t))" during OSD start right after ceph-disk 
prepare.



John


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CephFS billions of files and inline_data?

2017-08-16 Thread Henrik Korkuc

Hello,

I have a use case for billions of small files (~1KB) on CephFS, and since in 
my experience having billions of objects in a pool is not a very good idea 
(ops slow down, large memory usage, etc.), I decided to test CephFS 
inline_data. After activating this feature and starting the copy process I 
noticed that objects are still created in the data pool, but their size is 
0. Is this expected behavior? Maybe someone can share tips on using a 
large amount of small objects? I am on 12.1.3, already using a decreased 
min block size for bluestore.
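For context, this is roughly what I mean (a sketch; the fs name, the exact flag, and the values are examples from memory, not a recommendation):

ceph fs set cephfs inline_data true --yes-i-really-mean-it     # store small file data inline in the inode instead of the data pool

# ceph.conf on the OSD nodes, set before running ceph-disk prepare:
[osd]
bluestore_min_alloc_size_hdd = 4096
bluestore_min_alloc_size_ssd = 4096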


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stealth Jewel release?

2017-07-09 Thread Henrik Korkuc

On 17-07-10 08:29, Christian Balzer wrote:

Hello,

so this morning I was greeted with the availability of 10.2.8 for both
Jessie and Stretch (much appreciated), but w/o any announcement here or
updated release notes on the website, etc.

Any reason other than "Friday" (US time) for this?

Christian


My guess is that they didn't have time to announce it yet. Maybe the packages 
were not ready yet on Friday?


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to replicate metadata only on RGW multisite?

2017-06-30 Thread Henrik Korkuc

Hello,

I have an RGW multisite setup on Jewel and I would like to turn off data 
replication there so that only metadata (users, created buckets, etc.) 
would be synced, but not the data.


Is it possible to make such setup?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v12.1.0 Luminous RC released

2017-06-30 Thread Henrik Korkuc

On 17-06-23 17:13, Abhishek L wrote:


 
   * CRUSH weights can now be optimized to
 maintain a *near-perfect distribution of data* across OSDs.


It would be great to get some information on how to use this feature.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] num_caps

2017-06-28 Thread Henrik Korkuc

On 17-05-15 14:49, John Spray wrote:

On Mon, May 15, 2017 at 1:36 PM, Henrik Korkuc <li...@kirneh.eu> wrote:

On 17-05-15 13:40, John Spray wrote:

On Mon, May 15, 2017 at 10:40 AM, Ranjan Ghosh <gh...@pw6.de> wrote:

Hi all,

When I run "ceph daemon mds. session ls" I always get a fairly
large
number for num_caps (200.000). Is this normal? I thought caps are sth.
like
open/locked files meaning a client is holding a cap on a file and no
other
client can access it during this time.

Capabilities are much broader than that, they cover clients keeping
some fresh metadata in their cache, even if the client isn't doing
anything with the file at that moment.  It's common for a client to
accumulate a large number of capabilities in normal operation, as it
keeps the metadata for many files in cache.

You can adjust the "client cache size" setting on the fuse client to
encourage it to cache metadata on fewer files and thereby hold onto
fewer capabilities if you want.

John

Is there an option (or a planned option) for clients to release caps after
some time of not being used?

In my testing I saw that clients tend to hold on to caps indefinitely.

Currently in prod I have a use case with over 8 million caps and a little over
800k inodes_with_caps.

Both the MDS and client caches operate on a LRU, size-limited basis.
That means that if they aren't hitting their size thresholds, they
will tend to keep lots of stuff in cache indefinitely.
I'd like to reanimate this thread. Today I tried to look into 
client_cache_size, and I have Jewel clients which hold many more caps 
than the cache_size.
Looking at the client stats I can see this (cache size was adjusted online 
to 1000):

"dentry_count": 1000,
"dentry_pinned_count": 20,
"inode_count": 18775,

So it looks like not everything can be controlled by client_cache_size. 
Also, looking into the code, I didn't find where inode_map is decreased when 
the LRU size is lowered. Maybe it is adjusted indirectly?


One repro case would be a CephFS with multiple files and dirs. Scanning it 
with "ls -R" seems to honor the cache size (it goes over the limit but then 
decreases), but if I do "ls -Rl" the caps count goes over the limit and stays 
there. Are attr caps not being released, or something?
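For anyone wanting to poke at the same thing, the commands I'm using are roughly (a sketch; the socket path and MDS name are examples):

ceph daemon /var/run/ceph/ceph-client.admin.asok config set client_cache_size 1000
ceph daemon /var/run/ceph/ceph-client.admin.asok status       # dentry_count / inode_count
ceph daemon mds.a session ls                                  # num_caps per client session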


Is it expected to be like that?


One could add a behaviour that also actively expires cached metadata
if it has not been used for a certain period of time, but it's not
clear what the right time threshold would be, and whether it would be
desirable for most users.  If we free up memory because the system is
quiet this minute/hour, then it potentially just creates an issue when
we get busy again and need that memory back.

With caching/resources generally, there's a conflict between the
desire to keep things in cache in case they're needed again, and the
desire to evict things from cache so that we have lots of free space
available for new entries.  Which one is better is entirely workload
dependent: there is clearly scope to add different behaviours as
options, but its hard to know how much people would really use them --
the sanity of the defaults is the most important thing.  I do think
there's a reasonable argument that part of the sane defaults should
not be to keep something in cache if it hasn't been used for e.g. a
day or more.

BTW, clients do have an additional behaviour where they will drop
unneeded caps when an MDS restarts, to avoid making a newly started
MDS do a lot of unnecessary work to restore those caps, so the
overhead of all those extra caps isn't quite as much as one might
first imagine.

John





How can I debug this if it is a cause
of concern? Is there any way to debug on which files the caps are held
exactly?

Thank you,

Ranjan

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] num_caps

2017-05-15 Thread Henrik Korkuc

On 17-05-15 13:40, John Spray wrote:

On Mon, May 15, 2017 at 10:40 AM, Ranjan Ghosh  wrote:

Hi all,

When I run "ceph daemon mds. session ls" I always get a fairly large
number for num_caps (200.000). Is this normal? I thought caps are something like
open/locked files, meaning a client is holding a cap on a file and no other
client can access it during this time.

Capabilities are much broader than that, they cover clients keeping
some fresh metadata in their cache, even if the client isn't doing
anything with the file at that moment.  It's common for a client to
accumulate a large number of capabilities in normal operation, as it
keeps the metadata for many files in cache.

You can adjust the "client cache size" setting on the fuse client to
encourage it to cache metadata on fewer files and thereby hold onto
fewer capabilities if you want.

John
Is there an option (or a planned option) for clients to release caps after 
some time of not being used?


In my testing I saw that clients tend to hold on to caps indefinitely.

Currently in prod I have a use case with over 8 million caps and a little 
over 800k inodes_with_caps.




How can I debug this if it is a cause
of concern? Is there any way to debug on which files the caps are held
exactly?

Thank you,

Ranjan

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is single MDS data recoverable

2017-04-27 Thread Henrik Korkuc

On 17-04-26 17:12, gjprabu wrote:

Hi Henrik,

   So, I assume if the MDS goes down there will be no data 
loss and we can install another MDS server. What about the old data, will 
it still work properly?



What do you mean by old data? The MDS locally has only its keyring.



Regards
Prabu GJ


 On Tue, 25 Apr 2017 16:31:44 +0530 *Henrik Korkuc 
<li...@kirneh.eu>* wrote 


On 17-04-25 13:43, gjprabu wrote:

Hi Team,

   I am running a CephFS setup with a single MDS. In a
single-MDS setup, if the MDS goes down, what will happen to the
data? Is it advisable to run multiple MDSes?

MDS data is stored in the Ceph cluster itself. After an MDS failure you can
start another MDS on a different server. It should pick up where the
previous MDS left off.


Regards
Prabu GJ



___
ceph-users mailing list
ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com <mailto:ceph-users@lists.ceph.com>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Is single MDS data recoverable

2017-04-25 Thread Henrik Korkuc

On 17-04-25 13:43, gjprabu wrote:

Hi Team,

   I am running a CephFS setup with a single MDS. In a single-MDS 
setup, if the MDS goes down, what will happen to the data? Is it 
advisable to run multiple MDSes?


MDS data is stored in the Ceph cluster itself. After an MDS failure you can start 
another MDS on a different server. It should pick up where the previous MDS left off.



Regards
Prabu GJ



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH MON Updates Live

2017-04-25 Thread Henrik Korkuc

On 17-04-24 19:38, Ashley Merrick wrote:

Hey,

Quick question, hopefully an easy one; I have tried a few Google searches but found nothing concrete.

I am running KVM VMs using KRBD. If I add and remove Ceph mons, are the 
running VMs updated with this information, or do I need to reboot the VMs for 
them to be provided with the change of mons?
Clients are updated with this information. Just make sure that you have at 
least one active mon in the config in case a VM gets restarted for some other 
reason.
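I.e. something like this in ceph.conf on the hypervisors (IPs are just examples):

[global]
mon_host = 10.0.0.1, 10.0.0.2, 10.0.0.3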



Thanks!
Sent from my iPhone
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Question about the OSD host option

2017-04-22 Thread Henrik Korkuc
mon.* and osd.* sections are not mandatory in the config. So unless you want 
to set something per daemon, you can skip them completely.
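A minimal config can look roughly like this (a sketch; the values are placeholders):

[global]
fsid = <your cluster fsid>
mon_host = mon1, mon2, mon3
# no [osd.N] sections: ceph-disk records the OSD ID/UUID on the data partition
# and udev/systemd activate each OSD from that, not from ceph.conf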


On 17-04-21 19:07, Fabian wrote:

Hi Everyone,

I'm playing around a bit with Ceph on a test cluster with 3 servers (each MON
and OSD at the same time).
I use some self-written Ansible rules to deploy the config and create
the OSDs with ceph-disk. Because ceph-disk uses the next free OSD ID, my
Ansible script is not aware of which ID belongs to which OSD and host. So I
don't create any [OSD.ID] section in my config, and my cluster runs
fine.

Now I have read in [1] that "the Ceph configuration file MUST specify the
host for each daemon". As I consider each OSD a daemon, I'm a bit
confused that it worked without the host specified.

Why does the OSD daemon need the host option? What happens if it doesn't
exist?

Is there any best practice about naming the OSDs? Or a trick to avoid
the [OSD.ID] for each daemon?

[1]http://docs.ceph.com/docs/master/rados/configuration/network-config-ref/#ceph-daemons

Thank you,

Fabian


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS fuse client users stuck

2017-03-14 Thread Henrik Korkuc

On 17-03-14 00:08, John Spray wrote:

On Mon, Mar 13, 2017 at 8:15 PM, Andras Pataki
 wrote:

Dear Cephers,

We're using the ceph file system with the fuse client, and lately some of
our processes are getting stuck seemingly waiting for fuse operations.  At
the same time, the cluster is healthy, no slow requests, all OSDs up and
running, and both the MDS and the fuse client think that there are no
pending operations.  The situation is semi-reproducible.  When I run
various cluster jobs, some get stuck after a few hours of correct operation.
The cluster is on ceph 10.2.5 and 10.2.6, the fuse clients are 10.2.6, but I
have tried 10.2.5 and 10.2.3, all of which have the same issue.  This is on
CentOS (7.2 for the clients, 7.3 for the MDS/OSDs).

Here are some details:

The node with the stuck processes:

[root@worker1070 ~]# ps -auxwww | grep 30519
apataki   30519 39.8  0.9 8728064 5257588 ? Dl   12:11  60:50 ./Arepo
param.txt 2 6
[root@worker1070 ~]# cat /proc/30519/stack
[] fuse_file_aio_write+0xbb/0x340 [fuse]
[] do_sync_write+0x8d/0xd0
[] vfs_write+0xbd/0x1e0
[] SyS_write+0x7f/0xe0
[] system_call_fastpath+0x16/0x1b
[] 0x

[root@worker1070 ~]# ps -auxwww | grep 30533
apataki   30533 39.8  0.9 8795316 5261308 ? Sl   12:11  60:55 ./Arepo
param.txt 2 6
[root@worker1070 ~]# cat /proc/30533/stack
[] wait_answer_interruptible+0x91/0xe0 [fuse]
[] __fuse_request_send+0x253/0x2c0 [fuse]
[] fuse_request_send+0x12/0x20 [fuse]
[] fuse_send_write+0xd6/0x110 [fuse]
[] fuse_perform_write+0x2ed/0x590 [fuse]
[] fuse_file_aio_write+0x2a1/0x340 [fuse]
[] do_sync_write+0x8d/0xd0
[] vfs_write+0xbd/0x1e0
[] SyS_write+0x7f/0xe0
[] system_call_fastpath+0x16/0x1b
[] 0x

Presumably the second process is waiting on the first holding some lock ...

The fuse client on the node:

[root@worker1070 ~]# ceph daemon /var/run/ceph/ceph-client.admin.asok status
{
 "metadata": {
 "ceph_sha1": "656b5b63ed7c43bd014bcafd81b001959d5f089f",
 "ceph_version": "ceph version 10.2.6
(656b5b63ed7c43bd014bcafd81b001959d5f089f)",
 "entity_id": "admin",
 "hostname": "worker1070",
 "mount_point": "\/mnt\/ceph",
 "root": "\/"
 },
 "dentry_count": 40,
 "dentry_pinned_count": 23,
 "inode_count": 123,
 "mds_epoch": 19041,
 "osd_epoch": 462327,
 "osd_epoch_barrier": 462326
}

[root@worker1070 ~]# ceph daemon /var/run/ceph/ceph-client.admin.asok
mds_sessions
{
 "id": 3616543,
 "sessions": [
 {
 "mds": 0,
 "addr": "10.128.128.110:6800\/909443124",
 "seq": 338,
 "cap_gen": 0,
 "cap_ttl": "2017-03-13 14:47:37.575229",
 "last_cap_renew_request": "2017-03-13 14:46:37.575229",
 "cap_renew_seq": 12694,
 "num_caps": 713,
 "state": "open"
 }
 ],
 "mdsmap_epoch": 19041
}

[root@worker1070 ~]# ceph daemon /var/run/ceph/ceph-client.admin.asok
mds_requests
{}


The overall cluster health and the MDS:

[root@cephosd000 ~]# ceph -s
 cluster d7b33135-0940-4e48-8aa6-1d2026597c2f
  health HEALTH_WARN
 noscrub,nodeep-scrub,require_jewel_osds flag(s) set
  monmap e17: 3 mons at
{hyperv029=10.4.36.179:6789/0,hyperv030=10.4.36.180:6789/0,hyperv031=10.4.36.181:6789/0}
 election epoch 29148, quorum 0,1,2 hyperv029,hyperv030,hyperv031
   fsmap e19041: 1/1/1 up {0=cephosd000=up:active}
  osdmap e462328: 624 osds: 624 up, 624 in
 flags noscrub,nodeep-scrub,require_jewel_osds
   pgmap v44458747: 42496 pgs, 6 pools, 924 TB data, 272 Mobjects
 2154 TB used, 1791 TB / 3946 TB avail
42496 active+clean
   client io 86911 kB/s rd, 556 MB/s wr, 227 op/s rd, 303 op/s wr

[root@cephosd000 ~]# ceph daemon /var/run/ceph/ceph-mds.cephosd000.asok ops
{
 "ops": [],
 "num_ops": 0
}


The odd thing is that if in this state I restart the MDS, the client process
wakes up and proceeds with its work without any errors.  As if a request was
lost and somehow retransmitted/restarted when the MDS got restarted and the
fuse layer reconnected to it.

Interesting.  A couple of ideas for more debugging:

* Next time you go through this process of restarting the MDS while
there is a stuck client, first increase the client's logging ("ceph
daemon <path to the client's .asok> config set debug_client
20").  Then we should get a clear sense of exactly what's happening on
the MDS restart that's enabling the client to proceed.
* When inspecting the client's "mds_sessions" output, also check the
"session ls" output on the MDS side to make sure the MDS and client
both agree that it has an open session.

John


Please check whether kicking ceph-fuse with "ceph --admin-daemon 
/var/run/ceph/ceph-client.something_something.asok kick_stale_sessions" 
works for you.


I experienced similar problems, but didn't dig into it fully. You can 
check out:


Re: [ceph-users] Shrinking lab cluster to free hardware for a new deployment

2017-03-08 Thread Henrik Korkuc

On 17-03-08 15:39, Kevin Olbrich wrote:

Hi!

Currently I have a cluster with 6 OSDs (5 hosts, 7TB RAID6 each).
We want to shut down the cluster but it holds some semi-productive VMs 
we might or might not need in the future.
To keep them, we would like to shrink our cluster from 6 to 2 OSDs (we 
use size 2 and min_size 1).


Should I set the OSDs out one by one, or all at once with the norefill, 
norecovery flags set?

If the latter is the case, which flags should also be set?

Just set the OSDs out and wait for them to rebalance; the OSDs will stay active and 
serve traffic while data moves off them. I had a case where 
some PGs wouldn't move off, so after everything settles you may need to 
remove the OSDs from CRUSH one by one.
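Per OSD it's roughly (a sketch, adjust the IDs; wait for all PGs to be active+clean between steps):

ceph osd out 5                  # data starts draining while osd.5 keeps serving
ceph -s                         # wait until all PGs are active+clean again
systemctl stop ceph-osd@5       # once it is empty
ceph osd crush remove osd.5
ceph auth del osd.5
ceph osd rm 5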



Thanks!

Kind regards,
Kevin Olbrich.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] replica questions

2017-03-03 Thread Henrik Korkuc

On 17-03-03 12:30, Matteo Dacrema wrote:

Hi All,

I’ve a production cluster made of 8 nodes, 166 OSDs and 4 Journal SSD 
every 5 OSDs with replica 2 for a total RAW space of 150 TB.

I’ve few question about it:

  * It’s critical to have replica 2? Why?

Replica size 3 is highly recommended. I do not know the exact numbers, but it 
decreases the chance of data loss, as two-disk failures appear to be quite a 
frequent thing, especially in larger clusters.
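If you do decide to go to replica 3 (and have the capacity for it), it's roughly this per pool (sketch):

ceph osd pool set <pool> size 3
ceph osd pool set <pool> min_size 2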


  * Does replica 3 make recovery faster?


no


  * Does replica 3 make rebalancing and recovery less heavy for
customers? If I lose 1 node, does replica 3 reduce the IO impact
compared to replica 2?


no


  * Does read performance increase with replica 3?


no


Thank you
Regards
Matteo


This email and any files transmitted with it are confidential and 
intended solely for the use of the individual or entity to whom they 
are addressed. If you have received this email in error please notify 
the system manager. This message contains confidential information and 
is intended only for the individual named. If you are not the named 
addressee you should not disseminate, distribute or copy this e-mail. 
Please notify the sender immediately by e-mail if you have received 
this e-mail by mistake and delete this e-mail from your system. If you 
are not the intended recipient you are notified that disclosing, 
copying, distributing or taking any action in reliance on the contents 
of this information is strictly prohibited.




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] would people mind a slow osd restart during luminous upgrade?

2017-02-09 Thread Henrik Korkuc

On 17-02-09 05:09, Sage Weil wrote:

Hello, ceph operators...

Several times in the past we've had to do some on-disk format conversion
during upgrade, which meant that the first time the ceph-osd daemon started
after upgrade it had to spend a few minutes fixing up its on-disk files.
We haven't had to recently, though, and generally try to avoid such
things.

However, there's a change we'd like to make in FileStore for luminous (*)
and it would save us a lot of time and complexity if it was a one-shot
update during the upgrade.  I would probably take in the neighborhood of
1-5 minutes for a 4-6TB HDD.  That means that when restarting the daemon
during the upgrade the OSD would stay down for that period (vs the usual
<1 restart time).

Does this concern anyone?  It probably means the upgrades will take longer
if you're going host by host since the time per host will go up.
In my opinion, if this is clearly communicated (release notes + OSD logs) 
it's fine; otherwise it may feel like something is wrong if an OSD takes a 
long time to start.



sage


* eliminate 'snapdir' objects, replacing them with a head object +
whiteout.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Running 'ceph health' as non-root user

2017-02-01 Thread Henrik Korkuc

On 17-02-01 10:55, Michael Hartz wrote:

I am running ceph as part of a Proxmox Virtualization cluster, which is doing 
great.

However, for monitoring purposes I would like to periodically check with 'ceph 
health' as a non-root user.
This fails with the following message:

su -c 'ceph health' -s /bin/bash nagios

Error initializing cluster client: PermissionDeniedError('error calling 
conf_read_file',)

Please note: running the command as root user works as intended.

Someone else suggested allowing group permissions on the admin keyring, i.e. 
chmod 660 /etc/ceph/ceph.client.admin.keyring
Link: https://github.com/thelan/ceph-zabbix/issues/12
This didn't work.

Has anyone hints on this?

Is /etc/ceph/ceph.conf readable by that user?
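
If readability isn't the issue, a safer approach than loosening the admin 
keyring permissions is a dedicated read-only key for monitoring (a sketch; 
the client name and paths are assumptions):

ceph auth get-or-create client.nagios mon 'allow r' > /etc/ceph/ceph.client.nagios.keyring
chown root:nagios /etc/ceph/ceph.client.nagios.keyring
chmod 640 /etc/ceph/ceph.client.nagios.keyring
su -c 'ceph --id nagios health' -s /bin/bash nagios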



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Minimize data lost with PG incomplete

2017-01-31 Thread Henrik Korkuc
I am not sure about the "incomplete" part off the top of my head, but you can 
try setting min_size to 1 for pools to reactivate some PGs, if they are 
down/inactive due to missing replicas.
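
Something along these lines (a sketch; remember that running with min_size 1 
is risky, so raise it back once recovery finishes):

ceph osd pool set <poolname> min_size 1
# ... wait for the affected PGs to become active and recover ...
ceph osd pool set <poolname> min_size 2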


On 17-01-31 10:24, José M. Martín wrote:

# ceph -s
 cluster 29a91870-2ed2-40dc-969e-07b22f37928b
  health HEALTH_ERR
 clock skew detected on mon.loki04
 155 pgs are stuck inactive for more than 300 seconds
 7 pgs backfill_toofull
 1028 pgs backfill_wait
 48 pgs backfilling
 892 pgs degraded
 20 pgs down
 153 pgs incomplete
 2 pgs peering
 155 pgs stuck inactive
 1077 pgs stuck unclean
 892 pgs undersized
 1471 requests are blocked > 32 sec
 recovery 3195781/36460868 objects degraded (8.765%)
 recovery 5079026/36460868 objects misplaced (13.930%)
 mds0: Behind on trimming (175/30)
 noscrub,nodeep-scrub flag(s) set
 Monitor clock skew detected
  monmap e5: 5 mons at
{loki01=192.168.3.151:6789/0,loki02=192.168.3.152:6789/0,loki03=192.168.3.153:6789/0,loki04=192.168.3.154:6789/0,loki05=192.168.3.155:6789/0}
 election epoch 4028, quorum 0,1,2,3,4
loki01,loki02,loki03,loki04,loki05
   fsmap e95494: 1/1/1 up {0=zeus2=up:active}, 1 up:standby
  osdmap e275373: 42 osds: 42 up, 42 in; 1077 remapped pgs
 flags noscrub,nodeep-scrub
   pgmap v36642778: 4872 pgs, 4 pools, 24801 GB data, 17087 kobjects
 45892 GB used, 34024 GB / 79916 GB avail
 3195781/36460868 objects degraded (8.765%)
 5079026/36460868 objects misplaced (13.930%)
 3640 active+clean
  838 active+undersized+degraded+remapped+wait_backfill
  184 active+remapped+wait_backfill
  134 incomplete
   48 active+undersized+degraded+remapped+backfilling
   19 down+incomplete
6
active+undersized+degraded+remapped+wait_backfill+backfill_toofull
1 active+remapped+backfill_toofull
1 peering
1 down+peering
recovery io 93909 kB/s, 10 keys/s, 67 objects/s



# ceph osd tree
ID  WEIGHT   TYPE NAME   UP/DOWN REWEIGHT PRIMARY-AFFINITY
  -1 77.22777 root default
  -9 27.14778 rack sala1
  -2  5.41974 host loki01
  14  0.90329 osd.14   up  1.0  1.0
  15  0.90329 osd.15   up  1.0  1.0
  16  0.90329 osd.16   up  1.0  1.0
  17  0.90329 osd.17   up  1.0  1.0
  18  0.90329 osd.18   up  1.0  1.0
  25  0.90329 osd.25   up  1.0  1.0
  -4  3.61316 host loki03
   0  0.90329 osd.0up  1.0  1.0
   2  0.90329 osd.2up  1.0  1.0
  20  0.90329 osd.20   up  1.0  1.0
  24  0.90329 osd.24   up  1.0  1.0
  -3  9.05714 host loki02
   1  0.90300 osd.1up  0.90002  1.0
  31  2.72198 osd.31   up  1.0  1.0
  29  0.90329 osd.29   up  1.0  1.0
  30  0.90329 osd.30   up  1.0  1.0
  33  0.90329 osd.33   up  1.0  1.0
  32  2.72229 osd.32   up  1.0  1.0
  -5  9.05774 host loki04
   3  0.90329 osd.3up  1.0  1.0
  19  0.90329 osd.19   up  1.0  1.0
  21  0.90329 osd.21   up  1.0  1.0
  22  0.90329 osd.22   up  1.0  1.0
  23  2.72229 osd.23   up  1.0  1.0
  28  2.72229 osd.28   up  1.0  1.0
-10 24.61000 rack sala2.2
  -6 24.61000 host loki05
   5  2.73000 osd.5up  1.0  1.0
   6  2.73000 osd.6up  1.0  1.0
   9  2.73000 osd.9up  1.0  1.0
  10  2.73000 osd.10   up  1.0  1.0
  11  2.73000 osd.11   up  1.0  1.0
  12  2.73000 osd.12   up  1.0  1.0
  13  2.73000 osd.13   up  1.0  1.0
   4  2.73000 osd.4up  1.0  1.0
   8  2.73000 osd.8up  1.0  1.0
   7  0.03999 osd.7up  1.0  1.0
-12 25.46999 rack sala2.1
-11 25.46999 host loki06
  34  2.73000 osd.34   up  1.0  1.0
  35  2.73000 osd.35   up  1.0  1.0
  36  

Re: [ceph-users] SIGHUP to ceph processes every morning

2017-01-26 Thread Henrik Korkuc

Just to add to what Paweł said: the job lives in /etc/logrotate.d/ceph.logrotate
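
The postrotate hook in that file is what sends the SIGHUP; it looks roughly 
like this (a sketch; exact options and paths may differ between versions and 
distributions):

/var/log/ceph/*.log {
    rotate 7
    daily
    compress
    sharedscripts
    postrotate
        killall -q -1 ceph-mon ceph-mds ceph-osd ceph-fuse radosgw || true
    endscript
    missingok
    notifempty
}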

On 17-01-26 09:21, Torsten Casselt wrote:

Hi,

that makes sense. Thanks for the fast answer!

On 26.01.2017 08:04, Paweł Sadowski wrote:

Hi,

6:25 points to the daily cron job; it's probably logrotate trying to force
ceph to reopen its logs.


On 01/26/2017 07:34 AM, Torsten Casselt wrote:

Hi,

I get the following line in journalctl:

Jan 24 06:25:02 ceph01 ceph-osd[28398]: 2017-01-24 06:25:02.302770
7f0655516700 -1 received  signal: Hangup from  PID: 18157 task name:
killall -q -1 ceph-mon ceph-mds ceph-osd ceph-fuse radosgw  UID: 0

It happens every day at the same time which is the cron.daily time. But
there's no cronjob I can relate to ceph in the appropriate folder.
The cluster runs just fine, so it seems it restarts automatically after
the SIGHUP. Still I'm curious why the signal is sent every morning.

I use Kraken on Debian Jessie systems. Three monitors, 36 OSDs on three
nodes.

Thanks!
Torsten



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph per-user stats?

2017-01-03 Thread Henrik Korkuc

On 17-01-04 03:16, Gregory Farnum wrote:

On Fri, Dec 23, 2016 at 12:04 AM, Henrik Korkuc <li...@kirneh.eu> wrote:

Hello,

I wondered if Ceph can emit per-user IO and bandwidth stats (via perf
counters, statsd or in some other way)? I was unable to find such
stats. I know that we can get at least some of these stats from RGW, but I'd
like to have something like that for RBD and CephFS. Example usage could be
figuring out who is hammering CephFS with IO requests.

Maybe someone could provide basic guidance where to dig in if I'd like to
make this feature myself? Are there any plans/blueprints for such stats?

This really isn't feasible right now as a starter project: doing it
"for real" requires all OSDs track all clients and IOs and then a
central location to receive that data and correlate it. We discussed a
sampling implementation for ceph-mgr recently in the Ceph Dev Monthly
and I think work is upcoming to enable it, but I'm not sure what kind
of timeline it's on.
-Greg


I am thinking about implementing it myself. My high-level idea is to have an 
additional thread per OSD to collect these stats and periodically emit them 
to statsd for aggregation by the metrics system. As I am not familiar with 
the Ceph code base, it may take a while to implement... :)


If there are some docs/plans for ceph-mgr metrics, maybe I could contribute 
to that instead of doing it my way? That way we would be more aligned and it 
would be easier to contribute it back to Ceph.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs (fuse and Kernal) HA

2017-01-01 Thread Henrik Korkuc


On 17-01-02 06:24, Lindsay Mathieson wrote:
Hi all, familiar with ceph but out of touch on cephfs specifics, so 
some quick questions:


- cephfs requires an MDS for its metadata (file/dir structures, 
attributes, etc.)?



yes
- It's Active/Passive, i.e. only one MDS can be active at a time, with a 
number of passive backup MDSes?


If we are looking at stable features, then yes. Multiple active MDSes exist, 
but that feature is still experimental.
- The passive MDS's, are they atomically up to date with the active 
MDS? no lag?
Metadata is stored in the Ceph cluster itself, so there is no syncing between 
MDSes. A new MDS needs to go through a few phases before it becomes active.
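
You can watch the active/standby state (and those phases) from the monitors, 
e.g. (output format varies a bit between releases):

ceph mds stat
# e.g. "e95494: 1/1/1 up {0=zeus2=up:active}, 1 up:standby"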


- How robust is the HA? If there is a catastrophic failure on the active 
MDS (e.g. power off), would active I/O on the cephfs be interrupted? Any 
risk of data or metadata loss? (excluding async transactions)


Power off is not a catastrophic failure :) Data traffic does not go 
through the MDS, so active IO doesn't get interrupted, but clients may see 
stalls when trying to do metadata operations until a new MDS becomes 
active. Clients should retry all unconfirmed operations, so unless 
something goes wrong there should be no data/metadata loss.

- Could the MDS db be corrupted by the above?

Once I managed to corrupt an MDS journal (it wasn't due to an MDS server 
failure), but I wasn't able to reproduce it again; it might have been related 
to the environment it was running in.


I guess some of the answers can only be subjective, but I'm looking 
for real world experiences. I have been stress testing other systems 
with similar architectures and been less than impressed by the 
results. Corruption of every file with active I/O is not a good 
outcome :( Use case is hosting VMs.



Did you consider Ceph RBD for VM hosting?


thanks,



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Atomic Operations?

2016-12-24 Thread Henrik Korkuc

On 16-12-23 22:14, Kent Borg wrote:

Hello, a newbie here!

Doing some playing with Python and librados, and it is mostly easy to 
use, but I am confused about atomic operations. The documentation 
isn't clear to me, and Google isn't giving me obvious answers either...


I would like to do some locking. The data structures I am playing with 
in RADOS should work great if I don't accidentally fire up more than 
one instance of myself. So I would like to drop an attribute on a 
central object saying it's mine, all mine--but only if another copy of 
myself hasn't done so already.


Is there some sample code to show me the safe way to do this?

I was using this approach when I needed to ensure that only one instance 
of a script is running. It grabs a lock, does what it needs to do, and 
releases it; you'll need to adjust it to your case:


# Assumes the usual librados setup is done elsewhere, e.g.:
#   import datetime, logging, rados
#   import dateutil.parser
#   logger = logging.getLogger(__name__)
#   date_now = datetime.datetime.utcnow()
#   lock_obj_name = 'my_script_lock'
#   cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
#   cluster.connect()
#   ceph_ioctx = cluster.open_ioctx('my_pool')


def break_lock():
    # Take a second "breaker" lock so only one client breaks a stale lock,
    # then remove the stale lock object and clean up after ourselves.
    try:
        ceph_ioctx.lock_exclusive(key=lock_obj_name + '_breaker',
                                  name=lock_obj_name + '_lock',
                                  cookie='cookie1', duration=60)
        ceph_ioctx.remove_object(key=lock_obj_name)
        ceph_ioctx.unlock(key=lock_obj_name + '_breaker',
                          name=lock_obj_name + '_lock', cookie='cookie1')
        ceph_ioctx.remove_object(key=lock_obj_name + '_breaker')
        logger.info('Broke lock')
    except Exception:
        logger.error('Failed to break lock')
        return False
    return True


def try_locking():
    # Grab an exclusive lock on the lock object and record when we took it.
    try:
        ceph_ioctx.lock_exclusive(key=lock_obj_name,
                                  name=lock_obj_name + '_lock',
                                  cookie='cookie1', duration=72000)
        ceph_ioctx.write_full(lock_obj_name, str(date_now))
        got_lock = True
        logger.info('Locked')
    except Exception:
        got_lock = False
        logger.warning("Didn't lock")
    return got_lock


def aquire_lock():
    # Acquire the lock; if it is already held, break it when it looks stale
    # (older than 20 hours) and retry once.
    if not try_locking():
        logger.warning('Failed to lock, preparing for retry')
        lock_date = ceph_ioctx.read(lock_obj_name)
        lock_age = int((date_now -
                        dateutil.parser.parse(lock_date)).total_seconds() / 60 / 60)
        logger.debug('Lock age: %i' % (lock_age))
        if lock_age > 20 and break_lock():
            logger.warning('Retrying locking')
            return try_locking()
        else:
            logger.error("Failed lock breaking, bailing out")
            return False
    return True


def release_lock():
    logger.info('Releasing lock')
    ceph_ioctx.unlock(key=lock_obj_name, name=lock_obj_name + '_lock',
                      cookie='cookie1')
    ceph_ioctx.remove_object(key=lock_obj_name)



Thanks,

-kb, the Kent who is reluctant to just play around with locking to see 
what works...because it might look like it is working, yet still have 
a race susceptibility.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] What is pauserd and pausewr status?

2016-12-23 Thread Henrik Korkuc

On 16-12-23 12:43, Stéphane Klein wrote:


2016-12-23 11:35 GMT+01:00 Wido den Hollander >:



> Op 23 december 2016 om 10:31 schreef Stéphane Klein
>:
>
>
> 2016-12-22 18:09 GMT+01:00 Wido den Hollander >:
>
> >
> > > Op 22 december 2016 om 17:55 schreef Stéphane Klein <
> > cont...@stephane-klein.info >:
> > >
> > >
> > > I have this status:
> > >
> > > bash-4.2# ceph status
> > > cluster 7ecb6ebd-2e7a-44c3-bf0d-ff8d193e03ac
> > >  health HEALTH_WARN
> > >  pauserd,pausewr,sortbitwise,require_jewel_osds flag(s) set
> > >  monmap e1: 3 mons at {ceph-mon-1=
> > > 172.28.128.2:6789/0,ceph-mon-2=172.28.128.3:6789/0,ceph-

> > mon-3=172.28.128.4:6789/0 
> > > }
> > > election epoch 12, quorum 0,1,2
ceph-mon-1,ceph-mon-2,ceph-
> > mon-3
> > >  osdmap e49: 4 osds: 4 up, 4 in
> > > flags pauserd,pausewr,sortbitwise,require_jewel_osds
> > >   pgmap v263: 64 pgs, 1 pools, 77443 kB data, 35 objects
> > > 281 MB used, 1978 GB / 1979 GB avail
> > >   64 active+clean
> > >
> > > where can I found document about:
> > >
> > > * pauserd ?
> > > * pausewr ?
> > >
> >
> > pauserd: Pause reads
> > pausewr: Pause writes
> >
> > When you set the 'pause' flag it sets both pauserd and pausewr.
> >
> > When these flags are set all I/O (RD and/or RW) is blocked to
clients.
>
> More information here:
>

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-December/015281.html


>
> But, I found nothing about Pause in documentation. Why my Ceph
Cluster have
> switched to pause?
>

Somebody has set the flag. That doesn't happen automatically.



It is a test cluster in Vagrant; there is only one admin: me :)


An admin typed the command to set that flag.


I didn't do that, I don't understand.

Did you run "ceph osd pause" by any chance?
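
For the record, you can see and clear those flags like this:

ceph osd dump | grep flags
ceph osd unpause    # clears both pauserd and pausewr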




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph per-user stats?

2016-12-23 Thread Henrik Korkuc

Hello,

I wondered if Ceph can emit per-user IO and bandwidth stats (via perf 
counters, statsd or in some other way)? I was unable to find 
such stats. I know that we can get at least some of these stats from 
RGW, but I'd like to have something like that for RBD and CephFS. 
Example usage could be figuring out who is hammering CephFS with IO 
requests.


Maybe someone could provide basic guidance where to dig in if I'd like 
to make this feature myself? Are there any plans/blueprints for such stats?


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] If I shutdown 2 osd / 3, Ceph Cluster say 2 osd UP, why?

2016-12-22 Thread Henrik Korkuc

On 16-12-22 13:26, Stéphane Klein wrote:

Hi,

I have:

* 3 mon
* 3 osd

When I shutdown one osd, I work great:

cluster 7ecb6ebd-2e7a-44c3-bf0d-ff8d193e03ac
 health HEALTH_WARN
43 pgs degraded
43 pgs stuck unclean
43 pgs undersized
recovery 24/70 objects degraded (34.286%)
too few PGs per OSD (28 < min 30)
1/3 in osds are down
 monmap e1: 3 mons at 
{ceph-mon-1=172.28.128.2:6789/0,ceph-mon-2=172.28.128.3:6789/0,ceph-mon-3=172.28.128.4:6789/0 
}
election epoch 10, quorum 0,1,2 
ceph-mon-1,ceph-mon-2,ceph-mon-3

 osdmap e22: 3 osds: 2 up, 3 in; 43 remapped pgs
flags sortbitwise,require_jewel_osds
  pgmap v169: 64 pgs, 1 pools, 77443 kB data, 35 objects
252 MB used, 1484 GB / 1484 GB avail
24/70 objects degraded (34.286%)
  43 active+undersized+degraded
  21 active+clean

But when I shut down 2 OSDs, the Ceph cluster doesn't see that the second OSD 
node is down :(


root@ceph-mon-1:/home/vagrant# ceph status
cluster 7ecb6ebd-2e7a-44c3-bf0d-ff8d193e03ac
 health HEALTH_WARN
clock skew detected on mon.ceph-mon-2
pauserd,pausewr,sortbitwise,require_jewel_osds flag(s) set
Monitor clock skew detected
 monmap e1: 3 mons at 
{ceph-mon-1=172.28.128.2:6789/0,ceph-mon-2=172.28.128.3:6789/0,ceph-mon-3=172.28.128.4:6789/0 
}
election epoch 10, quorum 0,1,2 
ceph-mon-1,ceph-mon-2,ceph-mon-3

 osdmap e26: 3 osds: 2 up, 2 in
flags pauserd,pausewr,sortbitwise,require_jewel_osds
  pgmap v203: 64 pgs, 1 pools, 77443 kB data, 35 objects
219 MB used, 989 GB / 989 GB avail
  64 active+clean

2 osd up ! why ?

root@ceph-mon-1:/home/vagrant# ping ceph-osd-1 -c1
--- ceph-osd-1 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms

root@ceph-mon-1:/home/vagrant# ping ceph-osd-2 -c1
--- ceph-osd-2 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms

root@ceph-mon-1:/home/vagrant# ping ceph-osd-3 -c1
--- ceph-osd-3 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.278/0.278/0.278/0.000 ms

My configuration:

ceph_conf_overrides:
   global:
  osd_pool_default_size: 2
  osd_pool_default_min_size: 1

Full Ansible configuration is here: 
https://github.com/harobed/poc-ceph-ansible/blob/master/vagrant-3mons-3osd/hosts/group_vars/all.yml#L11


What is my mistake? Is it Ceph bug?

Try waiting a little longer. The mon needs multiple down reports before it 
marks an OSD down, and as your cluster is very small there is only a small 
number of OSDs (1 in this case) left to report that the others are down.
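
If you want to see or tune that behaviour, you can inspect the relevant mon 
options via the admin socket (a sketch, run on the mon host; option names and 
defaults may vary slightly between releases):

ceph daemon mon.ceph-mon-1 config show | grep -E 'mon_osd_min_down|osd_heartbeat_grace'
ceph osd dump | grep '^osd\.'    # shows which OSDs the cluster currently considers up/down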



Best regards,
Stéphane
--
Stéphane Klein >

blog: http://stephane-klein.info
cv : http://cv.stephane-klein.info
Twitter: http://twitter.com/klein_stephane


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] When I shutdown one osd node, where can I see the block movement?

2016-12-22 Thread Henrik Korkuc

On 16-12-22 13:20, Stéphane Klein wrote:



2016-12-22 12:18 GMT+01:00 Henrik Korkuc <li...@kirneh.eu 
<mailto:li...@kirneh.eu>>:


On 16-12-22 13:12, Stéphane Klein wrote:

HEALTH_WARN 43 pgs degraded; 43 pgs stuck unclean; 43 pgs
undersized; recovery 24/70 objects degraded (34.286%); too few
PGs per OSD (28 < min 30); 1/3 in osds are down;


    it says 1/3 OSDs are down. By default Ceph pools are set up with
    size 3. If your setup is the same, it will not be able to restore to
    a normal status without a size decrease or additional OSDs


I have this config:

ceph_conf_overrides:
   global:
  osd_pool_default_size: 2
  osd_pool_default_min_size: 1

see: 
https://github.com/harobed/poc-ceph-ansible/blob/master/vagrant-3mons-3osd/hosts/group_vars/all.yml#L11


Can you please provide outputs of "ceph -s" "ceph osd tree" and "ceph 
osd dump |grep size"?


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] When I shutdown one osd node, where can I see the block movement?

2016-12-22 Thread Henrik Korkuc

On 16-12-22 13:12, Stéphane Klein wrote:
HEALTH_WARN 43 pgs degraded; 43 pgs stuck unclean; 43 pgs undersized; 
recovery 24/70 objects degraded (34.286%); too few PGs per OSD (28 < 
min 30); 1/3 in osds are down;


It says 1/3 OSDs are down. By default Ceph pools are set up with size 3. 
If your setup is the same, it will not be able to restore to a normal status 
without a size decrease or additional OSDs.
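
To answer the subject line: you can watch the data movement itself with any 
of these (ceph -w streams events live):

ceph -w
ceph -s
ceph osd pool stats    # per-pool recovery and client IO rates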



Here Ceph say there are 24 objects to move?



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Intermittent permission denied using kernel client with mds path cap

2016-11-15 Thread Henrik Korkuc
I filed http://tracker.ceph.com/issues/17858 recently; I am seeing this 
problem on 10.2.3 ceph-fuse, but maybe the kernel client is affected too.


It is easy to replicate, just do deep "mkdir -p", e.g. "mkdir -p 
1/2/3/4/5/6/7/8/9/0/1/2/3/4/5/6/7/8/9"


On 16-11-11 10:46, Dan van der Ster wrote:

Hi Goncalo,

Thank you for those links. It appears that that fix was already in the
10.2.3 mds, which we are running. I've just upgraded the mds's to the
current jewel gitbuilder (10.2.3-358.g427f357.x86_64) and the problem
is still there.

(BTW, in testing this I've been toggling the mds caps between 'allow
rw' and 'allow r, allow rw path=/k8s'. During this I've noticed that
the new caps are only applied after I remount.)

Cheers, Dan


On Fri, Nov 11, 2016 at 1:29 AM, Goncalo Borges
 wrote:

Hi Dan,,,

I know there are path restriction issues in the kernel client. See the 
discussion here.

http://lists.opennebula.org/pipermail/ceph-users-ceph.com/2016-June/010656.html

http://tracker.ceph.com/issues/16358

Cheers
Goncalo


From: ceph-users [ceph-users-boun...@lists.ceph.com] on behalf of Dan van der 
Ster [d...@vanderster.com]
Sent: 11 November 2016 02:41
To: ceph-users; Yan, Zheng
Subject: [ceph-users] Intermittent permission denied using kernel client
with mds path cap

Hi all, Hi Zheng,

We're seeing a strange issue with the kernel cephfs clients, combined
with a path restricted mds cap. It seems that files/dirs are
intermittently not created due to permission denied.

For example, when I untar a kernel into cephfs, we see ~1/1000 files
failed to open/mkdir.
Client caps are in the PS [1].
We've tried kernels 3.10.0-493.el7, 4.8.6, and 4.9-rc4 -- all have the
same intermittent behaviour. We could *not* reproduce the issue with
ceph-fuse 10.2.3.

The cluster is running 10.2.3.

Now, if we remove the path restricted cap -- i.e. use mds 'allow rw'
-- then we have no more errors.

So it seems there is a race in the path restriction cap code.
We grabbed an mds log, and noticed that it seems that a file is opened
twice, then the second open fails with 'already xlocked'. A full log
for one such file is here: http://pastebin.com/raw/YyULfjND

Is anyone successfully using path caps with kernel clients? Maybe this
is a new bug?

Cheers, Dan


[1]
[client.k8s]
key: xxx==
caps: [mds] allow r, allow rw path=/k8s
caps: [mon] allow r
caps: [osd] allow rw pool=cephfs_data_k8s
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] forward cache mode support?

2016-11-07 Thread Henrik Korkuc

Hey,

trying to activate forward mode for cache pool results in "Error EPERM: 
'forward' is not a well-supported cache mode and may corrupt your data.  
pass --yes-i-really-mean-it to force."


The change that introduced this message landed a few months ago and I didn't 
manage to find the reason for it. Were there cases of data corruption, or is 
it just that some forwarded ops may fail (e.g. a rewrite on EC)? Also, there 
are examples in the docs of using forward mode before deactivating a pool, 
etc. So how bad is it?


I want to use it to forward CephFS writes to an EC pool (sequential writes, 
no rewrites).
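
For anyone hitting the same message, the override is simply the following 
(use with care, given the warning above):

ceph osd tier cache-mode <cachepool> forward --yes-i-really-mean-it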


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] After kernel upgrade OSD's on different disk.

2016-10-31 Thread Henrik Korkuc
How are your OSDs set up? It is possible that the udev rules didn't activate 
your OSDs if they didn't match the rules. Refer to 
/lib/udev/rules.d/95-ceph-osd.rules. Basically your partition types must 
be of the correct type (GUID) for it to work.
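
You can check the partition type GUID that those udev rules match on, 
roughly like this (device names are examples; the GUID shown is the ceph-disk 
OSD data type):

sgdisk --info=1 /dev/sdc | grep 'Partition GUID code'
# ceph-disk data partitions use type 4fbd7e29-9d25-41b8-afd0-062c0ceff05d
ceph-disk activate-all    # re-trigger activation of prepared but inactive OSDs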


On 16-10-31 19:10, jan hugo prins wrote:

After the kernel upgrade, I also upgraded the cluster to 10.2.3 from
10.2.2.
Let's hope I only hit a bug and that this bug is now fixed, on the other
hand, I think I also saw the issue with a 10.2.3 node, but I'm not sure.

Jan Hugo


On 10/31/2016 11:41 PM, Henrik Korkuc wrote:

this is normal. You should expect that your disks may get reordered
after reboot. I am not sure about your setup details, but in 10.2.3
udev should be able to activate your OSDs no matter the naming (there
were some bugs in previous 10.2.x releases)

On 16-10-31 18:32, jan hugo prins wrote:

Hello,

After patching my OSD servers with the latest Centos kernel and
rebooting the nodes, all OSD drives moved to different positions.

Before the reboot:

Systemdisk: /dev/sda
Journaldisk: /dev/sdb
OSD disk 1: /dev/sdc
OSD disk 2: /dev/sdd
OSD disk 3: /dev/sde

After the reboot:

Systemdisk: /dev/sde
journaldisk: /dev/sdb
OSD disk 1: /dev/sda
OSD disk 2: /dev/sdc
OSD disk 3: /dev/sdd

The result was that the OSD didn't start at boot-up and I had to
manually activate them again.
After rebooting OSD node 1 I checked the state of the Ceph cluster
before rebooting node number 2. I found that the disks were not online
and I needed to fix this. In the end I was able to do all the upgrades
etc, but this was a big surprise to me.

My idea to fix this is to use the Disk UUID instead of the dev name
(/dev/disk/by-uuid/ instead of /dev/sda) when activating the disk.
But I really don't know if this is possible.

Could anyone tell me if I can prevent this issue in the future?


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] After kernel upgrade OSD's on different disk.

2016-10-31 Thread Henrik Korkuc
this is normal. You should expect that your disks may get reordered 
after reboot. I am not sure about your setup details, but in 10.2.3 udev 
should be able to activate your OSDs no matter the naming (there were 
some bugs in previous 10.2.x releases)


On 16-10-31 18:32, jan hugo prins wrote:

Hello,

After patching my OSD servers with the latest Centos kernel and
rebooting the nodes, all OSD drives moved to different positions.

Before the reboot:

Systemdisk: /dev/sda
Journaldisk: /dev/sdb
OSD disk 1: /dev/sdc
OSD disk 2: /dev/sdd
OSD disk 3: /dev/sde

After the reboot:

Systemdisk: /dev/sde
journaldisk: /dev/sdb
OSD disk 1: /dev/sda
OSD disk 2: /dev/sdc
OSD disk 3: /dev/sdd

The result was that the OSD didn't start at boot-up and I had to
manually activate them again.
After rebooting OSD node 1 I checked the state of the Ceph cluster
before rebooting node number 2. I found that the disks were not online
and I needed to fix this. In the end I was able to do all the upgrades
etc, but this was a big surprise to me.

My idea to fix this is to use the Disk UUID instead of the dev name
(/dev/disk/by-uuid/ instead of /dev/sda) when activating the disk.
But I really don't know if this is possible.

Could anyone tell me if I can prevent this issue in the future?



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stuck at "Setting up ceph-osd (10.2.3-1~bpo80+1)"

2016-10-14 Thread Henrik Korkuc

On 16-10-13 22:46, Chris Murray wrote:

On 13/10/2016 11:49, Henrik Korkuc wrote:
Is apt/dpkg doing something now? Is the problem repeatable, e.g. by 
killing the upgrade and starting it again? Are there any stuck systemctl 
processes?


I had no problems upgrading 10.2.x clusters to 10.2.3

On 16-10-13 13:41, Chris Murray wrote:

On 22/09/2016 15:29, Chris Murray wrote:

Hi all,

Might anyone be able to help me troubleshoot an "apt-get dist-upgrade"
which is stuck at "Setting up ceph-osd (10.2.3-1~bpo80+1)"?

I'm upgrading from 10.2.2. The two OSDs on this node are up, and think
they are version 10.2.3, but the upgrade doesn't appear to be 
finishing

... ?

Thank you in advance,
Chris


Hi,

Are there possibly any pointers to help troubleshoot this? I've got 
a test system on which the same thing has happened.


The cluster's status is "HEALTH_OK" before starting. I'm running 
Debian Jessie.


dpkg.log only has the following:

2016-10-13 11:37:25 configure ceph-osd:amd64 10.2.3-1~bpo80+1 
2016-10-13 11:37:25 status half-configured ceph-osd:amd64 
10.2.3-1~bpo80+1


At this point, the upgrade gets stuck and doesn't go any further. 
Where could I look for the next clue?


Thanks,

Chris


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Thank you Henrik, I see it's a systemctl process that's stuck.

It is reproducible for me on every run of  dpkg --configure -a

And, indeed, reproducible across two separate machines.

I'll pursue the stuck "/bin/systemctl start ceph-osd.target".



You can try checking whether "systemctl daemon-reexec" helps to solve this 
problem. I couldn't find a link quickly, but it seems that the Jessie systemd 
sometimes manages to get stuck on systemctl calls.
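
Roughly, the recovery sequence would be (a sketch):

ps aux | grep -E '[s]ystemctl|ceph-osd'   # look for the hung "systemctl start ceph-osd.target"
systemctl daemon-reexec                   # re-execute the systemd manager process
dpkg --configure -a                       # then let the postinst finish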



Thanks again,
Chris

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Stuck at "Setting up ceph-osd (10.2.3-1~bpo80+1)"

2016-10-13 Thread Henrik Korkuc
Is apt/dpkg doing something now? Is the problem repeatable, e.g. by killing 
the upgrade and starting it again? Are there any stuck systemctl processes?


I had no problems upgrading 10.2.x clusters to 10.2.3

On 16-10-13 13:41, Chris Murray wrote:

On 22/09/2016 15:29, Chris Murray wrote:

Hi all,

Might anyone be able to help me troubleshoot an "apt-get dist-upgrade"
which is stuck at "Setting up ceph-osd (10.2.3-1~bpo80+1)"?

I'm upgrading from 10.2.2. The two OSDs on this node are up, and think
they are version 10.2.3, but the upgrade doesn't appear to be finishing
... ?

Thank you in advance,
Chris


Hi,

Are there possibly any pointers to help troubleshoot this? I've got a 
test system on which the same thing has happened.


The cluster's status is "HEALTH_OK" before starting. I'm running 
Debian Jessie.


dpkg.log only has the following:

2016-10-13 11:37:25 configure ceph-osd:amd64 10.2.3-1~bpo80+1 
2016-10-13 11:37:25 status half-configured ceph-osd:amd64 
10.2.3-1~bpo80+1


At this point, the upgrade gets stuck and doesn't go any further. Where 
could I look for the next clue?


Thanks,

Chris


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph website problems?

2016-10-13 Thread Henrik Korkuc
From the status page it seems that Ceph didn't like the networking problems. 
May we find out some details about what happened? Underprovisioned servers (RAM 
upgrades were in there too)? Too much load on disks? Something else?


This situation may not be pleasant, but I feel that others can learn from 
it to prevent such situations in the future.


On 16-10-13 06:55, Dan Mick wrote:

Everything should have been back some time ago ( UTC or thereabouts)

On 10/11/2016 10:41 PM, Brian :: wrote:

Looks like they are having major challenges getting that ceph cluster
running again.. Still down.

On Tuesday, October 11, 2016, Ken Dreyer > wrote:

I think this may be related:


http://www.dreamhoststatus.com/2016/10/11/dreamcompute-us-east-1-cluster-service-disruption/

On Tue, Oct 11, 2016 at 5:57 AM, Sean Redmond > wrote:

Hi,

Looks like the ceph website and related sub domains are giving errors for
the last few hours.

I noticed the below that I use are in scope.

http://ceph.com/
http://docs.ceph.com/
http://download.ceph.com/
http://tracker.ceph.com/

Thanks

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Feedback on docs after MDS damage/journal corruption

2016-10-11 Thread Henrik Korkuc

On 16-10-11 14:30, John Spray wrote:

On Tue, Oct 11, 2016 at 12:00 PM, Henrik Korkuc <li...@kirneh.eu> wrote:

Hey,

After a bright idea to pause a 10.2.2 Ceph cluster for a minute to see if it
would speed up backfill, I managed to corrupt my MDS journal (should that happen
after a cluster pause/unpause, or is it some sort of bug?). I had "Overall
journal integrity: DAMAGED", etc.

Uh, pause/unpausing your RADOS cluster should never do anything apart
from pausing IO.  That's DEFINITELY a severe bug if it corrupted
objects!
I am digging into logs now, I'll try to collect what I can and create a 
bug report.

I was following http://docs.ceph.com/docs/jewel/cephfs/disaster-recovery/
and have some questions/feedback:

Caveat: This is a difficult area to document, because the repair tools
interfere with internal on-disk structures.  If I can use a bad
metaphor: it's like being in an auto garage, and asking for
documentation about the tools -- the manual for the wrench doesn't
tell you anything about how to fix the car engine.  Similarly it's
hard to write useful documentation about the repair tools without also
writing a detailed manual for how all the cephfs internals work.

Some notes/links would still be useful for newcomers. It's like someone 
standing at the side of the road with a broken car and a wrench. I could 
try fixing it with what I had or just nuke it and get myself a new car 
:) (the data was kind of expendable there)

* It would be great to have some info when ‘snap’ or ‘inode’ should be reset

You would reset these tables if you knew that for some reason they no
longer matched the reality elsewhere in the metadata.


* It is not clear when MDS start should be attempted

You would start the MDS when you believed that you had done all you
could with offline repair.  Everything on the "disaster recovery" page
is about offline tools.


* Can scan_extents/scan_inodes be run after MDS is running?

These are meant only for offline use.  You could in principle run
scan_extents while an MDS was running as long as you had no data
writes going on.  scan_inodes writes directly into the metadata pool
so is certainly not safe to run at the same time as an active MDS.


* "online MDS scrub" is mentioned in docs. Is it scan_extents/scan_inodes or
some other command?

That refers to the "forward scrub" functionality inside the MDS,
that's invoked with "scrub_path" or "tag path" commands.


Now CephFS seems to be working (I have "mds0: Metadata damage detected" but
scan_extents is currently running), let's see what happens when I finish
scan_extents/scan_inodes.

Will these actions solve possible orphaned objects in pools? What else
should I look into?

A full offline scan_extents/scan_inodes run should re-link orphans
into a top-level lost+found directory (from which you can subsequently
delete them when your MDS is back online).

John


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Feedback on docs after MDS damage/journal corruption

2016-10-11 Thread Henrik Korkuc

Hey,

After a bright idea to pause a 10.2.2 Ceph cluster for a minute to see if 
it would speed up backfill, I managed to corrupt my MDS journal (should that 
happen after a cluster pause/unpause, or is it some sort of bug?). I had 
"Overall journal integrity: DAMAGED", etc.


I was following 
http://docs.ceph.com/docs/jewel/cephfs/disaster-recovery/ and have some 
questions/feedback:


* It would be great to have some info when ‘snap’ or ‘inode’ should be reset
* It is not clear when MDS start should be attempted
* Can scan_extents/scan_inodes be run after MDS is running?
* "online MDS scrub" is mentioned in docs. Is it 
scan_extents/scan_inodes or some other command?


Now CephFS seems to be working (I have "mds0: Metadata damage detected" 
but scan_extents is currently running), let's see what happens when I 
finish scan_extents/scan_inodes.


Will these actions solve possible orphaned objects in pools? What else 
should I look into?
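
For reference, the offline tools discussed above are invoked roughly like 
this (a sketch; the data pool name is an assumption, and they should only be 
run with the MDS stopped):

cephfs-journal-tool journal inspect       # reports "Overall journal integrity: ..."
cephfs-data-scan scan_extents cephfs_data
cephfs-data-scan scan_inodes cephfs_data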


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] 10.2.3 release announcement?

2016-09-26 Thread Henrik Korkuc

Hey,

10.2.3 has been tagged in the jewel branch for more than 5 days already, but 
there has been no announcement for it yet. Is there any reason for that? 
The packages seem to be present too.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS: Writes are faster than reads?

2016-09-14 Thread Henrik Korkuc

On 16-09-14 18:21, Andreas Gerstmayr wrote:

Hello,

I'm currently performing some benchmark tests with our Ceph storage
cluster and trying to find the bottleneck in our system.

I'm writing a random 30GB file with the following command:
$ time fio --name=job1 --rw=write --blocksize=1MB --size=30GB
--randrepeat=0 --end_fsync=1
[...]
  WRITE: io=30720MB, aggrb=893368KB/s, minb=893368KB/s,
maxb=893368KB/s, mint=35212msec, maxt=35212msec

real0m35.539s

This makes use of the page cache, but fsync()s at the end (network
traffic from the client stops here, so the OSDs should have the data).

When I read the same file back:
$ time fio --name=job1 --rw=read --blocksize=1MB --size=30G
[...]
READ: io=30720MB, aggrb=693854KB/s, minb=693854KB/s,
maxb=693854KB/s, mint=45337msec, maxt=45337msec

real0m45.627s

It takes 10s longer. Why? When writing data to a Ceph storage cluster,
the data is written twice (unbuffered to the journal and buffered to
the backing filesystem [1]). On the other hand, reading should be much
faster because it needs only a single operation, the data should be
already in the page cache of the OSDs (I'm reading the same file I've
written before, and the OSDs have plenty of RAM) and reading from
disks is generally faster than writing. Any idea what is going on in
the background, which makes reads more expensive than writes?
I am not an expert here, but I think it basically boils down to the fact 
that you read linearly while writes (the cache flush) go out in parallel.


If you could read multiple parts of the same file in parallel, you could 
achieve better speeds.
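
One way to test that would be something like this (a sketch; needs the 
libaio ioengine, and direct=1 bypasses the client page cache):

fio --name=job1 --rw=read --blocksize=1MB --size=30G \
    --ioengine=libaio --direct=1 --iodepth=16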




I've run these tests multiple times with fairly consistent results.

Cluster Config:
Ceph jewel, 3 nodes with 256GB RAM and 25 disks each (only HDDs,
journal on same disk)
Pool with size=1 and 2048 PGs, CephFS stripe unit: 1MB, stripe count:
10, object size: 10MB
10 GbE, separate frontend+backend network

[1] https://www.sebastien-han.fr/blog/2014/02/17/ceph-io-patterns-the-bad/


Thanks,
Andreas
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW index-sharding on Jewel

2016-09-14 Thread Henrik Korkuc
As far as I noticed, after doing zone/region changes you need to run 
"radosgw-admin period update --commit" for them to take effect.

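So the adjusted sequence would look roughly like this (mirroring the steps 
quoted below, with the commit added at the end):

radosgw-admin zonegroup get --rgw-zonegroup=default > zonegroup.json
# edit "bucket_index_max_shards" in zonegroup.json
radosgw-admin zonegroup set --rgw-zonegroup=default < zonegroup.json
radosgw-admin period update --commit
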

On 16-09-14 11:22, Ansgar Jazdzewski wrote:

Hi,

i curently setup my new testcluster (Jewel) and found out the index
sharding configuration had changed?

i did so far:
1. radosgw-admin realm create --rgw-realm=default --default
2. radosgw-admin zonegroup get --rgw-zonegroup=default > zonegroup.json
3. chaned value "bucket_index_max_shards": 64
4. radosgw-admin zonegroup set --rgw-zonegroup=default < zonegroup.json
5. radosgw-admin region get --rgw-zonegroup=default > region.json
6. chaned value "bucket_index_max_shards": 64
7. radosgw-admin region set --rgw-region=default --rgw-zone=default
--rgw-zonegroup=default < region.json

but bukets are created with ot sharding:
  rados -p default.rgw.buckets.index ls | grep $(radosgw-admin metadata
get bucket:images-eu-v1 | jq .data.bucket.bucket_id| tr -d '"')

thanks for your help,
Ansgar
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] problem starting osd ; PGLog.cc: 984: FAILED assert hammer 0.94.9

2016-09-13 Thread Henrik Korkuc

On 16-09-13 11:13, Ronny Aasen wrote:
I suspect this must be a difficult question since there have been no 
replies on IRC or the mailing list.


Assuming it's impossible to get these OSDs running again:

Is there a way to recover objects from the disks? They are mounted 
and the data is readable. I have PGs down since they want to probe these 
OSDs that do not want to start.


pg query claims it can continue if I mark the OSD as lost, but I would 
prefer not to lose data, especially since the data is OK and readable 
on the non-functioning OSD.


Also let me know if there is other debug info I can extract in order to 
troubleshoot the non-starting OSDs.


kind regards
Ronny Aasen


I cannot help you with this, but you can try using 
http://ceph.com/community/incomplete-pgs-oh-my/ and 
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-April/000238.html 
(found this mail thread while googling for the ceph-objectstore-tool post). YMMV.
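
The rough idea from those posts, in case the links rot, is to export the 
affected PGs from the dead OSD's data directory and import them into a 
stopped, healthy OSD with ceph-objectstore-tool. A sketch, with the pg id and 
paths taken from the log below as examples (ceph-X is a placeholder):

# on the broken OSD host, with the daemon stopped
ceph-objectstore-tool --op export --pgid 1.fdd \
    --data-path /var/lib/ceph/osd/ceph-8 \
    --journal-path /var/lib/ceph/osd/ceph-8/journal \
    --file /tmp/1.fdd.export

# on a healthy OSD, also stopped; start it again afterwards
ceph-objectstore-tool --op import \
    --data-path /var/lib/ceph/osd/ceph-X \
    --journal-path /var/lib/ceph/osd/ceph-X/journal \
    --file /tmp/1.fdd.export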






On 12. sep. 2016 13:16, Ronny Aasen wrote:

after adding more osd's and having a big backfill running 2 of my osd's
keep on stopping.

We also recently upgraded from 0.94.7 to 0.94.9 but i do not know if
that is related.

the log say.

 0> 2016-09-12 10:31:08.288858 7f8749125880 -1 osd/PGLog.cc: In
function 'static void PGLog::read_log(ObjectStore*, coll_t, coll_t,
ghobject_t, const pg_info_t&, std::map&,
PGLog::IndexedLog&, pg_missing_t&, std::ostringstream&,
std::set*)' thread 7f8749125880 time
2016-09-12 10:31:08.286337
osd/PGLog.cc: 984: FAILED assert(oi.version == i->first)

googeling led me to a bug that seems to be related to infernalis only.
dmesg does not show anything wrong with the hardware.

this is debian running hammer 0.94.9
and the osd is a software raid5 consisting of 5 3TB harddrives.
journal is a partition on ssd intel 3500

anyone have a clue to what can be wrong ?

kind regrads
Ronny Aasen





-- log debug_filestore=10 --
   -19> 2016-09-12 10:31:08.070947 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/1df4bfdd/rb.0.392c.238e1f29.002bd134/head '_' = 266
-18> 2016-09-12 10:31:08.083111 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/deb5bfdd/rb.0.392c.238e1f29.002bc596/head '_' = 266
-17> 2016-09-12 10:31:08.096718 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/9be5dfdd/rb.0.392c.238e1f29.002bc2bf/head '_' = 266
-16> 2016-09-12 10:31:08.110048 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/cbf8ffdd/rb.0.392c.238e1f29.002b9d89/head '_' = 266
-15> 2016-09-12 10:31:08.126263 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/e49d0fdd/rb.0.392c.238e1f29.002b078e/head '_' = 266
-14> 2016-09-12 10:31:08.150199 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/e49d0fdd/rb.0.392c.238e1f29.002b078e/22 '_' = 259
-13> 2016-09-12 10:31:08.173223 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/d0827fdd/rb.0.392c.238e1f29.002b0373/head '_' = 266
-12> 2016-09-12 10:31:08.199192 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/d0827fdd/rb.0.392c.238e1f29.002b0373/22 '_' = 259
-11> 2016-09-12 10:31:08.232712 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/bf4effdd/rb.0.392c.238e1f29.002ae882/head '_' = 266
-10> 2016-09-12 10:31:08.265331 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/bf4effdd/rb.0.392c.238e1f29.002ae882/22 '_' = 259
 -9> 2016-09-12 10:31:08.265456 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) error opening file
/var/lib/ceph/osd/ceph-8/current/1.fdd_head/DIR_D/DIR_D/DIR_F/DIR_0/DIR_2/rb.0.392c.238e1f29.00b381ae__head_DB220FDD__1 


with flags=2: (2) No such file or directory
 -8> 2016-09-12 10:31:08.265475 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/db220fdd/rb.0.392c.238e1f29.00b381ae/head '_' = -2
 -7> 2016-09-12 10:31:08.265535 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) error opening file
/var/lib/ceph/osd/ceph-8/current/1.fdd_head/DIR_D/DIR_D/DIR_F/DIR_0/DIR_2/rb.0.392c.238e1f29.00b381ae__21_DB220FDD__1 


with flags=2: (2) No such file or directory
 -6> 2016-09-12 10:31:08.265546 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/db220fdd/rb.0.392c.238e1f29.00b381ae/21 '_' = -2
 -5> 2016-09-12 10:31:08.265609 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) error opening file
/var/lib/ceph/osd/ceph-8/current/1.fdd_head/DIR_D/DIR_D/DIR_F/DIR_0/DIR_2/rb.0.392c.238e1f29.00cf4057__head_12020FDD__1 


with flags=2: (2) No such file or directory
 -4> 2016-09-12 10:31:08.265628 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) getattr
1.fdd_head/1/12020fdd/rb.0.392c.238e1f29.00cf4057/head '_' = -2
 -3> 2016-09-12 10:31:08.265688 7f8749125880 10
filestore(/var/lib/ceph/osd/ceph-8) error 

Re: [ceph-users] radosgw flush_read_list(): d->client_c->handle_data() returned -5

2016-09-05 Thread Henrik Korkuc

On 16-09-05 14:36, Henrik Korkuc wrote:


On 16-02-27 06:09, Yehuda Sadeh-Weinraub wrote:

On Wed, Feb 24, 2016 at 5:48 PM, Ben Hines <bhi...@gmail.com> wrote:
Any idea what is going on here? I get these intermittently, 
especially with

very large file.

The client is doing RANGE requests on this >51 GB file, incrementally
fetching later chunks.

2016-02-24 16:30:59.669561 7fd33b7fe700  1 == starting new request
req=0x7fd32c0879c0 =
2016-02-24 16:30:59.669675 7fd33b7fe700  2 req 3648804:0.000114::GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg::initializing 
for

trans_id = tx00037ad24-0056ce4b43-259914b-default
2016-02-24 16:30:59.669687 7fd33b7fe700 10 host=
2016-02-24 16:30:59.669757 7fd33b7fe700 10
s->object=/int8-0.181.4-1654016.2016-02-23_03-53-42.pkg
s->bucket=
2016-02-24 16:30:59.669767 7fd33b7fe700  2 req 3648804:0.000206:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg::getting op
2016-02-24 16:30:59.669776 7fd33b7fe700  2 req 3648804:0.000215:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:authorizing 


2016-02-24 16:30:59.669785 7fd33b7fe700  2 req 3648804:0.000224:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:reading 


permissions
2016-02-24 16:30:59.673797 7fd33b7fe700 10 manifest: total_size =
50346000384
2016-02-24 16:30:59.673841 7fd33b7fe700  2 req 3648804:0.004280:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:init 
op

2016-02-24 16:30:59.673867 7fd33b7fe700 10 cache get:
name=.users.uid+ : hit
2016-02-24 16:30:59.673881 7fd33b7fe700 10 cache get:
name=.users.uid+ : hit
2016-02-24 16:30:59.673921 7fd33b7fe700  2 req 3648804:0.004360:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:verifying 


op mask
2016-02-24 16:30:59.673929 7fd33b7fe700  2 req 3648804:0.004369:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:verifying 


op permissions
2016-02-24 16:30:59.673941 7fd33b7fe700  5 Searching permissions for
uid=anonymous mask=49
2016-02-24 16:30:59.673944 7fd33b7fe700  5 Permissions for user not 
found
2016-02-24 16:30:59.673946 7fd33b7fe700  5 Searching permissions for 
group=1

mask=49
2016-02-24 16:30:59.673949 7fd33b7fe700  5 Found permission: 1
2016-02-24 16:30:59.673951 7fd33b7fe700  5 Searching permissions for 
group=2

mask=49
2016-02-24 16:30:59.673953 7fd33b7fe700  5 Permissions for group not 
found
2016-02-24 16:30:59.673955 7fd33b7fe700  5 Getting permissions 
id=anonymous

owner= perm=1
2016-02-24 16:30:59.673957 7fd33b7fe700 10  uid=anonymous requested 
perm

(type)=1, policy perm=1, user_perm_mask=15, acl perm=1
2016-02-24 16:30:59.673961 7fd33b7fe700  2 req 3648804:0.004400:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:verifying 


op params
2016-02-24 16:30:59.673965 7fd33b7fe700  2 req 3648804:0.004404:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:executing 

2016-02-24 16:30:59.674107 7fd33b7fe700  0 
RGWObjManifest::operator++():

result: ofs=130023424 stripe_ofs=130023424 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:30:59.674193 7fd33b7fe700  0 
RGWObjManifest::operator++():

result: ofs=134217728 stripe_ofs=134217728 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:30:59.674317 7fd33b7fe700  0 
RGWObjManifest::operator++():

result: ofs=138412032 stripe_ofs=138412032 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:30:59.674433 7fd33b7fe700  0 
RGWObjManifest::operator++():

result: ofs=142606336 stripe_ofs=142606336 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:31:00.046110 7fd33b7fe700  0 
RGWObjManifest::operator++():

result: ofs=146800640 stripe_ofs=146800640 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:31:00.150966 7fd33b7fe700  0 
RGWObjManifest::operator++():

result: ofs=150994944 stripe_ofs=150994944 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:31:00.151118 7fd33b7fe700  0 
RGWObjManifest::operator++():

result: ofs=155189248 stripe_ofs=155189248 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:31:00.161000 7fd33b7fe700  0 
RGWObjManifest::operator++():

result: ofs=157286400 stripe_ofs=157286400 part_ofs=157286400
rule->part_size=52428800
2016-02-24 16:31:00.199553 7fd33b7fe700  0 
RGWObjManifest::operator++():

result: ofs=161480704 stripe_ofs=161480704 part_ofs=157286400
rule->part_size=52428800
2016-02-24 16:31:00.278308 7fd33b7fe700  0 
RGWObjManifest::operator++():

result: ofs=165675008 stripe_ofs=165675008 part_ofs=157286400
rule->part_size=52428800
2016-02-24 16:31:00.312306 7fd33b7fe700  0 
RGWObjManifest::operator++():

result: ofs=169869312 stripe_ofs=169869312 part_ofs=157286400
rule->part_size=52428800
2016-02-24 16:31:00.751626 7fd33b7fe700  0 
RGWObjManifest::operator++():

result: ofs=174063616 stripe_ofs=174063616 part_ofs=157286400
rule->part_size=52428800
2016-02-24 16:31:00.833570 7fd33b7fe700  0 
RGWObjManifest::operator++():

result: ofs=178257920 stripe_ofs=178257920 par

Re: [ceph-users] radosgw flush_read_list(): d->client_c->handle_data() returned -5

2016-09-05 Thread Henrik Korkuc


On 16-02-27 06:09, Yehuda Sadeh-Weinraub wrote:

On Wed, Feb 24, 2016 at 5:48 PM, Ben Hines  wrote:

Any idea what is going on here? I get these intermittently, especially with
very large file.

The client is doing RANGE requests on this >51 GB file, incrementally
fetching later chunks.

2016-02-24 16:30:59.669561 7fd33b7fe700  1 == starting new request
req=0x7fd32c0879c0 =
2016-02-24 16:30:59.669675 7fd33b7fe700  2 req 3648804:0.000114::GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg::initializing for
trans_id = tx00037ad24-0056ce4b43-259914b-default
2016-02-24 16:30:59.669687 7fd33b7fe700 10 host=
2016-02-24 16:30:59.669757 7fd33b7fe700 10
s->object=/int8-0.181.4-1654016.2016-02-23_03-53-42.pkg
s->bucket=
2016-02-24 16:30:59.669767 7fd33b7fe700  2 req 3648804:0.000206:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg::getting op
2016-02-24 16:30:59.669776 7fd33b7fe700  2 req 3648804:0.000215:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:authorizing
2016-02-24 16:30:59.669785 7fd33b7fe700  2 req 3648804:0.000224:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:reading
permissions
2016-02-24 16:30:59.673797 7fd33b7fe700 10 manifest: total_size =
50346000384
2016-02-24 16:30:59.673841 7fd33b7fe700  2 req 3648804:0.004280:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:init op
2016-02-24 16:30:59.673867 7fd33b7fe700 10 cache get:
name=.users.uid+ : hit
2016-02-24 16:30:59.673881 7fd33b7fe700 10 cache get:
name=.users.uid+ : hit
2016-02-24 16:30:59.673921 7fd33b7fe700  2 req 3648804:0.004360:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:verifying
op mask
2016-02-24 16:30:59.673929 7fd33b7fe700  2 req 3648804:0.004369:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:verifying
op permissions
2016-02-24 16:30:59.673941 7fd33b7fe700  5 Searching permissions for
uid=anonymous mask=49
2016-02-24 16:30:59.673944 7fd33b7fe700  5 Permissions for user not found
2016-02-24 16:30:59.673946 7fd33b7fe700  5 Searching permissions for group=1
mask=49
2016-02-24 16:30:59.673949 7fd33b7fe700  5 Found permission: 1
2016-02-24 16:30:59.673951 7fd33b7fe700  5 Searching permissions for group=2
mask=49
2016-02-24 16:30:59.673953 7fd33b7fe700  5 Permissions for group not found
2016-02-24 16:30:59.673955 7fd33b7fe700  5 Getting permissions id=anonymous
owner= perm=1
2016-02-24 16:30:59.673957 7fd33b7fe700 10  uid=anonymous requested perm
(type)=1, policy perm=1, user_perm_mask=15, acl perm=1
2016-02-24 16:30:59.673961 7fd33b7fe700  2 req 3648804:0.004400:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:verifying
op params
2016-02-24 16:30:59.673965 7fd33b7fe700  2 req 3648804:0.004404:s3:GET
//int8-0.181.4-1654016.2016-02-23_03-53-42.pkg:get_obj:executing
2016-02-24 16:30:59.674107 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=130023424 stripe_ofs=130023424 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:30:59.674193 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=134217728 stripe_ofs=134217728 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:30:59.674317 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=138412032 stripe_ofs=138412032 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:30:59.674433 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=142606336 stripe_ofs=142606336 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:31:00.046110 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=146800640 stripe_ofs=146800640 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:31:00.150966 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=150994944 stripe_ofs=150994944 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:31:00.151118 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=155189248 stripe_ofs=155189248 part_ofs=104857600
rule->part_size=52428800
2016-02-24 16:31:00.161000 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=157286400 stripe_ofs=157286400 part_ofs=157286400
rule->part_size=52428800
2016-02-24 16:31:00.199553 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=161480704 stripe_ofs=161480704 part_ofs=157286400
rule->part_size=52428800
2016-02-24 16:31:00.278308 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=165675008 stripe_ofs=165675008 part_ofs=157286400
rule->part_size=52428800
2016-02-24 16:31:00.312306 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=169869312 stripe_ofs=169869312 part_ofs=157286400
rule->part_size=52428800
2016-02-24 16:31:00.751626 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=174063616 stripe_ofs=174063616 part_ofs=157286400
rule->part_size=52428800
2016-02-24 16:31:00.833570 7fd33b7fe700  0 RGWObjManifest::operator++():
result: ofs=178257920 stripe_ofs=178257920 part_ofs=157286400
rule->part_size=52428800
2016-02-24 16:31:00.871774 7fd33b7fe700  0 ERROR: flush_read_list():
d->client_c->handle_data() returned -5



Re: [ceph-users] the reweight value of OSD is always 1

2016-08-31 Thread Henrik Korkuc

Hey,
It is normal for the reweight value to be 1. You (with "ceph osd reweight 
OSDNUM newweight") or "ceph osd reweight-by-utilization" can decrease it 
to move some PGs off that OSD.


The value that usually differs and depends on disk size is the CRUSH "weight".
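
To illustrate the difference (a sketch; osd.3 and the values are just 
examples):

ceph osd crush reweight osd.3 0.81799   # CRUSH weight, normally ~ disk size in TB
ceph osd reweight 3 0.9                 # temporary override, range 0.0-1.0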

On 16-08-31 22:06, 한승진 wrote:

Hi Cephers!

The reweight value of an OSD is always 1 when we create and activate an 
OSD daemon.


I utilize the ceph-deploy tool whenever I deploy a ceph cluster.

Is there a default reweight value in the ceph-deploy tool?

Can we adjust the reweight value when we activate an OSD daemon?

ID WEIGHT   TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 39.26390 root default
-2  9.81599 host ngcephnode01
 1  0.81799 osd.1   up 1.0  1.0
 3  0.81799 osd.3   up 1.0  1.0
 4  0.81799 osd.4   up 1.0  1.0
 5  0.81799 osd.5   up 1.0  1.0
 6  0.81799 osd.6   up 1.0  1.0



Thanks all!

John Haan



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW 10.2.2 SignatureDoesNotMatch with special characters in object name

2016-08-25 Thread Henrik Korkuc
Looks like my problem is a little different. I am not using v4, and 
object names which fail for you work for me.


On 16-08-25 11:52, jan hugo prins wrote:

Could this have something to do with: http://tracker.ceph.com/issues/17076

Jan Hugo Prins


On 08/25/2016 10:34 AM, Henrik Korkuc wrote:

Hey,

I stumbled on a problem where an RGW upload results in a
SignatureDoesNotMatch error when I try uploading a file with '@' or some
other special characters in its name.

Can someone confirm the same issue? I didn't manage to find bug reports
about it.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RGW 10.2.2 SignatureDoesNotMatch with special characters in object name

2016-08-25 Thread Henrik Korkuc

Hey,

I stumbled on a problem where an RGW upload results in a 
SignatureDoesNotMatch error when I try uploading a file with '@' or some 
other special characters.


Can someone confirm the same issue? I didn't manage to find bug reports about it.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] S3 lifecycle support in Jewel

2016-08-12 Thread Henrik Korkuc

Hey,

I noticed that the RGW lifecycle feature got merged back into master almost 
a month ago. Is there any chance that it will be backported to Jewel? If not, 
are you aware of any incompatibilities with the Jewel code that would 
prevent or complicate a custom build with that code?


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to deploy a bluestore ceph cluster without ceph-deploy.

2016-07-29 Thread Henrik Korkuc

You can do it with ceph-disk prepare --bluestore /dev/sdX

Just keep in mind that it is very unstable and will result in corruption 
or other issues.
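
A minimal sketch of what that looks like end to end (the device name is only an example and will be wiped):

ceph-disk prepare --bluestore /dev/sdb
ceph-disk activate /dev/sdb1    # the data partition created by prepare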


On 16-07-29 04:36, m13913886...@yahoo.com wrote:

Hello cephers, I deployed a cluster from the ceph-10.2.2 source.
Since I am deploying from source, I deploy it without ceph-deploy.
How do I deploy a BlueStore Ceph cluster without ceph-deploy?
There is no official online documentation.
Where are the relevant documents?


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Monitoring slow requests

2016-07-26 Thread Henrik Korkuc

Hey,

I am wondering how people monitor/graph slow requests ("oldest 
blocked for > xxx secs") on their clusters. I didn't find related 
counters to graph, so it looks like the mon logs should be parsed for that 
info? Maybe someone has other ideas?
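
For context, the raw data seems to be available from the health output and the OSD admin sockets (osd.0 below is just an example), but none of it is exposed as a counter:

ceph health detail | grep -i blocked     # "N requests are blocked > 32 sec"
ceph daemon osd.0 dump_ops_in_flight     # ops currently stuck on that OSD
ceph daemon osd.0 dump_historic_ops      # recently completed slow ops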


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] how to transfer ceph cluster from the old network-and-hosts to a new one

2016-07-25 Thread Henrik Korkuc

On 16-07-25 10:55, 朱 彤 wrote:


Hi all,


I'm looking for a method to transfer a Ceph cluster.


Now the cluster is located in network1 that has hosts A, B, C...


And the target is to transfer it to network2 that has hosts a,b,c...


What I can think of is adding hosts a, b, c into the current cluster by 
adding OSDs and MONs. Then, after the data has been rebalanced, take down 
the OSDs and MONs on hosts A, B, C.



Then the question would be how to know when the old OSDs can safely be taken down.


This method involves a lot of redundant operations. Other than creating 
OSDs and MONs in the new environment, should I also create PGs and pools 
just like the old cluster has?



Is there a more direct way to shift the cluster from the old network and 
hosts to new ones?




Hey,
please refer to the recent post named "change of dns names and IP addresses 
of cluster members" in this mailing list. If both networks are 
interconnected, the migration will be quite easy.



Thanks!




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] change of dns names and IP addresses of cluster members

2016-07-22 Thread Henrik Korkuc

On 16-07-22 13:33, Andrei Mikhailovsky wrote:

Hello

We are planning to make changes to our IT infrastructure and as a 
result the fqdn and IPs of the ceph cluster will change. Could someone 
suggest the best way of dealing with this to make sure we have a 
minimal ceph downtime?


Can the old and new networks reach each other? If yes, then you can do it 
without cluster downtime. You can change OSD IPs on a server-by-server 
basis: stop the OSDs, rename the host, change the IP, start the OSDs. They 
should connect to the cluster with the new IP. Rinse and repeat for all servers.


As for the mons, you'll need to remove each mon from the cluster, then re-add 
it with the new name and IP and redistribute configs listing the new mons. Rinse and repeat.


Depending on your use case you may have client downtime, as sometimes it 
is not possible to change a client's config without a restart (e.g. a qemu 
machine config).
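
A rough sketch of the monitor replacement mentioned above (untested here; names, IDs and the IP are examples - do one mon at a time and keep quorum):

ceph mon remove mon1                      # drop the old mon
ceph auth get mon. -o /tmp/mon.keyring    # keyring for the replacement
ceph mon getmap -o /tmp/monmap
ceph-mon -i mon1-new --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
ceph-mon -i mon1-new --public-addr 192.0.2.11:6789
# then update mon_host / mon_initial_members in ceph.conf everywhere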



Many thanks

Andrei


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph OSD with 95% full

2016-07-19 Thread Henrik Korkuc

On 16-07-19 11:44, M Ranga Swami Reddy wrote:

Hi,
We are using a Ceph cluster with 100+ OSDs and the cluster is 60% full.
One of the OSDs is 95% full.
If an OSD is 95% full, does it impact any storage operation? Does this
impact VMs/instances?
Yes, one full OSD will impact the whole cluster: it will block write operations 
to the cluster.
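
A quick way to spot and relieve an almost-full OSD (the threshold is only an example; "ceph osd df" needs hammer or newer):

ceph osd df                             # per-OSD utilisation, find the outliers
ceph osd reweight-by-utilization 120    # nudges data off OSDs above 120% of the average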

I immediately reduced the weight of the OSD that was 95% full. After the 
re-weight, data rebalanced and the OSD came back to a normal state 
(i.e. < 80%) within about an hour.


Thanks
Swami
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Problem with auto mounts osd on v10.2.2

2016-07-18 Thread Henrik Korkuc

On 16-07-18 13:37, Eduard Ahmatgareev wrote:

Hi guys.

Could you help me with some small trouble?
We have a new installation of Ceph version 10.2.2 and we have some 
interesting trouble with auto-mounting OSDs after rebooting a storage 
node. We are forced to mount the OSDs manually after a reboot, and then they work fine.
But in the previous version, 0.94.5, it was automatic: OSDs mounted 
after a reboot without any problem or manual action.


Do you know what could have changed between versions 0.94.5 and 10.2.2 
in auto mounting?

We are using the ceph-deploy script for creating OSDs in the Ceph cluster.



There is a bug in 10.2.2: http://tracker.ceph.com/issues/16351
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] New to Ceph - osd autostart problem

2016-07-18 Thread Henrik Korkuc

On 16-07-18 11:11, Henrik Korkuc wrote:

On 16-07-18 10:53, Henrik Korkuc wrote:

On 16-07-15 10:40, Oliver Dzombic wrote:

Hi,

Partition GUID code: 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D (Unknown)
Partition unique GUID: 79FD1B30-F5AA-4033-BA03-8C7D0A7D49F5
First sector: 256 (at 1024.0 KiB)
Last sector: 976754640 (at 3.6 TiB)
Partition size: 976754385 sectors (3.6 TiB)
Attribute flags: 
Partition name: 'ceph data'


Looks the same as my disk, and they do autostart -- even if I deactivate
everything in /etc/systemd/system/ceph-osd.target.wants (CentOS 7).

Ceph will simply regenerate the symlinks at startup ^^;

This whole start-up thing is something that could be reworked in the
documentation, so that people would understand better what Ceph does 
exactly :)


I am having same problem as Dirk and it looks like 95-ceph-osd.rules 
is not installed in Debian Jessie in 10.2.1 or 10.2.2


I take my statement back - the rules are installed in /lib/udev/rules.d/.
OK, it looks like I found the culprit: 10.2.1 activates disks and 10.2.2 
doesn't. Adding /lib/udev/rules.d/60-ceph-partuuid-workaround.rules from a 
10.2.1 installation to a 10.2.2 installation solves the problem.


This file was removed by Sage:

commit 9f76b9ff31525eac01f04450d72559ec99927496
Author: Sage Weil <s...@redhat.com>
Date:   Mon Apr 18 09:16:02 2016 -0400

udev: remove 60-ceph-partuuid-workaround-rules

These were added to get /dev/disk/by-partuuid/ symlinks to work on
wheezy.  They are no longer needed for the supported distros (el7+,
jessie+, trusty+), and they apparently break dm by opening devices they
should not.

Fixes: http://tracker.ceph.com/issues/15516
Signed-off-by: Sage Weil <s...@redhat.com>
(cherry picked from commit 9f77244b8e0782921663e52005b725cca58a8753)

So it looks like Jessie still needs it...
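
For reference, the workaround described above boils down to something like this (the rules file is copied from a 10.2.1 package):

cp 60-ceph-partuuid-workaround.rules /lib/udev/rules.d/
udevadm control --reload-rules
udevadm trigger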
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] New to Ceph - osd autostart problem

2016-07-18 Thread Henrik Korkuc

On 16-07-18 10:53, Henrik Korkuc wrote:

On 16-07-15 10:40, Oliver Dzombic wrote:

Hi,

Partition GUID code: 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D (Unknown)
Partition unique GUID: 79FD1B30-F5AA-4033-BA03-8C7D0A7D49F5
First sector: 256 (at 1024.0 KiB)
Last sector: 976754640 (at 3.6 TiB)
Partition size: 976754385 sectors (3.6 TiB)
Attribute flags: 
Partition name: 'ceph data'


Looks the same as my disk, and they do autostart -- even if I deactivate
everything in /etc/systemd/system/ceph-osd.target.wants (CentOS 7).

Ceph will simply regenerate the symlinks at startup ^^;

This whole start-up thing is something that could be reworked in the
documentation, so that people would understand better what Ceph does 
exactly :)


I am having same problem as Dirk and it looks like 95-ceph-osd.rules 
is not installed in Debian Jessie in 10.2.1 or 10.2.2



I take my statement back - rules are installed in /lib/udev/rules.d/


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] New to Ceph - osd autostart problem

2016-07-18 Thread Henrik Korkuc

On 16-07-15 10:40, Oliver Dzombic wrote:

Hi,

Partition GUID code: 4FBD7E29-9D25-41B8-AFD0-062C0CEFF05D (Unknown)
Partition unique GUID: 79FD1B30-F5AA-4033-BA03-8C7D0A7D49F5
First sector: 256 (at 1024.0 KiB)
Last sector: 976754640 (at 3.6 TiB)
Partition size: 976754385 sectors (3.6 TiB)
Attribute flags: 
Partition name: 'ceph data'


Looks the same as my disk, and they do autostart -- even if I deactivate
everything in /etc/systemd/system/ceph-osd.target.wants (CentOS 7).

Ceph will simply regenerate the symlinks at startup ^^;

This whole start-up thing is something that could be reworked in the
documentation, so that people would understand better what Ceph does exactly :)

I am having same problem as Dirk and it looks like 95-ceph-osd.rules is 
not installed in Debian Jessie in 10.2.1 or 10.2.2


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] mark out vs crush weight 0

2016-05-18 Thread Henrik Korkuc

On 16-05-18 14:23, Sage Weil wrote:

Currently, after an OSD has been down for 5 minutes, we mark the OSD
"out", whic redistributes the data to other OSDs in the cluster.  If the
OSD comes back up, it marks the OSD back in (with the same reweight value,
usually 1.0).

The good thing about marking OSDs out is that exactly the amount of data
on the OSD moves.  (Well, pretty close.)  It is uniformly distributed
across all other devices.

The bad thing is that if the OSD really is dead, and you remove it from
the cluster, or replace it and recreate the new OSD with a new OSD id,
there is a second data migration that sucks data out of the part of the
crush tree where the removed OSD was.  This move is non-optimal: if the
drive is size X, some data "moves" from the dead OSD to other N OSDs on
the host (X/N to each), and the same amount of data (X) moves off the host
(uniformly coming from all N+1 drives it used to live on).  The same thing
happens at the layer up: some data will move from the host to peer hosts
in the rack, and the same amount will move out of the rack.  This is a
byproduct of CRUSH's hierarchical placement.

If the lifecycle is to let drives fail, mark them out, and leave them
there forever in the 'out' state, then the current behavior is fine,
although over time you'll have lots of dead+out osds that slow things down
marginally.

If the procedure is to replace dead OSDs and re-use the same OSD id, then
this also works fine.  Unfortunately the tools don't make this easy (that
I know of).

But if the procedure is to remove dead OSDs, or to remove dead OSDs and
recreate new OSDs in their place, probably with a fresh OSD id, then you
get this extra movement.  In that case, I'm wondering if we should allow
the mons to *instead* set the crush weight to 0 after the osd is down for
too long.  For that to work we need to set a flag so that if the OSD comes
back up it'll restore the old crush weight (or more likely make the
normal osd startup crush location update do so with the OSDs advertised
capacity).  Is it sensible?

And/or, anybody have a good idea how the tools can/should be changed to
make the osd replacement re-use the osd id?

sage

Maybe something like "ceph-disk prepare /dev/sdX --replace=" 
which would remove the old OSD and set up a new one in its place. I am just not 
sure if the bootstrap-osd permissions would be enough for that.

ceph-deploy could have something similar
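
For what it's worth, the "set crush weight to 0 first" flow can already be done by hand today, roughly (osd.29 is just an example):

ceph osd crush reweight osd.29 0    # triggers the single data migration
# wait for recovery to finish, then:
ceph osd out 29
systemctl stop ceph-osd@29          # or "stop ceph-osd id=29" on upstart
ceph osd crush remove osd.29
ceph auth del osd.29
ceph osd rm 29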



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Deploying ceph by hand: a few omissions

2016-05-02 Thread Henrik Korkuc

On 16-05-02 02:14, Stuart Longland wrote:

On 02/05/16 00:32, Henrik Korkuc wrote:

mons generate these bootstrap keys. You can find them in
/var/lib/ceph/bootstrap-*/ceph.keyring

on pre-infernalis they were created automatically (I guess by the init
script). Infernalis and Jewel have the ceph-create-keys@.service systemd job for that.

Just place that dir with file in same location on OSD hosts and you'll
be able to activate OSDs.

Yeah, in my case the OSD hosts are the MON hosts, and there was no such
file or directory created on any of them.  Monitors were running at the
time.
you need to run the ceph-create-keys systemd job to generate these keys (or 
the command it runs). I am not sure whether it is intentional that it 
doesn't run automatically or just a dependency problem. I think Jewel 
did create the keys for me. I didn't pay much attention to it, as it is rare 
for me to start new clusters.
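
If the job isn't wired up, the bootstrap-osd key can also be created by hand from a node with an admin keyring, something like this (a sketch, not the official procedure):

# if the key already exists on the mons:
ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring
# otherwise create it first:
ceph auth get-or-create client.bootstrap-osd mon 'allow profile bootstrap-osd' \
    -o /var/lib/ceph/bootstrap-osd/ceph.keyring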

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Deploying ceph by hand: a few omissions

2016-05-01 Thread Henrik Korkuc
mons generate these bootstrap keys. You can find them in 
/var/lib/ceph/bootstrap-*/ceph.keyring


on pre-infernalis they were created automatically (I guess by the init 
script). Infernalis and Jewel have the ceph-create-keys@.service systemd job for that.


Just place that directory and file in the same location on the OSD hosts 
and you'll be able to activate OSDs.


On 16-05-01 13:46, Stuart Longland wrote:

Hi all,

This evening I was in the process of deploying a ceph cluster by hand.
I did it by hand because to my knowledge, ceph-deploy doesn't support
Gentoo, and my cluster here runs that.

The instructions I followed are these ones:
http://docs.ceph.com/docs/master/install/manual-deployment and I'm
running the 10.0.2 release of Ceph:

ceph version 10.0.2 (86764eaebe1eda943c59d7d784b893ec8b0c6ff9)

Things went okay bootstrapping the monitors.  I'm running a 3-node
cluster, with OSDs and monitors co-located.  Each node has a 1TB 2.5"
HDD and a 40GB partition on SSD for the journal.

Things went pear shaped however when I tried bootstrapping the OSDs.
All was going fine until it came time to activate my first OSD.

ceph-disk activate barfed because I didn't have the bootstrap-osd key.
No one told me I needed to create one, or how to do it.  There's a brief
note about using --activate-key, but no word on what to pass as the
argument.  I tried passing in my admin keyring in /etc/ceph, but it
didn't like that.

In the end, I muddled my way through the manual OSD deployment steps,
which worked fine.  After correcting permissions for the ceph user, I
found the OSDs came up.  As an added bonus, I now know how to work
around the journal permission issue at work since I've reproduced it
here, using a UDEV rules file like the following:

SUBSYSTEM=="block", KERNEL=="sda7", OWNER="ceph", GROUP="ceph", MODE="0600"

The cluster seems to be happy enough now, but some notes on how one
generates the OSD activation keys to use with `ceph-disk activate` would
be a big help.

Regards,


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] double rebalance when removing osd

2016-01-11 Thread Henrik Korkuc

On 16-01-11 04:10, Rafael Lopez wrote:

Thanks for the replies guys.

@Steve, even when you remove due to failing, have you noticed that the 
cluster rebalances twice using the documented steps? You may not if 
you don't wait for the initial recovery after 'ceph osd out'. If you 
do 'ceph osd out' and immediately 'ceph osd crush remove', RH support 
has told me that this effectively 'cancels' the original move 
triggered from 'ceph osd out' and starts permanently remapping... 
which still doesn't really explain why we have to do the ceph osd out 
in the first place..


It needs to be tested, but I think it may not allow you to do a crush remove 
before doing osd out (e.g. you shouldn't be removing OSDs from CRUSH 
which are still in the cluster). At least that was the case with up OSDs when I 
did some testing.


@Dan, good to hear it works, I will try that method next time and see 
how it goes!



On 8 January 2016 at 03:08, Steve Taylor wrote:


If I’m not mistaken, marking an osd out will remap its placement
groups temporarily, while removing it from the crush map will
remap the placement groups permanently. Additionally, other
placement groups from other osds could get remapped permanently
when an osd is removed from the crush map. I would think the only
benefit to marking an osd out before stopping it would be a
cleaner redirection of client I/O before the osd disappears, which
may be worthwhile if you’re removing a healthy osd.

As for reweighting to 0 prior to removing an osd, it seems like
that would give the osd the ability to participate in the recovery
essentially in read-only fashion (plus deletes) until it’s empty,
so objects wouldn’t become degraded as placement groups are
backfilling onto other osds. Again, this would really only be
useful if you’re removing a healthy osd. If you’re removing an osd
where other osds in different failure domains are known to be
unhealthy, it seems like this would be a really good idea.

I usually follow the documented steps you’ve outlined myself, but
I’m typically removing osds due to failed/failing drives while the
rest of the cluster is healthy.



*Steve Taylor*| Senior Software Engineer | StorageCraft Technology
Corporation 
380 Data Drive Suite 300 | Draper | Utah | 84020
*Office: *801.871.2799 | *Fax: *801.545.4705



If you are not the intended recipient of this message, be advised
that any dissemination or copying of this message is prohibited.
If you received this message erroneously, please notify the sender
and delete it, together with any attachments.

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Rafael Lopez
Sent: Wednesday, January 06, 2016 4:53 PM
To: ceph-users@lists.ceph.com
Subject: [ceph-users] double rebalance when removing osd

Hi all,

I am curious what practices other people follow when removing OSDs
from a cluster. According to the docs, you are supposed to:

1. ceph osd out

2. stop daemon

3. ceph osd crush remove

4. ceph auth del

5. ceph osd rm

What value does ceph osd out (1) add to the removal process and
why is it in the docs ? We have found (as have others) that by
outing(1) and then crush removing (3), the cluster has to do two
recoveries. Is it necessary? Can you just do a crush remove
without step 1?

I found this earlier message from GregF which he seems to affirm
that just doing the crush remove is fine:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-January/007227.html

This recent blog post from Sebastien that suggests reweighting to
0 first, but havent tested it:

http://www.sebastien-han.fr/blog/2015/12/11/ceph-properly-remove-an-osd/

I thought that by marking it out, it sets the reweight to 0
anyway, so not sure how this would make a difference in terms of
two rebalances but maybe there is a subtle difference.. ?

Thanks,

Raf

-- 


Senior Storage Engineer - Automation and Delivery
Infrastructure Services - eSolutions




--
Senior Storage Engineer - Automation and Delivery
Infrastructure Services - eSolutions
738 Blackburn Rd, Clayton
Monash University 3800
Telephone: +61 3 9905 9118 
Mobile:   +61 4 27 682 670
Email rafael.lo...@monash.edu 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___

Re: [ceph-users] upgrading 0.94.5 to 9.2.0 notes

2015-11-21 Thread Henrik Korkuc

On 15-11-20 17:14, Kenneth Waegeman wrote:

<...>
* systemctl start ceph.target does not start my osds.., I have to 
start them all with systemctl start ceph-osd@...
* systemctl restart ceph.target restart the running osds, but not the 
osds that are not yet running.

* systemctl stop ceph.target stops everything, as expected :)

I didn't have a chance for complete testing (still preparing for the 
upgrade), but I think I saw in the service files that they install 
under ceph.target, so if you run "systemctl enable 
ceph-osd@<id>" for all OSDs on the server, they could be 
started/stopped by ceph.target
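
i.e. something along these lines on each OSD host (the IDs are examples):

systemctl enable ceph-osd@12 ceph-osd@13
systemctl start ceph.target     # now also starts the enabled OSDs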


I didn't test everything thoroughly yet, but has someone seen 
the same issues?


Thanks!

Kenneth
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph same rbd on multiple client

2015-10-23 Thread Henrik Korkuc
Can you paste dmesg and system logs? I am using a 3-node OCFS2 setup on RBD 
and have had no problems.


On 15-10-23 08:40, gjprabu wrote:

Hi Frederic,

   Can you give me some solution, we are spending more time to 
solve this issue.


Regards
Prabu




 On Thu, 15 Oct 2015 17:14:13 +0530 *Tyler Bishop 
* wrote 


I don't know enough on ocfs to help.  Sounds like you have
unconccurent writes though

Sent from TypeMail 
On Oct 15, 2015, at 1:53 AM, gjprabu > wrote:

Hi Tyler,

   Can please send me the next setup action to be taken on
this issue.

Regards
Prabu


 On Wed, 14 Oct 2015 13:43:29 +0530 *gjprabu
>* wrote 

Hi Tyler,

 Thanks for your reply. We have disabled rbd_cache
but still issue is persist. Please find our configuration
file.

# cat /etc/ceph/ceph.conf
[global]
fsid = 944fa0af-b7be-45a9-93ff-b9907cfaee3f
mon_initial_members = integ-hm5, integ-hm6, integ-hm7
mon_host = 192.168.112.192,192.168.112.193,192.168.112.194
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_size = 2

[mon]
mon_clock_drift_allowed = .500

[client]
rbd_cache = false


--

 cluster 944fa0af-b7be-45a9-93ff-b9907cfaee3f
 health HEALTH_OK
 monmap e2: 3 mons at

{integ-hm5=192.168.112.192:6789/0,integ-hm6=192.168.112.193:6789/0,integ-hm7=192.168.112.194:6789/0}
election epoch 480, quorum 0,1,2
integ-hm5,integ-hm6,integ-hm7
 osdmap e49780: 2 osds: 2 up, 2 in
  pgmap v2256565: 190 pgs, 2 pools, 1364 GB data, 410
kobjects
2559 GB used, 21106 GB / 24921 GB avail
 190 active+clean
  client io 373 kB/s rd, 13910 B/s wr, 103 op/s


Regards
Prabu

 On Tue, 13 Oct 2015 19:59:38 +0530 *Tyler Bishop
>* wrote 

You need to disable RBD caching.





*Tyler Bishop
*Chief Technical Officer
513-299-7108 x10

tyler.bis...@beyondhosting.net


If you are not the intended recipient of this
transmission you are notified that disclosing,
copying, distributing or taking any action in reliance
on the contents of this information is strictly
prohibited.






*From: *"gjprabu" >
*To: *"Frédéric Nass" >
*Cc: *">"
>, "Siva Sokkumuthu"
>, "Kamal Kannan
Subramani(kamalakannan)" >
*Sent: *Tuesday, October 13, 2015 9:11:30 AM
*Subject: *Re: [ceph-users] ceph same rbd on multiple
client

Hi ,

 We have CEPH  RBD with OCFS2 mounted servers. we are
facing i/o errors simultaneously while move the folder
using one nodes in the same disk other nodes data
replicating with below said error (Copying is not
having any problem). Workaround if we remount the
partition this issue get resolved but after sometime
problem again reoccurred. please help on this issue.

Note : We have total 5 Nodes, here two nodes working
fine other nodes are showing like below input/output
error on moved data's.

ls -althr
ls: cannot access LITE_3_0_M4_1_TEST: Input/output error
ls: cannot access LITE_3_0_M4_1_OLD: Input/output error
total 0
d? ? ? 

Re: [ceph-users] help! Ceph Manual Depolyment

2015-09-18 Thread Henrik Korkuc

On 15-09-17 18:59, wikison wrote:


Is there any detailed manual deployment document? I downloaded the 
source and built ceph, then installed ceph on 7 computers. I used 
three as monitors and four as OSD. I followed the official document on 
ceph.com. But it didn't work and it seemed to be out-dated. Could 
anybody help me?


What documentation did you follow? What doesn't work for you? I recently 
launched a Ceph cluster without ceph-deploy, so maybe I'll be able to help 
you out.


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] latest Hammer for Ubuntu precise

2015-06-25 Thread Henrik Korkuc

Hey,

I am having problems too - non-ceph dependencies cannot be satisfied 
(newer package versions are required than exist in the distro):


# aptitude install ceph
The following NEW packages will be installed:
  libboost-program-options1.55.0{a} libboost-system1.55.0{a}
  libboost-thread1.55.0{a}
The following packages will be upgraded:
  ceph{b}
1 packages upgraded, 3 newly installed, 0 to remove and 35 not upgraded.
Need to get 11.7 MB of archives. After unpacking 18.4 MB will be used.
The following packages have unmet dependencies:
 ceph : Depends: libgoogle-perftools4 which is a virtual package.
Depends: libleveldb1 but it is not going to be installed.
Depends: liblttng-ust0 (= 2.5.0) but it is not going to be 
installed.
Depends: libnspr4 (= 2:4.9-2~) but 4.10.7-0ubuntu0.12.04.1 is 
installed. or
 libnspr4-0d (= 1.8.0.10) but it is not going to be 
installed.

Depends: libstdc++6 (= 4.9) but 4.6.3-1ubuntu5 is installed.
The following actions will resolve these dependencies:

 Remove the following packages:
1) ceph
2) ceph-mds




# aptitude install ceph-fuse
The following NEW packages will be installed:
  libboost-system1.55.0{a} libboost-thread1.55.0{a}
The following packages will be upgraded:
  ceph-fuse{b}
1 packages upgraded, 2 newly installed, 0 to remove and 35 not upgraded.
Need to get 2,489 kB of archives. After unpacking 339 kB will be used.
The following packages have unmet dependencies:
 ceph-fuse : Depends: libfuse2 (= 2.9) but 2.8.6-2ubuntu2 is installed 
and it is kept back.
 Depends: libnspr4 (= 2:4.9-2~) but 
4.10.7-0ubuntu0.12.04.1 is installed. or
  libnspr4-0d (= 1.8.0.10) but it is not going to 
be installed.

 Depends: libstdc++6 (= 4.9) but 4.6.3-1ubuntu5 is installed.
The following actions will resolve these dependencies:

 Remove the following packages:
1) ceph-fuse

 Leave the following dependencies unresolved:
2) ceph-mds recommends ceph-fuse


On 15-06-23 22:30, Castillon de la Cruz, Eddy Gonzalo wrote:

Hello Team,

I   have tested the new packages and I have a very strange issue: The 
software has been uninstalled.  My test cluster runs in Ubuntu 12.04. 
 I hope that has been  a wrong of me.


root@ceph1:/var/log# ceph -s
The program 'ceph' is currently not installed. You can install it by 
typing:

apt-get install ceph-common


I updated the repository and I  ran apt-get dist-upgrade

root@ceph1:/var/log# more apt/history.log


Start-Date: 2015-06-23 14:04:33
Commandline: apt-get dist-upgrade -V
Upgrade: ceph-fs-common:amd64 (0.94.1-1precise, 0.94.2-1), 
python-cephfs:amd64 (0.94.1-1precise, 0.94.2-1), python-rbd:amd64 
(0.94.1-1precise, 0.94.2-1), python-ceph:am
d64 (0.94.1-1precise, 0.94.2-1), patch:amd64 (2.6.1-3, 
2.6.1-3ubuntu0.1), python-rados:amd64 (0.94.1-1precise, 0.94.2-1)
 Remove: ceph-common:amd64 (0.94.1-1precise), ceph-mds:amd64 
(0.94.1-1precise), ceph:amd64 (0.94.1-1precise)

End-Date: 2015-06-23 14:04:46

root@ceph1:/var/log# dpkg --list | grep -i ceph
rc ceph 0.94.1-1precise distributed storage and file system
rc ceph-common 0.94.1-1precise common utilities to mount and interact 
with a ceph storage cluster
ii ceph-deploy 1.5.25precise Ceph-deploy is an easy to use 
configuration tool
ii ceph-fs-common 0.94.2-1 common utilities to mount and interact with 
a ceph file system
rc ceph-mds 0.94.1-1precise metadata server for the ceph distributed 
file system
ii curl 7.29.0-1precise.ceph command line tool for transferring data 
with URL syntax

ii libcephfs1 0.94.1-1precise Ceph distributed file system client library
ii libcurl3 7.29.0-1precise.ceph easy-to-use client-side URL transfer 
library (OpenSSL flavour)
ii libcurl3-gnutls 7.29.0-1precise.ceph easy-to-use client-side URL 
transfer library (GnuTLS flavour)
ii python-ceph 0.94.2-1 Meta-package for python libraries for the Ceph 
libraries

ii python-cephfs 0.94.2-1 Python libraries for the Ceph libcephfs library
ii python-rados 0.94.2-1 Python libraries for the Ceph librados library
ii python-rbd 0.94.2-1 Python libraries for the Ceph librbd library



*Eddy Gonzalo Castillon de la Cruz*
Sr. Network Engineer
ecastil...@axcess-financial.com 
mailto:ecastil...@axcess-financial.com%7D

P. +511 6408117 Ext. 2175
C. +51 983279761
AXCESS FINANCIAL PERU
Av. Pardo y Aliaga 675, of 301
San Isidro, Lima 27 - PERU


From: Alfredo Deza ad...@redhat.com
To: Andrei Mikhailovsky and...@arhont.com
CC: ceph-users@lists.ceph.com
Sent: Tuesday, 23 June 2015 11:26:50
Subject: Re: [ceph-users] latest Hammer for Ubuntu precise

And this is now available!

Again, apologies for the error. Let me know if there are any issues


-Alfredo

- Original Message -
From: Alfredo Deza ad...@redhat.com
To: Andrei Mikhailovsky and...@arhont.com
Cc: ceph-users@lists.ceph.com
Sent: Tuesday, June 23, 2015 

Re: [ceph-users] Restarting OSD leads to lower CPU usage

2015-06-11 Thread Henrik Korkuc

On 6/11/15 12:21, Jan Schermer wrote:

Hi,
hoping someone can point me in the right direction.

Some of my OSDs have a larger CPU usage (and ops latencies) than others. If I 
restart the OSD everything runs nicely for some time, then it creeps up.

1) most of my OSDs have ~40% CPU (core) usage (user+sys), some are closer to 
80%. Restarting means the offending OSDs only use 40% again.
2) average latencies and CPU usage on the host are the same - so it’s not 
caused by the host that the OSD is running on
3) I can’t say exactly when or how the issue happens. I can’t even say if it’s 
the same OSDs. It seems it either happens when something heavy happens in a 
cluster (like dropping very old snapshots, rebalancing) and then doesn’t come 
back, or maybe it happens slowly over time and I can’t find it in the graphs. 
Looking at the graphs it seems to be the former.

I have just one suspicion and that is the “fd cache size” - we have it set to 
16384 but the open fds suggest there are more open files for the osd process 
(over 17K fds) - it varies by some hundreds between the osds. Maybe some are 
just slightly over the limit and the misses cause this? Restarting the OSD 
clears them (~2K) and they increase over time. I increased it to 32768 
yesterday and it consistently nice now, but it might take another few days to 
manifest…
Could this explain it? Any other tips?

What about disk IO? Are OSDs scrubbing or deep-scrubbing?
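
Something like this (a quick sketch) should show whether scrubs line up with the CPU creep and what the disks are doing:

ceph -s                                              # scrub activity shows up in the pg states
ceph pg dump pgs_brief 2>/dev/null | grep scrubbing
iostat -x 5                                          # per-disk utilisation on the OSD host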



Thanks

Jan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Debian Jessie packages?

2015-05-12 Thread Henrik Korkuc

Hey,

as Debian Jessie is already released for some time, I'd like to ask is 
there any plans to build newer Ceph packages for it?

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] RGW buckets sync to AWS?

2015-03-31 Thread Henrik Korkuc

Hello,

can anyone recommend script/program to periodically synchronize RGW 
buckets with Amazon's S3?


--
Sincerely
Henrik
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] One host failure bring down the whole cluster

2015-03-31 Thread Henrik Korkuc

On 3/31/15 11:27, Kai KH Huang wrote:

1) But Ceph says ...You can run a cluster with 1 monitor. 
(http://ceph.com/docs/master/rados/operations/add-or-rm-mons/), I assume it should work. 
And brain split is not my current concern

The point is that you must have a majority of monitors up:
* In a one-monitor setup you need one monitor running.
* In a two-monitor setup you need both monitors running, because if one goes 
down you no longer have a majority up.
* In a three-monitor setup you need at least two monitors up, because if 
one goes down you still have a majority up.

* 4 - at least 3
* 5 - at least 3
* etc




2) I've written object to Ceph, now I just want to get it back

Anyway. I tried to reduce the mon number to 1. But after I remove it following 
the steps, it just cannot start up any more

1. [root~]  service ceph -a stop mon.serverB
2. [root~]  ceph mon remove serverB ## hang here forever
3. #Remove the monitor entry from ceph.conf.
4. Restart ceph service
It is a grey area for me, but I think you failed to remove that 
monitor because you didn't have a quorum for the operation to succeed. I 
think you'll need to modify the monmap manually and remove the second monitor 
from it.
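
The usual recipe for that is roughly the following (double-check it against the docs for your release; the names follow this thread):

service ceph stop mon.serverA
ceph-mon -i serverA --extract-monmap /tmp/monmap
monmaptool --rm serverB /tmp/monmap
ceph-mon -i serverA --inject-monmap /tmp/monmap
service ceph start mon.serverA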




[root@serverA~]# systemctl status ceph.service -l
ceph.service - LSB: Start Ceph distributed file system daemons at boot time
Loaded: loaded (/etc/rc.d/init.d/ceph)
Active: failed (Result: timeout) since Tue 2015-03-31 15:46:25 CST; 3min 
15s ago
   Process: 2937 ExecStop=/etc/rc.d/init.d/ceph stop (code=exited, 
status=0/SUCCESS)
   Process: 3670 ExecStart=/etc/rc.d/init.d/ceph start (code=killed, 
signal=TERM)

Mar 31 15:44:26 serverA ceph[3670]: === osd.6 ===
Mar 31 15:44:56 serverA ceph[3670]: failed: 'timeout 30 /usr/bin/ceph -c 
/etc/ceph/ceph.conf --name=osd.6 --keyring=/var/lib/ceph/osd/ceph-6/keyring osd 
crush create-or-move -- 6 3.64 host=serverA root=default'
Mar 31 15:44:56 serverA ceph[3670]: === osd.7 ===
Mar 31 15:45:26 serverA ceph[3670]: failed: 'timeout 30 /usr/bin/ceph -c 
/etc/ceph/ceph.conf --name=osd.7 --keyring=/var/lib/ceph/osd/ceph-7/keyring osd 
crush create-or-move -- 7 3.64 host=serverA root=default'
Mar 31 15:45:26 serverA ceph[3670]: === osd.8 ===
Mar 31 15:45:57 serverA ceph[3670]: failed: 'timeout 30 /usr/bin/ceph -c 
/etc/ceph/ceph.conf --name=osd.8 --keyring=/var/lib/ceph/osd/ceph-8/keyring osd 
crush create-or-move -- 8 3.64 host=serverA root=default'
Mar 31 15:45:57 serverA ceph[3670]: === osd.9 ===
Mar 31 15:46:25 serverA systemd[1]: ceph.service operation timed out. 
Terminating.
Mar 31 15:46:25 serverA systemd[1]: Failed to start LSB: Start Ceph distributed 
file system daemons at boot time.
Mar 31 15:46:25 serverA systemd[1]: Unit ceph.service entered failed state.

/var/log/ceph/ceph.log says:
2015-03-31 15:55:57.648800 mon.0 10.???.78:6789/0 1048 : cluster [INF] osd.21 
10.???.78:6855/25598 failed (39 reports from 9 peers after 20.118062 = grace 
20.00)
2015-03-31 15:55:57.931889 mon.0 10.???.78:6789/0 1055 : cluster [INF] osd.15 
10..78:6825/23894 failed (39 reports from 9 peers after 20.401379 = grace 
20.00)

Obviously serverB is down, but it should not affect serverA from functioning? 
Right?

From: Gregory Farnum [g...@gregs42.com]
Sent: Tuesday, March 31, 2015 11:53 AM
To: Lindsay Mathieson; Kai KH Huang
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] One host failure bring down the whole cluster

On Mon, Mar 30, 2015 at 8:02 PM, Lindsay Mathieson
lindsay.mathie...@gmail.com wrote:

On Tue, 31 Mar 2015 02:42:27 AM Kai KH Huang wrote:

Hi, all
 I have a two-node Ceph cluster, and both are monitor and osd. When
they're both up, osd are all up and in, everything is fine... almost:



Two things.

1 -  You *really* need a min of three monitors. Ceph cannot form a quorum with
just two monitors and you run a risk of split brain.

You can form quorums with an even number of monitors, and Ceph does so
— there's no risk of split brain.

The problem with 2 monitors is that a quorum is always 2 — which is
exactly what you're seeing right now. You can't run with only one
monitor up (assuming you have a non-zero number of them).


2 - You also probably have a min size of two set (the default). This means
that you need a minimum  of two copies of each data object for writes to work.
So with just two nodes, if one goes down you can't write to the other.

Also this.



So:
- Install a extra monitor node - it doesn't have to be powerful, we just use a
Intel Celeron NUC for that.

- reduce your minimum size to 1 (One).

Yep.
-Greg
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cannot add OSD node into crushmap or all writes fail

2015-03-31 Thread Henrik Korkuc

Check firewall rules and network connectivity.
Can all nodes and clients reach each other? Can you telnet to the OSD ports 
(note that multiple OSDs may listen on different ports)?


On 3/31/15 8:44, Tyler Bishop wrote:
I have this ceph node that will correctly recover into my ceph pool 
and performance looks to be normal for the rbd clients.  However after 
a few minutes once finishing recovery the rbd clients begin to fall 
over and cannot write data to the pool.


I've been trying to figure this out for weeks! None of the logs 
contain anything relevant at all.


If I disable the node in the crushmap the rbd clients immediately 
begin writing to the other nodes.


Ideas?




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Turning on SCRUB back on - any suggestion ?

2015-03-13 Thread Henrik Korkuc
I think that there will be no big scrub, as there are limits on the maximum 
number of scrubs at a time.

http://ceph.com/docs/master/rados/configuration/osd-config-ref/#scrubbing

If we take "osd max scrubs", which is 1 by default, then you will not get 
more than 1 scrub per OSD at a time.


I couldn't quickly find whether there are cluster-wide limits.
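
So unsetting the flags should be reasonably safe. You can check the throttles on a running OSD and then unset them (osd.0 is just an example):

ceph daemon osd.0 config show | grep -E 'osd_max_scrubs|osd_scrub_load_threshold'
ceph osd unset noscrub
ceph osd unset nodeep-scrub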

On 3/13/15 10:46, Wido den Hollander wrote:


On 13-03-15 09:42, Andrija Panic wrote:

Hi all,

I have set nodeep-scrub and noscrub while I had small/slow hardware for
the cluster.
It has been off for a while now.

Now we are upgraded with hardware/networking/SSDs and I would like to
activate - or unset these flags.

Since I now have 3 servers with 12 OSDs each (SSD based Journals) - I
was wondering what is the best way to unset flags - meaning if I just
unset the flags, should I expect that the SCRUB will start all of the
sudden on all disks - or is there way to let the SCRUB do drives one by
one...


So, I *think* that unsetting these flags will trigger a big scrub, since
all PGs have a very old last_scrub_stamp and last_deepscrub_stamp

You can verify this with:

$ ceph pg pgid query

A solution would be to scrub each PG manually first in a timely fashion.

$ ceph pg scrub pgid

That way you set the timestamps and slowly scrub each PG.

When that's done, unset the flags.

Wido


In other words - should I expect BIG performance impact ornot ?

Any experience is very appreciated...

Thanks,

--

Andrija Panić


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Turning on SCRUB back on - any suggestion ?

2015-03-13 Thread Henrik Korkuc

I think settings apply to both kinds of scrubs

On 3/13/15 13:31, Andrija Panic wrote:

Interesting... thx for that Henrik.

BTW, my placement groups have around 1800 objects each (ceph pg dump) - 
meaning a max of about 7GB of data at the moment.


A regular scrub just took 5-10 sec to finish. A deep scrub would, I guess, 
take some minutes for sure.


What about deep scrub - the timestamp is still from some months ago, but 
regular scrub is fine now with a fresh timestamp...?


I don't see max deep scrub settings - or are these settings applied in 
general to both kinds of scrubs?


Thanks



On 13 March 2015 at 12:22, Henrik Korkuc li...@kirneh.eu 
mailto:li...@kirneh.eu wrote:


I think that there will be no big scrub, as there are limits of
maximum scrubs at a time.
http://ceph.com/docs/master/rados/configuration/osd-config-ref/#scrubbing

If we take osd max scrubs which is 1 by default, then you will
not get more than 1 scrub per OSD.

I couldn't quickly find if there are cluster wide limits.


On 3/13/15 10:46, Wido den Hollander wrote:

On 13-03-15 09:42, Andrija Panic wrote:

Hi all,

I have set nodeep-scrub and noscrub while I had small/slow hardware for
the cluster.
It has been off for a while now.

Now we are upgraded with hardware/networking/SSDs and I would like to
activate - or unset these flags.

Since I now have 3 servers with 12 OSDs each (SSD based Journals) - I
was wondering what is the best way to unset flags - meaning if I just
unset the flags, should I expect that the SCRUB will start all of the
sudden on all disks - or is there way to let the SCRUB do drives one by
one...


So, I *think* that unsetting these flags will trigger a big scrub, since
all PGs have a very old last_scrub_stamp and last_deepscrub_stamp

You can verify this with:

$ ceph pg pgid query

A solution would be to scrub each PG manually first in a timely fashion.

$ ceph pg scrub pgid

That way you set the timestamps and slowly scrub each PG.

When that's done, unset the flags.

Wido


In other words - should I expect BIG performance impact ornot ?

Any experience is very appreciated...

Thanks,

-- 


Andrija Panić


___
ceph-users mailing list
ceph-users@lists.ceph.com  mailto:ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com  mailto:ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com mailto:ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




--

Andrija Panić


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph free space

2015-03-10 Thread Henrik Korkuc

On 3/10/15 11:06, Mateusz Skała wrote:


Hi,

Something is wrong with the free space in my cluster. In a cluster with 
10 OSDs (5*1TB + 5*2TB), 'ceph -s' shows:


11425 GB used, 2485 GB / 13910 GB avail

But I have only 2 rbd disks in one pool (‘rbd’):

rados df

pool name   category KB  objects   clones degraded 
unfound   rdrd KB   wrwr KB


rbd - 3976154023   9714340 6474 
0 11542224   1391869743   742847385900453


  total used 11988041672   971434

  total avail 2598378648

  total space14586420320

rbd ls

vm-100-disk-1

vm-100-disk-2

rbd info vm-100-disk-1

rbd image 'vm-100-disk-1':

size 16384 MB in 4096 objects

order 22 (4096 kB objects)

block_name_prefix: rbd_data.14ef2ae8944a

format: 2

features: layering

rbd info vm-100-disk-2

rbd image 'vm-100-disk-2':

size 4096 GB in 1048576 objects

order 22 (4096 kB objects)

block_name_prefix: rbd_data.15682ae8944a

format: 2

features: layering

So my rbd disks use only 4112 GB. The default size of the cluster is 2, so 
used space should be ca. 8224 GB. Why does 'ceph -s' show 11425 GB?


Can someone explain this situation?

Thanks, Mateusz


Hey,

what does ceph df show?

ceph -s shows raw disk usage, so there will be some overhead from the file 
system; also, maybe you left some other files there?
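
I.e. compare the pool-level numbers with the raw ones, for example:

ceph df detail    # per-pool logical usage
ceph osd df       # per-OSD raw usage incl. filesystem overhead (needs hammer or newer)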






___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RadosGW - multiple dns names

2015-02-23 Thread Henrik Korkuc
I haven't needed this kind of setup, but as you already need an HTTP 
server (apache, nginx, etc.) to proxy requests to RGW, you could 
set up all the domains on it and use only one when forwarding.


On 2/21/15 1:58, Shinji Nakamoto wrote:
We have multiple interfaces on our Rados gateway node, each of which 
is assigned to one of our many VLANs with a unique IP address.


Is it possible to set multiple DNS names for a single Rados GW, so it 
can handle the request to each of the VLAN specific IP address DNS names?



eg.
rgw dns name = prd-apiceph001
rgw dns name = prd-backendceph001
etc.





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.90 released

2014-12-23 Thread Henrik Korkuc

On 12/23/14 12:57, René Gallati wrote:

Hello,

so I upgraded my cluster from 89 to 90 and now I get:

~# ceph health
HEALTH_WARN too many PGs per OSD (864 > max 300)

That is a new one. I had too few but never too many. Is this a problem 
that needs attention, or ignorable? Or is there even a command now to 
shrink PGs?


The message did not appear before. I currently have 32 OSDs over 8 
hosts and 9 pools, each with 1024 PGs, as that was the recommended number 
according to the OSD * 100 / replica formula, rounded to the next power 
of 2. The cluster had been increased by 4 OSDs / an 8th host only days 
before. That is to say, it was at 28 OSDs / 7 hosts / 9 pools, but after 
extending it with another host, ceph 89 did not complain.


Using the formula again I'd actually need to go to 2048 PGs per pool, 
but ceph is telling me to reduce the PG count now?


The formula recommends a PG count for all pools combined, not for each pool. 
So you need about 2048 PGs in total, distributed among the pools by their 
expected size.


from http://ceph.com/docs/master/rados/operations/placement-groups/:
When using multiple data pools for storing objects, you need to ensure 
that you balance the number of placement groups per pool with the number 
of placement groups per OSD so that you arrive at a reasonable total 
number of placement groups that provides reasonably low variance per OSD 
without taxing system resources or making the peering process too slow.
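
To make the arithmetic concrete (assuming the pools are 3-replica, which is what the reported number implies):

9 pools x 1024 PGs x 3 replicas / 32 OSDs = 864 PG copies per OSD   (the warning)
target: 32 OSDs x 100 / 3 replicas ~ 1067 PGs in total, i.e. roughly 1024-2048 spread over all pools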




Kind regards

René
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Centos 7 qemu

2014-10-05 Thread Henrik Korkuc
Hi,
CentOS 7 qemu out of the box does not support rbd.

I had to build the package with rbd support manually, with "%define rhev 1"
in the qemu-kvm spec file. I also had to salvage some files from the src.rpm
file which were missing from the CentOS git.

On 2014.10.04 11:31, Ignazio Cassano wrote:

 Hi all,
 I'd like to know if CentOS 7 qemu and libvirt support rbd or if there
 are some extra packages needed.
 Regards

 Ignazio



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD backfill full tunings

2014-06-30 Thread Henrik Korkuc
Well, at least for me it is live-updateable (0.80.1). It may be that
during recovery the OSDs are currently backfilling other PGs, so the stats are
not updated (because the PGs were not retried for backfill after the setting change).
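
E.g. (a sketch; the ratio value is only an example):

ceph tell osd.* injectargs '--osd_backfill_full_ratio 0.90'
ceph daemon osd.0 config get osd_backfill_full_ratio    # verify on one daemon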

On 2014.06.30 18:31, Gregory Farnum wrote:
 It looks like that value isn't live-updateable, so you'd need to
 restart after changing the daemon's config. Sorry!
 Made a ticket: http://tracker.ceph.com/issues/8695
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com


 On Mon, Jun 30, 2014 at 12:41 AM, Kostis Fardelas dante1...@gmail.com wrote:
 Hi,
 during PGs remapping, the cluster recovery process sometimes gets
 stuck on PGs with backfill_toofull state. The obvious solution is to
 reweight the impacted OSD until we add new OSDs to the cluster. In
 order to force the remapping process to complete asap we try to inject
 a higher value on osd_backfill_full_ratio tunable (by default on
 85%). However, after applying the higher backfill full ratio values,
 the remapping does not seem to start and continues to be stuck with
 backfill_toofull PGs. Is there something more we should try?

 Thanks,
 Kostis
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HowTo mark an OSD as down?

2014-06-23 Thread Henrik Korkuc
On 2014.06.23 10:01, Udo Lembke wrote:
 Hi,
 AFAIK "ceph osd down osd.29" should mark osd.29 as down.
 But what should I do if this doesn't happen?

 I got following:
 root@ceph-02:~# ceph osd down osd.29
 marked down osd.29.

 root@ceph-02:~# ceph osd tree
 2014-06-23 08:51:00.588042 7f15747f5700  0 -- :/1018258 
 172.20.2.11:6789/0 pipe(0x7f157002a370 sd=3 :0 s=1 pgs=0 cs=0 l=1
 c=0x7f157002a5d0).fault
 # idweight  type name   up/down reweight
 -1  203.8   root default
 -3  203.8   rack unknownrack
 -2  29.12   host ceph-01
 52  3.64osd.52  up  1
 53  3.64osd.53  up  1
 54  3.64osd.54  up  1
 55  3.64osd.55  up  1
 56  3.64osd.56  up  1
 57  3.64osd.57  up  1
 58  3.64osd.58  up  1
 59  3.64osd.59  up  1
 -4  43.68   host ceph-02
 8   3.64osd.8   up  1
 10  3.64osd.10  up  1
 9   3.64osd.9   up  0.8936
 11  3.64osd.11  up  0.9022
 12  3.64osd.12  up  0.8664
 13  3.64osd.13  up  0.9084
 14  3.64osd.14  up  0.8097
 15  3.64osd.15  up  0.893
 29  0   osd.29  up  0
 ...

 osd.29 is marked as up and can't be unmounted, because it's in use. If I
 kill the OSD process, it gets automatically restarted right away.

"ceph osd set noup" will prevent OSDs from being marked up. Later, remember
to run "ceph osd unset noup".

You can stop the OSD with "stop ceph-osd id=29".

 My ceph-version is
 ceph --version
 ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)

 The OSD-node is an Linux ceph-02 3.11.0-15-generic #25~precise1-Ubuntu

 Any hints?

 The ugly way is to simply remove the unused OSD - but I want to know how
 this should normally work.

 Udo
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Expanding pg's of an erasure coded pool

2014-05-22 Thread Henrik Korkuc
On 2014.05.22 19:55, Gregory Farnum wrote:
 On Thu, May 22, 2014 at 4:09 AM, Kenneth Waegeman
 kenneth.waege...@ugent.be wrote:
 - Message from Gregory Farnum g...@inktank.com -
Date: Wed, 21 May 2014 15:46:17 -0700

From: Gregory Farnum g...@inktank.com
 Subject: Re: [ceph-users] Expanding pg's of an erasure coded pool
  To: Kenneth Waegeman kenneth.waege...@ugent.be
  Cc: ceph-users ceph-users@lists.ceph.com


 On Wed, May 21, 2014 at 3:52 AM, Kenneth Waegeman
 kenneth.waege...@ugent.be wrote:
 Thanks! I increased the max processes parameter for all daemons quite a
 lot
 (until ulimit -u 3802720)

 These are the limits for the daemons now..
 [root@ ~]# cat /proc/17006/limits
  Limit                     Soft Limit           Hard Limit           Units
  Max cpu time              unlimited            unlimited            seconds
  Max file size             unlimited            unlimited            bytes
  Max data size             unlimited            unlimited            bytes
  Max stack size            10485760             unlimited            bytes
  Max core file size        unlimited            unlimited            bytes
  Max resident set          unlimited            unlimited            bytes
  Max processes             3802720              3802720              processes
  Max open files            32768                32768                files
  Max locked memory         65536                65536                bytes
  Max address space         unlimited            unlimited            bytes
  Max file locks            unlimited            unlimited            locks
  Max pending signals       95068                95068                signals
  Max msgqueue size         819200               819200               bytes
  Max nice priority         0                    0
  Max realtime priority     0                    0
  Max realtime timeout      unlimited            unlimited            us

 But this didn't help. Are there other parameters I should change?

 Hrm, is it exactly the same stack trace? You might need to bump the
 open files limit as well, although I'd be surprised. :/

 I increased the open file limit as test to 128000, still the same results.

 Stack trace:
 snip

 But I see some things happening on the system while doing this too:



 [root@ ~]# ceph osd pool set ecdata15 pgp_num 4096
 set pool 16 pgp_num to 4096
 [root@ ~]# ceph status
 Traceback (most recent call last):
   File /usr/bin/ceph, line 830, in module
 sys.exit(main())
   File /usr/bin/ceph, line 590, in main
 conffile=conffile)
   File /usr/lib/python2.6/site-packages/rados.py, line 198, in __init__
 librados_path = find_library('rados')
   File /usr/lib64/python2.6/ctypes/util.py, line 209, in find_library
 return _findSoname_ldconfig(name) or _get_soname(_findLib_gcc(name))
   File /usr/lib64/python2.6/ctypes/util.py, line 203, in
 _findSoname_ldconfig
 os.popen('LANG=C /sbin/ldconfig -p 2/dev/null').read())
 OSError: [Errno 12] Cannot allocate memory
 [root@ ~]# lsof | wc
 -bash: fork: Cannot allocate memory
 [root@ ~]# lsof | wc
   21801  211209 3230028
 [root@ ~]# ceph status
 ^CError connecting to cluster: InterruptedOrTimeoutError
 ^[[A[root@ ~]# lsof | wc
2028   17476  190947



 And meanwhile the daemons has then been crashed.

 I verified the memory never ran out.
 Is there anything in dmesg? It sure looks like the OS thinks it's run
 out of memory one way or another.
 -Greg
 Software Engineer #42 @ http://inktank.com | http://ceph.com
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Might it be related to memory fragmentation?
http://dom.as/2014/01/17/on-swapping-and-kernels/

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 0.80 binaries?

2014-05-08 Thread Henrik Korkuc
hi,

I am not sure about your link, but I use: http://ceph.com/rpm-firefly/

reference: http://ceph.com/docs/master/install/get-packages/

On 2014.05.08 19:32, Shawn Edwards wrote:
 The links on the download page for 0.80 still shows 0.72 bins.  Did
 the 0.80 binaries get deployed yet?

 I'm looking here: http://ceph.com/rpm/el6/x86_64/

 Should I be looking elsewhere?

 -- 
  Shawn Edwards
  Beware programmers with screwdrivers.  They tend to spill them on
 their keyboards.


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] 0.80 Firefly Debian/Ubuntu Trusty Packages

2014-05-08 Thread Henrik Korkuc
hi,
trusty will include ceph in usual repos. I am tracking
http://packages.ubuntu.com/trusty/ceph and
https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1278466 for release

On 2014.05.08 23:45, Michael wrote:
 Hi,

 Have these been missed or have they been held back for a specific reason?
 http://ceph.com/debian-firefly/dists/ looks like Trusty is the only
 one that hasn't been updated.

 -Michael
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] health HEALTH_WARN too few pgs per osd (16 < min 20)

2014-05-07 Thread Henrik Korkuc
On 2014.05.07 20:28, *sm1Ly wrote:
 I deployed my cluster with these commands.

 mkdir clustername
  
 cd clustername
  
 ceph-deploy install mon1 mon2 mon3 mds1 mds2 mds3 osd200
  
 ceph-deploy  new  mon1 mon2 mon3
  
 ceph-deploy mon create  mon1 mon2 mon3
  
 ceph-deploy gatherkeys  mon1 mon2 mon3
  
 ceph-deploy osd prepare --fs-type ext4 osd200:/osd/osd1
 osd200:/osd/osd2 osd200:/osd/osd3 osd200:/osd/osd4 osd200:/osd/osd5
 osd200:/osd/osd6 osd200:/osd/osd7 osd200:/osd/osd8 osd200:/osd/osd9
 osd200:/osd/osd10 osd200:/osd/osd11 osd200:/osd/osd12

 ceph-deploy osd activate osd200:/osd/osd1 osd200:/osd/osd2
 osd200:/osd/osd3 osd200:/osd/osd4 osd200:/osd/osd5 osd200:/osd/osd6
 osd200:/osd/osd7 osd200:/osd/osd8 osd200:/osd/osd9 osd200:/osd/osd10
 osd200:/osd/osd11 osd200:/osd/osd12

  
 ceph-deploy admin mon1 mon2 mon3 mds1 mds2 mds3 osd200 salt1
  
 ceph-deploy mds create mds1 mds2 mds3

 but in the end...:
  
 sudo ceph -s

 [sm1ly@salt1 ceph]$ sudo ceph -s
 cluster 0b2c9c20-985a-4a39-af8e-ef2325234744
  health HEALTH_WARN 19 pgs degraded; 192 pgs stuck unclean;
 recovery 21/42 objects degraded (50.000%); too few pgs per osd (16 < min 20)
  monmap e1: 3 mons at
 {mon1=10.60.0.110:6789/0,mon2=10.60.0.111:6789/0,mon3=10.60.0.112:6789/0},
 election epoch 6, quorum 0,1,2 mon1,mon2,mon3
  mdsmap e6: 1/1/1 up {0=mds1=up:active}, 2 up:standby
  osdmap e61: 12 osds: 12 up, 12 in
   pgmap v103: 192 pgs, 3 pools, 9470 bytes data, 21 objects
 63751 MB used, 3069 GB / 3299 GB avail
 21/42 objects degraded (50.000%)
  159 active
   14 active+remapped
   19 active+degraded


 mon[123] and mds[123] are VMs; osd200 is a hardware server, because on VMs
 it showed bad performance.

 Some searching tells me that the problem is that I have only one OSD
 node. Can I ignore it for tests?
"19 pgs degraded; 192 pgs stuck unclean; recovery 21/42 objects degraded
(50.000%)" - you can ignore this for tests, or edit the CRUSH map so that the
failure domain is osd rather than host; a sketch follows below.
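
A sketch of that CRUSH edit (assuming the default rule that ceph-deploy
creates; back up the map first):

# dump and decompile the current CRUSH map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# in crushmap.txt change the rule step
#   step chooseleaf firstn 0 type host
# to
#   step chooseleaf firstn 0 type osd
# then recompile and inject it
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new

(Setting "osd crush chooseleaf type = 0" in ceph.conf before creating the
cluster achieves the same thing for new deployments.)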

 Another search tells me about placement groups, but I can't find how to get the pgid.
"too few pgs per osd (16 < min 20)" - increase pg_num and pgp_num on your pools, for example as sketched below.
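
A sketch ("rbd" and 256 are just placeholders - pick your own pools and a PG
count sized for 12 OSDs):

# check the current values
ceph osd pool get rbd pg_num
ceph osd pool get rbd pgp_num

# raise pg_num first, then bring pgp_num up to match
ceph osd pool set rbd pg_num 256
ceph osd pool set rbd pgp_num 256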


 -- 
 yours respectfully, Alexander Vasin.

 8 926 1437200
 icq: 9906064


 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph 0.80 and Ubuntu 14.04

2014-04-29 Thread Henrik Korkuc
hi,

Ubuntu 14.04 currently ships ceph 0.79. After the Firefly release the Ubuntu
maintainer will update the ceph version in Ubuntu's repos.
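
If waiting for the archive is not an option, ceph.com's own repo (the
debian-firefly layout discussed elsewhere on this list) can be added once
trusty packages are published there; a sketch, with the release-key import
left to the get-packages doc:

echo deb http://ceph.com/debian-firefly/ trusty main | sudo tee /etc/apt/sources.list.d/ceph.list
sudo apt-get update && sudo apt-get install ceph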

On 2014.04.30 07:08, Kenneth wrote:

 The latest Ceph release is Firefly v0.80, right? Or is it still in beta?
 And Ubuntu is on 14.04.

 Will I be able to install ceph 0.80 on Ubuntu 14.04 for production? If
 not, what is the time frame for when I can install ceph v0.80 on
 Ubuntu 14.04?

 -- 
 Thanks,
 Kenneth



 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph stops responding

2014-03-07 Thread Henrik Korkuc
On 2014.03.05 13:23, Georgios Dimitrakakis wrote:
 Actually there are two monitors (my bad in the previous e-mail).
 One on the MASTER and one on the CLIENT.

 The monitor in CLIENT is failing with the following

 2014-03-05 13:08:38.821135 7f76ba82b700  1
 mon.client1@0(leader).paxos(paxos active c 25603..26314) is_readable
 now=2014-03-05 13:08:38.821136 lease_expire=2014-03-05 13:08:40.845978
 has v0 lc 26314
 2014-03-05 13:08:40.599287 7f76bb22c700  0
 mon.client1@0(leader).data_health(86) update_stats avail 4% total
 51606140 used 46645692 avail 2339008
 2014-03-05 13:08:40.599527 7f76bb22c700 -1
 mon.client1@0(leader).data_health(86) reached critical levels of
 available space on data store -- shutdown!
 2014-03-05 13:08:40.599530 7f76bb22c700  0 ** Shutdown via Data Health
 Service **
 2014-03-05 13:08:40.599557 7f76b9328700 -1 mon.client1@0(leader) e2
 *** Got Signal Interrupt ***
 2014-03-05 13:08:40.599568 7f76b9328700  1 mon.client1@0(leader) e2
 shutdown
 2014-03-05 13:08:40.599602 7f76b9328700  0 quorum service shutdown
 2014-03-05 13:08:40.599609 7f76b9328700  0
 mon.client1@0(shutdown).health(86) HealthMonitor::service_shutdown 1
 services
 2014-03-05 13:08:40.599613 7f76b9328700  0 quorum service shutdown


 The thing is that there is plenty of space on that host (CLIENT):

 # df -h
 Filesystem                  Size  Used Avail Use% Mounted on
 /dev/mapper/vg_one-lv_root   50G   45G  2.3G  96% /
 tmpfs                       5.9G     0  5.9G   0% /dev/shm
 /dev/sda1                   485M   76M  384M  17% /boot
 /dev/mapper/vg_one-lv_home  862G  249G  569G  31% /home


 On the other hand, the other host (MASTER) is running low on disk space
 (93% is full).

 But why is the CLIENT failing while the MASTER is still running, even
 though it is also running low on disk space?


CLIENT has a smaller percentage of space available than MASTER (96% used vs
93%); I guess that is your problem. The "reached critical levels of available
space on data store" message above is the monitor shutting itself down once
free space drops below its critical threshold; see the sketch below.
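
A sketch of how to check what the monitor is reacting to (the option names
match the data_health messages in the log above; the admin socket path is
the default location, adjust if yours differs):

# how much space the mon store itself uses
du -sh /var/lib/ceph/mon/*

# the percent-free thresholds at which the mon warns and then shuts down
ceph --admin-daemon /var/run/ceph/ceph-mon.client1.asok config get mon_data_avail_warn
ceph --admin-daemon /var/run/ceph/ceph-mon.client1.asok config get mon_data_avail_crit

Freeing space on / (or moving the mon data dir to the larger /home volume)
should keep it from shutting down again.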


 I 'll try to free some space and see what happens next...

 Best,

 G.



 On Wed, 05 Mar 2014 11:50:57 +0100, Wido den Hollander wrote:
 On 03/05/2014 11:21 AM, Georgios Dimitrakakis wrote:
 My setup consists of two nodes.

 The first node (master) is running:

 -mds
 -mon
 -osd.0



 and the second node (CLIENT) is running:

 -osd.1


 Therefore I 've restarted ceph services on both nodes


 Leaving ceph -w running for as long as it can, after a few seconds
 the error that is produced is this:

 2014-03-05 12:08:17.715699 7fba13fff700  0 monclient: hunting for
 new mon
 2014-03-05 12:08:17.716108 7fba102f8700  0 -- 192.168.0.10:0/1008298 >>
 X.Y.Z.X:6789/0 pipe(0x7fba08008e50 sd=4 :0 s=1 pgs=0 cs=0 l=1
 c=0x7fba080090b0).fault


 (where X.Y.Z.X is the public IP of the CLIENT node).

 And it keeps going on...

 ceph health after a few minutes shows the following:

 2014-03-05 12:12:58.355677 7effc52fb700  0 monclient(hunting):
 authenticate timed out after 300
 2014-03-05 12:12:58.355717 7effc52fb700  0 librados: client.admin
 authentication error (110) Connection timed out
 Error connecting to cluster: TimedOut


 Any ideas now??


 Is the monitor actually running on the first node? If not, check
 the logs in /var/log/ceph to see why it isn't running.

 Or maybe you just need to start it.

 Wido

 Best,

 G.

 On Wed, 5 Mar 2014 15:10:25 +0530, Srinivasa Rao Ragolu wrote:
 First try to start the OSDs by restarting the ceph service on the OSD
 nodes. If that works fine you should be able to see the ceph-osd processes
 in the process list. You do not need to add any public or cluster
 network to ceph.conf. If none of the OSDs run, you need to
 reconfigure them from the monitor node.

 Please check whether the ceph-mon process is running on the monitor node;
 ceph-mds should not be running.

 Also check that the /etc/hosts file has valid IP addresses for the cluster nodes.

 Finally, check that ceph.client.admin.keyring and ceph.bootstrap-osd.keyring
 match on all the cluster nodes. A sketch of these checks follows below.
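
 A sketch of those checks (paths assume a default ceph-deploy layout; the
 bootstrap keyring sits in the ceph-deploy working directory and is installed
 on the nodes under /var/lib/ceph - adjust to your setup):

 # are the daemons actually running?
 ps aux | grep -E 'ceph-(mon|osd)' | grep -v grep

 # do all nodes agree on each other's addresses?
 cat /etc/hosts

 # do the keyrings match across nodes?
 md5sum /etc/ceph/ceph.client.admin.keyring
 md5sum /var/lib/ceph/bootstrap-osd/ceph.keyring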

 Best of luck.
 Srinivas.

 On Wed, Mar 5, 2014 at 3:04 PM, Georgios Dimitrakakis  wrote:

 Hi!

 I have installed ceph and created two osds and was very happy with
 that but apparently not everything was correct.

 Today after a system reboot the cluster comes up and for a few
 moments it seems that it's ok (using the ceph health command), but
 after a few seconds the ceph health command doesn't produce any
 output at all.

 It just stays there without anything on the screen...

 ceph -w is doing the same as well...

 If I restart the ceph services (service ceph restart) again for a
 few seconds is working but after a few more it stays frozen.

 Initially I thought that this was a firewall problem but apparently
 it isn't.

 Then I thought that this had to do with the

 public_network

 cluster_network

 not being defined in ceph.conf and changed that.

 No matter what I do, the cluster works for a few seconds after
 the service restart and then it stops responding...

 Any help much appreciated!!!

 Best,

 G.
 ___
 ceph-users