> On Mar 1, 2017, at 9:39 AM, Peter Maloney
> <peter.malo...@brockmann-consult.de> wrote:
>
> On 03/01/17 15:36, Heller, Chris wrote:
>> I see. My journal is specified in ceph.conf. I'm not removing it from the
>> OSD so sounds like flushing isn't needed in my case
I see. My journal is specified in ceph.conf. I'm not removing it from the OSD
so sounds like flushing isn't needed in my case.
-Chris
> On Mar 1, 2017, at 9:31 AM, Peter Maloney
> <peter.malo...@brockmann-consult.de> wrote:
>
> On 03/01/17 14:41, Heller, Chris wrote:
>>
> <peter.malo...@brockmann-consult.de> wrote:
>
> On 02/28/17 18:55, Heller, Chris wrote:
>> Quick update. So I'm trying out the procedure as documented here.
>>
>> So far I've:
>>
>> 1. Stopped ceph-mds
>> 2. set noout, norecover, norebalance, nobackfill
> …so that all components start up correctly. We then set the OSD weights back
> to the normal value so that the cluster rebalanced.
>
> With this procedure the cluster was always up.
>
> Regards
>
> Steffen
>
>
>>>> "Heller, Chris" <chel...@akamai
'norecover' flag is still set.
I'm going to wait out the recovery and see if the Ceph FS is OK. That would be
huge if it is. But I am curious why I lost an OSD, and why recovery is
happening with 'norecover' still set.
-Chris
> On Feb 28, 2017, at 4:51 AM, Peter Maloney
> <peter.ma
I am attempting an operating system upgrade of a live Ceph cluster. Before I go
and screw up my production system, I have been testing on a smaller
installation, and I keep running into issues when bringing the Ceph FS metadata
server online.
My approach here has been to store all Ceph critical
Just a thought, but since a directory tree is a first-class item in cephfs,
could the wire protocol be extended with a “recursive delete” operation,
specifically for cases like this?
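Absent such a server-side operation, a client has to walk the tree itself, one round trip per entry, which is exactly why deletes of millions of files crawl. A minimal sketch of that client-side behavior (plain POSIX calls via Python's os module; nothing here is part of the CephFS or cephfs-hadoop API):

```python
import os
import tempfile

def recursive_delete(path):
    """Client-side recursive delete: one call per entry.

    A server-side "recursive delete" op would replace this whole walk
    with a single request.
    """
    # topdown=False yields leaf directories first, so every directory
    # is already empty by the time its parent iteration rmdir's it.
    for dirpath, dirnames, filenames in os.walk(path, topdown=False):
        for name in filenames:
            os.unlink(os.path.join(dirpath, name))
        for name in dirnames:
            os.rmdir(os.path.join(dirpath, name))
    os.rmdir(path)

# Build a small tree and remove it in one call.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "a", "b"))
open(os.path.join(root, "a", "b", "f.txt"), "w").close()
recursive_delete(root)
print(os.path.exists(root))  # False
```

Each unlink/rmdir above maps to at least one MDS request over the wire, which is the cost the proposed operation would amortize.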
On 10/14/16, 4:16 PM, "Gregory Farnum" <gfar...@redhat.com> wrote:
On Fri, Oct 14, 2016 at
Ok. Since I’m running through the Hadoop/ceph api, there is no syscall boundary,
so there is a simple place to improve the throughput here. Good to know; I’ll
work on a patch…
On 10/14/16, 3:58 PM, "Gregory Farnum" <gfar...@redhat.com> wrote:
On Fri, Oct 14, 2016 at 11:41 A
?
-Chris
On 10/13/16, 4:22 PM, "Gregory Farnum" <gfar...@redhat.com> wrote:
On Thu, Oct 13, 2016 at 12:44 PM, Heller, Chris <chel...@akamai.com> wrote:
> I have a directory I’ve been trying to remove from cephfs (via
> cephfs-hadoop), the directory is a f
I have a directory I’ve been trying to remove from cephfs (via cephfs-hadoop);
the directory is a few hundred gigabytes in size and contains a few million
files, but not in a single subdirectory. I started the delete yesterday at
around 6:30 EST, and it’s still progressing. I can see from (ceph
": false,
"inst": "client.585194220 192.168.1.157:0\/634334964",
"client_metadata": {
"ceph_sha1": "d56bdf93ced6b80b07397d57e3fa68fe68304432",
"ceph_version": "ceph version 0.94.7
(d56bdf93ced6b80b07397d57e3fa68fe68304432)",
I also went and bumped mds_cache_size up to 1 million… still seeing cache
pressure, but I might just need to evict those clients…
On 9/21/16, 9:24 PM, "Heller, Chris" <chel...@akamai.com> wrote:
What is the interesting value in ‘session ls’? Is it ‘num_leases’ or
‘num_caps
What is the interesting value in ‘session ls’? Is it ‘num_leases’ or ‘num_caps’
‘num_leases’ appears to be, on average, 1. But ‘num_caps’ seems to be 16385 for
many, many clients!
-Chris
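For anyone following along, the heavy holders stand out if you filter the `session ls` dump by `num_caps`. A sketch, using a hypothetical excerpt shaped like the hammer-era output quoted later in this thread (the values here are made up):

```python
import json

# Hypothetical excerpt of `ceph daemon mds.<id> session ls` output.
session_ls = json.loads("""
[
  {"id": 491885178, "num_leases": 1, "num_caps": 16385,
   "inst": "client.491885178 192.168.1.157:0/634334964"},
  {"id": 491885188, "num_leases": 1, "num_caps": 120,
   "inst": "client.491885188 192.168.1.158:0/634334965"}
]
""")

# num_caps is the interesting number for cache pressure: each cap a
# client holds pins an inode in the MDS cache, while leases are
# short-lived and average around 1 here.
heavy = [s["inst"] for s in session_ls if s["num_caps"] > 10000]
print(heavy)  # just the client holding 16385 caps
```

Sorting by `num_caps` across all sessions gives a quick list of which clients to consider evicting.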
On 9/21/16, 9:22 PM, "Gregory Farnum" <gfar...@redhat.com> wrote:
On Wed, Sep 21, 2016 at 6:13 P
2016 at 1:16 PM, Heller, Chris <chel...@akamai.com> wrote:
> Ok. I just ran into this issue again. The mds rolled after many clients
were failing to relieve cache pressure.
That definitely could have had something to do with it, if say they
overloaded the MDS so much it got
ce3b626700 3 mds.0.server handle_client_session
client_session(request_renewcaps seq 364) v1 from client.491885178
2016-09-21 20:15:58.159134 7fce3b626700 3 mds.0.server handle_client_session
client_session(request_renewcaps seq 364) v1 from client.491885188
-Chris
On 9/21/16, 11:23 AM, "Heller, Ch
1 mds.-1.0 log_to_monitors
{default=true}
2016-09-21 15:13:27.329181 7f68969e9700 1 mds.-1.0 handle_mds_map standby
2016-09-21 15:13:28.484148 7f68969e9700 1 mds.-1.0 handle_mds_map standby
2016-09-21 15:13:33.280376 7f68969e9700 1 mds.-1.0 handle_mds_map standby
On 9/21/16, 10:48 AM, "He
the summary).
-Chris
On 9/21/16, 10:46 AM, "Gregory Farnum" <gfar...@redhat.com> wrote:
On Wed, Sep 21, 2016 at 6:30 AM, Heller, Chris <chel...@akamai.com> wrote:
> I’m running a production 0.94.7 Ceph cluster, and have been seeing a
> periodic issue arise
I’m running a production 0.94.7 Ceph cluster, and have been seeing a periodic
issue arise wherein all my MDS clients become stuck, and the fix so far
has been to restart the active MDS (sometimes I need to restart the subsequent
active MDS as well).
These clients are using the
Ok. I’ll see about tracking down the logs (set to stderr for these tasks), and
the metadata stuff looks interesting for future association.
Thanks,
Chris
On 9/14/16, 5:04 PM, "Gregory Farnum" <gfar...@redhat.com> wrote:
On Wed, Sep 14, 2016 at 7:02 AM, Heller, Chris <
I am making use of CephFS plus the cephfs-hadoop shim to replace HDFS in a
system I’ve been experimenting with.
I’ve noticed that a large number of my HDFS clients have a ‘num_caps’ value of
16385, as seen when running ‘session ls’ on the active mds. This appears to be
one larger than the
I’d like to generate keys for Ceph externally, on a system that doesn’t have
ceph-authtool.
Looking over the Ceph website and googling have turned up nothing.
Is the Ceph auth key generation algorithm documented anywhere?
-Chris
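For what it's worth, a commonly circulated sketch generates a keyring-compatible secret without ceph-authtool: base64 of a small header plus 16 random bytes. The layout below (u16 type, u32/u32 creation time, u16 length, key bytes) is inferred from the keyrings ceph-authtool produces; verify it against your Ceph version before relying on it.

```python
import base64
import os
import struct
import time

def generate_cephx_key():
    """Generate a cephx secret string without ceph-authtool.

    Assumed layout (little-endian): u16 type (1 = AES), u32 seconds +
    u32 nanoseconds of creation time, u16 key length, then the 16
    random key bytes, all base64-encoded.
    """
    key = os.urandom(16)
    header = struct.pack("<hiih", 1, int(time.time()), 0, len(key))
    return base64.b64encode(header + key).decode("ascii")

secret = generate_cephx_key()
print(secret)
```

The resulting string can be dropped into a keyring file's `key = ...` line, assuming the layout above matches your release.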
be marked as ‘found’ once it returns to the network?
-Chris
From: Goncalo Borges <goncalo.bor...@sydney.edu.au>
Date: Monday, August 15, 2016 at 11:36 PM
To: "Heller, Chris" <chel...@akamai.com>, "ceph-users@lists.ceph.com"
<ceph-users@lists.ceph.com>
Subject: Re
bor...@sydney.edu.au>
Date: Monday, August 15, 2016 at 9:03 PM
To: "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>, "Heller, Chris"
<chel...@akamai.com>
Subject: Re: [ceph-users] PG is in 'stuck unclean' state, but all acting OSD
are up
Hi Heller
I’d like to better understand the current state of my Ceph cluster.
I currently have 2 PG that are in the ‘stuck unclean’ state:
# ceph health detail
HEALTH_WARN 2 pgs down; 2 pgs peering; 2 pgs stuck inactive; 2 pgs stuck unclean
pg 4.2a8 is stuck inactive for 124516.91, current state