Before issuing a scrub, you may want to check whether those scrub errors point to one disk/OSD (or a small subset of them), and if so, whether those objects were put within a specific time interval.
That is a large number of scrub errors for a small cluster; could it be caused by a hardware issue?
It is actually not part of Ceph.
Some of the files under that folder are only accessed during OSD boot-up, so removing them would not cause a problem there. For some other files, the OSD keeps an open handle, in which case, even if you remove those files from within the filesystem, they are not actually erased.
IIRC, the version gets increased once the stats of the PG change; that is probably the reason why you saw it changing with client I/O.
Thanks,
Guang
> Date: Thu, 17 Sep 2015 16:55:41 -0600
> From: rob...@leblancnet.us
> To: ceph-users@lists.ceph.com
>
Which version are you using?
My guess is that the request (op) is waiting for a lock (it might be the ondisk_read_lock of the object); a debug_osd=20 log should be helpful to tell what happened to the op.
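For reference, the debug level can be raised on a running OSD via injectargs (the OSD id here is illustrative):
ceph tell osd.12 injectargs '--debug-osd 20/20'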
How do you tell that the I/O wait is near zero (via top?)?
Thanks,
Guang
If we are talking about requests being blocked for 60+ seconds, those tunings might not help (they help a lot with average latency during recovery/backfill).
It would be interesting to see the logs for those blocked requests on the OSD side (they are logged at level 0); the pattern to search for might be "slow request".
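A quick way to surface those entries (the log path is the packaging default; adjust for your deployment):
grep 'slow request' /var/log/ceph/ceph-osd.*.log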
> Shinobu
>
> - Original Message -
> From: "GuangYang" <yguan...@outlook.com>
> To: "Ben Hines" <bhi...@gmail.com>, "Nick Fisk" <n...@fisk.me.uk>
> Cc: "ceph-users" <ceph-users@lists.ceph.com>
> Sent: Sa
IIRC, it only triggers the move (merge or split) when that folder is hit by a
request, so most likely it happens gradually.
Another thing that might be helpful (and that we have had good experience with) is to do the folder splitting at pool creation time, so that we avoid the performance impact of splitting while the cluster is serving traffic.
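For illustration, recent releases let you pre-split at pool creation by passing an expected object count, combined with a negative merge threshold so the pre-split folders are never merged back (all values below are examples, not recommendations). In ceph.conf:
filestore merge threshold = -10
Then:
ceph osd pool create mypool 1024 1024 replicated replicated_ruleset 1000000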
Date: Fri, 28 Aug 2015 12:07:39 +0100
From: gfar...@redhat.com
To: vickey.singh22...@gmail.com
CC: ceph-users@lists.ceph.com; ceph-us...@ceph.com; ceph-de...@vger.kernel.org
Subject: Re: [ceph-users] Opensource plugin for pulling out cluster recovery
I realize these nodes are quite large; I have plans to break them out into 12 OSDs/node.
On Thu, Aug 13, 2015 at 9:02 AM, GuangYang yguan...@outlook.com wrote:
Could you share the 'ceph osd tree' dump and the CRUSH map dump?
Thanks,
Guang
it could incur lots of data
movement...
To: ceph-users@lists.ceph.com
From: vedran.fu...@gmail.com
Date: Fri, 14 Aug 2015 00:15:17 +0200
Subject: Re: [ceph-users] OSD space imbalance
On 13.08.2015 18:01, GuangYang wrote:
Try 'ceph osd reweight-by-pg <int>' right after creating the pools? What is the typical object size in the cluster?
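For example, with an overload threshold of 110% (the value is illustrative):
ceph osd reweight-by-pg 110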
Thanks,
Guang
To: ceph-users@lists.ceph.com
From: vedran.fu...@gmail.com
Date: Thu, 13 Aug 2015 14:58:11 +0200
Subject: [ceph-users]
Could you share the 'ceph osd tree' dump and the CRUSH map dump?
Thanks,
Guang
Date: Thu, 13 Aug 2015 08:16:09 -0700
From: sdain...@spd1.com
To: yangyongp...@bwstor.com.cn; ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Cluster health_warn 1
If you are using the default configuration to create the pool (3 replicas), then after losing 1 OSD and having 2 left, CRUSH would not be able to find enough OSDs (at least 3) to map the PG, so it would be stuck unclean.
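For illustration, the replication requirement can be inspected and, as a stopgap, lowered (the pool name is a placeholder, and reducing size trades away durability):
ceph osd pool get <pool> size
ceph osd pool set <pool> size 2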
Thanks,
Guang
Thanks,
Guang
Date: Wed, 24 Jun 2015 17:21:04 -0400
From: yeh...@redhat.com
To: yguan...@outlook.com
CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
Subject: Re: radosgw crash within libfcgi
- Original Message -
From: GuangYang
To: Yehuda Sadeh-Weinraub
Cc: ceph-de
; ceph-users@lists.ceph.com
Subject: Re: radosgw crash within libfcgi
- Original Message -
From: GuangYang yguan...@outlook.com
To: ceph-de...@vger.kernel.org, ceph-users@lists.ceph.com, yeh...@redhat.com
Sent: Wednesday, June 24, 2015 10:09:58 AM
Subject: radosgw crash within
Hello Cephers,
Recently we have had several radosgw daemon crashes with the same kernel log:
Jun 23 14:17:38 xxx kernel: radosgw[68180]: segfault at f0 ip 7ffa069996f2 sp 7ff55c432710 error 6 in libfcgi.so.0.0.0[7ffa06995000+a000]
Looking
Hi Cephers,
While looking at disk utilization on an OSD, I noticed the disk was constantly busy with a large number of small writes. Further investigation showed that, as radosgw uses xattrs to store metadata (e.g. etag, content-type, etc.), the xattrs spilled from inline (local) storage into extents, which
...@newdream.net
CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
Subject: Re: xattrs vs. omap with radosgw
On 06/16/2015 03:48 PM, GuangYang wrote:
Thanks Sage for the quick response.
It is on Firefly v0.80.4.
While trying to put with *rados* directly, the xattrs can be inlined
[mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Sage Weil
Sent: Wednesday, June 17, 2015 3:43 AM
To: GuangYang
Cc: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
Subject: Re: xattrs vs. omap with radosgw
On Tue, 16 Jun 2015, GuangYang wrote:
Hi Cephers,
While looking at disk
Thanks to Sam, we can use:
ceph pg <pg_id> list_missing
to get the list of unfound objects.
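For example (the PG id is illustrative):
ceph pg 3.1f list_missing
The JSON output includes an "objects" array naming each missing/unfound object.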
Thanks,
Guang
From: yguan...@outlook.com
To: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
Date: Mon, 15 Jun 2015 16:46:53 +
Subject: [ceph-users]
Hello Cephers,
On one of our production clusters, there is one *unfound* object reported, which makes the PG stuck at recovering. While trying to recover the object, I failed to find a way to tell which object is unfound.
I tried:
1. PG query
2. Grepping the monitor log
Did I miss anything?
Hi cephers,
Recently I have been investigating the geo-replication of rgw. From the example at [1], it looks like if we want to do data geo-replication between US East and US West, we would need to build *one* (super) RADOS cluster which crosses US East and West, and only deploy two different radosgw
Date: Fri, 24 Apr 2015 17:29:40 +0530
From: vum...@redhat.com
To: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] rgw geo-replication
On 04/24/2015 05:17 PM, GuangYang wrote:
Hi cephers,
Recently I am investigating the geo-replication of rgw
We have a tiny script which does the CRUSH re-weight based on the PGs/OSD to achieve balance across OSDs, and we run the script right after setting up the cluster, to avoid data migration after the cluster fills up (a sketch of the idea follows below).
A couple of experiences to share:
1. As suggested, it is helpful to choose a
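A minimal sketch of that kind of reweight script (not the author's actual script; it assumes 'ceph osd df --format json' reports "id", "crush_weight" and "pgs" per OSD, which varies by release):

import json, subprocess

# Nudge each OSD's CRUSH weight toward the mean PGs/OSD (illustrative).
data = json.loads(subprocess.check_output(
    ["ceph", "osd", "df", "--format", "json"]))
nodes = [n for n in data["nodes"] if n["pgs"] > 0]
mean_pgs = sum(n["pgs"] for n in nodes) / float(len(nodes))
for n in nodes:
    new_weight = n["crush_weight"] * mean_pgs / n["pgs"]
    subprocess.check_call(["ceph", "osd", "crush", "reweight",
                           "osd.%d" % n["id"], "%.4f" % new_weight])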
We have had good experience so far keeping each bucket under 0.5 million objects via client-side sharding (see the sketch below). But I think it would be nice if you could test at your scale, with your hardware configuration, as well as against your expectations for tail latency.
Generally the bucket sharding should
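A minimal sketch of the client-side sharding mentioned above (the shard count and all names are illustrative assumptions):

import hashlib

NUM_SHARDS = 64  # sized so each shard stays well under ~0.5M objects

def shard_bucket(base_bucket, key):
    # Deterministically map an object key to one of NUM_SHARDS buckets
    # (e.g. "images-17"), so no single bucket index grows too large.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return "%s-%d" % (base_bucket, int(digest, 16) % NUM_SHARDS)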
Hi Sage,
Is there any timeline around the switch, so that we can plan ahead for the testing?
We are running apache + mod_fastcgi in production at scale (540 OSDs, 9 RGW hosts) and it looks good so far, although at the beginning we came across a problem with a large volume of 500 errors, which
Thanks Sage!
Date: Fri, 7 Nov 2014 02:19:06 -0800
From: s...@newdream.net
To: yguan...@outlook.com
CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
Subject: Re: PG inconsistency
On Thu, 6 Nov 2014, GuangYang wrote:
Hello Cephers,
Recently
Hello Cephers,
Recently we observed a couple of inconsistencies in our Ceph cluster. There were two major patterns leading to inconsistency as I observed: 1) EIO when reading the file, 2) the digest is inconsistent (for EC) even though there is no read error. While Ceph has built-in tool sets to repair
I don't know if this is a best or even good practice, but it works for us.
Cheers, Dan
On Thu Nov 06 2014 at 2:24:32 PM GuangYang
yguan...@outlook.com wrote:
Hello Cephers,
Recently we observed a couple of inconsistencies in our Ceph cluster,
there were two major
ceph-users@lists.ceph.com
What is your version of Ceph?
0.80.0 - 0.80.3
https://github.com/ceph/ceph/commit/7557a8139425d1705b481d7f010683169fd5e49b
Thu Nov 06 2014 at 16:24:21, GuangYang
yguan...@outlook.com:
Hello Cephers,
Recently we observed a couple
Date: Thu, 23 Oct 2014 21:26:07 -0700
From: s...@newdream.net
To: yguan...@outlook.com
CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
Subject: RE: Filestore throttling
On Fri, 24 Oct 2014, GuangYang wrote:
commit
---
Date: Thu, 23 Oct 2014 06:58:58 -0700
From: s...@newdream.net
To: yguan...@outlook.com
CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
Subject: RE: Filestore throttling
On Thu, 23 Oct 2014, GuangYang wrote:
Thanks Sage for the quick
Hello Cephers,
During our testing, I found that the filestore throttling became a limiting factor for performance. The four settings (with default values) are:
filestore queue max ops = 50
filestore queue max bytes = 100 << 20
filestore queue committing max ops = 500
filestore queue committing max bytes = 100 << 20
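For illustration, these can be raised in ceph.conf under [osd] (values are examples only, not recommendations):
[osd]
filestore queue max ops = 500
filestore queue max bytes = 1048576000
filestore queue committing max ops = 5000
filestore queue committing max bytes = 1048576000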
CC: ceph-de...@vger.kernel.org; ceph-users@lists.ceph.com
Subject: Re: Filestore throttling
On Thu, 23 Oct 2014, GuangYang wrote:
Hello Cephers,
During our testing, I found that the filestore throttling became a limiting
factor for performance, the four settings (with default value
Thanks Greg for the response, my comments inline…
Thanks,
Guang
On Feb 20, 2014, at 11:16 PM, Gregory Farnum g...@inktank.com wrote:
On Tue, Feb 18, 2014 at 7:24 AM, Guang Yang yguan...@yahoo.com wrote:
Hi ceph-users,
We are using Ceph (radosgw) to store user-generated images, as GET latency
Thanks Yehuda.
Try looking at the perfcounters to see if there's any other throttling happening. Also, make sure you have enough PGs for your data pool. One other thing to try is disabling leveldb xattrs and seeing if it affects your latency.
1. There is no throttling happening.
2. According to
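For reference, the perfcounters can be dumped through the daemon admin socket on the OSD host (the daemon id and grep pattern are illustrative):
ceph daemon osd.0 perf dump | grep -i throttle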
Hi ceph-users and ceph-devel,
I came across an issue after restarting the monitors of the cluster: authentication fails, which prevents running any ceph command.
After we did some maintenance work, I restarted the OSDs; however, I found that the OSDs would not join the cluster automatically after being