On 03/06/2015 06:51 AM, Jake Young wrote:
On Thursday, March 5, 2015, Nick Fisk n...@fisk.me.uk wrote:
Hi All,
Just a heads up after a day’s experimentation.
I believe tgt with its default settings has a small
So I'm in the middle of trying to triage a problem with my ceph cluster
running 0.80.5. I have 24 OSDs spread across 8 machines. The cluster has
been running happily for about a year. This last weekend, something caused
the box running the MDS to seize hard, and when we came in on Monday,
several
It looks like you may be able to work around the issue for the moment with
ceph osd set nodeep-scrub
as it looks like it is scrub that is getting stuck?
sage
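For anyone following along, the flag is set and cleared like this (ceph status shows the flag while it is set):

    ceph osd set nodeep-scrub
    ceph -s                      # health shows 'nodeep-scrub flag(s) set'
    ceph osd unset nodeep-scrub  # re-enable once the cluster is stable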
On Fri, 6 Mar 2015, Quentin Hartman wrote:
Ceph health detail - http://pastebin.com/5URX9SsQ
pg dump summary (with active+clean pgs
Alright, tried a few suggestions for repairing this state, but I don't seem
to have any PG replicas that have good copies of the missing / zero length
shards. What do I do now? Telling the PGs to repair doesn't seem to help
anything. I can deal with data loss if I can figure out which images
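One way to map damaged objects back to images: the hex id in an object name like rbd_data.3f7a2ae8944a.* is the image's block_name_prefix, which rbd info prints. A rough sketch, assuming the images live in a pool called rbd:

    for img in $(rbd ls rbd); do
        echo -n "$img: "
        rbd info rbd/$img | grep block_name_prefix
    done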
Thanks for the response. Is this the post you are referring to?
http://ceph.com/community/incomplete-pgs-oh-my/
For what it's worth, this cluster was running happily for the better part
of a year until the event from this weekend that I described in my first
post, so I doubt it's configuration
Hidden HTML ... trying again...
-- Forwarded message --
From: Robert LeBlanc rob...@leblancnet.us
Date: Fri, Mar 6, 2015 at 5:20 PM
Subject: Re: [ceph-users] Prioritize Heartbeat packets
To: ceph-users@lists.ceph.com,
ceph-devel ceph-de...@vger.kernel.org
Ceph health detail - http://pastebin.com/5URX9SsQ
pg dump summary (with active+clean pgs removed) -
http://pastebin.com/Y5ATvWDZ
an osd crash log (in github gist because it was too big for pastebin) -
https://gist.github.com/qhartman/cb0e290df373d284cfb5
And now I've got four OSDs that are
Thanks for the suggestion, but that doesn't seem to have made a difference.
I've shut the entire cluster down and brought it back up, and my config
management system seems to have upgraded ceph to 0.80.8 during the reboot.
Everything seems to have come back up, but I am still seeing the crash
I see that Jian Wen has done work on this for 0.94. I tried looking through
the code to see if I can figure out how to configure this new option, but
it all went over my head pretty quick.
Can I get a brief summary on how to set the priority of heartbeat packets
or where to look in the code to
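If I'm reading Jian Wen's change correctly, it is exposed as a boolean OSD option, so something like this in ceph.conf should enable it (option name from my reading of the 0.94 source; please verify before relying on it):

    [osd]
    osd heartbeat use min delay socket = true

It appears to mark the heartbeat sockets for low-delay treatment so QoS rules can prioritize them.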
Finally found an error that seems to provide some direction:
-1 2015-03-07 02:52:19.378808 7f175b1cf700 0 log [ERR] : scrub 3.18e
e08a418e/rbd_data.3f7a2ae8944a.16c8/7//3 on disk size (0) does
not match object info size (4120576) ajusted for ondisk to (4120576)
I'm diving into
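One way to confirm the zero-length replica on disk is to look for the object in the PG's directory on each OSD (default filestore layout assumed; paths are illustrative):

    find /var/lib/ceph/osd/ceph-*/current/3.18e_head \
        -name '*rbd_data.3f7a2ae8944a*' -ls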
This might be related to the backtrace assert, but that's the problem
you need to focus on. In particular, both of these errors are caused
by the scrub code, which Sage suggested temporarily disabling. If
you're still getting these messages, you clearly haven't disabled it
successfully.
That said,
Here's more information I have been able to glean:
pg 3.5d3 is stuck inactive for 917.471444, current state incomplete, last
acting [24]
pg 3.690 is stuck inactive for 11991.281739, current state incomplete, last
acting [24]
pg 4.ca is stuck inactive for 15905.499058, current state incomplete,
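Querying one of the stuck PGs directly is usually the quickest way to see what it is blocked on, e.g.:

    ceph pg 3.5d3 query

The recovery_state section at the end of the output typically names the blocking condition and which OSDs the PG is waiting to probe.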
Interesting. We've seen things like this on the librbd side in the
past, but I don't think I've seen this kind of behavior in the kernel
client. What does the latency histogram look like when going from 1-2?
Mark
On 03/06/2015 08:10 AM, Nick Fisk wrote:
Just tried cfq, deadline and noop
Thanks!!
On Mar 5, 2015, at 4:09 PM, Yehuda Sadeh-Weinraub yeh...@redhat.com wrote:
The metadata api can do it:
GET /admin/metadata/user
Yehuda
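The admin API needs a signed request from a user with metadata caps; for a quick look, the same metadata is also reachable locally via the CLI (the uid below is a placeholder):

    radosgw-admin metadata list user
    radosgw-admin metadata get user:<uid>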
- Original Message -
From: Joshua Weaver joshua.wea...@ctl.io
To: ceph-us...@ceph.com
Sent: Thursday, March 5, 2015 1:43:33 PM
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jake
Young
Sent: 06 March 2015 12:52
To: Nick Fisk
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] tgt and krbd
On Thursday, March 5, 2015, Nick Fisk n...@fisk.me.uk wrote:
Hi All,
Just a heads up after a
Hi Jake,
Good to see it’s not just me.
I’m guessing that the fact you are doing 1MB writes means that the latency
difference is having a less noticeable impact on the overall write bandwidth.
What I have been discovering with Ceph + iSCSI is that due to all the extra
hops (client-iscsi
On 06/03/2015, at 12.24, Steffen W Sørensen ste...@me.com wrote:
3. What are BCP for maintaining GW pools? Need I run something like GC /
cleanup ops / log object pruning etc.? Any pointers to docs on this?
Is this all the maintenance one should consider on pools for a GW instance?
Just tried cfq, deadline and noop which more or less all show identical results
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Alexandre DERUMIER
Sent: 06 March 2015 11:59
To: Nick Fisk
Cc: ceph-users
Subject: Re: [ceph-users] Strange krbd
On 05/03/2015 18:18, Thomas Lemarchand wrote:
Hello Loïc,
It does exist ... but maybe not at the scale you are looking for:
http://www.fujitsu.com/global/products/computing/storage/eternus-cd/
It's slightly above my price range ;-)
I read a paper about their hardware, it seems like
Hi John,
On 06/03/2015 11:38, John Spray wrote:
On 04/03/2015 00:10, Loic Dachary wrote:
Last week-end I discussed with a friend about a use case many of us thought
about already: it would be cool to have a simple way to assemble Ceph aware
NAS fresh from the store. I summarized the use case
On Thu, Mar 5, 2015 at 8:17 PM, Nick Fisk n...@fisk.me.uk wrote:
I’m seeing a strange queue depth behaviour with a kernel mapped RBD, librbd
does not show this problem.
Cluster is comprised of 4 nodes, 10GB networking; I'm not including OSD details
as the test sample is small so it fits in page cache.
What
My initiator is also VMware software iSCSI. I had my tgt iSCSI targets'
write-cache setting off.
I turned write and read cache on in the middle of creating a large eager
zeroed disk (tgt has no VAAI support, so this is all regular synchronous
IO) and it did give me a clear performance boost.
Not
Histogram is probably the wrong word. In the normal fio output, there
should be a distribution of latencies shown for the test, so you can get
a rough estimate of the skew. It might be interesting to know when you
jump from iodepth=1 to iodepth=2 how that skew changes.
Here's an example:
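(The example itself appears to have been cut off in the digest; fio's standard per-job output includes a latency distribution section that looks roughly like this, with numbers invented for illustration:)

    lat (usec) : 250=0.05%, 500=61.20%, 750=30.10%, 1000=6.30%
    lat (msec) : 2=2.10%, 4=0.25%

Comparing those percentages between the iodepth=1 and iodepth=2 runs shows how the skew changes.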
Hi Ilya,
I meant that the OSD numbers and configuration are probably irrelevant, as the
sample size of 1G fits in the page cache.
This is kernel 3.16 (from Ubuntu 14.04.2, Ceph v0.87.1)
Nick
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Ilya
Hi Mark,
Sorry if I am showing my ignorance here, but is there some sort of flag or tool
that generates this from fio?
Nick
-Original Message-
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Mark
Nelson
Sent: 06 March 2015 15:06
To: ceph-users@lists.ceph.com
Is it possible for all replicas of an object to be saved on the same node?
No (unless you wrongly modify the crushmap manually).
Is it possible to lose any?
With replica x2, if you lose 2 OSDs on 2 different nodes holding the same
object, you'll lose the object.
Is there a mechanism
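The node separation the answer above relies on comes from the default CRUSH rule, which places each replica on a different host. Roughly what it looks like in a decompiled crushmap (rule and root names vary per cluster):

    rule replicated_ruleset {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
    }

The 'step chooseleaf firstn 0 type host' line is what keeps two copies off the same node.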
On Thursday, March 5, 2015, Nick Fisk n...@fisk.me.uk wrote:
Hi All,
Just a heads up after a day’s experimentation.
I believe tgt with its default settings has a small write cache when
exporting a kernel mapped RBD. Doing some write tests I saw 4 times the
write throughput when using
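Rather than relying on tgt's default, the cache behaviour can be pinned per target in /etc/tgt/targets.conf; a sketch (the IQN and device are placeholders, and the write-cache parameter is worth double-checking against your tgt version):

    <target iqn.2015-03.com.example:rbd0>
        backing-store /dev/rbd/rbd/myimage
        write-cache off
    </target>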
Hi Sonal,
You can refer to this doc to identify your problem.
Your error code is 4204, a hex bitmask of missing feature bits, so
* 4000 upgrade to kernel 3.9
* 200 CEPH_FEATURE_CRUSH_TUNABLES2
* 4 CEPH_FEATURE_CRUSH_TUNABLES
*
On Fri, Mar 6, 2015 at 10:18 AM, Nick Fisk n...@fisk.me.uk wrote:
On Fri, Mar 6, 2015 at 9:04 AM, Nick Fisk n...@fisk.me.uk wrote:
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
Jake Young
Sent: 06 March 2015 12:52
To: Nick Fisk
Cc: ceph-users@lists.ceph.com
Nick,
I think this is because the krbd you are using is using Nagle's algorithm,
i.e. TCP_NODELAY = false by default.
The latest krbd module should have the TCP_NODELAY = true by default. You may
want to try that. But, I think it is available in the latest kernel only.
Librbd is running with
On Fri, Mar 6, 2015 at 7:27 PM, Nick Fisk n...@fisk.me.uk wrote:
Hi Somnath,
I think you hit the nail on the head, setting librbd to not use TCP_NODELAY
shows the same behaviour as with krbd.
That's why I asked about the kernel version. TCP_NODELAY is enabled by
default since 4.0-rc1, so if
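For reference, the librbd side of that test is a single userspace messenger knob in ceph.conf (ms tcp nodelay defaults to true):

    [client]
    ms tcp nodelay = false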
Hi Jake,
Good to see it’s not just me.
I’m guessing that the fact you are doing 1MB writes means that the latency
difference is having a less noticeable impact on the overall write bandwidth.
What I have been discovering with Ceph + iSCSI is that due to all the extra
hops (client-iscsi
On 03/06/2015 10:27 AM, Nick Fisk wrote:
Hi Somnath,
I think you hit the nail on the head, setting librbd to not use TCP_NODELAY
shows the same behaviour as with krbd.
Score (another) 1 for Somnath! :)
Mark, if you are still interested, here are the two latency reports
Queue Depth=1
Hi Somnath,
I think you hit the nail on the head, setting librbd to not use TCP_NODELAY
shows the same behaviour as with krbd.
Mark, if you are still interested, here are the two latency reports
Queue Depth=1
slat (usec): min=24, max=210, avg=39.40, stdev=11.54
clat (usec): min=310,
Hi
Check the S3 Bucket OPS at : http://ceph.com/docs/master/radosgw/s3/bucketops/
I've read that as well, but I'm having other issues getting an app to run
against our Ceph S3 GW; maybe you have a few hints on this as well...
Got the cluster working for rbd+CephFS and have initially verified the
Hi Italo,
Check the S3 Bucket OPS at :
http://ceph.com/docs/master/radosgw/s3/bucketops/
or use any of the examples provided in Python
(http://ceph.com/docs/master/radosgw/s3/python/) or PHP
(http://ceph.com/docs/master/radosgw/s3/php/) or JAVA
On 04/03/2015 00:10, Loic Dachary wrote:
Last week-end I discussed with a friend about a use case many of us thought
about already: it would be cool to have a simple way to assemble Ceph aware NAS
fresh from the store. I summarized the use case and interface we discussed here
:
I have a question. In a scenario (3 nodes x 4 OSDs each x 2 replicas) I tested
with a node down, and as long as you have space available all objects were
there.
Is it possible for all replicas of an object to be saved on the same node?
Is it possible to lose any?
Is there a mechanism that prevents
Hi,
does somebody know if Red Hat will backport new krbd features (discard, blk-mq,
tcp_nodelay, ...) to the Red Hat 3.10 kernel?
Alexandre
- Original Message -
From: Mark Nelson mnel...@redhat.com
To: Nick Fisk n...@fisk.me.uk, Somnath Roy somnath@sandisk.com,
aderumier
On Fri, Mar 6, 2015 at 9:52 PM, Alexandre DERUMIER aderum...@odiso.com wrote:
Hi,
does somebody know if Red Hat will backport new krbd features (discard,
blk-mq, tcp_nodelay, ...) to the Red Hat 3.10 kernel?
Yes, all of those will be backported; discard is already there in the
RHEL 7.1 kernel.
On 06/03/2015, at 16.50, Jake Young jak3...@gmail.com wrote:
After seeing your results, I've been considering experimenting with that.
Currently, my iSCSI proxy nodes are VMs.
I would like to build a few dedicated servers with fast SSDs or fusion-io
devices. It depends on my budget,
Hello,
I’m building an object storage environment and I’m in trouble with some
administration ops. To manage the entire environment I decided to create an
admin user and use that to manage the client users which I’ll create further on.
Using the admin (called “italux”) I created a new user (called
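For what it's worth, the usual way to give such a user admin rights over other users is via caps; a sketch with illustrative uid, display name, and caps:

    radosgw-admin user create --uid=italux --display-name="Italo Admin"
    radosgw-admin caps add --uid=italux \
        --caps="users=*;buckets=*;metadata=*;usage=*"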
On Friday, March 6, 2015, Steffen W Sørensen ste...@me.com wrote:
On 06/03/2015, at 16.50, Jake Young jak3...@gmail.com wrote:
After seeing your results, I've been considering experimenting with
that. Currently, my iSCSI proxy nodes are VMs.
I would like to build a few
-------- Original message --------
From: CHEVALIER Ghislain IMT/OLPS ghislain.cheval...@orange.com
Date: 06/03/2015 21:56 (GMT+01:00)
To: Italo Santos okd...@gmail.com
Cc:
Subject: RE: [ceph-users] RadosGW - Bucket link and ACLs
Hi
We encountered this behavior when developing the rgw admin
Still not working. Does anybody know how to automap and mount an RBD image on
Red Hat?
Regards
Jesus Chavez
SYSTEMS ENGINEER-C.SALES
jesch...@cisco.com
Phone: +52 55 5267 3146
Mobile: +51 1 5538883255
CCIE - 44433
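For the archives: on RHEL-family systems the usual answer is the rbdmap helper shipped with ceph-common; a sketch, with pool, image, and mountpoint as placeholders:

    # /etc/ceph/rbdmap
    rbd/myimage id=admin,keyring=/etc/ceph/ceph.client.admin.keyring

    # /etc/fstab
    /dev/rbd/rbd/myimage  /mnt/myimage  xfs  noauto,_netdev  0 0

Then enable the init script (chkconfig rbdmap on) so images are mapped at boot before the mount.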
On Mar