Re: [ceph-users] OSD is crashing while running admin socket

2014-09-08 Thread Samuel Just
That seems reasonable. Bug away! -Sam On Mon, Sep 8, 2014 at 5:11 PM, Somnath Roy somnath@sandisk.com wrote: Hi Sage/Sam, I faced a crash in OSD with latest Ceph master. Here is the log trace for the same. ceph version 0.85-677-gd5777c4 (d5777c421548e7f039bb2c77cb0df2e9c7404723)

Re: [ceph-users] OpTracker optimization

2014-09-10 Thread Samuel Just
Added a comment about the approach. -Sam On Tue, Sep 9, 2014 at 1:33 PM, Somnath Roy somnath@sandisk.com wrote: Hi Sam/Sage, As we discussed earlier, enabling the present OpTracker code is degrading performance severely. For example, in my setup a single OSD node with 10 clients is reaching

Re: [ceph-users] OpTracker optimization

2014-09-10 Thread Samuel Just
I don't quite understand. -Sam On Wed, Sep 10, 2014 at 2:38 PM, Somnath Roy somnath@sandisk.com wrote: Thanks Sam. So, you want me to go with optracker/shardedOpWq, right? Regards Somnath -Original Message- From: Samuel Just [mailto:sam.j...@inktank.com] Sent: Wednesday

Re: [ceph-users] OpTracker optimization

2014-09-10 Thread Samuel Just
sharded optracker for the ios going through ms_dispatch path. 2. Additionally, for ios going through ms_fast_dispatch, you want me to implement optracker (without internal shard) per opwq shard Am I right ? Thanks Regards Somnath -Original Message- From: Samuel Just [mailto:sam.j

[ceph-users] Firefly v0.80.6 issues 9696 and 9732

2014-10-10 Thread Samuel Just
We've gotten some reports of a couple of issues on v0.80.6: 1) #9696: mixed clusters (or upgrading clusters) with v0.80.6 and pre-firefly osds/mons can hit an assert in PG::choose_acting during backfill. The fix appears to be to remove the assert (wip-9696[-firefly]). 2) #9731: there is a bug

Re: [ceph-users] Ceph Giant not fixed RepllicatedPG:NotStrimming?

2014-10-31 Thread Samuel Just
You should start by upgrading to giant; many, many bug fixes went in between 0.86 and giant. -Sam On Fri, Oct 31, 2014 at 8:54 AM, Ta Ba Tuan tua...@vccloud.vn wrote: Hi Sage Weil, Thanks for your reply. Yes, I'm using Ceph v0.86; I reported some related bugs and hope you can help me. 2014-10-31

Re: [ceph-users] Ceph Giant not fixed RepllicatedPG:NotStrimming?

2014-11-03 Thread Samuel Just
will upgrade to Giant soon, thank you so much. -- Tuan HaNoi-VietNam On 11/01/2014 01:10 AM, Samuel Just wrote: You should start by upgrading to giant; many, many bug fixes went in between 0.86 and giant. -Sam On Fri, Oct 31, 2014 at 8:54 AM, Ta Ba Tuan tua...@vccloud.vn wrote: Hi Sage Weil

Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem

2014-11-03 Thread Samuel Just
If you have osds that are close to full, you may be hitting 9626. I pushed a branch based on v0.80.7 with the fix, wip-v0.80.7-9626. -Sam On Mon, Nov 3, 2014 at 2:09 PM, Chad Seys cws...@physics.wisc.edu wrote: No, it is a change, I just want to make sure I understand the scenario. So you're

Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem

2014-11-04 Thread Samuel Just
Incomplete usually means the pgs do not have any complete copies. Did you previously have more osds? -Sam On Tue, Nov 4, 2014 at 7:37 AM, Chad Seys cws...@physics.wisc.edu wrote: On Monday, November 03, 2014 17:34:06 you wrote: If you have osds that are close to full, you may be hitting 9626.

Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem

2014-11-05 Thread Samuel Just
The incomplete pgs are not processing requests. That's where the blocked requests are coming from. You can query the pg state using 'ceph pg <pgid> query'. Full osds can also block requests. -Sam On Wed, Nov 5, 2014 at 7:24 AM, Chad Seys cws...@physics.wisc.edu wrote: Hi Sam, Incomplete
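
For reference, a pg query of the kind Sam describes takes the placement group ID directly; the pg ID below is a placeholder, and dump_stuck is a useful companion check:

    # show the peering/recovery state of a single pg
    ceph pg 2.1f query
    # list pgs that are stuck inactive (incomplete pgs show up here)
    ceph pg dump_stuck inactive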

Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem

2014-11-05 Thread Samuel Just
Sounds like you needed osd 20. You can mark osd 20 lost. -Sam On Wed, Nov 5, 2014 at 9:41 AM, Gregory Farnum g...@gregs42.com wrote: On Wed, Nov 5, 2014 at 7:24 AM, Chad Seys cws...@physics.wisc.edu wrote: Hi Sam, Incomplete usually means the pgs do not have any complete copies. Did you
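
A sketch of what marking the osd lost looks like, using osd 20 from the thread; the safety flag is required because the operation discards any data only that osd held:

    # tell the cluster osd.20 is gone for good so peering can proceed without it
    ceph osd lost 20 --yes-i-really-mean-it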

Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem

2014-11-06 Thread Samuel Just
Amusingly, that's what I'm working on this week. http://tracker.ceph.com/issues/7862 There are pretty good reasons for why it works the way it does right now, but it certainly is unexpected. -Sam On Thu, Nov 6, 2014 at 7:18 AM, Chad William Seys cws...@physics.wisc.edu wrote: Hi Sam, Sounds

Re: [ceph-users] emperor - firefly 0.80.7 upgrade problem

2014-11-06 Thread Samuel Just
Also, are you certain that osd 20 is not up? -Sam On Thu, Nov 6, 2014 at 10:52 AM, Samuel Just sam.j...@inktank.com wrote: Amusingly, that's what I'm working on this week. http://tracker.ceph.com/issues/7862 There are pretty good reasons for why it works the way it does right now

Re: [ceph-users] Giant upgrade - stability issues

2014-11-18 Thread Samuel Just
Ok, why is ceph marking osds down? Post your ceph.log from one of the problematic periods. -Sam On Tue, Nov 18, 2014 at 1:35 AM, Andrei Mikhailovsky and...@arhont.com wrote: Hello cephers, I need your help and suggestion on what is going on with my cluster. A few weeks ago i've upgraded from

Re: [ceph-users] Giant upgrade - stability issues

2014-11-18 Thread Samuel Just
pastebin or something, probably. -Sam On Tue, Nov 18, 2014 at 12:34 PM, Andrei Mikhailovsky and...@arhont.com wrote: Sam, the logs are rather large in size. Where should I post it to? Thanks From: Samuel Just sam.j...@inktank.com To: Andrei Mikhailovsky

Re: [ceph-users] Giant upgrade - stability issues

2014-11-19 Thread Samuel Just
with that much data. Anything more constructive? Thanks From: Samuel Just sam.j...@inktank.com To: Andrei Mikhailovsky and...@arhont.com Cc: ceph-users@lists.ceph.com Sent: Tuesday, 18 November, 2014 8:53:47 PM Subject: Re: [ceph-users] Giant upgrade - stability

Re: [ceph-users] Giant upgrade - stability issues

2014-11-19 Thread Samuel Just
: Samuel Just sam.j...@inktank.com To: Andrei Mikhailovsky and...@arhont.com Cc: ceph-users@lists.ceph.com Sent: Tuesday, 18 November, 2014 8:53:47 PM Subject: Re: [ceph-users] Giant upgrade - stability issues pastebin or something, probably. -Sam On Tue, Nov 18, 2014 at 12:34 PM, Andrei

[ceph-users] fiemap bug on giant

2014-11-24 Thread Samuel Just
Bug #10166 (http://tracker.ceph.com/issues/10166) can cause recovery to result in incorrect object sizes on giant if the setting 'filestore fiemap' is set to true. This setting is disabled by default. This should be fixed in a future point release, though filestore fiemap will probably continue
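
The setting in question lives in the [osd] section of ceph.conf; a sketch of the default (and, on giant, recommended) value:

    [osd]
        # fiemap-based extent detection in the filestore; off by default
        filestore fiemap = false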

Re: [ceph-users] What is the state of filestore sloppy CRC?

2014-11-25 Thread Samuel Just
sloppy crc uses fs xattrs directly, omap won't help. -Sam On Tue, Nov 25, 2014 at 7:39 AM, Tomasz Kuzemko tomasz.kuze...@ovh.net wrote: On Tue, Nov 25, 2014 at 07:10:26AM -0800, Sage Weil wrote: On Tue, 25 Nov 2014, Tomasz Kuzemko wrote: Hello, as far as I can tell, Ceph does not make any

Re: [ceph-users] seg fault

2014-12-08 Thread Samuel Just
At a guess, this is something that has long since been fixed in dumpling, you probably want to upgrade to the current dumpling point release. -Sam On Mon, Dec 8, 2014 at 2:40 PM, Philipp von Strobl-Albeg phil...@pilarkto.net wrote: Hi, after using the ceph-cluster for months without any

Re: [ceph-users] seg fault

2014-12-08 Thread Samuel Just
planned this step already - so good to know ;-) Do you recommend firefly or giant - without needing radosgw? Best Philipp On 08.12.2014 at 23:42, Samuel Just wrote: At a guess, this is something that has long since been fixed in dumpling, you probably want to upgrade to the current

Re: [ceph-users] Some OSD and MDS crash

2014-07-01 Thread Samuel Just
Can you reproduce with debug osd = 20 debug filestore = 20 debug ms = 1 ? -Sam On Tue, Jul 1, 2014 at 1:21 AM, Pierre BLONDEAU pierre.blond...@unicaen.fr wrote: Hi, I have attached: - osd.20 is one of the osds that I detect making other OSDs crash. - osd.23 is one of the osds which crash when I start
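
The debug levels Sam asks for go into the [osd] section of ceph.conf on the affected nodes before restarting the osds, roughly:

    [osd]
        debug osd = 20
        debug filestore = 20
        debug ms = 1

They can also be applied to a running daemon without a restart, e.g. ceph tell osd.20 injectargs '--debug-osd 20 --debug-filestore 20 --debug-ms 1'.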

Re: [ceph-users] Some OSD and MDS crash

2014-07-02 Thread Samuel Just
for the help Regards Pierre On 01/07/2014 23:51, Samuel Just wrote: Can you reproduce with debug osd = 20 debug filestore = 20 debug ms = 1 ? -Sam On Tue, Jul 1, 2014 at 1:21 AM, Pierre BLONDEAU pierre.blond...@unicaen.fr wrote: Hi, I have attached: - osd.20 is one of the osds that I detect

Re: [ceph-users] Some OSD and MDS crash

2014-07-02 Thread Samuel Just
other osds crash. I went from 31 osds down to 16. I notice that after this the number of down+peering PGs decreased from 367 to 248. Is that normal? Maybe it's temporary, while the cluster verifies all the PGs? Regards Pierre On 02/07/2014 19:16, Samuel Just wrote: You should add debug

Re: [ceph-users] Some OSD and MDS crash

2014-07-02 Thread Samuel Just
Also, what version did you upgrade from, and how did you upgrade? -Sam On Wed, Jul 2, 2014 at 3:09 PM, Samuel Just sam.j...@inktank.com wrote: Ok, in current/meta on osd 20 and osd 23, please attach all files matching ^osdmap.13258.* There should be one such file on each osd. (should look

Re: [ceph-users] Some OSD and MDS crash

2014-07-02 Thread Samuel Just
Joao: this looks like divergent osdmaps, osd 20 and osd 23 have differing ideas of the acting set for pg 2.11. Did we add hashes to the incremental maps? What would you want to know from the mons? -Sam On Wed, Jul 2, 2014 at 3:10 PM, Samuel Just sam.j...@inktank.com wrote: Also, what version

Re: [ceph-users] Some OSD and MDS crash

2014-07-02 Thread Samuel Just
\uosdmap.13258__0_469271DE__none on each meta directory. On 03/07/2014 00:10, Samuel Just wrote: Also, what version did you upgrade from, and how did you upgrade? -Sam On Wed, Jul 2, 2014 at 3:09 PM, Samuel Just sam.j...@inktank.com wrote: Ok, in current/meta on osd 20 and osd 23, please

Re: [ceph-users] Some OSD and MDS crash

2014-07-02 Thread Samuel Just
that got set? -Sam On Wed, Jul 2, 2014 at 3:43 PM, Samuel Just sam.j...@inktank.com wrote: Yeah, divergent osdmaps: 555ed048e73024687fc8b106a570db4f osd-20_osdmap.13258__0_4E62BB79__none 6037911f31dc3c18b05499d24dcdbe5c osd-23_osdmap.13258__0_4E62BB79__none Joao: thoughts? -Sam On Wed

Re: [ceph-users] Some OSD and MDS crash

2014-07-02 Thread Samuel Just
Can you confirm from the admin socket that all monitors are running the same version? -Sam On Wed, Jul 2, 2014 at 4:15 PM, Pierre BLONDEAU pierre.blond...@unicaen.fr wrote: On 03/07/2014 00:55, Samuel Just wrote: Ah, ~/logs » for i in 20 23; do ../ceph/src/osdmaptool --export-crush /tmp
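
Checking the running version over the admin socket, as Sam suggests, looks roughly like this; the socket path depends on the daemon name:

    # ask a monitor daemon which version it is actually running
    ceph --admin-daemon /var/run/ceph/ceph-mon.$(hostname -s).asok version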

Re: [ceph-users] Some OSD and MDS crash

2014-07-02 Thread Samuel Just
-daemon /var/run/ceph/ceph-mon.joe.asok version {version:0.82} Pierre On 03/07/2014 01:17, Samuel Just wrote: Can you confirm from the admin socket that all monitors are running the same version? -Sam On Wed, Jul 2, 2014 at 4:15 PM, Pierre BLONDEAU pierre.blond...@unicaen.fr wrote: On 03

Re: [ceph-users] scrub error on firefly

2014-07-10 Thread Samuel Just
Can you attach your ceph.conf for your osds? -Sam On Thu, Jul 10, 2014 at 8:01 AM, Christian Eichelmann christian.eichelm...@1und1.de wrote: I can also confirm that after upgrading to firefly both of our clusters (test and live) were going from 0 scrub errors each for about 6 Month to about

Re: [ceph-users] scrub error on firefly

2014-07-10 Thread Samuel Just
automatically during deep-scrub or does it have to be done manually because there is no majority? Thanks, -Sudip -Original Message- From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Samuel Just Sent: Thursday, July 10, 2014 10:16 AM To: Christian

Re: [ceph-users] scrub error on firefly

2014-07-10 Thread Samuel Just
- if yes, this is automatic - correct? Or are you referring to the explicit, manually initiated repair commands? Thanks, -Sudip -Original Message- From: Samuel Just [mailto:sam.j...@inktank.com] Sent: Thursday, July 10, 2014 10:50 AM To: Chahal, Sudip Cc: Christian Eichelmann

Re: [ceph-users] scrub error on firefly

2014-07-11 Thread Samuel Just
When you get the next inconsistency, can you copy the actual objects from the osd store trees and get them to us? That might provide a clue. -Sam On Fri, Jul 11, 2014 at 6:52 AM, Randy Smith rbsm...@adams.edu wrote: On Thu, Jul 10, 2014 at 4:40 PM, Samuel Just sam.j...@inktank.com wrote

Re: [ceph-users] scrub error on firefly

2014-07-11 Thread Samuel Just
, Samuel Just wrote: When you get the next inconsistency, can you copy the actual objects from the osd store trees and get them to us? That might provide a clue. -Sam On Fri, Jul 11, 2014 at 6:52 AM, Randy Smith rbsm...@adams.edu wrote: On Thu, Jul 10, 2014 at 4:40 PM, Samuel Just

Re: [ceph-users] scrub error on firefly

2014-07-11 Thread Samuel Just
.b0ce3.238e1f29.000b__head_34DC35C6__3 ? On Fri, Jul 11, 2014 at 2:00 PM, Samuel Just sam.j...@inktank.com wrote: Also, what filesystem are you using? -Sam On Fri, Jul 11, 2014 at 10:37 AM, Sage Weil sw...@redhat.com wrote: One other thing we might also try is catching this earlier (on first

Re: [ceph-users] scrub error on firefly

2014-07-11 Thread Samuel Just
And grab the xattrs as well. -Sam On Fri, Jul 11, 2014 at 2:39 PM, Samuel Just sam.j...@inktank.com wrote: Right. -Sam On Fri, Jul 11, 2014 at 2:05 PM, Randy Smith rbsm...@adams.edu wrote: Greetings, I'm using xfs. Also, when, in a previous email, you asked if I could send the object, do

Re: [ceph-users] scrub error on firefly

2014-07-12 Thread Samuel Just
www.adams.edu 719-587-7741 On Jul 12, 2014 10:34 AM, Samuel Just sam.j...@inktank.com wrote: Here's a diff of the two files. One of the two files appears to contain ceph leveldb keys? Randy, do you have an idea of what this rbd image is being used for (rb.0.b0ce3.238e1f29, that is). -Sam On Fri

Re: [ceph-users] rgw client

2013-09-23 Thread Samuel Just
You might need to tell Cyberduck the location of your endpoint. -Sam On Tue, Sep 17, 2013 at 9:16 PM, lixuehui lixue...@chinacloud.com.cn wrote: Hi all, I installed rgw with a healthy ceph cluster. Although it works well with the S3 API, can it be connected to by Cyberduck? I've tried with the rgw

Re: [ceph-users] Cannot start 5/20 OSDs

2013-09-23 Thread Samuel Just
Can you restart those osds with debug osd = 20 debug filestore = 20 debug ms = 1 in the [osd] section of the ceph.conf file on the respective machines and upload the logs? Sounds like a bug. -Sam On Tue, Sep 17, 2013 at 2:05 PM, Matt Thompson watering...@gmail.com wrote: Hi All, I set up a

Re: [ceph-users] Data loss after force umount !

2013-10-07 Thread Samuel Just
Sounds like it's probably an issue with the fs on the rbd disk? What fs was the vm using on the rbd? -Sam On Mon, Oct 7, 2013 at 8:11 AM, higkoohk higko...@gmail.com wrote: We use ceph as the storage for kvm. I found errors in the VMs after force-umounting the ceph disk. Is that expected? How to

Re: [ceph-users] Continually crashing osds

2013-10-21 Thread Samuel Just
What happened when you simply left the cluster to recover without osd.11 in? -Sam On Mon, Oct 21, 2013 at 4:01 PM, Jeff Williams jeff.willi...@medio.com wrote: What is the best way to do that? I tried ceph pg repair, but it only did so much. On 10/21/13 3:54 PM, Samuel Just sam.j

Re: [ceph-users] near full osd

2013-11-12 Thread Samuel Just
I think we removed the experimental warning in cuttlefish. It probably wouldn't hurt to do it in bobtail particularly if you test it extensively on a test cluster first. However, we didn't do extensive testing on it until cuttlefish. I would upgrade to cuttlefish (actually, dumpling or emperor,

[ceph-users] Emperor upgrade bug 6761

2013-11-13 Thread Samuel Just
Upgrading to emperor from previous versions may cause an issue where objects become marked lost erroneously. We suggest delaying upgrades to emperor until this issue is resolved in a point release. We should have the point release out within a day or two. If you have completed the upgrade

[ceph-users] v0.72.1 released

2013-11-15 Thread Samuel Just
This point release addresses issue #6761 (http://tracker.ceph.com/issues/6761). Upgrades to v0.72 can cause reads to begin returning ENFILE (Too many open files in system). Changes: * osd: fix upgrade issue with object_info_t encodings * ceph_filestore_tool: add tool to repair osd stores

Re: [ceph-users] unable to start OSD

2014-02-20 Thread Samuel Just
What has happened in the last few weeks to this cluster? Was there an upgrade? -Sam On Wed, Feb 12, 2014 at 10:07 AM, Dietmar Maurer diet...@proxmox.com wrote: It would be great to get two logs from two different crashing OSDs for comparison purposes.

Re: [ceph-users] ceph -w question

2013-04-15 Thread Samuel Just
Can you post the output of ceph osd tree? -Sam On Mon, Apr 15, 2013 at 9:52 AM, Jeppesen, Nelson nelson.jeppe...@disney.com wrote: Thanks for the help but how do I track down this issue? If data is inaccessible, that's a very bad thing given this is production. # ceph osd dump | grep pool

Re: [ceph-users] ceph -w question

2013-04-15 Thread Samuel Just
Jeppesen Disney Technology Solutions and Services Phone 206-588-5001 -Original Message- From: Samuel Just [mailto:sam.j...@inktank.com] Sent: Monday, April 15, 2013 10:11 AM To: Jeppesen, Nelson Cc: Gregory Farnum; ceph-users@lists.ceph.com Subject: Re: [ceph-users] ceph -w

Re: [ceph-users] Failed assert when starting new OSDs in 0.60

2013-04-29 Thread Samuel Just
You appear to be missing pg metadata for some reason. If you can reproduce it with debug osd = 20 debug filestore = 20 debug ms = 1 on all of the OSDs, I should be able to track it down. I created a bug: #4855. Thanks! -Sam On Mon, Apr 29, 2013 at 9:52 AM, Travis Rhoden trho...@gmail.com

Re: [ceph-users] Failed assert when starting new OSDs in 0.60

2013-04-29 Thread Samuel Just
/cephlogs.tgz - Travis On Mon, Apr 29, 2013 at 1:04 PM, Samuel Just sam.j...@inktank.com wrote: You appear to be missing pg metadata for some reason. If you can reproduce it with debug osd = 20 debug filestore = 20 debug ms = 1 on all of the OSDs, I should be able to track it down. I

Re: [ceph-users] scrub error: found clone without head

2013-05-22 Thread Samuel Just
Can you post your ceph.log covering the period that includes all of these errors? -Sam On Wed, May 22, 2013 at 5:39 AM, Dzianis Kahanovich maha...@bspu.unibel.by wrote: Olivier Bonvalet wrote: On Monday, 20 May 2013 at 00:06 +0200, Olivier Bonvalet wrote: On Tuesday, 07 May 2013 at 15:51 +0300, Dzianis

Re: [ceph-users] scrub error: found clone without head

2013-05-23 Thread Samuel Just
Just wrote: rb.0.15c26.238e1f29 Has that rbd volume been removed? -Sam On Wed, May 22, 2013 at 12:18 PM, Olivier Bonvalet ceph.l...@daevel.fr wrote: 0.61-11-g3b94f03 (0.61-1.1), but the bug occurred with bobtail. On Wednesday, 22 May 2013 at 12:00 -0700, Samuel Just wrote: What

Re: [ceph-users] scrub error: found clone without head

2013-05-23 Thread Samuel Just
snapshot with this id for the rb.0.15c26.238e1f29 image. So, which files should I remove? Thanks for your help. On Thursday, 23 May 2013 at 15:17 -0700, Samuel Just wrote: Do all of the affected PGs share osd.28 as the primary? I think the only recovery is probably to manually remove the orphaned

Re: [ceph-users] ceph -w warning I don't have pgid 0.2c8?

2013-07-17 Thread Samuel Just
What version are you running? How did you move the osds from 2TB to 4TB? -Sam On Wed, Jul 17, 2013 at 12:59 AM, Ta Ba Tuan tua...@vccloud.vn wrote: Hi everyone, I converted every osd from 2TB to 4TB, and when the move completed, the realtime Ceph log (ceph -w) displays the error: I don't have pgid

Re: [ceph-users] ceph -w warning I don't have pgid 0.2c8?

2013-07-18 Thread Samuel Just
TUAN On 07/18/2013 01:16 AM, Samuel Just wrote: What version are you running? How did you move the osds from 2TB to 4TB? -Sam On Wed, Jul 17, 2013 at 12:59 AM, Ta Ba Tuan tua...@vccloud.vn wrote: Hi everyone, I converted every osds from 2TB to 4TB, and when moving complete, show log Ceph

Re: [ceph-users] OSD Keep Crashing

2013-08-12 Thread Samuel Just
Can you post more of the log? There should be a line towards the bottom indicating the line with the failed assert. Can you also attach ceph pg dump, ceph osd dump, ceph osd tree? -Sam On Mon, Aug 12, 2013 at 11:54 AM, John Wilkins john.wilk...@inktank.com wrote: Stephane, You should post

Re: [ceph-users] ceph-deploy and journal on separate disk

2013-08-12 Thread Samuel Just
Did you try using ceph-deploy disk zap ceph001:sdaa first? -Sam On Mon, Aug 12, 2013 at 6:21 AM, Pavel Timoschenkov pa...@bayonetteas.onmicrosoft.com wrote: Hi. I have some problems with create journal on separate disk, using ceph-deploy osd prepare command. When I try execute next command:

Re: [ceph-users] mounting a pool via fuse

2013-08-12 Thread Samuel Just
Can you elaborate on what behavior you are looking for? -Sam On Fri, Aug 9, 2013 at 4:37 AM, Georg Höllrigl georg.hoellr...@xidras.com wrote: Hi, I'm using ceph 0.61.7. When using ceph-fuse, I couldn't find a way, to only mount one pool. Is there a way to mount a pool - or is it simply not

Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-12 Thread Samuel Just
Can you attach the output of ceph osd tree? Also, can you run ceph osd getmap -o /tmp/osdmap and attach /tmp/osdmap? -Sam On Fri, Aug 9, 2013 at 4:28 AM, Jeff Moskow j...@rtr.com wrote: Thanks for the suggestion. I had tried stopping each OSD for 30 seconds, then restarting it, waiting 2

Re: [ceph-users] run ceph without auth

2013-08-12 Thread Samuel Just
I have referred you to someone more conversant with the details of mkcephfs, but for dev purposes, most of us use the vstart.sh script in src/ (http://ceph.com/docs/master/dev/). -Sam On Fri, Aug 9, 2013 at 2:59 AM, Nulik Nol nulik...@gmail.com wrote: Hi, I am configuring a single node for

Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-12 Thread Samuel Just
Are you using any kernel clients? Will osds 3,14,16 be coming back? -Sam On Mon, Aug 12, 2013 at 2:26 PM, Jeff Moskow j...@rtr.com wrote: Sam, I've attached both files. Thanks! Jeff On Mon, Aug 12, 2013 at 01:46:57PM -0700, Samuel Just wrote: Can you attach the output

Re: [ceph-users] How to set Object Size/Stripe Width/Stripe Count?

2013-08-12 Thread Samuel Just
I think the docs you are looking for are http://ceph.com/docs/master/man/8/cephfs/ (specifically the set_layout command). -Sam On Thu, Aug 8, 2013 at 7:48 AM, Da Chun ng...@qq.com wrote: Hi list, I saw the info about data striping in http://ceph.com/docs/master/architecture/#data-striping .
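
A sketch using the legacy cephfs tool from that man page; the flag names follow the set_layout documentation of that era and the values are only placeholders:

    # 1 MB stripe unit, 4-way striping, 4 MB objects on a directory
    cephfs /mnt/ceph/mydir set_layout -u 1048576 -c 4 -s 4194304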

Re: [ceph-users] Ceph pgs stuck unclean

2013-08-12 Thread Samuel Just
Can you attach the output of: ceph -s ceph pg dump ceph osd dump and run ceph osd getmap -o /tmp/osdmap and attach /tmp/osdmap -Sam On Wed, Aug 7, 2013 at 1:58 AM, Howarth, Chris chris.howa...@citi.com wrote: Hi, One of our OSD disks failed on a cluster and I replaced it, but when it

Re: [ceph-users] could not generate the bootstrap key

2013-08-12 Thread Samuel Just
Can you give a step-by-step account of what you did prior to the error? -Sam On Tue, Aug 6, 2013 at 10:52 PM, 於秀珠 yuxiu...@jovaunn.com wrote: Using ceph-deploy to manage an existing cluster, I followed the steps in the document, but there are some errors and I cannot gather the keys. When I

Re: [ceph-users] one pg stuck with 2 unfound pieces

2013-08-13 Thread Samuel Just
You can run 'ceph pg 0.cfa mark_unfound_lost revert'. (Revert Lost section of http://ceph.com/docs/master/rados/operations/placement-groups/). -Sam On Tue, Aug 13, 2013 at 6:50 AM, Jens-Christian Fischer jens-christian.fisc...@switch.ch wrote: We have a cluster with 10 servers, 64 OSDs and 5
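
Using the pg ID from the thread, the sequence would look roughly like this; listing the missing objects first shows what the revert will touch:

    # see which objects the pg considers unfound
    ceph pg 0.cfa list_missing
    # roll the unfound objects back to their previous versions
    ceph pg 0.cfa mark_unfound_lost revert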

Re: [ceph-users] pgs stuck unclean -- how to fix? (fwd)

2013-08-13 Thread Samuel Just
Cool! -Sam On Tue, Aug 13, 2013 at 4:49 AM, Jeff Moskow j...@rtr.com wrote: Sam, Thanks that did it :-) health HEALTH_OK monmap e17: 5 mons at {a=172.16.170.1:6789/0,b=172.16.170.2:6789/0,c=172.16.170.3:6789/0,d=172.16.170.4:6789/0,e=172.16.170.5:6789/0}, election epoch 9794,

Re: [ceph-users] Designing an application with Ceph

2013-08-13 Thread Samuel Just
2 is certainly an intriguing option. RADOS isn't really a database engine (even a nosql one), but should be able to serve your needs here. Have you seen the omap api available in librados? It allows you to efficiently store key/value pairs attached to a librados object (uses leveldb on the OSDs
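
The omap interface Sam mentions can be tried out from the rados CLI before committing to librados; the pool and object names here are made up:

    # attach key/value pairs to an object's omap and read them back
    rados -p testpool create myobj
    rados -p testpool setomapval myobj owner alice
    rados -p testpool listomapvals myobj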

Re: [ceph-users] one pg stuck with 2 unfound pieces

2013-08-14 Thread Samuel Just
wrote: On 13.08.2013, at 21:09, Samuel Just sam.j...@inktank.com wrote: You can run 'ceph pg 0.cfa mark_unfound_lost revert'. (Revert Lost section of http://ceph.com/docs/master/rados/operations/placement-groups/). -Sam As I wrote further down the info, ceph wouldn't let me do that: root

Re: [ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-19 Thread Samuel Just
You're right, PGLog::undirty() looks suspicious. I just pushed a branch wip-dumpling-pglog-undirty with a new config (osd_debug_pg_log_writeout) which if set to false will disable some strictly debugging checks which occur in PGLog::undirty(). We haven't actually seen these checks causing

Re: [ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-20 Thread Samuel Just
usage, right at the bottom of the green part of the output. Many thanks for your help so far! Regards, Oliver On ma, 2013-08-19 at 00:29 -0700, Samuel Just wrote: You're right, PGLog::undirty() looks suspicious. I just pushed a branch wip-dumpling-pglog-undirty

Re: [ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-20 Thread Samuel Just
, but it looks promising, not taking any more CPU than the Cuttlefish-osds. Thanks! I'll get back to you. Regards, Oliver On di, 2013-08-20 at 10:40 -0700, Samuel Just wrote: Can you try dumpling head without the option? -Sam On Tue, Aug 20, 2013 at 1:44 AM, Oliver Daudey oli

Re: [ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-21 Thread Samuel Just
that the problem might be in RBD, as Mark suggested. Regards, Oliver On 20-08-13 19:40, Samuel Just wrote: Can you try dumpling head without the option? -Sam On Tue, Aug 20, 2013 at 1:44 AM, Oliver Daudey oli...@xs4all.nl wrote: Hey Mark, Sorry, but after some more tests I have

Re: [ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-21 Thread Samuel Just
also waited for the cluster to come to a healthy state after restarting the OSDs, so it's not related to rebalancing or peering-activity, either. Regards, Oliver On wo, 2013-08-21 at 14:07 -0700, Samuel Just wrote: Try it again in the reverse order, I strongly suspect caching

Re: [ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-21 Thread Samuel Just
, 2013-08-21 at 18:33 -0700, Samuel Just wrote: I am dumb. There *has* been a change in the osd which can account for this: the wbthrottle limits. We added some logic to force the kernel to start flushing writes out earlier, normally a good thing. In this case, it's probably doing an fsync every

Re: [ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-23 Thread Samuel Just
) flushing behavior into the new code so that we can confirm that it is really the writeback that is causing the problem and not something else... sage On Thu, 22 Aug 2013, Oliver Daudey wrote: Hey Samuel, On wo, 2013-08-21 at 20:27 -0700, Samuel Just wrote: I think the rbd cache one you'd

Re: [ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-23 Thread Samuel Just
smaller margin now. Looks like we're onto something. The fdatasync seems to be the key here, rather than disabling wbthrottle. Regards, Oliver On 23-08-13 19:53, Samuel Just wrote: I pushed a branch, wip-dumpling-perf. It does two things: 1) adds a config

Re: [ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-23 Thread Samuel Just
out the performance. -Sam On Fri, Aug 23, 2013 at 1:44 PM, Oliver Daudey oli...@xs4all.nl wrote: Hey Samuel, I commented the earlier settings out, so it was with defaults. Regards, Oliver On vr, 2013-08-23 at 13:35 -0700, Samuel Just wrote: When you were running

Re: [ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-26 Thread Samuel Just
! Regards, Oliver On vr, 2013-08-23 at 13:55 -0700, Samuel Just wrote: Ok, can you try setting filestore_op_threads to 1 on both cuttlefish and wip-dumpling-perf (with and with wbthrottle, default wbthrottle settings). I suspect I created contention in the filestore op threads

Re: [ceph-users] bucket count limit

2013-08-26 Thread Samuel Just
As I understand it, that should actually help avoid bucket contention and thereby increase performance. Yehuda, anything to add? -Sam On Thu, Aug 22, 2013 at 7:08 AM, Mostowiec Dominik dominik.mostow...@grupaonet.pl wrote: Hi, I think about sharding s3 buckets in CEPH cluster, create

Re: [ceph-users] Storage, File Systems and Data Scrubbing

2013-08-26 Thread Samuel Just
ceph-osd builds a transactional interface on top of the usual posix operations so that we can do things like atomically perform an object write and update the osd metadata. The current implementation requires our own journal and some metadata ordering (which is provided by the backing

Re: [ceph-users] lvm for a quick ceph lab cluster test

2013-08-26 Thread Samuel Just
Seems reasonable to me. I'm not sure I've heard anything about using LVM under ceph. Let us know how it goes! -Sam On Wed, Aug 21, 2013 at 5:18 PM, Liu, Larry larry@disney.com wrote: Hi guys, I'm a newbie in ceph. Wonder if I can use 2~3 LVM disks on each server, total 2 servers to run

Re: [ceph-users] osd/OSD.cc: 4844: FAILED assert(_get_map_bl(epoch, bl)) (ceph 0.61.7)

2013-08-26 Thread Samuel Just
This is the same osd, and it hasn't been working in the meantime? Can your cluster operate without that osd? -Sam On Mon, Aug 19, 2013 at 2:05 PM, Olivier Bonvalet ceph.l...@daevel.fr wrote: On Monday, 19 August 2013 at 12:27 +0200, Olivier Bonvalet wrote: Hi, I have an OSD which crashes every

Re: [ceph-users] Sequential placement

2013-08-26 Thread Samuel Just
I think rados bench is actually creating new objects with each IO. Can you paste in the command you used? -Sam On Tue, Aug 20, 2013 at 7:28 AM, daniel pol daniel_...@hotmail.com wrote: Hi ! Ceph newbie here with a placement question. I'm trying to get a simple Ceph setup to run well with
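
A typical rados bench invocation, for comparison; each write creates a new benchmark object unless a later read pass reuses them:

    # 60-second write test, 16 concurrent ops, keep the objects around
    rados bench -p testpool 60 write -t 16 --no-cleanup
    # sequential read test against the objects written above
    rados bench -p testpool 60 seq -t 16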

Re: [ceph-users] cuttlefish operatiing a cluster(start ceph all) failed

2013-08-26 Thread Samuel Just
Usually you need to run the initctl commands on the node the process is on to control the process. -Sam On Fri, Aug 16, 2013 at 12:28 AM, maoqi1982 maoqi1...@126.com wrote: Hi list: After I deployed a cuttlefish (0.61.7) cluster on three nodes (OS Ubuntu 12.04), one ceph-deploy node, one monitor

Re: [ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-26 Thread Samuel Just
that PGLog::undirty() is also still showing up near the top, even in 0.67.2. I'll send you the logs by private mail. Regards, Oliver On ma, 2013-08-26 at 13:35 -0700, Samuel Just wrote: Can you attach a log from the startup of one of the dumpling osds on your production machine

Re: [ceph-users] Optimal configuration to validate Ceph

2013-08-26 Thread Samuel Just
compared to the multiple OSDs on multiple servers, since there are no network latencies involved? On Mon, Aug 26, 2013 at 1:47 PM, Samuel Just sam.j...@inktank.com wrote: If you create a pool with size 1 (no replication), (2) should be somewhere around 3x the speed of (1) assuming the client

Re: [ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-27 Thread Samuel Just
I just pushed a patch to wip-dumpling-log-assert (based on current dumpling head). I had disabled most of the code in PGLog::check() but left an (I thought) innocuous assert. It seems that with (at least) g++ 4.6.3, stl list::size() is linear in the size of the list, so that assert actually

Re: [ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-27 Thread Samuel Just
% [kernel] [k] rcu_needs_cpu 0.45% [kernel] [k] fput Regards, Oliver On ma, 2013-08-26 at 23:33 -0700, Samuel Just wrote: I just pushed a patch to wip-dumpling-log-assert (based on current dumpling head). I had disabled most of the code in PGLog

Re: [ceph-users] Significant slowdown of osds since v0.67 Dumpling

2013-08-27 Thread Samuel Just
] copy_user_generic_string 0.53% [kernel] [k] load_balance 0.50% [kernel] [k] rcu_needs_cpu 0.45% [kernel] [k] fput Regards, Oliver On ma, 2013-08-26 at 23:33 -0700, Samuel Just wrote: I just pushed a patch to wip-dumpling

Re: [ceph-users] Full OSD questions

2013-09-09 Thread Samuel Just
This is usually caused by having too few pgs. Each pool with a significant amount of data needs at least around 100pgs/osd. -Sam On Mon, Sep 9, 2013 at 10:32 AM, Gaylord Holder ghol...@cs.drexel.edu wrote: I'm starting to load up my ceph cluster. I currently have 12 2TB drives (10 up and in,
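
Taking the 12 OSDs from the thread, Sam's figure implies on the order of 100 x 12 ≈ 1200 PGs spread across the pools holding most of the data; the counts below are placeholders rounded to a power of two:

    # create a pool with 512 placement groups (pg_num and pgp_num)
    ceph osd pool create mypool 512 512
    # or raise the pg count on an existing pool
    ceph osd pool set rbd pg_num 512
    ceph osd pool set rbd pgp_num 512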

Re: [ceph-users] rgw geo-replication and disaster recovery problem

2013-09-09 Thread Samuel Just
The regions and zones can be used to distribute among different ceph clusters. -Sam On Mon, Sep 2, 2013 at 2:05 AM, 李学慧 lixuehui...@126.com wrote: Hi! I'm interested in the rgw geo-replication and disaster recovery feature. But do those 'regions and zones' distribute among

Re: [ceph-users] ceph freezes for 10+ seconds during benchmark

2013-09-09 Thread Samuel Just
It looks like osd.4 may actually be the problem. Can you try removing osd.4 and trying again? -Sam On Mon, Sep 2, 2013 at 8:01 AM, Mariusz Gronczewski mariusz.gronczew...@artegence.com wrote: We've installed ceph on test cluster: 3x mon, 7xOSD on 2x10k RPM SAS Centos 6.4 (

Re: [ceph-users] SSD only storage, where to place journal

2013-09-09 Thread Samuel Just
You can't really disable the journal. It's used for failure recovery. It should be fine to place your journal on the same ssd as the osd data directory (though it does affect performance). -Sam On Wed, Sep 4, 2013 at 8:40 AM, Neo n...@spacerat.ch wrote: Original-Nachricht

Re: [ceph-users] few port per ceph-osd

2013-09-09 Thread Samuel Just
That's normal; each osd listens on a few different ports for different reasons. -Sam On Mon, Sep 9, 2013 at 12:27 AM, Timofey Koolin timo...@koolin.ru wrote: I use ceph 0.67.2. When I start ceph-osd -i 0 or ceph-osd -i 1 it starts one process, but that process opens a few tcp ports; is that normal?

Re: [ceph-users] radosgw md5

2013-09-09 Thread Samuel Just
What do you mean by directly from Rados? -Sam On Wed, Sep 4, 2013 at 1:40 AM, Art M. artwork...@gmail.com wrote: Hello, As I know, radosgw calculates MD5 of the uploaded file and compares it with MD5 provided in header. Is it possible to get calculated MD5 of uploaded file directly from

Re: [ceph-users] Ceph space problem, garbage collector ?

2013-09-10 Thread Samuel Just
Can you post the rest of you crush map? -Sam On Tue, Sep 10, 2013 at 5:52 AM, Olivier Bonvalet ceph.l...@daevel.fr wrote: I also checked that all files in that PG still are on that PG : for IMG in `find . -type f -printf '%f\n' | awk -F '__' '{ print $1 }' | sort --unique` ; do echo -n $IMG ;

Re: [ceph-users] Weird scrub problem

2014-12-22 Thread Samuel Just
So 4.458_head/DIR_8/DIR_5/DIR_4/DIR_F/rbd\\udata.cbba8a759d2.0a5b__head_6F0DF458__4 is present on osd 3, osd 34, and osd 10? -Sam On Mon, Dec 22, 2014 at 8:36 AM, Andrey Korolyov and...@xdel.ru wrote: Hello, I am currently facing some strange problem, most probably a bug (osd.3

Re: [ceph-users] Weird scrub problem

2014-12-22 Thread Samuel Just
Korolyov and...@xdel.ru wrote: On Mon, Dec 22, 2014 at 11:50 PM, Samuel Just sam.j...@inktank.com wrote: So 4.458_head/DIR_8/DIR_5/DIR_4/DIR_F/rbd\\udata.cbba8a759d2.0a5b__head_6F0DF458__4 is present on osd 3, osd 34, and osd 10? -Sam Yes, exactly, and have same checksum

Re: [ceph-users] Weird scrub problem

2015-01-02 Thread Samuel Just
Korolyov and...@xdel.ru wrote: On Tue, Dec 23, 2014 at 4:17 AM, Samuel Just sam.j...@inktank.com wrote: Oh, that's a bit less interesting. The bug might be still around though. -Sam On Mon, Dec 22, 2014 at 2:50 PM, Andrey Korolyov and...@xdel.ru wrote: On Tue, Dec 23, 2014 at 1:12 AM, Samuel

Re: [ceph-users] v0.92 released

2015-02-03 Thread Samuel Just
Mykola Golub) * osd: EIO on whole-object reads when checksum is wrong (Sage Weil) * osd: filejournal: don't cache journal when not using direct IO (Jianpeng Ma) * osd: fix ioprio option (Mykola Golub) * osd: fix scrub delay bug (#10693 Samuel Just) * osd: fix watch reconnect race (#10441 Sage
