CephFS and the next hammer release v0.94.3

2015-08-03 Thread Loic Dachary
Hi Greg,

The next hammer release as found at https://github.com/ceph/ceph/tree/hammer 
passed the fs suite (http://tracker.ceph.com/issues/11990#fs). Do you think it 
is ready for QE to start their own round of testing ?

Cheers

P.S. http://tracker.ceph.com/issues/11990#Release-information has direct links 
to the pull requests merged into hammer since v0.94.2 in case you need more 
context about one of them.

-- 
Loïc Dachary, Artisan Logiciel Libre


rados and the next hammer release v0.94.3

2015-08-03 Thread Loic Dachary
Hi Sam,

We need http://tracker.ceph.com/issues/12465 in hammer v0.94.3. Is there 
anything else we want before publishing this release ? Should we wait for 
http://tracker.ceph.com/issues/12410 maybe ? And 
http://tracker.ceph.com/issues/12536 ?

The next hammer release as found at https://github.com/ceph/ceph/tree/hammer 
passed the rados suite. Provided it also contains the fixes for the above three 
issues, do you think it is ready for QE to start their own round of testing ?

Cheers

P.S. http://tracker.ceph.com/issues/11990#Release-information has direct links 
to the pull requests merged into hammer since v0.94.2 in case you need more 
context about one of them.



rbd and the next hammer release v0.94.3

2015-08-03 Thread Loic Dachary
Hi Josh,

The next hammer release as found at https://github.com/ceph/ceph/tree/hammer 
passed the rbd suite (http://tracker.ceph.com/issues/11990#rbd). Do you think 
it is ready for QE to start their own round of testing ?

Cheers

P.S. http://tracker.ceph.com/issues/11990#Release-information has direct links 
to the pull requests merged into hammer since v0.94.2 in case you need more 
context about one of them.

-- 
Loïc Dachary, Artisan Logiciel Libre


Re: CephFS and the next hammer release v0.94.3

2015-08-03 Thread Gregory Farnum
On Mon, Aug 3, 2015 at 6:43 PM Loic Dachary l...@dachary.org wrote:

 Hi Greg,

 The next hammer release as found at https://github.com/ceph/ceph/tree/hammer 
 passed the fs suite (http://tracker.ceph.com/issues/11990#fs). Do you think 
 it is ready for QE to start their own round of testing ?

I'm on vacation right now, but the only thing I see there that might
be iffy is the backport of the file handle reference counting. As long
as that is all good (Zheng?) things look fine to me.
-Greg


 Cheers

 P.S. http://tracker.ceph.com/issues/11990#Release-information has direct 
 links to the pull requests merged into hammer since v0.94.2 in case you need 
 more context about one of them.

 --
 Loïc Dachary, Artisan Logiciel Libre


Re: Transitioning Ceph from Autotools to CMake

2015-08-03 Thread John Spray
OK, here are the vstart + ceph.in changes that work well enough in my
out-of-tree build:
https://github.com/ceph/ceph/pull/5457

John

On Mon, Aug 3, 2015 at 11:09 AM, John Spray jsp...@redhat.com wrote:
 On Sat, Aug 1, 2015 at 8:24 PM, Orit Wasserman owass...@redhat.com wrote:


 3. No vstart.sh support; I started working on this too but have made less
 progress here. At the moment, in order to use vstart I copy the exes and
 libs to the src dir.

 I just started playing with CMake on Friday, adding some missing cephfs
 bits.  I was going to fix (3) as well, but I don't want to duplicate work
 -- do you have an existing branch at all?  Presumably this will mostly be a
 case of adding appropriate prefixes to commands.

 Cheers,
 John


Re: [michigan-eng] cmake and gitbuilder, juntos

2015-08-03 Thread Matt Benjamin
(fyi, ceph-devel, this was irc discussion about enhancing gitbuilder, a 
temporary blocker for cmake)

(05:36:09 PM) sjusthm: sage mattbenjamin: so that means we should adapt 
gitbuilder to use cmake, right?
(05:36:17 PM) sjusthm: in the immediate term?
(05:36:23 PM) sjusthm: since we want to switch to cmake anyway
(05:36:26 PM) sjusthm: and we need it for C++11?
(05:39:50 PM) mattbenjamin: sjustm:  that sounds correct
snippage
(05:47:44 PM) sjusthm: mattbenjamin: oh, I'd be fine with doing cmake first
(05:47:52 PM) sjusthm: no one actually *likes* messing with automake

- Original Message -
From: Matt Benjamin mbenja...@redhat.com
To: michigan-...@redhat.com
Sent: Monday, August 3, 2015 5:21:38 PM
Subject: [michigan-eng] cmake and gitbuilder, juntos

(04:53:04 PM) mattbenjamin: gitbuilder doesn't understand cmake;  I heard 
someone (sam?) talk about gitbuilder eol--but that's not soon?
(04:54:45 PM) mattbenjamin: this is apropos of:  casey's c++11 change includes 
automake logic to get around that
(04:58:44 PM) sage: mattbenjamin: yeah, we'll need to change all of the build 
tooling (gitbuilder and ceph-build.git) to use cmake
(04:58:54 PM) mattbenjamin: ok
(04:59:01 PM) sage: it'll be a while before we phase out gitbuilder
(04:59:51 PM) joshd: mattbenjamin: gitbuilder just runs a script you give it - 
it has no knowledge of build systems. it'll involve replacing parts of scripts 
like https://github.com/ceph/autobuild-ceph/blob/master/build-ceph.sh
(05:00:48 PM) mattbenjamin: tx
(05:01:01 PM) haomaiwang left the room (quit: Remote host closed the 
connection).



Tracker 12577 repair won't fix replica with bad digest

2015-08-03 Thread David Zafman


Sage,

I restored the branch wip-digest-repair which merged post-hammer in pull 
request #4365.  Do you think that 4365 fixes the reported bug #12577?


I cherry-picked the 9 commits off of hammer-backports-next as pull 
request #5458 and assigned it to Loic.


David




Re: Tracker 12577 repair won't fix replica with bad digest

2015-08-03 Thread Sage Weil
On Mon, 3 Aug 2015, David Zafman wrote:
 Sage,
 
 I restored the branch wip-digest-repair which merged post-hammer in pull
 request #4365.  Do you think that 4365 fixes the reported bug #12577?
 
 I cherry-picked the 9 commits off of hammer-backports-next as pull request
 #5458 and assigned it to Loic.

I suspect so... and conveniently we have dozens of PGs in this 
inconsistent state on the lab cluster that we can test the backport on.  
(I would do a simple vstart test first where you either inject a 
corruption or manually edit all copies of the object so that they match 
each other.)
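
A rough sketch of such a vstart test (the pool, object name, osd data path and
pgid placeholder below are illustrative, not from this thread):

    cd src
    MON=1 OSD=3 ./vstart.sh -n -x -l
    ./rados -p rbd put testobj /etc/hosts
    ./ceph osd map rbd testobj            # note the pgid
    # inject a corruption by flipping the first byte of one replica's copy
    # (stop that osd first to avoid caching surprises)
    obj=$(find dev/osd0/current -name 'testobj*' | head -1)
    printf 'X' | dd of="$obj" bs=1 count=1 conv=notrunc
    ./ceph pg deep-scrub <pgid>           # should now report the inconsistency
    ./ceph pg repair <pgid>               # then check that the backport fixes it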

sage



radosgw - stuck ops

2015-08-03 Thread GuangYang
Hi Yehuda,
Recently, on our pre-production clusters (with radosgw), we had an outage in 
which all radosgw worker threads got stuck and all client requests resulted in 
500 errors because no worker thread was left to take care of them.

What we observed on the cluster is that one PG was stuck in the *peering* 
state; as a result, all requests hitting that PG would occupy a worker thread 
indefinitely, and that gradually got all workers stuck.

The reason why the PG was stuck peering is still under investigation, but on 
the radosgw side I am wondering if there is anything we can pursue to improve 
this case (more specifically, to keep a problem with 1 out of 8192 PGs from 
cascading into service unavailability across the entire cluster):

1. The first approach I can think of is to add a timeout at the objecter layer 
for each op sent to an OSD. I think the complexity comes with WRITE, that is, 
how do we make sure of integrity if we abort at the objecter layer. But for 
immutable ops I think we certainly can do this, since at the upper layer we 
already reply back to the client with an error.
2. Shard the thread pool/work queue at radosgw, in which case a partial 
failure would (hopefully) impact only some of the worker threads and cause 
only a partial outage (a rough sketch follows below).
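
A minimal sketch of idea (2) -- not radosgw code, just an illustration of
hashing requests onto independent worker-queue shards so that ops stuck behind
a peering PG can only exhaust the threads of their own shard:

    // Minimal sketch of idea (2) -- not radosgw code.
    #include <atomic>
    #include <condition_variable>
    #include <functional>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    class ShardedWorkQueue {
      struct Shard {
        std::mutex lock;
        std::condition_variable cond;
        std::queue<std::function<void()>> work;
      };
      std::vector<Shard> shards;
      std::vector<std::thread> workers;
      std::atomic<bool> stopping;

      void worker(Shard* sh) {
        std::unique_lock<std::mutex> l(sh->lock);
        while (true) {
          while (!stopping && sh->work.empty())
            sh->cond.wait(l);
          if (sh->work.empty())
            return;                       // stopping and drained
          std::function<void()> fn = sh->work.front();
          sh->work.pop();
          l.unlock();
          fn();                           // a stuck op only ties up this shard
          l.lock();
        }
      }

     public:
      ShardedWorkQueue(size_t nshards, size_t threads_per_shard)
          : shards(nshards), stopping(false) {
        for (size_t s = 0; s < nshards; ++s)
          for (size_t t = 0; t < threads_per_shard; ++t)
            workers.emplace_back(&ShardedWorkQueue::worker, this, &shards[s]);
      }

      ~ShardedWorkQueue() {
        stopping = true;
        for (auto& sh : shards) {
          std::lock_guard<std::mutex> l(sh.lock);
          sh.cond.notify_all();
        }
        for (auto& w : workers)
          w.join();
      }

      // key could be e.g. a hash of the target bucket/object name
      void queue(size_t key, std::function<void()> fn) {
        Shard& sh = shards[key % shards.size()];
        std::lock_guard<std::mutex> l(sh.lock);
        sh.work.push(std::move(fn));
        sh.cond.notify_one();
      }
    };

    int main() {
      ShardedWorkQueue wq(8, 4);          // 8 shards, 4 threads each
      for (size_t i = 0; i < 100; ++i)
        wq.queue(i, [i] { /* handle request i */ (void)i; });
    }                                     // destructor drains queues and joins

With something like this, one wedged PG would degrade only the shard(s) its
requests hash to instead of the whole gateway.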

What do you think?

Thanks,
Guang


Re: CephFS and the next hammer release v0.94.3

2015-08-03 Thread Yan, Zheng
Hi Loic,

Yes, https://github.com/ceph/ceph/pull/5222 is problematic.  Do you mean we 
should include these PRs in v0.94.3?  These PRs fix a bug in a rare 
configuration; I think it's not a big deal to not include them in v0.94.3.

Regards
Yan, Zheng


 On Aug 4, 2015, at 00:32, Loic Dachary l...@dachary.org wrote:
 
 Hi Greg,
 
 I assume the file handle reference counting is about 
 http://tracker.ceph.com/issues/12088 which is backported as described at 
 http://tracker.ceph.com/issues/12319. It was indeed somewhat problematic and 
 required two pull requests: https://github.com/ceph/ceph/pull/5222 (authored 
 by Yan Zheng) and https://github.com/ceph/ceph/pull/5427 (merged by Yan 
 Zheng).
 
 Cheers
 
 On 03/08/2015 18:01, Gregory Farnum wrote:
 On Mon, Aug 3, 2015 at 6:43 PM Loic Dachary l...@dachary.org wrote:
 
 Hi Greg,
 
 The next hammer release as found at 
 https://github.com/ceph/ceph/tree/hammer passed the fs suite 
 (http://tracker.ceph.com/issues/11990#fs). Do you think it is ready for QE 
 to start their own round of testing ?
 
 I'm on vacation right now, but the only thing I see there that might
 be iffy is the backport of the file handle reference counting. As long
 as that is all good (Zheng?) things look fine to me.
 -Greg
 
 
 Cheers
 
 P.S. http://tracker.ceph.com/issues/11990#Release-information has direct 
 links to the pull requests merged into hammer since v0.94.2 in case you 
 need more context about one of them.
 
 --
 Loïc Dachary, Artisan Logiciel Libre
 -- 
 Loïc Dachary, Artisan Logiciel Libre
 



An issue in straw2, maybe it's not a problem

2015-08-03 Thread chen kael
Hi everyone,

Recently I have wanted to migrate our on-line cluster from straw to
straw2, because when I add or remove some osds, many more objects get
moved than should be.

I am confused about straw2, because straw2 is supposed to have the
property that if an item's weight is adjusted up or down, mappings
would either move to or from the adjusted item, but never between
other unmodified items in the bucket.

However, after running the test below, I still find pgs moving between
unmodified items, and I am not sure whether this is normal.



Code from commit 1113eb6b98cb8224f8c3e407435923de7415ca3f



Step1:MON=1 OSD=6 ./vstart.sh -l -n -x

Step2:[root@localhost src]# ./ceph osd pool create newpool 200 200

Step3:[root@localhost src]# ./ceph osd getcrushmap -o crush.out

Step4:[root@localhost src]# ./crushtool -d crush.out -o crush.txt

Step5:[root@localhost src]# vi crush.txt //modify crush map

===

# buckets
host localhost {
 id -2  # do not change unnecessarily
 # weight 18.000
 alg straw2
 hash 0 # rjenkins1
 item osd.0 weight 2.000
 item osd.1 weight 2.000
 item osd.2 weight 3.000
 item osd.3 weight 3.000
 item osd.4 weight 4.000
 item osd.5 weight 4.000
}
root default {
 id -1  # do not change unnecessarily
 # weight 18.000
 alg straw
 hash 0 # rjenkins1
 item localhost weight 6.000
}

===

Step6:[root@localhost src]# ./crushtool -c crush.txt -o crush.out

Step7:[root@localhost src]# ./crushtool -i crush.out --test --min-x 1
--max-x 1000 --num-rep 3  --show-statistics --show-mappings > straw.old

Step8:[root@localhost src]# ./crushtool -i crush.out --reweight-item
osd.5 0 -o crush.out.new

Step9:[root@localhost src]# ./crushtool -i crush.out.new --test
--min-x 1 --max-x 1000 --num-rep 3  --show-statistics --show-mappings
> straw.new

Step10:[root@localhost src]# diff -y --suppress-common-lines straw.old
straw.new > straw.diff



In straw.diff we can find pg movements like:

old                           new

CRUSH rule 0 x 453 [5,2,1]    CRUSH rule 0 x 453 [0,2,4]
CRUSH rule 0 x 628 [3,0,5]    CRUSH rule 0 x 628 [3,0,2]
CRUSH rule 0 x 629 [3,1,0]    CRUSH rule 0 x 629 [3,5,1]
CRUSH rule 0 x 759 [5,2,1]    CRUSH rule 0 x 759 [0,4,3]



Conclusion:

If osd.5 is the primary osd, or the second osd in a pg, then the other
osds after osd.5 may still be switched out. Is this what straw2 is
really meant to achieve?


Re: Transitioning Ceph from Autotools to CMake

2015-08-03 Thread John Spray
On Sat, Aug 1, 2015 at 8:24 PM, Orit Wasserman owass...@redhat.com wrote:


 3. No vstart.sh support; I started working on this too but have made less
 progress here. At the moment, in order to use vstart I copy the exes and
 libs to the src dir.

I just started playing with CMake on Friday, adding some missing cephfs
bits.  I was going to fix (3) as well, but I don't want to duplicate work
-- do you have an existing branch at all?  Presumably this will mostly be a
case of adding appropriate prefixes to commands.

Cheers,
John


ceph-devel

2015-08-03 Thread 徐龙
subscribe ceph-devel


Re: systemd status

2015-08-03 Thread Owen Synge


On 07/29/2015 04:08 PM, Alex Elsayed wrote:
 Sage Weil wrote:
 
 On Wed, 29 Jul 2015, Alex Elsayed wrote:
 Travis Rhoden wrote:

 On Tue, Jul 28, 2015 at 12:13 PM, Sage Weil sw...@redhat.com wrote:
 Hey,

 I've finally had some time to play with the systemd integration branch
 on
 fedora 22.  It's in wip-systemd and my current list of issues
 includes:

  - after mon creation ceph-create-keys isn't run automagically
    - Personally I kind of hate how it was always run on mon startup and not
  just during cluster creation so I wouldn't mind *so* much if this
  became an explicit step, maybe triggered by ceph-deploy, after mon create.

 I would be happy to see this become an explicit step as well.  We
 could make it conditional such that ceph-deploy only runs it if we are
 dealing with systemd, but I think re-running ceph-create-keys is
 always safe.  It just aborts if
 /etc/ceph/{cluster}.client.admin.keyring is already present.

 Another option is to have the ceph-mon@.service have a Wants= and After=
 on ceph-create-keys@.service, which has a
 ConditionPathExists=!/path/to/key/from/templated/%I

 With that, it would only run ceph-create-keys if the keys do not exist
 already - otherwise, it'd be skipped-as-successful.
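
A sketch of that idea (unit names, the meaning of %i and the paths are
assumptions, not the shipped unit files, and the ordering is flipped so that
ceph-create-keys -- which needs a live mon -- runs after the mon rather than
before it):

    # ceph-create-keys@.service (oneshot, skipped once the key exists)
    [Unit]
    Description=Create Ceph client.admin keyring (mon %i)
    After=ceph-mon@%i.service
    ConditionPathExists=!/etc/ceph/ceph.client.admin.keyring

    [Service]
    Type=oneshot
    ExecStart=/usr/sbin/ceph-create-keys --cluster ceph -i %i

    # ceph-mon@.service -- only the line that pulls the oneshot in
    [Unit]
    Wants=ceph-create-keys@%i.service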

 This sounds promising!

 - udev's attempt to trigger ceph-disk isn't working for me.  the osd
 service gets started but the mount isn't present and it fails to
 start. I'm a systemd noob and haven't sorted out how to get udev to
 log something
 meaningful to debug it.  Perhaps we should merge in the udev +
 systemd revamp patches here too...

 Personally, my opinion is that ceph-disk is doing too many things at
 once, and thus fits very poorly into the systemd architecture...

 I mean, it tries to partition, format, mount, introspect the filesystem
 inside, and move the mount, depending on what the initial state was.

 There is a series from David Disseldorp[1] that fixes much of this, by
 doing most of these steps in short-lived systemd tasks (instead of a
 complicated slow ceph-disk invocation directly from udev, which breaks
 udev).

 Now, part of the issue is that the final mountpoint depends on data
 inside the filesystem - OSD id, etc. To me, that seems... mildly absurd
 at least.

 If the _mountpoint_ was only dependent on the partuuid, and the ceph OSD
 self-identified from the contents of the path it's passed, that would
 simplify things immensely IMO when it comes to systemd integration
 because the mount logic wouldn't need any hokey double-mounting, and
 could likely use the systemd mount machinery much more easily - thus
 avoiding race issues like the above.

 Hmm.  Well, we could name the mount point with the uuid and symlink the
 osd id to that.  We could also do something sneaky like embed the osd id
 in the least significant bits of the uuid, but that throws away a lot of
 entropy and doesn't capture the cluster name (which also needs to be known
 before mount).
 
 Does it?
 
  If the mount point is (say) /var/ceph/$UUID, and ceph-osd can take a
  --datadir parameter from which it _reads_ the cluster and ID if they aren't 
  passed on the command line, I think that'd resolve the issue rather tidily 
  _without_ requiring that they be known prior to mount.
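
A sketch of what that flow could look like (the --datadir option name is the
hypothetical one from this thread, and the whoami/ceph_fsid files are just one
possible source for the id and cluster):

    # 1. mount purely by partuuid, no knowledge of the OSD id needed
    mount /dev/disk/by-partuuid/354a1e62-6f35-4b74-b633-3a8ac302cd77 \
          /var/ceph/354a1e62-6f35-4b74-b633-3a8ac302cd77
    # 2. the daemon works out cluster and id from files inside the mount
    #    (e.g. the existing whoami and ceph_fsid files)
    ceph-osd --datadir /var/ceph/354a1e62-6f35-4b74-b633-3a8ac302cd77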
 
 And if I understand correctly, that data is _already in there_ for ceph-disk 
 to mount it in the final location - it's just shuffling around who reads 
 it.
 
 If the mounting and binding to the final location is done in a systemd job
 identified by the uuid, it seems like systemd would effectively handle the
 mutual exclusion and avoid races?
 
 What I object to is the idea of a final location that depends on the 
 contents of the filesystem - it's bass-ackwards IMO.

As I understand it this discussion is about:

systemctl start ceph-osd@12

Vs:

systemctl start ceph-osd@354a1e62-6f35-4b74-b633-3a8ac302cd77

I think you have a very sound argument that 12 is ambiguous with respect
to cluster name, since two different clusters can each have an osd 12.
Personally I do not think the complexity of using ceph-disk is too
important, as we can improve this later.

I also worry that you are at the same time not considering just how ugly
it is to have to type UUIDs without cut and paste.

Can we square the circle and get systemd plus some helper scripts to
remove the requirement that UUIDs _have_ to be used?

To me the perfect end result would be that system admins can use both
the UUID and the ID to describe the service they wish to start and stop,
we can unambiguously start and stop OSDs from different clusters, and we
do not _have_ to type much when there is no ambiguity. (A tiny helper
sketch follows below.)
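
For example, a tiny wrapper along these lines (purely illustrative; it assumes
the per-OSD fsid file under /var/lib/ceph/osd/ and a ceph-osd@<uuid> unit
naming scheme) would let admins keep typing plain ids:

    #!/bin/bash
    # start-osd: accept either a numeric OSD id or a full UUID
    arg="$1"
    if [[ "$arg" =~ ^[0-9]+$ ]]; then
        uuid=$(cat "/var/lib/ceph/osd/ceph-${arg}/fsid")
    else
        uuid="$arg"
    fi
    exec systemctl start "ceph-osd@${uuid}"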

Best regards

Owen






Re: [ceph-users] Ceph Tech Talk Today!

2015-08-03 Thread Goncalo Borges

Hi Patrick...

Do you think it is possible to make the talk / slides available? The 
link is still not active in the ceph-tech-talks URL


Cheers
Goncalo

On 07/31/2015 01:08 AM, Patrick McGarry wrote:

Hey cephers,

Just sending a friendly reminder that our online CephFS Tech Talk is
happening today at 13:00 EDT (17:00 UTC). Please stop by and hear a
technical deep dive on CephFS and ask any questions you might have.
Thanks!

http://ceph.com/ceph-tech-talks/

direct link to the video conference:  https://bluejeans.com/172084437/browser





--
Goncalo Borges
Research Computing
ARC Centre of Excellence for Particle Physics at the Terascale
School of Physics A28 | University of Sydney, NSW  2006
T: +61 2 93511937



ceph branch status

2015-08-03 Thread ceph branch robot
-- All Branches --

Adam Crume adamcr...@gmail.com
2014-12-01 20:45:58 -0800   wip-doc-rbd-replay

Alfredo Deza ad...@redhat.com
2015-03-23 16:39:48 -0400   wip-11212
2015-03-25 10:10:43 -0400   wip-11065
2015-07-01 08:34:15 -0400   wip-12037

Alfredo Deza alfredo.d...@inktank.com
2014-07-08 13:58:35 -0400   wip-8679
2014-09-04 13:58:14 -0400   wip-8366
2014-10-13 11:10:10 -0400   wip-9730

Boris Ranto bra...@redhat.com
2015-04-13 13:51:32 +0200   wip-fix-ceph-dencoder-build
2015-04-14 13:51:49 +0200   wip-fix-ceph-dencoder-build-master
2015-06-23 15:29:45 +0200   wip-user-rebase
2015-07-10 12:34:33 +0200   wip-bash-completion
2015-07-15 18:21:11 +0200   wip-selinux-policy
2015-07-30 13:47:47 +0200   wip-selinux-policy-no-user

Chendi.Xue chendi@intel.com
2015-06-16 14:39:42 +0800   wip-blkin

Chi Xinze xmdx...@gmail.com
2015-05-15 21:47:44 +   XinzeChi-wip-ec-read

Dan Mick dan.m...@inktank.com
2013-07-16 23:00:06 -0700   wip-5634

Danny Al-Gaaf danny.al-g...@bisect.de
2015-04-23 16:32:00 +0200   wip-da-SCA-20150421
2015-04-23 17:18:57 +0200   wip-nosetests
2015-04-23 18:20:16 +0200   wip-unify-num_objects_degraded
2015-07-17 10:50:46 +0200   wip-da-SCA-20150601

David Zafman dzaf...@redhat.com
2014-08-29 10:41:23 -0700   wip-libcommon-rebase
2015-04-24 13:14:23 -0700   wip-cot-giant
2015-06-02 13:46:23 -0700   wip-11511
2015-07-20 17:48:15 -0700   wip-12387
2015-07-22 08:12:00 -0700   wip-zafman-testing
2015-07-23 17:02:58 -0700   wip-12437
2015-07-31 19:54:24 -0700   wip-12000-12200

Dongmao Zhang deanracc...@gmail.com
2014-11-14 19:14:34 +0800   thesues-master

Greg Farnum gfar...@redhat.com
2015-04-29 21:44:11 -0700   wip-init-names
2015-06-11 18:22:55 -0700   greg-fs-testing
2015-07-16 09:28:24 -0700   hammer-12297

Greg Farnum g...@inktank.com
2014-10-23 13:33:44 -0700   wip-forward-scrub

Gregory Meno gm...@redhat.com
2015-02-25 17:30:33 -0800   wip-fix-typo-troubleshooting

Guang G Yang ygu...@renownedground.corp.gq1.yahoo.com
2015-06-26 20:31:44 +   wip-ec-readall
2015-07-23 16:13:19 +   wip-12316

Guang Yang ygu...@yahoo-inc.com
2014-08-08 10:41:12 +   wip-guangyy-pg-splitting
2014-09-25 00:47:46 +   wip-9008
2014-09-30 10:36:39 +   guangyy-wip-9614

Haomai Wang haomaiw...@gmail.com
2014-07-27 13:37:49 +0800   wip-flush-set
2015-04-20 00:47:59 +0800   update-organization
2015-04-20 00:48:42 +0800   update-organization-1
2015-07-21 19:33:56 +0800   fio-objectstore

Ilya Dryomov ilya.dryo...@inktank.com
2014-09-05 16:15:10 +0400   wip-rbd-notify-errors

James Page james.p...@ubuntu.com
2013-02-27 22:50:38 +   wip-debhelper-8

Jason Dillaman dilla...@redhat.com
2015-05-22 00:52:20 -0400   wip-11625
2015-06-10 12:02:16 -0400   wip-11770-hammer
2015-06-22 11:17:56 -0400   wip-12109-hammer
2015-07-17 14:17:04 -0400   wip-12384-hammer
2015-07-19 13:44:16 -0400   wip-12237-hammer
2015-07-24 09:59:56 -0400   wip-11769-firefly
2015-07-28 16:36:35 -0400   wip-12345-hammer
2015-07-29 13:29:54 -0400   wip-12235-hammer
2015-07-29 13:41:40 -0400   wip-12236-hammer
2015-07-30 11:39:22 -0400   wip-11286
2015-07-30 11:42:58 -0400   wip-11287
2015-07-31 13:55:23 -0400   wip-12383-next

Jenkins jenk...@inktank.com
2014-07-29 05:24:39 -0700   wip-nhm-hang
2015-02-02 10:35:28 -0800   wip-sam-v0.92
2015-07-14 13:10:32 -0700   last
2015-07-29 12:55:24 -0700   rhcs-v0.94.1-ubuntu

Joao Eduardo Luis jec...@gmail.com
2014-09-10 09:39:23 +0100   wip-leveldb-get.dumpling

Joao Eduardo Luis joao.l...@gmail.com
2014-07-22 15:41:42 +0100   wip-leveldb-misc

Joao Eduardo Luis joao.l...@inktank.com
2014-09-02 17:19:52 +0100   wip-leveldb-get
2014-10-17 16:20:11 +0100   wip-paxos-fix
2014-10-21 21:32:46 +0100   wip-9675.dumpling
2015-07-27 21:56:42 +0100   wip-11470.hammer
2015-07-27 22:01:36 +0100   wip-11786.hammer

Joao Eduardo Luis j...@redhat.com
2014-11-17 16:43:53 +   wip-mon-osdmap-cleanup
2014-12-15 16:18:56 +   wip-giant-mon-backports
2014-12-17 17:13:57 +   wip-mon-backports.firefly
2014-12-17 23:15:10 +   wip-mon-sync-fix.dumpling
2015-01-07 23:01:00 +   wip-mon-blackhole-mlog-0.87.7
2015-01-10 02:40:42 +   wip-dho-joao
2015-01-10 02:46:31 +   wip-mon-paxos-fix
2015-01-26 13:00:09 +   wip-mon-datahealth-fix
2015-02-04 22:36:14 +   wip-10643
2015-07-27 21:53:38 +0100   wip-11470.firefly
2015-07-27 22:04:27 +0100   wip-11786.firefly

Joao 

Fedora 22 systemd and ceph-deploy

2015-08-03 Thread Owen Synge
Dear all,

My plan is to make a fedora22-systemd branch. I will leave fedora 20 as
sysvinit.

OK, I have just done my first proper install of the systemd ceph branch on Fedora 22.

I can confirm most of the issues.

I am giving up for the day, but so far copying the SUSE/openSUSE ceph-deploy
code over the Fedora code has helped a lot.

cp /usr/lib/python2.7/site-packages/ceph_deploy/hosts/suse/mon/* \
  /usr/lib/python2.7/site-packages/ceph_deploy/hosts/fedora/mon/

(Running on suse version patched release)

It can now set up the mon daemons correctly by itself.

I will look into the udev rules tomorrow morning, and remove some more of
the Fedora hard-coding to sysvinit.

Best regards

Owen


On 07/28/2015 09:13 PM, Sage Weil wrote:
 Hey,
 
 I've finally had some time to play with the systemd integration branch on 
 fedora 22.  It's in wip-systemd and my current list of issues includes:
 
 - after mon creation ceph-create-keys isn't run automagically
   - Personally I kind of hate how it was always run on mon startup and not 
 just during cluster creation so I wouldn't mind *so* much if this became 
 an explicit step, maybe triggered by ceph-deploy, after mon create.
 
 - udev's attempt to trigger ceph-disk isn't working for me.  the osd 
 service gets started but the mount isn't present and it fails to start.  
 I'm a systemd noob and haven't sorted out how to get udev to log something 
 meaningful to debug it.  Perhaps we should merge in the udev + 
 systemd revamp patches here too...
 
 - ceph-detect-init is only recently unbroken in master for fedora 22.
 
 - ceph-deploy doesn't know that fedora should be systemd yet.
 
 - ceph-deploy has a wip-systemd branch with a few things so far:
   - on mon create, we unconditionally systemctl enable ceph.target.  
 i think osd create and mds create and rgw create should do the same thing, 
 since the ceph.target is a catch-all bucket for any ceph service, and i 
 don't think we want to enable it on install?
   - rgw create and mds create don't work yet
   - osd create doesn't enable ceph.target
 
  - I'm guessing my ceph.spec changes to install the systemd unit files 
 aren't quite right... please review!  The gitbuilder turnaround is so slow 
 it's hard to iterate and I don't really know what I'm doing here.
 
 Owen, I'd like to get this just a tad bit more functional and then merge 
 ASAP, then up any issues in the weeks leading up to infernalis.  What say 
 ye?
 
 sage
 
 

-- 
SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB
21284 (AG
Nürnberg)

Maxfeldstraße 5

90409 Nürnberg

Germany


Re: Fedora 22 systemd and ceph-deploy

2015-08-03 Thread Sage Weil
On Mon, 3 Aug 2015, Owen Synge wrote:
 Dear all,
 
 My plan is to make a fedora22-systemd branch. I will leave fedora 20 as
 sysvinit.
 
 OK, I have just done my first proper install of the systemd ceph branch on Fedora 22.
 
 I can confirm most of the issues.
 
 I am giving up for the day, but so far copying the SUSE/openSUSE ceph-deploy
 code over the Fedora code has helped a lot.
 
 cp /usr/lib/python2.7/site-packages/ceph_deploy/hosts/suse/mon/* \
   /usr/lib/python2.7/site-packages/ceph_deploy/hosts/fedora/mon/
 
 (Running on suse version patched release)
 
 It can now set up the mon daemons correctly by itself.
 
 I will look into the udev rules tomorrow morning, and remove some more of
 the Fedora hard-coding to sysvinit.

There is a wip-systemd branch ceph-deploy that has enough ceph-deploy 
changes for me to successfully do the deployment of mon, osd, mds, and 
rgw.  The main thing it doesn't do is figure out which version of Ceph 
you're installing to decide whether to do systemd (post-hammer) or 
sysvinit (hammer and earlier).  That's going to be annoying, I'm afraid...

I suspect what we really want to do is abstract out the systemd behavior 
into something that the distros opt in to so that we aren't duplicating 
code across the suse, centos, rhel, and fedora host types...

sage


 
 Best regards
 
 Owen
 
 
 On 07/28/2015 09:13 PM, Sage Weil wrote:
  Hey,
  
  I've finally had some time to play with the systemd integration branch on 
  fedora 22.  It's in wip-systemd and my current list of issues includes:
  
  - after mon creation ceph-create-keys isn't run automagically
- Personally I kind of hate how it was always run on mon startup and not 
  just during cluster creation so I wouldn't mind *so* much if this became 
  an explicit step, maybe triggered by ceph-deploy, after mon create.
  
  - udev's attempt to trigger ceph-disk isn't working for me.  the osd 
  service gets started but the mount isn't present and it fails to start.  
  I'm a systemd noob and haven't sorted out how to get udev to log something 
  meaningful to debug it.  Perhaps we should merge in the udev + 
  systemd revamp patches here too...
  
  - ceph-detect-init is only recently unbroken in master for fedora 22.
  
  - ceph-deploy doesn't know that fedora should be systemd yet.
  
  - ceph-deploy has a wip-systemd branch with a few things so far:
- on mon create, we unconditionally systemctl enable ceph.target.  
  i think osd create and mds create and rgw create should do the same thing, 
  since the ceph.target is a catch-all bucket for any ceph service, and i 
  don't think we want to enable it on install?
- rgw create and mds create don't work yet
- osd create doesn't enable ceph.target
  
   - I'm guessing my ceph.spec changes to install the systemd unit files 
  aren't quite right... please review!  The gitbuilder turnaround is so slow 
  it's hard to iterate and I don't really know what I'm doing here.
  
  Owen, I'd like to get this just a tad bit more functional and then merge 
  ASAP, then up any issues in the weeks leading up to infernalis.  What say 
  ye?
  
  sage
  
  
 
 -- 
 SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB
 21284 (AG
 Nürnberg)
 
 Maxfeldstraße 5
 
 90409 Nürnberg
 
 Germany
 
 

Re: CephFS and the next hammer release v0.94.3

2015-08-03 Thread Loic Dachary
Hi Greg,

I assume the file handle reference counting is about 
http://tracker.ceph.com/issues/12088 which is backported as described at 
http://tracker.ceph.com/issues/12319. It was indeed somewhat problematic and 
required two pull requests: https://github.com/ceph/ceph/pull/5222 (authored by 
Yan Zheng) and https://github.com/ceph/ceph/pull/5427 (merged by Yan Zheng).

Cheers

On 03/08/2015 18:01, Gregory Farnum wrote:
 On Mon, Aug 3, 2015 at 6:43 PM Loic Dachary l...@dachary.org wrote:

 Hi Greg,

 The next hammer release as found at https://github.com/ceph/ceph/tree/hammer 
 passed the fs suite (http://tracker.ceph.com/issues/11990#fs). Do you think 
 it is ready for QE to start their own round of testing ?
 
 I'm on vacation right now, but the only thing I see there that might
 be iffy is the backport of the file handle reference counting. As long
 as that is all good (Zheng?) things look fine to me.
 -Greg
 

 Cheers

 P.S. http://tracker.ceph.com/issues/11990#Release-information has direct 
 links to the pull requests merged into hammer since v0.94.2 in case you need 
 more context about one of them.

 --
 Loïc Dachary, Artisan Logiciel Libre
-- 
Loïc Dachary, Artisan Logiciel Libre





Re: Cluster Network Public Network w.r.t XIO ?

2015-08-03 Thread kernel neophyte
Thanks Marcus & Matt. I will take a look at it this week.

-Neo

On Fri, Jul 31, 2015 at 6:07 PM, Sage Weil s...@newdream.net wrote:
 On Fri, 31 Jul 2015, Marcus Watts wrote:
 I promised information on my copy of wip-address.  It's not in as
 good a shape as I promised-- got pulled onto something else, so
 the last commit I made left it not building.  And it will definitely
 need more past that.

 So here's where it lives right now,
 repo
 g...@github.com:linuxbox2/linuxbox-ceph.git
 branch
 xio-firefly-mpc1

 This is definitely on my todo list, and I plan to work on it next week.

 I also plan to push a copy of this or something much like it, as a wip
 branch to the main ceph git, as soon as practical.

 As a first step I'd rebase on master as any cleanup work done before
 that'll probably get shredded by the rebase conflicts.  This one is
 tedious unfortunately since every encode needs that features arg.  :(

 Thanks, Marcus!
 sage


Re: An issue in straw2, maybe it's not a problem

2015-08-03 Thread Sage Weil
On Mon, 3 Aug 2015, chen kael wrote:
 Hi everyone,
 
 Recently I have wanted to migrate our on-line cluster from straw to
 straw2, because when I add or remove some osds, many more objects get
 moved than should be.
 
 I am confused about straw2, because straw2 is supposed to have the
 property that if an item's weight is adjusted up or down, mappings
 would either move to or from the adjusted item, but never between
 other unmodified items in the bucket.
 
 However, after running the test below, I still find pgs moving between
 unmodified items, and I am not sure whether this is normal.

It is normal.  It's not really straw2's fault (it's doing what it 
should) but an artifact of the way CRUSH works.  See below...

 old                           new
 CRUSH rule 0 x 759 [5,2,1]    CRUSH rule 0 x 759 [0,4,3]

What CRUSH actually does for the old map is:

 - with r=0 we pick 5 (ok)
 - with r=1 we get 5 (dup, try again)
 - with r=2 we get 5 (dup, try again)
 - with r=3 we get 2 (ok)
 - with r=4 we get 5 (dup, try again)
 - with r=5 we get 1 (ok)
 - [5,2,1]

When we do the new map, it's luckier:

 - with r=0 we pick 0 (ok)   (was 5 before)
 - with r=1 we pick 4 (ok)   (was 5 before)
 - with r=2 we pick 3 (ok)   (was 5 before)
 - [0,4,3]

For any given draw (r= value), we will follow the rule that item either 
stays the same or switches to or from the reweighted item (5).  But for 
anything later in the sequence we may be at high values of r because we've 
had to retry (due to dups, or OSDs being marked out), and any change 
earlier in the sequence may mean that we have a different number of 
retries.  Here, positions 2 and 3 were r=3 and r=5 because of dups, but 
those don't happen with the new map and those positions are r=1 and r=2.  

Note that straw2's promise remains true, though: for r=0, 1, and 2, the 
value switches away from 5 but no non-5 value changes.  If we had 
num_rep=6, we would have seen the new map still choose 2 for r=3 
and 1 for r=5 (although they would have landed in different positions).
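
A toy model of this retry behaviour (explicitly not the real CRUSH code -- the
hash below is just a stand-in for rjenkins, and the weights are taken from the
test earlier in the thread) can show the same effect: skipping weight-0 items
changes which r values collide, so later positions can shuffle between
unmodified items:

    // Toy straw2 model, not the real CRUSH code.  Each item draws
    // straw = ln(u) / weight for a deterministic u in (0,1]; the largest
    // straw wins.  On a duplicate we bump r and draw again.
    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <limits>
    #include <map>
    #include <random>
    #include <vector>

    // Deterministic stand-in for the rjenkins hash CRUSH really uses.
    static double unit_hash(int x, int item, int r) {
      std::seed_seq seq{x, item, r};
      std::mt19937 gen(seq);
      return std::uniform_real_distribution<double>(1e-12, 1.0)(gen);
    }

    static int straw2_draw(int x, int r, const std::map<int, double>& weight) {
      int best = -1;
      double best_straw = -std::numeric_limits<double>::infinity();
      for (const auto& p : weight) {
        if (p.second <= 0)
          continue;                       // weight 0: never selected
        double straw = std::log(unit_hash(x, p.first, r)) / p.second;
        if (straw > best_straw) { best_straw = straw; best = p.first; }
      }
      return best;
    }

    // Pick num_rep distinct items; r keeps increasing across dup retries,
    // which is exactly the effect described above.
    static std::vector<int> pick(int x, int num_rep,
                                 const std::map<int, double>& weight) {
      std::vector<int> out;
      for (int r = 0; (int)out.size() < num_rep; ++r) {
        int item = straw2_draw(x, r, weight);
        if (std::find(out.begin(), out.end(), item) == out.end())
          out.push_back(item);
      }
      return out;
    }

    int main() {
      std::map<int, double> before, after;
      const double w[6] = {2, 2, 3, 3, 4, 4};   // weights from the test above
      for (int i = 0; i < 6; ++i) before[i] = after[i] = w[i];
      after[5] = 0;                             // crushtool --reweight-item osd.5 0
      for (int x = 1; x <= 50; ++x) {
        std::vector<int> a = pick(x, 3, before), b = pick(x, 3, after);
        if (a != b)
          std::printf("x=%d old [%d,%d,%d] -> new [%d,%d,%d]\n",
                      x, a[0], a[1], a[2], b[0], b[1], b[2]);
      }
      return 0;
    }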

 Conclusion:
 
 If osd.5 is the primary osd, or the second osd in a pg, then the other
 osds after osd.5 may still be switched out. Is this what straw2 is
 really meant to achieve?

Correct.  It's not ideal, but I don't think it's avoidable, because it is 
not an independent decision process for every position of the sequence... 
our choice is constrained to items that we haven't chosen before.

sage


Re: rbd and the next hammer release v0.94.3

2015-08-03 Thread Josh Durgin

On 08/03/2015 08:44 AM, Loic Dachary wrote:

Hi Josh,

The next hammer release as found at https://github.com/ceph/ceph/tree/hammer 
passed the rbd suite (http://tracker.ceph.com/issues/11990#rbd). Do you think 
it is ready for QE to start their own round of testing ?


Looks ready to me. Thanks!

Josh


Re: Odd QA Test Running

2015-08-03 Thread Haomai Wang
I found that
https://github.com/ceph/ceph-qa-suite/blob/master/erasure-code/ec-rados-plugin%3Dshec-k%3D4-m%3D3-c%3D2.yaml
has an overrides section that overrides the user's enable experimental
unrecoverable data corrupting features config, so my jobs were broken.

I made a PR (https://github.com/ceph/ceph-qa-suite/pull/518) and hope it
fixes this.
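
For reference, the shape of the problem is roughly this (illustrative yaml
only, not a quote of the actual file): a facet that carries its own overrides
section wins over the conf the job supplies, so the experimental-feature line
from the job yaml gets replaced:

    # Illustrative only -- roughly how a suite facet's overrides can clobber
    # the job's own ceph conf (the real file sets up the shec EC plugin)
    overrides:
      ceph:
        conf:
          global:
            enable experimental unrecoverable data corrupting features: 'shec'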

On Fri, Jul 31, 2015 at 5:50 PM, Haomai Wang haomaiw...@gmail.com wrote:
 Hi all,

 I ran a test
 suite (http://pulpito.ceph.com/haomai-2015-07-29_11:40:40-rados-master-distro-basic-multi/)
 and found that the failed jobs all failed with 2015-07-29 10:52:35.313197
 7f16ae655780 -1 unrecognized ms_type 'async'

 Then I found that the failed jobs (like
 http://pulpito.ceph.com/haomai-2015-07-29_11:40:40-rados-master-distro-basic-multi/991540/)
 lack “enable experimental unrecoverable data corrupting features:
 ms-type-async”.

 Other, successful jobs (like
 http://pulpito.ceph.com/haomai-2015-07-29_11:40:40-rados-master-distro-basic-multi/991517/)
 do have “enable experimental unrecoverable data corrupting features:
 ms-type-async” in their yaml.

 So does that mean the same scheduled suite generates different yaml
 files? Is there something tricky going on?

 --

 Best Regards,

 Wheat



-- 
Best Regards,

Wheat


Re: [ceph-users] Inconsistent PGs that ceph pg repair does not fix

2015-08-03 Thread Andras Pataki
Done: http://tracker.ceph.com/issues/12577
BTW, I'm using the latest release 0.94.2 on all machines.

Andras


On 8/3/15, 3:38 PM, Samuel Just sj...@redhat.com wrote:

Hrm, that's certainly supposed to work.  Can you make a bug?  Be sure
to note what version you are running (output of ceph-osd -v).
-Sam

On Mon, Aug 3, 2015 at 12:34 PM, Andras Pataki
apat...@simonsfoundation.org wrote:
 Summary: I am having problems with inconsistent PG's that the 'ceph pg
 repair' command does not fix.  Below are the details.  Any help would be
 appreciated.

 # Find the inconsistent PG's
 ~# ceph pg dump | grep inconsistent
 dumped all in format plain
 2.439 4208 0 0 0 0 17279507143 3103 3103 active+clean+inconsistent
 2015-08-03 14:49:17.292884 77323'2250145 77480:890566 [78,54] 78 [78,54] 78
 77323'2250145 2015-08-03 14:49:17.292538 77323'2250145 2015-08-03 14:49:17.292538
 2.8b9 4083 0 0 0 0 16669590823 3051 3051 active+clean+inconsistent
 2015-08-03 14:46:05.140063 77323'2249886 77473:897325 [7,72] 7 [7,72] 7
 77323'2249886 2015-08-03 14:22:47.834063 77323'2249886 2015-08-03 14:22:47.834063

 # Look at the first one:
 ~# ceph pg deep-scrub 2.439
 instructing pg 2.439 on osd.78 to deep-scrub

 # The logs of osd.78 show:
 2015-08-03 15:16:34.409738 7f09ec04a700  0 log_channel(cluster) log [INF] :
 2.439 deep-scrub starts
 2015-08-03 15:16:51.364229 7f09ec04a700 -1 log_channel(cluster) log [ERR] :
 deep-scrub 2.439 b029e439/1022d93.0f0c/head//2 on disk data digest
 0xb3d78a6e != 0xa3944ad0
 2015-08-03 15:16:52.763977 7f09ec04a700 -1 log_channel(cluster) log [ERR] :
 2.439 deep-scrub 1 errors

 # Finding the object in question:
 ~# find ~ceph/osd/ceph-78/current/2.439_head -name 1022d93.0f0c* -ls
 21510412310 4100 -rw-r--r--   1 root root  4194304 Jun 30 17:09
 /var/lib/ceph/osd/ceph-78/current/2.439_head/DIR_9/DIR_3/DIR_4/DIR_E/1022d93.0f0c__head_B029E439__2
 ~# md5sum
 /var/lib/ceph/osd/ceph-78/current/2.439_head/DIR_9/DIR_3/DIR_4/DIR_E/1022d93.0f0c__head_B029E439__2
 4e4523244deec051cfe53dd48489a5db
 /var/lib/ceph/osd/ceph-78/current/2.439_head/DIR_9/DIR_3/DIR_4/DIR_E/1022d93.0f0c__head_B029E439__2

 # The object on the backup osd:
 ~# find ~ceph/osd/ceph-54/current/2.439_head -name 1022d93.0f0c* -ls
 6442614367 4100 -rw-r--r--   1 root root  4194304 Jun 30 17:09
 /var/lib/ceph/osd/ceph-54/current/2.439_head/DIR_9/DIR_3/DIR_4/DIR_E/1022d93.0f0c__head_B029E439__2
 ~# md5sum
 /var/lib/ceph/osd/ceph-54/current/2.439_head/DIR_9/DIR_3/DIR_4/DIR_E/1022d93.0f0c__head_B029E439__2
 4e4523244deec051cfe53dd48489a5db
 /var/lib/ceph/osd/ceph-54/current/2.439_head/DIR_9/DIR_3/DIR_4/DIR_E/1022d93.0f0c__head_B029E439__2

 # They don't seem to be different.
 # When I try repair:
 ~# ceph pg repair 2.439
 instructing pg 2.439 on osd.78 to repair

 # The osd.78 logs show:
 2015-08-03 15:19:21.775933 7f09ec04a700  0 log_channel(cluster) log [INF] :
 2.439 repair starts
 2015-08-03 15:19:38.088673 7f09ec04a700 -1 log_channel(cluster) log [ERR] :
 repair 2.439 b029e439/1022d93.0f0c/head//2 on disk data digest
 0xb3d78a6e != 0xa3944ad0
 2015-08-03 15:19:39.958019 7f09ec04a700 -1 log_channel(cluster) log [ERR] :
 2.439 repair 1 errors, 0 fixed
 2015-08-03 15:19:39.962406 7f09ec04a700  0 log_channel(cluster) log [INF] :
 2.439 deep-scrub starts
 2015-08-03 15:19:56.510874 7f09ec04a700 -1 log_channel(cluster) log [ERR] :
 deep-scrub 2.439 b029e439/1022d93.0f0c/head//2 on disk data digest
 0xb3d78a6e != 0xa3944ad0
 2015-08-03 15:19:58.348083 7f09ec04a700 -1 log_channel(cluster) log [ERR] :
 2.439 deep-scrub 1 errors

 The inconsistency is not fixed.  Any hints of what should be done next?
 I have tried  a few things:
  * Stop the primary osd, remove the object from the filesystem, restart the
 OSD and issue a repair.  It didn't work - it says that one object is
 missing, but did not copy it from the backup.
  * I tried the same on the backup (remove the file) - it also didn't get
 copied back from the primary in a repair.

 Any help would be appreciated.

 Thanks,

 Andras
 apat...@simonsfoundation.org


 ___
 ceph-users mailing list
 ceph-us...@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




C++11 and librados C++

2015-08-03 Thread Samuel Just
It seems like it's about time for us to make the jump to C++11.  This
is probably going to have an impact on users of the librados C++
bindings.  It seems like such users would have to recompile code using
the librados C++ libraries after upgrading the librados library
version.  Is that reasonable?  What do people expect here?
-Sam


Re: [ceph-users] Ceph Tech Talk Today!

2015-08-03 Thread Patrick McGarry
Yes! Sorry I forgot to publish these since I was still fighting
technical troubles from CDS (which now seem to be mostly resolved).
The Ceph Tech Talk should be up, and the CDS videos should be up
within the next day or two. Thanks for reminding me.


Best Regards,

Patrick McGarry
Director Ceph Community || Red Hat
http://ceph.com  ||  http://community.redhat.com
@scuttlemonkey || @ceph


On Mon, Aug 3, 2015 at 2:50 AM, Goncalo Borges
gonc...@physics.usyd.edu.au wrote:
 Hi Patrick...

 Do you think it is possible to make the talk / slides available? The link is
 still not active in the ceph-tech-talks URL

 Cheers
 Goncalo

 On 07/31/2015 01:08 AM, Patrick McGarry wrote:

 Hey cephers,

 Just sending a friendly reminder that our online CephFS Tech Talk is
 happening today at 13:00 EDT (17:00 UTC). Please stop by and hear a
 technical deep dive on CephFS and ask any questions you might have.
 Thanks!

 http://ceph.com/ceph-tech-talks/

 direct link to the video conference:
 https://bluejeans.com/172084437/browser




 --
 Goncalo Borges
 Research Computing
 ARC Centre of Excellence for Particle Physics at the Terascale
 School of Physics A28 | University of Sydney, NSW  2006
 T: +61 2 93511937




Re: [ceph-users] Inconsistent PGs that ceph pg repair does not fix

2015-08-03 Thread Samuel Just
Hrm, that's certainly supposed to work.  Can you make a bug?  Be sure
to note what version you are running (output of ceph-osd -v).
-Sam

On Mon, Aug 3, 2015 at 12:34 PM, Andras Pataki
apat...@simonsfoundation.org wrote:
 Summary: I am having problems with inconsistent PG's that the 'ceph pg
 repair' command does not fix.  Below are the details.  Any help would be
 appreciated.

 # Find the inconsistent PG's
 ~# ceph pg dump | grep inconsistent
 dumped all in format plain
 2.439 4208 0 0 0 0 17279507143 3103 3103 active+clean+inconsistent
 2015-08-03 14:49:17.292884 77323'2250145 77480:890566 [78,54] 78 [78,54] 78
 77323'2250145 2015-08-03 14:49:17.292538 77323'2250145 2015-08-03 14:49:17.292538
 2.8b9 4083 0 0 0 0 16669590823 3051 3051 active+clean+inconsistent
 2015-08-03 14:46:05.140063 77323'2249886 77473:897325 [7,72] 7 [7,72] 7
 77323'2249886 2015-08-03 14:22:47.834063 77323'2249886 2015-08-03 14:22:47.834063

 # Look at the first one:
 ~# ceph pg deep-scrub 2.439
 instructing pg 2.439 on osd.78 to deep-scrub

 # The logs of osd.78 show:
 2015-08-03 15:16:34.409738 7f09ec04a700  0 log_channel(cluster) log [INF] :
 2.439 deep-scrub starts
 2015-08-03 15:16:51.364229 7f09ec04a700 -1 log_channel(cluster) log [ERR] :
 deep-scrub 2.439 b029e439/1022d93.0f0c/head//2 on disk data digest
 0xb3d78a6e != 0xa3944ad0
 2015-08-03 15:16:52.763977 7f09ec04a700 -1 log_channel(cluster) log [ERR] :
 2.439 deep-scrub 1 errors

 # Finding the object in question:
 ~# find ~ceph/osd/ceph-78/current/2.439_head -name 1022d93.0f0c* -ls
 21510412310 4100 -rw-r--r--   1 root root  4194304 Jun 30 17:09
 /var/lib/ceph/osd/ceph-78/current/2.439_head/DIR_9/DIR_3/DIR_4/DIR_E/1022d93.0f0c__head_B029E439__2
 ~# md5sum
 /var/lib/ceph/osd/ceph-78/current/2.439_head/DIR_9/DIR_3/DIR_4/DIR_E/1022d93.0f0c__head_B029E439__2
 4e4523244deec051cfe53dd48489a5db
 /var/lib/ceph/osd/ceph-78/current/2.439_head/DIR_9/DIR_3/DIR_4/DIR_E/1022d93.0f0c__head_B029E439__2

 # The object on the backup osd:
 ~# find ~ceph/osd/ceph-54/current/2.439_head -name 1022d93.0f0c* -ls
 6442614367 4100 -rw-r--r--   1 root root  4194304 Jun 30 17:09
 /var/lib/ceph/osd/ceph-54/current/2.439_head/DIR_9/DIR_3/DIR_4/DIR_E/1022d93.0f0c__head_B029E439__2
 ~# md5sum
 /var/lib/ceph/osd/ceph-54/current/2.439_head/DIR_9/DIR_3/DIR_4/DIR_E/1022d93.0f0c__head_B029E439__2
 4e4523244deec051cfe53dd48489a5db
 /var/lib/ceph/osd/ceph-54/current/2.439_head/DIR_9/DIR_3/DIR_4/DIR_E/1022d93.0f0c__head_B029E439__2

 # They don't seem to be different.
 # When I try repair:
 ~# ceph pg repair 2.439
 instructing pg 2.439 on osd.78 to repair

 # The osd.78 logs show:
 2015-08-03 15:19:21.775933 7f09ec04a700  0 log_channel(cluster) log [INF] :
 2.439 repair starts
 2015-08-03 15:19:38.088673 7f09ec04a700 -1 log_channel(cluster) log [ERR] :
 repair 2.439 b029e439/1022d93.0f0c/head//2 on disk data digest
 0xb3d78a6e != 0xa3944ad0
 2015-08-03 15:19:39.958019 7f09ec04a700 -1 log_channel(cluster) log [ERR] :
 2.439 repair 1 errors, 0 fixed
 2015-08-03 15:19:39.962406 7f09ec04a700  0 log_channel(cluster) log [INF] :
 2.439 deep-scrub starts
 2015-08-03 15:19:56.510874 7f09ec04a700 -1 log_channel(cluster) log [ERR] :
 deep-scrub 2.439 b029e439/1022d93.0f0c/head//2 on disk data digest
 0xb3d78a6e != 0xa3944ad0
 2015-08-03 15:19:58.348083 7f09ec04a700 -1 log_channel(cluster) log [ERR] :
 2.439 deep-scrub 1 errors

 The inconsistency is not fixed.  Any hints of what should be done next?
 I have tried  a few things:
  * Stop the primary osd, remove the object from the filesystem, restart the
 OSD and issue a repair.  It didn't work - it says that one object is
 missing, but did not copy it from the backup.
  * I tried the same on the backup (remove the file) - it also didn't get
 copied back from the primary in a repair.

 Any help would be appreciated.

 Thanks,

 Andras
 apat...@simonsfoundation.org


 ___
 ceph-users mailing list
 ceph-us...@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
