Stable releases preparation temporarily stalled

2016-01-06 Thread Loic Dachary
Hi,

The stable releases (hammer, infernalis) have not made progress in the past few 
weeks because we cannot run tests.

Before xmas the following happened:

* the sepia lab was migrated and we discovered the OpenStack teuthology backend 
can't run without it (that was only a problem for a few days)
* there are OpenStack-specific failures in each teuthology suite and it is non 
trivial to separate them from genuine backport errors
* the make check bot went down (it was partially running on my private hardware)

If we just wait, I'm not sure when we will be able to resume our work because:

* the sepia lab is back but has less horsepower than it did
* not all of us have access to the sepia lab
* the make check bot is being worked on by the infrastructure team but it is 
low priority and it may take weeks before it's back online
* the ceph-qa-suite errors that are OpenStack-specific are low priority and 
may never be fixed

I think we should rely on the sepia lab for testing for the foreseeable future 
and wait for the make check bot to be back. Tests will take a long time to run, 
but we've been able to work with a one week delay before so it's not a blocker.

Although fixing the OpenStack-specific errors would allow us to use the teuthology 
OpenStack backend (I will fix the last error left in the rados suite), it is 
unrealistic to set that as a requirement for running tests: we have neither the 
workforce nor the skills to do that. Hopefully, some time in the future, Ceph 
developers will use ceph-qa-suite on OpenStack as part of the development 
workflow. But right now running the ceph-qa-suite suites on OpenStack is outside 
of the development workflow and in a state of continuous regression, which is 
inconvenient for us because we need something stable to compare the runs from 
the integration branch against.

Fixing the make check bot is a two-part problem. Each failed run must be looked 
at to chase false negatives (continuous integration with false negatives is a 
plague), which I did over the past year on a daily basis and am happy to keep 
doing. Before the xmas break the bot running at jenkins.ceph.com reported over 90% 
false negatives, primarily because it was trying to run on unsupported operating 
systems, and it has been stopped until that is fixed. It also appears that the 
machine running the bot is not re-imaged after each test, meaning a bogus run 
may taint all future tests and create a continuous flow of false negatives. 
Addressing these two issues requires knowing or learning about the Ceph jenkins 
setup and slave provisioning. This probably is a few days of work, which is why 
the infrastructure team can't resolve it immediately.

If you have alternative creative ideas on how to improve the current situation, 
please speak up :-)

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre



Re: fixing jenkins builds on pull requests

2015-12-23 Thread Loic Dachary
Hi,

I triaged the jenkins related failures (from #24 to #49):

CentOS 6 not supported:

  https://jenkins.ceph.com/job/ceph-pull-requests/26/console
  https://jenkins.ceph.com/job/ceph-pull-requests/28/console
  https://jenkins.ceph.com/job/ceph-pull-requests/29/console
  https://jenkins.ceph.com/job/ceph-pull-requests/34/console
  https://jenkins.ceph.com/job/ceph-pull-requests/38/console
  https://jenkins.ceph.com/job/ceph-pull-requests/44/console
  https://jenkins.ceph.com/job/ceph-pull-requests/46/console
  https://jenkins.ceph.com/job/ceph-pull-requests/48/console
  https://jenkins.ceph.com/job/ceph-pull-requests/49/console

Ubuntu 12.04 not supported:

  https://jenkins.ceph.com/job/ceph-pull-requests/27/console
  https://jenkins.ceph.com/job/ceph-pull-requests/36/console

Failure to fetch from github

  https://jenkins.ceph.com/job/ceph-pull-requests/35/console

I've not been able to analyze more failures because it looks like only 30 jobs 
are kept. Here is an updated summary of the problems and the actions I propose:

 * running on unsupported operating systems (CentOS 6, precise and maybe others)
 * leftovers from a previous test (which should be removed when a new slave is 
provisioned for each test)
 * keep the last 300 jobs for forensic analysis (about one week's worth)
 * disable reporting to github pull requests until the above are resolved (all 
failures so far were false negatives).

Cheers

On 23/12/2015 10:11, Loic Dachary wrote:
> Hi Alfredo,
> 
> I forgot to mention that the ./run-make-check.sh run currently has no known 
> false negatives on CentOS 7. By that I mean that if run on master 100 times, 
> it will succeed 100 times. This makes it easier to debug the jenkins builds on 
> pull requests as we know all problems come either from the infrastructure or 
> the pull request. We do not have to worry about random errors due to race 
> conditions in the tests or things like that.
> 
> I'll keep an eye on the test results and analyse each failure. For now it 
> would be best to disable reporting failures as they are almost entirely false 
> negatives and will confuse contributors. The failures come from:
> 
>  * running on unsupported operating systems (CentOS 6 and maybe others)
>  * leftovers from a previous test (which should be removed when a new slave 
> is provisioned for each test)
> 
> I'll add to this thread when / if I find more.
> 
> Cheers
> 

-- 
Loïc Dachary, Artisan Logiciel Libre





Re: New "make check" job for Ceph pull requests

2015-12-23 Thread Loic Dachary
Hi,

For the record the pending issues that prevent the "make check" job 
(https://jenkins.ceph.com/job/ceph-pull-requests/) from running can be found at 
http://tracker.ceph.com/issues/14172

Cheers

On 23/12/2015 21:05, Alfredo Deza wrote:
> Hi all,
> 
> As of yesterday (Tuesday Dec 22nd) we have the "make check" job
> running within our CI infrastructure, working very similarly to the
> previous check, with a few differences:
> 
> * there are no longer comments added to the pull requests
> * notifications of success (or failure) are done inline in the same
> notification box for "This branch has no conflicts with the base
> branch"
> * All members of the Ceph organization can trigger a job with the
> following comment:
> test this please
> 
> Changes to the job should be done following our new process: anyone can open
> a pull request against the "ceph-pull-requests" job that configures/modifies
> it. This process is fairly minimal:
> 
> 1) *Jobs no longer require changes in the Jenkins UI*: they are
> plain text YAML files that live in the ceph/ceph-build.git
> repository and have a specific structure. Job changes (including
> scripts) are made directly on that repository via pull requests.
> 
> 2) As soon as a PR is merged the changes are automatically pushed to
> Jenkins, regardless of whether this is a new or an old job. All one needs
> for a new job to appear is a directory with a working YAML file (see links
> at the end on what this means).
> 
> Below, please find a list of resources on how to make changes to a
> Jenkins job, and examples of how almost anyone can provide changes:
> 
> * Format and configuration of the YAML files consumed by JJB (Jenkins
> Job Builder); full docs are here:
> http://docs.openstack.org/infra/jenkins-job-builder/definition.html
> * Where does the make-check configuration live?
> https://github.com/ceph/ceph-build/tree/master/ceph-pull-requests
> * Full documentation on Job structure and configuration:
> https://github.com/ceph/ceph-build#ceph-build
> * Everyone has READ permissions on jenkins.ceph.com (you can 'login'
> with your github account), current admin members (WRITE permissions)
> are: ktdreyer, alfredodeza, gregmeno, dmick, zmc, andrewschoen,
> ceph-jenkins, dachary, ldachary
> 
> If you have any questions, we can help and provide guidance and feedback. We
> highly encourage contributors to take ownership of this new tool and make it
> awesome!
> 
> Thanks,
> 
> 
> Alfredo
> 

-- 
Loïc Dachary, Artisan Logiciel Libre





jenkins on ceph pull requests: clarify which Operating System is used

2015-12-23 Thread Loic Dachary
Hi Alfredo,

I see a make check slave currently runs on jessie and I seem to remember it 
ran on trusty slaves before. It's a good thing operating systems are mixed, but 
there does not seem to be a clear indication of which operating system is 
used. For instance, for:

https://jenkins.ceph.com/job/ceph-pull-requests/44/

one has to click through to the console and know that the operating system 
shows up in the first few lines as:

Building remotely on centos6+158.69.78.199 (x86_64 huge centos6 amd64) in 
workspace 

Side note: as CentOS 6 is no longer a supported platform, trying to build on it 
will fail.

Another problem is that choosing an operating system at random may lead to 
different test results and leave the author of the pull request unable to 
reproduce the bug, because the operating system on which it happens cannot be 
selected.

Unless there is a known strategy with jenkins to deal with that kind of problem, 
it probably is best to stick to a single operating system, and CentOS 7 would be 
my choice.

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre





fixing jenkins builds on pull requests

2015-12-23 Thread Loic Dachary
Hi Alfredo,

I forgot to mention that the ./run-make-check.sh run currently has no known 
false negatives on CentOS 7. By that I mean that if run on master 100 times, it 
will succeed 100 times. This makes it easier to debug the jenkins builds on pull 
requests as we know all problems come either from the infrastructure or the 
pull request. We do not have to worry about random errors due to race 
conditions in the tests or things like that.
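A loop along these lines is enough to verify that claim (a sketch, assuming a 
CentOS 7 machine with the build dependencies installed and a ceph clone as the 
current directory):

#!/usr/bin/env python
# Sketch: run ./run-make-check.sh repeatedly on master and stop at the first
# failure, to check for false negatives.
import subprocess
import sys

RUNS = 100  # arbitrary, matches the "100 times" claim above

for i in range(1, RUNS + 1):
    print('run %d/%d' % (i, RUNS))
    if subprocess.call(['./run-make-check.sh']) != 0:
        print('failure on run %d' % i)
        sys.exit(1)
print('%d successful runs' % RUNS)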

I'll keep an eye on the test results and analyse each failure. For now it would 
be best to disable reporting failures as they are almost entirely false 
negatives and will confuse contributors. The failures come from:

 * running on unsupported operating systems (CentOS 6 and maybe others)
 * leftovers from a previous test (which should be removed when a new slave is 
provisioned for each test)

I'll add to this thread when / if I find more.

Cheers
-- 
Loïc Dachary, Artisan Logiciel Libre





Re: Time to move the make check bot to jenkins.ceph.com

2015-12-22 Thread Loic Dachary
Hi,

The make check bot moved to jenkins.ceph.com today and ran its first 
successful job. You will no longer see comments from the bot: it will update 
the github status instead, which is less intrusive.
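For the curious, updating the status boils down to a single call to the github 
statuses API. A minimal sketch (the token, context and log URL below are 
placeholders, not the bot's actual configuration):

#!/usr/bin/env python
# Sketch: report a make check result as a commit status on ceph/ceph.
# The token, context and target_url are placeholders.
import requests

def set_status(sha, success, token, context='make check'):
    url = 'https://api.github.com/repos/ceph/ceph/statuses/%s' % sha
    payload = {
        'state': 'success' if success else 'failure',
        'context': context,
        'description': 'make check',
        'target_url': 'http://example.com/logs/%s' % sha,  # placeholder
    }
    r = requests.post(url, json=payload,
                      headers={'Authorization': 'token ' + token})
    r.raise_for_status()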

Cheers

On 21/12/2015 11:13, Loic Dachary wrote:
> Hi,
> 
> The make check bot is broken in a way that I can't figure out right now. 
> Maybe now is the time to move it to jenkins.ceph.com ? It should not be more 
> difficult than launching the run-make-check.sh script. It does not need 
> network or root access.
> 
> Cheers
> 

-- 
Loïc Dachary, Artisan Logiciel Libre





Time to move the make check bot to jenkins.ceph.com

2015-12-21 Thread Loic Dachary
Hi,

The make check bot is broken in a way that I can't figure out right now. Maybe 
now is the time to move it to jenkins.ceph.com ? It should not be more 
difficult than launching the run-make-check.sh script. It does not need network 
or root access.

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre





v10.0.1 Contributor credits

2015-12-20 Thread Loic Dachary
Hi Ceph,

Here is a sorted list of authors and organizations who contributed to
v10.0.1, by number of commits or reviews back to v10.0.0. The affiliation of
authors to organizations can be updated by submitting a patch to
https://github.com/ceph/ceph/blob/master/.organizationmap

All commits are reviewed, but the number of reviews is a fraction of the
number of commits. More often than not, the reviewer(s) are only
mentioned in the message of the merge commit, although that means all
associated commits have been reviewed. If you are curious about how it
is done, do check out the wiki entry at 
http://tracker.ceph.com/projects/ceph/wiki/Ceph_contributors_list_maintenance_guide

Number of lines added and removed, by authors
 1   44020 Jason Dillaman <dilla...@redhat.com>
 2   10457 Sage Weil <sw...@redhat.com>
 3    3790 Vicente Cheng <freeze.bils...@gmail.com>
 4    1665 Mykola Golub <mgo...@mirantis.com>
 5    1390 Greg Farnum <gfar...@redhat.com>
 6    1061 Loic Dachary <ldach...@redhat.com>
 7 910 David Coles <dco...@gaikai.com>
 8 771 Rohan Mars <c...@rohanmars.com>
 9 591 John Spray <jsp...@redhat.com>
10 449 David Zafman <dzaf...@redhat.com>
11 381 Xinze Chi <xi...@xsky.com>
12 276 Xie Xingguo <xie.xing...@zte.com.cn>
13 252 Piotr Dałek <piotr.da...@ts.fujitsu.com>
14 249 Javier M. Mellid <jmun...@igalia.com>
15 233 MingXin Liu <mingxin@kylin-cloud.com>
16 149 Yan, Zheng <z...@redhat.com>
17 117 Radoslaw Zarzynski <rzarzyn...@mirantis.com>
18 115 Ilya Dryomov <idryo...@redhat.com>
19  98 Shu, Xinxin <xinxin@intel.com>
20  94 Joe Julian <jjul...@io.com>
21  92 Guang Yang <ygu...@yahoo-inc.com>
22  88 Xiaowei Chen <chen.xiao...@h3c.com>
23  68 Ma Jianpeng <jianpeng...@intel.com>
24  67 Josh Durgin <jdur...@redhat.com>
25  63 Nathan Cutler <ncut...@suse.com>
26  54 Yunchuan Wen <yunchuan@kylin-cloud.com>
27  50 Dan van der Ster <daniel.vanders...@cern.ch>
28  46 Jianhui Yuan <zuiwany...@gmail.com>
29  30 Li Wang <li.w...@kylin-cloud.com>
30  28 Haomai Wang <hao...@xsky.com>
31  26 James Page <james.p...@ubuntu.com>
32  22 Ning Yao <yaon...@ruijie.com.cn>
33  18 Yuan Zhou <yuan.z...@intel.com>
34  17 Vasu Kulkarni <vasu.kulka...@gmail.com>
35  16 Yehuda Sadeh <ysade...@redhat.com>
36  16 Jenkins <jenk...@ceph.com>
37  12 Yann Dupont <y...@objoo.org>
38  12 Vikhyat Umrao <vum...@redhat.com>
39  11 Rahul Aggarwal <rahul.1aggar...@gmail.com>
40  11 Brian Felton <bjfel...@gmail.com>
41   9 Xiaoxi Chen <xiaoxi.c...@intel.com>
42   8 Jiaying Ren <mikul...@gmail.com>
43   6 Kefu Chai <kc...@redhat.com>
44   5 Chengyuan Li <cheng...@ebay.com>
45   4 Tobias Suckow <tob...@suckow.biz>
46   4 Orit Wasserman <owass...@redhat.com>
47   4 Jie Wang <jie.w...@kylin-cloud.com>
48   3 Ren Huanwen <ren.huan...@zte.com.cn>
49   3 Ji Chen <insom...@139.com>
50   2 Zhi Zhang <zhangz.da...@outlook.com>
51   2 Sangdi Xu <xu.san...@h3c.com>
52   2 Ruifeng Yang <yangruifeng.09...@h3c.com>
53   2 Hervé Rousseau <hrous...@cern.ch>
54   2 Chris Holcombe <chris.holco...@nebula.com>
55   2 Brad Hubbard <bhubb...@redhat.com>
56   1 Samuel Just <sj...@redhat.com>

Number of lines added and removed, by organization
 1   58294 Red Hat <cont...@redhat.com>
 2    4675 Unaffiliated <n...@organization.net>
 3    1782 Mirantis <cont...@mirantis.com>
 4 910 Gaikai <cont...@gaikai.com>
 5 409 XSky <cont...@xsky.com>
 6 321 Kylin Cloud <cont...@kylin-cloud.com>
 7 279 ZTE <cont...@zte.com.cn>
 8 252 Fujitsu <cont...@fujitsu.com>
 9 249 Igalia <cont...@igalia.com>
10 193 Intel <cont...@intel.com>
11  94 IO <cont...@io.com>
12  92 Yahoo! <cont...@yahoo-inc.com>
13  92 H3C <wen...@h3c.com>
14  63 SUSE <cont...@suse.com>
15  62 Inktank <cont...@inktank.com>
16  52 CERN <cont...@cern.ch>
17  26 Canonical <cont...@canonical.com>
18  22 Ruijie Networks <cont...@ruijie.com.cn>
19   3 LETV <cont...@letv.com>
20   2 Tencent <cont...@tencent.com>
21   2 Nebula <i...@nebula.com>

Commits, by authors
 1  121 Jason D

Re: puzzling disappearance of /dev/sdc1

2015-12-18 Thread Loic Dachary
Hi Ilya,

It turns out that sgdisk 0.8.6 -i 2 /dev/vdb removes partitions and re-adds 
them on CentOS 7 with a 3.10.0-229.11.1.el7 kernel, in the same way partprobe 
does. It is used intensively by ceph-disk and inevitably leads to races where a 
device temporarily disappears. The same command (sgdisk 0.8.8) on Ubuntu 14.04 
with a 3.13.0-62-generic kernel only generates two udev change events and does 
not remove / add partitions. The source code between sgdisk 0.8.6 and sgdisk 
0.8.8 did not change in a significant way, and the output of strace -e ioctl 
sgdisk -i 2 /dev/vdb is identical in both environments:

ioctl(3, BLKGETSIZE, 20971520)  = 0
ioctl(3, BLKGETSIZE64, 10737418240) = 0
ioctl(3, BLKSSZGET, 512)= 0
ioctl(3, BLKSSZGET, 512)= 0
ioctl(3, BLKSSZGET, 512)= 0
ioctl(3, BLKSSZGET, 512)= 0
ioctl(3, HDIO_GETGEO, {heads=16, sectors=63, cylinders=16383, start=0}) = 0
ioctl(3, HDIO_GETGEO, {heads=16, sectors=63, cylinders=16383, start=0}) = 0
ioctl(3, BLKGETSIZE, 20971520)  = 0
ioctl(3, BLKGETSIZE64, 10737418240) = 0
ioctl(3, BLKSSZGET, 512)= 0
ioctl(3, BLKSSZGET, 512)= 0
ioctl(3, BLKGETSIZE, 20971520)  = 0
ioctl(3, BLKGETSIZE64, 10737418240) = 0
ioctl(3, BLKSSZGET, 512)= 0
ioctl(3, BLKSSZGET, 512)= 0
ioctl(3, BLKSSZGET, 512)= 0
ioctl(3, BLKSSZGET, 512)= 0
ioctl(3, BLKSSZGET, 512)= 0
ioctl(3, BLKSSZGET, 512)= 0
ioctl(3, BLKSSZGET, 512)= 0
ioctl(3, BLKSSZGET, 512)= 0
ioctl(3, BLKSSZGET, 512)= 0
ioctl(3, BLKSSZGET, 512)= 0
ioctl(3, BLKSSZGET, 512)= 0
ioctl(3, BLKSSZGET, 512)= 0
ioctl(3, BLKSSZGET, 512)= 0

This leads me to the conclusion that the difference is in how the kernel reacts 
to these ioctls.
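For reference, the add / remove / change events can be watched while the command 
runs, either with udevadm monitor or with a few lines of pyudev (a sketch, not 
part of ceph-disk; it only prints what happens on block devices):

#!/usr/bin/env python
# Sketch: print udev events for block devices while sgdisk or partprobe runs
# in another terminal, to see partitions being removed and re-added.
import pyudev

context = pyudev.Context()
monitor = pyudev.Monitor.from_netlink(context)
monitor.filter_by('block')

for device in iter(monitor.poll, None):
    print('%s %s' % (device.action, device.device_node))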

What do you think ? 

Cheers

On 17/12/2015 17:26, Ilya Dryomov wrote:
> On Thu, Dec 17, 2015 at 3:10 PM, Loic Dachary <l...@dachary.org> wrote:
>> Hi Sage,
>>
>> On 17/12/2015 14:31, Sage Weil wrote:
>>> On Thu, 17 Dec 2015, Loic Dachary wrote:
>>>> Hi Ilya,
>>>>
>>>> This is another puzzling behavior (the log of all commands is at
>>>> http://tracker.ceph.com/issues/14094#note-4). In a nutshell, after a
>>>> series of sgdisk -i commands to examine various devices including
>>>> /dev/sdc1, the /dev/sdc1 file disappears (and I think it will show up
>>>> again although I don't have definitive proof of this).
>>>>
>>>> It looks like a side effect of a previous partprobe command, the only
>>>> command I can think of that removes / re-adds devices. I thought calling
>>>> udevadm settle after running partprobe would be enough to ensure
>>>> partprobe completed (and since it takes as much as 2mn30 to return, I
>>>> would be shocked if it does not ;-).
> 
> Yeah, IIRC partprobe goes through every slot in the partition table,
> trying to first remove and then add the partition back.  But, I don't
> see any mention of partprobe in the log you referred to.
> 
> Should udevadm settle for a few vd* devices be taking that much time?
> I'd investigate that regardless of the issue at hand.
> 
>>>>
>>>> Any idea ? I desperately try to find a consistent behavior, something
>>>> reliable that we could use to say : "wait for the partition table to be
>>>> up to date in the kernel and all udev events generated by the partition
>>>> table update to complete".
>>>
>>> I wonder if the underlying issue is that we shouldn't be calling udevadm
>>> settle from something running from udev.  Instead, if a udev-triggered
>>> run of ceph-disk does something that changes the partitions, it
>>> should just exit and let udevadm run ceph-disk again on the new
>>> devices...?
> 
>>
>> Unless I missed something this is on CentOS 7 and ceph-disk is only called 
>> from udev as ceph-disk trigger which does nothing else but asynchronously 
>> delegate the work to systemd. Therefore there is no udevadm settle from 
>> within udev (which would deadlock and timeout every time... I hope ;-).
> 
> That's a sure lockup, until one of them times out.
> 
> How are you delegating to systemd?  Is it to avoid long-running udev
> events?  I'm probably missing something - udevadm settle wouldn't block
> on anything other than udev, so if you are shipping work off to
> somewhere else, udev can't be relied upon for waiting.
> 
> Thanks,
> 
> Ilya
> 

-- 
Loïc Dachary, Artisan Logiciel Libre





Re: puzzling disappearance of /dev/sdc1

2015-12-18 Thread Loic Dachary


On 18/12/2015 16:31, Ilya Dryomov wrote:
> On Fri, Dec 18, 2015 at 1:38 PM, Loic Dachary <l...@dachary.org> wrote:
>> Hi Ilya,
>>
>> It turns out that sgdisk 0.8.6 -i 2 /dev/vdb removes partitions and re-adds 
>> them on CentOS 7 with a 3.10.0-229.11.1.el7 kernel, in the same way 
>> partprobe does. It is used intensively by ceph-disk and inevitably leads to 
>> races where a device temporarily disapears. The same command (sgdisk 0.8.8) 
>> on Ubuntu 14.04 with a 3.13.0-62-generic kernel only generates two udev 
>> change events and does not remove / add partitions. The source code between 
>> sgdisk 0.8.6 and sgdisk 0.8.8 did not change in a significant way and the 
>> output of strace -e ioctl sgdisk -i 2 /dev/vdb is identical in both 
>> environments.
>>
>> ioctl(3, BLKGETSIZE, 20971520)  = 0
>> ioctl(3, BLKGETSIZE64, 10737418240) = 0
>> ioctl(3, BLKSSZGET, 512)= 0
>> ioctl(3, BLKSSZGET, 512)= 0
>> ioctl(3, BLKSSZGET, 512)= 0
>> ioctl(3, BLKSSZGET, 512)= 0
>> ioctl(3, HDIO_GETGEO, {heads=16, sectors=63, cylinders=16383, start=0}) = 0
>> ioctl(3, HDIO_GETGEO, {heads=16, sectors=63, cylinders=16383, start=0}) = 0
>> ioctl(3, BLKGETSIZE, 20971520)  = 0
>> ioctl(3, BLKGETSIZE64, 10737418240) = 0
>> ioctl(3, BLKSSZGET, 512)= 0
>> ioctl(3, BLKSSZGET, 512)= 0
>> ioctl(3, BLKGETSIZE, 20971520)  = 0
>> ioctl(3, BLKGETSIZE64, 10737418240) = 0
>> ioctl(3, BLKSSZGET, 512)= 0
>> ioctl(3, BLKSSZGET, 512)= 0
>> ioctl(3, BLKSSZGET, 512)= 0
>> ioctl(3, BLKSSZGET, 512)= 0
>> ioctl(3, BLKSSZGET, 512)= 0
>> ioctl(3, BLKSSZGET, 512)= 0
>> ioctl(3, BLKSSZGET, 512)= 0
>> ioctl(3, BLKSSZGET, 512)= 0
>> ioctl(3, BLKSSZGET, 512)= 0
>> ioctl(3, BLKSSZGET, 512)= 0
>> ioctl(3, BLKSSZGET, 512)= 0
>> ioctl(3, BLKSSZGET, 512)= 0
>> ioctl(3, BLKSSZGET, 512)= 0
>>
>> This leads me to the conclusion that the difference is in how the kernel 
>> reacts to these ioctl.
> 
> I'm pretty sure it's not the kernel versions that matter here, but
> systemd versions.  Those are all get-property ioctls, and I don't think
> sgdisk -i does anything with the partition table.
> 
> What it probably does though is it opens the disk for write for some
> reason.  When it closes it, udevd (systemd-udevd process) picks that
> close up via inotify and issues the BLKRRPART ioctl, instructing the
> kernel to re-read the partition table.  Technically, that's different
> from what partprobe does, but it still generates those udev events you
> are seeing in the monitor.
> 
> AFAICT udevd started doing this in v214.

That explains everything indeed.

# strace -f -e open sgdisk -i 2 /dev/vdb
...
open("/dev/vdb", O_RDONLY)  = 4
open("/dev/vdb", O_WRONLY|O_CREAT, 0644) = 4
open("/dev/vdb", O_RDONLY)  = 4
Partition GUID code: 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 (Unknown)
Partition unique GUID: 7BBAA731-AA45-47B8-8661-B4FAA53C4162
First sector: 2048 (at 1024.0 KiB)
Last sector: 204800 (at 100.0 MiB)
Partition size: 202753 sectors (99.0 MiB)
Attribute flags: 
Partition name: 'ceph journal'

# strace -f -e open blkid /dev/vdb2
...
open("/etc/blkid.conf", O_RDONLY)   = 4
open("/dev/.blkid.tab", O_RDONLY)   = 4
open("/dev/vdb2", O_RDONLY) = 4
open("/sys/dev/block/253:18", O_RDONLY) = 5
open("/sys/block/vdb/dev", O_RDONLY)= 6
open("/dev/.blkid.tab-hVvwJi", O_RDWR|O_CREAT|O_EXCL, 0600) = 4

blkid does not open the device for write, hence the different behavior. 
Switching from sgdisk to blkid fixes the issue.

Nice catch !

> Thanks,
> 
> Ilya
> 

-- 
Loïc Dachary, Artisan Logiciel Libre





Re: puzzling disappearance of /dev/sdc1

2015-12-18 Thread Loic Dachary
Nevermind, got it:

CHANGES WITH 214:

* As an experimental feature, udev now tries to lock the
  disk device node (flock(LOCK_SH|LOCK_NB)) while it
  executes events for the disk or any of its partitions.
  Applications like partitioning programs can lock the
  disk device node (flock(LOCK_EX)) and claim temporary
  device ownership that way; udev will entirely skip all event
  handling for this disk and its partitions. If the disk
  was opened for writing, the close will trigger a partition
  table rescan in udev's "watch" facility, and if needed
  synthesize "change" events for the disk and all its partitions.
  This is now unconditionally enabled, and if it turns out to
  cause major problems, we might turn it on only for specific
  devices, or might need to disable it entirely. Device Mapper
  devices are excluded from this logic.
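In other words, a partitioning tool (or ceph-disk itself) could take that 
exclusive lock while it modifies the disk, so that udev stays out of the way. A 
minimal sketch of the idea (not what ceph-disk currently does):

#!/usr/bin/env python
# Sketch: hold flock(LOCK_EX) on the whole-disk node while changing the
# partition table; per the NEWS entry above, udev skips event handling for
# the disk and its partitions while the lock is held.
import fcntl
import subprocess

def repartition(disk):
    with open(disk, 'rb') as dev:
        fcntl.flock(dev.fileno(), fcntl.LOCK_EX)
        try:
            subprocess.check_call(
                ['sgdisk', '--clear', '--mbrtogpt', '--', disk])
        finally:
            fcntl.flock(dev.fileno(), fcntl.LOCK_UN)

repartition('/dev/vdb')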


On 18/12/2015 17:32, Loic Dachary wrote:
> 
>>> AFAICT udevd started doing this in v214.
> 
> Do you have a specific commit / changelog entry in mind ? I'd like to add it 
> as a reference to the commit message fixing the problem.
> 
> Thanks !
> 
> 

-- 
Loïc Dachary, Artisan Logiciel Libre





Re: puzzling disappearance of /dev/sdc1

2015-12-17 Thread Loic Dachary
Hi Sage,

On 17/12/2015 14:31, Sage Weil wrote:
> On Thu, 17 Dec 2015, Loic Dachary wrote:
>> Hi Ilya,
>>
>> This is another puzzling behavior (the log of all commands is at 
>> http://tracker.ceph.com/issues/14094#note-4). In a nutshell, after a 
>> series of sgdisk -i commands to examine various devices including 
>> /dev/sdc1, the /dev/sdc1 file disappears (and I think it will show up 
>> again although I don't have definitive proof of this).
>>
>> It looks like a side effect of a previous partprobe command, the only 
>> command I can think of that removes / re-adds devices. I thought calling 
>> udevadm settle after running partprobe would be enough to ensure 
>> partprobe completed (and since it takes as much as 2mn30 to return, I 
>> would be shocked if it does not ;-).
>>
>> Any idea ? I desperately try to find a consistent behavior, something 
>> reliable that we could use to say : "wait for the partition table to be 
>> up to date in the kernel and all udev events generated by the partition 
>> table update to complete".
> 
> I wonder if the underlying issue is that we shouldn't be calling udevadm 
> settle from something running from udev.  Instead, if a udev-triggered 
> run of ceph-disk does something that changes the partitions, it 
> should just exit and let udevadm run ceph-disk again on the new 
> devices...?

Unless I missed something this is on CentOS 7 and ceph-disk is only called from 
udev as ceph-disk trigger which does nothing else but asynchronously delegate 
the work to systemd. Therefore there is no udevadm settle from within udev 
(which would deadlock and timeout every time... I hope ;-).

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre





Re: understanding partprobe failure

2015-12-17 Thread Loic Dachary


On 17/12/2015 16:49, Ilya Dryomov wrote:
> On Thu, Dec 17, 2015 at 1:19 PM, Loic Dachary <l...@dachary.org> wrote:
>> Hi Ilya,
>>
>> I'm seeing a partprobe failure right after a disk was zapped with sgdisk 
>> --clear --mbrtogpt -- /dev/vdb:
>>
>> partprobe /dev/vdb failed : Error: Partition(s) 1 on /dev/vdb have been 
>> written, but we have been unable to inform the kernel of the change, 
>> probably because it/they are in use. As a result, the old partition(s) will 
>> remain in use. You should reboot now before making further changes.
>>
>> waiting 60 seconds (see the log below) and trying again succeeds. The 
>> partprobe call is guarded by udevadm settle to prevent udev actions from 
>> racing and nothing else goes on in the machine.
>>
>> Any idea how that could happen ?
>>
>> Cheers
>>
>> 2015-12-17 11:46:10,356.356 
>> INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:DEBUG:ceph-disk:get_dm_uuid
>>  /dev/vdb uuid path is /sys/dev/block/253:16/dm/uuid
>> 2015-12-17 11:46:10,357.357 
>> INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:DEBUG:ceph-disk:Zapping
>>  partition table on /dev/vdb
>> 2015-12-17 11:46:10,358.358 
>> INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running
>>  command: /usr/sbin/sgdisk --zap-all -- /dev/vdb
>> 2015-12-17 11:46:10,365.365 
>> INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Caution:
>>  invalid backup GPT header, but valid main header; regenerating
>> 2015-12-17 11:46:10,366.366 
>> INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:backup 
>> header from main header.
>> 2015-12-17 11:46:10,366.366 
>> INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:
>> 2015-12-17 11:46:10,366.366 
>> INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Warning!
>>  Main and backup partition tables differ! Use the 'c' and 'e' options
>> 2015-12-17 11:46:10,367.367 
>> INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:on the 
>> recovery & transformation menu to examine the two tables.
>> 2015-12-17 11:46:10,367.367 
>> INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:
>> 2015-12-17 11:46:10,367.367 
>> INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Warning!
>>  One or more CRCs don't match. You should repair the disk!
>> 2015-12-17 11:46:10,368.368 
>> INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:
>> 2015-12-17 11:46:11,413.413 
>> INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:
>> 2015-12-17 11:46:11,414.414 
>> INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Caution:
>>  Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
>> 2015-12-17 11:46:11,414.414 
>> INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:verification
>>  and recovery are STRONGLY recommended.
>> 2015-12-17 11:46:11,414.414 
>> INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:
>> 2015-12-17 11:46:11,415.415 
>> INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Warning:
>>  The kernel is still using the old partition table.
>> 2015-12-17 11:46:11,415.415 
>> INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:The 
>> new table will be used at the next reboot.
>> 2015-12-17 11:46:11,416.416 
>> INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:GPT 
>> data structures destroyed! You may now partition the disk using fdisk or
>> 2015-12-17 11:46:11,416.416 
>> INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:other 
>> utilities.
>> 2015-12-17 11:46:11,416.416 
>> INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running
>>  command: /usr/sbin/sgdisk --clear --mbrtogpt -- /dev/vdb
>> 2015-12-17 11:46:12,504.504 
>> INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Creating
>>  new GPT entries.
>> 2015-12-17 11:46:12,505.505 
>> INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Warning:
>>  The kernel is still using the old partition table.
>> 2015-12-17 11:46:12,505.505 
>> INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:The 
>> new table will be used

Re: [ceph-users] v10.0.0 released

2015-12-17 Thread Loic Dachary
The script handles UTF-8 fine, the copy/paste is at fault here ;-)

On 24/11/2015 07:59, piotr.da...@ts.fujitsu.com wrote:
>> -Original Message-
>> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
>> ow...@vger.kernel.org] On Behalf Of Sage Weil
>> Sent: Monday, November 23, 2015 5:08 PM
>>
>> This is the first development release for the Jewel cycle.  We are off to a
>> good start, with lots of performance improvements flowing into the tree.
>> We are targeting sometime in Q1 2016 for the final Jewel.
>>
>> [..]
>> (`pr#5853 `_, Piotr Dałek)
> 
> Hopefully at that point the script that generates this list will learn how to 
> handle UTF-8 ;-)
> 
> 
> With best regards / Pozdrawiam
> Piotr Dałek
> ___
> ceph-users mailing list
> ceph-us...@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

-- 
Loïc Dachary, Artisan Logiciel Libre





understanding partprobe failure

2015-12-17 Thread Loic Dachary
Hi Ilya,

I'm seeing a partprobe failure right after a disk was zapped with sgdisk 
--clear --mbrtogpt -- /dev/vdb:

partprobe /dev/vdb failed : Error: Partition(s) 1 on /dev/vdb have been 
written, but we have been unable to inform the kernel of the change, probably 
because it/they are in use. As a result, the old partition(s) will remain in 
use. You should reboot now before making further changes.

waiting 60 seconds (see the log below) and trying again succeeds. The partprobe 
call is guarded by udevadm settle to prevent udev actions from racing and 
nothing else goes on in the machine.
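In the meantime a guarded retry along these lines papers over it (a sketch, not 
the actual ceph-disk code):

# Sketch: settle udev, then retry partprobe a few times, since the call
# succeeds by itself after a while (60 seconds in the log below).
import subprocess
import time

def update_partition_table(dev, retries=5, delay=60):
    subprocess.check_call(['udevadm', 'settle', '--timeout=600'])
    for attempt in range(retries):
        if subprocess.call(['partprobe', dev]) == 0:
            return
        time.sleep(delay)
    raise RuntimeError('partprobe %s failed %d times' % (dev, retries))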

Any idea how that could happen ?

Cheers

2015-12-17 11:46:10,356.356 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:DEBUG:ceph-disk:get_dm_uuid
 /dev/vdb uuid path is /sys/dev/block/253:16/dm/uuid
2015-12-17 11:46:10,357.357 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:DEBUG:ceph-disk:Zapping
 partition table on /dev/vdb
2015-12-17 11:46:10,358.358 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running
 command: /usr/sbin/sgdisk --zap-all -- /dev/vdb
2015-12-17 11:46:10,365.365 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Caution: 
invalid backup GPT header, but valid main header; regenerating
2015-12-17 11:46:10,366.366 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:backup 
header from main header.
2015-12-17 11:46:10,366.366 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:
2015-12-17 11:46:10,366.366 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Warning! 
Main and backup partition tables differ! Use the 'c' and 'e' options
2015-12-17 11:46:10,367.367 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:on the 
recovery & transformation menu to examine the two tables.
2015-12-17 11:46:10,367.367 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:
2015-12-17 11:46:10,367.367 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Warning! 
One or more CRCs don't match. You should repair the disk!
2015-12-17 11:46:10,368.368 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:
2015-12-17 11:46:11,413.413 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:
2015-12-17 11:46:11,414.414 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Caution: 
Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
2015-12-17 11:46:11,414.414 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:verification
 and recovery are STRONGLY recommended.
2015-12-17 11:46:11,414.414 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:
2015-12-17 11:46:11,415.415 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Warning: 
The kernel is still using the old partition table.
2015-12-17 11:46:11,415.415 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:The new 
table will be used at the next reboot.
2015-12-17 11:46:11,416.416 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:GPT data 
structures destroyed! You may now partition the disk using fdisk or
2015-12-17 11:46:11,416.416 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:other 
utilities.
2015-12-17 11:46:11,416.416 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running
 command: /usr/sbin/sgdisk --clear --mbrtogpt -- /dev/vdb
2015-12-17 11:46:12,504.504 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Creating 
new GPT entries.
2015-12-17 11:46:12,505.505 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Warning: 
The kernel is still using the old partition table.
2015-12-17 11:46:12,505.505 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:The new 
table will be used at the next reboot.
2015-12-17 11:46:12,505.505 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:The 
operation has completed successfully.
2015-12-17 11:46:12,506.506 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:DEBUG:ceph-disk:Calling
 partprobe on zapped device /dev/vdb
2015-12-17 11:46:12,507.507 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running
 command: /usr/bin/udevadm settle --timeout=600
2015-12-17 11:46:15,427.427 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running
 command: /usr/sbin/partprobe /dev/vdb
2015-12-17 11:46:16,860.860 
INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:DEBUG:ceph-disk:partprobe
 /dev/vdb failed : Error: Partition(s) 1 on /dev/vdb have been written, but we 
have 

puzzling disappearance of /dev/sdc1

2015-12-17 Thread Loic Dachary
Hi Ilya,

This is another puzzling behavior (the log of all commands is at 
http://tracker.ceph.com/issues/14094#note-4). In a nutshell, after a series of 
sgdisk -i commands to examine various devices including /dev/sdc1, the 
/dev/sdc1 file disappears (and I think it will show up again although I don't 
have definitive proof of this).

It looks like a side effect of a previous partprobe command, the only command I 
can think of that removes / re-adds devices. I thought calling udevadm settle 
after running partprobe would be enough to ensure partprobe completed (and 
since it takes as much as 2mn30 to return, I would be shocked if it does not 
;-).

Any idea ? I am desperately trying to find a consistent behavior, something 
reliable that we could use to say: "wait for the partition table to be up to 
date in the kernel and for all udev events generated by the partition table 
update to complete". 

Cheers
-- 
Loïc Dachary, Artisan Logiciel Libre





Re: cmake

2015-12-16 Thread Loic Dachary
Hi,

On 16/12/2015 18:33, Sage Weil wrote:
> The work to transition to cmake has stalled somewhat.  I've tried to use 
> it a few times but keep running into issues that make it unusable for me.  
> Not having make check is a big one, but I think the hackery required to 
> get that going points to the underlying problem(s).
> 
> It seems like the main problem is that automake puts all build targets in 
> src/ and cmake spreads them all over build/*.  This means that you can't 
> just add ./ to anything that would normally be in your path (or, 
> PATH=.:$PATH, and then run, say, ../qa/workunits/cephtool/test.sh).  
> There's a bunch of kludges in vstart.sh to make it work that I think 
> mostly point to this issue (and the .libs things).  Is there simply an 
> option we can give cmake to make it put built binaries directly in build/?
> 
> Stepping back a bit, it seems like the goals should be
> 
> 1. Be able to completely replace autotools.  I don't fancy maintaining 
> both in parallel.
> 
> 2. Be able to run vstart etc from the build dir.
> 
> 3. Be able to run ./ceph[-anything] from the build dir, or put the build 
> dir in the path.  (I suppose we could rely on a make install step, but 
> that seems like more hassle... hopefully it's not necessary?)
> 
> 4. make check has to work
> 
> 5. Use make-dist.sh to generate a release tarball (not make dist)
> 
> 6. gitbuilders use make-dist.sh and cmake to build packages
> 
> 7. release process uses make-dist.sh and cmake to build a release
> 
> I'm probably missing something?
> 
> Should we set a target of doing the 10.0.2 or .3 with cmake?

An intermediate step could be to switch from using make check to using make 
distcheck. Once this works, all issues related to out-of-tree builds and make 
check will be resolved, and deprecating autotools in favor of cmake will then 
be a small step.

My 2cts

> 
> sage
> 

-- 
Loïc Dachary, Artisan Logiciel Libre





Re: [Ceph-qa] Bug #13191 CentOS 7 multipath test fail because libdevmapper version must be >= 1.02.89

2015-12-15 Thread Loic Dachary
[redirecting to ceph-devel].

Hi,

On 14/12/2015 21:20, Abe Asraoui wrote:
> Hi All,
> 
> Does anyone know if this bug # 13191 has been resolved ??

http://tracker.ceph.com/issues/13191 has not been resolved. Could you please 
comment on it ? A short explanation about why you need it resolved will help.

Thanks !

> 
> 
> Thanks,
> Abe
> ___
> Ceph-qa mailing list
> ceph...@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-qa-ceph.com
> 

-- 
Loïc Dachary, Artisan Logiciel Libre





misc: ignore some unusable block devices

2015-12-14 Thread Loic Dachary
Hi Robin,

I removed "misc: ignore some unusable block devices" 
(9c5eaeccb807d103884f46d174798dd982092696) from the openstack branch because it 
fails on CentOS 7 with the following error. Could you please make a new pull 
request with it so we can keep testing it there ?

Thanks !

2015-12-14T14:37:36.305 
INFO:teuthology.orchestra.run.target167114236178:Running: 'sudo blkid -o export 
-p /dev/vdc'
2015-12-14T14:37:36.731 ERROR:teuthology.contextutil:Saw exception from nested 
tasks
Traceback (most recent call last):
  File "/home/ubuntu/teuthology/teuthology/contextutil.py", line 28, in nested
vars.append(enter())
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
  File "/home/ubuntu/src/ceph-qa-suite_master/tasks/ceph.py", line 357, in 
cluster
devs = teuthology.get_scratch_devices(remote)
  File "/home/ubuntu/teuthology/teuthology/misc.py", line 860, in 
get_scratch_devices
stderr=StringIO(),
  File "/home/ubuntu/teuthology/teuthology/orchestra/remote.py", line 156, in 
run
r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
  File "/home/ubuntu/teuthology/teuthology/orchestra/run.py", line 378, in run
r.wait()
  File "/home/ubuntu/teuthology/teuthology/orchestra/run.py", line 114, in wait
label=self.label)
CommandFailedError: Command failed on target167114236178 with status 2: 'sudo 
blkid -o export -p /dev/vdc'

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre





Stable release HOWTO: hunt for lost page

2015-12-14 Thread Loic Dachary
Hi,

It looks like we've lost 
http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_triage_incoming_backport_pull_requests
 . We can re-write it, of course, it's not that complex. But looking at the 
index I can't find it ( 
http://tracker.ceph.com/projects/ceph-releases/wiki/index ) and the activity 
backlog ( 
http://tracker.ceph.com/projects/ceph-releases/activity?utf8=%E2%9C%93_issues=1_wiki_edits=1
 ) does not show an event when a page is deleted. 

I suppose it's an accident and I propose we disable the page deletion feature 
to avoid such mistakes in the future (pages can be renamed to things like 
REMOVE-ME instead ;-).

What do you think ?

-- 
Loïc Dachary, Artisan Logiciel Libre





Re: Stable release HOWTO: hunt for lost page

2015-12-14 Thread Loic Dachary
Hi Abhishek,

On 14/12/2015 15:40, Abhishek Varshney wrote:
> Hi Loic,
> 
> I have revived the page at
> http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_triage_incoming_backport_pull_requests
> from https://web.archive.org with a version cached on 4th October.

Perfect ! It has not changed much in the past few months; I can't think of 
anything we added that would be missing.

> Please review it, in case there are any changes to it.
> 
> Disabling the page deletion sounds like a good idea.

Done.

Cheers

> 
> Thanks
> Abhishek
> 
> On Mon, Dec 14, 2015 at 7:50 PM, Loic Dachary <l...@dachary.org> wrote:
>> Hi,
>>
>> It looks like we've lost 
>> http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_triage_incoming_backport_pull_requests
>>  . We can re-write it, of course, it's not that complex. But looking at the 
>> index I can't find it ( 
>> http://tracker.ceph.com/projects/ceph-releases/wiki/index ) and the activity 
>> backlog ( 
>> http://tracker.ceph.com/projects/ceph-releases/activity?utf8=%E2%9C%93_issues=1_wiki_edits=1
>>  ) does not show an event when a page is deleted.
>>
>> I suppose it's an accident and I propose we disable the page deletion 
>> feature to avoid such mistakes in the future (pages can be renamed to things 
>> like REMOVE-ME instead ;-).
>>
>> What do you think ?
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>>

-- 
Loïc Dachary, Artisan Logiciel Libre





Re: misc: ignore some unusable block devices

2015-12-14 Thread Loic Dachary


On 14/12/2015 19:18, Robin H. Johnson wrote:
> On Mon, Dec 14, 2015 at 04:12:16PM +0100, Loic Dachary wrote:
>> Hi Robin,
>>
>> I removed misc: ignore some unusable block devices
>> 9c5eaeccb807d103884f46d174798dd982092696 from the openstack branch
>> because it fails on CentOS 7 with
>> the following. Could you please make a new pull request with it so we
>> can keep testing it there ?
> Will do, but I wonder how you managed to get a return code of 2 from it;
> as that should be device-does-not-exist.

IIRC that happens when blkid fails to retrieve information. I found blkid to be 
generally faster but less reliable than sgdisk -i 1 /dev/sdd for the purpose of 
getting partition information. For ceph-disk we chose reliability over speed. 
That being said, I've not investigated the real reason why blkid returns 2.
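A workaround on the teuthology side could be to skip a device when blkid fails 
on it instead of failing the whole task. A sketch (hypothetical helper, not the 
actual get_scratch_devices code):

# Sketch: keep only the devices blkid can probe; blkid exits non zero when
# it cannot retrieve information about a device.
import subprocess

def device_is_usable(dev):
    return subprocess.call(['blkid', '-o', 'export', '-p', dev]) == 0

scratch = [d for d in ('/dev/vdb', '/dev/vdc', '/dev/vdd')
           if device_is_usable(d)]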

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre





Re: new OSD re-using old OSD id fails to boot

2015-12-09 Thread Loic Dachary


On 09/12/2015 02:13, David Zafman wrote:
> 
> Remember I really think we want a disk replacement feature that would retain 
> the OSD id so that it avoids unnecessary data movement.  See tracker 
> http://tracker.ceph.com/issues/13732

I remember this, yes, it is a good idea :-)  Would that help fix 
http://tracker.ceph.com/issues/13988 ?

> David
> 
> On 12/5/15 8:49 AM, Loic Dachary wrote:
>> Hi Sage,
>>
>> The problem described at "new OSD re-using old OSD id fails to boot" 
>> http://tracker.ceph.com/issues/13988 consistently fails the ceph-disk suite 
>> on master. I wonder if it could be a side effect of the recent optimizations 
>> introduced in the monitor ?
>>
>> Cheers
>>
> 

-- 
Loïc Dachary, Artisan Logiciel Libre





Re: new OSD re-using old OSD id fails to boot

2015-12-09 Thread Loic Dachary


On 09/12/2015 11:39, Wei-Chung Cheng wrote:
> Hi Loic,
> 
> I tried to reproduce this problem on my CentOS 7.
> I could not reproduce the issue.
> This is my version:
> ceph version 10.0.0-928-g8eb0ed1 (8eb0ed1dcda9ee6180a06ee6a4415b112090c534)
> Could you describe it in more detail?

I reproduced the problem yesterday once more by running the ceph-disk suite. It 
happens every time.

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre





Re: testing the /dev/cciss/c0d0 device names

2015-12-07 Thread Loic Dachary

Hi,

On 06/12/2015 20:15, Ilya Dryomov wrote:
> On Sat, Dec 5, 2015 at 7:36 PM, Loic Dachary <l...@dachary.org> wrote:
>> Hi Ilya,
>>
>> ceph-disk has special handling for device names like /dev/cciss/c0d1 [1] and 
>> it was partially broken when support for device mapper was introduced. 
>> Ideally there would be a way to test that support when running the ceph-disk 
>> suite [2]. Do you know of a way to do that without having the hardware for 
>> which this driver is designed ?
>>
>> Maybe this convention (/dev/cciss/c0d0 being mapped to /sys/block/cciss!c0d0) 
>> is not unique to this driver and I could use another to validate that the name 
>> conversion from X/Y to X!Y and vice versa is handled as it should ?
> 
> No, it's not unique.  driver core does strreplace(s, '/', '!') at
> register time to work around such block devices.  The list includes
> DAC960, aoeblk, cciss, cpqarray, sx8 and probably more, but I don't
> think anything widespread uses this naming scheme.  IIRC dm actually
> won't let you name a device with anything that contains a slash.
> 
> If you really wanted you could set up aoeblk I guess, but a simple unit
> test should be more than enough ;)

Thanks for the information. I'll refrain from attempting integration tests and 
try to make it solid with unit tests :-)

Cheers

> 
> Thanks,
> 
> Ilya
> 

-- 
Loïc Dachary, Artisan Logiciel Libre





new OSD re-using old OSD id fails to boot

2015-12-05 Thread Loic Dachary
Hi Sage,

The problem described at "new OSD re-using old OSD id fails to boot" 
http://tracker.ceph.com/issues/13988 consistently fails the ceph-disk suite on 
master. I wonder if it could be a side effect of the recent optimizations 
introduced in the monitor ?

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre





testing the /dev/cciss/c0d0 device names

2015-12-05 Thread Loic Dachary
Hi Ilya,

ceph-disk has special handling for device names like /dev/cciss/c0d1 [1] and it 
was partially broken when support for device mapper was introduced. Ideally 
there would be a way to test that support when running the ceph-disk suite [2]. 
Do you know of a way to do that without having the hardware for which this 
driver is designed ? 

Maybe this convention (/dev/cciss/c0d0 being mapped to /sys/block/cciss!c0d0) is 
not unique to this driver and I could use another to validate that the name 
conversion from X/Y to X!Y and vice versa is handled as it should ?
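The conversion itself is small enough that unit tests should cover it well. A 
sketch of the two helpers and a check (hypothetical names, not the actual 
ceph-disk functions):

#!/usr/bin/env python
# Sketch: convert between /dev/cciss/c0d1 and the corresponding /sys/block
# name; the kernel replaces '/' with '!' when registering such devices.
import os

def dev_to_sysfs_name(dev):
    # /dev/cciss/c0d1 -> cciss!c0d1
    assert dev.startswith('/dev/')
    return dev[len('/dev/'):].replace('/', '!')

def sysfs_name_to_dev(name):
    # cciss!c0d1 -> /dev/cciss/c0d1
    return os.path.join('/dev', name.replace('!', '/'))

assert dev_to_sysfs_name('/dev/cciss/c0d1') == 'cciss!c0d1'
assert sysfs_name_to_dev('cciss!c0d1') == '/dev/cciss/c0d1'
assert dev_to_sysfs_name('/dev/sdb') == 'sdb'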

Cheers

[1] https://github.com/ceph/ceph/blob/infernalis/src/ceph-disk#L438
[2] 
https://github.com/ceph/ceph/blob/infernalis/qa/workunits/ceph-disk/ceph-disk-test.py
-- 
Loïc Dachary, Artisan Logiciel Libre





proposal to run Ceph tests on pull requests

2015-12-05 Thread Loic Dachary
Hi Ceph,

TL;DR: a ceph-qa-suite bot running on pull requests is sustainable and is an 
incentive for contributors to use teuthology-openstack independently

When a pull request is submitted, it is compiled, some tests are run[1] and the 
result is added to the pull request to confirm that it does not introduce a 
trivial problem. Such tests are however limited because they must:

* run within a few minutes at most
* not require multiple machines
* not require root privileges

More extensive tests (primarily integration tests) are needed before a 
contribution can be merged into Ceph [2], to verify it does not introduce a 
subtle regression. It would be ideal to run these integration tests on each 
pull request but there are two obstacles:

* each test takes ~ 1.5 hour
* each test cost ~ 0.30 euros

On the current master, running all tests would require ~1000 jobs [3]. That 
would cost ~ 300 euros on each pull request and take ~10 hours assuming 100 
jobs can run in parallel. We could resolve that problem by:

* maintaining a ceph-qa-suite map to be used as a white list mapping a diff to 
a set of tests (see the sketch after this list). For instance, if the diff 
modifies the src/ceph-disk file, it outputs the ceph-disk suite[4]. This would 
effectively trim the tests that are unrelated to the contribution and reduce 
the number of tests to a maximum of ~100 [4], and most likely to a dozen.
* running tests only if one of the commits of the pull request has the *Needs-qa: 
true* flag in the commit message[5]
* limiting the number of tests to fit in the allocated budget. If there was 
enough funding for 10,000 jobs during the previous period and a total of 
1,000 test runs was required (a test run is a set of tests as produced by the 
ceph-qa-suite map), each run is trimmed to a maximum of ten tests, regardless.
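A first cut of the ceph-qa-suite map mentioned in the first item could be as 
simple as a list of (path prefix, suites) pairs (a sketch; the mapping below is 
hypothetical, for illustration only):

# Sketch of a ceph-qa-suite map: associate the files touched by a diff with
# the suites that cover them. The mapping is illustrative only.
SUITE_MAP = [
    ('src/ceph-disk', ['ceph-disk']),
    ('src/librados/', ['smoke/basic/tasks/rados_api_tests.yaml']),
    ('src/librbd/', ['rbd']),
    ('src/rgw/', ['rgw']),
]

def suites_for_diff(paths):
    suites = set()
    for path in paths:
        for prefix, mapped in SUITE_MAP:
            if path.startswith(prefix):
                suites.update(mapped)
    return sorted(suites)

# e.g. suites_for_diff(['src/ceph-disk']) == ['ceph-disk']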

Here is an example:

Joe submits a pull request to fix a bug in the librados API
The make check bot compiles and fails make check because it introduces a bug
Joe uses run-make-check.sh locally to repeat the failure, fixes it and repushes
The make check bot compiles and passes make check
Joe amends the commit message to add *Needs-qa: true* and repushes
The ceph-qa-suite map script finds a change on the librados API and outputs 
smoke/basic/tasks/rados_api_tests.yaml
The ceph-qa-suite bot runs the test smoke/basic/tasks/rados_api_tests.yaml 
which fails
Joe examines the logs found at http://teuthology-logs.public.ceph.com/ and 
decides to debug by running the test himself
Joe runs teuthology-openstack --suite smoke/basic/tasks/rados_api_tests.yaml 
against his own OpenStack tenant [6]
Joe repushes with a fix
The ceph-qa-suite bot runs the test smoke/basic/tasks/rados_api_tests.yaml 
which succeeds
Kefu reviews the pull request and has a link to the successful test runs in the 
comments

This approach scales with the size of the Ceph developer community [7] because 
regular contributors benefit directly from funding the ceph-qa-suite bot. New 
contributors can focus on learning how to interpret the ceph-qa-suite error 
logs for their contribution and, if needed, learn how to debug it via 
teuthology-openstack, which is a better user experience than having to figure 
out which ceph-qa-suite job to run, learn about teuthology, schedule the test 
and interpret the results.

The maintenance workload of a ceph-qa-suite bot probably requires one work day 
a week, to handle funding and the sysadmin of the server where the bot runs, but 
mostly to sort out the false negatives. I believe a pure self-service approach 
where each contributor would be asked to run teuthology-openstack independently 
would actually require more work. The ceph-qa-suite bot provides a baseline on 
which everybody can agree to sort out the false negatives. When a contributor 
runs teuthology-openstack by herself/himself, it is difficult for her/him to 
figure out if a failure comes from something she/he did incorrectly because 
she/he is not familiar with teuthology-openstack or if it is related to her/his 
contribution. She/he will ask for assistance in situations where comparing 
her/his run with the output of the ceph-qa-suite bot would probably give 
her/him enough hints to fix the problem herself/himself.

If the ceph-qa-suite bot becomes unavailable, the contributors are not blocked 
because they can run it by themselves on their own OpenStack tenant and link 
the results to the pull request in the same way the bot would. Debugging a 
failed test is essentially the same thing as running the ceph-qa-suite bot.

Cheers

[1] run-make-check.sh https://github.com/ceph/ceph/blob/master/run-make-check.sh
[2] Ceph test suites https://github.com/ceph/ceph-qa-suite/tree/master/suites
[3] teuthology-suite --suite .  --subset 1/4
[4] minimal number of tests to run all tasks at least once: 130 for rados, 76 
for fs, 113 for upgrade, 18 for rgw, 45 for rbd.
[5] a former proposal was to include the test suite to run in the commit 
message, but this is more difficult to 

Re: CodingStyle on existing code

2015-12-01 Thread Loic Dachary


On 01/12/2015 14:10, Wido den Hollander wrote:
> Hi,
> 
> While working on mon/PGMonitor.cc I see that there is a lot of
> inconsistency in the code.
> 
> A lot of whitespaces, indentation which is not correct, well, a lot of
> things.
> 
> Is this something we want to fix? With some scripts we can probably do
> this easily, but it might cause merge hell with people working on features.

A sane (but long) way to do that is to clean up when fixing a bug or adding a 
feature. With (a lot of) patience, it will eventually get better :-)

My 2cts.

> Wido
> 

-- 
Loïc Dachary, Artisan Logiciel Libre





Re: RFC: teuthology field in commit messages

2015-11-29 Thread Loic Dachary


On 29/11/2015 21:47, John Spray wrote:
> On Sun, Nov 29, 2015 at 8:25 PM, Loic Dachary <l...@dachary.org> wrote:
>>
>>
>> On 29/11/2015 21:08, John Spray wrote:
>>> On Sat, Nov 28, 2015 at 3:56 PM, Loic Dachary <l...@dachary.org> wrote:
>>>> Hi Ceph,
>>>>
>>>> An optional teuthology field could be added to a commit message like so:
>>>>
>>>> teuthology: --suite rbd
>>>>
>>>> to state that this commit should be tested with the rbd suite. It could be 
>>>> parsed by bots and humans.
>>>>
>>>> It would make it easy and cost effective to run partial teuthology suites 
>>>> automatically on pull requests.
>>>>
>>>> What do you think ?
>>>
>>> Hmm, we are usually testing things at the branch/PR level rather than
>>> on the per-commit level, so it feels a bit strange to have this in the
>>> commit message.
>>
>> Indeed. But what is a branch if not the HEAD commit ?
> 
> It's the HEAD commit, and its ancestors.  So in a typical PR (or
> branch) there are several commits since the base (i.e. since master),
> and perhaps only one of them has a test suite marked on it, or maybe
> they have different test suites marked on different commits in the
> branch.
> 
> It's not necessarily a problem, just something that would need to have
> a defined behaviour (maybe when testing a PR collect the "teuthology:"
> tags from all commits in PR, and run all the suites mentioned?).

That's an interesting idea :-) My understanding is that we currently test a PR 
by scheduling suites on its HEAD. But maybe you sometimes schedule suites using 
a commit that's in the middle of a PR ?

Cheers

>>
>>> However, if a system existed that would auto-test things when I put
>>> something magic in a commit message, I would probably use it!
>>>
>>> John
>>>
>>>
>>>>
>>>> --
>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>>

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: RFC: teuthology field in commit messages

2015-11-29 Thread Loic Dachary


On 29/11/2015 23:55, John Spray wrote:
> On Sun, Nov 29, 2015 at 9:25 PM, Loic Dachary <l...@dachary.org> wrote:
>>
>>
>> On 29/11/2015 21:47, John Spray wrote:
>>> On Sun, Nov 29, 2015 at 8:25 PM, Loic Dachary <l...@dachary.org> wrote:
>>>>
>>>>
>>>> On 29/11/2015 21:08, John Spray wrote:
>>>>> On Sat, Nov 28, 2015 at 3:56 PM, Loic Dachary <l...@dachary.org> wrote:
>>>>>> Hi Ceph,
>>>>>>
>>>>>> An optional teuthology field could be added to a commit message like so:
>>>>>>
>>>>>> teuthology: --suite rbd
>>>>>>
>>>>>> to state that this commit should be tested with the rbd suite. It could 
>>>>>> be parsed by bots and humans.
>>>>>>
>>>>>> It would make it easy and cost effective to run partial teuthology 
>>>>>> suites automatically on pull requests.
>>>>>>
>>>>>> What do you think ?
>>>>>
>>>>> Hmm, we are usually testing things at the branch/PR level rather than
>>>>> on the per-commit level, so it feels a bit strange to have this in the
>>>>> commit message.
>>>>
>>>> Indeed. But what is a branch if not the HEAD commit ?
>>>
>>> It's the HEAD commit, and its ancestors.  So in a typical PR (or
>>> branch) there are several commits since the base (i.e. since master),
>>> and perhaps only one of them has a test suite marked on it, or maybe
>>> they have different test suites marked on different commits in the
>>> branch.
>>>
>>> It's not necessarily a problem, just something that would need to have
>>> a defined behaviour (maybe when testing a PR collect the "teuthology:"
>>> tags from all commits in PR, and run all the suites mentioned?).
>>
>> That's an interesting idea :-) My understanding is that we currently test a 
>> PR by scheduling suites on its HEAD. But maybe you sometime schedule suites 
>> using a commit that's in the middle of a PR ?
> 
> I think I've made this too complicated...
> 
> What I meant was that while one would schedule suites against the HEAD
> of the PR, that might not be the same commit that has the logical
> testing information in.  For example, I might have main commit that
> has the "Fixes: " and "teuthology: " tags, and then a second commit
> (that would be HEAD) which e.g. tweaks a unit test.  It would be weird
> if I had to put the teuthology: tag on the unit test commit rather
> than the functional test, so I guess it would make sense to look at
> the teuthology: tags from all the commits in a PR when scheduling it.

Thanks for explaining, it's crystal clear. 

My initial idea of having a teuthology: tag on the top level commit was indeed 
naive and wrong. Looking through all the commits of a pull request and 
scheduling the suites they mention on its HEAD, as you suggest, reflects what 
we already do manually and sounds right :-) 
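
Something along these lines would be enough for a bot to collect the tags from 
all the commits of a pull request (branch names are made up):

  git log --format=%B origin/master..pr-branch | sed -n 's/^teuthology: *//p' | sort -u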

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: RFC: teuthology field in commit messages

2015-11-29 Thread Loic Dachary


On 29/11/2015 21:08, John Spray wrote:
> On Sat, Nov 28, 2015 at 3:56 PM, Loic Dachary <l...@dachary.org> wrote:
>> Hi Ceph,
>>
>> An optional teuthology field could be added to a commit message like so:
>>
>> teuthology: --suite rbd
>>
>> to state that this commit should be tested with the rbd suite. It could be 
>> parsed by bots and humans.
>>
>> It would make it easy and cost effective to run partial teuthology suites 
>> automatically on pull requests.
>>
>> What do you think ?
> 
> Hmm, we are usually testing things at the branch/PR level rather than
> on the per-commit level, so it feels a bit strange to have this in the
> commit message.

Indeed. But what is a branch if not the HEAD commit ?

> However, if a system existed that would auto-test things when I put
> something magic in a commit message, I would probably use it!
> 
> John
> 
> 
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>>

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: RFC: teuthology field in commit messages

2015-11-29 Thread Loic Dachary
Hi Joao,

On 29/11/2015 12:51, Joao Eduardo Luis wrote:
> On 11/28/2015 03:56 PM, Loic Dachary wrote:
>> Hi Ceph,
>>
>> An optional teuthology field could be added to a commit message like so:
>>
>> teuthology: --suite rbd
>>
>> to state that this commit should be tested with the rbd suite. It could be 
>> parsed by bots and humans.
>>
>> It would make it easy and cost effective to run partial teuthology suites 
>> automatically on pull requests.
>>
>> What do you think ?
> 
> Can't we use git-notes for that instead?

It's possible but few people understand how it works.

> I think this pollutes the history a bit. Especially considering this
> sort of metadata isn't necessarily specific to a given diff.

I think it is relevant in a permanent way. When running a suite, we do it on a 
given diff. For instance,
in a 10 commit pull request, we run the suite on the head of the branch, which 
will later become the second parent of the merge. Should we want to test at a 
later time, long after the pull request has been merged, we will be able to do 
it using the same suite. 

> Also should be considered that this is a field that may make sense today
> but may not make much sense in 10, 15 years. And while we have quite a
> few special-purpose fields (e.g., Fixes, Backport), those are currently
> pretty explanatory and I believe will be still easily understandable in
> a decade's time.

It also holds for stable branches since we maintain stable branches for 
ceph-qa-suite as well. So, when backporting 3 commits from a given pull request, 
it will help to know that the backport can be tested with this specific suite. 
And if the suite is missing the test, it's also a good hint 
that this test needs to be backported as well.

> In any case, if there's absolutely no other way to do this and the other
> folk thinks it's important to have this, I will certainly not be the
> party pooper ;)

:-) FWIW, I think the Backport: field should not be used ( see 
http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_schedule_an_issue_for_backporting#Backport-field-in-the-commit-messages
 for the full rationale ). But I think the "teuthology" field being used 
*prior* to the pull request being merged makes sense and is a valuable addition 
to the commit history.

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


RFC: teuthology field in commit messages

2015-11-28 Thread Loic Dachary
Hi Ceph,

An optional teuthology field could be added to a commit message like so:

teuthology: --suite rbd

to state that this commit should be tested with the rbd suite. It could be 
parsed by bots and humans.

It would make it easy and cost effective to run partial teuthology suites 
automatically on pull requests.
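
For instance (everything below is made up), a commit message could look like:

  osd: fix example scrub scheduling bug

  teuthology: --suite rados --subset 1/18
  Fixes: #00000
  Signed-off-by: Jane Doe <jane@example.com>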

What do you think ?

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: v0.80.11 QE validation status

2015-11-16 Thread Loic Dachary
Thanks for the update Tamil, that makes sense now :-)

On 16/11/2015 23:05, Tamil Muthamizhan wrote:
> Hi Loic and Yuri,
> 
> issue 11104 can be resolved only after the stable epel[with Boris Fix] is out 
> and tested in teuthology.
> 
> what Warren has currently done is to tweak the install task to work around 
> this issue in order to clear the test blocker for v0.80.11. It doesn't count as 
> a "real" fix.
> 
> Regards,
> Tamil
> 
> - Original Message -
> From: "Yuri Weinstein" <ywein...@redhat.com>
> To: "Loic Dachary" <ldach...@redhat.com>
> Cc: "ceph-qa" <ceph...@ceph.com>, "Ceph Development" 
> <ceph-devel@vger.kernel.org>, "Sage Weil" <s...@redhat.com>, "Alfredo Deza" 
> <ad...@redhat.com>
> Sent: Monday, November 16, 2015 2:01:42 PM
> Subject: Re: v0.80.11 QE validation status
> 
> Loic,
> 
> I am not actually sure about resolving #11104.
> 
> Warren?
> 
> Thx
> YuriW
> 
> 
> On Mon, Nov 16, 2015 at 1:04 PM, Loic Dachary <ldach...@redhat.com> wrote:
>> Hi Yuri,
>>
>> Thanks for the update :-) Should we mark #11104 as resolved ?
>>
>> Cheers
>>
>> On 16/11/2015 19:45, Yuri Weinstein wrote:
>>> This release QE validation took longer time due to the #11104
>>> additional fixing/testing and discovered related to it issues ##13794,
>>> 13622
>>>
>>> We agreed to release v0.80.11 based on tests results.
>>>
>>> Thx
>>> YuriW
>>>
>>> On Wed, Oct 28, 2015 at 9:04 AM, Yuri Weinstein <ywein...@redhat.com> wrote:
>>>> Summary of suites executed for this release can be found in
>>>> http://tracker.ceph.com/issues/11644
>>>>
>>>> rados - 1/7th passed
>>>>
>>>> rbd - http://tracker.ceph.com/issues/11104
>>>>
>>>> rgw - http://tracker.ceph.com/issues/11104
>>>>
>>>> fs - http://tracker.ceph.com/issues/11104, 
>>>> http://tracker.ceph.com/issues/13630
>>>>
>>>> krbd - http://tracker.ceph.com/issues/13631
>>>>
>>>> kcephfs - http://tracker.ceph.com/issues/13631,
>>>> http://tracker.ceph.com/issues/13630
>>>>
>>>> samba - http://tracker.ceph.com/issues/6613 same as in v0.80.10 was
>>>> approved for release
>>>>
>>>> ceph-deploy(ubuntu_) - almost passed, 1 job is still running
>>>>
>>>> ceph-deploy(distros) - http://tracker.ceph.com/issues/13367
>>>>
>>>> upgrade/dumpling-x (to firefly)(distros) - passed
>>>>
>>>> upgrade/firefly(distros) - passed
>>>>
>>>> upgrades to giant - deprecated
>>>>
>>>> upgrade/firefly-x (to hammer)(distros) -
>>>> http://tracker.ceph.com/issues/11104,
>>>> http://tracker.ceph.com/issues/13632
>>>>
>>>> powercycle - http://tracker.ceph.com/issues/11104,
>>>> http://tracker.ceph.com/issues/13631
>>>>
>>>> All found problems seem unrelated to the product, however they
>>>> prevented some tests from running.  In particular #11104 is widespread
>>>> and has to be fixed (see also http://tracker.ceph.com/issues/13622 as
>>>> proposed workaround)
>>>>
>>>> I suggest we rerunning failed tests after addressing the issues above.
>>>>
>>>> Thx
>>>> YuriW
>>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: v0.80.11 QE validation status

2015-11-16 Thread Loic Dachary
Hi Yuri,

Thanks for the update :-) Should we mark #11104 as resolved ?

Cheers

On 16/11/2015 19:45, Yuri Weinstein wrote:
> This release QE validation took longer time due to the #11104
> additional fixing/testing and discovered related to it issues ##13794,
> 13622
>
> We agreed to release v0.80.11 based on tests results.
>
> Thx
> YuriW
>
> On Wed, Oct 28, 2015 at 9:04 AM, Yuri Weinstein  wrote:
>> Summary of suites executed for this release can be found in
>> http://tracker.ceph.com/issues/11644
>>
>> rados - 1/7th passed
>>
>> rbd - http://tracker.ceph.com/issues/11104
>>
>> rgw - http://tracker.ceph.com/issues/11104
>>
>> fs - http://tracker.ceph.com/issues/11104, 
>> http://tracker.ceph.com/issues/13630
>>
>> krbd - http://tracker.ceph.com/issues/13631
>>
>> kcephfs - http://tracker.ceph.com/issues/13631,
>> http://tracker.ceph.com/issues/13630
>>
>> samba - http://tracker.ceph.com/issues/6613 same as in v0.80.10 was
>> approved for release
>>
>> ceph-deploy(ubuntu_) - almost passed, 1 job is still running
>>
>> ceph-deploy(distros) - http://tracker.ceph.com/issues/13367
>>
>> upgrade/dumpling-x (to firefly)(distros) - passed
>>
>> upgrade/firefly(distros) - passed
>>
>> upgrades to giant - deprecated
>>
>> upgrade/firefly-x (to hammer)(distros) -
>> http://tracker.ceph.com/issues/11104,
>> http://tracker.ceph.com/issues/13632
>>
>> powercycle - http://tracker.ceph.com/issues/11104,
>> http://tracker.ceph.com/issues/13631
>>
>> All found problems seem unrelated to the product, however they
>> prevented some tests from running.  In particular #11104 is widespread
>> and has to be fixed (see also http://tracker.ceph.com/issues/13622 as
>> proposed workaround)
>>
>> I suggest we rerunning failed tests after addressing the issues above.
>>
>> Thx
>> YuriW

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Firefly EOL date - still Jan 2016?

2015-11-13 Thread Loic Dachary
Hi Ken,

On 13/11/2015 22:15, Ken Dreyer wrote:
> Hi folks,
> 
> This is mainly directed at the stable release team members
> (http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO), since
> they are the ones doing the work of backporting :)
> 
> On http://docs.ceph.com/docs/master/releases/, it says the estimated
> EOL for Firefly is Jan 2016, which is coming up soon.
> 
> Does anyone on the stable release team have an interest in doing
> releases beyond that date, or should we announce that as a firm date?

Although we're heading in this direction, publishing a point release still 
depends on resources that are not limited to the stable release team. Ideally, 
in the not too distant future, someone from the stable release team could answer 
"I'll keep publishing releases" and really be able to do it all by her/himself.

As of today, running the integration tests can partially be done using an 
OpenStack tenant and no access to the sepia lab. But some of them (rgw in 
particular) still need work. There is no need for the gitbuilders because 
teuthology-openstack creates the necessary packages on demand, but corner cases 
were fixed this week and there probably are a few others.

The other blocker is the release process, which still requires significant manual 
intervention, privileged access and undocumented knowledge. This is improving 
quickly but we're not yet at a stage where an unprivileged third party is able 
to run it independently.

I'm under the impression that firefly will indeed be EOL early next year 
because:

 * it is currently impractical for community members to test and publish a 
release
 * the backport activity gradually slowed down as users migrate to hammer

I hope that my answer will be different by the time Jewel retires :-) The 
larger goal is to establish a lightweight development process that requires as 
few resources as possible. Let's say adding a patch to a stable release, 
running all the tests and publishing the release can be done within a week of 
real time, a few hours of work and less than 1,000 USD to pay for the cloud 
resources. IMHO it is unrealistic to expect stable releases from a community of 
developers if the effort/cost is significantly higher.

Cheers
> 
> - Ken
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Backlog for the Ceph tracker

2015-11-10 Thread Loic Dachary
Hi Sam,

I crafted a custom query that could be used as a replacement for the backlog 
plugin

   http://tracker.ceph.com/projects/ceph/issues?query_id=86

It displays issues that are features or tasks, grouped by target version and 
ordered by priority.

I also created a v10.0.0 version so we can assign features we want for this 
next version to it.

If you feel that's not good enough, we can just throw it away, it's merely a 
proposal ;-)

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre




signature.asc
Description: OpenPGP digital signature


Re: Backlog for the Ceph tracker

2015-11-10 Thread Loic Dachary
But http://tracker.ceph.com/projects/ceph/agile_versions looks better :-)

On 10/11/2015 16:28, Loic Dachary wrote:
> Hi Sam,
> 
> I crafted a custom query that could be used as a replacement for the backlog 
> plugin
> 
>http://tracker.ceph.com/projects/ceph/issues?query_id=86
> 
> It displays issues that are features or tasks, grouped by target version and 
> ordered by priority.
> 
> I also created a v10.0.0 version so we can assign features we want for this 
> next version to it.
> 
> If you feel that's not good enough, we can just throw it away, it's merely a 
> proposal ;-)
> 
> Cheers
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: Backlog for the Ceph tracker

2015-11-10 Thread Loic Dachary


On 10/11/2015 16:34, Loic Dachary wrote:
> But http://tracker.ceph.com/projects/ceph/agile_versions looks better :-)

It appears to be a crippled version of a proprietary product 
http://www.redminecrm.com/projects/agile/pages/last

My vote would be to uninstall it since it is even less flexible to use than 
the custom query below. It is disappointing to lose a plugin because it is no 
longer maintained, but that's not something we can always foresee. IMHO, relying 
on a proprietary redmine plugin is not a safe bet and it would be wise to not 
become dependent on it.

Cheers

> On 10/11/2015 16:28, Loic Dachary wrote:
>> Hi Sam,
>>
>> I crafted a custom query that could be used as a replacement for the backlog 
>> plugin
>>
>>http://tracker.ceph.com/projects/ceph/issues?query_id=86
>>
>> It displays issues that are features or tasks, grouped by target version and 
>> ordered by priority.
>>
>> I also created a v10.0.0 version so we can assign features we want for this 
>> next version to it.
>>
>> If you feel that's not good enough, we can just throw it away, it's merely a 
>> proposal ;-)
>>
>> Cheers
>>
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Preparing infernalis v9.2.1

2015-11-10 Thread Loic Dachary
Hi Abhishek,

I created the issue to track the progress of infernalis v9.2.1 at 
http://tracker.ceph.com/issues/13750 and assigned it to you. There are a dozen 
issues waiting to be backported and another dozen waiting to be tested in an 
integration branch. 

Good luck with driving your first point release :-)

Enjoy Diwali !

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: How to modify affiliation?

2015-11-10 Thread Loic Dachary
Hi,

You can submit a patch to 
https://github.com/ceph/ceph/blob/master/.organizationmap

Cheers

On 10/11/2015 09:21, chen kael wrote:
> Hi,ceph-dev
>  who can tell me how to modify my affiliation?
>  Thanks!
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: a home for backport snippets

2015-11-10 Thread Loic Dachary
Hi,

The new snippets home is at https://pypi.python.org/pypi/ceph-workbench and 
http://ceph-workbench.dachary.org/root/ceph-workbench.

The first snippet was merged by Nathan yesterday[1], the backport documentation 
updated accordingly[2], and I used it after merging half a dozen hammer 
backports that were approved a few days ago.

Integration tests should provide the best help against regressions we can hope 
for (they spawn a redmine instance every time they run and use a dedicated 
github user to create and destroy projects, pull requests etc.) and they are 
run on every merge request[3]. When integrated in ceph-workbench, the snippet 
is documented[4] and the implementation[5] is tested in full[6]. The merits of 
100% coverage are often disputed as overkill. IMHO it's better to remove an 
untested line of code rather than taking the chance that it grows into 
something that does not work (or possibly never worked). In the case of this 
snippet, there are a dozen safeguards and four lines of code to modify the 
issue. It would be bad to discover, after modifying hundreds of issues in the 
Ceph tracker, that it never worked as expected. I'm sure we'll find ways to 
*not* do the right thing even with integration tests. But we'll hopefully do 
the right thing more often ;-)

I'm not sure how much time it will take us to convert all the snippets we have, 
but it does not matter much as we can keep doing things manually in the 
meantime.

Cheers

P.S. We are using a GitLab instance, with an integrated CI, instead of github 
with a CI on jenkins.ceph.com roughly for the same reasons puppet-ceph is in 
https://github.com/openstack/puppet-ceph and uses the OpenStack gates. We have 
no expertise on jenkins-job-builder[7] and the learning curve is perceived as 
significantly higher than that of GitLab with its integrated CI[8]. We also want to 
share administrative permissions on the CI with all members of the stable 
release team to share the maintenance workload.

[1] backport-set-release 
http://ceph-workbench.dachary.org/root/ceph-workbench/merge_requests/8
[2] Resolving an issue 
http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_merge_commits_from_the_integration_branch#Resolving-the-matching-issue
[3] Continuous integration 
http://ceph-workbench.dachary.org/dachary/ceph-workbench/builds/53
[4] Documentation 
http://ceph-workbench.dachary.org/root/ceph-workbench/merge_requests/8/diffs#9f3ebf1fc38506b66593397f3baac514d515c496_73_75
[5] Implementation 
http://ceph-workbench.dachary.org/root/ceph-workbench/merge_requests/8/diffs#070f4537c6cef8a2dacef1911a7d39acd0ce1387_0_75
[6] Testing 
http://ceph-workbench.dachary.org/root/ceph-workbench/merge_requests/8/diffs#66bd83c5111f0ccc884ad791c4acaa926ab52c2a_0_64
[7] Jenkins Job Builder http://docs.openstack.org/infra/jenkins-job-builder/ 
[8] Configuration of your builds with .gitlab-ci.yml 
http://doc.gitlab.com/ci/yaml/README.html

On 05/11/2015 14:20, Loic Dachary wrote:
> Hi,
> 
> Today, Nathan and I briefly discussed the idea of collecting the backport 
> snippets that are archived in the wiki at 
> http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO. We all have copies 
> on our local disks and although they don't diverge much, this is not very 
> sustainable. It was really good as we established the backport workflows. And 
> it would have been immensely painful to maintain a proper software while we 
> were changing the workflow on a regular basis. But it looks like we now have 
> something stable.
> 
> Early this year ceph-workbench[1] was started with the idea of helping with 
> backports. It is a mostly empty shell we can now use to collect all the 
> snippets we have. Instead of adding set-release[2] to the script directory of 
> Ceph, it would be a subcommand of ceph-workbench, like so:
> 
>   ceph-workbench set-release --token $github_token --key $redmine_key
> 
> What do you think ?
> 
> Cheers
> 
> [1] https://pypi.python.org/pypi/ceph-workbench
> [2] https://github.com/ceph/ceph/pull/6466
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


make check bot resumed

2015-11-09 Thread Loic Dachary
Hi,

The machine sending notifications for the make check bot failed during the 
weekend. It was rebooted and it should resume its work. 

The virtual machine was actually re-built because the underlying OpenStack 
cloud was unable to find the volume used for root after a hard reboot. There 
were also issues with the devicemapper docker backend, which was corrupted. 
Wiping it out was enough to resolve the problem: it did not hold any persistent 
data anyway.

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: make check bot resumed

2015-11-09 Thread Loic Dachary
Hi,

For some reason jenkins thought it was necessary to reconsider all commits 
merged weeks ago. It was silenced so it would not send test results about pull 
requests already merged. It should now resume work on the current pull requests. If a 
pull request needs to be visited by the make check bot, it is enough to rebase 
and repush it.
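
Concretely, something like this is enough to trigger a new run (branch names 
are examples):

  git checkout my-pr-branch
  git commit --amend --no-edit    # or rebase on the latest master
  git push --force-with-lease origin my-pr-branch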

Cheers

On 09/11/2015 15:33, Loic Dachary wrote:
> Hi,
> 
> The machine sending notifications for the make check bot failed during the 
> week-end. It was rebooted and it should resume its work. 
> 
> The virtual machine was actually re-built because the underlying OpenStack 
> cloud was unable to find the volume used for root after a hard reboot. There 
> were also issues with the devicemapper docker backend that was corrupted. 
> Wiping them out was enough to resolve the problem: they did not have any 
> persistent data anyway.
> 
> Cheers
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


a home for backport snippets

2015-11-05 Thread Loic Dachary
Hi,

Today, Nathan and I briefly discussed the idea of collecting the backport 
snippets that are archived in the wiki at 
http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO. We all have copies 
on our local disks and although they don't diverge much, this is not very 
sustainable. It was really good as we established the backport workflows. And 
it would have been immensely painful to maintain a proper software while we 
were changing the workflow on a regular basis. But it looks like we now have 
something stable.

Early this year ceph-workbench[1] was started with the idea of helping with 
backports. It is a mostly empty shell we can now use to collect all the 
snippets we have. Instead of adding set-release[2] to the script directory of 
Ceph, it would be a subcommand of ceph-workbench, like so:

  ceph-workbench set-release --token $github_token --key $redmine_key

What do you think ?

Cheers

[1] https://pypi.python.org/pypi/ceph-workbench
[2] https://github.com/ceph/ceph/pull/6466
-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: a home for backport snippets

2015-11-05 Thread Loic Dachary
Hi Abhishek,

On 06/11/2015 07:11, Abhishek Varshney wrote:
> Hi Loic,
> 
> It is definitely a great idea to have all the backport snippets under
> a roof. However, these snippets, which are mostly a set of commands,
> provide great flexibility in terms of configuration and the ability to
> partly execute them. For instance, if I see conflicts while preparing
> an integration branch, I comment out the git fetch, checkout and reset
> steps from [1] after resolving the conflicts and re-run the snippet.
> It would be nice if we can incorporate all the snippets and take
> ceph-workbench to that level of flexibility :)

This is a good point: I'm doing the same kind of copy / paste. We'll have to 
adjust the granularity.

   http://ceph-workbench.dachary.org/users/auth/github

was set up and is dedicated to the tool. Please let me know when you first log in 
so that I can make you an admin.

Cheers

> 
> [1] 
> http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_populate_the_integration_branch
> 
> Thanks
> Abhishek
> 
> On Thu, Nov 5, 2015 at 6:50 PM, Loic Dachary <l...@dachary.org> wrote:
>> Hi,
>>
>> Today, Nathan and I briefly discussed the idea of collecting the backport 
>> snippets that are archived in the wiki at 
>> http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO. We all have 
>> copies on our local disks and although they don't diverge much, this is not 
>> very sustainable. It was really good as we established the backport 
>> workflows. And it would have been immensely painful to maintain a proper 
>> software while we were changing the workflow on a regular basis. But it 
>> looks like we now have something stable.
>>
>> Early this year ceph-workbench[1] was started with the idea of helping with 
>> backports. It is a mostly empty shell we can now use to collect all the 
>> snippets we have. Instead of adding set-release[2] to the script directory 
>> of Ceph, it would be a subcommand of ceph-workbench, like so:
>>
>>   ceph-workbench set-release --token $github_token --key $redmine_key
>>
>> What do you think ?
>>
>> Cheers
>>
>> [1] https://pypi.python.org/pypi/ceph-workbench
>> [2] https://github.com/ceph/ceph/pull/6466
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>>

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: civetweb upstream/downstream divergence

2015-10-30 Thread Loic Dachary
Hi Pete,

On 30/10/2015 13:57, Pete Zaitcev wrote:
> On Thu, 29 Oct 2015 10:58:07 -0700
> Yehuda Sadeh-Weinraub  wrote:
> 
>> We should definitely do it. We're based off civetweb 1.6, and there
>> was no official civetweb version for quite a while, but 1.7 was tagged
>> a few months ago. I made some effort and got most of our material
>> changes upstream, however, there are some changes that might need some
>> more work before we can get them merged, or might not make complete
>> sense at all.
> 
> I take it Nathan is volunteering to parse the delta into logical pieces
> and identify what upstream is willing to accept, right?

I've discussed this general problem with Nathan a few times. The issue is 
much less about volunteering and much more about how to track the progress of 
the delta over time.
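
To make it concrete, the kind of thing I have in mind is as simple as this 
sketch (it assumes the submodule lives in src/civetweb and that upstream tagged 
1.7):

  cd src/civetweb
  git fetch https://github.com/civetweb/civetweb.git --tags
  git log --oneline v1.7..HEAD   # commits we carry on top of the upstream tag
  git cherry -v v1.7             # '+' marks commits whose diff is not found upstream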

> Dunno about SuSE, but as a Fedora packager I would prefer if we (Ceph)
> talked upstream into making regular releases and then for us to stop
> carrying it entirely. One less git submodule if nothing else.

Right now we have no method. For the jerasure / gf-complete sub-modules, I 
watch the delta and do the right thing but it's mostly an unwritten process: 
someone else would do it completely differently. For other Ceph sub-modules I 
suppose each developer has his own way of dealing with the delta. I remember 
Sage recently proposed patches upstream for rocksdb but I'm unaware of where or 
how. I would not be able to help him in any way. And I don't think anyone could 
figure out exactly how to deal with the jerasure / gf-complete sub-modules 
either.

Do you happen to know a project that is using submodules (or copies of projects 
instead of dependencies) and that is also well organized to track the delta ?

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: Ceph erasure coding

2015-10-26 Thread Loic Dachary


On 27/10/2015 03:34, Kjetil Babington wrote:
> I see, thank you for your replies. I will try to implement a
> CRUSH-map, which distributes the chunks in such a way.  Do you think
> there will be any performance issues with such a map, placing multiple
> smaller chunks on each osd instead of one large on each?

Maybe, but I'd be better able to tell by looking at the crush ruleset.

> The reason we would like to do this, is that we have developed some
> new erasure codes, which we would like to test on a large-scale
> storage system like Ceph. These erasure codes rely on
> sub-packetization to reduce the amount of data read from each disk
> when reconstructing an erasure. Since CEPH does not seem to have an
> ability to read part of a chunk, we thought we could sort of bypass
> this by making each chunk contain one sub-packet, and place a set of
> these chunks (sub-packets) on each osd.

Interesting :-) Could you share the URL to the code of this erasure code plugin 
?

Cheers

> 
> 2015-10-22 18:59 GMT+02:00 Loic Dachary <l...@dachary.org>:
>> Hi,
>>
>> On 22/10/2015 18:44, Kjetil Babington wrote:
>>> Hi,
>>>
>>> I have a question about the capabilities of the erasure coding API in
>>> Ceph. Let's say that I have 10 data disks and 4 parity disks, is it
>>> possible to create an erasure coding plugin which creates 20 data
>>> chunks and 8 parity chunks, and then places two chunks on each osd?
>>>
>>> Or said maybe a bit simpler is it possible for two or more chunks from
>>> the same encode operation to be placed on the same osd?
>>
>> This is more a question of creating a crush ruleset that does it. The 
>> erasure code plugin encodes chunks but the crush ruleset decides where they 
>> are placed.
>>
>> Cheers
>>
>>>
>>> - Kjetil Babington
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majord...@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>>

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


rbd and the next firefly release v0.80.11

2015-10-23 Thread Loic Dachary
Hi Josh,

The next firefly release as found at https://github.com/ceph/ceph/tree/firefly 
passed the rbd suite (http://tracker.ceph.com/issues/11644#note-105 and 
http://tracker.ceph.com/issues/11644#note-120). Do you think the firefly branch 
is ready for QE to start their own round of testing ?

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre


signature.asc
Description: OpenPGP digital signature


rados and the next firefly release v0.80.11

2015-10-23 Thread Loic Dachary
Hi Sam,

The next firefly release as found at https://github.com/ceph/ceph/tree/firefly 
passed the rados suite (http://tracker.ceph.com/issues/11644#note-110). Do you 
think the firefly branch is ready for QE to start their own round of testing ?

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre

signature.asc
Description: OpenPGP digital signature


rgw and the next firefly release v0.80.11

2015-10-23 Thread Loic Dachary
Hi Yehuda,

The next firefly release as found at https://github.com/ceph/ceph/tree/firefly 
passed the rgw suite (http://tracker.ceph.com/issues/11644#note-111). Do you 
think the firefly branch is ready for QE to start their own round of testing ?

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre

signature.asc
Description: OpenPGP digital signature


cephfs and the next firefly release v0.80.11

2015-10-23 Thread Loic Dachary
Hi Greg,

The next firefly release as found at https://github.com/ceph/ceph/tree/firefly 
passed the fs suite (http://tracker.ceph.com/issues/11644#note-112). Do you 
think the firefly branch is ready for QE to start their own round of testing ?

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre

signature.asc
Description: OpenPGP digital signature


Re: rados and the next firefly release v0.80.11

2015-10-23 Thread Loic Dachary
 looks good to me

On 23/10/2015 22:13, Loic Dachary wrote:
> Hi Sam,
> 
> The next firefly release as found at 
> https://github.com/ceph/ceph/tree/firefly passed the rados suite 
> (http://tracker.ceph.com/issues/11644#note-110). Do you think the firefly 
> branch is ready for QE to start their own round of testing ?
> 
> Cheers
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: Ceph erasure coding

2015-10-22 Thread Loic Dachary
Hi,

On 22/10/2015 18:44, Kjetil Babington wrote:
> Hi,
> 
> I have a question about the capabilities of the erasure coding API in
> Ceph. Let's say that I have 10 data disks and 4 parity disks, is it
> possible to create an erasure coding plugin which creates 20 data
> chunks and 8 parity chunks, and then places two chunks on each osd?
> 
> Or said maybe a bit simpler is it possible for two or more chunks from
> the same encode operation to be placed on the same osd?

This is more a question of creating a crush ruleset that does it. The erasure 
code plugin encodes chunks but the crush ruleset decides where they are placed.
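
As an illustration (an untested sketch using your 20+8 example), a rule with 
two choose steps spreads pairs of chunks over 14 host buckets; placing two 
chunks on the very same osd would additionally require a custom hierarchy where 
each osd appears under its own bucket:

  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt
  # in crushmap.txt, the interesting steps of the erasure rule would be:
  #   step take default
  #   step choose indep 14 type host
  #   step choose indep 2 type osd
  #   step emit
  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new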

Cheers

> 
> - Kjetil Babington
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: osd activation under 9.1.0

2015-10-16 Thread Loic Dachary


On 16/10/2015 23:09, Deneau, Tom wrote:
> Using 9.1.0 I am getting the error shown below at ceph-deploy osd activate 
> time.
> 
> + ceph-deploy --overwrite-conf osd activate 
> Intel-2P-Sandy-Bridge-04:/var/local//dev/sdf2:/dev/sdf1

Is:

/var/local//dev/sdf2

intentional ? Note that ceph now runs as ceph, not as root.
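
If that is the cause, giving the ceph user ownership of the paths from your log 
should be enough to confirm (a sketch, adjust the paths to your setup):

  sudo chown -R ceph:ceph /var/local//dev/sdf2
  sudo chown ceph:ceph /dev/sdf1   # the device the journal symlink points to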

Cheers

> ...
> [][WARNIN] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster ceph 
> --mkfs --mkkey -i 4 --monmap /var/local/\
> /dev/sdf2/activate.monmap --osd-data /var/local//dev/sdf2 --osd-journal 
> /var/local//dev/sdf2/journal --osd-uuid 204865df-8dbf-4f26-91f2-5dfa7c3a49f8 
> --keyring /var/local//dev/sdf2/keyring --setuser ceph --setgroup ceph
> [][WARNIN] 2015-10-16 13:13:41.464615 7f3f40642940 -1 
> filestore(/var/local//dev/sdf2) mkjournal error creating journ\
> al on /var/local//dev/sdf2/journal: (13) Permission denied
> [][WARNIN] 2015-10-16 13:13:41.464635 7f3f40642940 -1 OSD::mkfs: 
> ObjectStore::mkfs failed with error -13
> [][WARNIN] 2015-10-16 13:13:41.464669 7f3f40642940 -1  ** ERROR: error 
> creating empty object store in /var/local//de\
> v/sdf2: (13) Permission denied
> [][WARNIN] Traceback (most recent call last):
> [][WARNIN]   File "/usr/sbin/ceph-disk", line 3576, in 
> [][WARNIN] main(sys.argv[1:])
> [][WARNIN]   File "/usr/sbin/ceph-disk", line 3530, in main
> [][WARNIN] args.func(args)
> [][WARNIN]   File "/usr/sbin/ceph-disk", line 2432, in main_activate
> [][WARNIN] init=args.mark_init,
> [][WARNIN]   File "/usr/sbin/ceph-disk", line 2258, in activate_dir
> [][WARNIN] (osd_id, cluster) = activate(path, activate_key_template, init)
> [][WARNIN]   File "/usr/sbin/ceph-disk", line 2360, in activate
> [][WARNIN] keyring=keyring,
> [][WARNIN]   File "/usr/sbin/ceph-disk", line 1950, in mkfs
> [][WARNIN] '--setgroup', get_ceph_user(),
> [][WARNIN]   File "/usr/sbin/ceph-disk", line 349, in command_check_call
> [][WARNIN] return subprocess.check_call(arguments)
> [][WARNIN]   File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
> [][WARNIN] raise CalledProcessError(retcode, cmd)
> [][WARNIN] subprocess.CalledProcessError: Command '['/usr/bin/ceph-osd', 
> '--cluster', 'ceph', '--mkfs', '--mkkey', '
> -i', '4', '--monmap', '/var/local//dev/sdf2/activate.monmap', '--osd-data', 
> '/var/local//dev/sdf2', '--osd-journal', '/var/local//dev/sdf2/journal', 
> '--osd-uuid', '204865df-8dbf-4f26-91f2-5dfa7c3a49f8', '--keyring', 
> '/var/local//dev/sdf2/keyring', '--setuser', 'ceph', '--setgroup\
> ', 'ceph']' returned non-zero exit status 1
> [][ERROR ] RuntimeError: command returned non-zero exit status: 1
> [ceph_deploy][ERROR ] RuntimeError: Failed to execute command: ceph-disk -v 
> activate --mark-init upstart --mount /var/local//dev/sdf2
> 
> When I look at the data disk, I see the following.  
> 
>   -rw-r--r-- 1 root ceph   210 Oct 16 13:13 activate.monmap
>   -rw-r--r-- 1 ceph ceph37 Oct 16 13:13 ceph_fsid
>   drwxr-sr-x 3 ceph ceph  4096 Oct 16 13:13 current
>   -rw-r--r-- 1 ceph ceph37 Oct 16 13:13 fsid
>   lrwxrwxrwx 1 root ceph 9 Oct 16 13:13 journal -> /dev/sdf1
>   -rw-r--r-- 1 ceph ceph21 Oct 16 13:13 magic
>   -rw-r--r-- 1 ceph ceph 4 Oct 16 13:13 store_version
>   -rw-r--r-- 1 ceph ceph53 Oct 16 13:13 superblock
>   -rw-r--r-- 1 ceph ceph 2 Oct 16 13:13 whoami
> 
> (The parent directory has
>   drwxr-sr-x 3 ceph ceph  4096 Oct 16 13:13 sdf2)
> 
> I had been creating the partitions myself and then passing them to 
> ceph-deploy osd prepare and osd activate.
> Which worked fine before 9.1.0.
> Is there some extra permissions setup I need to do for 9.1.0?
> 
> Alternatively, is there a single-node setup script for 9.1.0 that I can look 
> at?
> 
> -- Tom Deneau
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: make check bot delays

2015-10-15 Thread Loic Dachary


On 15/10/2015 11:45, Dałek, Piotr wrote:
>> -Original Message-
>> From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-
>> ow...@vger.kernel.org] On Behalf Of Loic Dachary
>> Sent: Thursday, October 15, 2015 11:09 AM
>>
>> Hi,
>>
>> TL;DR: the make check bot is fixed (no more delays) but only keeps the last
>> 30 runs
>> [..] 
>> What I can't really explain is that it also kept 300 jobs two month ago and 
>> that
>> was no issue. What changed in between... I have no clue.
> 
> Maybe there was some change in Java configuration (memory bounds in 
> particular)?

I don't think any change or update was made to the jenkins instance running the 
bot (and doing nothing else) in the past 9 months.

Weird.

> 
> With best regards / Pozdrawiam
> Piotr Dałek
> 
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: make check bot delays

2015-10-15 Thread Loic Dachary
Hi,

TL;DR: the make check bot is fixed (no more delays) but only keeps the last 30 
runs

It looks like the cause of the jenkins delays was the number of runs kept in the 
archives (set to 300). Lowering it to 30 got jenkins back to running with 
no delay between jobs (as opposed to using 100% of one core for 10 minutes 
between two jobs, apparently to do things on the archived runs).
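
For the record this is just the standard build retention setting of the job; it 
could also be changed from the command line with something like this sketch (the 
job name is an example):

  java -jar jenkins-cli.jar -s http://localhost:8080/ get-job ceph-pull-requests > job.xml
  # set <numToKeep>30</numToKeep> in the logRotator section of job.xml, then:
  java -jar jenkins-cli.jar -s http://localhost:8080/ update-job ceph-pull-requests < job.xml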

What I can't really explain is that it also kept 300 jobs two months ago and 
that was not an issue. What changed in between... I have no clue.

The bot is catching up.

Cheers

On 15/10/2015 00:06, Loic Dachary wrote:
> Hi,
> 
> TL;DR: the jenkins instance running make check bot hangs daily, looking for a 
> solution
> 
> In the past two weeks the make check bot has experienced troubles for which 
> I've been unable to find a cause. The same jenkins instance running it for 
> the past nine month now freezes at random times. Nothing in the logs, no 
> error message: either the page or the slave are just doing nothing no matter 
> what. The only reliable solution seems to be a restart of jenkins (killing 
> the process since even restart blocks). 
> 
> Maybe it's time for the make check bot to run via teuthology instead of 
> jenkins.
> 
> Thanks for your patience
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


make check bot delays

2015-10-14 Thread Loic Dachary
Hi,

TL;DR: the jenkins instance running the make check bot hangs daily, looking for a 
solution

In the past two weeks the make check bot has experienced troubles for which 
I've been unable to find a cause. The same jenkins instance running it for the 
past nine months now freezes at random times. Nothing in the logs, no error 
message: either the web page or the slave just does nothing no matter what. 
The only reliable solution seems to be a restart of jenkins (killing the 
process since even restart blocks). 

Maybe it's time for the make check bot to run via teuthology instead of jenkins.

Thanks for your patience

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Preparing the suites to run with the OpenStack backend

2015-10-14 Thread Loic Dachary
Hi Josh, Yehuda and Greg,

It is my understanding that there is a chance we may need to use the OpenStack 
teuthology backend as a backup while machines in the sepia lab migrate from one 
data center to another. Zack has set up a new teuthology cluster that will 
transparently behave as the cluster in the sepia lab does: the only difference 
being that you would use --machine-type openstack instead. Or the new 
teuthology-openstack command could be used if one feels like learning about it.
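
Roughly, scheduling would look like this (all options other than --machine-type 
are only examples):

  teuthology-suite --suite rados --ceph hammer --machine-type openstack --subset 1/18
  # or, with the wrapper that also provisions the teuthology cluster:
  teuthology-openstack --suite rados --ceph hammer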

Despite our best efforts, OpenStack provisioning is not 100% transparent. In 
the past few months we made various attempts at running suites on OpenStack to 
verify that they do not massively fail and to identify possible showstoppers. 
When possible the OpenStack backend was adapted but in some cases the suites 
themselves had to be modified. For instance, a number of jobs in the rados 
suite run fine with no attached disks, which is the default. But all jobs in 
rados/thrash need three attached disks per target and that had to be set in the 
ceph-qa-suite files as follows[1]:

openstack:
  machine:
disk: 40 # GB
ram: 8000 # MB
cpus: 1
  volumes: # attached to each instance
count: 3
size: 30 # GB

The rados suite for hammer now runs cleanly on OpenStack [2] and I'll work on 
making it run on infernalis as well [3]. The rbd suite for hammer runs cleanly 
(no changes :-) on OpenStack [4] but needs work to run on infernalis: an 
inventory of the problems was made[5].

A similar verification needs to be done for the rgw and fs suites (the upgrade 
/ ceph-deploy / ceph-disk are not a concern as they already run on virtual 
machines). The first problem that needs attention for the rgw suite is 
http://tracker.ceph.com/issues/12471#note-4 (which also happens with the 
infernalis rados suite because it has some rgw workload). AFAIK, there are no 
other outstanding issues.

I will not be able to run and fix all the suites all by myself: it's too much 
work and would divert me for too long from my rados duties. I'm however 
available to help as much as you need to make it work :-)

Cheers


[1] resource hint for rados/thrash 
https://github.com/ceph/ceph-qa-suite/blob/wip-12329-resources-hint-hammer/suites/rados/thrash/clusters/openstack.yaml
[2] running the rados suite on OpenStack virtual machines (hammer) 
http://tracker.ceph.com/issues/12386
[3] running the rados suite on OpenStack virtual machines (infernalis) 
http://tracker.ceph.com/issues/12471
[4] running the rbd suite on OpenStack virtual machines (hammer) 
http://tracker.ceph.com/issues/13265
[5] running the rbd suite on OpenStack virtual machines (infernalis) 
http://tracker.ceph.com/issues/13270

-- 
Loïc Dachary, Artisan Logiciel Libre




signature.asc
Description: OpenPGP digital signature


Re: enable rbd on ec pool ?

2015-10-13 Thread Loic Dachary
Hi Tomy,

On 13/10/2015 06:13, Tomy Cheru wrote:
> Is there a patch available to enable rbd over an EC pool ?

You have to go through a cache tier instead of using it directly. See 
http://docs.ceph.com/docs/master/rados/operations/cache-tiering/ for more 
information.
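
As a sketch (pool names, pg counts and size are arbitrary, and you will also 
want to look at the hit_set parameters described in the documentation above):

  ceph osd pool create ecpool 128 128 erasure
  ceph osd pool create cachepool 128 128
  ceph osd tier add ecpool cachepool
  ceph osd tier cache-mode cachepool writeback
  ceph osd tier set-overlay ecpool cachepool
  rbd create --pool ecpool --size 1024 testimage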

Cheers

> 
> Currently its restricted,
> 2015-10-12 10:52:23.042085 7f4721ca1840 -1 librbd: error adding image to 
> directory: (95) Operation not supported
> rbd: create error: (95) Operation not supported
> 
> Thanks,
> tomy
> 
> 
> 
> 
> PLEASE NOTE: The information contained in this electronic mail message is 
> intended only for the use of the designated recipient(s) named above. If the 
> reader of this message is not the intended recipient, you are hereby notified 
> that you have received this message in error and that any review, 
> dissemination, distribution, or copying of this message is strictly 
> prohibited. If you have received this communication in error, please notify 
> the sender by telephone or e-mail (as shown above) immediately and destroy 
> any and all copies of this message in your possession (whether hard copies or 
> electronically stored copies).
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: enable rbd on ec pool ?

2015-10-13 Thread Loic Dachary


On 13/10/2015 09:59, Tomy Cheru wrote:
> Hi Loic,
>   Thanks for your response, 
> 
> however am specifically looking for a patch to enable rbd on ec pool(am aware 
> of cache tier option).

Ah :-) I'm not aware of such a patch, even in draft state.

> Thanks,
> tomy
> 
> -Original Message-
> From: Loic Dachary [mailto:l...@dachary.org] 
> Sent: Tuesday, October 13, 2015 12:47 PM
> To: Tomy Cheru; ceph-devel@vger.kernel.org
> Subject: Re: enable rbd on ec pool ?
> 
> Hi Tomy,
> 
> On 13/10/2015 06:13, Tomy Cheru wrote:
>> Is there a patch available to enable rbd over an EC pool ?
> 
> You have to go through a cache tier instead of using it directly. See 
> http://docs.ceph.com/docs/master/rados/operations/cache-tiering/ for more 
> information.
> 
> Cheers
> 
>>
>> Currently its restricted,
>> 2015-10-12 10:52:23.042085 7f4721ca1840 -1 librbd: error adding image 
>> to directory: (95) Operation not supported
>> rbd: create error: (95) Operation not supported
>>
>> Thanks,
>> tomy
>>
>>
>> 
>>
>> PLEASE NOTE: The information contained in this electronic mail message is 
>> intended only for the use of the designated recipient(s) named above. If the 
>> reader of this message is not the intended recipient, you are hereby 
>> notified that you have received this message in error and that any review, 
>> dissemination, distribution, or copying of this message is strictly 
>> prohibited. If you have received this communication in error, please notify 
>> the sender by telephone or e-mail (as shown above) immediately and destroy 
>> any and all copies of this message in your possession (whether hard copies 
>> or electronically stored copies).
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" 
>> in the body of a message to majord...@vger.kernel.org More majordomo 
>> info at  http://vger.kernel.org/majordomo-info.html
>>
> 
> --
> Loïc Dachary, Artisan Logiciel Libre
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: Fwd: monitor crashing

2015-10-13 Thread Loic Dachary
https://github.com/ceph/ceph/compare/hammer...wip-ecpool-hammer

In order to bypass the crush verification, you could:

ceph tell mon.* injectargs --crushtool /bin/true
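
Once the monitors are up again with the fix, the pool can be removed with 
something like (use your pool name):

  ceph osd pool delete <pool> <pool> --yes-i-really-really-mean-it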

Cheers

On 13/10/2015 15:41, Sage Weil wrote:
> On Tue, 13 Oct 2015, Luis Periquito wrote:
>> the store.db dir is 3.4GB big :(
>>
>> can I do it on my side?
> 
> Nevermind, I was able to reproduce it from the bugzilla.  I've pushed a 
> branch wip-ecpool-hammer.  Not sure which distro you're on, but packages 
> will appear at gitbuilder.ceph.com in 30-45 minutes.  This fixes the mon 
> crash, which will let you delete the pool.  I suggest stopping the OSDs 
> before starting the mon with this or else they might get pg create 
> messages and crash too.  Once the pool is removed you can start them 
> again.  They shouldn't need to be upgraded.
> 
> Note that the latest hammer doesn't let you create the pool at all because 
> it fails the crush safety check (I had to disable the check to reproduce 
> this), so that's good at least!
> 
> sage
> 
>>
>> On Tue, Oct 13, 2015 at 2:25 PM, Sage Weil  wrote:
>>> On Tue, 13 Oct 2015, Luis Periquito wrote:
 Any ideas? I'm growing desperate :(

 I've tried compiling from source, and including
 https://github.com/ceph/ceph/pull/5276, but it still crashes on boot
 of the ceph-mon
>>>
>>> If you can email a (link to a) tarball of your mon data directory I'd love
>>> to extract the osdmap and see why crush is crashing.. it's obviously not
>>> supposed to do that (even with a bad rule).  You can also use
>>> the ceph-post-file utility.
>>>
>>> Thanks!
>>> sage
>>>
>>>

 -- Forwarded message --
 From: Luis Periquito 
 Date: Tue, Oct 13, 2015 at 12:26 PM
 Subject: Re: monitor crashing
 To: Ceph Users 


 I'm currently running Hammer (0.94.3), created an invalid LRC profile
 (typo in the l=, should have been l=4 but was l=3, and now I don't
 have enough different ruleset-locality) and created a pool. Is there
 any way to delete this pool? remember I can't start the ceph-mon...

 On Tue, Oct 13, 2015 at 11:56 AM, Luis Periquito  
 wrote:
> It seems I've hit this bug:
> https://bugzilla.redhat.com/show_bug.cgi?id=1231630
>
> is there any way I can recover this cluster? It worked in our test
> cluster, but crashed the production one...
 --
 To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html


>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: pre-Infernalis ceph-disk bug

2015-10-13 Thread Loic Dachary
Hi,

On 14/10/2015 00:02, Jeremy Hanmer wrote:
> I think I've found a bug in ceph-disk when running on Ubuntu 14.04
> (and I believe 12.04 as well, but haven't confirmed) and using
> --dmcrypt.
> 
> The problem is that when update_partition() is called, partprobe is
> used to re-read the partition table (as opposed to partx on all other
> distros) and it appears that it isn't smart/thorough enough to update
> all of the device's metadata. Specifically, ID_PART_ENTRY_TYPE isn't
> updated:
> 
> root@ceph-osd03:~# udevadm info --query=env --name=/dev/vdd1 | grep
> ID_PART_ENTRY_TYPE
> ID_PART_ENTRY_TYPE=89c57f98-2fe5-4dc0-89c1-5ec00ceff2be
> 
> running `partx -u` rather than `partprobe` does the appropriate thing:
> 
> root@ceph-osd03:~# partx -u /dev/vdd1
> root@ceph-osd03:~# udevadm info --query=env --name=/dev/vdd1 | grep
> ID_PART_ENTRY_TYPE
> ID_PART_ENTRY_TYPE=4fbd7e29-9d25-41b8-afd0-5ec00ceff05d
> 
> 
> I have an experimental patch here that Works For Me, but Sage wanted
> me to ping the list for input:
> 
> https://github.com/fzylogic/ceph/commit/8c83f75392d68fbec7def8aa61f20b2c9c237571
> 
> 
> I also want to test the new Infernalis code for this same bug (after a
> cursory check, I strongly suspect it's there as well), but it'll take
> a little bit to get another test cluster up to confirm.

There have been many changes in infernalis, most of them to make it more robust. 
It would be great if you could try to reproduce the problem you had with 
infernalis. 

Your patch looks good and you could also remove 
https://github.com/fzylogic/ceph/blob/8c83f75392d68fbec7def8aa61f20b2c9c237571/src/ceph-disk#L1505
 which will happen immediately after the function returns. 

An alternate fix would be to run udevadm settle before 
https://github.com/fzylogic/ceph/blob/8c83f75392d68fbec7def8aa61f20b2c9c237571/src/ceph-disk#L985
 and after it to avoid races. I think the reason partprobe does not appear 
to work is that it triggers udev events that race with the udev events triggered 
by sgdisk while creating the partition.
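
Manually, that would look like this (reusing the /dev/vdd example from your 
report):

  udevadm settle --timeout=600   # let the events triggered by sgdisk drain
  partprobe /dev/vdd
  udevadm settle --timeout=600   # wait for the events triggered by partprobe
  udevadm info --query=env --name=/dev/vdd1 | grep ID_PART_ENTRY_TYPE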

Cheers

> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: rgw and the next hammer release v0.94.4

2015-10-12 Thread Loic Dachary
Hi,

After today's private discussion and the merge of 
https://github.com/ceph/ceph/pull/6161, I will assume the current hammer branch 
(7f485ed5aa620fe982561663bf64356b7e2c38f2) is ready for QE to start their own 
round of testing. If I misinterpreted what you wrote, please speak up and I'll 
do what's needed ;-)

Cheers

On 02/10/2015 22:31, Loic Dachary wrote:
> Hi Yehuda,
> 
> The next hammer release as found at https://github.com/ceph/ceph/tree/hammer 
> passed the rgw suite (http://tracker.ceph.com/issues/12701#note-58). 
> Do you think the hammer branch is ready for QE to start their own round of 
> testing ?
> 
> Cheers
> 
> P.S. http://tracker.ceph.com/issues/12701#Release-information has direct 
> links to the pull requests merged into hammer since v0.94.3 in case you need 
> more context about one of them.
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


hammer branch for v0.94.4 ready for QE

2015-10-12 Thread Loic Dachary
Hi Yuri,

The hammer branch for v0.94.4 as found at 
https://github.com/ceph/ceph/commits/hammer has been approved by Yehuda, Josh 
and Sam (there are no CephFS related commits according to Greg, hence his 
approval was not relevant) and is ready for QE. For the record, the head is 
https://github.com/ceph/ceph/commit/7f485ed5aa620fe982561663bf64356b7e2c38f2 
and the details of the tests run are at http://tracker.ceph.com/issues/12701.

This time around, instead of adding the table to the description, I propose you 
add it as a comment (which can be edited later on). It is easier because it's 
not overloaded with unrelated content. There is also the matter of the maximum 
size of the description field: there is a real risk of exceeding it and 
truncating the result.

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre


signature.asc
Description: OpenPGP digital signature


Re: CephFS and the next hammer release v0.94.4

2015-10-08 Thread Loic Dachary


On 08/10/2015 22:50, Gregory Farnum wrote:
> On Tue, Oct 6, 2015 at 8:16 AM, Loic Dachary <l...@dachary.org> wrote:
>> Hi Greg,
>>
>> The next hammer release as found at https://github.com/ceph/ceph/tree/hammer 
>> passed the fs suite (http://tracker.ceph.com/issues/12701#note-66). Do you 
>> think the hammer branch is ready for QE to start their own round of testing ?
>>
>> Cheers
>>
>> P.S. http://tracker.ceph.com/issues/12701#Release-information has direct 
>> links to the pull requests merged into hammer since v0.94.3 in case you need 
>> more context about one of them.
> 
> I don't see any FS patches in that list. The only thing I'm aware of
> which CephFS would like to see backported is
> https://github.com/ceph/ceph/pull/5885, which isn't in that list but
> has been merged since...would it be difficult to add that one?

It will be in the v0.94.4 release.

Thanks !

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


CephFS and the next hammer release v0.94.4

2015-10-06 Thread Loic Dachary
Hi Greg,

The next hammer release as found at https://github.com/ceph/ceph/tree/hammer 
passed the fs suite (http://tracker.ceph.com/issues/12701#note-66). Do you 
think the hammer branch is ready for QE to start their own round of testing ?

Cheers

P.S. http://tracker.ceph.com/issues/12701#Release-information has direct links 
to the pull requests merged into hammer since v0.94.3 in case you need more 
context about one of them.

-- 
Loïc Dachary, Artisan Logiciel Libre


signature.asc
Description: OpenPGP digital signature


Preparing hammer v0.94.5

2015-10-03 Thread Loic Dachary
Hi Abhishek,

The v0.94.5 version was added to the list of versions and you should now be 
able to create the issue to track its progress. Since v0.94.4 is in the process 
of being tested, most of the ~50 backports in flight[1] will actually be for 
v0.94.5 and we can start testing them. The worst that can happen is that a few 
of them shift to v0.94.4 because they are needed after all. From the point of 
view of v0.94.5 that's all the same.

The next step is to create the issue[2] and you should have all the credentials 
you need to do so. 

This will be your first time driving a point release but there should be no 
surprises: you already know all about the process :-)

Cheers

[1] https://github.com/ceph/ceph/pulls?q=is%3Aopen+is%3Apr+milestone%3Ahammer
[2] 
http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_start_working_on_a_new_point_release#Create-new-task

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Driving the first infernalis point release v9.2.1

2015-10-03 Thread Loic Dachary
Nathan & Abhisheks (are we even allowed to do that ? ;-)

Immediately after the first infernalis release v9.2.0 [1], we will start 
preparing the v9.2.1 point release. Would one of you be willing to drive it ?

Cheers

[1] Release numbers conventions 
http://docs.ceph.com/docs/master/releases/#release-numbers-conventions

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: Preparing hammer v0.94.5

2015-10-03 Thread Loic Dachary


On 03/10/2015 19:47, Abhishek Varshney wrote:
> Hi Loic,
> 
> 
> On Sat, Oct 3, 2015 at 2:09 PM, Loic Dachary <l...@dachary.org> wrote:
>> Hi Abhishek,
>>
>> The v0.94.5 version was added to the list of versions and you should now be 
>> able to create the issue to track its progress. Since v0.94.4 is in the 
>> process of being tested, most of the ~50 backports in flight[1] will 
>> actually be for v0.94.5 and we can start testing them. The worst that can 
>> happen is that a few of them shift to v0.94.4 because they are needed after 
>> all. From the point of view of v0.94.5 that's all the same.
>>
>> The next step is to create the issue[2] and you should have all the 
>> credentials you need to do so.
> 
> I have created a new issue to keep track of hammer v0.94.5 (
> http://tracker.ceph.com/issues/13356 ). I would next prepare an
> integration branch with all the in-flight backports and update the
> tracker for v0.94.5.

The hammer-backports branch has them already and it compiles ( 
http://ceph.com/gitbuilder.cgi ). I'll schedule suites for you since you do not 
have access to the sepia lab yet. You'll have a lot of fun sorting out the 
results.

> 
>>
>> This will be your first time driving a point release but there should be no 
>> surprises: you already know all about the process :-)
> 
> I guess I know the process well now, but, I am sure I will need your
> support and motivation in getting it through :)

I'll be around :-) 

> 
>>
>> Cheers
>>
>> [1] https://github.com/ceph/ceph/pulls?q=is%3Aopen+is%3Apr+milestone%3Ahammer
>> [2] 
>> http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_start_working_on_a_new_point_release#Create-new-task
>>
>> --
>> Loïc Dachary, Artisan Logiciel Libre
>>

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


rbd and the next hammer release v0.94.4

2015-10-02 Thread Loic Dachary
Hi Josh,

The next hammer release as found at https://github.com/ceph/ceph/tree/hammer 
passed the rbd suite (http://tracker.ceph.com/issues/12701#note-61). Do you 
think the hammer branch is ready for QE to start their own round of testing ?

Cheers

P.S. http://tracker.ceph.com/issues/12701#Release-information has direct links 
to the pull requests merged into hammer since v0.94.3 in case you need more 
context about one of them.

-- 
Loïc Dachary, Artisan Logiciel Libre


signature.asc
Description: OpenPGP digital signature


rgw and the next hammer release v0.94.4

2015-10-02 Thread Loic Dachary
Hi Yehuda,

The next hammer release as found at https://github.com/ceph/ceph/tree/hammer 
passed the rgw suite (http://tracker.ceph.com/issues/12701#note-58). 
Do you think the hammer branch is ready for QE to start their own round of 
testing ?

Cheers

P.S. http://tracker.ceph.com/issues/12701#Release-information has direct links 
to the pull requests merged into hammer since v0.94.3 in case you need more 
context about one of them.

-- 
Loïc Dachary, Artisan Logiciel Libre


signature.asc
Description: OpenPGP digital signature


Re: Teuthology Integration to native openstack

2015-09-30 Thread Loic Dachary
OS-SRV-USG:terminated_at"
  },
  {
"Value": "",
"Field": "accessIPv4"
  },
  {
"Value": "",
"Field": "accessIPv6"
  },
  {
"Value": "Ext-Net=167.114.225.193",
"Field": "addresses"
  },
  {
"Value": "",
"Field": "config_drive"
  },
  {
"Value": "2015-09-30T08:37:01Z",
"Field": "created"
  },
  {
"Value": "vps-ssd-3 (e43d7458-6b82-4a78-a712-3a4dc6748cf4)",
"Field": "flavor"
  },
  {
"Value": "38119f63edc62252c491fa7e9a8d164a90c48db09fdee1a5687c1c7f",
"Field": "hostId"
  },
  {
"Value": "897cbcc9-d662-4ae9-bb68-a71ef4269cdc",
"Field": "id"
  },
  {
"Value": "teuthology-centos-7.0 (67438ecf-803c-45a6-83bb-54a0ba0d0b6c)",
"Field": "image"
  },
  {
"Value": "teuthology",
"Field": "key_name"
  },
  {
"Value": "target225193",
"Field": "name"
  },
  {
"Value": [],
"Field": "os-extended-volumes:volumes_attached"
  },
  {
"Value": 0,
"Field": "progress"
  },
  {
"Value": "131b886b156a4f84b5f41baf2fbe646c",
"Field": "project_id"
  },
  {
"Value": "ownedby='167.114.249.14', 
teuthology='d48f8bc9adf785614308e33094933a72'",
"Field": "properties"
  },
  {
"Value": [
  {
"name": "teuthology"
  }
],
"Field": "security_groups"
  },
  {
"Value": "ACTIVE",
"Field": "status"
  },
  {
"Value": "2015-09-30T08:39:08Z",
"Field": "updated"
  },
  {
"Value": "291dde1633154837be2693c6ffa6315c",
"Field": "user_id"
  }
]

> 
> Thank you.
> 
> Regards,
> M Bharath Krishna
> 
> 
> On 9/28/15, 3:20 PM, "Loic Dachary" <l...@dachary.org> wrote:
> 
>> Hi,
>>
>> On 28/09/2015 07:24, Bharath Krishna wrote:
>>> Hi Dachary,
>>>
>>> Thanks for the reply. I am following your blog
>>> http://dachary.org/?p=3767
>>> And the README in
>>>
>>> https://github.com/dachary/teuthology/tree/wip-6502-openstack-v2/#opensta
>>> ck
>>> -backend
>>
>> The up to date instructions are at
>> https://github.com/dachary/teuthology/tree/openstack/#openstack-backend
>> (the link you used comes from http://dachary.org/?p=3828 and I just
>> updated it so noone else will be confused).
>>>
>>> I have sourced the openrc file of my Openstack deployment and verified
>>> that clients are working fine. My Openstack deployment has Cinder
>>> integrated with CEPH backend.
>>>
>>> I have cloned and installed teuthology using the below steps:
>>>
>>> $ git clone -b wip-6502-openstack-v2
>>> http://github.com/dachary/teuthology
>>> $ cd teuthology ; ./bootstrap install
>>> $ source virtualenv/bin/activate
>>>
>>>
>>> Then I tried to run a dummy suite as test and I ran into following
>>> error:
>>>
>>> Traceback (most recent call last):
>>>   File "/root/teuthology/virtualenv/bin/teuthology-openstack", line 9,
>>> in
>>> 
>>> load_entry_point('teuthology==0.1.0', 'console_scripts',
>>> 'teuthology-openstack')()
>>>   File "/root/teuthology/scripts/openstack.py", line 8, in main
>>> teuthology.openstack.main(parse_args(argv), argv)
>>>   File "/root/teuthology/teuthology/openstack.py", line 375, in main
>>> return TeuthologyOpenStack(ctx, teuth_config, argv).main()
>>>   File "/root/teuthology/teuthology/openstack.py", line 181, in main
>>> self.verify_openstack()
>>>   File "/root/teuthology/teuthology/openstack.py", line 270, in
>>> verify_openstack
>>> str(providers))
>>> Exception: ('OS_AUTH_URL=http://:5000/v2.0', " does is not a
>>> known OpenStack provider (('cloud.ovh.net', 'ovh'), ('control.os1.phx2',
>>> 'redhat'), ('entercloudsuite.com', 'entercloudsuite'))")
>>
>> This limitation was in an earlier implementations and should not be a
>> problem now.
>>
>> Cheers
>>
>>>
>>>
>>> Thank you.
>>>
>>> Regards,
>>> M Bharath Krishna
>>>
>>> On 9/28/15, 1:47 AM, "Loic Dachary" <l...@dachary.org> wrote:
>>>
>>>> [moving to ceph-devel]
>>>>
>>>> Hi,
>>>>
>>>> On 27/09/2015 21:20, Bharath Krishna wrote:
>>>>> Hi,
>>>>>
>>>>> We have an openstack deployment in place with CEPH as CINDER backend.
>>>>>
>>>>> We would like to perform functional testing for CEPH and found
>>>>> teuthology as recommended option.
>>>>>
>>>>> Have successfully installed teuthology. Now to integrate it with
>>>>> Openstack, I could see that the possible providers could be either
>>>>> OVH,
>>>>> REDHAT or ENTERCLOUDSITE.
>>>>>
>>>>> Is there any option where in we can source openstack deployment of our
>>>>> own and test CEPH using teuthology?
>>>>
>>>> The documentation mentions these providers because they have been
>>>> tested.
>>>> But there should be no blocker to run teuthology against a regular
>>>> OpenStack provider. Should you run into troubles, please let me know
>>>> and
>>>> I'll help.
>>>>
>>>> Cheers
>>>>
>>>>>
>>>>> If NO, please suggest on how to test CEPH in such scenarios?
>>>>>
>>>>> Please help.
>>>>>
>>>>> Thank you.
>>>>> Bharath Krishna
>>>>> ___
>>>>> ceph-users mailing list
>>>>> ceph-us...@lists.ceph.com
>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>
>>>>
>>>> -- 
>>>> Loïc Dachary, Artisan Logiciel Libre
>>>>
>>>
>>
>> -- 
>> Loïc Dachary, Artisan Logiciel Libre
>>
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: Teuthology Integration to native openstack

2015-09-30 Thread Loic Dachary


On 30/09/2015 11:34, Bharath Krishna wrote:
> Hi Loic,
> 
> Does piping the command output of "openstack server show -f json
> ” to jq alter the output format?

It just displays it nicely but does not otherwise change it.
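
For what it's worth, you can verify that without jq; a quick sketch, assuming 
the server is named teuthology as in this thread:

    # Sketch: pretty-print the same JSON with the standard library; the data
    # is identical, only the formatting differs.
    import json, subprocess

    raw = subprocess.check_output(
        ['openstack', 'server', 'show', '-f', 'json', 'teuthology'])
    print(json.dumps(json.loads(raw), indent=2, sort_keys=True))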


> 
> Openstack version being used is Juno.

That's also the version of some of the clusters I use. What version of the 
openstack cli do you have ?

$ openstack --version
openstack 1.7.0


> 
> Thank you
> 
> Regards,
> M Bharath Krishna
> 
> On 9/30/15, 2:20 PM, "Loic Dachary" <l...@dachary.org> wrote:
> 
>> Hi,
>>
>> On 30/09/2015 07:51, Bharath Krishna wrote:
>>> Hi,
>>>
>>> Thanks a lot for pointing to right git and instructions. I have passed
>>> that step now and teuthology VM got created.
>>>
>>> But teuthology openstack command fails to parse the instance id from the
>>> json format output of below command:
>>>
>>> DEBUG:teuthology.misc:openstack server show -f json teuthology output
>>>
>>>  "OS-EXT-STS:task_state": null,
>>>   "addresses": "Primary_External_Net=",
>>>   "image": "teuthology-ubuntu-14.04
>>> (10e6d3b1-f94a-4220-a00f-3e3a13f349e0)",
>>>   "OS-EXT-STS:vm_state": "active",
>>>   "OS-EXT-SRV-ATTR:instance_name": "instance-26e8",
>>>   "OS-SRV-USG:launched_at": "2015-09-28T10:33:09.00",
>>>   "flavor": "m1.small (2)",
>>>   "id": "79a41b6f-f379-4d14-98ac-e73cb42cfa48",
>>>   "security_groups": [
>>> {
>>>   "name": "teuthology"
>>> }
>>>   ],
>>>   "user_id": "281f9aa2d9c54177b45e72db742b4744",
>>>   "OS-DCF:diskConfig": "MANUAL",
>>>   "accessIPv4": "",
>>>   "accessIPv6": "",
>>>   "progress": 0,
>>>   "OS-EXT-STS:power_state": 1,
>>>   "OS-EXT-AZ:availability_zone": "az3",
>>>   "config_drive": "",
>>>   "status": "ACTIVE",
>>>   "updated": "2015-09-28T10:33:09Z",
>>>   "hostId": "b205fbea7ee98ef482712db93325a1d7d44d7694a8ec9fce7df038c3",
>>>   "OS-EXT-SRV-ATTR:host": "hostname",
>>>   "OS-SRV-USG:terminated_at": null,
>>>   "key_name": "ceph_test_key",
>>>   "properties": "",
>>>   "project_id": "1d0137fe585742bdbe13e2b16daab6ff",
>>>   "OS-EXT-SRV-ATTR:hypervisor_hostname": "hostname",
>>>   "name": "teuthology",
>>>   "created": "2015-09-28T10:32:47Z",
>>>   "os-extended-volumes:volumes_attached": []
>>> }
>>> Traceback (most recent call last):
>>>   File "/opt/teuthology/virtualenv/bin/teuthology-openstack", line 9, in
>>> 
>>> load_entry_point('teuthology==0.1.0', 'console_scripts',
>>> 'teuthology-openstack')()
>>>   File "/opt/teuthology/scripts/openstack.py", line 8, in main
>>> teuthology.openstack.main(parse_args(argv), argv)
>>>   File "/opt/teuthology/teuthology/openstack/__init__.py", line 622, in
>>> main
>>> return TeuthologyOpenStack(ctx, teuth_config, argv).main()
>>>   File "/opt/teuthology/teuthology/openstack/__init__.py", line 284, in
>>> main
>>> ip = self.setup()
>>>   File "/opt/teuthology/teuthology/openstack/__init__.py", line 338, in
>>> setup
>>> if not self.cluster_exists():
>>>   File "/opt/teuthology/teuthology/openstack/__init__.py", line 607, in
>>> cluster_exists
>>> instance_id = self.get_instance_id(self.args.name)
>>>   File "/opt/teuthology/teuthology/openstack/__init__.py", line 565, in
>>> get_instance_id
>>> return TeuthologyOpenStack.get_value(instance, 'id')
>>>   File "/opt/teuthology/teuthology/openstack/__init__.py", line 75, in
>>> get_value
>>> return filter(lambda v: v['Field'] == field, result)[0]['Value']
>>>   File "/opt/teuthology/teuthology/openstack/__init__.py", line 75, in
>>> 
>>> return filter(lambda v: v['Field'] == field, result)[0]['Value']
>>> TypeError: string indices must be integers
>>>

Re: Teuthology Integration to native openstack

2015-09-30 Thread Loic Dachary
Could you send me privately the full log ? I suspect something else is 
happening (not a problem with tools / cluster version) and I may find a clue in 
the logs.
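
For context, the TypeError in the traceback quoted below is what you get when 
"openstack server show -f json" returns a plain JSON object (as in the output 
pasted in this thread) instead of the list of {"Field": ..., "Value": ...} rows 
that get_value() expects. A rough illustration, not the actual teuthology code:

    # Sketch: the same lookup against both output shapes.
    def get_value(instance, field):
        # equivalent to the filter(...) call shown in the traceback
        return [v for v in instance if v['Field'] == field][0]['Value']

    list_style = [{"Field": "id", "Value": "897cbcc9"}]  # older CLI output shape
    dict_style = {"id": "79a41b6f"}                      # shape pasted in this thread

    print(get_value(list_style, 'id'))  # works
    print(get_value(dict_style, 'id'))  # iterating a dict yields its keys
                                        # (strings), so v['Field'] raises
                                        # "TypeError: string indices must be
                                        # integers"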

On 30/09/2015 12:17, Bharath Krishna wrote:
> Its the same version I do have as well.
> 
> #openstack --version
> openstack 1.7.0
> 
> 
> Thank you.
> 
> Regards
> M Bharath Krishna
> 
> 
> 
> On 9/30/15, 3:42 PM, "Loic Dachary" <l...@dachary.org> wrote:
> 
>>
>>
>> On 30/09/2015 11:34, Bharath Krishna wrote:
>>> Hi Loic,
>>>
>>> Does piping the command output of "openstack server show -f json
>>> ” to jq alter the output format?
>>
>> It just displays it nicely but does not otherwise change it.
>>
>>
>>>
>>> Openstack version being used is Juno.
>>
>> That's also the version of some of the clusters I use. What version of
>> the openstack cli do you have ?
>>
>> $ openstack --version
>> openstack 1.7.0
>>
>>
>>>
>>> Thank you
>>>
>>> Regards,
>>> M Bharath Krishna
>>>
>>> On 9/30/15, 2:20 PM, "Loic Dachary" <l...@dachary.org> wrote:
>>>
>>>> Hi,
>>>>
>>>> On 30/09/2015 07:51, Bharath Krishna wrote:
>>>>> Hi,
>>>>>
>>>>> Thanks a lot for pointing to right git and instructions. I have passed
>>>>> that step now and teuthology VM got created.
>>>>>
>>>>> But teuthology openstack command fails to parse the instance id from
>>>>> the
>>>>> json format output of below command:
>>>>>
>>>>> DEBUG:teuthology.misc:openstack server show -f json teuthology output
>>>>>
>>>>>  "OS-EXT-STS:task_state": null,
>>>>>   "addresses": "Primary_External_Net=",
>>>>>   "image": "teuthology-ubuntu-14.04
>>>>> (10e6d3b1-f94a-4220-a00f-3e3a13f349e0)",
>>>>>   "OS-EXT-STS:vm_state": "active",
>>>>>   "OS-EXT-SRV-ATTR:instance_name": "instance-26e8",
>>>>>   "OS-SRV-USG:launched_at": "2015-09-28T10:33:09.00",
>>>>>   "flavor": "m1.small (2)",
>>>>>   "id": "79a41b6f-f379-4d14-98ac-e73cb42cfa48",
>>>>>   "security_groups": [
>>>>> {
>>>>>   "name": "teuthology"
>>>>> }
>>>>>   ],
>>>>>   "user_id": "281f9aa2d9c54177b45e72db742b4744",
>>>>>   "OS-DCF:diskConfig": "MANUAL",
>>>>>   "accessIPv4": "",
>>>>>   "accessIPv6": "",
>>>>>   "progress": 0,
>>>>>   "OS-EXT-STS:power_state": 1,
>>>>>   "OS-EXT-AZ:availability_zone": "az3",
>>>>>   "config_drive": "",
>>>>>   "status": "ACTIVE",
>>>>>   "updated": "2015-09-28T10:33:09Z",
>>>>>   "hostId": 
>>>>> "b205fbea7ee98ef482712db93325a1d7d44d7694a8ec9fce7df038c3",
>>>>>   "OS-EXT-SRV-ATTR:host": "hostname",
>>>>>   "OS-SRV-USG:terminated_at": null,
>>>>>   "key_name": "ceph_test_key",
>>>>>   "properties": "",
>>>>>   "project_id": "1d0137fe585742bdbe13e2b16daab6ff",
>>>>>   "OS-EXT-SRV-ATTR:hypervisor_hostname": "hostname",
>>>>>   "name": "teuthology",
>>>>>   "created": "2015-09-28T10:32:47Z",
>>>>>   "os-extended-volumes:volumes_attached": []
>>>>> }
>>>>> Traceback (most recent call last):
>>>>>   File "/opt/teuthology/virtualenv/bin/teuthology-openstack", line 9,
>>>>> in
>>>>> 
>>>>> load_entry_point('teuthology==0.1.0', 'console_scripts',
>>>>> 'teuthology-openstack')()
>>>>>   File "/opt/teuthology/scripts/openstack.py", line 8, in main
>>>>> teuthology.openstack.main(parse_args(argv), argv)
>>>>>   File "/opt/teuthology/teuthology/openstack/__init__.py", line 622,
>>>>> i

Re: [ceph-users] [puppet] Moving puppet-ceph to the Openstack big tent

2015-09-29 Thread Loic Dachary
Good move :-)

On 29/09/2015 23:45, Andrew Woodward wrote:
> [I'm cross posting this to the other Ceph threads to ensure that it's seen]
> 
> We've discussed this on Monday on IRC and again in the puppet-openstack IRC 
> meeting. The current consensus is that we will move from the deprecated 
> stackforge organization and will be moved to the openstack one. At this time 
> we will not be pursuing membership as a formal OpenStack project. This will 
> allow puppet-ceph to retain the tight relationship with the OpenStack community 
> and tools for the time being. 
> 
> On Mon, Sep 28, 2015 at 8:32 AM David Moreau Simard wrote:
> 
> Hi,
> 
> puppet-ceph currently lives in stackforge [1] which is being retired
> [2]. puppet-ceph is also mirrored on the Ceph Github organization [3].
> This version of the puppet-ceph module was created from scratch and
> not as a fork of the (then) upstream puppet-ceph by Enovance [4].
> Today, the version by Enovance is no longer officially maintained
> since Red Hat has adopted the new release.
> 
> Being an Openstack project under Stackforge or Openstack brings a lot
> of benefits but it's not black and white, there are cons too.
> 
> It provides us with the tools, the processes and the frameworks to
> review and test each contribution to ensure we ship a module that is
> stable and is held to the highest standards.
> But it also means that:
> - We forego some level of ownership back to the Openstack foundation,
> it's technical committee and the Puppet Openstack PTL.
> - puppet-ceph contributors will also be required to sign the
> Contributors License Agreement and jump through the Gerrit hoops [5]
> which can make contributing to the project harder.
> 
> We have put tremendous efforts into creating a quality module and as
> such it was the first puppet module in the stackforge organization to
> implement not only unit tests but also integration tests with third
> party CI.
> Integration testing for other puppet modules are just now starting to
> take shape by using the Openstack CI inrastructure.
> 
> In the context of Openstack, RDO already ships with a mean to install
> Ceph with this very module and Fuel will be adopting it soon as well.
> This means the module will benefit from real world experience and
> improvements by the Openstack community and packagers.
> This will help further reinforce that not only Ceph is the best
> unified storage solution for Openstack but that we have means to
> deploy it in the real world easily.
> 
> We all know that Ceph is also deployed outside of this context and
> this is why the core reviewers make sure that contributions remain
> generic and usable outside of this use case.
> 
> Today, the core members of the project discussed whether or not we
> should move puppet-ceph to the Openstack big tent and we had a
> consensus approving the move.
> We would also like to hear the thoughts of the community on this topic.
> 
> Please let us know what you think.
> 
> Thanks,
> 
> [1]: https://github.com/stackforge/puppet-ceph
> [2]: https://review.openstack.org/#/c/192016/
> [3]: https://github.com/ceph/puppet-ceph
> [4]: https://github.com/redhat-cip/puppet-ceph
> [5]: https://wiki.openstack.org/wiki/How_To_Contribute
> 
> David Moreau Simard
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org 
> 
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> -- 
> 
> --
> 
> Andrew Woodward
> 
> Mirantis
> 
> Fuel Community Ambassador
> 
> Ceph Community
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-us...@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: branches! infernalis vs master, RIP next

2015-09-29 Thread Loic Dachary


On 29/09/2015 23:12, Sage Weil wrote:
> Having master and infernalis branches follow each other is preventing some 
> folks from moving forward with post-infernalis work.  And it's confusing.  
> So, we're proposing a new world order... and this time it should keep 
> things consistent both during regular development and stabilization 
> periods:
> 
>  master - where new feature work is merged
>  infernalis ($next_stable) - where bug fixes are merged
>  hammer, firefly ($previous_stable) - where bug fixes are merged
> 
> Previously we had a 'next' branch that was whatever the next pending 
> release would be.  This will henceforth be named after the next major 
> release (currently infernalis).  This let's us avoid having next sometimes 
> during regular development periods and not during stabilization periods.  
> It also means that we needn't know ahead of time whether the next release 
> is going to be a development release, rc, or the initial (x.2.0) release 
> of the next stable series.
> 
> That $next_stable branch (currently infernalis) will periodically get 
> merged back into master, just like next did.  We will stop doing that when 
> the release is made.  From that point forward, any stable series backports 
> will be cherry-picked.
> 
> The only real difference then between the development and stabilization 
> periods is that during development we pull in new stuff from master each 
> time we release, and releases are x.0.z.  During stabilization, we do rc 
> releases that look like x.1.z, and we don't merge in anything new from 
> master.
> 
> So:
> 
>  1- Target any pull request with a bug fix that should go into infernalis 
> at the infernalis branch.
> 
>  2- Before merging anything, make sure it is targetting the right branch.  
> If it isn't, merge it manually, and verify it isn't pulling in a bunch of 
> extra stuff (e.g., because a bug fix for infernalis was based on the 
> current master branch).
> 
>  3- We will periodically merge infernalis back into master until the first 
> infernalis (9.2.0) release.  After that we'll cherry-pick -x.
> 
>  4- At that point we'll create a jewel branch that will work the same way.  
> (It will be regularly merged into master, will eventually result in 10.0.z 
> releases, and will pull lumps of new stuff from master in each time that 
> happens.)

When 10.0.0 is released, the jewel branch is reset to master and will 
eventually be 10.0.1 etc. Is that right ?

> 
> Sound okay?
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: [Hammer Backports] Should rest-bench be removed on hammer ?

2015-09-28 Thread Loic Dachary
Hi,

On 28/09/2015 12:19, Abhishek Varshney wrote:
> Hi,
> 
> The rest-bench tool has been removed in master through PR #5428
> (https://github.com/ceph/ceph/pull/5428). The backport PR #5812
> (https://github.com/ceph/ceph/pull/5812) is currently causing failures
> on the hammer-backports integration branch. These failures can be
> resolved by either backporting PR #5428 or by adding a hammer-specific
> commit to PR #5812.
> 
> How should we proceed here?

It looks like rest-bench support was removed because cosbench can replace it. 
The string cosbench or rest.bench does not show up in ceph-qa-suite or in the 
ceph master or hammer branches, which probably means tests using rest-bench are 
outside the scope of the ceph project. Removing rest-bench from hammer by 
backporting https://github.com/ceph/ceph/pull/5428 seems sensible.

Cheers

> 
> Thanks
> Abhishek
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: Teuthology Integration to native openstack

2015-09-28 Thread Loic Dachary
Hi,

On 28/09/2015 07:24, Bharath Krishna wrote:
> Hi Dachary,
> 
> Thanks for the reply. I am following your blog http://dachary.org/?p=3767
> And the README in 
> https://github.com/dachary/teuthology/tree/wip-6502-openstack-v2/#openstack
> -backend

The up-to-date instructions are at 
https://github.com/dachary/teuthology/tree/openstack/#openstack-backend (the 
link you used comes from http://dachary.org/?p=3828 and I just updated it so 
no one else will be confused).
> 
> I have sourced the openrc file of my Openstack deployment and verified
> that clients are working fine. My Openstack deployment has Cinder
> integrated with CEPH backend.
> 
> I have cloned and installed teuthology using the below steps:
> 
> $ git clone -b wip-6502-openstack-v2 http://github.com/dachary/teuthology
> $ cd teuthology ; ./bootstrap install
> $ source virtualenv/bin/activate
> 
> 
> Then I tried to run a dummy suite as test and I ran into following error:
> 
> Traceback (most recent call last):
>   File "/root/teuthology/virtualenv/bin/teuthology-openstack", line 9, in
> 
> load_entry_point('teuthology==0.1.0', 'console_scripts',
> 'teuthology-openstack')()
>   File "/root/teuthology/scripts/openstack.py", line 8, in main
> teuthology.openstack.main(parse_args(argv), argv)
>   File "/root/teuthology/teuthology/openstack.py", line 375, in main
> return TeuthologyOpenStack(ctx, teuth_config, argv).main()
>   File "/root/teuthology/teuthology/openstack.py", line 181, in main
> self.verify_openstack()
>   File "/root/teuthology/teuthology/openstack.py", line 270, in
> verify_openstack
> str(providers))
> Exception: ('OS_AUTH_URL=http://:5000/v2.0', " does is not a
> known OpenStack provider (('cloud.ovh.net', 'ovh'), ('control.os1.phx2',
> 'redhat'), ('entercloudsuite.com', 'entercloudsuite'))")

This limitation was in an earlier implementation and should not be a problem 
now.

Cheers

> 
> 
> Thank you.
> 
> Regards,
> M Bharath Krishna
> 
> On 9/28/15, 1:47 AM, "Loic Dachary" <l...@dachary.org> wrote:
> 
>> [moving to ceph-devel]
>>
>> Hi,
>>
>> On 27/09/2015 21:20, Bharath Krishna wrote:
>>> Hi,
>>>
>>> We have an openstack deployment in place with CEPH as CINDER backend.
>>>
>>> We would like to perform functional testing for CEPH and found
>>> teuthology as recommended option.
>>>
>>> Have successfully installed teuthology. Now to integrate it with
>>> Openstack, I could see that the possible providers could be either OVH,
>>> REDHAT or ENTERCLOUDSITE.
>>>
>>> Is there any option where in we can source openstack deployment of our
>>> own and test CEPH using teuthology?
>>
>> The documentation mentions these providers because they have been tested.
>> But there should be no blocker to run teuthology against a regular
>> OpenStack provider. Should you run into troubles, please let me know and
>> I'll help.
>>
>> Cheers
>>
>>>
>>> If NO, please suggest on how to test CEPH in such scenarios?
>>>
>>> Please help.
>>>
>>> Thank you.
>>> Bharath Krishna
>>> ___
>>> ceph-users mailing list
>>> ceph-us...@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>> -- 
>> Loïc Dachary, Artisan Logiciel Libre
>>
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: Teuthology Integration to native openstack

2015-09-27 Thread Loic Dachary
[moving to ceph-devel]

Hi,

On 27/09/2015 21:20, Bharath Krishna wrote:
> Hi,
> 
> We have an openstack deployment in place with CEPH as CINDER backend.
> 
> We would like to perform functional testing for CEPH and found teuthology as 
> recommended option.
> 
> Have successfully installed teuthology. Now to integrate it with Openstack, I 
> could see that the possible providers could be either OVH, REDHAT or 
> ENTERCLOUDSITE.
> 
> Is there any option where in we can source openstack deployment of our own 
> and test CEPH using teuthology?

The documentation mentions these providers because they have been tested, but 
there should be no blocker to running teuthology against a regular OpenStack 
provider. Should you run into trouble, please let me know and I'll help.

Cheers

> 
> If NO, please suggest on how to test CEPH in such scenarios?
> 
> Please help.
> 
> Thank you.
> Bharath Krishna
> ___
> ceph-users mailing list
> ceph-us...@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: Firefly help

2015-09-26 Thread Loic Dachary


On 26/09/2015 22:14, Nathan Cutler wrote:
> Hi Loic:
> 
> For some reason I cannot reach teuthology.front.sepia.ceph.com - I turn
> on the VPN but the machine does not respond to pings or ssh. (Nor does
> the gateway machine, for that matter.)
> 
> If you feel inclined to help, could you start the standard round of
> integration tests on firefly-backports?

Hi,

They are scheduled starting http://tracker.ceph.com/issues/11644#note-101. I 
find this way more convenient as each comment can be edited individually (as 
opposed to a large description). And they are sorted chronologically :-)

Cheers

> 
> Thanks,
> Nathan
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: failed to open http://apt-mirror.front.sepia.ceph.com

2015-09-23 Thread Loic Dachary
Hi,

On 23/09/2015 12:29, wangsongbo wrote:
> 64.90.32.37 apt-mirror.front.sepia.ceph.com

It works for me. Could you send a traceroute apt-mirror.front.sepia.ceph.com ?

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: failed to open http://apt-mirror.front.sepia.ceph.com

2015-09-23 Thread Loic Dachary


On 23/09/2015 15:11, Sage Weil wrote:
> On Wed, 23 Sep 2015, Loic Dachary wrote:
>> Hi,
>>
>> On 23/09/2015 12:29, wangsongbo wrote:
>>> 64.90.32.37 apt-mirror.front.sepia.ceph.com
>>
>> It works for me. Could you send a traceroute 
>> apt-mirror.front.sepia.ceph.com ?
> 
> This is a private IP internal to the sepia lab.  Anythign outside the lab 
> shouldn't be using it...

This is the public-facing IP and is required for teuthology to run outside of 
the lab (http://tracker.ceph.com/issues/12212).

64.90.32.37 apt-mirror.front.sepia.ceph.com

suggests the workaround was used. A traceroute will confirm whether the 
resolution happens as expected (with the public IP) or with a private IP 
(meaning the workaround is not in place where it should be).
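
If traceroute is not handy, the same check can be done from the node itself; a 
small sketch (the expected IP is the public-facing one above, the rest is an 
assumption):

    # Sketch: verify what the name resolves to (this also honours /etc/hosts).
    import socket

    expected = '64.90.32.37'
    resolved = socket.gethostbyname('apt-mirror.front.sepia.ceph.com')
    print(resolved, 'workaround in place' if resolved == expected
          else 'resolving elsewhere, check /etc/hosts or the DNS')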

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: failed to open http://apt-mirror.front.sepia.ceph.com

2015-09-23 Thread Loic Dachary


On 23/09/2015 18:50, wangsongbo wrote:
> Sage and Loic,
> Thanks for your reply.
> I am running teuthology in our testing.I can send a traceroute to 64.90.32.37.
> but when ceph-cm-ansible run the " yum-complete-transaction --cleanup-only" 
> command,
> it got such a response 
> :"http://apt-mirror.front.sepia.ceph.com/misc-rpms/repodata/repomd.xml: 
> [Errno 14] PYCURL ERROR 7 - "Failed connect to 
> apt-mirror.front.sepia.ceph.com:80; Connection timed out"
> And I replace "apt-mirror.front.sepia.ceph.com"  to "64.90.32.37" in repo 
> file, then run "yum-complete-transaction --cleanup-only" command,
> I got a response like this:"http://64.90.32.37/misc-rpms/repodata/repomd.xml: 
> [Errno 14] PYCURL ERROR 22 - "The requested URL returned error: 502 Bad 
> Gateway""
> I do not know whether it was affected by the last week's attack.

Querying the IP directly won't get you where the mirror is (it's a vhost). I 
think ansible fails because it queries the DNS and does not use the entry you 
set in the /etc/hosts file. The OpenStack teuthology backend sets a specific 
entry in the DNS to work around the problem (see 
https://github.com/ceph/teuthology/blob/master/teuthology/openstack/setup-openstack.sh#L318)
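
To illustrate the vhost point, a small sketch (the status codes are what I would 
expect given the 502 above, not something I verified; host, IP and path are taken 
from this thread):

    # Sketch: the mirror is a name-based vhost, so the Host header decides
    # which site answers; hitting the bare IP gets the default site instead.
    import http.client

    for host in ('apt-mirror.front.sepia.ceph.com', '64.90.32.37'):
        conn = http.client.HTTPConnection('64.90.32.37', 80, timeout=10)
        conn.request('GET', '/misc-rpms/repodata/repomd.xml',
                     headers={'Host': host})
        print(host, conn.getresponse().status)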

Cheers

> 
> Thanks and Regards,
> WangSongbo
> 
> On 15/9/23 下午11:22, Loic Dachary wrote:
>>
>> On 23/09/2015 15:11, Sage Weil wrote:
>>> On Wed, 23 Sep 2015, Loic Dachary wrote:
>>>> Hi,
>>>>
>>>> On 23/09/2015 12:29, wangsongbo wrote:
>>>>> 64.90.32.37 apt-mirror.front.sepia.ceph.com
>>>> It works for me. Could you send a traceroute
>>>> apt-mirror.front.sepia.ceph.com ?
>>> This is a private IP internal to the sepia lab.  Anythign outside the lab
>>> shouldn't be using it...
>> This is the public facing IP and is required for teuthology to run outside 
>> of the lab (http://tracker.ceph.com/issues/12212).
>>
>> 64.90.32.37 apt-mirror.front.sepia.ceph.com
>>
>> suggests the workaround was used. And a traceroute will confirm if the 
>> resolution happens as expected (with the public IP) or with a private IP 
>> (meaning the workaround is not in place where it should).
>>
>> Cheers
>>
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: partprobe or partx or ... ?

2015-09-21 Thread Loic Dachary
Hi Ilya,

On 21/09/2015 12:23, Ilya Dryomov wrote:
> On Sat, Sep 19, 2015 at 11:08 PM, Loic Dachary <l...@dachary.org> wrote:
>>
>>
>> On 19/09/2015 17:23, Loic Dachary wrote:
>>> Hi Ilya,
>>>
>>> At present ceph-disk uses partprobe to ensure the kernel is aware of the 
>>> latest partition changes after a new one is created, or after zapping the 
>>> partition table. Although it works reliably (in the sense that the kernel 
>>> is indeed aware of the desired partition layout), it goes as far as to 
>>> remove all partition devices of the current kernel table, only to re-add 
>>> them with the new partition table. The delay it implies is not an issue 
>>> because ceph-disk is rarely called. It however generate many udev events 
>>> (dozens remove/change/add for a two partition disk) and almost always 
>>> creates border cases that are difficult to figure out and debug. While it 
>>> is a good way to ensure that ceph-disk is idempotent and immune to race 
>>> conditions, maybe it is needlessly hard.
>>>
>>> Do you know of a light weight alternative to partprobe ? In the past we've 
>>> used partx but I remember it failed to address some border cases in 
>>> non-intuitive ways. Do you know of another, simpler, approach to this ?
>>>
>>> Thanks in advance for your help :-)
>>>
>>
>> For the record using /sys/block/sdX/device/rescan sounds good but does not 
>> exist for devices created via devicemapper (used for dmcrypt and multipath).
> 
> Hi Loic,
> 
> Yeah, partprobe loops through the entire partition table, trying do
> delete/add every slot.  As an aside, the in-kernel way to do this
> (blockdev --rereadpt) is similar in that it also drops all partitions
> and re-adds them later, but it's faster and probably generates less
> change events.  The downside is it won't work on busy device.
> 
> I don't think there is any alternative, except for using partx --add
> with --nr, that is targeting specific slots in the partition table.  If
> all you are doing is adding partitions and zapping entire partition
> tables, that may work well enough.
> 
> That said, given that the resulting delay (which can be in the seconds
> range, especially if your disk happens to have a busy partition) isn't
> a problem, what difference does it make?  What are you listening to
> those events for?

This is part of the ceph-disk prepare / activate workflow:

 ceph-disk prepare creates partitions, mounts them, populates them and exits
 ceph udev rules ( 95-ceph-osd.rules ) react to udev events when the partition 
type is known and run ceph-disk activate in the background

When a machine boots or a disk is hot swapped, udev rules do the same and 
activate: we only have one code path for all cases. The problem is to ensure 
all race conditions are addressed. What used to work in hammer has to be 
revisited because the code path was changed in infernalis. udev actions no 
longer call ceph-disk activate, because it can take a long time and that's not 
what udev is good at. Instead, udev actions run ceph-disk activate in the 
background, using systemd/upstart when available (it falls back to the legacy 
synchronous behavior when they are not available).

I think I managed to address all race conditions with the patch series at 
https://github.com/ceph/ceph/pull/5999.

We should be good with partprobe :-)

> 
> /sys/block/sdX/device/rescan is sd only, and AFAIK it doesn't generally
> trigger a re-read of a partition table.

Thanks a lot for your insights !

Cheers

> 
> Thanks,
> 
> Ilya
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: partprobe or partx or ... ?

2015-09-19 Thread Loic Dachary


On 19/09/2015 17:23, Loic Dachary wrote:
> Hi Ilya,
> 
> At present ceph-disk uses partprobe to ensure the kernel is aware of the 
> latest partition changes after a new one is created, or after zapping the 
> partition table. Although it works reliably (in the sense that the kernel is 
> indeed aware of the desired partition layout), it goes as far as to remove 
> all partition devices of the current kernel table, only to re-add them with 
> the new partition table. The delay it implies is not an issue because 
> ceph-disk is rarely called. It however generate many udev events (dozens 
> remove/change/add for a two partition disk) and almost always creates border 
> cases that are difficult to figure out and debug. While it is a good way to 
> ensure that ceph-disk is idempotent and immune to race conditions, maybe it 
> is needlessly hard.
> 
> Do you know of a light weight alternative to partprobe ? In the past we've 
> used partx but I remember it failed to address some border cases in 
> non-intuitive ways. Do you know of another, simpler, approach to this ?
> 
> Thanks in advance for your help :-)
> 

For the record using /sys/block/sdX/device/rescan sounds good but does not 
exist for devices created via devicemapper (used for dmcrypt and multipath).

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


partprobe or partx or ... ?

2015-09-19 Thread Loic Dachary
Hi Ilya,

At present ceph-disk uses partprobe to ensure the kernel is aware of the latest 
partition changes after a new one is created, or after zapping the partition 
table. Although it works reliably (in the sense that the kernel is indeed aware 
of the desired partition layout), it goes as far as to remove all partition 
devices of the current kernel table, only to re-add them with the new partition 
table. The delay it implies is not an issue because ceph-disk is rarely called. 
It does, however, generate many udev events (dozens of remove/change/add events 
for a two-partition disk) and almost always creates border cases that are 
difficult to figure out and debug. While this is a good way to ensure that 
ceph-disk is idempotent and immune to race conditions, maybe it is needlessly hard.

Do you know of a lightweight alternative to partprobe ? In the past we've used 
partx but I remember it failed to address some border cases in non-intuitive 
ways. Do you know of another, simpler, approach to this ?

Thanks in advance for your help :-)

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: What should be in the next hammer/firefly release ?

2015-09-18 Thread Loic Dachary
Hi Robert,

http://tracker.ceph.com/issues/10399 was backported to hammer and will be in 
v0.94.4 (see http://tracker.ceph.com/issues/12751 for details).

Cheers

On 18/09/2015 02:44, Robert LeBlanc wrote:
> Can we get http://tracker.ceph.com/issues/10399 into hammer. We hit this 
> today.
> 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> 
> 
> On Thu, Sep 10, 2015 at 4:57 AM, Miyamae, Takeshi  wrote:
>> Hi Loic,
> 
>> As the last pull request #5257 was committed into the main trunk,
>> we believe all the SHEC codes in main trunk are ready to be backported into 
>> Hammer branch.
>> What can we do at this moment?
> 
>> Best regards,
>> Takeshi Miyamae
> 
>> -Original Message-
>> From: Miyamae, Takeshi/宮前 剛
>> Sent: Thursday, September 3, 2015 4:09 PM
>> To: 'ceph-devel-ow...@vger.kernel.org'
>> Cc: Paul Von-Stamwitz (pvonstamw...@us.fujitsu.com); Toshine, Naoyoshi/利根 
>> 直佳; Shiozawa, Kensuke/塩沢 賢輔; Nakao, Takanori/中尾 鷹詔 
>> (nakao.takan...@jp.fujitsu.com)
>> Subject: Re: What should be in the next hammer/firefly release ?
> 
>> Dear Loic,
> 
>> We would like to let following two patches be backported to hammer v0.94.4.
>> (And our wish is finally backporting these patches to RHCS v1.3.) Can it be 
>> possible? If possible, please let us know what should be started at first.
>> (Caution: #5257 has not been committed to master branch yet.)
> 
>> erasure-code: shec plugin feature #5493
>> https://github.com/ceph/ceph/pull/5493
> 
>> erasure code: shec performance optimization by decoding cache #5257
>> https://github.com/ceph/ceph/pull/5257
> 
>> Best regards,
>> Takeshi Miyamae
> 
>> -Original Message-
>> From: Loic Dachary  dachary.org>
>> Subject: What should be in the next hammer/firefly release ?
>> Newsgroups: gmane.comp.file-systems.ceph.devel
>> Date: 2015-09-02 11:00:53 GMT (15 hours and 32 minutes ago) Hi,
> 
>> I added a link to
> 
>> http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO#Overview-of-the-backports-in-progress
> 
>> to show all issues that should be in the next point release for
> 
>> hammer v0.94.4 : http://tracker.ceph.com/versions/495
>> firefly v0.80.11 : http://tracker.ceph.com/versions/480
> 
>> Cheers
> 
>> --
>> Loïc Dachary, Artisan Logiciel Libre
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


install-deps.sh failures on Ubuntu 14.04

2015-09-17 Thread Loic Dachary
Hi,

Because a bug appeared in the last 24 hours when running pip wheel coverage with 
python3[1], the install-deps.sh[2] script now fails on Ubuntu 14.04. This 
causes failures on http://ceph.com/gitbuilder.cgi as well as on the make check 
bot that runs on pull requests.

A workaround is prepared at http://tracker.ceph.com/issues/13136

Cheers

[1] pip wheel coverage fails with python3 
https://bugs.launchpad.net/ubuntu/+source/python-coverage/+bug/1496715
[2] install-deps.sh https://github.com/ceph/ceph/blob/master/install-deps.sh


-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Backporting from Infernalis and c++11

2015-09-15 Thread Loic Dachary
Hi Ceph,

With Infernalis, Ceph moves to C++11 (and CMake), so we will see more conflicts 
when backporting bug fixes to Hammer. Any ideas you may have on how to better 
deal with this would be most welcome. Since these conflicts will be mostly 
cosmetic, they should not be too difficult to resolve. The trick will be for 
someone not familiar with the codebase to separate what is cosmetic from what 
is not.

This is not happening yet, so there is no immediate concern :-) Maybe if we 
think about it well in advance we'll be in a better position to deal with it 
later on ?

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: Backporting from Infernalis and c++11

2015-09-15 Thread Loic Dachary
Hi John,

On 15/09/2015 12:02, John Spray wrote:
> On Tue, Sep 15, 2015 at 8:21 AM, Loic Dachary <l...@dachary.org> wrote:
>> With Infernalis Ceph move to c++11 (and CMake), we will see more conflicts 
>> when backporting bug fixes to Hammer. Any ideas you may have to better deal 
>> with this would be most welcome. Since these conflicts will be mostly 
>> cosmetic, they should not be too difficult to resolve. The trick will be for 
>> someone not familiar with the codebase to separate what is cosmetic and what 
>> is not.
>>
>> This does not happen yet, no immediate concern :-) Maybe if we think about 
>> that well in advance we'll be in a better position to deal with it later on ?
> 
> I think this came up in conversation but wasn't necessarily made
> official policy yet -- my understanding is that we are (already)
> endeavouring to avoid c++11isms in bug fixes, along with the usual
> principle of fixing bugs in the smallest/neatest patch we can.
> 
> Perhaps in cases where those of us working on master mistakenly put
> something un-backportable in a bug fix, it would be reasonable for the
> backporter to point it out and poke us for a clean version of the
> patch.

We'll do our best but it's very reassuring to know we can rely on you if we 
struggle with c++11isms :-)

Thanks !

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: [Hammer Backports] Status of https://github.com/ceph/ceph/pull/5888

2015-09-15 Thread Loic Dachary


On 15/09/2015 12:36, Abhishek Varshney wrote:
> Hi Loic/Abhishek,
> 
> Here are a few backport PRs which I created and some of them have 
> run-make-check failures. I have tried to triage them and this is the summary 
> of what I could make of. It would be nice if you can have a look at them and 
> help me in how to proceed ahead with them.
> 
>   * https://github.com/ceph/ceph/pull/5888 run-make-check status FAILURE
>   o It may require commit 90e5f410ee35a614e1d49771226064d012d8b85e to be 
> in hammer, as per my understanding.
> 

This is probably non trivial: you could ask for help with a pull request 
comment like:

@XinzeChi this commit does not apply cleanly, would you mind taking a look ?

http://jenkins.ceph.dachary.org/job/ceph/LABELS=centos-7&_64/7703/console

osd/ReplicatedPG.cc:11761:76: error: expected '}' at end of input
 void intrusive_ptr_release(ReplicatedPG::RepGather *repop) { repop->put(); }
^
osd/ReplicatedPG.cc:11761:76: error: expected '}' at end of input


-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: [Hammer Backports] Status of a few of run-make-check Failed PRs

2015-09-15 Thread Loic Dachary

> PS: I would be on vacation from today and would be back to work on Monday 
> (21st).

Enjoy !

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: [Hammer Backports] Status of https://github.com/ceph/ceph/pull/5812

2015-09-15 Thread Loic Dachary


On 15/09/2015 12:36, Abhishek Varshney wrote:

>   * https://github.com/ceph/ceph/pull/5812 run-make-check-status SUCCESS.
>   o This is the PR with changes to obj_bencher.{cc,h}, and I guess we are 
> waiting for Sage's approval to backport all commits from 9bcf5f0 to 069d95e 
> along with it.

Correct :-)

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


shec tests failure on i386

2015-09-14 Thread Loic Dachary
Hi Takeshi,

http://tracker.ceph.com/issues/12936 shows that shec tests fail on i386. Before 
I investigate further, I'd like to know if you had a chance to run more tests ? 
There are not many users of i386 these days, of course ;-) But this is a good 
test to verify there are no architecture-related problems.

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


Re: reducing package size / compression time

2015-09-14 Thread Loic Dachary
Thanks for the link. Was that effective ?

On 14/09/2015 03:40, Sage Weil wrote:
> On Mon, 14 Sep 2015, Loic Dachary wrote:
>> Hi Sage,
>>
>> You did something to reduce the size (hence the compression time) of the 
>> debug packages using 
>> https://fedoraproject.org/wiki/Features/DwarfCompressor. Would you be so 
>> kind as to remind me which commit does that ?
> 
> https://github.com/ceph/autobuild-ceph/commit/193864ec69edb4dbb0112bb3ea54e6d2f20b30dd
> 
> The suggestion came from Boris.
> 
> sage
> 

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


reducing the size of Ceph debug packages

2015-09-14 Thread Loic Dachary
Hi James,

https://fedoraproject.org/wiki/Features/DwarfCompressor is packaged for Ubuntu 
(http://packages.ubuntu.com/trusty/dwz) and could be used to reduce the size of 
the Ceph debug packages. This is how it's done for RPM: 
https://github.com/ceph/autobuild-ceph/commit/193864ec69edb4dbb0112bb3ea54e6d2f20b30dd
 but I could not find a page explaining how to do the same for Ubuntu. Do you 
happen to have more information on that topic ?

About half the time required to build Ceph packages is used to compress the 
debug packages (ceph-dbg, ceph-test-dbg) and it would be great if we could 
reduce that :-)

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


reducing package size / compression time

2015-09-13 Thread Loic Dachary
Hi Sage,

You did something to reduce the size (hence the compression time) of the debug 
packages using https://fedoraproject.org/wiki/Features/DwarfCompressor. Would 
you be so kind as to remind me which commit does that ?

Thanks in advance :-)

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature


make check bot failures (2 hours today)

2015-09-11 Thread Loic Dachary
Hi Ceph,

The make check bot failed a number of pull request verifications today. Each of 
them was flagged as a false negative (you should have received a short note if 
your pull request is concerned). The problem is now fixed[1] and all should be 
back to normal. If you want to schedule another run, you just need to rebase 
your pull request and re-push it; the bot will notice.

Sorry for the inconvenience and thanks for your patience :-)

P.S. I'm not sure what it was exactly, just that git fetch took too long to 
answer and failed. Resetting the git clone from which the bot works fixed the 
problem. It happened a few times in the past but had not shown up in the past 
six months or so.

-- 
Loïc Dachary, Artisan Logiciel Libre



signature.asc
Description: OpenPGP digital signature

