nightlies on firefly removed
As we are nearing EOL for the firefly release, all nightlies run on the firefly branch have been disabled. Let me know if this presents any problems.

Thx
YuriW
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
nightlies moved to ovh (openstack)
In preparation for the sepia lab move, the nightlies' schedules were moved to the ovh (openstack) lab. See details here: http://tracker.ceph.com/projects/ceph-releases/wiki/ovh

Please let me know if you see any problems.

PS: I will optimize times/frequencies over the next several days/weeks.

Thx
YuriW
Re: v0.80.11 QE validation status
This release's QE validation took longer due to the additional fixing/testing of #11104 and the related issues discovered along the way, #13794 and #13622.

We agreed to release v0.80.11 based on the test results.

Thx
YuriW

On Wed, Oct 28, 2015 at 9:04 AM, Yuri Weinstein <ywein...@redhat.com> wrote:
> Summary of suites executed for this release can be found in
> http://tracker.ceph.com/issues/11644
>
> rados - 1/7th passed
> rbd - http://tracker.ceph.com/issues/11104
> rgw - http://tracker.ceph.com/issues/11104
> fs - http://tracker.ceph.com/issues/11104, http://tracker.ceph.com/issues/13630
> krbd - http://tracker.ceph.com/issues/13631
> kcephfs - http://tracker.ceph.com/issues/13631, http://tracker.ceph.com/issues/13630
> samba - http://tracker.ceph.com/issues/6613, same as in v0.80.10, which was approved for release
> ceph-deploy(ubuntu_) - almost passed, 1 job is still running
> ceph-deploy(distros) - http://tracker.ceph.com/issues/13367
> upgrade/dumpling-x (to firefly)(distros) - passed
> upgrade/firefly(distros) - passed
> upgrades to giant - deprecated
> upgrade/firefly-x (to hammer)(distros) - http://tracker.ceph.com/issues/11104, http://tracker.ceph.com/issues/13632
> powercycle - http://tracker.ceph.com/issues/11104, http://tracker.ceph.com/issues/13631
>
> All found problems seem unrelated to the product; however, they
> prevented some tests from running. In particular, #11104 is widespread
> and has to be fixed (see also http://tracker.ceph.com/issues/13622 for a
> proposed workaround).
>
> I suggest rerunning the failed tests after addressing the issues above.
>
> Thx
> YuriW
Re: v0.80.11 QE validation status
Loic, I am not actually sure about resolving #11104. Warren?

Thx
YuriW

On Mon, Nov 16, 2015 at 1:04 PM, Loic Dachary <ldach...@redhat.com> wrote:
> Hi Yuri,
>
> Thanks for the update :-) Should we mark #11104 as resolved?
>
> Cheers
>
> On 16/11/2015 19:45, Yuri Weinstein wrote:
>> This release's QE validation took longer due to the additional
>> fixing/testing of #11104 and the related issues discovered along the
>> way, #13794 and #13622.
>>
>> We agreed to release v0.80.11 based on the test results.
>>
>> Thx
>> YuriW
>>
>> On Wed, Oct 28, 2015 at 9:04 AM, Yuri Weinstein <ywein...@redhat.com> wrote:
>>> Summary of suites executed for this release can be found in
>>> http://tracker.ceph.com/issues/11644
>>>
>>> rados - 1/7th passed
>>> rbd - http://tracker.ceph.com/issues/11104
>>> rgw - http://tracker.ceph.com/issues/11104
>>> fs - http://tracker.ceph.com/issues/11104, http://tracker.ceph.com/issues/13630
>>> krbd - http://tracker.ceph.com/issues/13631
>>> kcephfs - http://tracker.ceph.com/issues/13631, http://tracker.ceph.com/issues/13630
>>> samba - http://tracker.ceph.com/issues/6613, same as in v0.80.10, which was approved for release
>>> ceph-deploy(ubuntu_) - almost passed, 1 job is still running
>>> ceph-deploy(distros) - http://tracker.ceph.com/issues/13367
>>> upgrade/dumpling-x (to firefly)(distros) - passed
>>> upgrade/firefly(distros) - passed
>>> upgrades to giant - deprecated
>>> upgrade/firefly-x (to hammer)(distros) - http://tracker.ceph.com/issues/11104, http://tracker.ceph.com/issues/13632
>>> powercycle - http://tracker.ceph.com/issues/11104, http://tracker.ceph.com/issues/13631
>>>
>>> All found problems seem unrelated to the product; however, they
>>> prevented some tests from running. In particular, #11104 is widespread
>>> and has to be fixed (see also http://tracker.ceph.com/issues/13622 for
>>> a proposed workaround).
>>>
>>> I suggest rerunning the failed tests after addressing the issues above.
>>>
>>> Thx
>>> YuriW
suites' runs on jewel added to the schedule
(rados suite/jewel - on hold for the time being to avoid queue overload)

Other suites have been added to the schedule: http://tracker.ceph.com/projects/ceph-releases/wiki/Sepia

Pls let me know if you see any problems or issues.

Thx
YuriW
giant suites removed from nightlies
As giant was declared EOL, all related suites have been removed from the schedule:

#giant EOL 15 18 * * 3,6 teuthology-suite -v -c giant -k distro -m vps -s upgrade/dumpling-firefly-x ~/vps.yaml
#giant EOL 18 18 * * 3,6 teuthology-suite -v -c giant -k distro -m vps -s upgrade/firefly-x ~/vps.yaml
#giant EOL 05 17 * * 1,5 teuthology-suite -v -c hammer -k distro -m vps -s upgrade/giant-x ~/vps.yaml

Thx
YuriW
v0.80.11 QE validation status
Summary of suites executed for this release can be found in http://tracker.ceph.com/issues/11644

rados - 1/7th passed
rbd - http://tracker.ceph.com/issues/11104
rgw - http://tracker.ceph.com/issues/11104
fs - http://tracker.ceph.com/issues/11104, http://tracker.ceph.com/issues/13630
krbd - http://tracker.ceph.com/issues/13631
kcephfs - http://tracker.ceph.com/issues/13631, http://tracker.ceph.com/issues/13630
samba - http://tracker.ceph.com/issues/6613, same as in v0.80.10, which was approved for release
ceph-deploy(ubuntu_) - almost passed, 1 job is still running
ceph-deploy(distros) - http://tracker.ceph.com/issues/13367
upgrade/dumpling-x (to firefly)(distros) - passed
upgrade/firefly(distros) - passed
upgrades to giant - deprecated
upgrade/firefly-x (to hammer)(distros) - http://tracker.ceph.com/issues/11104, http://tracker.ceph.com/issues/13632
powercycle - http://tracker.ceph.com/issues/11104, http://tracker.ceph.com/issues/13631

All found problems seem unrelated to the product; however, they prevented some tests from running. In particular, #11104 is widespread and has to be fixed (see also http://tracker.ceph.com/issues/13622 for a proposed workaround).

I suggest rerunning the failed tests after addressing the issues above.

Thx
YuriW
Re: timeout 120 teuthology-kill is highly recommended
I was thinking of teuthology-nuke though!

Thx
YuriW

- Original Message -
From: Yuri Weinstein ywein...@redhat.com
To: Loic Dachary l...@dachary.org
Cc: Ceph Development ceph-devel@vger.kernel.org
Sent: Tuesday, July 21, 2015 9:33:26 AM
Subject: Re: timeout 120 teuthology-kill is highly recommended

Loic, I don't use teuthology-kill simultaneously, only sequentially.

As far as run time goes, just as a note: when we use the 'stale' arg and it invokes the ipmitool interface, it does take a while to finish.

Thx
YuriW

- Original Message -
From: Loic Dachary l...@dachary.org
To: Ceph Development ceph-devel@vger.kernel.org
Sent: Tuesday, July 21, 2015 9:13:04 AM
Subject: timeout 120 teuthology-kill is highly recommended

Hi Ceph,

Today I did something wrong and that blocked the lab for a good half hour:

a) I ran two teuthology-kill simultaneously, and that made them deadlock each other
b) I let them run unattended, only to come back to the terminal 30 minutes later and see them stuck

Sure, two simultaneous teuthology-kill runs should not deadlock, and that needs to be fixed. But the easy workaround to avoid that trouble is to just not let it run forever. Even for ~200 jobs it takes at most a minute or two, and if it takes longer it probably means another teuthology-kill is competing and it should be interrupted and restarted later.

From now on I'll do "timeout 120 teuthology-kill || echo FAIL!" as a generic safeguard.

Apologies for the troubles.

--
Loïc Dachary, Artisan Logiciel Libre
Re: timeout 120 teuthology-kill is highly recommended
Loic, I don't use teuthology-kill simultaneously, only sequentially.

As far as run time goes, just as a note: when we use the 'stale' arg and it invokes the ipmitool interface, it does take a while to finish.

Thx
YuriW

- Original Message -
From: Loic Dachary l...@dachary.org
To: Ceph Development ceph-devel@vger.kernel.org
Sent: Tuesday, July 21, 2015 9:13:04 AM
Subject: timeout 120 teuthology-kill is highly recommended

Hi Ceph,

Today I did something wrong and that blocked the lab for a good half hour:

a) I ran two teuthology-kill simultaneously, and that made them deadlock each other
b) I let them run unattended, only to come back to the terminal 30 minutes later and see them stuck

Sure, two simultaneous teuthology-kill runs should not deadlock, and that needs to be fixed. But the easy workaround to avoid that trouble is to just not let it run forever. Even for ~200 jobs it takes at most a minute or two, and if it takes longer it probably means another teuthology-kill is competing and it should be interrupted and restarted later.

From now on I'll do "timeout 120 teuthology-kill || echo FAIL!" as a generic safeguard.

Apologies for the troubles.

--
Loïc Dachary, Artisan Logiciel Libre
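[Archive note] The safeguard described above generalizes to a tiny wrapper; this is a sketch in the spirit of the mail (the `run_bounded` name and FAIL message are illustrative, not from the thread, and `sleep` stands in for a `teuthology-kill` invocation):

```shell
#!/bin/sh
# Bound any potentially-hanging command with a timeout, as the mail
# suggests doing for teuthology-kill so a deadlock cannot block the lab.
# Usage: run_bounded TIMEOUT_SECONDS COMMAND [ARGS...]
run_bounded() {
    t=$1; shift
    # GNU coreutils timeout kills the command after $t seconds (exit 124)
    timeout "$t" "$@" || echo "FAIL: $* did not finish within ${t}s"
}

run_bounded 120 true       # finishes instantly, prints nothing
run_bounded 1 sleep 5      # killed after 1s, prints the FAIL line
```

In the mail the concrete form is `timeout 120 teuthology-kill || echo FAIL!`; the helper just parameterizes the same pattern so the timeout can be tuned per command.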
Re: VPS memory
I am all for it!

Thx
YuriW

- Original Message -
From: Sage Weil s...@newdream.net
To: Yuri Weinstein ywein...@redhat.com
Cc: se...@ceph.com, ceph-devel@vger.kernel.org, Loic Dachary ldach...@redhat.com, Xinxin Shu xinxin@intel.com, Alfredo Deza ad...@redhat.com
Sent: Wednesday, June 17, 2015 8:13:17 PM
Subject: VPS memory

On Wed, 17 Jun 2015, Yuri Weinstein wrote:
> - upgrade/dumpling-firefly-x (to hammer)(distros) - runs out of memory on vps and is unreliable

How about we:
- halve the number of vps's per node
- double the default ram per vps instance
- double the cpu?

It'll mean lower test throughput, but (hopefully) reliable test results. Given the amount of time we waste sifting through noisy results, that seems like a better path?

sage
firefly v0.80.10 QE validation completed
firefly v0.80.10 is ready for publishing (Sage, Alfredo, Loic, Xinxin FYI)

All results details were summarized in http://tracker.ceph.com/issues/11090

Notes:
- upgrade/dumpling-firefly-x (to hammer)(distros) - runs out of memory on vps and is unreliable
- #11957 fixed (thanks Ilya, Zack!) and tested

Thx
YuriW

- Original Message -
From: Yuri Weinstein ywein...@redhat.com
To: Ceph Development ceph-devel@vger.kernel.org
Cc: Loic Dachary ldach...@redhat.com, Xinxin Shu xinxin@intel.com
Sent: Monday, June 15, 2015 9:37:20 AM
Subject: firefly v0.80.10 QE validation status 6/15/2015

QE validation is almost completed (there are a couple of jobs that are still running).

All status details were summarized in http://tracker.ceph.com/issues/11090

Highlights (by suite/issue):
rados - #11914, needs Sam's approval
kcephfs - n/a, needs Greg's approval
samba - #6613, needs Greg's approval

Nice-to-have fixes (no blockers):
upgrade/firefly(distros) - #11957 (env noise)

Thx
YuriW
Re: v9.0.1 released
Sage, we are still running nightlies on next and branches. Just wanted to reaffirm that it is not yet time to start scheduling suites on infernalis?

Thx
YuriW

- Original Message -
From: Sage Weil sw...@redhat.com
To: ceph-annou...@ceph.com, ceph-devel@vger.kernel.org, ceph-us...@ceph.com, ceph-maintain...@ceph.com
Sent: Thursday, June 11, 2015 10:06:38 AM
Subject: v9.0.1 released

This development release is delayed a bit due to tooling changes in the build environment. As a result the next one (v9.0.2) will have a bit more work than is usual. Highlights here include lots of RGW Swift fixes, RBD feature work surrounding the new object map feature, more CephFS snapshot fixes, and a few important CRUSH fixes.

Notable Changes
---------------
* auth: cache/reuse crypto lib key objects, optimize msg signature check (Sage Weil)
* build: allow tcmalloc-minimal (Thorsten Behrens)
* build: do not build ceph-dencoder with tcmalloc (#10691 Boris Ranto)
* build: fix pg ref disabling (William A. Kennington III)
* build: install-deps.sh improvements (Loic Dachary)
* build: misc fixes (Boris Ranto, Ken Dreyer, Owen Synge)
* ceph-authtool: fix return code on error (Gerhard Muntingh)
* ceph-disk: fix zap sgdisk invocation (Owen Synge, Thorsten Behrens)
* ceph-disk: pass --cluster arg on prepare subcommand (Kefu Chai)
* ceph-fuse, libcephfs: drop inode when rmdir finishes (#11339 Yan, Zheng)
* ceph-fuse,libcephfs: fix uninline (#11356 Yan, Zheng)
* ceph-monstore-tool: fix store-copy (Huangjun)
* common: add perf counter descriptions (Alyona Kiseleva)
* common: fix throttle max change (Henry Chang)
* crush: fix crash from invalid 'take' argument (#11602 Shiva Rkreddy, Sage Weil)
* crush: fix divide-by-2 in straw2 (#11357 Yann Dupont, Sage Weil)
* deb: fix rest-bench-dbg and ceph-test-dbg dependencies (Ken Dreyer)
* doc: document region hostnames (Robin H. Johnson)
* doc: update release schedule docs (Loic Dachary)
* init-radosgw: run radosgw as root (#11453 Ken Dreyer)
* librados: fadvise flags per op (Jianpeng Ma)
* librbd: allow additional metadata to be stored with the image (Haomai Wang)
* librbd: better handling for dup flatten requests (#11370 Jason Dillaman)
* librbd: cancel in-flight ops on watch error (#11363 Jason Dillaman)
* librbd: default new images to format 2 (#11348 Jason Dillaman)
* librbd: fast diff implementation that leverages object map (Jason Dillaman)
* librbd: fix snapshot creation when other snap is active (#11475 Jason Dillaman)
* librbd: new diff_iterate2 API (Jason Dillaman)
* librbd: object map rebuild support (Jason Dillaman)
* logrotate.d: prefer service over invoke-rc.d (#11330 Win Hierman, Sage Weil)
* mds: avoid getting stuck in XLOCKDONE (#11254 Yan, Zheng)
* mds: fix integer truncation on large client ids (Henry Chang)
* mds: many snapshot and stray fixes (Yan, Zheng)
* mds: persist completed_requests reliably (#11048 John Spray)
* mds: separate safe_pos in Journaler (#10368 John Spray)
* mds: snapshot rename support (#3645 Yan, Zheng)
* mds: warn when clients fail to advance oldest_client_tid (#10657 Yan, Zheng)
* misc cleanups and fixes (Danny Al-Gaaf)
* mon: fix average utilization calc for 'osd df' (Mykola Golub)
* mon: fix variance calc in 'osd df' (Sage Weil)
* mon: improve callout to crushtool (Mykola Golub)
* mon: prevent bucket deletion when referenced by a crush rule (#11602 Sage Weil)
* mon: prime pg_temp when CRUSH map changes (Sage Weil)
* monclient: flush_log (John Spray)
* msgr: async: many many fixes (Haomai Wang)
* msgr: simple: fix clear_pipe (#11381 Haomai Wang)
* osd: add latency perf counters for tier operations (Xinze Chi)
* osd: avoid multiple hit set insertions (Zhiqiang Wang)
* osd: break PG removal into multiple iterations (#10198 Guang Yang)
* osd: check scrub state when handling map (Jianpeng Ma)
* osd: fix endless repair when object is unrecoverable (Jianpeng Ma, Kefu Chai)
* osd: fix pg resurrection (#11429 Samuel Just)
* osd: ignore non-existent osds in unfound calc (#10976 Mykola Golub)
* osd: increase default max open files (Owen Synge)
* osd: prepopulate needs_recovery_map when only one peer has missing (#9558 Guang Yang)
* osd: relax reply order on proxy read (#11211 Zhiqiang Wang)
* osd: skip promotion for flush/evict op (Zhiqiang Wang)
* osd: write journal header on clean shutdown (Xinze Chi)
* qa: run-make-check.sh script (Loic Dachary)
* rados bench: misc fixes (Dmitry Yatsushkevich)
* rados: fix error message on failed pool removal (Wido den Hollander)
* radosgw-admin: add 'bucket check' function to repair bucket index (Yehuda Sadeh)
* rbd: allow unmapping by spec (Ilya Dryomov)
* rbd: deprecate --new-format option (Jason Dillaman)
* rgw: do not set content-type if length is 0 (#11091 Orit Wasserman)
* rgw: don't use end_marker for namespaced object listing (#11437 Yehuda Sadeh)
* rgw: fail if parts not specified on multipart upload (#11435 Yehuda Sadeh)
* rgw: fix GET on swift account when
firefly v0.80.10 QE validation status 6/15/2015
QE validation is almost completed (there are a couple of jobs that are still running).

All status details were summarized in http://tracker.ceph.com/issues/11090

Highlights (by suite/issue):
rados - #11914, needs Sam's approval
kcephfs - n/a, needs Greg's approval
samba - #6613, needs Greg's approval

Nice-to-have fixes (no blockers):
upgrade/firefly(distros) - #11957 (env noise)

Thx
YuriW
Re: firefly branch for v0.80.10 ready for QE
Then it is still in the queue, and I will reschedule it to pick up the latest code.

Thx
YuriW

- Original Message -
From: Loic Dachary l...@dachary.org
To: Yuri Weinstein ywein...@redhat.com
Cc: Ceph Development ceph-devel@vger.kernel.org, Xinxin Shu xinxin@intel.com
Sent: Friday, June 5, 2015 10:26:20 AM
Subject: Re: firefly branch for v0.80.10 ready for QE

On 05/06/2015 16:54, Yuri Weinstein wrote:
> Loic
> Thx for the heads up. Would that touch only the rgw suite?

Yes.

> Thx
> YuriW

- Original Message -
From: Loic Dachary l...@dachary.org
To: Yuri Weinstein ywein...@redhat.com
Cc: Ceph Development ceph-devel@vger.kernel.org, Xinxin Shu xinxin@intel.com
Sent: Friday, June 5, 2015 2:11:44 AM
Subject: Re: firefly branch for v0.80.10 ready for QE

Hi Yuri,

A month has passed since this mail was sent, and the firefly branch has a few additional commits. All but one (https://github.com/ceph/ceph/pull/4829, with more information at http://tracker.ceph.com/issues/11890) have been tested. This exception seems harmless, but I thought you should know.

For the record, the head of the firefly branch is now https://github.com/ceph/ceph/commit/d0f9c5f47024f53b4eccea2e0fde9b7844746362 and http://tracker.ceph.com/issues/11090#Release-information has been updated accordingly.

Cheers

On 29/05/2015 18:18, Loic Dachary wrote:
> Hi Yuri,
>
> The firefly branch for v0.80.10 as found at https://github.com/ceph/ceph/commits/firefly has been approved by Greg, Yehuda, Josh and Sam and is ready for QE.
>
> For the record, the head is https://github.com/ceph/ceph/commit/071c94385ee71b86c5ed8363d56cf299da1aa7b3 and the details of the tests run are at http://tracker.ceph.com/issues/11090
>
> Cheers

--
Loïc Dachary, Artisan Logiciel Libre
Re: firefly branch for v0.80.10 ready for QE
Loic, thx for the heads up. Would that touch only the rgw suite?

Thx
YuriW

- Original Message -
From: Loic Dachary l...@dachary.org
To: Yuri Weinstein ywein...@redhat.com
Cc: Ceph Development ceph-devel@vger.kernel.org, Xinxin Shu xinxin@intel.com
Sent: Friday, June 5, 2015 2:11:44 AM
Subject: Re: firefly branch for v0.80.10 ready for QE

Hi Yuri,

A month has passed since this mail was sent, and the firefly branch has a few additional commits. All but one (https://github.com/ceph/ceph/pull/4829, with more information at http://tracker.ceph.com/issues/11890) have been tested. This exception seems harmless, but I thought you should know.

For the record, the head of the firefly branch is now https://github.com/ceph/ceph/commit/d0f9c5f47024f53b4eccea2e0fde9b7844746362 and http://tracker.ceph.com/issues/11090#Release-information has been updated accordingly.

Cheers

On 29/05/2015 18:18, Loic Dachary wrote:
> Hi Yuri,
>
> The firefly branch for v0.80.10 as found at https://github.com/ceph/ceph/commits/firefly has been approved by Greg, Yehuda, Josh and Sam and is ready for QE.
>
> For the record, the head is https://github.com/ceph/ceph/commit/071c94385ee71b86c5ed8363d56cf299da1aa7b3 and the details of the tests run are at http://tracker.ceph.com/issues/11090
>
> Cheers

--
Loïc Dachary, Artisan Logiciel Libre
Re: hammer branch for v0.94.2 ready for QE
QE validation is complete and this release is ready for publishing.

(Greg, I assumed that you approved this release with the failures in the knfs suite, re: unresolved #11789, marked for v0.94.3 backport.)

Summary of all tests performed for this release and notes can be found in http://tracker.ceph.com/issues/11492

Thx
YuriW

- Original Message -
From: Loic Dachary l...@dachary.org
To: Yuri Weinstein ywein...@redhat.com
Cc: Ceph Development ceph-devel@vger.kernel.org, Abhishek L abhishek.lekshma...@gmail.com
Sent: Monday, May 18, 2015 5:42:12 AM
Subject: hammer branch for v0.94.2 ready for QE

Hi Yuri,

The hammer branch for v0.94.2 as found at https://github.com/ceph/ceph/commits/hammer has been approved by Greg, Yehuda, Josh and Sam and is ready for QE.

For the record, the head is https://github.com/ceph/ceph/commit/63832d4039889b6b704b88b86eaba4aadcfceb2e and the details of the tests run are at http://tracker.ceph.com/issues/11492

Note that it has two more commits compared to what you tested before:

https://github.com/ceph/ceph/commit/293affe992118ed6e04f685030b2d83a794ca624 fixing http://tracker.ceph.com/issues/11622
https://github.com/ceph/ceph/commit/a43d24861089a02f3b42061e482e05016a0021f6 fixing http://tracker.ceph.com/issues/11604

which address two blockers that you listed at http://tracker.ceph.com/issues/11492#QE-Validation

These two new commits only have influence, directly or indirectly, on rgw. They do not require or deserve a new run of the rados, fs or rbd suites, because none of them depend on rgw, directly or indirectly.

The other two issues listed as blockers:

http://tracker.ceph.com/issues/11613#note-4 does not need a backport to hammer
http://tracker.ceph.com/issues/11591 is a teuthology-related issue that can be worked around and does not need to be a blocker for hammer

Cheers

--
Loïc Dachary, Artisan Logiciel Libre
Re: teuthology job priorities
I usually use priority [90,100] for point release validations.

This is a good thread to bring up for open approval/disapproval. Does that sound reasonable?

Thx
YuriW

- Original Message -
From: Loic Dachary l...@dachary.org
To: Ceph Development ceph-devel@vger.kernel.org
Sent: Thursday, May 28, 2015 2:32:29 AM
Subject: teuthology job priorities

Hi,

This morning I'll schedule a job with priority 50, assuming nobody will get mad at me for using such a low priority because the associated bug fix blocks the release of v0.94.2 (http://tracker.ceph.com/issues/11546), and also assuming no one uses a priority lower than 100 just to get in front of the nightlies [1].

In my imagination:

priority [0,100] is for emergencies
priority [100,1000] is to schedule a job with higher priority than the nightlies
priority 1000 (the default) is for all automated tests that no human being waits on (the nightlies, for instance)

Does someone have a different mapping in mind?

Cheers

[1] the nightlies: http://tracker.ceph.com/projects/ceph-releases/wiki/HOWTO_monitor_the_automated_tests_AKA_nightlies

--
Loïc Dachary, Artisan Logiciel Libre
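[Archive note] The priority bands proposed above can be written down as a small classifier; this is an illustrative sketch (the `priority_band` function and band names are mine, not a teuthology feature, and since the intervals in the mail overlap at their boundaries the sketch uses half-open ranges):

```shell
#!/bin/sh
# Mapping proposed in the thread above, read as half-open intervals:
#   [0,100)    emergencies (e.g. release blockers like the priority-50 run)
#   [100,1000) human-scheduled jobs that should get in front of the nightlies
#   1000+      the default, for automated runs nobody waits on (nightlies)
priority_band() {
    p=$1
    if [ "$p" -lt 100 ]; then
        echo emergency
    elif [ "$p" -lt 1000 ]; then
        echo ahead-of-nightlies
    else
        echo nightlies
    fi
}

priority_band 50      # -> emergency (Loic's v0.94.2 blocker run)
priority_band 95      # -> emergency (inside Yuri's [90,100] point-release range)
priority_band 500     # -> ahead-of-nightlies
priority_band 1000    # -> nightlies (the default)
```

In practice the priority would be passed to teuthology when scheduling (teuthology-suite accepts a priority option); the function is only a way to sanity-check where a given number lands in the proposed mapping.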
Re: hammer branch for v0.94.2 ready for QE
QE validation status: all detailed information is summarized in http://tracker.ceph.com/issues/11492

Team leads, pls review for a go/no-go decision. Issues to be considered:

rados - passed ~2.8K jobs; the listed issues (#11660, #11661) are not blockers (NOTE: we also agreed to use the 0/7th rule for future point releases, e.g. passing --subset 0/7 will be sufficient for a release)

knfs - #11789, #11790 - per Sage, not blockers; Greg, John - agreed?

samba - I assumed that the failures in http://pulpito.ceph.com/teuthology-2015-05-18_13:46:55-samba-hammer-testing-basic-multi/ are due to #6613; Greg, pls confirm.

upgrade/client-upgrade - blocked by #11546 (3 jobs passed)
upgrade/firefly-x - blocked by #11546
upgrade/dumpling-firefly-x - blocked by #11546

Sage, Loic, are you willing to push this release out without the upgrade suites run, due to packaging issues (NOTE: upgrade/giant-x - hammer passed on all distros)?

Thx
YuriW

- Original Message -
From: Loic Dachary l...@dachary.org
To: Yuri Weinstein ywein...@redhat.com
Cc: Ceph Development ceph-devel@vger.kernel.org, Abhishek L abhishek.lekshma...@gmail.com
Sent: Monday, May 18, 2015 5:42:12 AM
Subject: hammer branch for v0.94.2 ready for QE

Hi Yuri,

The hammer branch for v0.94.2 as found at https://github.com/ceph/ceph/commits/hammer has been approved by Greg, Yehuda, Josh and Sam and is ready for QE.

For the record, the head is https://github.com/ceph/ceph/commit/63832d4039889b6b704b88b86eaba4aadcfceb2e and the details of the tests run are at http://tracker.ceph.com/issues/11492

Note that it has two more commits compared to what you tested before:

https://github.com/ceph/ceph/commit/293affe992118ed6e04f685030b2d83a794ca624 fixing http://tracker.ceph.com/issues/11622
https://github.com/ceph/ceph/commit/a43d24861089a02f3b42061e482e05016a0021f6 fixing http://tracker.ceph.com/issues/11604

which address two blockers that you listed at http://tracker.ceph.com/issues/11492#QE-Validation

These two new commits only have influence, directly or indirectly, on rgw. They do not require or deserve a new run of the rados, fs or rbd suites, because none of them depend on rgw, directly or indirectly.

The other two issues listed as blockers:

http://tracker.ceph.com/issues/11613#note-4 does not need a backport to hammer
http://tracker.ceph.com/issues/11591 is a teuthology-related issue that can be worked around and does not need to be a blocker for hammer

Cheers

--
Loïc Dachary, Artisan Logiciel Libre
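[Archive note] On the "0/7th rule" above: as I understand teuthology's `--subset i/n` option, it deterministically schedules roughly 1/n of the suite's job matrix. A simplified stand-in over plain job names (`pick_subset` is my own illustrative helper, not teuthology code) showing that kind of 1-in-n selection:

```shell
#!/bin/sh
# Keep every job whose index is congruent to I modulo N -- a simplified
# stand-in for the matrix thinning that --subset I/N performs.
# Usage: pick_subset I N JOB [JOB...]
pick_subset() {
    i=$1; n=$2; shift 2
    k=0
    for job in "$@"; do
        # print the job only if its index falls in this subset
        [ $((k % n)) -eq "$i" ] && echo "$job"
        k=$((k + 1))
    done
}

# With 14 jobs and subset 0/7 we keep 2 of them (indices 0 and 7):
pick_subset 0 7 j00 j01 j02 j03 j04 j05 j06 j07 j08 j09 j10 j11 j12 j13
```

The real option would be passed on the scheduling command line (e.g. `teuthology-suite ... --subset 0/7`); the point of the rule in the mail is that one seventh of the rados matrix was judged sufficient coverage for a point release.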
Re: hammer branch for v0.94.2 ready for QE
Loic

There are infrastructure-related issues, and, for example, the rados suite needs review as many jobs failed. See, for example, the lines with "need review" notes, which I am suggesting be reviewed by the team leads.

rados runs on typica and magna (for example):
http://pulpito-rdu.front.sepia.ceph.com/teuthology-2015-05-15_12:42:53-rados-hammer-distro-basic-typica/
http://pulpito.ceph.redhat.com/teuthology-2015-05-11_20:22:13-rados-hammer-distro-basic-magna/

I think it's more efficient to review such results, as it feels like the amount of failures is disproportionately high.

Thx
YuriW

- Original Message -
From: Loic Dachary l...@dachary.org
To: Yuri Weinstein ywein...@redhat.com
Cc: Ceph Development ceph-devel@vger.kernel.org, Abhishek L abhishek.lekshma...@gmail.com
Sent: Tuesday, May 26, 2015 9:25:25 AM
Subject: Re: hammer branch for v0.94.2 ready for QE

Hi Yuri,

If I'm not mistaken, http://tracker.ceph.com/issues/11660 is the last issue blocking v0.94.2. Is there another one I don't see?

Cheers

On 26/05/2015 18:13, Yuri Weinstein wrote:
> Loic
>
> This hammer release QE validation is taking an unusually long time and has issues that have to be clarified. All test results were summarized in http://tracker.ceph.com/issues/11492
>
> There are several reasons contributing to the slowness of this validation, product-related as well as infrastructure-related; the high number of tests makes turnaround time slower as well.
>
> I think some suites, e.g. rados and the upgrades, will have to be re-run after the issues have been clarified/fixed. The rados, krbd, knfs and samba suite test results need reviews by the team leads.
>
> Thx
> YuriW

- Original Message -
From: Loic Dachary l...@dachary.org
To: Yuri Weinstein ywein...@redhat.com
Cc: Ceph Development ceph-devel@vger.kernel.org, Abhishek L abhishek.lekshma...@gmail.com
Sent: Monday, May 18, 2015 5:42:12 AM
Subject: hammer branch for v0.94.2 ready for QE

Hi Yuri,

The hammer branch for v0.94.2 as found at https://github.com/ceph/ceph/commits/hammer has been approved by Greg, Yehuda, Josh and Sam and is ready for QE.

For the record, the head is https://github.com/ceph/ceph/commit/63832d4039889b6b704b88b86eaba4aadcfceb2e and the details of the tests run are at http://tracker.ceph.com/issues/11492

Note that it has two more commits compared to what you tested before:

https://github.com/ceph/ceph/commit/293affe992118ed6e04f685030b2d83a794ca624 fixing http://tracker.ceph.com/issues/11622
https://github.com/ceph/ceph/commit/a43d24861089a02f3b42061e482e05016a0021f6 fixing http://tracker.ceph.com/issues/11604

which address two blockers that you listed at http://tracker.ceph.com/issues/11492#QE-Validation

These two new commits only have influence, directly or indirectly, on rgw. They do not require or deserve a new run of the rados, fs or rbd suites, because none of them depend on rgw, directly or indirectly.

The other two issues listed as blockers:

http://tracker.ceph.com/issues/11613#note-4 does not need a backport to hammer
http://tracker.ceph.com/issues/11591 is a teuthology-related issue that can be worked around and does not need to be a blocker for hammer

Cheers

--
Loïc Dachary, Artisan Logiciel Libre
Re: giant branch for v0.87.2 ready for QE
QE validation of this release has been completed and it's ready for next steps. All test results and notes were summarized in the QE Validation section of http://tracker.ceph.com/issues/11153 Thx YuriW - Original Message - From: Loic Dachary l...@dachary.org To: Yuri Weinstein ywein...@redhat.com Cc: Ceph Development ceph-devel@vger.kernel.org, Abhishek L abhishek.lekshma...@gmail.com Sent: Friday, April 17, 2015 2:17:16 PM Subject: giant branch for v0.87.2 ready for QE Hi Yuri, The giant branch for v0.87.2 as found at https://github.com/ceph/ceph/commits/giant has been approved by Greg, Yehuda, Josh and Sam and is ready for QE. For the record, the head is https://github.com/ceph/ceph/commit/c1301e84aee0f399db85e2d37818a66147a0ce78 and the details of the tests run are at http://tracker.ceph.com/issues/11153 Cheers -- Loïc Dachary, Artisan Logiciel Libre
Re: client/cluster compatibility testing
We have a PR, https://github.com/ceph/ceph-qa-suite/pull/414, that addresses part of this issue, e.g. adds hammer-x to the mix (it's ready to be merged). We also have two tickets where the requirements for this suite were captured: http://tracker.ceph.com/issues/11413 http://tracker.ceph.com/issues/11414 Per Josh's comment ("Also I think we'll want to start doing mixed-client-version tests, particularly for things like rbd's exclusive locking"), I assigned #11414 for next steps. Question/request to the team leads - pls either agree that we need specific tests for mixed-client testing (and add tickets as you feel necessary) or suggest otherwise. I am guessing: rbd - confirmed by Josh, we need those; rados - Sam, Sage? cephfs - Greg? rgw - Yehuda? I am sure I am missing lots of others... What do you think? Thx YuriW - Original Message - From: Josh Durgin jdur...@redhat.com To: Sage Weil sw...@redhat.com, ceph-devel@vger.kernel.org Sent: Thursday, April 16, 2015 1:59:11 PM Subject: Re: client/cluster compatibility testing On 04/16/2015 09:42 AM, Sage Weil wrote: I think the simplest way to address this is to talk about compatibility in terms of the upstream stable releases (firefly, hammer, etc.), and test that compatibility with teuthology tests from ceph-qa-suite.git. We have some basic inter-version client/cluster tests already in suites/upgrade/client-upgrade. Currently these test new (version x) clients against a given release (dumpling, firefly). I think we just need to add hammer to that mix, and then add a second set of tests that do the reverse: test clients from a given release (dumpling, firefly, hammer) against an arbitrary cluster version (x). The suites in suites/upgrade/$version-x do this, and use a mixed version cluster rather than a purely version x cluster. It seems like people would want that intra-cluster version coverage for smooth upgrades. 
Just need to add hammer-x there too (Yuri's renaming the client ones to be $version-client-x for less confusion). Also I think we'll want to start doing mixed-client-version tests, particularly for things like rbd's exclusive locking: http://tracker.ceph.com/issues/11405 Josh
Re: client/cluster compatibility testing
Yea, Sage, that sounds reasonable. I added a ticket to capture this plan (http://tracker.ceph.com/issues/11413) and will add those tests soon. Please add your comments to the ticket above. I am assuming that it will look something like this for dumpling, firefly and hammer: dumpling(stable) - client-x firefly(stable) - client-x hammer(stable) - client-x and reverse dumpling-client(stable) - cluster-x firefly-cluster(stable) - cluster-x hammer-cluster(stable) - cluster-x Yes? Thx YuriW - Original Message - From: Sage Weil sw...@redhat.com To: ceph-devel@vger.kernel.org Sent: Thursday, April 16, 2015 9:42:29 AM Subject: client/cluster compatibility testing Now that there are several different vendors shipping and supporting Ceph in their products, we'll invariably have people running different versions of Ceph that are interested in interoperability. If we focus just on client - cluster compatibility, I think the issues are (1) compatibility between upstream ceph versions (firefly vs hammer) and (2) ensuring that any downstream changes the vendor makes don't break that compatibility. I think the simplest way to address this is to talk about compatibility in terms of the upstream stable releases (firefly, hammer, etc.), and test that compatibility with teuthology tests from ceph-qa-suite.git. We have some basic inter-version client/cluster tests already in suites/upgrade/client-upgrade. Currently these test new (version x) clients against a given release (dumpling, firefly). I think we just need to add hammer to that mix, and then add a second set of tests that do the reverse: test clients from a given release (dumpling, firefly, hammer) against an arbitrary cluster version (x). We'll obviously run these tests on upstream releases to ensure that we are not breaking compatibility (or are doing so in known, explicit ways). 
Downstream folks can run the same test suites against any changes they make as well to ensure that their product is compatible with firefly clients, or whatever. Does that sound reasonable? sage
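[Editorial note] Sage's plan amounts to a cross-product of the stable releases against an arbitrary version x, in both directions: new (version x) clients against each stable cluster, and each stable client against a version-x cluster. As a rough illustration only, the matrix could be enumerated like this; the `compat_matrix` helper is hypothetical, not part of teuthology or ceph-qa-suite:

```python
# Stable releases named in the thread; "x" stands for the arbitrary
# version under test, as in suites/upgrade/$version-x.
STABLE = ["dumpling", "firefly", "hammer"]

def compat_matrix(stable=STABLE):
    """Yield (client, cluster) pairs in both directions:
    a version-x client against each stable cluster, and each
    stable client against a version-x cluster."""
    for release in stable:
        yield ("x", release)   # new client vs. stable cluster
        yield (release, "x")   # stable client vs. new cluster

pairs = list(compat_matrix())
```

For three stable releases this yields six pairs; in practice each pair maps to a teuthology suite rather than code, but enumerating them makes it easy to see which combinations lack coverage.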
Re: hammer tasks in http://tracker.ceph.com/projects/ceph-releases
How will that go for the next run of upgrade/giant-x ? I was thinking that as soon as, for example, this suite passed, #11189 gets resolved and thus indicates that it's ready for the hammer release cut. Thx YuriW - Original Message - From: Loic Dachary l...@dachary.org To: Yuri Weinstein ywein...@redhat.com Cc: Sage Weil sw...@redhat.com, Ceph Development ceph-devel@vger.kernel.org Sent: Sunday, March 22, 2015 5:35:19 PM Subject: Re: hammer tasks in http://tracker.ceph.com/projects/ceph-releases On 22/03/2015 17:16, Yuri Weinstein wrote: Loic, I think the idea was to do more process driven approach for releasing hammer, e.g. keep track of suites vs. results and open issues, so we can have a high level view on status at any time before the final cut day. Do you have any suggestions or objections? Reading http://tracker.ceph.com/issues/11189 I see it has one run, and a run of failed tests, and got resolved because all passed. The title is hammer: upgrade/giant-x. How will that go for the next run of upgrade/giant-x ? 
I use a python snippet to display the errors in a redmine format (http://workbench.dachary.org/dachary/ceph-workbench/issues/2) $ python ../fail.py teuthology-2015-03-20_17:05:02-upgrade:giant-x-hammer-distro-basic-vps ** *'mkdir -p -- /home/ubuntu/cephtest/mnt.1/client.1/tmp cd -- /home/ubuntu/cephtest/mnt.1/client.1/tmp CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=giant TESTDIR=/home/ubuntu/cephtest CEPH_ID=1 PATH=$PATH:/usr/sbin adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/workunit.client.1/cls/test_cls_rgw.sh'* *** upgrade:giant-x/parallel/{0-cluster/start.yaml 1-giant-install/giant.yaml 2-workload/parallel_run/{ec-rados-parallel.yaml rados_api.yaml rados_loadgenbig.yaml test_cache-pool-snaps.yaml test_rbd_api.yaml test_rbd_python.yaml} 3-upgrade-sequence/upgrade-mon-osd-mds.yaml 4-final-workload/{rados-snaps-few-objects.yaml rados_loadgenmix.yaml rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_swift.yaml} distros/rhel_7.0.yaml}:http://pulpito.ceph.com/teuthology-2015-03-20_17:05:02-upgrade:giant-x-hammer-distro-basic-vps/814081 ** *2015-03-20 23:04:51.042345 mon.0 10.214.130.49:6789/0 3 : cluster [WRN] message from mon.1 was stamped 14400.248297s in the future, clocks not synchronized in cluster log* *** upgrade:giant-x/parallel/{0-cluster/start.yaml 1-giant-install/giant.yaml 2-workload/sequential_run/test_rbd_api.yaml 3-upgrade-sequence/upgrade-all.yaml 4-final-workload/{rados-snaps-few-objects.yaml rados_loadgenmix.yaml rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_swift.yaml} distros/centos_6.5.yaml}:http://pulpito.ceph.com/teuthology-2015-03-20_17:05:02-upgrade:giant-x-hammer-distro-basic-vps/814155 ** *Could not reconnect to ubu...@vpm169.front.sepia.ceph.com* *** upgrade:giant-x/parallel/{0-cluster/start.yaml 1-giant-install/giant.yaml 2-workload/sequential_run/ec-rados-default.yaml 3-upgrade-sequence/upgrade-mon-osd-mds.yaml 4-final-workload/{rados-snaps-few-objects.yaml 
rados_loadgenmix.yaml rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_swift.yaml} distros/rhel_7.0.yaml}:http://pulpito.ceph.com/teuthology-2015-03-20_17:05:02-upgrade:giant-x-hammer-distro-basic-vps/814108 ** *Could not reconnect to ubu...@vpm166.front.sepia.ceph.com* *** upgrade:giant-x/stress-split-erasure-code/{0-cluster/start.yaml 1-giant-install/giant.yaml 2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/ec-rados-default.yaml 6-next-mon/monb.yaml 8-next-mon/monc.yaml 9-workload/ec-rados-plugin=jerasure-k=3-m=1.yaml distros/rhel_7.0.yaml}:http://pulpito.ceph.com/teuthology-2015-03-20_17:05:02-upgrade:giant-x-hammer-distro-basic-vps/814194 ** *'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-mon -f -i a'* *** upgrade:giant-x/stress-split-erasure-code-x86_64/{0-cluster/start.yaml 1-giant-install/giant.yaml 2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/ec-rados-default.yaml 6-next-mon/monb.yaml 8-next-mon/monc.yaml 9-workload/ec-rados-plugin=isa-k=2-m=1.yaml distros/rhel_7.0.yaml}:http://pulpito.ceph.com/teuthology-2015-03-20_17:05:02-upgrade:giant-x-hammer-distro-basic-vps/814197 ** *timed out waiting for admin_socket to appear after osd.13 restart* *** upgrade:giant-x/stress-split/{0-cluster/start.yaml 1-giant-install/giant.yaml 2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/{rbd-cls.yaml rbd-import-export.yaml readwrite.yaml snaps-few-objects.yaml} 6-next-mon/monb.yaml 7-workload/{radosbench.yaml rbd_api.yaml} 8-next-mon/monc.yaml 9-workload/{rbd-python.yaml rgw-swift.yaml snaps-many-objects.yaml} distros/rhel_6.5.yaml}:http://pulpito.ceph.com/teuthology-2015-03-20_17:05:02-upgrade:giant-x-hammer-distro-basic-vps/814186 Thx YuriW - Original Message - From: Loic Dachary l
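[Editorial note] The fail.py snippet itself is not reproduced in the thread; purely as an illustration, a script producing that redmine-style output might look like the sketch below, assuming the failed jobs have already been collected as (description, url, reason) tuples. All names here are invented for the example, not the actual ceph-workbench code:

```python
def redmine_failures(jobs):
    """Group failed teuthology jobs by failure reason and render them
    in a redmine-style list: the reason in bold ("** *...*"), then one
    "*** description:url" line per matching job."""
    by_reason = {}
    for desc, url, reason in jobs:
        by_reason.setdefault(reason, []).append((desc, url))
    lines = []
    for reason in sorted(by_reason):
        lines.append("** *%s*" % reason)
        for desc, url in by_reason[reason]:
            lines.append("*** %s:%s" % (desc, url))
    return "\n".join(lines)
```

Grouping by failure reason first, as the output above does, makes it easy to spot environment-wide problems (clock skew, unreachable VPS nodes) versus one-off job failures.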
Re: hammer tasks in http://tracker.ceph.com/projects/ceph-releases
Thx YuriW - Original Message - From: Loic Dachary l...@dachary.org To: Yuri Weinstein ywein...@redhat.com Cc: Sage Weil sw...@redhat.com, Ceph Development ceph-devel@vger.kernel.org Sent: Monday, March 23, 2015 8:40:02 AM Subject: Re: hammer tasks in http://tracker.ceph.com/projects/ceph-releases Hi Yuri, On 23/03/2015 16:09, Yuri Weinstein wrote: How will that go for the next run of upgrade/giant-x ? I was thinking that as soon as, for example, this suite passed, #11189 gets resolved and thus indicates that it's ready for the hammer release cut. If the following happens: * hammer: upgrade/giant-x runs and passes * a dozen more commits are added because problems are fixed * hammer: upgrade/giant-x runs and passes That leaves us with two issues with the same name but with different update dates. So if I look at the hammer: upgrade/giant-x issues in chronological order, I have a complete history of the successive runs and I can check the latest one to see how it went. Or older ones if I need to dig the history. This is good :-) After hammer is released, the same will presumably happen for point releases. Instead of naming them hammer: upgrade/giant-x which would be confusing, I guess we could name them v0.94.1: upgrade/giant-x instead. Does that sound right ? Yes, we can alternatively name the set of those tasks as hammer v0.94.1 Thx YuriW - Original Message - From: Loic Dachary l...@dachary.org To: Yuri Weinstein ywein...@redhat.com Cc: Sage Weil sw...@redhat.com, Ceph Development ceph-devel@vger.kernel.org Sent: Sunday, March 22, 2015 5:35:19 PM Subject: Re: hammer tasks in http://tracker.ceph.com/projects/ceph-releases On 22/03/2015 17:16, Yuri Weinstein wrote: Loic, I think the idea was to do more process driven approach for releasing hammer, e.g. keep track of suites vs. results and open issues, so we can have a high level view on status at any time before the final cut day. Do you have any suggestions or objections? 
Reading http://tracker.ceph.com/issues/11189 I see it has one run, and a run of failed tests, and got resolved because all passed. The title is hammer: upgrade/giant-x. How will that go for the next run of upgrade/giant-x ? I use a python snippet to display the errors in a redmine format (http://workbench.dachary.org/dachary/ceph-workbench/issues/2) $ python ../fail.py teuthology-2015-03-20_17:05:02-upgrade:giant-x-hammer-distro-basic-vps ** *'mkdir -p -- /home/ubuntu/cephtest/mnt.1/client.1/tmp cd -- /home/ubuntu/cephtest/mnt.1/client.1/tmp CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=giant TESTDIR=/home/ubuntu/cephtest CEPH_ID=1 PATH=$PATH:/usr/sbin adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/workunit.client.1/cls/test_cls_rgw.sh'* *** upgrade:giant-x/parallel/{0-cluster/start.yaml 1-giant-install/giant.yaml 2-workload/parallel_run/{ec-rados-parallel.yaml rados_api.yaml rados_loadgenbig.yaml test_cache-pool-snaps.yaml test_rbd_api.yaml test_rbd_python.yaml} 3-upgrade-sequence/upgrade-mon-osd-mds.yaml 4-final-workload/{rados-snaps-few-objects.yaml rados_loadgenmix.yaml rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_swift.yaml} distros/rhel_7.0.yaml}:http://pulpito.ceph.com/teuthology-2015-03-20_17:05:02-upgrade:giant-x-hammer-distro-basic-vps/814081 ** *2015-03-20 23:04:51.042345 mon.0 10.214.130.49:6789/0 3 : cluster [WRN] message from mon.1 was stamped 14400.248297s in the future, clocks not synchronized in cluster log* *** upgrade:giant-x/parallel/{0-cluster/start.yaml 1-giant-install/giant.yaml 2-workload/sequential_run/test_rbd_api.yaml 3-upgrade-sequence/upgrade-all.yaml 4-final-workload/{rados-snaps-few-objects.yaml rados_loadgenmix.yaml rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_swift.yaml} distros/centos_6.5.yaml}:http://pulpito.ceph.com/teuthology-2015-03-20_17:05:02-upgrade:giant-x-hammer-distro-basic-vps/814155 ** *Could not reconnect to 
ubu...@vpm169.front.sepia.ceph.com* *** upgrade:giant-x/parallel/{0-cluster/start.yaml 1-giant-install/giant.yaml 2-workload/sequential_run/ec-rados-default.yaml 3-upgrade-sequence/upgrade-mon-osd-mds.yaml 4-final-workload/{rados-snaps-few-objects.yaml rados_loadgenmix.yaml rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_swift.yaml} distros/rhel_7.0.yaml}:http://pulpito.ceph.com/teuthology-2015-03-20_17:05:02-upgrade:giant-x-hammer-distro-basic-vps/814108 ** *Could not reconnect to ubu...@vpm166.front.sepia.ceph.com* *** upgrade:giant-x/stress-split-erasure-code/{0-cluster/start.yaml 1-giant-install/giant.yaml 2-partial-upgrade/firsthalf.yaml 3-thrash/default.yaml 4-mon/mona.yaml 5-workload/ec-rados-default.yaml 6-next-mon/monb.yaml 8-next-mon/monc.yaml 9-workload/ec-rados-plugin=jerasure-k=3-m=1.yaml distros/rhel_7.0.yaml}:http://pulpito.ceph.com/teuthology-2015-03
Re: hammer tasks in http://tracker.ceph.com/projects/ceph-releases
Loic, done, pls review and edit. Thx YuriW - Original Message - From: Loic Dachary l...@dachary.org To: Yuri Weinstein ywein...@redhat.com Cc: Sage Weil sw...@redhat.com, Ceph Development ceph-devel@vger.kernel.org Sent: Monday, March 23, 2015 9:10:20 AM Subject: Re: hammer tasks in http://tracker.ceph.com/projects/ceph-releases On 23/03/2015 16:44, Yuri Weinstein wrote: Thx YuriW - Original Message - From: Loic Dachary l...@dachary.org To: Yuri Weinstein ywein...@redhat.com Cc: Sage Weil sw...@redhat.com, Ceph Development ceph-devel@vger.kernel.org Sent: Monday, March 23, 2015 8:40:02 AM Subject: Re: hammer tasks in http://tracker.ceph.com/projects/ceph-releases Hi Yuri, On 23/03/2015 16:09, Yuri Weinstein wrote: How will that go for the next run of upgrade/giant-x ? I was thinking that as soon as, for example, this suite passed, #11189 gets resolved and thus indicates that it's ready for the hammer release cut. If the following happens: * hammer: upgrade/giant-x runs and passes * a dozen more commits are added because problems are fixed * hammer: upgrade/giant-x runs and passes That leaves us with two issues with the same name but with different update dates. So if I look at the hammer: upgrade/giant-x issues in chronological order, I have a complete history of the successive runs and I can check the latest one to see how it went. Or older ones if I need to dig the history. This is good :-) After hammer is released, the same will presumably happen for point releases. Instead of naming them hammer: upgrade/giant-x which would be confusing, I guess we could name them v0.94.1: upgrade/giant-x instead. Does that sound right ? Yes, we can alternatively name the set of those tasks as hammer v0.94.1 Great ! Would you like me to add a section at http://tracker.ceph.com/projects/ceph-releases/wiki/Wiki to summarize this conversation ? 
Thx YuriW - Original Message - From: Loic Dachary l...@dachary.org To: Yuri Weinstein ywein...@redhat.com Cc: Sage Weil sw...@redhat.com, Ceph Development ceph-devel@vger.kernel.org Sent: Sunday, March 22, 2015 5:35:19 PM Subject: Re: hammer tasks in http://tracker.ceph.com/projects/ceph-releases On 22/03/2015 17:16, Yuri Weinstein wrote: Loic, I think the idea was to do more process driven approach for releasing hammer, e.g. keep track of suites vs. results and open issues, so we can have a high level view on status at any time before the final cut day. Do you have any suggestions or objections? Reading http://tracker.ceph.com/issues/11189 I see it has one run, and a run of failed tests, and got resolved because all passed. The title is hammer: upgrade/giant-x. How will that go for the next run of upgrade/giant-x ? I use a python snippet to display the errors in a redmine format (http://workbench.dachary.org/dachary/ceph-workbench/issues/2) $ python ../fail.py teuthology-2015-03-20_17:05:02-upgrade:giant-x-hammer-distro-basic-vps ** *'mkdir -p -- /home/ubuntu/cephtest/mnt.1/client.1/tmp cd -- /home/ubuntu/cephtest/mnt.1/client.1/tmp CEPH_CLI_TEST_DUP_COMMAND=1 CEPH_REF=giant TESTDIR=/home/ubuntu/cephtest CEPH_ID=1 PATH=$PATH:/usr/sbin adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage timeout 3h /home/ubuntu/cephtest/workunit.client.1/cls/test_cls_rgw.sh'* *** upgrade:giant-x/parallel/{0-cluster/start.yaml 1-giant-install/giant.yaml 2-workload/parallel_run/{ec-rados-parallel.yaml rados_api.yaml rados_loadgenbig.yaml test_cache-pool-snaps.yaml test_rbd_api.yaml test_rbd_python.yaml} 3-upgrade-sequence/upgrade-mon-osd-mds.yaml 4-final-workload/{rados-snaps-few-objects.yaml rados_loadgenmix.yaml rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_swift.yaml} distros/rhel_7.0.yaml}:http://pulpito.ceph.com/teuthology-2015-03-20_17:05:02-upgrade:giant-x-hammer-distro-basic-vps/814081 ** *2015-03-20 23:04:51.042345 mon.0 10.214.130.49:6789/0 3 
: cluster [WRN] message from mon.1 was stamped 14400.248297s in the future, clocks not synchronized in cluster log* *** upgrade:giant-x/parallel/{0-cluster/start.yaml 1-giant-install/giant.yaml 2-workload/sequential_run/test_rbd_api.yaml 3-upgrade-sequence/upgrade-all.yaml 4-final-workload/{rados-snaps-few-objects.yaml rados_loadgenmix.yaml rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_swift.yaml} distros/centos_6.5.yaml}:http://pulpito.ceph.com/teuthology-2015-03-20_17:05:02-upgrade:giant-x-hammer-distro-basic-vps/814155 ** *Could not reconnect to ubu...@vpm169.front.sepia.ceph.com* *** upgrade:giant-x/parallel/{0-cluster/start.yaml 1-giant-install/giant.yaml 2-workload/sequential_run/ec-rados-default.yaml 3-upgrade-sequence/upgrade-mon-osd-mds.yaml 4-final-workload/{rados-snaps-few-objects.yaml rados_loadgenmix.yaml rados_mon_thrash.yaml rbd_cls.yaml rbd_import_export.yaml rgw_swift.yaml
Re: hammer tasks in http://tracker.ceph.com/projects/ceph-releases
Loic, I think the idea was to do more process driven approach for releasing hammer, e.g. keep track of suites vs. results and open issues, so we can have a high level view on status at any time before the final cut day. Do you have any suggestions or objections? Thx YuriW - Original Message - From: Loic Dachary l...@dachary.org To: Sage Weil sw...@redhat.com Cc: Ceph Development ceph-devel@vger.kernel.org Sent: Sunday, March 22, 2015 1:54:06 AM Subject: hammer tasks in http://tracker.ceph.com/projects/ceph-releases Hi Sage, You have created a few hammer related tasks at http://tracker.ceph.com/projects/ceph-releases/issues . What did you have in mind ? Cheers -- Loïc Dachary, Artisan Logiciel Libre
Re: firefly integration branch for v0.80.9 ready for QE
QE validation is finished for this release and v0.80.9 is ready for next steps. A summary of all runs, with details, is in http://tracker.ceph.com/issues/10641 The following suites were executed and passed as part of this validation: rados, rbd, rgw, fs, krbd, kcephfs, samba, ceph-deploy, upgrade/firefly, upgrade/dumpling-firefly-x (to giant), powercycle. Alfredo, the ticket is in your hands. Thx YuriW - Original Message - From: Loic Dachary l...@dachary.org To: Yuri Weinstein ywein...@redhat.com Cc: Ceph Development ceph-devel@vger.kernel.org Sent: Tuesday, March 3, 2015 8:30:42 AM Subject: firefly integration branch for v0.80.9 ready for QE Hi Yuri, The firefly branch for v0.80.9 as found at https://github.com/ceph/ceph/commits/firefly has been approved by Greg, Yehuda, Josh and Sam and is ready for QE. For the record, the head is https://github.com/ceph/ceph/commit/edd37e39d155fbe36012008df3d49e33ec3117cc and the details of the tests run are at http://tracker.ceph.com/issues/10641 Cheers -- Loïc Dachary, Artisan Logiciel Libre
Re: v0.80.8 and librbd performance
Ken, pls see http://tracker.ceph.com/issues/10641 for more details. Thx YuriW - Original Message - From: Ken Dreyer kdre...@redhat.com To: Sage Weil sw...@redhat.com, ceph-devel@vger.kernel.org, ceph-us...@ceph.com Sent: Tuesday, March 3, 2015 3:28:02 PM Subject: Re: v0.80.8 and librbd performance On 03/03/2015 04:19 PM, Sage Weil wrote: Hi, This is just a heads up that we've identified a performance regression in v0.80.8 from previous firefly releases. A v0.80.9 is working its way through QA and should be out in a few days. If you haven't upgraded yet you may want to wait. Thanks! sage Hi Sage, I've seen a couple Redmine tickets on this (eg http://tracker.ceph.com/issues/9854 , http://tracker.ceph.com/issues/10956). It's not totally clear to me which of the 70+ unreleased commits on the firefly branch fix this librbd issue. Is it only the three commits in https://github.com/ceph/ceph/pull/3410 , or are there more? - Ken
Re: re-running teuthology jobs
Loic In case you want to add some comments - http://tracker.ceph.com/issues/10945 Thx YuriW - Original Message - From: Loic Dachary l...@dachary.org To: Ceph Development ceph-devel@vger.kernel.org Sent: Saturday, February 28, 2015 7:01:29 AM Subject: Re: re-running teuthology jobs The simpler way is to use the --filter argument of teuthology-suite with the value of the description: field found in the config.yaml file. For instance, running the rados failed jobs http://tracker.ceph.com/issues/10641#rados failed jobs: $ ./virtualenv/bin/teuthology-suite --priority 101 --suite rados --filter 'rados/multimon/{clusters/21.yaml msgr-failures/many.yaml tasks/mon_clock_with_skews.yaml},rados/thrash/{clusters/fixed-2.yaml fs/xfs.yaml msgr-failures/osd-delay.yaml thrashers/morepggrow.yaml workloads/small-objects.yaml},rados/thrash/{clusters/fixed-2.yaml fs/xfs.yaml msgr-failures/osd-delay.yaml thrashers/pggrow.yaml workloads/ec-small-objects.yaml},rados/verify/{1thrash/none.yaml clusters/fixed-2.yaml fs/btrfs.yaml msgr-failures/few.yaml tasks/mon_recovery.yaml validater/valgrind.yaml},rados/thrash/{clusters/fixed-2.yaml fs/xfs.yaml msgr-failures/osd-delay.yaml thrashers/default.yaml workloads/cache-agent-small.yaml}' --suite-branch firefly --machine-type plana,burnupi,mira --distro ubuntu --email l...@dachary.org --owner l...@dachary.org --ceph firefly-backports 2015-02-28 15:58:08,474.474 INFO:teuthology.suite:ceph sha1: e54834bfac3c38562987730b317cb1944a96005b 2015-02-28 15:58:08,969.969 INFO:teuthology.suite:ceph version: 0.80.8-75-ge54834b-1precise 2015-02-28 15:58:09,606.606 INFO:teuthology.suite:teuthology branch: master 2015-02-28 15:58:10,407.407 INFO:teuthology.suite:ceph-qa-suite branch: firefly 2015-02-28 15:58:10,409.409 INFO:teuthology.repo_utils:Fetching from upstream into /home/loic/src/ceph-qa-suite_firefly 2015-02-28 15:58:11,522.522 INFO:teuthology.repo_utils:Resetting repo at /home/loic/src/ceph-qa-suite_firefly to branch firefly 2015-02-28 
15:58:12,393.393 INFO:teuthology.suite:Suite rados in /home/loic/src/ceph-qa-suite_firefly/suites/rados generated 693 jobs (not yet filtered) 2015-02-28 15:58:12,419.419 INFO:teuthology.suite:Scheduling rados/multimon/{clusters/21.yaml msgr-failures/many.yaml tasks/mon_clock_with_skews.yaml} Job scheduled with name loic-2015-02-28_15:58:07-rados-firefly-backports---basic-multi and ID 783145 2015-02-28 15:58:14,199.199 INFO:teuthology.suite:Scheduling rados/thrash/{clusters/fixed-2.yaml fs/xfs.yaml msgr-failures/osd-delay.yaml thrashers/default.yaml workloads/cache-agent-small.yaml} Job scheduled with name loic-2015-02-28_15:58:07-rados-firefly-backports---basic-multi and ID 783146 2015-02-28 15:58:15,650.650 INFO:teuthology.suite:Scheduling rados/thrash/{clusters/fixed-2.yaml fs/xfs.yaml msgr-failures/osd-delay.yaml thrashers/morepggrow.yaml workloads/small-objects.yaml} Job scheduled with name loic-2015-02-28_15:58:07-rados-firefly-backports---basic-multi and ID 783147 2015-02-28 15:58:16,837.837 INFO:teuthology.suite:Scheduling rados/thrash/{clusters/fixed-2.yaml fs/xfs.yaml msgr-failures/osd-delay.yaml thrashers/pggrow.yaml workloads/ec-small-objects.yaml} Job scheduled with name loic-2015-02-28_15:58:07-rados-firefly-backports---basic-multi and ID 783148 2015-02-28 15:58:18,421.421 INFO:teuthology.suite:Scheduling rados/verify/{1thrash/none.yaml clusters/fixed-2.yaml fs/btrfs.yaml msgr-failures/few.yaml tasks/mon_recovery.yaml validater/valgrind.yaml} Job scheduled with name loic-2015-02-28_15:58:07-rados-firefly-backports---basic-multi and ID 783149 2015-02-28 15:58:19,729.729 INFO:teuthology.suite:Suite rados in /home/loic/src/ceph-qa-suite_firefly/suites/rados scheduled 5 jobs. 2015-02-28 15:58:19,729.729 INFO:teuthology.suite:Suite rados in /home/loic/src/ceph-qa-suite_firefly/suites/rados -- 688 jobs were filtered out. 
Job scheduled with name loic-2015-02-28_15:58:07-rados-firefly-backports---basic-multi and ID 783150 Creates the http://pulpito.ceph.com/loic-2015-02-28_15:58:07-rados-firefly-backports---basic-multi/ run with just 5 jobs. On 28/02/2015 11:28, Loic Dachary wrote: Hi, A teuthology rados run ( https://github.com/ceph/ceph-qa-suite/tree/master/suites/rados ) completed with five dead jobs out of 693. They failed because of DNS errors and I'd like to re-run them. Ideally I could do something like: teuthology-schedule --run loic-2015-02-27_20:22:09-rados-firefly-backports---basic-multi --job-id 781444 --job-id 781457 ... and it would re-schedule a run of the designated jobs from the designated run. But I don't think such a command exist. I will therefore manually do what such a command would do, for each failed job: * download http://qa-proxy.ceph.com/teuthology/loic-2015-02-27_20:22:09-rados-firefly-backports---basic-multi/781444/orig.config.yaml * git clone https://github.com/ceph/ceph-qa-suite /srv/ceph-qa-suite * cd /srv/ceph-qa-suite ; git
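[Editorial note] The manual procedure described above starts from each dead job's orig.config.yaml. As a sketch only (the `config_urls` helper is hypothetical, not an existing teuthology command; the run name and job ids are the ones from the message), the URLs to fetch could be built like this:

```python
# Base URL and run name are taken from Loic's message; the job ids are
# the dead jobs he wants to re-run.
BASE = "http://qa-proxy.ceph.com/teuthology"

def config_urls(run, job_ids):
    """Build the orig.config.yaml URL for each job to re-schedule."""
    return ["%s/%s/%s/orig.config.yaml" % (BASE, run, j) for j in job_ids]

urls = config_urls(
    "loic-2015-02-27_20:22:09-rados-firefly-backports---basic-multi",
    ["781444", "781457"],
)
# Each fetched yaml would then be handed to teuthology-schedule,
# per the per-job steps Loic lists above.
```

This is just the URL-building half of the loop; the later messages in the thread show that `teuthology-suite --filter` is usually the simpler way to achieve the same re-run.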
Re: dumpling integration branch for v0.67.12 ready for QE
All issues in http://tracker.ceph.com/issues/10560 updated. Loic - #10801 can be resolved. v0.67.12 is ready for release. Thx YuriW - Original Message - From: Yuri Weinstein ywein...@redhat.com To: Loic Dachary l...@dachary.org Cc: Ceph Development ceph-devel@vger.kernel.org, Sage Weil s...@redhat.com, Tamil Muthamizhan tmuth...@redhat.com, Zack Cerza z...@redhat.com, Sandon Van Ness svann...@redhat.com Sent: Wednesday, February 18, 2015 9:38:19 AM Subject: Re: dumpling integration branch for v0.67.12 ready for QE Hi all, I updated all issues in http://tracker.ceph.com/issues/10560 Based on what is listed there, we have http://tracker.ceph.com/issues/10801 - Yehuda pls comment http://tracker.ceph.com/issues/10694 - Sam pls re-confirm rbd - Josh, I understood that we are good to go, pls re-confirm. I can re-run some suites if you'd like and we can make a call on this release. Loic - back to you, let me know what you think. Thx YuriW - Original Message - From: Loic Dachary l...@dachary.org To: Yuri Weinstein ywein...@redhat.com Cc: Ceph Development ceph-devel@vger.kernel.org, Sage Weil s...@redhat.com, Tamil Muthamizhan tmuth...@redhat.com, Zack Cerza z...@redhat.com, Sandon Van Ness svann...@redhat.com Sent: Thursday, February 12, 2015 2:17:49 PM Subject: Re: dumpling integration branch for v0.67.12 ready for QE On 12/02/2015 23:06, Yuri Weinstein wrote: I linked all issues related to this release testing to the ticket http://tracker.ceph.com/issues/10560 After the team leads make a call on those, including environment issues, I suggest re-running the suites that failed. Loic, I'd re-run them in the Octo, since we already started there, if you agree ? 
Sure :-) Thx YuriW - Original Message - From: Yuri Weinstein ywein...@redhat.com To: Loic Dachary l...@dachary.org Cc: Ceph Development ceph-devel@vger.kernel.org, Sage Weil s...@redhat.com, Tamil Muthamizhan tmuth...@redhat.com Sent: Wednesday, February 11, 2015 2:24:33 PM Subject: Re: dumpling integration branch for v0.67.12 ready for QE I replied to individual suites runs, but just wanted to summarize QE validation status. The following suites were executed in the Octo lab (we will use Sepia in the future if nobody objects). upgrade:dumpling ['45493'] http://tracker.ceph.com/issues/10694 - Known Won't fix Assertion: osd/Watch.cc: 290: FAILED assert(!cb) *** Sam - pls confirm the Won't fix status. ['45495', '45496', '45498', '45499', '45500'] http://tracker.ceph.com/issues/10838 s3tests failed *** Yehuda - need your verdict on s3tests. fs All green ! rados ['45054'] http://tracker.ceph.com/issues/10841 Issued certificate has expired *** Sandon pls comment. ['45168', '45169'] http://tracker.ceph.com/issues/10840 coredump ceph_test_filestore_idempotent_sequence *** Sam - pls comment ['45215'] Missing packages - no ticket FYI Failed to fetch http://apt-mirror.front.sepia.ceph.com/archive.ubuntu.com/ubuntu/dists/trusty-updates/universe/binary-i386/Packages Hash Sum mismatch *** Zack, Sandon ? ceph-deploy Travis - pls suggest In general I am not sure if we needed to test this - Sage? rbd ['45365', '45366', '45367'] http://tracker.ceph.com/issues/10842 unable to connect to apt-mirror.front.sepia.ceph.com ['45349', '45350', '45351', '45355', '45356', '45357', '45363'] http://tracker.ceph.com/issues/10802 error: image still has watchers (duplicate of 10680) *** Zack, Sandon, Josh - all environment noise, pls comment. rgw ['45382', '45390'] http://tracker.ceph.com/issues/10843 s3tests failed - could be related or duplicate of 10838 *** Yehuda - same as issues in upgrades? I am standing by for you analysis/replies and recommendations for next steps. 
Loic - let me know if you want to follow specific items in our backport testing process that I missed here. PS: I would think that you could've wanted to assign the release ticket to QE (me) for validation and at this point I could've re-assigned it back to devel (you), a? Thx YuriW - Original Message - From: Loic Dachary l...@dachary.org To: Yuri Weinstein ywein...@redhat.com Cc: Ceph Development ceph-devel@vger.kernel.org Sent: Tuesday, February 10, 2015 9:05:31 AM Subject: dumpling integration branch for v0.67.12 ready for QE Hi Yuri, The dumpling integration branch for v0.67.12 as found at https://github.com/ceph/ceph/commits/dumpling-backports has been approved by Yehuda, Josh and Sam and is ready for QE. For the record, the head is https://github.com/ceph/ceph/commit/3944c77c404c4a05886fe8276d5d0dd7e4f20410 I think it would be best for the QE tests to use the dumpling-backports. The alternative would be to merge dumpling-backports into dumpling. However, since testing may take a long time
Re: giant integration branch for v0.87.1 ready for QE
Team leads, Please review QE validation results summary in http://tracker.ceph.com/issues/10501 Loic - this RC looks ready for release (in my opinion) ! Thx YuriW - Original Message - From: Yuri Weinstein ywein...@redhat.com To: Loic Dachary l...@dachary.org Cc: Ceph Development ceph-devel@vger.kernel.org Sent: Monday, February 16, 2015 9:33:44 AM Subject: Re: giant integration branch for v0.87.1 ready for QE I have completed suites execution on giant branch (v0.87.1 RC) All results are summarized in http://tracker.ceph.com/issues/10501 under QE VALIDATION section. Some suites had to be run more than once due to environment noise. Two suites are being re-run now - upgrade:firefly-x and powercycle. Next steps: - the team leads to review/confirm results - Loic - can you review and triage issues as needed. - two suites require results analysis: multimds rados (two known tickets, but need more checking) ## 10209, 9891 krbd (two new tickets, but need more checking) ## 10889, 10890 Thx YuriW - Original Message - From: Loic Dachary l...@dachary.org To: Yuri Weinstein ywein...@redhat.com Cc: Ceph Development ceph-devel@vger.kernel.org Sent: Wednesday, February 11, 2015 7:30:06 AM Subject: Re: giant integration branch for v0.87.1 ready for QE Hi Yuri, The giant-backports pull requests were merged into https://github.com/ceph/ceph/tree/giant which is not ready for testing. For the record, the head is https://github.com/ceph/ceph/commit/78c71b9200da5e7d832ec58765478404d31ae6b5 Cheers On 10/02/2015 18:20, Loic Dachary wrote: Hi Yuri, The giant integration branch for v0.87.1 as found at https://github.com/ceph/ceph/commits/giant-backports has been approved by Yehuda, Josh and Sam and is ready for QE. 
For the record, the head is https://github.com/ceph/ceph/commit/6b08a729540c61f3c8b15c5a3ce9382634bf800c Cheers -- Loïc Dachary, Artisan Logiciel Libre -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Clocks out of sync
David, We had an ntp server issue not too long ago, could be the same or new http://tracker.ceph.com/issues/10675 Thx YuriW - Original Message - From: David Zafman dzaf...@redhat.com To: ceph-devel@vger.kernel.org Sent: Friday, February 20, 2015 3:08:29 PM Subject: Clocks out of sync On 2 of my rados thrash runs, clocks went out of sync. Is this an occasional issue or did we have an infrastructure problem? On burnupi19 and burnupi25: 2015-02-20 12:52:52.636017 mon.1 10.214.134.14:6789/0 177 : cluster [WRN] message from mon.0 was stamped 0.501458s in the future, clocks not synchronized On plana62 and plana64: 2015-02-20 10:00:56.842533 mon.0 10.214.132.14:6789/0 3 : cluster [WRN] message from mon.1 was stamped 0.855106s in the future, clocks not synchronized -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
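As a quick triage aid, something like the following (an illustrative sketch, not our actual tooling) extracts the reported skew from such a [WRN] line and flags it against 0.05s, which I believe is the monitor's default mon_clock_drift_allowed - verify against your cluster config:

```shell
# Sample [WRN] line taken from the report above.
line='2015-02-20 12:52:52.636017 mon.1 10.214.134.14:6789/0 177 : cluster [WRN] message from mon.0 was stamped 0.501458s in the future, clocks not synchronized'
# Pull out the "stamped <N>s" offset, then compare against the 0.05s drift threshold.
offset=$(printf '%s\n' "$line" | grep -o 'stamped [0-9.]*s' | grep -o '[0-9.]*')
awk -v o="$offset" 'BEGIN { exit !(o > 0.05) }' && echo "skew ${offset}s exceeds allowed drift"
```

On the affected hosts themselves, `ntpq -p` would show whether ntpd lost its peers.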
Re: Disk failing plana74
I ran smart and it came back good, hmm ubuntu@plana74:~$ /usr/libexec/smart.pl All 4 drives happy as clams Thx YuriW - Original Message - From: David Zafman dzaf...@redhat.com To: Sandon Van Ness svann...@redhat.com Cc: ceph-devel@vger.kernel.org Sent: Friday, February 20, 2015 1:10:48 PM Subject: Disk failing plana74 A recent test run had an EIO on the following disk: plana74 /dev/sdb The machine is locked right now. David Zafman Senior Developer -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dumpling integration branch for v0.67.12 ready for QE
Hi all I updated all issues in http://tracker.ceph.com/issues/10560 Based on what is listed there, we have http://tracker.ceph.com/issues/10801 - Yehuda pls comment http://tracker.ceph.com/issues/10694 - Sam pls re-confirm rbd - Josh, I understood that we are good to go, pls re-confirm. I can re-run some suites if you'd like and we can make a call on this release. Loic - back to you, let me know what you think. Thx YuriW - Original Message - From: Loic Dachary l...@dachary.org To: Yuri Weinstein ywein...@redhat.com Cc: Ceph Development ceph-devel@vger.kernel.org, Sage Weil s...@redhat.com, Tamil Muthamizhan tmuth...@redhat.com, Zack Cerza z...@redhat.com, Sandon Van Ness svann...@redhat.com Sent: Thursday, February 12, 2015 2:17:49 PM Subject: Re: dumpling integration branch for v0.67.12 ready for QE On 12/02/2015 23:06, Yuri Weinstein wrote: I linked all issues related to this release testing to the ticket http://tracker.ceph.com/issues/10560 After the team leads make a call on those, including environment issues, I suggest re-running the suites that failed. Loic, I'd re-run them in the Octo, since we already started there, if you agree ? Sure :-) Thx YuriW - Original Message - From: Yuri Weinstein ywein...@redhat.com To: Loic Dachary l...@dachary.org Cc: Ceph Development ceph-devel@vger.kernel.org, Sage Weil s...@redhat.com, Tamil Muthamizhan tmuth...@redhat.com Sent: Wednesday, February 11, 2015 2:24:33 PM Subject: Re: dumpling integration branch for v0.67.12 ready for QE I replied to individual suite runs, but just wanted to summarize QE validation status. The following suites were executed in the Octo lab (we will use Sepia in the future if nobody objects). upgrade:dumpling ['45493'] http://tracker.ceph.com/issues/10694 - Known Won't fix Assertion: osd/Watch.cc: 290: FAILED assert(!cb) *** Sam - pls confirm the Won't fix status. 
['45495', '45496', '45498', '45499', '45500'] http://tracker.ceph.com/issues/10838 s3tests failed *** Yehuda - need your verdict on s3tests. fs All green ! rados ['45054'] http://tracker.ceph.com/issues/10841 Issued certificate has expired *** Sandon pls comment. ['45168', '45169'] http://tracker.ceph.com/issues/10840 coredump ceph_test_filestore_idempotent_sequence *** Sam - pls comment ['45215'] Missing packages - no ticket FYI Failed to fetch http://apt-mirror.front.sepia.ceph.com/archive.ubuntu.com/ubuntu/dists/trusty-updates/universe/binary-i386/Packages Hash Sum mismatch *** Zack, Sandon ? ceph-deploy Travis - pls suggest In general I am not sure if we needed to test this - Sage? rbd ['45365', '45366', '45367'] http://tracker.ceph.com/issues/10842 unable to connect to apt-mirror.front.sepia.ceph.com ['45349', '45350', '45351', '45355', '45356', '45357', '45363'] http://tracker.ceph.com/issues/10802 error: image still has watchers (duplicate of 10680) *** Zack, Sandon, Josh - all environment noise, pls comment. rgw ['45382', '45390'] http://tracker.ceph.com/issues/10843 s3tests failed - could be related or duplicate of 10838 *** Yehuda - same as issues in upgrades? I am standing by for your analysis/replies and recommendations for next steps. Loic - let me know if you want to follow specific items in our backport testing process that I missed here. PS: I would think that you could've wanted to assign the release ticket to QE (me) for validation and at this point I could've re-assigned it back to devel (you), a? Thx YuriW - Original Message - From: Loic Dachary l...@dachary.org To: Yuri Weinstein ywein...@redhat.com Cc: Ceph Development ceph-devel@vger.kernel.org Sent: Tuesday, February 10, 2015 9:05:31 AM Subject: dumpling integration branch for v0.67.12 ready for QE Hi Yuri, The dumpling integration branch for v0.67.12 as found at https://github.com/ceph/ceph/commits/dumpling-backports has been approved by Yehuda, Josh and Sam and is ready for QE. 
For the record, the head is https://github.com/ceph/ceph/commit/3944c77c404c4a05886fe8276d5d0dd7e4f20410 I think it would be best for the QE tests to use the dumpling-backports. The alternative would be to merge dumpling-backports into dumpling. However, since testing may take a long time and require more patches, it probably is better to not do that iterative process on the dumpling branch itself. As it is now, there already are a number of commits in the dumpling branch that should really be in the dumpling-backports: they do not belong to v0.67.11 and are going to be released in v0.67.12. In the future though, the dumpling branch will only receive commits that have been carefully tested and all the integration work will be on the dumpling-backports branch exclusively. So that third parties do not have
Re: giant integration branch for v0.87.1 ready for QE
I have completed suites execution on giant branch (v0.87.1 RC) All results are summarized in http://tracker.ceph.com/issues/10501 under QE VALIDATION section. Some suites had to be run more than once due to environment noise. Two suites are being re-run now - upgrade:firefly-x and powercycle. Next steps: - the team leads to review/confirm results - Loic - can you review and triage issues as needed. - two suites require results analysis: multimds rados (two known tickets, but need more checking) ## 10209, 9891 krbd (two new tickets, but need more checking) ## 10889, 10890 Thx YuriW - Original Message - From: Loic Dachary l...@dachary.org To: Yuri Weinstein ywein...@redhat.com Cc: Ceph Development ceph-devel@vger.kernel.org Sent: Wednesday, February 11, 2015 7:30:06 AM Subject: Re: giant integration branch for v0.87.1 ready for QE Hi Yuri, The giant-backports pull requests were merged into https://github.com/ceph/ceph/tree/giant which is not ready for testing. For the record, the head is https://github.com/ceph/ceph/commit/78c71b9200da5e7d832ec58765478404d31ae6b5 Cheers On 10/02/2015 18:20, Loic Dachary wrote: Hi Yuri, The giant integration branch for v0.87.1 as found at https://github.com/ceph/ceph/commits/giant-backports has been approved by Yehuda, Josh and Sam and is ready for QE. For the record, the head is https://github.com/ceph/ceph/commit/6b08a729540c61f3c8b15c5a3ce9382634bf800c Cheers -- Loïc Dachary, Artisan Logiciel Libre -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: giant integration branch for v0.87.1 ready for QE
Greg, thx, so noted. Thx YuriW - Original Message - From: Gregory Farnum g...@gregs42.com To: Yuri Weinstein ywein...@redhat.com, Loic Dachary l...@dachary.org Cc: Ceph Development ceph-devel@vger.kernel.org Sent: Monday, February 16, 2015 10:44:35 AM Subject: Re: giant integration branch for v0.87.1 ready for QE The multimds suite has never passed and is strictly informational at this point. You shouldn't worry about it. (We use it only to make sure we don't completely break multimds systems, but we don't expect it to pass. It's just nice to have a rough idea how far off we are.) -Greg On Mon, Feb 16, 2015 at 9:34 AM Yuri Weinstein ywein...@redhat.com wrote: I have completed suites execution on giant branch (v0.87.1 RC) All results are summarized in http://tracker.ceph.com/issues/10501 under QE VALIDATION section. Some suites had to be run more than once due to environment noise. Two suites are being re-run now - upgrade:firefly-x and powercycle. Next steps: - the team leads to review/confirm results - Loic - can you review and triage issues as needed. - two suites require results analysis: multimds rados (two known tickets, but need more checking) ## 10209, 9891 krbd (two new tickets, but need more checking) ## 10889, 10890 Thx YuriW - Original Message - From: Loic Dachary l...@dachary.org To: Yuri Weinstein ywein...@redhat.com Cc: Ceph Development ceph-devel@vger.kernel.org Sent: Wednesday, February 11, 2015 7:30:06 AM Subject: Re: giant integration branch for v0.87.1 ready for QE Hi Yuri, The giant-backports pull requests were merged into https://github.com/ceph/ceph/tree/giant which is not ready for testing. For the record, the head is https://github.com/ceph/ceph/commit/78c71b9200da5e7d832ec58765478404d31ae6b5 Cheers On 10/02/2015 18:20, Loic Dachary wrote: Hi Yuri, The giant integration branch for v0.87.1 as found at https://github.com/ceph/ceph/commits/giant-backports has been approved by Yehuda, Josh and Sam and is ready for QE. 
For the record, the head is https://github.com/ceph/ceph/commit/6b08a729540c61f3c8b15c5a3ce9382634bf800c Cheers -- Loïc Dachary, Artisan Logiciel Libre -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Ceph-qa] 1 hung, 11 passed in teuthology-2015-02-11_16:13:01-samba-giant-distro-basic-multi
Yeah. Well, the last run alone isn't so important; we want to see a string of clean runs because a lot of issues aren't reproduced in every run. My hope was that we can see all green results for say this giant release/backport, but I agree that we would need to make our go/no-go decision based on multiple run results, as I am not sure if we can get them all green due to complexity, time needed to execute, environment state etc.. We could though modify our process a bit: 1. after backport-branch is ready for QE, merge it to the named branch (say 'giant' in this example) - that is what we did now 2. cut a release numbered branch (maybe it's a tag, not sure), say v0.87.1 3. run all QE suites on v0.87.1 and get it to all passed state 4. make sure that commits to v0.87.1 are committed to the named branch ('giant') #2 is what we have not done this time. Thx YuriW - Original Message - From: Gregory Farnum g...@gregs42.com To: Loic Dachary l...@dachary.org Cc: Ceph Development ceph-devel@vger.kernel.org Sent: Friday, February 13, 2015 11:56:18 PM Subject: Re: [Ceph-qa] 1 hung, 11 passed in teuthology-2015-02-11_16:13:01-samba-giant-distro-basic-multi On Fri, Feb 13, 2015 at 10:34 PM, Loic Dachary l...@dachary.org wrote: Hi Greg, I'm curious to know how you handle the flow of mails from QA runs. Here is a wild guess: * from time to time check that the nightlies run the suites that should be run Uh, I guess? * read the ceph-qa reports daily Yeah * for each failed job, either relate it to an issue or create one or declare it noise Yeah * if a job fails on an existing ticket store a link to the job if it's a rare occurrence and the cause is not yet known Yeah, or just to make clear it's still happening or whatever * bi-weekly bug scrub makes sure no issue, old or new, is forgotten Hopefully! 
* at release time you decide that it is ready based on: ** the list of urgent/immediate issues that you can browse to ensure no issue is a blocker ** the last run of each suite to ensure they are recent enough and environmental noise did not permanently shadow anything Yeah. Well, the last run alone isn't so important; we want to see a string of clean runs because a lot of issues aren't reproduced in every run. -Greg -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Ceph-qa] 1 hung, 11 passed in teuthology-2015-02-11_16:13:01-samba-giant-distro-basic-multi
Loic, +1 - I like the way you're discussing: v0.87.1-rc2 v0.87.1-rcX = v0.87.1 - is it easy to make this look like this after the validation is completed? BTW: When I re-run suites now for validation I use -s named_branch arg in the command line. Maybe I should be using SHA ref instead? I never tried this way, but guessing it should work, what do you think? Thx YuriW - Original Message - From: Loic Dachary l...@dachary.org To: Yuri Weinstein ywein...@redhat.com Cc: Ceph Development ceph-devel@vger.kernel.org Sent: Saturday, February 14, 2015 2:12:05 PM Subject: Re: [Ceph-qa] 1 hung, 11 passed in teuthology-2015-02-11_16:13:01-samba-giant-distro-basic-multi On 14/02/2015 22:53, Loic Dachary wrote: Hi Yuri, On 14/02/2015 17:22, Yuri Weinstein wrote: Yeah. Well, the last run alone isn't so important; we want to see a string of clean runs because a lot of issues aren't reproduced in every run. My hope was that we can see all green results for say this giant release/backport, but I agree that we would need to make our go/no-go decision based on multiple run results, as I am not sure if we can get them all green due to complexity, time needed to execute, environment state etc.. We could though modify our process a bit: 1. after backport-branch is ready for QE, merge it to the named branch (say 'giant' in this example) - that is what we did now 2. cut a release numbered branch (maybe it's a tag, not sure), say v0.87.1 3. run all QE suites on v0.87.1 and get it to all passed state 4. make sure that commits to v0.87.1 are committed to the named branch ('giant') That makes sense to me, only with s/v0.87.1/78c71b9200da5e7d832ec58765478404d31ae6b5/. #2 is what we have not done this time. We have not done #2 but we have cut the branch at given SHA ( 78c71b9200da5e7d832ec58765478404d31ae6b5 ) instead, which can be referenced by a tag if and when it is released. 
In the mail Re: giant integration branch for v0.87.1 ready for QE dated 11th February 2015 I wrote: The giant-backports pull requests were merged into https://github.com/ceph/ceph/tree/giant which is not ready for testing. For the record, the head is https://github.com/ceph/ceph/commit/78c71b9200da5e7d832ec58765478404d31ae6b5 We cannot add a v0.87.1 tag to the branch before the release process is complete because we won't be able to change it afterwards (people rely on the fact that the history of the giant branch is not rewritten and that tag references are not changed). If during the QE test process we discover that a backport must be included (I'm thinking about https://github.com/ceph/ceph/pull/3731 for instance), 78c71b9200da5e7d832ec58765478404d31ae6b5 won't be v0.87.1 after all. In a nutshell I think we're having the same view of the process, modulo the timing of the tagging of the release. We could also have tags like: v0.87.1-rc1 = 78c71b9200da5e7d832ec58765478404d31ae6b5 v0.87.1-rc2 = whatever SHA includes more backports and if v0.87.1-rc2 turns out to be good, the release notes and other non-code changes could be committed. This naming scheme is common; is there a downside to it? It's easier to talk about v0.87.1-rc1 rather than 78c71b9200da5e7d832ec58765478404d31ae6b5 ;-) Cheers Cheers Thx YuriW - Original Message - From: Gregory Farnum g...@gregs42.com To: Loic Dachary l...@dachary.org Cc: Ceph Development ceph-devel@vger.kernel.org Sent: Friday, February 13, 2015 11:56:18 PM Subject: Re: [Ceph-qa] 1 hung, 11 passed in teuthology-2015-02-11_16:13:01-samba-giant-distro-basic-multi On Fri, Feb 13, 2015 at 10:34 PM, Loic Dachary l...@dachary.org wrote: Hi Greg, I'm curious to know how you handle the flow of mails from QA runs. Here is a wild guess: * from time to time check that the nightlies run the suites that should be run Uh, I guess? 
* read the ceph-qa reports daily Yeah * for each failed job, either relate it to an issue or create one or declare it noise Yeah * if a job fails on an existing ticket store a link to the job if it's rare occurrence and the cause is not yet known Yeah, or just to make clear it's still happening or whatever * bi-weekly bug scrub makes sure no issue, old or new, is forgotten Hopefully! * at release time you decide that it is ready based on: ** the list of urgent/immediate issues that you can browse to ensure no issue is a blocker ** the last run of each suite to ensure they are recent enough and environmental noise did not permanently shadow anything Yeah. Well, the last run alone isn't so important; we want to see a string of clean runs because a lot of issues aren't reproduced in every run. -Greg -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- Loïc Dachary, Artisan Logiciel Libre
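The rc-tag scheme discussed above can be sketched in a throwaway repo; everything here is a placeholder (empty commits stand in for the real backports, and the tag names just mirror the example):

```shell
set -e
# Build a disposable repo: rc tags pin candidate SHAs and can be superseded
# by -rc2, -rc3, ...; the final tag is created only once, after validation,
# so published tags are never moved.
repo=$(mktemp -d) && cd "$repo" && git init -q .
gitc() { git -c user.name=qa -c user.email=qa@example.com "$@"; }
gitc commit -q --allow-empty -m 'giant-backports merged'
git tag v0.87.1-rc1                # candidate; re-cut if more backports land
gitc commit -q --allow-empty -m 'release notes'
git tag v0.87.1                    # final tag, cut only after QE signs off
git tag --list 'v0.87.1*'          # shows both tags
```

The key property is the one Loic describes: the rc tags are cheap to abandon, while v0.87.1 itself only ever points at the validated commit.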
Re: giant integration branch for v0.87.1 ready for QE
Loic Just to double check - giant is *ready* for testing? (you said below "which is not ready for testing" - maybe you wanted to say *now*?) Thx YuriW - Original Message - From: Loic Dachary l...@dachary.org To: Yuri Weinstein ywein...@redhat.com Cc: Ceph Development ceph-devel@vger.kernel.org Sent: Wednesday, February 11, 2015 7:30:06 AM Subject: Re: giant integration branch for v0.87.1 ready for QE Hi Yuri, The giant-backports pull requests were merged into https://github.com/ceph/ceph/tree/giant which is not ready for testing. For the record, the head is https://github.com/ceph/ceph/commit/78c71b9200da5e7d832ec58765478404d31ae6b5 Cheers On 10/02/2015 18:20, Loic Dachary wrote: Hi Yuri, The giant integration branch for v0.87.1 as found at https://github.com/ceph/ceph/commits/giant-backports has been approved by Yehuda, Josh and Sam and is ready for QE. For the record, the head is https://github.com/ceph/ceph/commit/6b08a729540c61f3c8b15c5a3ce9382634bf800c Cheers -- Loïc Dachary, Artisan Logiciel Libre -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dumpling integration branch for v0.67.12 ready for QE
I replied to individual suite runs, but just wanted to summarize QE validation status. The following suites were executed in the Octo lab (we will use Sepia in the future if nobody objects). upgrade:dumpling ['45493'] http://tracker.ceph.com/issues/10694 - Known Won't fix Assertion: osd/Watch.cc: 290: FAILED assert(!cb) *** Sam - pls confirm the Won't fix status. ['45495', '45496', '45498', '45499', '45500'] http://tracker.ceph.com/issues/10838 s3tests failed *** Yehuda - need your verdict on s3tests. fs All green ! rados ['45054'] http://tracker.ceph.com/issues/10841 Issued certificate has expired *** Sandon pls comment. ['45168', '45169'] http://tracker.ceph.com/issues/10840 coredump ceph_test_filestore_idempotent_sequence *** Sam - pls comment ['45215'] Missing packages - no ticket FYI Failed to fetch http://apt-mirror.front.sepia.ceph.com/archive.ubuntu.com/ubuntu/dists/trusty-updates/universe/binary-i386/Packages Hash Sum mismatch *** Zack, Sandon ? ceph-deploy Travis - pls suggest In general I am not sure if we needed to test this - Sage? rbd ['45365', '45366', '45367'] http://tracker.ceph.com/issues/10842 unable to connect to apt-mirror.front.sepia.ceph.com ['45349', '45350', '45351', '45355', '45356', '45357', '45363'] http://tracker.ceph.com/issues/10802 error: image still has watchers (duplicate of 10680) *** Zack, Sandon, Josh - all environment noise, pls comment. rgw ['45382', '45390'] http://tracker.ceph.com/issues/10843 s3tests failed - could be related or duplicate of 10838 *** Yehuda - same as issues in upgrades? I am standing by for your analysis/replies and recommendations for next steps. Loic - let me know if you want to follow specific items in our backport testing process that I missed here. PS: I would think that you could've wanted to assign the release ticket to QE (me) for validation and at this point I could've re-assigned it back to devel (you), a? 
Thx YuriW - Original Message - From: Loic Dachary l...@dachary.org To: Yuri Weinstein ywein...@redhat.com Cc: Ceph Development ceph-devel@vger.kernel.org Sent: Tuesday, February 10, 2015 9:05:31 AM Subject: dumpling integration branch for v0.67.12 ready for QE Hi Yuri, The dumpling integration branch for v0.67.12 as found at https://github.com/ceph/ceph/commits/dumpling-backports has been approved by Yehuda, Josh and Sam and is ready for QE. For the record, the head is https://github.com/ceph/ceph/commit/3944c77c404c4a05886fe8276d5d0dd7e4f20410 I think it would be best for the QE tests to use the dumpling-backports. The alternative would be to merge dumpling-backports into dumpling. However, since testing may take a long time and require more patches, it probably is better to not do that iterative process on the dumpling branch itself. As it is now, there already are a number of commits in the dumpling branch that should really be in the dumpling-backports: they do not belong to v0.67.11 and are going to be released in v0.67.12. In the future though, the dumpling branch will only receive commits that have been carefully tested and all the integration work will be on the dumpling-backports branch exclusively. So that third parties do not have to worry that the dumpling branch contains commits that have not been tested yet. Cheers -- Loïc Dachary, Artisan Logiciel Libre -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: giant integration branch for v0.87.1 ready for QE
I am planning to run a complete set of tests for this release in Sepia. Will temporarily disable all *giant* suites in crontab before 4 pm today and schedule all suites to run with high priority. Pls let me know if you have concerns or have emergency need for resources in Sepia lab. Thx YuriW - Original Message - From: Loic Dachary l...@dachary.org To: Yuri Weinstein ywein...@redhat.com Cc: Ceph Development ceph-devel@vger.kernel.org Sent: Wednesday, February 11, 2015 7:30:06 AM Subject: Re: giant integration branch for v0.87.1 ready for QE Hi Yuri, The giant-backports pull requests were merged into https://github.com/ceph/ceph/tree/giant which is not ready for testing. For the record, the head is https://github.com/ceph/ceph/commit/78c71b9200da5e7d832ec58765478404d31ae6b5 Cheers On 10/02/2015 18:20, Loic Dachary wrote: Hi Yuri, The giant integration branch for v0.87.1 as found at https://github.com/ceph/ceph/commits/giant-backports has been approved by Yehuda, Josh and Sam and is ready for QE. For the record, the head is https://github.com/ceph/ceph/commit/6b08a729540c61f3c8b15c5a3ce9382634bf800c Cheers -- Loïc Dachary, Artisan Logiciel Libre -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
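Disabling the *giant* schedules amounts to commenting them out of the teuthology crontab; here is an illustrative sketch with made-up entries (the real nightly schedule and commands will differ):

```shell
# Made-up stand-ins for the real nightly crontab entries.
cron='0 2 * * * teuthology-suite -s rados -c giant
0 3 * * * teuthology-suite -s rados -c firefly'
# Comment out every line mentioning giant; everything else is untouched.
printf '%s\n' "$cron" | sed '/giant/s/^/#/'
```

Against the live schedule this would be roughly `crontab -l | sed '/giant/s/^/#/' | crontab -`, after saving a copy of the original so the entries can be restored afterwards.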
Re: dumpling giant backports update
Hi Loic I was thinking that as soon as one of the branches is declared ready we will merge *-backports with the main branch and execute the appropriate set of suites as we run them in the Octo lab for released branches. For dumpling: rados rbd rgw fs ceph-deploy upgrade/dumpling For giant: rados rbd rgw fs krbd kcephfs knfs hadoop samba rest multimds multi-version upgrade/giant powercycle Not sure if we need to run more tests, e.g. giant-x kind of suites for upgrades. Do you agree with this plan? Thx YuriW - Original Message - From: Loic Dachary l...@dachary.org To: Yuri Weinstein yuri.weinst...@inktank.com Cc: Ceph Development ceph-devel@vger.kernel.org Sent: Tuesday, February 10, 2015 7:25:55 AM Subject: dumpling giant backports update Hi Yuri, The dumpling integration branch https://github.com/ceph/ceph/commits/dumpling-backports is ready for Josh and Sam and we are expecting approval from Yehuda (the details are here http://tracker.ceph.com/issues/10560). The giant integration branch https://github.com/ceph/ceph/commits/giant-backports is ready for Sam and we are expecting approval from Josh and Yehuda (the details are here http://tracker.ceph.com/issues/10501). It is likely that we get approval for one branch or the other in the next 48h. When we do, I assume you will be conducting your own round of testing, using the dumpling-backports and/or giant-backports branch. Do you have a list of suites you plan to run already ? I'm also curious to understand who will be analyzing the results and ultimately declare that it is ready to be released. Cheers -- Loïc Dachary, Artisan Logiciel Libre -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
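For reference, the dumpling list above could be scheduled in one loop; this sketch only prints the commands rather than submitting them, and the teuthology-suite flags (-s for the suite, -c for the ceph branch) are as I remember them from that era's CLI and may not match the current one:

```shell
# Suites for the dumpling release, as listed in the plan above.
dumpling_suites='rados rbd rgw fs ceph-deploy upgrade/dumpling'
# Print (rather than run) one hypothetical scheduling command per suite.
for s in $dumpling_suites; do
    echo teuthology-suite -s "$s" -c dumpling
done
```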
Re: dumpling integration branch for v0.67.12 ready for QE
Loic, The only difference between options if we run suites on merged dumpling vs dumpling-backports first - is time. We will have to run suites on the final branch after the merge anyway. Unless I hear otherwise, I will schedule suites to run on dumpling-backports first (as you are suggesting, with higher priority) and then assuming that we resolved all issues, we will run on the merged dumpling. Sage, pls correct if this is not what has to be done. Thx YuriW - Original Message - From: Loic Dachary l...@dachary.org To: Yuri Weinstein ywein...@redhat.com Cc: Ceph Development ceph-devel@vger.kernel.org Sent: Tuesday, February 10, 2015 9:05:31 AM Subject: dumpling integration branch for v0.67.12 ready for QE Hi Yuri, The dumpling integration branch for v0.67.12 as found at https://github.com/ceph/ceph/commits/dumpling-backports has been approved by Yehuda, Josh and Sam and is ready for QE. For the record, the head is https://github.com/ceph/ceph/commit/3944c77c404c4a05886fe8276d5d0dd7e4f20410 I think it would be best for the QE tests to use the dumpling-backports. The alternative would be to merge dumpling-backports into dumpling. However, since testing may take a long time and require more patches, it probably is better to not do that iterative process on the dumpling branch itself. As it is now, there already are a number of commits in the dumpling branch that should really be in the dumpling-backports: they do not belong to v0.67.11 and are going to be released in v0.67.12. In the future though, the dumpling branch will only receive commits that have been carefully tested and all the integration work will be on the dumpling-backports branch exclusively. So that third parties do not have to worry that the dumpling branch contains commits that have not been tested yet. 
Cheers -- Loïc Dachary, Artisan Logiciel Libre -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dumpling integration branch for v0.67.12 ready for QE
On 10/02/2015 18:19, Yuri Weinstein wrote: Loic, The only difference between options if we run suites on merged dumpling vs dumpling-backports first - is time. We will have to run suites on the final branch after the merge anyway. Could you explain why ? After merging dumpling and dumpling-backports will be exactly the same. Loic - I feel that final QE validation should be done on the code that gets actually released to customers, e.g. dumpling branch after the merge. I do see your point about branches being identical and ready to change my mind if you insist. Does my reasoning make sense? Please advise, how we should proceed. Unless I hear otherwise, I will schedule suites to run on dumpling-backports first (as you are suggesting, with higher priority) and then assuming that we resolved all issues, we will run on the merged dumpling. Sage, pls correct if this is not what has to be done. Thx YuriW - Original Message - From: Loic Dachary l...@dachary.org To: Yuri Weinstein ywein...@redhat.com Cc: Ceph Development ceph-devel@vger.kernel.org Sent: Tuesday, February 10, 2015 9:05:31 AM Subject: dumpling integration branch for v0.67.12 ready for QE Hi Yuri, The dumpling integration branch for v0.67.12 as found at https://github.com/ceph/ceph/commits/dumpling-backports has been approved by Yehuda, Josh and Sam and is ready for QE. For the record, the head is https://github.com/ceph/ceph/commit/3944c77c404c4a05886fe8276d5d0dd7e4f20410 I think it would be best for the QE tests to use the dumpling-backports. The alternative would be to merge dumpling-backports into dumpling. However, since testing may take a long time and require more patches, it probably is better to not do that iterative process on the dumpling branch itself. As it is now, there already are a number of commits in the dumpling branch that should really be in the dumpling-backports: they do not belong to v0.67.11 and are going to be released in v0.67.12. 
In the future though, the dumpling branch will only receive commits that have been carefully tested, and all the integration work will happen on the dumpling-backports branch exclusively, so that third parties do not have to worry that the dumpling branch contains commits that have not been tested yet. Cheers -- Loïc Dachary, Artisan Logiciel Libre -- To unsubscribe from this list: send the line unsubscribe ceph-devel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: dumpling integration branch for v0.67.12 ready for QE
Great! As soon as it's merged I will schedule suites to run as listed somewhere below ... dumpling with higher priority and then giant. Thx YuriW - Original Message - From: Loic Dachary l...@dachary.org To: Sage Weil s...@newdream.net, Gregory Farnum g...@gregs42.com Cc: Yuri Weinstein ywein...@redhat.com, Ceph Development ceph-devel@vger.kernel.org Sent: Tuesday, February 10, 2015 11:06:43 AM Subject: Re: dumpling integration branch for v0.67.12 ready for QE Hi, That's too much information for me to digest quickly. Instead of stalling I will go ahead and merge the dumpling pull requests into the dumpling branch so that Yuri can proceed. And I'll take time to revise my understanding of the backport workflow with your input. Cheers On 10/02/2015 19:37, Sage Weil wrote: On Tue, 10 Feb 2015, Gregory Farnum wrote: On Tue, Feb 10, 2015 at 10:04 AM, Loic Dachary l...@dachary.org wrote: On 10/02/2015 18:29, Yuri Weinstein wrote: On 10/02/2015 18:19, Yuri Weinstein wrote: Loic, The only difference between the two options - running suites on the merged dumpling vs. dumpling-backports first - is time. We will have to run suites on the final branch after the merge anyway. Could you explain why? After merging, dumpling and dumpling-backports will be exactly the same. Loic - I feel that final QE validation should be done on the code that actually gets released to customers, i.e. the dumpling branch after the merge. I do see your point about the branches being identical and am ready to change my mind if you insist. Does my reasoning make sense? Please advise how we should proceed. The dumpling-backports branch currently is at https://github.com/ceph/ceph/commit/3944c77c404c4a05886fe8276d5d0dd7e4f20410 After a successful test run from QE and a merge into dumpling, the dumpling branch will be at https://github.com/ceph/ceph/commit/3944c77c404c4a05886fe8276d5d0dd7e4f20410 as well. In other words they are identical and there is no point in running the tests again.
The only reason why they could be different is if a commit is inadvertently added to the dumpling branch while testing happens on the dumpling-backports branch. In this case the presence of this new commit would indeed be reason enough to run another round of tests. So the process could be: If tests are ok and the merge can fast-forward, then release. If tests are ok and the merge cannot fast-forward, send it back to Loic because a commit was added by accident and needs to be approved by the leads. If testing happens on the dumpling branch, adding a commit to the dumpling branch would have side effects that could taint the results of the tests or, even worse, go unnoticed. There is zero chance that someone adds a commit to the dumpling-backports branch, and that gives us something stable. On the contrary, the odds that someone adds a commit to the dumpling branch are high, especially if the tests take a few weeks to complete. As Greg mentioned, merging into dumpling does not matter much for this round because it is what has been done in the past. And to be honest, I would not mind if an additional commit taints the process by accident. However, unless there is a reason not to, it would be good to establish a process that is solid if we can. I've witnessed Alfredo's pain on each release, and an additional benefit of having a dumpling-backports branch that nobody tampers with just occurred to me. When and if QE finds that dumpling-backports is fit for release, instead of merging it into dumpling it could be handed over to Alfredo for release. And he would be able to proceed knowing it is stable and won't be moving forward. Once the release is done and the tag set to the proper commit, the dumpling branch can be reset to dumpling-backports. If commits were added during the process, their authors could be notified that they were discarded and need to be merged again.
That would not work for the master branch but it would definitely be possible for the stable branches because such out-of-process commits are rarely added. I've not thought this through, but the more I think about it the more I like the idea of using dumpling-backports as a staging area until the release is final. What's the purpose of even having a dumpling branch at that point? We're not using it for anything under your model. Yeah, it seems to me like the same general process we use for 'next' and 'master' would work here:
- prepare a batch of backports, say dumpling-rgw-next
- run it through the rgw suite
- if that is okay, merge to dumpling
- run regular tests on dumpling (all suites)
so that dumpling acts as an integration branch the same way the others do. This is reasonably lightweight on process and means that our periodic scheduled runs are doing double duty for the integration testing and catching long-tail bugs. After talking through the last release vs 'next' branch race
QE validation of dumpling and giant releases
I am planning to schedule suites with high priority in the Octo lab and temporarily disable the schedule in crontab today until the validations are finished. Please let me know if you have any concerns about this. Thx YuriW
Re: dumpling giant backports update
Loic, Thanks for the updates! YuriW On 2/2/15 3:09 PM, Loic Dachary wrote: Hi Yuri, There is one remaining issue in the dumpling backports (the details are here http://tracker.ceph.com/issues/10560). The giant integration branch has been updated today with all the pending pull requests (rgw in particular) and the rbd, rados and rgw suites scheduled (the details are here http://tracker.ceph.com/issues/10501). I'll analyze the results as soon as one of them finishes. The previous run was good and I'm hopeful the additional backports won't create unexpected difficulties. Cheers P.S. I moved the branches inventory to a wiki updatable via git to save the tedious copy / paste. They are here now: http://workbench.dachary.org/ceph/ceph-backports/wikis/pages
Re: upcoming dumpling v0.67.12
Loic, Here is the run from sepia http://pulpito.front.sepia.ceph.com/ubuntu-2015-01-26_09:26:27-upgrade:dumpling-dumpling-distro-basic-vps/ Two failures seem like env noise. Thx YuriW On Mon, Jan 26, 2015 at 9:49 AM, Loic Dachary l...@dachary.org wrote: Thanks for letting me know about the upgrade tests results, it's encouraging :-) I'll let you know when the tests make progress. On 26/01/2015 18:00, Yuri Weinstein wrote: Loic, Thanks for the update. I ran upgrade/dumpling last week (and all 42 jobs passed in octo and sepia) to establish a baseline. And today I am running another one, assuming it will pick up the already merged pull requests. Let me know when you're ready for next steps. Thx YuriW On Mon, Jan 26, 2015 at 7:37 AM, Loic Dachary l...@dachary.org wrote: Hi Yuri, Here is a short update on the progress of the upcoming dumpling v0.67.12. It is tracked with http://tracker.ceph.com/issues/10560. In the inventory part, there is a list of all pull requests that are already merged in the dumpling branch. There is only one pull request waiting to be merged and three issues waiting for backports. While these last three are being worked on, I started the rbd, rgw and rados suites. I chose to display the inventory by pull request because I figured it would be more convenient to read, since sometimes a single pull request spans multiple issues ( https://github.com/ceph/ceph/pull/2611 for instance fixes two issues ). Cheers -- Loïc Dachary, Artisan Logiciel Libre
Re: ceph-qa analysis output
Hi Loic, I'd love to. Let's chat on Monday to finalize the format. ( PS: I think the format would be easier to maintain with new lines enforced, e.g. '701575', '701576', '701582', '701590' http://tracker.ceph.com/issues/10543 FAILED assert(m_seed < old_pg_num) jobs,... issue RE description (this I suppose can be retrieved from the tracker, right? If yes, we may not need it at all) ) Would it be possible to feed the output of those machine-digested emails (and hopefully others) into this doc - https://docs.google.com/a/inktank.com/spreadsheets/d/1S01gkuA149U5XSLStuzEoh-14tK2ICGm5tsCUomuzTo/edit#gid=403616374 (I granted you access as it's still under development/review) ? PPS: Again, this ticket http://tracker.ceph.com/issues/10455 would be helpful in what we are discussing here. Thx YuriW On Sat, Jan 17, 2015 at 5:07 AM, Loic Dachary l...@dachary.org wrote: Hi Yuri, It would be great if the analysis you compile daily was machine readable. For instance, in a mail you sent to ceph-qa I read '701575', '701576', '701582', '701590' - known issue http://tracker.ceph.com/issues/10543 FAILED assert(m_seed < old_pg_num) (duplicate of http://tracker.ceph.com/issues/10430) which could be something like: '701575', '701576', '701582', '701590': http://tracker.ceph.com/issues/10543 FAILED assert(m_seed < old_pg_num) or any other format you find easier to use consistently. The reason I ask is because it would help me write a script that associates redmine tickets with your findings, in the context of backporting. It's just an idea, not a request ;-) Cheers -- Loïc Dachary, Artisan Logiciel Libre
Re: ceph-qa analysis output
Sounds good, Loic. Are you aware, BTW, of the scrape tool written by John Spray? https://github.com/jcsp/scrape I use it for test run analysis often. Just FYI Thx YuriW On Sat, Jan 17, 2015 at 3:27 PM, Loic Dachary l...@dachary.org wrote: On 17/01/2015 23:08, Yuri Weinstein wrote: Hi Loic, I'd love to. Let's chat on Monday to finalize the format. ( PS: I think the format would be easier to maintain with new lines enforced, e.g. '701575', '701576', '701582', '701590' http://tracker.ceph.com/issues/10543 FAILED assert(m_seed < old_pg_num) Yes, as long as it's consistent enough to be machine readable, that works :-) PPS: Again, this ticket http://tracker.ceph.com/issues/10455 would be helpful in what we are discussing here. I'm not sure where the output of such a parsing would go. The redmine update API is currently broken (the read API works ok) but if it was fixed the tickets could indeed be updated. Cheers Thx YuriW On Sat, Jan 17, 2015 at 5:07 AM, Loic Dachary l...@dachary.org wrote: Hi Yuri, It would be great if the analysis you compile daily was machine readable. For instance, in a mail you sent to ceph-qa I read '701575', '701576', '701582', '701590' - known issue http://tracker.ceph.com/issues/10543 FAILED assert(m_seed < old_pg_num) (duplicate of http://tracker.ceph.com/issues/10430) which could be something like: '701575', '701576', '701582', '701590': http://tracker.ceph.com/issues/10543 FAILED assert(m_seed < old_pg_num) or any other format you find easier to use consistently. The reason I ask is because it would help me write a script that associates redmine tickets with your findings, in the context of backporting. It's just an idea, not a request ;-) Cheers -- Loïc Dachary, Artisan Logiciel Libre
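The machine-readable line format Loic proposes above could be consumed by a short script along these lines. This is a sketch only: the exact format was still being finalized in this thread, so the regex below is an assumption rather than the format that was eventually adopted.

```python
import re

# Parse lines of the proposed shape:
#   '701575', '701576', ...: http://tracker.ceph.com/issues/NNNNN <description>
# into a mapping of teuthology job id -> (tracker URL, failure description).
LINE_RE = re.compile(r"^(?P<jobs>(?:'\d+',?\s*)+):\s*(?P<url>\S+)\s*(?P<desc>.*)$")

def parse_line(line):
    """Return {job_id: (tracker_url, description)} or None if the line
    does not match the expected format."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    jobs = re.findall(r"\d+", m.group("jobs"))
    return {job: (m.group("url"), m.group("desc")) for job in jobs}

line = ("'701575', '701576', '701582', '701590': "
        "http://tracker.ceph.com/issues/10543 FAILED assert(m_seed < old_pg_num)")
print(parse_line(line))
```

A script like this could then cross-reference the extracted tracker URLs with redmine tickets, which is the use case Loic describes.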
Re: Testing the next giant release
Look for them in the new Octo lab - http://pulpito.ceph.redhat.com/ On Wed, Jan 7, 2015 at 3:03 PM, Loic Dachary l...@dachary.org wrote: Thanks Yuri and Tamil! One last question: http://pulpito.ceph.com/?branch=giant does not show a run of rbd or rgw. They would be useful to figure out what kind of errors I should expect. Are past results archived elsewhere by any chance? On 07/01/2015 23:53, Tamil Muthamizhan wrote: yes, just those suites will do, Loic. On Wed, Jan 7, 2015 at 2:34 PM, Yuri Weinstein yuri.weinst...@inktank.com wrote: Loic, I think if you run those on bare metal (not vps) they will run on whatever machines are available in the octo or sepia labs. Thx YuriW On Wed, Jan 7, 2015 at 2:28 PM, Loic Dachary l...@dachary.org wrote: On 07/01/2015 23:20, Tamil Muthamizhan wrote: hi Loic, we have a suite to perform smoke tests for rados/rbd/rgw. maybe you can try that to make sure things work. Are these the suites I should try: https://github.com/ceph/ceph-qa-suite/tree/master/suites/rados https://github.com/ceph/ceph-qa-suite/tree/master/suites/rbd https://github.com/ceph/ceph-qa-suite/tree/master/suites/rgw any specific settings, or just all three without restrictions (i.e. all os versions etc.)? Cheers once it looks good, we can have them scheduled to run using teuthology for a more elaborate run. Thanks, Tamil On Wed, Jan 7, 2015 at 6:02 AM, Loic Dachary l...@dachary.org wrote: Hi Tamil, I've merged / integrated the giant backports found at https://github.com/ceph/ceph/pull/3186 https://github.com/ceph/ceph/pull/3178 https://github.com/ceph/ceph/pull/2954 https://github.com/ceph/ceph/pull/3191 https://github.com/ceph/ceph/pull/3168 https://github.com/ceph/ceph/pull/3289 into http://workbench.dachary.org/ceph/ceph/commit/0ea20e6c51208d6710f469454ab3f964bfa7c9d2 and successfully ran make check on it http://workbench.dachary.org:8080/projects/10?ref=giant-backports If I'm not mistaken the next step would be to run teuthology.
If I'm to do it, would you be so kind as to let me know which suites are most relevant? If someone else will take care of it, should I push the integration branch somewhere? Cheers -- Loïc Dachary, Artisan Logiciel Libre -- Regards, Tamil
Re: Swift tests failing randomly
Here is what we have in vps.yaml now:

overrides:
  ceph:
    conf:
      global:
        osd heartbeat grace: 40

What do we want to add? On Mon, Aug 11, 2014 at 10:13 AM, Sage Weil sw...@redhat.com wrote: On Mon, 11 Aug 2014, Yehuda Sadeh wrote: Yeah, looking at these logs, it really seems that things are just going slow on these machines and hitting timeouts. The fix is ok with me, although I'd rather have it adjusted per machine type (somehow). There is a vps.yaml that bumps up another timeout, so we could put it there. Right now it lives on the teuthology machine (~teuthworker/vps.yaml I think?), but perhaps we should stick it in ceph-qa-suite.git somewhere ... sage Yehuda On Mon, Aug 11, 2014 at 9:21 AM, Loic Dachary l...@dachary.org wrote: Hi Yehuda, It looks like increasing the rgw idle timeout makes the problem go away ( https://github.com/ceph/ceph-qa-suite/pull/79 and http://tracker.ceph.com/issues/8988 ). It previously was 300 sec, which looks like a large value already. Does this fix / workaround make sense to you? Cheers On 10/08/2014 10:46, Loic Dachary wrote: Hi Yehuda, In the past few months the swift tests have failed randomly and I was unfortunately unable to figure out why. Here are a few examples: http://pulpito.ceph.com/loic-2014-08-08_12:17:30-upgrade:firefly-x:stress-split-wip-9025-chunk-remapping-testing-basic-vps/406944 http://pulpito.ceph.com/loic-2014-08-08_12:17:30-upgrade:firefly-x:stress-split-wip-9025-chunk-remapping-testing-basic-vps/406941 http://pulpito.ceph.com/loic-2014-08-08_12:17:30-upgrade:firefly-x:stress-split-wip-9025-chunk-remapping-testing-basic-vps/406946 http://pulpito.ceph.com/loic-2014-08-08_12:17:30-upgrade:firefly-x:stress-split-wip-9025-chunk-remapping-testing-basic-vps/406947 and it has happened on every upgrade test run since I can remember. I fail to see a pattern and cannot figure out what the real problem is. It would be really great if you could take a look.
Even a hunch or a tip would be greatly appreciated :-) You can find more context in http://tracker.ceph.com/issues/8988 http://tracker.ceph.com/issues/8016 http://tracker.ceph.com/issues/7799 and discussions at http://www.spinics.net/lists/ceph-devel/msg19933.html Cheers -- Loïc Dachary, Artisan Logiciel Libre
Re: Swift tests failing randomly
I thought we could do the same at run-time for vps'es only. Sage? On Mon, Aug 11, 2014 at 11:47 AM, Loic Dachary l...@dachary.org wrote: On 11/08/2014 19:34, Yuri Weinstein wrote: Here is what we have in vps.yaml now: overrides: ceph: conf: global: osd heartbeat grace: 40 What do we want to add? I think the idle_timeout values at https://github.com/ceph/ceph-qa-suite/pull/79/files
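For reference, the resulting vps.yaml would look roughly like this. This is a sketch only: the osd heartbeat grace override is quoted in the thread above, but the placement and value of the idle timeout key are assumptions on my part; the actual change is in ceph-qa-suite pull request #79.

```yaml
# Existing override quoted in the thread:
overrides:
  ceph:
    conf:
      global:
        osd heartbeat grace: 40
  # Hypothetical addition: bump the rgw idle timeout for slow VPS machines.
  # Key name and value are assumptions, not taken from PR #79.
  rgw:
    default_idle_timeout: 1200
```

Keeping this in ceph-qa-suite.git rather than on the teuthology machine, as Sage suggests, would make the per-machine-type override visible and reviewable.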
Re: Firefly upgrade tests
I killed several runs that had been running for 2-3 days; hopefully it will speed up your runs. Thx YuriW On Sat, Jul 5, 2014 at 6:46 AM, Loic Dachary l...@dachary.org wrote: Hi, It looks like there is a shortage of VPS for some reason: http://pulpito.ceph.com/loic-2014-07-03_11:24:33-upgrade:firefly-x:stress-split-wip-8475-testing-basic-vps/ has a number of tests scheduled since ~48h and not making progress. Cheers On 04/07/2014 00:39, Loic Dachary wrote: Hi Ceph, The firefly-x upgrade test suite is designed to check that upgrading from Firefly to a newer version (master or a branch) works as expected. It was created by copying dumpling-x and can be browsed at https://github.com/ceph/ceph-qa-suite/tree/master/suites/upgrade/firefly-x To establish a baseline, a run was scheduled to upgrade from firefly to firefly (i.e. no upgrade really ;-) and it should therefore show that when nothing happens all is well. It however fails in various ways, as can be seen here: ./virtualenv/bin/teuthology-suite --suite upgrade/firefly-x/stress-split --suite-dir ~/software/ceph/ceph-qa-suite --ceph firefly --machine-type vps --email l...@dachary.org http://pulpito.ceph.com/loic-2014-07-02_23:05:05-upgrade:firefly-x:stress-split-firefly-testing-basic-vps/ * Command failed on vpm105 with status 1: 'sudo yum install -y http://gitbuilder.ceph.com/kernel-rpm-redhatenterpriseserver6-x86_64-basic/sha1/8102ce7556a99f6348067c60583320d308f36362/kernel.x86_64.rpm' Does that mean kernels are not ready yet for this distribution and the tests should be skipped?
* Command failed on vpm058 with status 1: SWIFT_TEST_CONFIG_FILE=/home/ubuntu/cephtest/archive/testswift.client.0.conf /home/ubuntu/cephtest/swift/virtualenv/bin/nosetests -w /home/ubuntu/cephtest/swift/test/functional -v -a '!fails_on_rgw' http://pulpito.ceph.com/loic-2014-07-02_23:05:05-upgrade:firefly-x:stress-split-firefly-testing-basic-vps/338941 Although it looks like http://tracker.ceph.com/issues/7808 which is a duplicate of http://tracker.ceph.com/issues/7799 it is slightly different, and http://tracker.ceph.com/issues/8735 was created to keep track of it. * Command failed on vpm070 with status 1: 'sudo adjust-ulimits ceph-coverage /home/ubuntu/cephtest/archive/coverage daemon-helper kill ceph-osd -f -i 1' http://pulpito.ceph.com/loic-2014-07-02_23:05:05-upgrade:firefly-x:stress-split-firefly-testing-basic-vps/338904/ Although the root of the error seems to be that osd 1 cannot be killed by the thrasher, I don't see meaningful error messages. http://tracker.ceph.com/issues/8736 was filed to keep track of this condition.
* timed out waiting for admin_socket to appear after osd.1 restart http://pulpito.ceph.com/loic-2014-07-02_23:05:05-upgrade:firefly-x:stress-split-firefly-testing-basic-vps/338908/ It looks like a race: the osd is killed at the same time it is restarted by the thrasher, and http://tracker.ceph.com/issues/8737 was opened for this * hang on INFO:teuthology.task.rados:joining rados http://pulpito.ceph.com/loic-2014-07-02_23:05:05-upgrade:firefly-x:stress-split-firefly-testing-basic-vps/338915/ It looks like a bug and http://tracker.ceph.com/issues/8740 was filed When the same suite is run to upgrade from firefly to master it gives http://pulpito.ceph.com/loic-2014-07-02_22:04:23-upgrade:firefly-x:stress-split-master-testing-basic-vps/ which shows the following errors: * Command failed on vpm105 with status 1: 'sudo yum install -y http://gitbuilder.ceph.com/kernel-rpm-redhatenterpriseserver6-x86_64-basic/sha1/8102ce7556a99f6348067c60583320d308f36362/kernel.x86_64.rpm' (same as above) * Could not reconnect to ubu...@vpm042.front.sepia.ceph.com : it looks like a transient timeout problem that can be ignored http://pulpito.ceph.com/loic-2014-07-02_22:04:23-upgrade:firefly-x:stress-split-master-testing-basic-vps/338891/ 2014-07-02T18:52:24.546 INFO:teuthology.orchestra.connection:{'username': u'ubuntu', 'hostname': u'vpm042.front.sepia.ceph.com', 'timeout': 60} * Command failed on vpm017 with status 1: SWIFT_TEST_CONFIG_FILE=/home/ubuntu/cephtest/archive/testswift.client.0.conf /home/ubuntu/cephtest/swift/virtualenv/bin/nosetests -w /home/ubuntu/cephtest/swift/test/functional -v -a '!fails_on_rgw' One of which looks exactly like http://tracker.ceph.com/issues/7799 which was re-opened * hang on INFO:teuthology.task.rados:joining rados (same as above) Cheers -- Loïc Dachary, Artisan Logiciel Libre
Re: teuthology task waiting for machines ( 8h)
Technically yes. If the queue is busy, patience is needed - assuming that there are no runs in the queue which are hung. Zack is diligently looking into and fixing things to prevent hung tests. If we see runs older than, say, one day, we kill them (although 'teuthology-kill' is not working for me today :( ) Another option to speed up a run is to use PRIO (for priority) when scheduling it, and/or to use machines other than plana, as they are in high demand. Thx YuriW On Sat, Jun 28, 2014 at 3:27 AM, Loic Dachary l...@dachary.org wrote: Hi Zack, http://pulpito.ceph.com/loic-2014-06-27_18:45:37-upgrade:firefly-x:stress-split-wip-8475-testing-basic-plana/329515/ seems to indicate that the task cannot obtain the machines it needs: 2014-06-27T17:55:19.072 INFO:teuthology.task.internal:Locking machines... 2014-06-27T17:55:19.110 INFO:teuthology.task.internal:waiting for more machines to be free (need 3 see 5)... 2014-06-27T17:55:29.175 INFO:teuthology.task.internal:waiting for more machines to be free (need 3 see 5)... ... 2014-06-28T03:22:13.745 INFO:teuthology.task.internal:waiting for more machines to be free (need 3 see 0)... 2014-06-28T03:22:23.787 INFO:teuthology.task.internal:waiting for more machines to be free (need 3 see 0)... Is it something expected (for instance when tasks with a higher priority take precedence)? If it is, then all that's needed is patience, right? Cheers -- Loïc Dachary, Artisan Logiciel Libre
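Yuri's one-day rule for spotting hung runs is simple to automate. The sketch below is illustrative only: the run list and its field layout are invented for the example and are not teuthology's actual queue API.

```python
from datetime import datetime, timedelta

def stale_runs(runs, now, max_age=timedelta(days=1)):
    """Return the names of runs older than max_age (candidates to kill)."""
    return [name for name, started in runs if now - started > max_age]

# Hypothetical queue snapshot; names and timestamps are made up.
now = datetime(2014, 6, 28, 12, 0)
runs = [
    ("loic-2014-06-25_09:00:00-upgrade-vps", datetime(2014, 6, 25, 9, 0)),  # ~3 days old
    ("yuriw-2014-06-28_08:00:00-rados-vps", datetime(2014, 6, 28, 8, 0)),   # 4 hours old
]
print(stale_runs(runs, now))  # → ['loic-2014-06-25_09:00:00-upgrade-vps']
```

In practice the run names returned by such a filter would be fed to teuthology-kill, as Yuri describes.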
Re: Recommended teuthology upgrade test
Loic, I don't intend to answer all the questions, but here is some info, see inline. On Fri, Jun 27, 2014 at 8:16 AM, Loic Dachary l...@dachary.org wrote: Hi Sam, TL;DR: what oneliner do you recommend to run upgrade tests for https://github.com/ceph/ceph/pull/1890 ? Running the rados suite can be done with: ./schedule_suite.sh rados wip-8071 testing l...@dachary.org basic master plana It was replaced with teuthology-suite, see --help for more info or something else, since ./schedule_suite.sh was recently obsoleted ( http://tracker.ceph.com/issues/8678 ). Running something similar for upgrade will presumably run all of https://github.com/ceph/ceph-qa-suite/tree/master/suites/upgrade Is there a way to run minimal tests by limiting the upgrade suite so that it only focuses on a firefly cluster that upgrades to https://github.com/ceph/ceph/pull/1890, so that it checks the behavior when running a mixed cluster (firefly + master with the change)? You can run a smaller suite by specifying it as the argument, like this: dumpling-x/parallel It looks like http://pulpito.ceph.com/?suite=upgrade was never run ( at least that's what appears to cause http://tracker.ceph.com/issues/8681 ) Is http://pulpito.ceph.com/?suite=upgrade-rados a good fit? If so, is there a way to figure out how it was created? Cheers -- Loïc Dachary, Artisan Logiciel Libre