[ceph-users] v14.2.2 Nautilus released
This is the second bugfix release of the Ceph Nautilus release series. We recommend all Nautilus users upgrade to this release. When upgrading from older releases of Ceph, follow the general guidelines for upgrading to Nautilus.

Notable Changes
---------------

* The no{up,down,in,out} related commands have been revamped. There are now two ways to set the no{up,down,in,out} flags: the old 'ceph osd [un]set <flag>' command, which sets cluster-wide flags; and the new 'ceph osd [un]set-group <flags> <who>' command, which sets flags in batch at the granularity of any CRUSH node or device class.

* radosgw-admin introduces two subcommands for managing expire-stale objects that might be left behind after a bucket reshard in earlier versions of RGW. One subcommand lists such objects and the other deletes them. Read the troubleshooting section of the dynamic resharding docs for details.

* Earlier Nautilus releases (14.2.1 and 14.2.0) have an issue where deploying a single new (Nautilus) BlueStore OSD on an upgraded cluster (i.e. one that was originally deployed pre-Nautilus) breaks the pool utilization stats reported by ceph df. Until all OSDs have been reprovisioned or updated (via ceph-bluestore-tool repair), the pool stats will show values that are lower than the true value. This is resolved in 14.2.2, such that the cluster only switches to using the more accurate per-pool stats after all OSDs are 14.2.2 (or later), are BlueStore, and (if they were created prior to Nautilus) have been updated via the repair function.

* The default value for mon_crush_min_required_version has been changed from firefly to hammer, which means the cluster will issue a health warning if your CRUSH tunables are older than hammer. There is generally a small (but non-zero) amount of data that will move around when making the switch to hammer tunables. If possible, we recommend that you set the oldest allowed client to hammer or later.

  You can tell what the current oldest allowed client is with:

    ceph osd dump | grep min_compat_client

  If the current value is older than hammer, you can tell whether it is safe to make this change by verifying that there are no clients older than hammer currently connected to the cluster:

    ceph features

  The newer straw2 CRUSH bucket type was introduced in hammer, and ensuring that all clients are hammer or newer allows new features only supported for straw2 buckets to be used, including the crush-compat mode for the Balancer.

For a detailed changelog please refer to the official release notes entry on the Ceph blog: https://ceph.com/releases/v14-2-2-nautilus-released/

Getting Ceph
------------

* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-14.2.2.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: 4f8fa0a0024755aae7d95567c63f11d6862d55be

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
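The safety check described above can be scripted. A minimal Python sketch (not part of Ceph; the helper names and the hard-coded release ordering are assumptions for illustration) that parses `ceph osd dump` output for `min_compat_client` and decides whether switching to hammer tunables is safe:

```python
# Historical Ceph release order, used to compare release names.
RELEASE_ORDER = ["argonaut", "bobtail", "cuttlefish", "dumpling", "emperor",
                 "firefly", "giant", "hammer", "infernalis", "jewel",
                 "kraken", "luminous", "mimic", "nautilus"]

def min_compat_client(osd_dump_text):
    """Extract the min_compat_client value from `ceph osd dump` output."""
    for line in osd_dump_text.splitlines():
        line = line.strip()
        if line.startswith("min_compat_client"):
            return line.split()[-1]
    return None

def hammer_tunables_safe(osd_dump_text):
    """True if the oldest allowed client is hammer or newer."""
    release = min_compat_client(osd_dump_text)
    if release is None:
        return False
    return RELEASE_ORDER.index(release) >= RELEASE_ORDER.index("hammer")

dump = "epoch 123\nmin_compat_client jewel\n"
print(hammer_tunables_safe(dump))  # → True
```

If the result is False, check `ceph features` for pre-hammer clients before raising the tunables, as the announcement advises.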
Re: [ceph-users] How does monitor know OSD is dead?
I don't know if it's relevant here, but I saw similar behavior while implementing a Luminous->Nautilus automated upgrade test. When I used a single-node cluster with 4 OSDs, the Nautilus cluster would not function properly after the reboot. IIRC some OSDs were reported by "ceph -s" as up, even though they weren't running. I "fixed" the issue by adding a second node to the cluster. With two nodes (8 OSDs), the upgrade works fine. I will reproduce the issue again and open a bug report.
Re: [ceph-users] HELP! --> CLUSER DOWN (was "v13.2.1 Mimic released")
> for all others on this list, it might also be helpful to know which
> setups are likely affected. Does this only occur for Filestore disks,
> i.e. if ceph-volume has taken over taking care of these? Does it happen
> on every RHEL 7.5 system?

It affects all OSDs managed by ceph-disk on all RHEL systems (but not on CentOS), regardless of whether they are filestore or bluestore.

> We're still on 13.2.0 here and ceph-detect-init works fine on our
> CentOS 7.5 systems (it just echoes "systemd"). We're on Bluestore.
> Should we hold off on an upgrade, or are we unaffected?

The regression does not affect CentOS - only RHEL.

Nathan
Re: [ceph-users] HELP! --> CLUSER DOWN (was "v13.2.1 Mimic released")
> Strange...
> - wouldn't swear, but pretty sure v13.2.0 was working ok before
> - so what do others say/see?
> - no one on v13.2.1 so far (hard to believe) OR
> - just don't have this "systemctl ceph-osd.target" problem and all just works?
>
> If you also __MIGRATED__ from Luminous (say ~ v12.2.5 or older) to Mimic
> (say v13.2.0 -> v13.2.1) and __DO NOT__ see the same systemctl problems,
> what's your Linux OS and version (I'm on RHEL 7.5 here)? :O

Hi ceph.novice:

I'm the one to blame for this regretful incident. Today I have reproduced the issue in teuthology:

2018-07-29T18:20:07.288 INFO:teuthology.orchestra.run.ovh093:Running: 'sudo TESTDIR=/home/ubuntu/cephtest bash -c ceph-detect-init'
2018-07-29T18:20:07.796 INFO:teuthology.orchestra.run.ovh093.stderr:Traceback (most recent call last):
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr: File "/bin/ceph-detect-init", line 9, in 
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr: load_entry_point('ceph-detect-init==1.0.1', 'console_scripts', 'ceph-detect-init')()
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr: File "/usr/lib/python2.7/site-packages/ceph_detect_init/main.py", line 56, in run
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr: print(ceph_detect_init.get(args.use_rhceph).init)
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr: File "/usr/lib/python2.7/site-packages/ceph_detect_init/__init__.py", line 42, in get
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr: release=release)
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:ceph_detect_init.exc.UnsupportedPlatform: Platform is not supported.: rhel 7.5

Just to be sure, can you confirm? (I.e. issue the command "ceph-detect-init" on your RHEL 7.5 system. Instead of saying "systemd", does it give an error like the above?)
I'm working on a fix now at https://github.com/ceph/ceph/pull/23303

Nathan

On 07/29/2018 11:16 AM, ceph.nov...@habmalnefrage.de wrote:
> Sent: Sunday, 29 July 2018 at 03:15
> From: "Vasu Kulkarni"
> To: ceph.nov...@habmalnefrage.de
> Cc: "Sage Weil", ceph-users, "Ceph Development"
> Subject: Re: [ceph-users] HELP! --> CLUSER DOWN (was "v13.2.1 Mimic released")
>
> On Sat, Jul 28, 2018 at 6:02 PM, wrote:
>> Have you guys changed something with the systemctl startup of the OSDs?
>
> I think there is some kind of systemd issue hidden in mimic, https://tracker.ceph.com/issues/25004
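The traceback above boils down to a platform lookup that has no entry for rhel 7.5. A minimal sketch of that failure mode (purely illustrative — not the actual ceph-detect-init code; the table and helper names are assumptions):

```python
class UnsupportedPlatform(Exception):
    pass

# Illustrative table mapping (distro, major release) to an init system.
# The real ceph-detect-init logic is more involved; the regression amounted
# to the RHEL case failing while the CentOS case succeeded.
INIT_SYSTEMS = {
    ("centos", "7"): "systemd",
    # ("rhel", "7"): "systemd",   # missing entry -> UnsupportedPlatform
}

def detect_init(distro, release):
    """Return the init system for a platform, or raise UnsupportedPlatform."""
    major = release.split(".")[0]
    try:
        return INIT_SYSTEMS[(distro, major)]
    except KeyError:
        raise UnsupportedPlatform(
            "Platform is not supported.: %s %s" % (distro, release))

print(detect_init("centos", "7.5"))  # → systemd
try:
    detect_init("rhel", "7.5")
except UnsupportedPlatform as e:
    print(e)  # → Platform is not supported.: rhel 7.5
```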
Re: [ceph-users] ceph plugin balancer error
Update: opened http://tracker.ceph.com/issues/24779 to track this bug, and am in the process of fixing it. The fix will make its way into a future mimic point release. Thanks, Chris, for bringing the issue to my attention!

Nathan

On 07/05/2018 11:27 AM, Nathan Cutler wrote:
> Hi Chris:
>
> I suggest you raise your openSUSE Ceph-related questions on the openSUSE Ceph mailing list instead of ceph-users. For info on how to join, go to https://en.opensuse.org/openSUSE:Ceph#Communication
>
> The version of Ceph currently shipping in Leap 15.0 is built against Python 3 and this, as you found, exposes python2-specific code in the Ceph codebase. We might reconsider this and push a Python 2 build to Leap 15.0 - let's discuss it on opensuse-ceph.
>
> Thanks, Nathan
>
> On 07/05/2018 09:12 AM, Chris Hsiang wrote:
>> Hi,
>>
>> I am running a test on ceph mimic 13.0.2.1874+ge31585919b-lp150.1.2 using openSUSE-Leap-15.0. When I ran "ceph balancer status", it errored out:
>>
>> g1:/var/log/ceph # ceph balancer status
>> Error EIO: Module 'balancer' has experienced an error and cannot handle commands: 'dict' object has no attribute 'iteritems'
>>
>> What needs to be configured in order to get it to work?
>>
>> Chris

--
Nathan Cutler
Software Engineer Distributed Storage
SUSE LINUX, s.r.o.
Tel.: +420 284 084 037
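The error in this thread is the classic Python 2/3 incompatibility: dict.iteritems() was removed in Python 3, so Python 2-only code breaks when the module runs under a Python 3 build. A minimal sketch of the failure and a portable pattern (illustrative only — not the actual balancer fix):

```python
d = {"pg_upmap": 3, "crush-compat": 2}

# Python 2 only -- raises AttributeError on Python 3:
#   for k, v in d.iteritems(): ...

# Portable on both Python 2 and 3:
for k, v in d.items():
    pass  # process each (key, value) pair

# A common compatibility fallback for code that must run on both:
try:
    items = list(d.iteritems())  # Python 2
except AttributeError:
    items = list(d.items())      # Python 3

print(sorted(items))  # → [('crush-compat', 2), ('pg_upmap', 3)]
```

Libraries such as six (six.iteritems(d)) wrap the same try/except idea.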
[ceph-users] Does anyone use rcceph script in CentOS/SUSE?
To all who are running Ceph on CentOS or SUSE: do you use the "rcceph" script? The ceph RPMs ship it in /usr/sbin/rcceph.

(Why I ask: more-or-less the same functionality is provided by the ceph-osd.target and ceph-mon.target systemd units, and the script is no longer maintained, so we'd like to drop it from the RPM packaging unless someone is using it.)

Thanks, Nathan
Re: [ceph-users] Ceph release cadence
From a backporter's perspective, the appealing options are the ones that reduce the number of stable releases in maintenance at any particular time. In the current practice, there are always at least two LTS releases, and sometimes a non-LTS release as well, that are "live" and supposed to be getting backports. For example:

* when kraken was released, hammer and jewel were "live LTS" and kraken was "live non-LTS", for a total of three live releases.
* when luminous was released, hammer and kraken were declared EOL and there are now only two "live LTS" releases and no "live non-LTS".

During the period when there are three live releases, almost every bugfix seen as warranting a backport gets marked for backport to the two most recent stable releases. (For example, from January to August 2017, with very few exceptions, tracker issues got marked "Backport: jewel, kraken", not just "Backport: jewel".) This, of course, doubled the backporting workload, simply because if a bug is severe enough to backport to the most recent non-LTS release, it must be severe enough to be backported to the most recent LTS release as well. Unfortunately, there aren't enough developers working on backports to cover this double workload, so in practice the non-LTS release gets insufficient attention.

A "train" model could lower this backporting workload if it were accompanied by a declaration that the n-1 release gets backports for all important bugfixes and n-2 gets backports for critical bugfixes only (and n-3 gets EOLed).

Nathan
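The proposed "train" rule can be stated mechanically. A hypothetical sketch (release names and severity labels are assumptions for illustration, not an actual Ceph policy implementation):

```python
def backport_targets(live_releases, severity):
    """Given stable releases ordered newest-first (n, n-1, n-2, ...), return
    the releases a fix should be backported to under the proposed rule:
    n-1 gets all important fixes, n-2 gets critical fixes only, n-3+ is EOL.
    """
    targets = []
    if len(live_releases) > 1 and severity in ("important", "critical"):
        targets.append(live_releases[1])  # n-1: all important fixes
    if len(live_releases) > 2 and severity == "critical":
        targets.append(live_releases[2])  # n-2: critical fixes only
    return targets

releases = ["luminous", "kraken", "jewel", "hammer"]
print(backport_targets(releases, "important"))  # → ['kraken']
print(backport_targets(releases, "critical"))   # → ['kraken', 'jewel']
```

The point of the rule is visible in the output: a fix never fans out to more than two stable releases, and only critical fixes reach the second one.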
[ceph-users] v10.2.9 Jewel released
v10.2.9 Jewel released
======================

This point release fixes a regression introduced in v10.2.8. We recommend that all Jewel users upgrade. For more detailed information, see the complete changelog[1] and release notes[2].

Notable Changes
---------------

* cephfs: Damaged MDS with 10.2.8 (pr#16282, Nathan Cutler)

Getting Ceph
------------

* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-10.2.9.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* For ceph-deploy, see http://docs.ceph.com/docs/master/install/install-ceph-deploy
* Release SHA1: 2ee413f77150c0f375ff6f10edd6c8f9c7d060d0

[1]: http://docs.ceph.com/docs/master/_downloads/v10.2.9.txt
[2]: http://ceph.com/releases/v10-2-9-jewel-released/
[ceph-users] v10.2.8 Jewel released
v10.2.8 Jewel released
======================

This point release brought a number of important bugfixes in all major components of Ceph. However, it also introduced a regression that could cause MDS damage, and a new release, v10.2.9, was published to address this. Therefore, Jewel users should not upgrade to this version – instead, we recommend upgrading directly to v10.2.9. That being said, the v10.2.8 release notes do contain important information, so please read on. For more detailed information, refer to the complete changelog[1] and the release notes[2].

OSD Removal Caveat
------------------

There was a bug introduced in Jewel (#19119) that broke the mapping behavior when an “out” OSD that still existed in the CRUSH map was removed with ‘osd rm’. This could result in ‘misdirected op’ and other errors. The bug is now fixed, but the fix itself introduces the same risk because the behavior may vary between clients and OSDs. To avoid problems, please ensure that all OSDs are removed from the CRUSH map before deleting them. That is, be sure to do:

    ceph osd crush rm osd.123

before:

    ceph osd rm osd.123

Snap Trimmer Improvements
-------------------------

This release greatly improves control and throttling of the snap trimmer. It introduces the “osd max trimming pgs” option (defaulting to 2), which limits how many PGs on an OSD can be trimming snapshots at a time. And it restores the safe use of the “osd snap trim sleep” option, which defaults to 0 but otherwise adds the given number of seconds of delay between every dispatch of trim operations to the underlying system.
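The two snap trimmer options above are ordinary OSD settings; a sketch of how they might appear in ceph.conf (the values shown are just the stated defaults, so setting them explicitly only documents intent):

```ini
[osd]
# Limit how many PGs per OSD may trim snapshots concurrently
# (default in 10.2.8: 2)
osd max trimming pgs = 2
# Seconds of delay between dispatching trim operations
# (default: 0, i.e. no throttling delay)
osd snap trim sleep = 0
```

Raising the sleep value trades snapshot-trim throughput for lower impact on client I/O.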
Other Notable Changes
---------------------

* build/ops: “osd marked itself down” will not recognised if host runs mon + osd on shutdown/reboot (pr#13492, Boris Ranto)
* build/ops: ceph-base package missing dependency for psmisc (pr#13786, Nathan Cutler)
* build/ops: enable build of ceph-resource-agents package on rpm-based os (pr#13606, Nathan Cutler)
* build/ops: rbdmap.service not included in debian packaging (jewel-only) (pr#14383, Ken Dreyer)
* cephfs: Journaler may execute on_safe contexts prematurely (pr#15468, “Yan, Zheng”)
* cephfs: MDS assert failed when shutting down (issue#19204, pr#14683, John Spray)
* cephfs: MDS goes readonly writing backtrace for a file whose data pool has been removed (pr#14682, John Spray)
* cephfs: MDS server crashes due to inconsistent metadata (pr#14676, John Spray)
* cephfs: No output for ceph mds rmfailed 0 –yes-i-really-mean-it command (pr#14674, John Spray)
* cephfs: Test failure: test_data_isolated (tasks.cephfs.test_volume_client.TestVolumeClient) (pr#14685, “Yan, Zheng”)
* cephfs: Test failure: test_open_inode (issue#18661, pr#14669, John Spray)
* cephfs: The mount point break off when mds switch hanppened (pr#14679, Guan yunfei)
* cephfs: ceph-fuse does not recover after lost connection to MDS (pr#14698, Kefu Chai, Henrik Korkuc, Patrick Donnelly)
* cephfs: client: fix the cross-quota rename boundary check conditions (pr#14667, Greg Farnum)
* cephfs: mds is crushed, after I set about 400 64KB xattr kv pairs to a file (pr#14684, Yang Honggang)
* cephfs: non-local quota changes not visible until some IO is done (pr#15466, John Spray, Nathan Cutler)
* cephfs: normalize file open flags internally used by cephfs (pr#15000, Jan Fajerski, “Yan, Zheng”)
* common: monitor creation with IPv6 public network segfaults (pr#14324, Fabian Grünbichler)
* common: radosstriper: protect aio_write API from calls with 0 bytes (pr#13254, Sebastien Ponce)
* core: Objecter::epoch_barrier isn’t respected in _op_submit() (pr#14332, Ilya Dryomov)
* core: clear divergent_priors set off disk (issue#17916, pr#14596, Greg Farnum)
* core: improve snap trimming, enable restriction of parallelism (pr#14492, Samuel Just, Greg Farnum)
* core: os/filestore/HashIndex: be loud about splits (pr#13788, Dan van der Ster)
* core: os/filestore: fix clang static check warn use-after-free (pr#14044, liuchang0812, yaoning)
* core: transient jerasure unit test failures (issue#18070, issue#17951, pr#14701, Kefu Chai, Pan Liu, Loic Dachary, Jason Dillaman)
* core: two instances of omap_digest mismatch (issue#18533, pr#14204, Samuel Just, David Zafman)
* doc: Improvements to crushtool manpage (issue#19649, pr#14635, Loic Dachary, Nathan Cutler)
* doc: PendingReleaseNotes: note about 19119 (issue#19119, pr#13732, Sage Weil)
* doc: admin ops: fix the quota section (issue#19397, pr#14654, Chu, Hua-Rong)
* doc: radosgw-admin: add the ‘object stat’ command to usage (pr#13872, Pavan Rallabhandi)
* doc: rgw S3 create bucket should not do response in json (pr#13874, Abhishek Lekshmanan)
* fs: Invalid error code returned by MDS is causing a kernel client WARNING (pr#13831, Jan Fajerski, xie xingguo)
* librbd: Incomplete declaration for ContextWQ in librbd/Journal.h (pr#14152, Boris Ranto)
* librbd: Issues with C API image metadata retrieval functions (pr#14666
Re: [ceph-users] Ceph Package Repo on Ubuntu Precise(12.04) is broken
Hi Xiaoxi:

> Just want to confirm again: according to the definition of "LTS" in
> ceph, hammer is supposed not to EOL till Luminous is released?

This is correct.

> Before that, can we expect hammer upgrades and packages on Precise
> and other old OSes will still be provided? We have all our server-side
> ceph clusters on Jewel, but the pain point is there are still a few
> thousand hypervisors on Ubuntu 12.04, so we have to maintain hammer
> for these old stuffs.

The Luminous release (and, hence, hammer EOL) is very close. Now would be a good time to test the upgrade and let us know which hammer fixes you need, if any.

Nathan
Re: [ceph-users] tracker.ceph.com
> Please let me know if you notice anything is amiss.

I haven't received any email notifications since the crash. Normally on a Monday I'd have several dozen.

--
Nathan Cutler
Software Engineer Distributed Storage
SUSE LINUX, s.r.o.
Tel.: +420 284 084 037
Re: [ceph-users] v0.94.6 Hammer released
> The basic idea is to copy the packages that are built by gitbuilders or
> by the buildpackage teuthology task to a central place, because these
> packages are built for development versions as well as stable
> versions[2], and they are tested via teuthology. The packages that are
> published on http://ceph.com/ are rebuilt from scratch, using the
> process that Alfredo described. This is fine for the supported
> platforms and for the stable releases. But for the development releases
> and the platforms that are no longer supported but still built by
> gitbuilders, we could just copy the packages over. Does that sound
> sensible?

Hi Loic:

Community packages for "deprecated" platforms ("deprecated" in the sense that the Ceph developers are no longer testing on them) would be welcomed by many, I imagine. And the additional workload for the Stable Releases team is not large. The question is, where will the packages be copied *to*?

--
Nathan Cutler
Software Engineer Distributed Storage
SUSE LINUX, s.r.o.
Tel.: +420 284 084 037
Re: [ceph-users] v0.80.11 Firefly released
On 11/20/2015 09:31 AM, Loic Dachary wrote:
> Hi,
>
> On 20/11/2015 02:13, Yonghua Peng wrote:
>> I have been using the firefly release. Is there an official
>> documentation for upgrading? Thanks.
>
> Here it is: http://docs.ceph.com/docs/firefly/install/upgrading-ceph/
>
> Enjoy!

Also suggest you read the relevant section of the Hammer release notes: http://docs.ceph.com/docs/master/release-notes/#id27

--
Nathan Cutler
Software Engineer Distributed Storage
SUSE LINUX, s.r.o.
Tel.: +420 284 084 037
[ceph-users] ceph packages for openSUSE 13.2, Factory, Tumbleweed
This is to announce that ceph has been packaged for openSUSE 13.2, openSUSE Factory, and openSUSE Tumbleweed. It is building in the openSUSE Build Service (OBS), filesystems:ceph project, from the development branch of what will become SUSE Enterprise Storage 2.

https://build.opensuse.org/package/show/filesystems:ceph/ceph

If you have the time and inclination to test the OBS ceph packages on openSUSE 13.2, Factory, and/or Tumbleweed, I will be interested to hear from you. The same applies if you need help downloading/installing the packages.

Thanks and regards.

--
Nathan Cutler
Software Engineer Distributed Storage
SUSE LINUX, s.r.o.
Tel.: +420 284 084 037
Re: [ceph-users] xattrs vs. omap with radosgw
> We've since merged something that stripes over several small xattrs so
> that we can keep things inline, but it hasn't been backported to hammer
> yet. See c6cdb4081e366f471b372102905a1192910ab2da.

Hi Sage:

You wrote "yet" - should we earmark it for hammer backport?

Nathan
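The striping idea Sage describes can be shown with a toy model (purely illustrative — the real implementation is in the commit he references, and the key-suffix scheme here is an assumption): a large value is split into fixed-size chunks stored under suffixed xattr keys, so each individual xattr stays small enough to remain inline.

```python
CHUNK = 4  # tiny chunk size for illustration; real xattr limits are larger

def set_striped(xattrs, name, value):
    """Store `value` across several small xattrs: name, name@1, name@2, ..."""
    chunks = [value[i:i + CHUNK] for i in range(0, len(value), CHUNK)] or [b""]
    xattrs[name] = chunks[0]
    for i, chunk in enumerate(chunks[1:], start=1):
        xattrs["%s@%d" % (name, i)] = chunk

def get_striped(xattrs, name):
    """Reassemble a striped xattr value by walking the suffixed keys."""
    value, i = xattrs[name], 1
    while "%s@%d" % (name, i) in xattrs:
        value += xattrs["%s@%d" % (name, i)]
        i += 1
    return value

attrs = {}  # stands in for an inode's xattr map
set_striped(attrs, "user.rgw.acl", b"0123456789")
print(get_striped(attrs, "user.rgw.acl"))  # → b'0123456789'
```

The dict stands in for the filesystem's per-inode xattr map; the point is only that reads and writes round-trip through several small entries instead of one large one.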
Re: [ceph-users] RGW - Can't download complete object
The code has been backported and should be part of the firefly 0.80.10 release and the hammer 0.94.2 release.

Nathan

On 05/14/2015 07:30 AM, Yehuda Sadeh-Weinraub wrote:
> The code is in wip-11620, and it's currently on top of the next branch. We'll get it through the tests, then get it into hammer and firefly. I wouldn't recommend installing it in production without proper testing first.
>
> Yehuda
>
> - Original Message -
> From: Sean Sullivan seapasu...@uchicago.edu
> To: Yehuda Sadeh-Weinraub yeh...@redhat.com
> Cc: ceph-users@lists.ceph.com
> Sent: Wednesday, May 13, 2015 7:22:10 PM
> Subject: Re: [ceph-users] RGW - Can't download complete object
>
> Thank you so much Yehuda! I look forward to testing these. Is there a way for me to pull this code in? Is it in master?
>
> On May 13, 2015 7:08:44 PM Yehuda Sadeh-Weinraub yeh...@redhat.com wrote:
>> Ok, I dug a bit more, and it seems to me that the problem is with the manifest that was created. I was able to reproduce a similar issue (opened ceph bug #11622), for which I also have a fix. I created new tests to cover this issue, and we'll get those recent fixes in as soon as we can, after we test for any regressions.
>>
>> Thanks, Yehuda
>>
>> - Original Message -
>> From: Yehuda Sadeh-Weinraub yeh...@redhat.com
>> To: Sean Sullivan seapasu...@uchicago.edu
>> Cc: ceph-users@lists.ceph.com
>> Sent: Wednesday, May 13, 2015 2:33:07 PM
>> Subject: Re: [ceph-users] RGW - Can't download complete object
>>
>> That's another interesting issue.
Note that for part 12_80 the manifest specifies (I assume, by the messenger log) this part:

    default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80

(note the 'tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14') whereas it seems that you do have the original part:

    default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.12_80

(note the '2/...')

The part that the manifest specifies does not exist, which makes me think that there is some weird upload sequence, something like:

- client uploads part, upload finishes but client does not get ack for it
- client retries (second upload)
- client gets ack for the first upload and gives up on the second one

But I'm not sure if it would explain the manifest; I'll need to take a look at the code. Could such a sequence happen with the client that you're using to upload?

Yehuda

- Original Message -
From: Sean Sullivan seapasu...@uchicago.edu
To: Yehuda Sadeh-Weinraub yeh...@redhat.com
Cc: ceph-users@lists.ceph.com
Sent: Wednesday, May 13, 2015 2:07:22 PM
Subject: Re: [ceph-users] RGW - Can't download complete object

Sorry for the delay. It took me a while to figure out how to do a range request and append the data to a single file. The good news is that the end file seems to be 14G in size, which matches the file's manifest size. The bad news is that the file is completely corrupt and the radosgw log has errors.
I am using the following code to perform the download: https://raw.githubusercontent.com/mumrah/s3-multipart/master/s3-mp-download.py

Here is a clip of the log file:

2015-05-11 15:28:52.313742 7f570db7d700 1 -- 10.64.64.126:0/108 == osd.11 10.64.64.101:6809/942707 5 osd_op_reply(74566287 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_12 [read 0~858004] v0'0 uv41308 ondisk = 0) v6 304+0+858004 (1180387808 0 2445559038) 0x7f53d005b1a0 con 0x7f56f8119240
2015-05-11 15:28:52.313797 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12934184960 len=858004
2015-05-11 15:28:52.372453 7f570db7d700 1 -- 10.64.64.126:0/108 == osd.45 10.64.64.101:6845/944590 2 osd_op_reply(74566142 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80 [read 0~4194304] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6 302+0+0 (3754425489 0 0) 0x7f53d005b1a0 con 0x7f56f81b1f30
2015-05-11 15:28:52.372494 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=12145655808 len=4194304
2015-05-11 15:28:52.372501 7f57067fc700 0 ERROR: got unexpected error when trying to read object: -2
2015-05-11 15:28:52.426079 7f570db7d700 1 -- 10.64.64.126:0/108 == osd.21 10.64.64.102:6856/1133473 16 osd_op_reply(74566144 default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.11_12 [read 0~3671316] v0'0 uv41395 ondisk = 0) v6 304+0+3671316 (1695485150 0 3933234139) 0x7f53d005b1a0 con 0x7f56f81e17d0
2015-05-11 15:28:52.426123 7f57067fc700 20 get_obj_aio_completion_cb: io completion ofs=10786701312 len=3671316
2015-05-11 15:28:52.504072 7f570db7d700 1 -- 10.64.64.126:0/108 == osd.82 10.64.64.103:6857/88524 2 osd_op_reply(74566283
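The mismatch Yehuda points out earlier in the thread can be checked mechanically: the upload-id-like token embedded in a shadow object's name should match the one the manifest records for that part. A rough sketch (the name format is inferred from the log lines above, not taken from the RGW source, so treat the parsing as an assumption):

```python
def part_token(shadow_name):
    """Extract the per-upload token from a shadow object name of the
    (inferred) form:
        <bucket_id>__shadow_<prefix>/<object>.<token>.<part>_<stripe>
    Only the text after the last '/' is parsed; the token is the
    dot-separated field just before the part_stripe suffix."""
    tail = shadow_name.rsplit("/", 1)[-1]
    fields = tail.split(".")
    return fields[-2]

# Object names taken verbatim from the radosgw log above:
manifest_part = ("default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/"
                 "28357709e44fff211de63b1d2c437159.bam."
                 "tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80")
stored_part = ("default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/"
               "28357709e44fff211de63b1d2c437159.bam."
               "2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.12_80")

print(part_token(manifest_part))  # → tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14
# Tokens differ -> the manifest references a part that was never written,
# consistent with the retried-upload sequence Yehuda describes:
print(part_token(manifest_part) == part_token(stored_part))  # → False
```

A token mismatch for the same part/stripe suffix is exactly the symptom that produces the "No such file or directory" (-2) read errors in the log.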