[ceph-users] v14.2.2 Nautilus released

2019-07-23 Thread Nathan Cutler
This is the second bug fix release of the Ceph Nautilus release series. We
recommend that all Nautilus users upgrade to this release. If you are upgrading
from an older release of Ceph, please follow the general guidelines for
upgrading to Nautilus.

Notable Changes
---

* The no{up,down,in,out} related commands have been revamped. There are now
  two ways to set the no{up,down,in,out} flags: the old 'ceph osd [un]set
  <flag>' command, which sets cluster-wide flags, and the new 'ceph osd
  [un]set-group <flags> <who>' command, which sets flags in batch at the
  granularity of any CRUSH node or device class.
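
  For example (the host name "foo" below is purely illustrative):

  ceph osd set noout                      # cluster-wide, as before
  ceph osd set-group noout,nodown foo     # only OSDs under CRUSH node "foo"
  ceph osd unset-group noout,nodown foo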

* radosgw-admin introduces two subcommands for managing expire-stale objects
  that might be left behind after a bucket reshard in earlier versions of RGW.
  One subcommand lists such objects and the other deletes them; see the
  troubleshooting section of the dynamic resharding docs for details.

* Earlier Nautilus releases (14.2.1 and 14.2.0) have an issue where deploying a
  single new (Nautilus) BlueStore OSD on an upgraded cluster (i.e. one that was
  originally deployed pre-Nautilus) breaks the pool utilization stats reported
  by ceph df. Until all OSDs have been reprovisioned or updated (via
  ceph-bluestore-tool repair), the pool stats will show values that are lower
  than the true value. This is resolved in 14.2.2, such that the cluster only
  switches to using the more accurate per-pool stats after all OSDs are 14.2.2
  (or later), are BlueStore, and (if they were created prior to Nautilus) have
  been updated via the repair function.
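
  A sketch of the per-OSD update step (the OSD id and data path are
  illustrative; the OSD daemon should be stopped before running the repair):

  systemctl stop ceph-osd@0
  ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-0
  systemctl start ceph-osd@0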

* The default value for mon_crush_min_required_version has been changed from
  firefly to hammer, which means the cluster will issue a health warning if
  your CRUSH tunables are older than hammer. There is generally a small (but
  non-zero) amount of data that will move around by making the switch to hammer
  tunables.

  If possible, we recommend that you set the oldest allowed client to hammer or
  later. You can tell what the current oldest allowed client is with:

  ceph osd dump | grep min_compat_client

  If the current value is older than hammer, you can tell whether it is safe to
  make this change by verifying that there are no clients older than hammer
  currently connected to the cluster with:

  ceph features

  The newer straw2 CRUSH bucket type was introduced in hammer, and ensuring
  that all clients are hammer or newer allows new features only supported for
  straw2 buckets to be used, including the crush-compat mode for the Balancer.
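
  Assuming 'ceph features' shows no pre-hammer clients, the switch can then be
  made along these lines:

  ceph osd set-require-min-compat-client hammer
  ceph osd crush tunables hammer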

For a detailed changelog please refer to the official release notes 
entry at the ceph blog: https://ceph.com/releases/v14-2-2-nautilus-released/


Getting Ceph


* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-14.2.2.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: 4f8fa0a0024755aae7d95567c63f11d6862d55be
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How does monitor know OSD is dead?

2019-07-10 Thread Nathan Cutler
I don't know if it's relevant here, but I saw similar behavior while 
implementing
a Luminous->Nautilus automated upgrade test. When I used a single-node cluster
with 4 OSDs, the Nautilus cluster would not function properly after the reboot.
IIRC some OSDs were reported by "ceph -s" as up, even though they weren't 
running.
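
A quick way to cross-check this (commands are illustrative) is to compare the
monitors' view with the daemons actually running on the node:

    ceph osd tree                  # what the monitors report (up/down)
    ps aux | grep '[c]eph-osd'     # which ceph-osd processes are actually running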

I "fixed" the issue by adding a second node to the cluster. With two nodes (8
OSDs), the upgrade works fine.

I will reproduce the issue again and open a bug report.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HELP! --> CLUSER DOWN (was "v13.2.1 Mimic released")

2018-07-30 Thread Nathan Cutler

for all others on this list, it might also be helpful to know which setups are 
likely affected.
Does this only occur for Filestore disks, i.e. where ceph-volume has taken over 
managing them?
Does it happen on every RHEL 7.5 system?


It affects all OSDs managed by ceph-disk on all RHEL systems (but not on 
CentOS), regardless of whether they are filestore or bluestore.



We're still on 13.2.0 here and ceph-detect-init works fine on our CentOS 7.5 systems (it 
just echoes "systemd").
We're on Bluestore.
Should we hold off on an upgrade, or are we unaffected?


The regression does not affect CentOS - only RHEL.

Nathan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] HELP! --> CLUSER DOWN (was "v13.2.1 Mimic released")

2018-07-29 Thread Nathan Cutler

Strange...
- wouldn't swear, but pretty sure v13.2.0 was working ok before
- so what do others say/see?
  - no one on v13.2.1 so far (hard to believe) OR
  - just don't have this "systemctl ceph-osd.target" problem and all just works?

If you also __MIGRATED__ from Luminous (say ~ v12.2.5 or older) to Mimic (say 
v13.2.0 -> v13.2.1) and __DO NOT__ see the same systemctl problems, what's your 
Linux OS and version (I'm on RHEL 7.5 here)? :O


Hi ceph.novice:

I'm the one to blame for this regrettable incident. Today I have 
reproduced the issue in teuthology:


2018-07-29T18:20:07.288 INFO:teuthology.orchestra.run.ovh093:Running: 'sudo TESTDIR=/home/ubuntu/cephtest bash -c ceph-detect-init'
2018-07-29T18:20:07.796 INFO:teuthology.orchestra.run.ovh093.stderr:Traceback (most recent call last):
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:  File "/bin/ceph-detect-init", line 9, in <module>
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:    load_entry_point('ceph-detect-init==1.0.1', 'console_scripts', 'ceph-detect-init')()
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:  File "/usr/lib/python2.7/site-packages/ceph_detect_init/main.py", line 56, in run
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:    print(ceph_detect_init.get(args.use_rhceph).init)
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:  File "/usr/lib/python2.7/site-packages/ceph_detect_init/__init__.py", line 42, in get
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:    release=release)
2018-07-29T18:20:07.797 INFO:teuthology.orchestra.run.ovh093.stderr:ceph_detect_init.exc.UnsupportedPlatform: Platform is not supported.: rhel  7.5


Just to be sure, can you confirm? (That is: if you run "ceph-detect-init" on 
your RHEL 7.5 system, does it print an error like the one above instead of 
"systemd"?)
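
For reference, on an unaffected system the command simply prints the name of 
the init system:

    $ ceph-detect-init
    systemd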


I'm working on a fix now at https://github.com/ceph/ceph/pull/23303

Nathan

On 07/29/2018 11:16 AM, ceph.nov...@habmalnefrage.de wrote:

Sent: Sunday, 29 July 2018 at 03:15
From: "Vasu Kulkarni" 
To: ceph.nov...@habmalnefrage.de
Cc: "Sage Weil" , ceph-users , "Ceph 
Development" 
Subject: Re: [ceph-users] HELP! --> CLUSER DOWN (was "v13.2.1 Mimic released")
On Sat, Jul 28, 2018 at 6:02 PM,  wrote:

Have you guys changed something with the systemctl startup of the OSDs?


I think there is some kind of systemd issue hidden in mimic,
https://tracker.ceph.com/issues/25004
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph plugin balancer error

2018-07-05 Thread Nathan Cutler

Update: opened http://tracker.ceph.com/issues/24779 to track this bug,
and am in the process of fixing it.

The fix will make its way into a future mimic point release.

Thanks, Chris, for bringing the issue to my attention!

Nathan

On 07/05/2018 11:27 AM, Nathan Cutler wrote:

Hi Chris:

I suggest you raise your openSUSE Ceph-related questions on the openSUSE 
Ceph mailing list instead of ceph-users. For info on how to join, go to


https://en.opensuse.org/openSUSE:Ceph#Communication

The version of Ceph currently shipping in Leap 15.0 is built against 
Python 3 and this, as you found, exposes python2-specific code in the 
Ceph codebase.
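
For anyone curious, this class of error is easy to reproduce outside Ceph 
(this is just an illustration, not the balancer module's actual code):

    python3 -c "d={'a': 1}; d.iteritems()"           # fails: 'dict' object has no attribute 'iteritems'
    python3 -c "d={'a': 1}; print(list(d.items()))"  # Python 3 spelling, prints [('a', 1)]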


We might reconsider this and push a Python 2 build to Leap 15.0 - let's 
discuss it on opensuse-ceph.


Thanks,
Nathan

On 07/05/2018 09:12 AM, Chris Hsiang wrote:

Hi,

I am running test on ceph mimic  13.0.2.1874+ge31585919b-lp150.1.2 
using openSUSE-Leap-15.0


when I ran "ceph balancer status", it errored out.

g1:/var/log/ceph # ceph balancer status
Error EIO: Module 'balancer' has experienced an error and cannot 
handle commands: 'dict' object has no attribute 'iteritems'


what config needs to be done in order to get it to work?

Chris


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com





--
Nathan Cutler
Software Engineer Distributed Storage
SUSE LINUX, s.r.o.
Tel.: +420 284 084 037
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph plugin balancer error

2018-07-05 Thread Nathan Cutler

Hi Chris:

I suggest you raise your openSUSE Ceph-related questions on the openSUSE 
Ceph mailing list instead of ceph-users. For info on how to join, go to


https://en.opensuse.org/openSUSE:Ceph#Communication

The version of Ceph currently shipping in Leap 15.0 is built against 
Python 3 and this, as you found, exposes python2-specific code in the 
Ceph codebase.


We might reconsider this and push a Python 2 build to Leap 15.0 - let's 
discuss it on opensuse-ceph.


Thanks,
Nathan

On 07/05/2018 09:12 AM, Chris Hsiang wrote:

Hi,

I am running test on ceph mimic  13.0.2.1874+ge31585919b-lp150.1.2 using 
openSUSE-Leap-15.0


when I ran "ceph balancer status", it errored out.

g1:/var/log/ceph # ceph balancer status
Error EIO: Module 'balancer' has experienced an error and cannot handle 
commands: 'dict' object has no attribute 'iteritems'


what config needs to be done in order to get it to work?

Chris


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Nathan Cutler
Software Engineer Distributed Storage
SUSE LINUX, s.r.o.
Tel.: +420 284 084 037
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Does anyone use rcceph script in CentOS/SUSE?

2018-01-11 Thread Nathan Cutler
To all who are running Ceph on CentOS or SUSE: do you use the "rcceph" 
script? The ceph RPMs ship it in /usr/sbin/rcceph


(Why I ask: more-or-less the same functionality is provided by the 
ceph-osd.target and ceph-mon.target systemd units, and the script is no 
longer maintained, so we'd like to drop it from the RPM packaging unless 
someone is using it.)
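
For anyone still relying on rcceph, rough systemd equivalents would look like
this (unit names as shipped with the packages):

    systemctl start ceph-osd.target     # start all OSDs on this host
    systemctl stop ceph-mon.target      # stop all monitors on this host
    systemctl status ceph.target        # status of all ceph daemons on this host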


Thanks,
Nathan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph release cadence

2017-09-11 Thread Nathan Cutler
From a backporter's perspective, the appealing options are the ones 
that reduce the number of stable releases in maintenance at any 
particular time.


In the current practice, there are always at least two LTS releases, and 
sometimes a non-LTS release as well, that are "live" and supposed to be 
getting backports. For example:


* when kraken was released, hammer and jewel were "live LTS" and kraken 
was "live non-LTS", for a total of three live releases.


* when luminous was released, hammer and kraken were declared EoL and 
there are now only two "live LTS" releases and no "live non-LTS".


During the period when there are three live releases, almost every 
bugfix seen as warranting a backport gets marked for backport to the two 
most recent stable releases. (For example, from January to August 2017 
with very few exceptions tracker issues got marked "Backport: jewel, 
kraken", not just "Backport: jewel".) This, of course, doubled the 
backporting workload, simply because if a bug is severe enough to 
backport to the most recent non-LTS release, it must be severe enough to 
be backported to the most recent LTS release as well. Unfortunately, 
there aren't enough developers working on backports to cover this double 
workload, so in practice the non-LTS release gets insufficient attention.


A "train" model could lower this backporting workload if it was 
accompanied by a declaration that the n-1 release gets backports for all 
important bugfixes and n-2 gets backports for critical bugfixes only 
(and n-3 gets EOLed).


Nathan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] v10.2.9 Jewel released

2017-07-14 Thread Nathan Cutler

v10.2.9 Jewel released
==

This point release fixes a regression introduced in v10.2.8.

We recommend that all Jewel users upgrade.

For more detailed information, see the complete changelog[1]
and release notes[2].

Notable Changes
---

* cephfs: Damaged MDS with 10.2.8 (pr#16282, Nathan Cutler)

Getting Ceph


* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-10.2.9.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* For ceph-deploy, see 
http://docs.ceph.com/docs/master/install/install-ceph-deploy

* Release SHA1: 2ee413f77150c0f375ff6f10edd6c8f9c7d060d0

[1]: http://docs.ceph.com/docs/master/_downloads/v10.2.9.txt
[2]: http://ceph.com/releases/v10-2-9-jewel-released/
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] v10.2.8 Jewel released

2017-07-14 Thread Nathan Cutler

v10.2.8 Jewel released
==

This point release brought a number of important bugfixes in all major
components of Ceph. However, it also introduced a regression that
could cause MDS damage, and a new release, v10.2.9, was published to
address this.  Therefore, Jewel users should not upgrade to this
version – instead, we recommend upgrading directly to v10.2.9.

That being said, the v10.2.8 release notes do contain important
information, so please read on.

For more detailed information, refer to the complete changelog[1] and
the release notes[2].

OSD Removal Caveat
--

There was a bug introduced in Jewel (#19119) that broke the mapping
behavior when an “out” OSD that still existed in the CRUSH map was
removed with ‘osd rm’.  This could result in ‘misdirected op’ and
other errors. The bug is now fixed, but the fix itself introduces the
same risk because the behavior may vary between clients and OSDs. To
avoid problems, please ensure that all OSDs are removed from the CRUSH
map before deleting them. That is, be sure to do:

   ceph osd crush rm osd.123

before:

   ceph osd rm osd.123

Snap Trimmer Improvements
-

This release greatly improves control and throttling of the snap
trimmer. It introduces the “osd max trimming pgs” option (defaulting
to 2), which limits how many PGs on an OSD can be trimming snapshots
at a time. And it restores the safe use of the “osd snap trim sleep”
option, which defaults to 0 but otherwise adds the given number of
seconds in delay between every dispatch of trim operations to the
underlying system.
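
For example, a throttled configuration could be expressed in ceph.conf along
these lines (the values are illustrative, not recommendations):

    [osd]
    osd max trimming pgs = 1
    osd snap trim sleep = 0.1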

Other Notable Changes
-

* build/ops: “osd marked itself down” will not recognised if host runs
  mon + osd on shutdown/reboot (pr#13492, Boris Ranto)
* build/ops: ceph-base package missing dependency for psmisc
  (pr#13786, Nathan Cutler)
* build/ops: enable build of ceph-resource-agents package on rpm-based
  os (pr#13606, Nathan Cutler)
* build/ops: rbdmap.service not included in debian packaging
  (jewel-only) (pr#14383, Ken Dreyer)
* cephfs: Journaler may execute on_safe contexts prematurely
  (pr#15468, “Yan, Zheng”)
* cephfs: MDS assert failed when shutting down (issue#19204, pr#14683,
  John Spray)
* cephfs: MDS goes readonly writing backtrace for a file whose data
  pool has been removed (pr#14682, John Spray)
* cephfs: MDS server crashes due to inconsistent metadata (pr#14676,
  John Spray)
* cephfs: No output for ceph mds rmfailed 0 --yes-i-really-mean-it
  command (pr#14674, John Spray)
* cephfs: Test failure: test_data_isolated
  (tasks.cephfs.test_volume_client.TestVolumeClient) (pr#14685, “Yan,
  Zheng”)
* cephfs: Test failure: test_open_inode (issue#18661, pr#14669, John
  Spray)
* cephfs: The mount point break off when mds switch hanppened
  (pr#14679, Guan yunfei)
* cephfs: ceph-fuse does not recover after lost connection to MDS
  (pr#14698, Kefu Chai, Henrik Korkuc, Patrick Donnelly)
* cephfs: client: fix the cross-quota rename boundary check conditions
  (pr#14667, Greg Farnum)
* cephfs: mds is crushed, after I set about 400 64KB xattr kv pairs to
  a file (pr#14684, Yang Honggang)
* cephfs: non-local quota changes not visible until some IO is done
  (pr#15466, John Spray, Nathan Cutler)
* cephfs: normalize file open flags internally used by cephfs
  (pr#15000, Jan Fajerski, “Yan, Zheng”)
* common: monitor creation with IPv6 public network segfaults
  (pr#14324, Fabian Grünbichler)
* common: radosstriper: protect aio_write API from calls with 0 bytes
  (pr#13254, Sebastien Ponce)
* core: Objecter::epoch_barrier isn’t respected in _op_submit()
  (pr#14332, Ilya Dryomov)
* core: clear divergent_priors set off disk (issue#17916, pr#14596,
  Greg Farnum)
* core: improve snap trimming, enable restriction of parallelism
  (pr#14492, Samuel Just, Greg Farnum)
* core: os/filestore/HashIndex: be loud about splits (pr#13788, Dan
  van der Ster)
* core: os/filestore: fix clang static check warn use-after-free
  (pr#14044, liuchang0812, yaoning)
* core: transient jerasure unit test failures (issue#18070,
  issue#17951, pr#14701, Kefu Chai, Pan Liu, Loic Dachary, Jason
  Dillaman)
* core: two instances of omap_digest mismatch (issue#18533, pr#14204,
  Samuel Just, David Zafman)
* doc: Improvements to crushtool manpage (issue#19649, pr#14635, Loic
  Dachary, Nathan Cutler)
* doc: PendingReleaseNotes: note about 19119 (issue#19119, pr#13732,
  Sage Weil)
* doc: admin ops: fix the quota section (issue#19397, pr#14654, Chu,
  Hua-Rong)
* doc: radosgw-admin: add the ‘object stat’ command to usage
  (pr#13872, Pavan Rallabhandi)
* doc: rgw S3 create bucket should not do response in json (pr#13874,
  Abhishek Lekshmanan)
* fs: Invalid error code returned by MDS is causing a kernel client
  WARNING (pr#13831, Jan Fajerski, xie xingguo)
* librbd: Incomplete declaration for ContextWQ in librbd/Journal.h
  (pr#14152, Boris Ranto)
* librbd: Issues with C API image metadata retrieval functions
  (pr#14666

Re: [ceph-users] Ceph Package Repo on Ubuntu Precise(12.04) is broken

2017-04-25 Thread Nathan Cutler

Hi Xiaoxi


 Just want to confirm again: according to the definition of
"LTS" in Ceph, Hammer is not supposed to be EOL until Luminous is released,


This is correct.


before that, can we expect hammer upgrades and packages on
Precise/other old OSes to still be provided?

  We have all our server-side Ceph clusters on Jewel, but the pain
point is that there are still a few thousand hypervisors on Ubuntu
12.04, so we have to maintain hammer for this old stuff.


Luminous release (and, hence, hammer EOL) is very close. Now would be a 
good time to test the upgrade and let us know which hammer fixes you 
need, if any.


Nathan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] tracker.ceph.com

2016-12-19 Thread Nathan Cutler

Please let me know if you notice anything is amiss.


I haven't received any email notifications since the crash. Normally on 
a Monday I'd have several dozen.


--
Nathan Cutler
Software Engineer Distributed Storage
SUSE LINUX, s.r.o.
Tel.: +420 284 084 037
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.94.6 Hammer released

2016-02-29 Thread Nathan Cutler

The basic idea is to copy the packages that are built by gitbuilders or by the 
buildpackage teuthology task to a central place, because these packages are 
built for development versions as well as stable versions[2], and they are 
tested via teuthology. The packages that are published on http://ceph.com/ are 
rebuilt from scratch, using the process that Alfredo described. This is fine 
for the supported platforms and for the stable releases. But for the 
development releases and the platforms that are no longer supported but still 
built by gitbuilders, we could just copy the packages over.

Does that sound sensible?


Hi Loic:

Community packages for "deprecated" platforms ("deprecated" in the sense 
that the Ceph developers are no longer testing on them) would be 
welcomed by many, I imagine. And the additional workload for the Stable 
Releases team is not large. The question is, where will the packages be 
copied *to*?


--
Nathan Cutler
Software Engineer Distributed Storage
SUSE LINUX, s.r.o.
Tel.: +420 284 084 037
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] v0.80.11 Firefly released

2015-11-24 Thread Nathan Cutler
On 11/20/2015 09:31 AM, Loic Dachary wrote:
> Hi,
> 
> On 20/11/2015 02:13, Yonghua Peng wrote:
>> I have been using firefly release. is there an official documentation for 
>> upgrading? thanks.
> 
> Here it is : http://docs.ceph.com/docs/firefly/install/upgrading-ceph/
> 
> Enjoy !

Also suggest you read the relevant section of the Hammer release notes:

http://docs.ceph.com/docs/master/release-notes/#id27

-- 
Nathan Cutler
Software Engineer Distributed Storage
SUSE LINUX, s.r.o.
Tel.: +420 284 084 037
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph packages for openSUSE 13.2, Factory, Tumbleweed

2015-07-13 Thread Nathan Cutler
This is to announce that ceph has been packaged for openSUSE 13.2, 
openSUSE Factory, and openSUSE Tumbleweed. It is building in the 
OpenSUSE Build Service (OBS), filesystems:ceph project, from the 
development branch of what will become SUSE Enterprise Storage 2.


https://build.opensuse.org/package/show/filesystems:ceph/ceph

If you have the time and inclination to test the OBS ceph packages on
openSUSE 13.2, Factory, and/or Tumbleweed, I will be interested to hear 
from you. The same applies if you need help downloading/installing the 
packages.
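
As a starting point, adding the OBS repository usually looks something like
this (the URL follows the standard OBS repository layout and should be
double-checked against the project page above):

    zypper addrepo https://download.opensuse.org/repositories/filesystems:/ceph/openSUSE_13.2/filesystems:ceph.repo
    zypper refresh
    zypper install ceph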


Thanks and regards.

--
Nathan Cutler
Software Engineer Distributed Storage
SUSE LINUX, s.r.o.
Tel.: +420 284 084 037


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] xattrs vs. omap with radosgw

2015-06-17 Thread Nathan Cutler
 We've since merged something 
 that stripes over several small xattrs so that we can keep things inline, 
 but it hasn't been backported to hammer yet.  See
 c6cdb4081e366f471b372102905a1192910ab2da.

Hi Sage:

You wrote "yet" - should we earmark it for hammer backport?

Nathan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW - Can't download complete object

2015-05-30 Thread Nathan Cutler
The code has been backported and should be part of the firefly 0.80.10 
release and the hammer 0.94.2 release.


Nathan

On 05/14/2015 07:30 AM, Yehuda Sadeh-Weinraub wrote:

The code is in wip-11620, and it's currently on top of the next branch. We'll 
get it through the tests, then get it into hammer and firefly. I wouldn't 
recommend installing it in production without proper testing first.

Yehuda

- Original Message -

From: Sean Sullivan seapasu...@uchicago.edu
To: Yehuda Sadeh-Weinraub yeh...@redhat.com
Cc: ceph-users@lists.ceph.com
Sent: Wednesday, May 13, 2015 7:22:10 PM
Subject: Re: [ceph-users] RGW - Can't download complete object

Thank you so much Yehuda! I look forward to testing these. Is there a way
for me to pull this code in? Is it in master?


On May 13, 2015 7:08:44 PM Yehuda Sadeh-Weinraub yeh...@redhat.com wrote:


Ok, I dug a bit more, and it seems to me that the problem is with the
manifest that was created. I was able to reproduce a similar issue (opened
ceph bug #11622), for which I also have a fix.

I created new tests to cover this issue, and we'll get those recent fixes
as soon as we can, after we test for any regressions.

Thanks,
Yehuda

- Original Message -

From: Yehuda Sadeh-Weinraub yeh...@redhat.com
To: Sean Sullivan seapasu...@uchicago.edu
Cc: ceph-users@lists.ceph.com
Sent: Wednesday, May 13, 2015 2:33:07 PM
Subject: Re: [ceph-users] RGW - Can't download complete object

That's another interesting issue. Note that for part 12_80 the manifest
specifies (I assume, by the messenger log) this part:



default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80

(note the 'tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14')

whereas it seems that you do have the original part:


default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.12_80

(note the '2/...')

The part that the manifest specifies does not exist, which makes me think
that there is some weird upload sequence, something like:

  - client uploads part, upload finishes but client does not get ack for
  it
  - client retries (second upload)
  - client gets ack for the first upload and gives up on the second one

But I'm not sure if it would explain the manifest, I'll need to take a
look
at the code. Could such a sequence happen with the client that you're
using
to upload?

Yehuda

- Original Message -

From: Sean Sullivan seapasu...@uchicago.edu
To: Yehuda Sadeh-Weinraub yeh...@redhat.com
Cc: ceph-users@lists.ceph.com
Sent: Wednesday, May 13, 2015 2:07:22 PM
Subject: Re: [ceph-users] RGW - Can't download complete object

Sorry for the delay. It took me a while to figure out how to do a range
request and append the data to a single file. The good news is that the end
file seems to be 14G in size, which matches the file's manifest size. The bad
news is that the file is completely corrupt and the radosgw log has errors.
I am using the following code to perform the download:



https://raw.githubusercontent.com/mumrah/s3-multipart/master/s3-mp-download.py
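
For what it's worth, the same kind of ranged download can be sketched with
plain curl against a pre-signed or otherwise accessible object URL (the URL
and byte ranges below are purely illustrative):

    curl -H "Range: bytes=0-1048575"       -o part.0 "https://rgw.example.com/bucket/object"
    curl -H "Range: bytes=1048576-2097151" -o part.1 "https://rgw.example.com/bucket/object"
    cat part.0 part.1 > object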


Here is a clip of the log file::
--
2015-05-11 15:28:52.313742 7f570db7d700  1 -- 10.64.64.126:0/108
==
osd.11 10.64.64.101:6809/942707 5  osd_op_reply(74566287


default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.13_12

[read 0~858004] v0'0 uv41308 ondisk = 0) v6  304+0+858004
(1180387808 0
2445559038) 0x7f53d005b1a0 con 0x7f56f8119240
2015-05-11 15:28:52.313797 7f57067fc700 20 get_obj_aio_completion_cb:
io
completion ofs=12934184960 len=858004
2015-05-11 15:28:52.372453 7f570db7d700  1 -- 10.64.64.126:0/108
==
osd.45 10.64.64.101:6845/944590 2  osd_op_reply(74566142


default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.tJ8UddmcCxe0lOsgfHR9Q-ZHXdlrM14.12_80

[read 0~4194304] v0'0 uv0 ack = -2 ((2) No such file or directory)) v6

302+0+0 (3754425489 0 0) 0x7f53d005b1a0 con 0x7f56f81b1f30
2015-05-11 15:28:52.372494 7f57067fc700 20 get_obj_aio_completion_cb:
io
completion ofs=12145655808 len=4194304

2015-05-11 15:28:52.372501 7f57067fc700  0 ERROR: got unexpected error
when
trying to read object: -2

2015-05-11 15:28:52.426079 7f570db7d700  1 -- 10.64.64.126:0/108
==
osd.21 10.64.64.102:6856/1133473 16  osd_op_reply(74566144


default.20283.1__shadow_b235040a-46b6-42b3-b134-962b1f8813d5/28357709e44fff211de63b1d2c437159.bam.2/-ztodNISNLlaNeV4kDmrQwmkECBP2mZ.11_12

[read 0~3671316] v0'0 uv41395 ondisk = 0) v6  304+0+3671316
(1695485150
0 3933234139) 0x7f53d005b1a0 con 0x7f56f81e17d0
2015-05-11 15:28:52.426123 7f57067fc700 20 get_obj_aio_completion_cb:
io
completion ofs=10786701312 len=3671316
2015-05-11 15:28:52.504072 7f570db7d700  1 -- 10.64.64.126:0/108
==
osd.82 10.64.64.103:6857/88524 2  osd_op_reply(74566283