Hello all,
I am trying to upgrade a small test setup with one monitor and one OSD
node, currently on the Hammer release.
I updated from Hammer to Jewel using the package update commands and things
are working.
However, after updating from Jewel to Luminous, I am facing issues with the OSD
failing to start.
On Tue, Sep 12, 2017 at 3:12 PM, Katie Holly wrote:
> Ben and Brad,
>
> big thanks to both of you for helping me track down this issue which -
> seemingly - was caused by more than one radosgw instance sharing the exact
> same --name value and solved by generating unique keys
Ben and Brad,
big thanks to both of you for helping me track down this issue which -
seemingly - was caused by more than one radosgw instance sharing the exact same
--name value and solved by generating unique keys and --name values for each
single radosgw instance.
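A sketch of how unique per-instance --name values could be derived; the host and instance_id values here are hypothetical placeholders that would come from the container environment in practice:

```python
# Sketch: derive a unique radosgw --name per container instance.
# host and instance_id are hypothetical; in a real deployment they would
# come from the container's environment (e.g. hostname and replica index).
host = "node01"
instance_id = 7
rgw_name = f"client.rgw.{host}.{instance_id}"
print(rgw_name)  # client.rgw.node01.7
```

Each radosgw would then be started with this --name and a matching unique key created for it.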
Right now, all ceph-mgr
They all share the exact same exec arguments, so yes, they all have the same
--name as well. I'll try to run them with different --name parameters to see if
that solves the issue.
--
Katie
On 2017-09-12 06:13, Ben Hines wrote:
> Do the docker containers all have the same rgw --name ? Maybe
On Tue, Sep 12, 2017 at 2:11 PM, Katie Holly wrote:
> All radosgw instances are running
>> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
> as Docker containers; there are 15 of them at any given time
>
>
> The "config"/exec-args for the radosgw
All radosgw instances are running
> ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
as Docker containers; there are 15 of them at any given time
The "config"/exec-args for the radosgw instances are:
/usr/bin/radosgw \
-d \
--cluster=ceph \
--conf=/dev/null
It seems like it's choking on the report from the rados gateway. What
version is the rgw node running?
If possible, could you shut down the rgw and see if you can then start ceph-mgr?
Pure stab in the dark just to see if the problem is tied to the rgw instance.
On Tue, Sep 12, 2017 at 1:07 PM,
Thanks, I totally forgot to check the tracker. I added the information I
collected there, but don't have enough experience with ceph to dig through this
myself so let's see if someone is willing to sacrifice their free time to help
debugging this issue.
--
Katie
On 2017-09-12 03:15, Brad
Could this also have an effect on kRBD clients?
If so, what ceph auth caps command should we use?
From: ceph-users on behalf of Jason
Dillaman
Sent: 11 September 2017 22:00:47
To: Nico Schottelius
Cc:
Looks like there is a tracker opened for this.
http://tracker.ceph.com/issues/21197
Please add your details there.
On Tue, Sep 12, 2017 at 11:04 AM, Katie Holly wrote:
> Hi,
>
> I recently upgraded one of our clusters from Kraken to Luminous (the cluster
> was initialized
Hi,
I recently upgraded one of our clusters from Kraken to Luminous (the cluster
was initialized with Jewel) on Ubuntu 16.04 and deployed ceph-mgr on all of our
ceph-mon nodes with ceph-deploy.
Related log entries after initial deployment of ceph-mgr:
2017-09-11 06:41:53.535025 7fb5aa7b8500
(Apologies if this is a double post - I think my phone turned it into
HTML and so it bounced from ceph-devel)...
We currently use both upstream and distro (RHCS) versions on different
clusters. Downstream releases are still free to apply their own
models.
I like the idea of a predictable (and more
On 7 September 2017 at 01:23, Sage Weil wrote:
> * Drop the odd releases, and aim for a ~9 month cadence. This splits the
> difference between the current even/odd pattern we've been doing.
>
> + eliminate the confusing odd releases with dubious value
> + waiting for the next
Take a look at these which should answer at least some of your questions.
http://ceph.com/community/new-luminous-bluestore/
http://ceph.com/planet/understanding-bluestore-cephs-new-storage-backend/
On Mon, Sep 11, 2017 at 8:45 PM, Richard Hesketh
wrote:
> On
Great to see this issue sorted.
I have to say I am quite surprised anyone would implement the
export/import workaround mentioned here without *first* racing to this
ML or IRC and crying out for help. This is a valuable resource, made
more so by people sharing issues.
Cheers,
On 12 September
On 12 September 2017 at 01:15, Blair Bethwaite
wrote:
> Flow-control may well just mask the real problem. Did your throughput
> improve? Also, does that mean flow-control is on for all ports on the
> switch...? IIUC, then such "global pause" flow-control will mean
We have generally been running the latest non-LTS 'stable' release, since my
cluster is slightly less mission-critical than others, and there were
important features for us added in both Infernalis and Kraken. But I really
only care about RGW. If the rgw component could be split out of ceph into a
On Wed, Sep 6, 2017 at 4:23 PM, Sage Weil wrote:
> Hi everyone,
>
> Traditionally, we have done a major named "stable" release twice a year,
> and every other such release has been an "LTS" release, with fixes
> backported for 1-2 years.
>
> With kraken and luminous we missed
For opennebula this would be
http://docs.opennebula.org/5.4/deployment/open_cloud_storage_setup/ceph_ds.html
(added opennebula in CC)
Jason Dillaman writes:
> Yes -- the upgrade documentation definitely needs to be updated to add
> a pre-monitor upgrade step to verify
Yes -- the upgrade documentation definitely needs to be updated to add
a pre-monitor upgrade step to verify your caps before proceeding -- I
will take care of that under this ticket [1]. I believe the OpenStack
documentation has been updated [2], but let me know if you find other
places.
[1]
That indeed worked! Thanks a lot!
The remaining question from my side: did we do anything wrong in the
upgrade process and if not, should it be documented somewhere how to
setup the permissions correctly on upgrade?
Or should the documentation on the side of the cloud infrastructure
software be
Since you have already upgraded to Luminous, the fastest and probably
easiest way to fix this is to run "ceph auth caps client.libvirt mon
'profile rbd' osd 'profile rbd pool=one'" [1]. Luminous provides
simplified RBD caps via named profiles which ensure all the correct
permissions are enabled.
The only error messages I see are from dmesg when trying to access the
XFS filesystem (see attached image).
Let me know if you need any more logs - luckily I can spin up this VM in
a broken state as often as you want to :-)
Jason Dillaman writes:
> ... also, do have
Thanks a lot for the great ceph.conf pointer, Mykola!
I found something interesting:
2017-09-11 22:26:23.418796 7efd7d479700 10 client.1039597.objecter ms_dispatch
0x55b55ab8f950 osd_op_reply(4 rbd_header.df7343d1b58ba [call] v0'0 uv0 ondisk =
-8 ((8) Exec format error)) v8
2017-09-11
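The `-8 ((8) Exec format error)` in that osd_op_reply is a negated errno: 8 is ENOEXEC, which an OSD returns for a `[call]` (object-class) operation it cannot dispatch; as noted later in the thread, it is harmless here. A quick check of the mapping using Python's standard errno table:

```python
import errno
import os

# The OSD encodes failures as negative errno values; -8 is ENOEXEC.
assert errno.ENOEXEC == 8
print(os.strerror(errno.ENOEXEC))  # "Exec format error" on Linux
```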
From a backporter's perspective, the appealing options are the ones
that reduce the number of stable releases in maintenance at any
particular time.
In the current practice, there are always at least two LTS releases, and
sometimes a non-LTS release as well, that are "live" and supposed to be
... also, do have any logs from the OS associated w/ this log file? I
am specifically looking for anything to indicate which sector was
considered corrupt.
On Mon, Sep 11, 2017 at 4:41 PM, Jason Dillaman wrote:
> Thanks -- I'll take a look to see if anything else stands out.
Thanks -- I'll take a look to see if anything else stands out. That
"Exec format error" isn't actually an issue -- but now that I know
about it, we can prevent it from happening in the future [1]
[1] http://tracker.ceph.com/issues/21360
On Mon, Sep 11, 2017 at 4:32 PM, Nico Schottelius
On Mon, Sep 11, 2017 at 8:27 PM, Mclean, Patrick
wrote:
>
> On 2017-09-08 06:06 PM, Gregory Farnum wrote:
> > On Fri, Sep 8, 2017 at 5:47 PM, Mclean, Patrick
> > wrote:
> >
> >> On a related note, we are very curious why the snapshot id is
> >>
On 2017-09-08 06:06 PM, Gregory Farnum wrote:
> On Fri, Sep 8, 2017 at 5:47 PM, Mclean, Patrick
> wrote:
>
>> On a related note, we are very curious why the snapshot id is
>> incremented when a snapshot is deleted; this creates lots of
>> phantom entries in the deleted
Please excuse my brain-fart. We're using 24 disks on the servers in
question. Only after discussing this further with a colleague did we
realize this.
This brings us right to the minimum-spec which generally isn't a good idea.
Sincerely
-Dave
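The "minimum-spec" remark can be made concrete. A commonly cited rule of thumb is roughly 1 GB of RAM per TB of OSD storage; with 48 GB of RAM and 24 x 2 TB OSDs per server, that minimum is met exactly, leaving no headroom. A rough sketch of the arithmetic, not an official sizing formula:

```python
ram_gb = 48      # RAM per server
osds = 24        # disks per server (per the correction above)
tb_per_osd = 2

# Rule of thumb: ~1 GB RAM per TB of OSD storage.
required_gb = osds * tb_per_osd
print(required_gb, ram_gb)  # 48 48 -> exactly at the minimum
```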
On 11/09/17 11:38 AM, bulk.sch...@ucalgary.ca
Hi,
how could this happen:
pgs: 197528/1524 objects degraded (12961.155%)
I did some heavy failover tests, but a value higher than 100% looks strange
(ceph version 12.2.0). Recovery is quite slow.
cluster:
health: HEALTH_WARN
3/1524 objects misplaced (0.197%)
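One plausible reading of a degraded figure above 100% is that the numerator counts degraded object *copies* (including pending backfill targets) while the denominator is the number of logical objects, so heavy failover can push the ratio far past 100%. This is an interpretation, not a confirmed diagnosis; the arithmetic behind the numbers shown:

```python
degraded = 197528   # degraded object instances reported
objects = 1524      # logical objects in the cluster
misplaced = 3

print(f"{degraded / objects * 100:.3f}%")   # 12961.155%
print(f"{misplaced / objects * 100:.3f}%")  # 0.197%
```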
Hi Everyone,
I wonder if someone out there has a similar problem to this?
I keep having issues with memory usage. I have 2 OSD servers with 48G
memory and 12 2TB OSDs. I seem to have significantly more memory than
the minimum spec, but these two machines with 2TB drives seem to OOM
kill
Greetings -
I have created several test buckets in radosgw, to test different
expiration durations:
$ s3cmd mb s3://test2d
I set a lifecycle for each of these buckets:
$ s3cmd setlifecycle lifecycle2d.xml s3://test2d --signature-v2
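For reference, a minimal lifecycle document of the kind lifecycle2d.xml might contain (the 2-day value and empty prefix are assumptions inferred from the bucket name, not taken from the original post):

```xml
<LifecycleConfiguration>
  <Rule>
    <ID>expire-2d</ID>
    <Prefix></Prefix>
    <Status>Enabled</Status>
    <Expiration>
      <Days>2</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>
```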
The files look like this:
I found a couple OSDs that were seeing medium errors and marked them out
of the cluster. Once all the PGs were moved off those OSDs all the
buffer overflows went away.
So there must be some kind of bug that's being triggered when an OSD is
misbehaving.
Bryan
From: ceph-users
Flow-control may well just mask the real problem. Did your throughput
improve? Also, does that mean flow-control is on for all ports on the
switch...? IIUC, then such "global pause" flow-control will mean
switchports with links to upstream network devices will also be paused if
the switch is
On 2017-09-11 09:31, Nico Schottelius wrote:
>
> Sarunas,
>
> may I ask when this happened?
I was following
http://docs.ceph.com/docs/master/release-notes/#upgrade-from-jewel-or-kraken
I can't tell which step in particular triggered the issue with the VMs.
> And did you move OSDs or mons
Definitely would love to see some debug-level logs (debug rbd = 20 and
debug objecter = 20) for any VM that experiences this issue. The only
thing I can think of is something to do with sparse object handling
since (1) krbd doesn't perform sparse reads and (2) re-importing the
file would eliminate
Sarunas,
may I ask when this happened?
And did you move OSDs or mons after that export/import procedure?
I really wonder what the reason for this behaviour is, and whether we are
likely to experience it again.
Best,
Nico
Sarunas Burdulis writes:
> On 2017-09-10
Hi,
flow control was active on the NIC but not on the switch.
Enabling flowcontrol for both direction solved the problem:
flowcontrol receive on
flowcontrol send on
Port    Send FlowControl    Receive FlowControl    RxPause    TxPause
        admin  oper         admin  oper
You could try setting it to run with SimpleMessenger instead of
AsyncMessenger -- the default changed across those releases.
I imagine the root of the problem though is that with BlueStore the OSD is
using a lot more memory than it used to and so we're overflowing the 32-bit
address space...which
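The ceiling being overflowed is simply 2^32 bytes: a 32-bit process can address at most 4 GiB (and 32-bit Linux userspace typically gets only ~3 GiB of that), which a BlueStore OSD's heap plus cache can plausibly exceed. The arithmetic:

```python
# Maximum virtual address space of a 32-bit process.
addr_space = 2 ** 32
print(addr_space, "bytes =", addr_space // 2 ** 30, "GiB")
```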
Have you tried to list your ceph keys with "/usr/bin/ceph config-key ls" ?
2017-09-11 15:56 GMT+05:00 M Ranga Swami Reddy :
> ceph-disk --prepare --dmcrypt --> cmd; where does this command store the
> keys for dmcrypt?
>
> default as per the docs - /etc/ceph/dmcrypt-keys -> but
ceph-disk --prepare --dmcrypt --> cmd; where does this command store the
keys for dmcrypt?
default as per the docs - /etc/ceph/dmcrypt-keys -> but this directory is empty.
Thanks
Swami
On Sat, Sep 9, 2017 at 4:34 PM, Дробышевский, Владимир wrote:
> AFAIK in case of dm-crypt luks (as
On 08/09/17 11:44, Richard Hesketh wrote:
> Hi,
>
> Reading the ceph-users list I'm obviously seeing a lot of people talking
> about using bluestore now that Luminous has been released. I note that many
> users seem to be under the impression that they need separate block devices
> for the
Hello, Alexandre!
Do you have any testing methodology to share? I have a fresh test
luminous 12.2.0 cluster: 4 nodes, each with 1 x 1.92TB Samsung sm863 +
Infiniband, in an unsupported setup (co-located system/mon/osd
partition and bluestore partition on the same drive, created with
Good morning Lionel,
it's great to hear that it's not only us being affected!
I am not sure what you refer to by "glance" images, but what we see is
that we can spawn a new VM based on an existing image and that one runs.
Can I invite you (and anyone else who has problems w/ Luminous upgrade)
Hi,
recently I upgraded a test cluster from 10.2.9 to 12.2.0. When that was
done, I converted all OSDs from filestore to bluestore. Today, ceph
reported a scrub error in the cephfs metadata pool:
ceph health detail
HEALTH_ERR 6 scrub errors; Possible data damage: 2 pgs inconsistent
I don't think it's Mellanox problem.
Output drops at switchports are also seen when using a Intel 10 GBit/s NIC.
On 08.09.2017 17:41, Alexandre DERUMIER wrote:
> Sorry, I didn't see that you use proxmox5.
>
> As I'm a proxmox contributor, I can tell you that I have error with kernel
> 4.10
Hi,
On 08.09.2017 16:25, Burkhard Linke wrote:
>>> Regarding the drops (and without any experience with neither 25GBit ethernet
>>> nor the Arista switches):
>>> Do you have corresponding input drops on the server's network ports?
>> No input drops, just output drop
> Output drops on the switch
Hi,
We also have the same issue with Openstack instances (QEMU/libvirt) after
upgrading from kraken to luminous, and just after starting osd migration from
btrfs to bluestore.
We were able to restart failed VMs by mounting all disks from a linux box with
rbd map, and run fsck on them.
QEMU
looks like http://tracker.ceph.com/issues/18314.
please try:
run "ceph fs set cephfs1 allow_dirfrags 1"
and
in the mds config, set mds_bal_frag to 1, set mds_bal_split_size to 5000, and set
mds_bal_fragment_size_max to 5.
> On 11 Sep 2017, at 15:46, donglifec...@gmail.com wrote:
>
> ZhengYan,
>
ZhengYan,
kernel client is 4.12.0.
[root@yj43959-ceph-dev ~]# uname -a
Linux yj43959-ceph-dev.novalocal 4.12.0-1.el7.elrepo.x86_64 #1 SMP Sun Jul 2
20:38:48 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
Thanks a lot.
donglifec...@gmail.com
From: Yan, Zheng
Date: 2017-09-11 15:24
To:
> On 11 Sep 2017, at 14:07, donglifec...@gmail.com wrote:
>
> ZhengYan,
>
> I set "mds_bal_fragment_size_max = 10, mds_bal_frag = true", then I
> write 10 files named 512k.file$i, but some files are still missing,
> such as:
> [root@yj43959-ceph-dev cephfs]# find ./volumes/
Hi,
Is anyone running Ceph Luminous (12.2.0) on 32bit Linux? Have you seen
any problems?
My setup has been 1 MON and 7 OSDs (no MDS, RGW, etc), all running Jewel
(10.2.1), on 32bit, with no issues at all.
I've upgraded everything to latest version of Jewel (10.2.9) and still
no issues.