IOs should then go only to the other 3 servers.
> > >
> > > JC
> > >
> > > On Oct 19, 2017, at 13:49, Russell Glaue <rgl...@cait.org> wrote:
> > >
> > > No, I have not ruled out the disk controller and backplane making the
> > > disks sl
:03 PM Jorge Pinilla López <jorp...@unizar.es>
wrote:
> Yes, I am trying it over luminous.
>
> Well, the bug has been open for 8 months and the fix hasn't been merged yet. I
> don't know if that is what's preventing me from making it work. Tomorrow I will
> try it again.
>
> El 19
How are you uploading a file? RGW, librados, CephFS, or RBD? There are
multiple reasons that the space might not be updating or cleaning itself
up. The more information you can give us about how you're testing, the
more we can help you.
On Thu, Oct 19, 2017 at 5:00 PM nigel davies
Pinilla López <jorp...@unizar.es>
wrote:
> Well, I tried it a few days ago and it didn't work for me.
>
> maybe because of this:
>
> http://tracker.ceph.com/issues/18749
>
> https://github.com/ceph/ceph/pull/17619
>
> I don't know if it's actually working now
>
Have you ruled out the disk controller and backplane in the server that is
running slower?
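Not from the thread, but a quick way to compare raw disk latency across the four
servers while the test runs (assumes the sysstat package is installed; run it on
every storage node):

  # compare await (ms per I/O) and %util for the OSD data/journal devices;
  # one server showing much higher await than its peers points at that
  # server's disks, controller, or backplane rather than at Ceph
  $ iostat -x 5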
On Thu, Oct 19, 2017 at 4:42 PM Russell Glaue wrote:
> I ran the test on the Ceph pool, and ran atop on all 4 storage servers, as
> suggested.
>
> Out of the 4 servers:
> 3 of them performed with
I'm speaking to the method in general and don't know the specifics of
bluestore. Recovering from a failed journal in this way is only a good
idea if you were able to flush the journal before making a new one. If the
journal failed during operation and you couldn't cleanly flush the journal,
then
I don't see that same_interval_since being cleared by split.
PG::split_into() copies the history from the parent PG to child. The
only code in Luminous that I see that clears it is in
ceph_objectstore_tool.cc
David
On 10/16/17 3:59 PM, Gregory Farnum wrote:
On Mon, Oct 16, 2017 at 3:49
On Sat, Oct 14, 2017 at 9:33 AM, David Turner <drakonst...@gmail.com> wrote:
> First, there is no need to deep scrub your PGs every 2 days.
They aren’t being deep scrubbed every two days, nor is there any
attempt (or desire) to do so. That would require 8+ scrubs running
at once.
This would not be the
first time I've seen a bonded network cause issues at least this bad on a
cluster. Do you have cluster_network and public_network set? What does your
network topology look like?
On Fri, Oct 13, 2017, 11:02 PM J David <j.david.li...@gmail.com> wrote:
> Thanks all for input on t
What is the output of your `ceph status`?
On Fri, Oct 13, 2017, 10:09 PM dE <de.tec...@gmail.com> wrote:
> On 10/14/2017 12:53 AM, David Turner wrote:
>
> What does your environment look like? Someone recently on the mailing
> list had PGs stuck creating because of
Thanks all for input on this.
It’s taken a couple of weeks, but based on the feedback from the list,
we’ve got our version of a scrub-one-at-a-time cron script running and
confirmed that it’s working properly.
Unfortunately, this hasn’t really solved the real problem. Even with
just one scrub
What does your environment look like? Someone recently on the mailing list
had PGs stuck creating because of a networking issue.
On Fri, Oct 13, 2017 at 2:03 PM Ronny Aasen
wrote:
> strange that no osd is acting for your pg's
> can you show the output from
> ceph osd
I improved the code to compute degraded objects during
backfill/recovery. During my testing it wouldn't result in a percentage
above 100%. I'll have to look at the code and verify that some
subsequent changes didn't break things.
David
On 10/13/17 9:55 AM, Florian Haas wrote:
Okay
I don't have access to a luminous cluster at the moment, but I would try
looking in the pg dump first. You could also try the crush map.
Worst case scenario you could set up a bunch of test clients and attempt to
connect them to your cluster. You should be able to find which is the
oldest
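Not stated in the thread, but on Luminous two commands can help spot old clients;
the monitor name below is a placeholder:

  # summary of feature bits for everything connected to the cluster
  $ ceph features
  # per-session detail (client address plus negotiated features) from one mon
  $ ceph daemon mon.mon1 sessions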
John covered everything better than I was going to, so I'll just remove
that from my reply.
If you aren't using DC SSDs and this is prod, then I wouldn't recommend
moving towards this model. However you are correct on how to move the pool
to the SSDs from the HDDs and based on how simple and
Here is your friend.
http://docs.ceph.com/docs/luminous/rados/operations/erasure-code/#erasure-coding-with-overwrites
On Thu, Oct 12, 2017 at 2:09 PM Jason Dillaman wrote:
> The image metadata still needs to live in a replicated data pool --
> only the data blocks can be
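For reference, a minimal sketch of what is being described on Luminous; pool and
image names are placeholders:

  # allow RBD data objects to live on the erasure-coded pool
  $ ceph osd pool set my_ec_pool allow_ec_overwrites true
  # image metadata stays in the replicated pool, data blocks go to the EC pool
  $ rbd create --size 100G --data-pool my_ec_pool rbd/myimage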
__core_scsi3_write_aptpl_to_file() seems to be the only function that
uses the path. Otherwise I would have thought the same, that
propagating the file to backup gateways prior to failover would be
sufficient.
Cheers, David
target. I am not sure
> of the current state of that work, but it would benefit all LIO targets
> when complete.
Zhu Lingshan (cc'ed) worked on a prototype for tcmu PR support. IIUC,
whether DLM or the underlying Ceph cluster gets used for PR state
storage is still under consideration.
Cheers, Da
The full ratio is based on the max bytes. If you say that the cache should
have a max bytes of 1TB and that the full ratio is .8, then it will aim to
keep it at 800GB. Without a max bytes value set, the ratios are a
percentage of unlimited... aka no limit themselves. The full_ratio should
be
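A hedged example of the settings being described; the pool name and sizes are
placeholders:

  # cap the cache pool at 1 TB and start evicting at 80% of that
  $ ceph osd pool set cephfs_cache target_max_bytes 1099511627776
  $ ceph osd pool set cephfs_cache cache_target_full_ratio 0.8
  # without target_max_bytes (or target_max_objects) the ratios have nothing
  # to be a percentage of, so they effectively never trigger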
Christian is correct that min_size does not affect how many copies need to ACK
the write; it is responsible for how many copies need to be available for the
PG to be accessible. This is where SSD journals for filestore and SSD
DB/WAL partitions come into play. The write is considered ACK'd as soon as
I've managed an RBD cluster that had all of the RBDs configured to 1M objects
and filled up the cluster to 75% full with 4TB drives. Other than the
collection splitting (subfolder splitting as I've called it before) we
didn't have any problems with object counts.
On Wed, Oct 11, 2017 at 9:47 AM
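For context, an illustrative way to create such an image on a recent rbd CLI
(older releases use --order 20 instead); the name and size are placeholders:

  $ rbd create --size 500G --object-size 1M rbd/small-object-image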
the device, the drive letter
> is "sdx" according to the link above what would be the right command to
> re-use the two NVME partitions for block db and wal ?
>
> I presume that everything else is the same.
> best.
>
>
> On Sat, Sep 30, 2017 at 9:00 PM, D
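Not an answer from that thread, but roughly what such a command could look like
with Luminous ceph-volume; every device path here is a placeholder (ceph-disk
prepare takes similar --block.db/--block.wal options):

  $ ceph-volume lvm create --bluestore --data /dev/sdx \
        --block.db /dev/nvme0n1p3 --block.wal /dev/nvme0n1p4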
>
> In case you have one mon per DC all operations in the isolated DC will be
> frozen, so I believe you would not lose data.
>
>
>>
>>
>>
>> On Sat, Oct 7, 2017 at 3:36 PM Peter Linder <peter.lin...@fiberdirekt.se>
>> wrote:
>>
>>> On 1
Just to make sure you understand: the reads will happen on the primary osd for
the PG and not the nearest osd, meaning that reads will go between the
datacenters. Also, each write will not ack until all 3 writes have happened,
adding latency to both writes and reads.
On Sat, Oct 7, 2017,
y of all of the data in the
pool, but that's on the much cheaper HDD storage and can probably be
considered acceptable losses for the sake of having the primary OSD on NVMe
drives.
On Sat, Oct 7, 2017 at 3:36 PM Peter Linder <peter.lin...@fiberdirekt.se>
wrote:
> On 10/7/2017 8:08 PM, David Turner
, preventing the
> flushing of objects to the underlying data pool. Once I killed that
> process, objects started to flush to the data pool automatically (with
> target_max_bytes & target_max_objects set); and I can force the flushing
> with 'rados -p cephfs_cache cache-flush-
ng data pool:
> # rados -p cephfs_data ls
>
> Any advice?
>
> On Fri, Oct 6, 2017 at 9:45 AM, David Turner <drakonst...@gmail.com>
> wrote:
>
>> Notice in the URL for the documentation the use of "luminous". When you
>> looked a few weeks ago, you mi
>> > >>
>> > >> On Fri, Oct 6, 2017 at 4:49 PM, Shawfeng Dong <s...@ucsc.edu> wrote:
>> > >> > Dear all,
>> > >> >
>> > >> > Thanks a lot for the very insightful comments/suggestions!
>> > >>
On Fri, Oct 6, 2017, 1:05 AM Christian Balzer <ch...@gol.com> wrote:
>
> Hello,
>
> On Fri, 06 Oct 2017 03:30:41 + David Turner wrote:
>
> > You're missing most all of the important bits. What the osds in your
> > cluster look like, your tree, and your cac
You're missing most all of the important bits. What the osds in your
cluster look like, your tree, and your cache pool settings.
ceph df
ceph osd df
ceph osd tree
ceph osd pool get cephfs_cache all
You have your writeback cache on 3 nvme drives. It looks like you have
1.6TB available between
Just to make sure you're not confusing redundancy with backups. Having
your data in another site does not back up your data, but makes it more
redundant. For instance if an object/file is accidentally deleted from RGW
and you're syncing those files to AWS, Google buckets, or a second RGW
cluster
My guess is a networking problem. Do you have vlans, cluster network vs
public network in the ceph.conf, etc configured? Can you ping between all
of your storage nodes on all of their IPs?
All of your OSDs communicate with the mons on the public network, but they
communicate with each other for
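For reference, the two options in question look like this in ceph.conf; the
subnets and addresses are placeholders:

  [global]
  public_network  = 10.0.10.0/24
  cluster_network = 10.0.20.0/24

  # quick reachability check of the cluster-network IPs from each node
  $ for ip in 10.0.20.11 10.0.20.12 10.0.20.13; do ping -c1 -W1 $ip; done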
p
>
> Andrei
> ------
>
> *From: *"David Turner" <drakonst...@gmail.com>
> *To: *"Jack" <c...@jack.fr.eu.org>, "ceph-users" <
> ceph-users@lists.ceph.com>
> *Sent: *Monday, 2 October, 2017 22:28:33
>
If you take Ceph out of your search string you should find loads of
tutorials on setting up the popular collectd/influxdb/grafana stack. Once
you've got that in place, the Ceph bit should be fairly easy. There are Ceph
collectd plugins out there, or you could write your own.
On Mon, Oct 2, 2017 at
ion that
because if it is that easy to enable/disable, then testing it should be
simple and easy to compare.
On Sat, Sep 30, 2017, 8:10 PM Chad William Seys <cws...@physics.wisc.edu>
wrote:
> Hi David,
>Thanks for the clarification. Reminded me of some details I forgot
> to mentio
I'm pretty sure that the process is the same as with filestore. The cluster
doesn't really know if an osd is filestore or bluestore... It's just an osd
running a daemon.
If there are any differences, they would be in the release notes for
Luminous as changes from Jewel.
On Sat, Sep 30, 2017,
Proofread failure. "modified and read during* the first X hours, and then
remains in cold storage for the remainder of its life with rare* reads"
On Sat, Sep 30, 2017, 1:32 PM David Turner <drakonst...@gmail.com> wrote:
> I can only think of 1 type of cache tier usage th
I can only think of 1 type of cache tier usage that is faster if you are
using the cache tier on the same root of osds as the EC pool. That is cold
storage where the file is written initially, modified and read door the
first X hours, and then remains in cold storage for the remainder of its
life
His dilemma sounded like he has access to the cluster, but not any of the
clients where the RBDs are used or even the hypervisors in charge of those.
On Fri, Sep 29, 2017 at 12:03 PM Maged Mokhtar wrote:
> On 2017-09-29 17:13, Matthew Stroud wrote:
>
> Is there a way I
There is no tool on the Ceph side to see which RBDs are doing what.
Generally you need to monitor the mount points for the RBDs to track that
down with iostat or something.
That said, there are some tricky things you could probably do to track down
the RBD that is doing a bunch of stuff (as long
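One version of that trick, sketched out; the pool name and object prefix are
made-up examples:

  # suppose the hot objects are named rbd_data.5f3a2b74b0dc51.*
  # each image's prefix is shown by `rbd info`, so match it back to an image:
  $ for img in $(rbd ls rbd); do
        rbd info rbd/$img | grep -q 'block_name_prefix: rbd_data.5f3a2b74b0dc51' \
            && echo "$img"
    done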
The reason it is recommended not to raid your disks is to give them all to
Ceph. When a disk fails, Ceph can generally recover faster than the raid
can. The biggest problem with raid is that you need to replace the disk
and rebuild the raid asap. When a disk fails in Ceph, the cluster just
I'm going to assume you're dealing with your scrub errors and have a game
plan for those as you didn't mention them in your question at all.
One thing I'm always leery of when I see blocked requests happening is that
the PGs might be splitting subfolders. It is pretty much a guarantee if
you're
If you're scheduling them appropriately so that no deep scrubs will happen
on their own, then you can just check the cluster status if any PGs are
deep scrubbing at all. If you're only scheduling them for specific pools,
then you can confirm which PGs are being deep scrubbed in a specific pool
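A hedged example of checking that; the pool id 5 is a placeholder:

  # PGs currently deep scrubbing anywhere in the cluster
  $ ceph pg dump pgs_brief 2>/dev/null | grep 'scrubbing+deep'
  # limit it to one pool by matching the pool's numeric id as the PG prefix
  $ ceph pg dump pgs_brief 2>/dev/null | grep '^5\.' | grep 'scrubbing+deep'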
ve written scripts to do
> it by turning off deep scrubs, forcing individual PGs to deep scrub at
> intervals, and then enabling deep scrubs again.
> -Greg
>
>
> On Wed, Sep 27, 2017 at 6:34 AM David Turner <drakonst...@gmail.com>
> wrote:
>
>> This isn't an answer, bu
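A rough outline of the approach Greg describes above, not a drop-in script; the
sleep interval is arbitrary, and on some releases a manually requested deep scrub
may still be held back while the nodeep-scrub flag is set, so test it first:

  $ ceph osd set nodeep-scrub            # stop the scheduler starting its own
  $ for pg in $(ceph pg dump pgs_brief 2>/dev/null | awk '{print $1}' | grep '^[0-9]'); do
        ceph pg deep-scrub "$pg"         # queue one PG
        sleep 600                        # pace them out
    done
  $ ceph osd unset nodeep-scrub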
There are new PG states that cause health_err. In this case it is
undersized that is causing this state.
While I decided to upgrade my tunables before upgrading the rest of my
cluster, it does not seem to be a requirement. However I would recommend
upgrading them sooner than later. It will cause
for scrub should give you
some ideas of things to try.
On Tue, Sep 26, 2017, 2:04 PM J David <j.david.li...@gmail.com> wrote:
> With “osd max scrubs” set to 1 in ceph.conf, which I believe is also
> the default, at almost all times, there are 2-3 deep scrubs running.
>
> 3 simult
I've reinstalled a host many times over the years. We used dmcrypt so I
made sure to back up the keys for that. Other than that it is seamless as
long as your installation process only affects the root disk. If it
affected any osd or journal disk, then you would need to mark those osds
out and
When you lose 2 osds you have 30 osds accepting the degraded data and
performing the backfilling. When the 2 osds are added back in you only have
2 osds receiving the majority of the data from the backfilling. 2 osds
have a lot less available iops and spindle speed than the other 30 did when
they
You can also use ceph-fuse instead of the kernel driver to mount cephfs. It
supports all of the luminous features.
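For example (mon address, client name, and mount point are placeholders):

  $ ceph-fuse -n client.admin -m mon1.example.com:6789 /mnt/cephfs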
On Wed, Sep 27, 2017, 8:46 AM Yoann Moulin wrote:
> Hello,
>
> > Try to work with the tunables:
> >
> > $ *ceph osd crush show-tunables*
> > {
> >
"osd": 2,
"primary": false } ], "selected_object_info": "3:ce3f1d6a:::mytestobject:head(47'54
osd.0.0:53 dirty|omap|data_digest|omap_digest s 143456 uv 3 dd 2ddbf8f5 od f5fba2c6
alloc_hint [0 0 0])",
"union_shard_errors": [ "data_digest_mismatch_oi"
With “osd max scrubs” set to 1 in ceph.conf, which I believe is also
the default, at almost all times, there are 2-3 deep scrubs running.
3 simultaneous deep scrubs is enough to cause a constant stream of:
mon.ceph1 [WRN] Health check update: 69 slow requests are blocked > 32
sec (REQUEST_SLOW)
You can update the server with the mapped rbd and shouldn't see as much as
a blip on your VMs.
On Tue, Sep 26, 2017, 3:32 AM Götz Reinicke <goetz.reini...@filmakademie.de>
wrote:
> Hi Thanks David & David,
>
> we don’t use the fuse code. And may be I was a bit unclear, but you
db/wal partitions are per OSD. DB partitions need to be made as big as you
need them. If they run out of space, they will fall back to the block
device. If the DB and block are on the same device, then there's no reason
to partition them and figure out the best size. If they are on separate
version.
In general RBDs are not affected by upgrades as long as you don't take down
too much of the cluster at once and are properly doing a rolling upgrade.
On Mon, Sep 25, 2017 at 8:07 AM David <dclistsli...@gmail.com> wrote:
> Hi Götz
>
> If you did a rolling upgrade, RBD clients
Hi Götz
If you did a rolling upgrade, RBD clients shouldn't have experienced
interrupted IO and therefore IO to NFS exports shouldn't have been affected.
However, in the past when using kernel NFS over kernel RBD, I did have some
lockups when OSDs went down in the cluster so that's something to
-514.2.2.el7.x86_64
Thanks,
David
_redirected e8017)
> currently waiting for missing object
>
>
>
> Thanks,
>
> Matthew Stroud
>
>
>
> *From: *David Turner <drakonst...@gmail.com>
> *Date: *Friday, September 22, 2017 at 9:57 AM
> *To: *Matthew Stroud <mattstr...@overstock.com>, "
>
The request remains blocked if you issue `ceph osd down 2`? Marking the
offending OSD as down usually clears up blocked requests for me... at least
it resets the timer on it and the requests start blocking again if the OSD
is starting to fail.
On Fri, Sep 22, 2017 at 11:51 AM Matthew Stroud
s is that in the logs you need to find out the first
>> slow request and identify where it's from, for example, is it deep-scrub,
>> or some client accessing corrupted objects, disk errors etc.
>>
>> On 20/09/17 8:13 AM, David Turner wrote:
>>
>> Just start
You can always add the telegraf user to the ceph group. That change will
persist on reboots and allow the user running the commands to read any
folder/file that is owned by the group ceph. I do this for Zabbix and
Nagios now that the /var/lib/ceph folder is not public readable.
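That change is a one-liner, assuming the agent runs as the telegraf user:

  $ usermod -a -G ceph telegraf
  # restart the agent afterwards so the new group membership takes effect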
On Wed, Sep 20,
This is (quite) working as described with *ceph osd out * and *ceph
> osd in *, but I am wondering
> if this produces a realistic behavior.
>
>
> Am 20.09.2017 um 18:06 schrieb David Turner <drakonst...@gmail.com>:
>
> When you posted your ceph status, you only had 56 PGs deg
up just long enough before it crashed to cause problems.
On Wed, Sep 20, 2017 at 1:12 PM Gonzalo Aguilar Delgado <
gagui...@aguilardelgado.com> wrote:
> Hi David,
>
> Thank you for your support. What can be the cause of
> active+clean+inconsistent still growing up? Bad disk
Correction, if the OSD had been marked down and been marked out, some of
its PGs would be in a backfill state while others would be in a recovery
state depending on how long the OSD was marked down and how much
backfilling had completed in the cluster.
On Wed, Sep 20, 2017 at 12:06 PM David
> active+clean; 1975 GB data, 3011 GB used, 7063 GB / 10075 GB avail;
> 30549/1376215 objects degraded (2.220%); 12201/1376215 objects misplaced
> (0.887%); 21868 kB/s, 3 objects/s recovering
>
> Is this an acceptable recovery rate? Unfortunately I have no point of
> reference. My in
o see your
currently running settings to make sure that they took effect.
http://docs.ceph.com/docs/kraken/rados/operations/monitoring/#using-the-admin-socket
On Wed, Sep 20, 2017 at 11:42 AM David Turner <drakonst...@gmail.com> wrote:
> You are currently on Kraken, but if you upgrade to L
> setting *osd_max_backfills *as high as possible). Are there any important
> options that I have to know?
>
> What is the best practice to deal with the issue recovery speed vs.
> read/write speed during a recovery situation? Do you
> have any suggestions/references/hints how to de
,17,9,0,3,16,24,2]
> pg 3.5b is active+recovery_wait+degraded, acting
> [4,15,14,30,28,1,12,10,2,29,24,18]
> pg 3.52 is active+recovery_wait+degraded, acting
> [17,24,20,23,4,14,18,27,8,22,9,31]
> pg 3.51 is active+recovery_wait+degraded, acting
> [13,31,11,22,25,30,1,3,27,23,21,17
Can you please provide the output of `ceph status`, `ceph osd tree`, and
`ceph health detail`? Thank you.
On Tue, Sep 19, 2017 at 2:59 PM Jonas Jaszkowic <
jonasjaszkowic.w...@gmail.com> wrote:
> Hi all,
>
> I have setup a Ceph cluster consisting of one monitor, 32 OSD hosts (1 OSD
> of size
Just starting 3 nights ago we started seeing OSDs randomly going down in
our cluster (Jewel 10.2.7). At first I saw that each OSD that was recently
marked down in the cluster (`ceph osd dump | grep -E '^osd\.[0-9]+\s' |
sort -nrk11` sorted list of OSDs by which OSDs have been marked down in the
gui...@aguilardelgado.com> wrote:
> Hi David,
>
> What I want is to add the OSD back with its data, yes, but avoiding any
> trouble that can happen from the time it was out.
>
> Is it possible? I suppose that some pg has been updated after. Will ceph
> manage it gracefully?
>
> C
Are you asking to add the osd back with its data or add it back in as a
fresh osd? What is your `ceph status`?
On Tue, Sep 19, 2017, 5:23 AM Gonzalo Aguilar Delgado <
gagui...@aguilardelgado.com> wrote:
> Hi David,
>
> Thank you for the great explanation of the weights, I th
t referred to a /sys/class,
> which I don’t have) echo 1 > /sys/devices/rbd/21/refresh
>
> (I am trying to online increase the size via kvm, virtio disk in win
> 2016)
>
>
> -Original Message-
> From: David Turner [mailto:drakonst...@gmail.com]
> Sent: maandag 1
I've never needed to do anything other than extend the partition and/or
filesystem when I increased the size of an RBD. Particularly if I didn't
partition the RBD I only needed to extend the filesystem.
Which method are you mapping/mounting the RBD? Is it through a Hypervisor
or just mapped to
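A minimal sketch for the krbd case with no partition table; image name, size,
device, and filesystem are placeholders:

  $ rbd resize --size 200G rbd/myimage
  # the mapped device picks up the new size (older kernels may need a remap)
  $ xfs_growfs /mnt/myimage          # XFS grows while mounted
  # or, for ext4:
  $ resize2fs /dev/rbd0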
that it will for those folks)?
On Fri, Sep 15, 2017 at 6:49 PM Gregory Farnum <gfar...@redhat.com> wrote:
> On Fri, Sep 15, 2017 at 3:34 PM David Turner <drakonst...@gmail.com>
> wrote:
>
>> I don't understand a single use case where I want updating my packages
>&g
at 6:06 PM Vasu Kulkarni <vakul...@redhat.com> wrote:
> On Fri, Sep 15, 2017 at 2:10 PM, David Turner <drakonst...@gmail.com>
> wrote:
> > I'm glad that worked for you to finish the upgrade.
> >
> > He has multiple MONs, but all of them are on nodes with
the packages is causing a restart of the Ceph
daemons, it is most definitely a bug and needs to be fixed.
On Fri, Sep 15, 2017 at 4:48 PM David <dclistsli...@gmail.com> wrote:
> Happy to report I got everything up to Luminous, used your tip to keep the
> OSDs running, David, thanks again for t
Happy to report I got everything up to Luminous, used your tip to keep the
OSDs running, David, thanks again for that.
I'd say this is a potential gotcha for people collocating MONs. It appears
that if you're running selinux, even in permissive mode, upgrading the
ceph-selinux packages forces
I have this issue with my NVMe OSDs, but not my HDD OSDs. I have 15 HDDs
and 2 NVMes in each host. We put most of the journals on one of the
NVMes and a few on the second, but added a small OSD partition to the
second NVMe for RGW metadata pools.
When restarting a server manually for
Hi David
I like your thinking! Thanks for the suggestion. I've got a maintenance
window later to finish the update so will give it a try.
On Thu, Sep 14, 2017 at 6:24 PM, David Turner <drakonst...@gmail.com> wrote:
> This isn't a great solution, but something you could try. If you
3 0.64 340
> 6 0.90919 1.0 931G 164G 766G 17.70 0.67 210
> TOTAL 4179G G 3067G 26.60
> MIN/MAX VAR: 0.64/2.32 STDDEV: 16.99
>
> As I said I still have OSD1 intact so I can do whatever you need except
> readding to the cluster. Since I don't know wh
The warning you are seeing is because those settings are out of order and
it's showing you which ones are greater than the ones they should be.
backfillfull_ratio is supposed to be higher than nearfull_ratio and
osd_failsafe_full_ratio is supposed to be higher than full_ratio.
nearfull_ratio is
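On Luminous those ratios are set cluster-wide with the following mon commands
(the values only illustrate the required ordering, not a recommendation):

  $ ceph osd set-nearfull-ratio 0.85
  $ ceph osd set-backfillfull-ratio 0.90
  $ ceph osd set-full-ratio 0.95
  # osd_failsafe_full_ratio is a config option on the OSDs and should stay
  # above the full ratio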
and
paste the running command (viewable in ps) to know exactly what to run in
the screens to start the daemons like this.
On Wed, Sep 13, 2017 at 6:53 PM David <dclistsli...@gmail.com> wrote:
> Hi All
>
> I did a Jewel -> Luminous upgrade on my dev cluster and it went very
> smoothl
Did you configure your crush map to have that hierarchy of region,
datacenter, room, row, rack, and chassis? If you're using the default
crush map, then it has no idea about any of those places/locations. I
don't know what the crush map would look like after using that syntax if
the crush map
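For illustration, one way those buckets get defined and a host placed into them;
all of the names are placeholders:

  $ ceph osd crush add-bucket dc1 datacenter
  $ ceph osd crush add-bucket rack1 rack
  $ ceph osd crush move rack1 datacenter=dc1
  $ ceph osd crush move node1 rack=rack1     # node1 is an existing host bucket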
What do you mean by "updated crush map to 1"? Can you please provide a
copy of your crush map and `ceph osd df`?
On Wed, Sep 13, 2017 at 6:39 AM Gonzalo Aguilar Delgado <
gagui...@aguilardelgado.com> wrote:
> Hi,
>
> I recently updated the crush map to 1 and did all the relocation of the pgs. At
> the
Hi All
I did a Jewel -> Luminous upgrade on my dev cluster and it went very
smoothly.
I've attempted to upgrade on a small production cluster but I've hit a
snag.
After installing the ceph 12.2.0 packages with "yum install ceph" on the
first node and accepting all the dependencies, I found that
t them in the archive like debian-dumpling and debian-firefly?
> 13 sep. 2017 kl. 03:09 skrev David <da...@visions.se>:
>
> Hi!
>
> Noticed tonight during maintenance that the hammer repo for debian wheezy
> only has 2 packages listed in the Packages file.
> Thought pe
tps://download.ceph.com/debian-hammer/pool/main/c/ceph/>
Is it a known issue or rather a "feature" =D
Kind Regards,
David Majchrzak
get-omaphdr
obj_header
$ for i in $(ceph-objectstore-tool --data-path ... --pgid 5.3d40
.dir.default.64449186.344176 list-omap)
do
echo -n "${i}: "
ceph-objectstore-tool --data-path ... .dir.default.292886573.13181.12
get-omap $i
done
key1: val1
key2: val2
key3: val3
David
On 9/8/17 12
fault.292886573.13181.12 remove"
.dir.default.64449186.344176 has selected_object_info with "od 337cf025"
so shards have "omap_digest_mismatch_oi" except for osd 990.
The pg repair code will use osd.990 to fix the other 2 copies without
further handling.
David
On 9/8/17 11
e? Can I use an empty passphrase?
>
> On Wed, Sep 6, 2017 at 11:23 PM, M Ranga Swami Reddy
> <swamire...@gmail.com> wrote:
> > Thank you. Iam able to replace the dmcrypt journal successfully.
> >
> > On Sep 5, 2017 18:14, "David Turner" <drakonst...@gma
I sent the output of all of the files including the logs to you. Thank you
for your help so far.
On Thu, Sep 7, 2017 at 4:48 PM Yehuda Sadeh-Weinraub <yeh...@redhat.com>
wrote:
> On Thu, Sep 7, 2017 at 11:37 PM, David Turner <drakonst...@gmail.com>
> wrote:
> > I
I'm pretty sure I'm using the cluster admin user/keyring. Is there any
output that would be helpful? Period, zonegroup get, etc?
On Thu, Sep 7, 2017 at 4:27 PM Yehuda Sadeh-Weinraub <yeh...@redhat.com>
wrote:
> On Thu, Sep 7, 2017 at 11:02 PM, David Turner <drakonst...@gmail.
in `mdlog list`.
On Thu, Sep 7, 2017 at 3:27 PM Yehuda Sadeh-Weinraub <yeh...@redhat.com>
wrote:
> On Thu, Sep 7, 2017 at 10:04 PM, David Turner <drakonst...@gmail.com>
> wrote:
> > One realm is called public with a zonegroup called public-zg with a zone
> for
>
com>
wrote:
> On Thu, Sep 7, 2017 at 7:44 PM, David Turner <drakonst...@gmail.com>
> wrote:
> > Ok, I've been testing, investigating, researching, etc for the last week
> and
> > I don't have any problems with data syncing. The clients on one side are
> > creatin
To be fair, other times I have to go in and tweak configuration settings
and timings to resolve chronic blocked requests.
On Thu, Sep 7, 2017 at 1:32 PM David Turner <drakonst...@gmail.com> wrote:
> `ceph health detail` will give a little more information into the blocked
&
`ceph health detail` will give a little more information into the blocked
requests. Specifically which OSDs are the requests blocked on and how long
have they actually been blocked (as opposed to '> 32 sec'). I usually find
a pattern after watching that for a time and narrow things down to an
On Filestore you would flush the journal and then after mapping the new
journal device use the command to create the journal. I'm sure there's
something similar for bluestore, but I don't have any experience with it
yet. Is there a new command similar to flush and create for the WAL and DB?
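For filestore that sequence is roughly the following, assuming the replacement
journal device is already partitioned; the osd id is a placeholder:

  $ systemctl stop ceph-osd@12
  $ ceph-osd -i 12 --flush-journal   # only possible if the old journal is still readable
  # repoint the osd's journal symlink/journal_uuid at the new partition, then:
  $ ceph-osd -i 12 --mkjournal
  $ systemctl start ceph-osd@12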
On
syncing in the other realm? I have 2 realms being sync using
multi-site and it's only 1 of them that isn't getting the metadata across.
As far as I can tell it is configured identically.
On Thu, Aug 31, 2017 at 12:46 PM David Turner <drakonst...@gmail.com> wrote:
> All of the messages
Did the journal drive fail during operation? Or was it taken out during
pre-failure? If it fully failed, then most likely you can't guarantee the
consistency of the underlying osds. In this case, you just remove the affected
osds and add them back in as new osds.
In the case of having good data on
p'd OSD's old ip
> address?
>
> Jake
>
>
> On Wed, Aug 30, 2017 at 3:55 PM Jeremy Hanmer <jeremy.han...@dreamhost.com>
> wrote:
>
>> This is simply not true. We run quite a few ceph clusters with
>> rack-level layer2 domains (thus routing between racks) and ever