me in...
> >
> > Cheers, Dan
> >
> > On Fri, Oct 16, 2015 at 1:38 PM, Richard Bade <hitr...@gmail.com> wrote:
> > > Thanks for your quick response Dan, but no. All the ceph-mon.*.log
> > > files are empty.
> > > I did track this down i
Hi Everyone,
I upgraded our cluster to Hammer 0.94.3 a couple of days ago and today
we've had one monitor crash twice and another crash once. We have 3 monitors
total and have been running Firefly 0.80.10 for quite some time without any
monitor issues.
When the monitor crashes it leaves a core file
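Since each crash leaves a core file, it's usually possible to pull a
backtrace out of it with gdb. A generic sketch (the core path and the
debuginfo package names depend on the distro, so treat these as
placeholders):

# Install the ceph debug symbols first (ceph-debuginfo on RPM systems,
# ceph-dbg on DEB systems), then point gdb at the binary and the core
$ gdb /usr/bin/ceph-mon /path/to/core
(gdb) thread apply all bt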
October 2015 at 00:33, Dan van der Ster <d...@vanderster.com> wrote:
> Hi,
> Is there a backtrace in /var/log/ceph/ceph-mon.*.log ?
> Cheers, Dan
>
> On Fri, Oct 16, 2015 at 12:46 PM, Richard Bade <hitr...@gmail.com> wrote:
> > Hi Everyone,
> > I upgraded
Hi Everyone,
We have a Ceph pool that is entirely made up of Intel S3700/S3710
enterprise SSDs.
We are seeing some significant I/O delays on the disks causing a “SCSI Task
Abort” from the OS. This seems to be triggered by the drive receiving a
“Synchronize cache command”.
My current thinking
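For anyone wanting to reproduce the observation, a generic way to
confirm the drive's volatile write cache is on (SYNCHRONIZE CACHE only
has real work to do when WCE=1) and to catch the aborts as they
happen; /dev/sdX is a placeholder:

# WCE=1 means the volatile write cache is enabled
$ sudo sdparm --get=WCE /dev/sdX
# Follow the kernel log and watch for the aborts
$ dmesg -w | grep -i 'task abort'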
.
Regards,
Richard
On 5 September 2015 at 07:55, Richard Bade <hitr...@gmail.com> wrote:
> Hi Everyone,
>
> We have a Ceph pool that is entirely made up of Intel S3700/S3710
> enterprise SSDs.
>
> We are seeing some significant I/O delays on the disks causing a “SCSI
> Task Abort” from the OS.
Hi Christian,
On 8 September 2015 at 14:02, Christian Balzer wrote:
>
> Indeed. But first a word about the setup where I'm seeing this.
> These are 2 mailbox server clusters (2 nodes each), replicating via DRBD
> over Infiniband (IPoIB at this time), LSI 3008 controller. One
> seconds to recover), not whatever insignificant delay caused by
> the SSDs.
>
> Christian
> On Tue, 8 Sep 2015 11:35:38 +1200 Richard Bade wrote:
>
> > Thanks guys for the pointers to this Intel thread:
> >
> > https://communities.intel.com/thread/77801
> >
> > It
> not necessary with fast drives (such as S3700).
>
> Take a look in the mailing list archives, I elaborated on this quite a bit
> in the past, including my experience with Kingston drives + XFS + LSI (and
> the effect is present even on Intels, but because they are much faster it
> shouldn't cau
to update the firmware
on the remainder of the S3710 drives this week and also set nobarriers.
Regards,
Richard
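For reference, a sketch of how nobarrier is typically applied to an
XFS-backed OSD data partition (device, mount point and osd number are
illustrative; this is only sane on drives with power-loss protection
like the S3700/S3710):

# /etc/fstab - illustrative entry
/dev/sdX1  /var/lib/ceph/osd/ceph-0  xfs  rw,noatime,nobarrier  0 0
# XFS also accepts toggling barriers on a live remount:
$ sudo mount -o remount,nobarrier /var/lib/ceph/osd/ceph-0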
On 8 September 2015 at 14:27, Richard Bade <hitr...@gmail.com> wrote:
> Hi Christian,
>
> On 8 September 2015 at 14:02, Christian Balzer <ch...@gol.com> wrote:
>>
Hi Everyone,
Can anyone tell me how the ceph pg x.x mark_unfound_lost revert|delete
command is meant to work?
Due to some not fully known, strange circumstances I have 1 unfound
object in one of my pools.
I've read through
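For anyone searching later, the usual sequence for investigating and
then resolving an unfound object looks roughly like this (the pg id
2.5 is a placeholder):

$ ceph health detail            # shows which pg has the unfound object
$ ceph pg 2.5 list_missing      # lists the unfound object(s)
$ ceph pg 2.5 query             # shows which OSDs have been probed
# revert rolls the object back to a previous version (or forgets it if
# it was a brand new write); delete forgets it entirely
$ ceph pg 2.5 mark_unfound_lost revert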
update we have not had any Monitor crashes. It's now been
over two months and the Mons have been stable.
Thanks again,
Richard
On 17 October 2015 at 07:26, Richard Bade <hitr...@gmail.com> wrote:
> Ok, debugging increased
> ceph tell mon.[abc] injectargs --debug-mon 20
> cep
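The second command is cut off in the archive. Raising and later
resetting mon debug levels generally looks like the following; the
reset values are the standard defaults, not necessarily the exact ones
used here:

# Raise monitor and messenger debugging on each mon
$ ceph tell mon.a injectargs '--debug-mon 20 --debug-ms 1'
# ...and after capturing the crash, return to the defaults
$ ceph tell mon.a injectargs '--debug-mon 1 --debug-ms 0'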
Hi Everyone,
I've got a strange one. After doing a reweight of some OSDs the other
night our cluster is showing 1 pg stuck unclean.
2017-01-25 09:48:41 : 1 pgs stuck unclean | recovery 140/71532872
objects degraded (0.000%) | recovery 2553/71532872 objects misplaced
(0.004%)
When I query the pg
of the OSDs in one pool down to around 0.3. This seems to have
caused the CRUSH map to be unable to find a suitable OSD for the 2nd
copy.
Changing the reweight weights back up to near 1 has resolved the issue.
Regards,
Richard
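For anyone hitting the same thing: very low reweight values can make
CRUSH give up before it finds enough distinct hosts for a placement.
Restoring the weight is one command per OSD (the osd id is
illustrative):

# reweight takes a value between 0 and 1; values near 1 give CRUSH the
# best chance of satisfying the placement rule
$ ceph osd reweight 12 1.0
$ ceph osd tree    # confirm the REWEIGHT column afterwards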
On 25 January 2017 at 10:58, Richard Bade <hitr...@gmail.com>
ubb...@redhat.com> wrote:
> On Sat, Oct 21, 2017 at 1:59 AM, Richard Bade <hitr...@gmail.com> wrote:
>> Hi Lincoln,
>> Yes the object is 0 bytes on all OSDs. Has the same filesystem
>> date/time too. Before I removed the rbd image (migrated disk to
>> different po
the scrub was finished the
inconsistency went away.
Note, the object in question was empty (size of zero bytes) before I
started this process. I emptied the object by moving the rbd image to
another pool.
Rich
On 24 October 2017 at 14:34, Richard Bade <hitr...@gmail.com> wrote:
> What I'm
Hi Everyone,
We run some hosts with Proxmox 4.4 connected to our ceph cluster for
RBD storage. Occasionally a VM suddenly stops with no real
explanation. The last time this happened to one particular VM I turned
on some qemu logging via the Proxmox Monitor tab for the VM and got this
dump this
Hi Everyone,
In our cluster running 0.94.10 we had a pg pop up as inconsistent
during scrub. Previously when this has happened running ceph pg repair
[pg_num] has resolved the problem. This time the repair runs but it
remains inconsistent.
~$ ceph health detail
HEALTH_ERR 1 pgs inconsistent; 2
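When repair alone doesn't clear it, the next step is usually to look
at what the scrub actually flagged (the pg id 7.19 is a placeholder;
list-inconsistent-obj is available from Jewel onward):

# Show which objects/shards the last deep scrub flagged
$ rados list-inconsistent-obj 7.19 --format=json-pretty
# Re-run a deep scrub, then retry the repair
$ ceph pg deep-scrub 7.19
$ ceph pg repair 7.19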
inconsistent metadata. Ultimately it was resolved by doing a
> "rados get" and then a "rados put" on the object. *However* that was a last
> ditch effort after I couldn't get any other repair option to work, and I have
> no idea if that will cause any issues down the road :)
>
>
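For completeness, the get/put workaround described above looks like
this (pool and object names are placeholders):

# Read the object out and write it straight back; this rewrites the
# replicas and can clear inconsistent metadata, but it's a last resort
$ rados -p rbd get rbd_data.abc123.0000000000000000 /tmp/obj
$ rados -p rbd put rbd_data.abc123.0000000000000000 /tmp/obj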
Hi Everyone,
I've got a couple of pools that I don't believe are being used but
have a reasonably large number of pgs (approx 50% of our total pgs).
I'd like to delete them but as they were pre-existing when I inherited
the cluster, I wanted to make sure they aren't needed for anything
first.
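A sketch of the checks and the eventual removal (the pool name is a
placeholder; the repeated name plus the long flag is the deliberate
guard rail):

# Confirm the pools hold no data and see how many pgs they carry
$ ceph df
$ ceph osd pool get old-pool pg_num
# On luminous and later, mon_allow_pool_delete must also be enabled
$ ceph osd pool delete old-pool old-pool --yes-i-really-really-mean-it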
Thanks John, I removed these pools on Friday and as you suspected
there was no impact.
Regards,
Rich
On 8 January 2018 at 23:15, John Spray <jsp...@redhat.com> wrote:
> On Mon, Jan 8, 2018 at 2:55 AM, Richard Bade <hitr...@gmail.com> wrote:
>> Hi Everyone,
>> I've g
I'm using compression on a cephfs-data pool in luminous. I didn't do
anything special
$ sudo ceph osd pool get cephfs-data all | grep ^compression
compression_mode: aggressive
compression_algorithm: zlib
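For anyone wanting the same setup, the corresponding set commands are
just the mirror of the get output above:

$ sudo ceph osd pool set cephfs-data compression_mode aggressive
$ sudo ceph osd pool set cephfs-data compression_algorithm zlib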
You can check how much compression you're getting on the OSDs:
$ for osd in `seq 0 11`; do
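The loop is cut off in the archive; a sketch of what such a check
typically looks like, assuming the intent was to read the bluestore
compression counters from each local OSD's admin socket:

$ for osd in `seq 0 11`; do
    sudo ceph daemon osd.$osd perf dump | grep bluestore_compressed
  done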
expect for my k4, m2 pool settings.
On Fri, 29 Jun 2018 at 17:08, Richard Bade wrote:
>
> I'm using compression on a cephfs-data pool in luminous. I didn't do
> anything special
>
> $ sudo ceph osd pool get cephfs-data all | grep ^compression
> compression_mode: aggressive
>
Hi Andrei,
These are good questions. We have another cluster with filestore and
bcache but for this particular one I was interested in testing out
bluestore. So I have used bluestore both with and without bcache.
For my synthetic load on the VMs I'm using this fio command:
fio --randrepeat=1
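The command is truncated in the archive; a typical invocation along
these lines (only --randrepeat=1 is from the original, everything else
is illustrative):

$ fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test \
      --filename=/mnt/test.fio --bs=4k --size=4G --iodepth=64 \
      --readwrite=randrw --rwmixread=75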
Hi Everyone,
There's been a few threads go past around this but I haven't seen any
that pointed me in the right direction.
We've recently set up a new luminous (12.2.5) cluster with 5 hosts
each with 12 4TB Seagate Constellation ES spinning disks for OSDs. We
also have 2x 400GB Intel DC P3700's
Hi Everyone,
Recently we moved a bunch of our servers from one rack to another. In
the late stages of this we hit a point when some requests were blocked
due to one pg being in "peered" state.
This was unexpected to us, but on discussion with Wido we understand
why this happened. However it's
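Brief background for anyone else surprised by this: a pg shows
"peered" (and blocks I/O) when it can reach fewer than min_size
copies. Checking and, in an emergency, temporarily lowering it looks
like this (the pool name is a placeholder; lowering min_size trades
safety for availability):

$ ceph osd pool get rbd min_size
# Emergency only: allow I/O with a single surviving copy
$ ceph osd pool set rbd min_size 1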
> How is that possible? I don't know how much more proof I need to present that
> there's a bug.
I also think there's a bug in the balancer plugin, as it seems to have
stopped for me too. I'm on Luminous though, so I'm not sure if it will
be the same bug.
The balancer used to work flawlessly,
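For anyone debugging the same, the module's state can be inspected
directly (commands available from Luminous onward):

$ ceph balancer status
$ ceph balancer eval          # score of the current data distribution
$ ceph mgr module ls          # confirm 'balancer' is enabled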