a variable is a different type or a missing delimiter. womp. I am
definitely out of my depth but now is a great time to learn! Can anyone
shed some more light as to what may be wrong?
On Fri, May 4, 2018 at 7:49 PM, Yan, Zheng <uker...@gmail.com> wrote:
> On Wed, May 2, 2018 at 7:19 AM, Se
> kh10-8 (200MB) mds log -- https://griffin-objstore.opensciencedatacloud.org/logs/ceph-mds.kh10-8.log
> kh09-8 (4.1GB) mds log -- https://griffin-objstore.opensciencedatacloud.org/logs/ceph-mds.kh09-8.log
>
> On Tue, May 1, 2018 at 12:09 AM, Patrick Donnelly <pdonn...@redhat.com>
> wrote:
> Hello Sean,
>
> On Mon, Apr 30, 2018 at 2:32 PM, Sean Sullivan <lookcr...@gmail.com>
> wrote:
> > I was creating a new user and mount point. On another hardware node I
> > mounted CephFS as admin to mount as root. I created
018 at 7:24 PM, Sean Sullivan <lookcr...@gmail.com> wrote:
> So I think I can reliably reproduce this crash from a ceph client.
>
> ```
> root@kh08-8:~# ceph -s
>   cluster:
>     id:     9f58ee5a-7c5d-4d68-81ee-debe16322544
>     health: HEALTH_OK
>
>   services:
>
can't seem to get them to start again.
On Mon, Apr 30, 2018 at 5:06 PM, Sean Sullivan <lookcr...@gmail.com> wrote:
> I had 2 MDS servers (one active one standby) and both were down. I took a
> dumb chance and marked the active as down (it said it was up but laggy).
> Then started th
at 4:32 PM, Sean Sullivan <lookcr...@gmail.com> wrote:
> I was creating a new user and mount point. On another hardware node I
> mounted CephFS as admin to mount as root. I created /aufstest and then
> unmounted. From there it seems that both of my mds nodes crashed for some
> reason
I was creating a new user and mount point. On another hardware node I
mounted CephFS as admin to mount as root. I created /aufstest and then
unmounted. From there it seems that both of my mds nodes crashed for some
reason and I can't start them any more.
https://pastebin.com/1ZgkL9fa -- my mds
On freshly installed Ubuntu 16.04 servers with the HWE kernel selected
(4.10), I cannot use ceph-deploy or ceph-disk to provision OSDs.
Whenever I try I get the following::
ceph-disk -v prepare --dmcrypt --dmcrypt-key-dir /etc/ceph/dmcrypt-keys
--bluestore --cluster ceph --fs-type xfs --
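The command above is cut off after the `--` separator; for context, a full invocation would look something like this (a sketch -- the `/dev/sdb` target is a placeholder I've added, not from the original post):

```shell
# hedged reconstruction of the prepare call; /dev/sdb is a placeholder device
ceph-disk -v prepare --dmcrypt --dmcrypt-key-dir /etc/ceph/dmcrypt-keys \
    --bluestore --cluster ceph --fs-type xfs -- /dev/sdb
```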
I am trying to stand up Ceph (Luminous) on 3 72-disk Supermicro servers
running Ubuntu 16.04 with HWE enabled (for a 4.10 kernel for CephFS). I am
not sure how this is possible, but even though I am running the following
line to wipe all disks of their partitions, once I run ceph-disk to
partition
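The wipe step described above amounts to something like this (a sketch; the HGST model filter is an assumption about the hardware, and this is destructive):

```shell
# destructive: zap GPT/MBR structures and filesystem signatures on every
# matching data disk (the HGST filter is an assumption -- adjust to taste)
for dev in $(lsblk --noheadings --output KNAME,MODEL | awk '/HGST/ {print $1}'); do
    sgdisk --zap-all "/dev/${dev}"
    wipefs --all "/dev/${dev}"
done
```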
I have tried using ceph-disk directly and I'm running into all sorts of
trouble, but I'm trying my best. Currently I am using the following
cobbled-together script, which seems to be working:
https://github.com/seapasulli/CephScripts/blob/master/provision_storage.sh
I'm at 11 right now. I hope this works.
I am trying to install Ceph Luminous (ceph version 12.2.1) on 4 Ubuntu
16.04 servers, each with 74 disks, 60 of which are HGST 7200 RPM SAS drives::
HGST HUS724040AL sdbv sas
root@kg15-2:~# lsblk --output MODEL,KNAME,TRAN | grep HGST | wc -l
60
I am trying to deploy them all with ::
a line like
I have a hammer cluster that died a bit ago (hammer 94.9) consisting of 3
monitors and 630 osds spread across 21 storage hosts. The cluster's monitors
all died due to leveldb corruption and the cluster was shut down. I was
finally given word that I could try to revive the cluster this week!
--
I have tried copying my monitor and admin keyring into the admin.keyring
used to try to rebuild and it still fails. I am not sure whether this is
due to my packages or if something else is wrong. Is there a way to test or
see what may be happening?
On Sat, Aug 13, 2016 a
I am sorry for posting this if this has been addressed already. I am not
sure on how to search through old ceph-users mailing list posts. I used to
use gmane.org but that seems to be down.
My setup::
I have a moderate ceph cluster (ceph hammer 94.9
- fe6d859066244b97b24f09d46552afc2071e6f90 ).
give it another shot in a test instance
and see how it goes.
Thanks for your help as always Mr. Balzer.
On Aug 28, 2016 8:59 PM, "Christian Balzer" <ch...@gol.com> wrote:
>
> Hello,
>
> On Sun, 28 Aug 2016 14:34:25 -0500 Sean Sullivan wrote:
>
> > I was curi
I was curious if anyone has filled Ceph storage beyond 75%. Admittedly we
lost a single host due to power failure and are down 1 host until the
replacement parts arrive, but outside of that I am seeing disparity between
the most and least full osd::
ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR
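To put a number on that disparity, something like this pulls the minimum and maximum %USE out of `ceph osd df` output (a sketch assuming the hammer-era column layout above, with %USE in column 7):

```shell
# use_spread: read `ceph osd df` output on stdin and report the least/most
# full OSD by %USE (column 7 in the hammer-era layout shown above)
use_spread() {
    awk 'NR > 1 && $7 != "" {
             if (min == "" || $7 + 0 < min + 0) min = $7
             if (max == "" || $7 + 0 > max + 0) max = $7
         }
         END { printf "min=%s max=%s spread=%.2f\n", min, max, max - min }'
}
# usage: ceph osd df | use_spread
```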
We have a hammer cluster that experienced a similar power failure and ended
up corrupting our monitors leveldb stores. I am still trying to repair ours
but I can give you a few tips that seem to help.
1.) I would copy the database off to somewhere safe right away. Just
opening it seems to change
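Concretely, that "copy it off right away" step looks something like this (a sketch; the service-stop invocations are assumptions for systemd vs. upstart releases, and the monitor must be stopped first):

```shell
# stop the monitor, then take a byte-for-byte copy of the store
# before any repair attempt touches it
systemctl stop ceph-mon@"$(hostname)" 2>/dev/null || stop ceph-mon id="$(hostname)"
cp -a /var/lib/ceph/mon/ceph-"$(hostname)"/store.db \
      /root/store.db.$(date +%Y%m%d).bak
```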
--- end dump of recent events ---
Aborted (core dumped)
---
---
I feel like I am so close but so far. Can anyone give me a nudge as to what
I can do next? It looks like it is bombing out on trying to get an updated
paxos.
On F
?
All should have the same keys/values although constructed differently
right? I can't blindly copy /var/lib/ceph/mon/ceph-$(hostname)/store.db/
from one host to another right? But can I copy the keys/values from one to
another?
On Fri, Aug 12, 2016 at 12:45 PM, Sean Sullivan <seapasu...@uchicago.
Op 11 augustus 2016 om 15:17 schreef Sean Sullivan <
> seapasu...@uchicago.edu>:
> >
> >
> > Hello Wido,
> >
> > Thanks for the advice. While the data center has a/b circuits and
> > redundant power, etc if a ground fault happens it travels outside and
> &
of the previous osd maps are there.
I just don't understand what key/values I need inside.
On Aug 11, 2016 1:33 AM, "Wido den Hollander" <w...@42on.com> wrote:
>
> > Op 11 augustus 2016 om 0:10 schreef Sean Sullivan <
> seapasu...@uchicago.edu>:
> >
> >
>
I think it just got worse::
all three monitors on my other cluster say that ceph-mon can't open
/var/lib/ceph/mon/$(hostname). Is there any way to recover if you lose all
3 monitors? I saw a post by Sage saying that the data can be recovered as
all of the data is held on other servers. Is this
So our datacenter lost power and 2/3 of our monitors died with FS
corruption. I tried fixing it but it looks like the store.db didn't make
it.
I copied the working journal via
1. sudo mv /var/lib/ceph/mon/ceph-$(hostname){,.BAK}
2. sudo ceph-mon -i {mon-id} --mkfs --monmap
So we recently had a power outage and I seem to have lost 2 of 3 of my
monitors. I have since copied /var/lib/ceph/mon/ceph-$(hostname){,.BAK} and
then created a new cephfs and finally generated a new filesystem via
''' sudo ceph-mon -i {mon-id} --mkfs --monmap {tmp}/{map-filename}
--keyring
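For anyone following along, the rough sequence being attempted above looks like this (a sketch; the mon IDs, paths, and keyring location are placeholders, and `--extract-monmap` needs a surviving monitor's intact store):

```shell
# 1. pull the monmap out of the one surviving monitor (with that mon stopped)
ceph-mon -i good-mon --extract-monmap /tmp/monmap
# 2. move the dead monitor's corrupt store aside
mv /var/lib/ceph/mon/ceph-$(hostname){,.BAK}
# 3. rebuild it from the extracted monmap and the mon. keyring
ceph-mon -i $(hostname) --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
```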
Hi Ben!
I'm using ubuntu 14.04
I have restarted the gateways with the numthreads line you suggested. I
hope this helps. I would think I would get some kind of throttle log or
something.
500 seems really strange as well. Do you have a thread for this? RGW still
has a weird race condition
Some context: I have a small cluster running Ubuntu 14.04 and Giant (now
Hammer). I ran some updates and everything was fine. Rebooted a node and a
drive must have failed, as it no longer shows up.
I use --dmcrypt with ceph deploy and 5 osds per ssd journal. To do this I
created the ssd
Sorry for the delay. It took me a while to figure out how to do a range request
and append the data to a single file. The good news is that the end file seems
to be 14G in size which matches the files manifest size. The bad news is that
the file is completely corrupt and the radosgw log has
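The range-request-and-append approach described above boils down to computing contiguous byte ranges and fetching each slice in order; a minimal sketch (the URL and chunk size in the usage line are placeholders):

```shell
# compute_ranges TOTAL CHUNK: print one HTTP Range value per CHUNK-byte slice,
# covering TOTAL bytes with no gaps or overlaps
compute_ranges() {
    local size=$1 chunk=$2 start=0 end
    while [ "$start" -lt "$size" ]; do
        end=$(( start + chunk - 1 ))
        [ "$end" -ge "$size" ] && end=$(( size - 1 ))
        echo "bytes=${start}-${end}"
        start=$(( end + 1 ))
    done
}
# usage (URL is a placeholder; appends each 1 GiB slice of a ~14 GiB object):
# for r in $(compute_ranges 15032385536 1073741824); do
#     curl -s -H "Range: $r" "$RGW_URL/bucket/object" >> object.out
# done
```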
-Weinraub yeh...@redhat.com
To: Sean Sullivan seapasu...@uchicago.edu
Cc: ceph-users@lists.ceph.com
Sent: Wednesday, May 13, 2015 2:33:07 PM
Subject: Re: [ceph-users] RGW - Can't download complete object
That's another interesting issue. Note that for part 12_80 the manifest
specifies (I assume
Will do. The reason for the partial request is that the total size of the
file is close to 1TB so attempting a download would take quite some time on
our 10Gb connection. What is odd is that if I request from the last byte
received to the end of the file, we get a 406 cannot-be-satisfied response
I have a single radosgw user with 2 s3 keys and 1 swift key. I have created a
few buckets and I can list all of the contents of bucket A and C but not B with
either S3 (boto) or python-swiftclient. I am able to list the first 1000
entries using radosgw-admin 'bucket list --bucket=bucketB'
I am trying to understand these drive throttle markers that were
mentioned, to get an idea of why these drives are marked as slow::
here is the iostat of the drive /dev/sdbm
http://paste.ubuntu.com/9607168/
an IO wait of .79 doesn't seem bad but a write wait of 21.52 seems
really high. Looking
/q5E6JjkG
On 12/19/2014 08:10 PM, Christian Balzer wrote:
Hello Sean,
On Fri, 19 Dec 2014 02:47:41 -0600 Sean Sullivan wrote:
Hello Christian,
Thanks again for all of your help! I started a bonnie test using the
following::
bonnie -d /mnt/rbd/scratch2/ -m $(hostname) -f -b
While that gives you
less hardware. Hopefully it is indeed a bug of
some sort and not yet another screw-up on my end. Furthermore, hopefully I
find the bug and fix it for others to find and profit from ^_^.
Thanks for all of your help!
On 12/22/2014 05:26 PM, Craig Lewis wrote:
On Mon, Dec 22, 2014 at 2:57 PM, Sean
:
Hello,
On Thu, 18 Dec 2014 23:45:57 -0600 Sean Sullivan wrote:
Wow Christian,
Sorry I missed these in line replies. Give me a minute to gather some
data. Thanks a million for the in depth responses!
No worries.
I thought about raiding it but I needed the space unfortunately. I had a
3x60 osd
Hello Yall!
I can't figure out why my gateways are performing so poorly and I am not
sure where to start looking. My RBD mounts seem to be performing fine
(over 300 MB/s) while uploading a 5G file to Swift/S3 takes 2m32s
(32 MBps, I believe). If we try a 1G file it's closer to 8 MBps. Testing
with
if they're behaving differently on the different requests
is one angle of attack. The other is to look into whether the RGW daemons
are hitting throttler limits or something that the RBD clients aren't.
-Greg
On Thu, Dec 18, 2014 at 7:35 PM Sean Sullivan seapasu...@uchicago.edu
mailto:seapasu
gateway was made last minute to test and rule out the hardware.
On December 18, 2014 10:57:41 PM Christian Balzer ch...@gol.com wrote:
Hello,
Nice cluster, I wouldn't mind getting my hands on her ample nacelles, er,
wrong movie. ^o^
On Thu, 18 Dec 2014 21:35:36 -0600 Sean Sullivan wrote:
Hello
Wow Christian,
Sorry I missed these in line replies. Give me a minute to gather some data.
Thanks a million for the in depth responses!
I thought about raiding it but I needed the space unfortunately. I had a
3x60 osd node test cluster that we tried before this and it didn't have
this
I had this happen to me as well. Turned out to be a connlimit thing for me.
I would check dmesg/kernel log and see if you see any "conntrack limit
reached, connection dropped" messages, then increase connlimit. Odd, as I
connected over ssh for this, but I can't deny syslog.
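For anyone hitting the same thing, the checks amount to the following (the limit value is an example, not a recommendation; size it for your traffic):

```shell
# look for dropped-connection evidence in the kernel log,
# then inspect and raise the conntrack table ceiling
dmesg | grep -i 'conntrack: table full'
sysctl net.netfilter.nf_conntrack_max
sysctl -w net.netfilter.nf_conntrack_max=262144   # example value only
```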
So this was working a moment ago and I was running rados benchmarks as
well as swift benchmarks to try to see how my install was doing. Now
when I try to download an object I get this read_length error::
http://pastebin.com/R4CW8Cgj
To try to poke at this I wiped all of the .rgw pools, removed
I think I have a split issue or I can't seem to get rid of these objects.
How can I tell ceph to forget the objects and revert?
How this happened is that, due to the python 2.7.8/ceph bug, a whole rack
of ceph went down (it had ubuntu 14.10, which seemed to have 2.7.8 before
14.04). I didn't
I forgot to register before posting so reposting.
I think I have a split issue or I can't seem to get rid of these objects.
How can I tell ceph to forget the objects and revert?
How this happened is that, due to the python 2.7.8/ceph bug, a whole rack
of ceph went down (it had ubuntu 14.10 and