Re: [ceph-users] How does monitor know OSD is dead?

2019-07-03 Thread Bryan Henderson
…replicated across all three, with the hope that this sort of thing would not be fatal. It's a Jewel system with that version's default of 1 for "mon osd min down reporters". -- Bryan Henderson, San Jose, California

Re: [ceph-users] How does monitor know OSD is dead?

2019-07-02 Thread Bryan Henderson
the people who buy those are not aware that it's designed never to go down, and that if something breaks while the system is coming up, a repair action may be necessary before data is accessible again. -- Bryan Henderson, San Jose, California

Re: [ceph-users] How does monitor know OSD is dead?

2019-07-01 Thread Bryan Henderson
…that possible? A related question: If I mark an OSD down administratively, does it stay down until I give a command to mark it back up, or will the monitor detect signs of life and declare it up again on its own? -- Bryan Henderson, San Jose, California

Re: [ceph-users] How does monitor know OSD is dead?

2019-06-29 Thread Bryan Henderson
…that to the monitor, which would believe it within about a minute and mark the OSDs down ("osd heartbeat interval", "mon osd min down reports", "mon osd min down reporters", "osd reporter subtree level"). -- Bryan Henderson
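A minimal sketch of how the knobs named above look in ceph.conf; the values shown are the usual Jewel-era defaults as I recall them, quoted for illustration rather than as a recommendation:

```ini
[osd]
# How often an OSD pings its heartbeat peers, in seconds.
osd heartbeat interval = 6

[mon]
# How many distinct OSDs must report a peer down, and how many
# reports are needed, before the monitor marks that OSD down.
mon osd min down reporters = 1
mon osd min down reports = 3
```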

Re: [ceph-users] How does monitor know OSD is dead?

2019-06-29 Thread Bryan Henderson
…marked down, which is pretty complicated, at http://docs.ceph.com/docs/master/rados/configuration/mon-osd-interaction/ . It just doesn't seem to match the implementation. -- Bryan Henderson, San Jose, California

[ceph-users] How does monitor know OSD is dead?

2019-06-27 Thread Bryan Henderson
of mon_osd_report_timeout), it marks it down. But it didn't. I did "osd down" commands for the dead OSDs and the status changed to down and I/O started working. And wouldn't even 15 minutes of grace be unacceptable if it means I/Os have to wait that long before falling back to a redundant OSD?
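The 15-minute grace discussed here lines up with the usual default for this timeout; a hedged ceph.conf sketch (900 seconds is the default in the releases I know of, not a tuned value):

```ini
[mon]
# If no OSD reports a peer down, but the monitor itself has heard
# nothing from an OSD for this long, the monitor marks it down.
# 900 seconds = the 15-minute grace period discussed above.
mon osd report timeout = 900
```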

Re: [ceph-users] cephfs file block size: must it be so big?

2018-12-14 Thread Bryan Henderson
…options. Without such options, in the one case I tried, Linux 4.9, the block size was 32K. Maybe it's affected by the server or by the filesystem the NFS server is serving. This was NFS 3. > This patch should address this issue [massive reads of e.g. /dev/urandom]. Thanks! > mount option should work…

Re: [ceph-users] cephfs file block size: must it be so big?

2018-12-14 Thread Bryan Henderson
…the file's layout. In the default layout, the stripe unit size is 4 MiB. -- Bryan Henderson, San Jose, California
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

[ceph-users] cephfs file block size: must it be so big?

2018-12-13 Thread Bryan Henderson
…Has the stat block size been discussed much? Is there a good reason that it's the RADOS object size? I'm thinking of modifying the cephfs filesystem driver to add a mount option to specify a fixed block size to be reported for all files, using 4K or 64K. Would that break something? -- Bryan Henderson
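The value under discussion is the st_blksize field of stat(2). A quick local check of what a filesystem advertises, using only coreutils (run it against a file on a cephfs mount to see the 4 MiB figure; the temp file here just demonstrates the mechanism):

```shell
# Create a scratch file and print the I/O block size hint the kernel
# reports for it ("%o" is stat(1)'s optimal-transfer-size field,
# i.e. st_blksize).
tmpfile=$(mktemp)
stat -c %o "$tmpfile"
rm -f "$tmpfile"
```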

[ceph-users] searching mailing list archives

2018-11-12 Thread Bryan Henderson
Is it possible to search the mailing list archives? http://lists.ceph.com/pipermail/ceph-users-ceph.com/ seems to have a search function, but in my experience it never finds anything. -- Bryan Henderson, San Jose, California

[ceph-users] How to repair rstats mismatch

2018-11-08 Thread Bryan Henderson
empty, while also giving an empty list of its contents. -- Bryan Henderson, San Jose, California

Re: [ceph-users] Should OSD write error result in damaged filesystem?

2018-11-04 Thread Bryan Henderson
…filesystem driver is the same way - I have to tell it how big a write it can do; it can't figure it out from the OSDs. So maybe it's a fundamental architecture thing. -- Bryan Henderson, San Jose, California

[ceph-users] Should OSD write error result in damaged filesystem?

2018-11-03 Thread Bryan Henderson
…? Is this a job for "cephfs-journal-tool event recover_dentries" followed by "cephfs-journal-tool journal reset"? This is Jewel. -- Bryan Henderson, San Jose, California
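For later readers: the usual shape of that recovery, sketched from the cephfs disaster-recovery documentation rather than tested here (newer releases take a --rank argument; always export a backup first, since the reset is destructive):

```shell
# Back up the journal before modifying anything.
cephfs-journal-tool journal export backup.bin

# Write whatever dentries can be salvaged from the journal
# back into the metadata store.
cephfs-journal-tool event recover_dentries summary

# Discard the corrupt journal so the MDS can replay cleanly.
cephfs-journal-tool journal reset
```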

Re: [ceph-users] MDS does not always failover to hot standby

2018-09-07 Thread Bryan Henderson
filesystem is offline) -- Bryan Henderson, San Jose, California

Re: [ceph-users] MDS does not always failover to hot standby on reboot

2018-09-01 Thread Bryan Henderson
> If the active MDS is connected to a monitor and they fail at the same time, > the monitors can't replace the mds until they've been through their own > election and a full mds timeout window. So how long are we talking? -- Bryan Henderson, San Jose, California

Re: [ceph-users] Why does Ceph probe for end of MDS log?

2018-08-26 Thread Bryan Henderson
…never happened. This failure to restart happened after the MDS crashed, and I lost any messages that would tell me why it crashed. I'll fix that, turn up verbosity, and if it happens again I'll have a better idea how the zeroes got there. -- Bryan Henderson

[ceph-users] Why does Ceph probe for end of MDS log?

2018-08-23 Thread Bryan Henderson
…correctly written? I'm looking at this because I have an MDS that will not start: there is junk (zeroes) after the point where the log header says the log ends, so replay of the log fails there. -- Bryan Henderson

Re: [ceph-users] Fwd: down+peering PGs, can I move PGs from one OSD to another

2018-08-04 Thread Bryan Henderson
belong on another OSD (which I guess they ought to, since the OSD is out), ceph-objectstore-tool is what you would use to move them over there manually, since ordinary peering can't do it. -- Bryan Henderson, San Jose, California

Re: [ceph-users] Cephfs kernel driver availability

2018-07-22 Thread Bryan Henderson
…system for these clients. -- Bryan Henderson, San Jose, California

[ceph-users] Cephfs kernel driver availability

2018-07-22 Thread Bryan Henderson
that there are many more bugs in the 3.16 cephfs filesystem driver waiting for me. Indeed, I've seen panics not yet explained. So what are other people using? A less stable kernel? An out-of-tree driver? FUSE? Is there a working process for getting known bugs fixed in 3.16? -- Bryan Henderson

Re: [ceph-users] Data recovery after losing all monitors

2018-06-02 Thread Bryan Henderson
> Kill all MDS daemons first, create a new fs with the old pools, then run 'fs reset' > before starting any MDS. Brilliant! I can't wait to try it. Thanks. -- Bryan Henderson, San Jose, California
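As a sketch, the sequence quoted above would look roughly like this; the fs and pool names are hypothetical, "fs new ... --force" is what lets you reuse existing pools, and "fs reset" is destructive, so treat this as an outline rather than a tested procedure:

```shell
# Stop every MDS daemon first (the systemd unit name varies by deployment).
systemctl stop ceph-mds.target

# Recreate the filesystem entry over the surviving metadata/data pools.
ceph fs new cephfs cephfs_metadata cephfs_data --force

# Reset the filesystem state so the new FS map is accepted as-is.
ceph fs reset cephfs --yes-i-really-mean-it

# Only now start the MDS daemons again.
systemctl start ceph-mds.target
```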

Re: [ceph-users] Data recovery after losing all monitors

2018-06-01 Thread Bryan Henderson
it takes, along with 'ceph-objectstore-tool --op update-mon-db', to recover from a lost cluster map. -- Bryan Henderson, San Jose, California
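For context, the documented monitor-store rebuild runs that tool against each OSD's store in turn, accumulating cluster-map history into a fresh monitor database; a hedged sketch (paths are illustrative, and each OSD must be stopped while its store is read):

```shell
# Accumulate map history from one OSD's store into a rebuilt mon store.
# Repeat for every OSD on every host, pointing at the same --mon-store-path.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
    --op update-mon-db --mon-store-path /tmp/mon-store

# The rebuilt store is then used to seed a new monitor (keys must be
# re-added with ceph-authtool before the monitor will start).
```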

[ceph-users] Data recovery after losing all monitors

2018-05-26 Thread Bryan Henderson
…you've recovered access to the OSDs? -- Bryan Henderson, San Jose, California

Re: [ceph-users] Intepreting reason for blocked request

2018-05-19 Thread Bryan Henderson
inside the cluster), and the requests aren't just blocked for a long time; they're blocked indefinitely. The only time I've seen it is when I brought the cluster up in a different order than I usually do. So I'm just trying to understand the inner workings in case I need…

[ceph-users] Intepreting reason for blocked request

2018-05-12 Thread Bryan Henderson
I recently had some requests blocked indefinitely; I eventually cleared it up by recycling the OSDs, but I'd like some help interpreting the log messages that supposedly give a clue as to what caused the blockage (reformatted for easy email reading): 2018-05-03 01:56:35.248623 osd.0 …

[ceph-users] stale status from monitor?

2018-05-08 Thread Bryan Henderson
My cluster got stuck somehow, and at one point in trying to recycle things to unstick it, I ended up shutting down everything, then bringing up just the monitors. At that point, the cluster reported the status below. With nothing but the monitors running, I don't see how the status can say there

[ceph-users] Shutting down: why OSDs first?

2018-05-07 Thread Bryan Henderson
would I be taking if I just haphazardly killed everything instead of orchestrating a shutdown? -- Bryan Henderson, San Jose, California

[ceph-users] Why keep old epochs?

2017-11-14 Thread Bryan Henderson
than that, and what happens if the maximum I set is too low to cover those necessary old pgmaps? -- Bryan Henderson, San Jose, California

[ceph-users] What goes in the monitor database?

2017-11-04 Thread Bryan Henderson
…ceph_kvstore_tool after shutting down the monitor, I see hundreds of keys. So what does the monitor have to store to do a "status" command? I've seen clues that the activity has to do with Paxos elections, but I'm fuzzy on why elections would be happening or why they would need a persistent…

[ceph-users] Ceph program memory usage

2017-04-29 Thread Bryan Henderson
someone searching the archives for memory usage information. -- Bryan Henderson, San Jose, California

Re: [ceph-users] ceph program uses lots of memory

2017-01-03 Thread Bryan Henderson
…no real-memory or paging-rate rlimit. As it stands, any normal shell on my systems has an address-space limit of 256M, which has never been a problem before but is majorly inconvenient now. -- Bryan Henderson, San Jose, California
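The limit in question is the virtual-address-space rlimit (RLIMIT_AS, ulimit -v in the shell). A small sketch of checking and reproducing the 256M cap described above; the cap is applied only inside a subshell, so the current shell is unaffected:

```shell
# Show the current address-space limit, in KiB;
# "unlimited" means no RLIMIT_AS is set.
ulimit -v

# Reproduce the described environment in a subshell: cap the address
# space at 256 MiB (262144 KiB), then run the memory-hungry command there.
( ulimit -v 262144; echo "address space capped at 256 MiB" )
```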

[ceph-users] ceph program uses lots of memory

2016-12-29 Thread Bryan Henderson
specific command I'm doing, and it does this even when there is no Ceph cluster running, so it must be something pretty basic. -- Bryan Henderson, San Jose, California