[ceph-users] cephfs reporting 2x data available

2016-06-14 Thread Daniel Davidson
I have just deployed a cluster and started messing with it; I think it has two replicas. However, when I have a metadata server and mount via FUSE, it reports its full size. With two replicas, I thought it would report only half of that. Did I make a mistake, or is there something
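
For anyone hitting the same question: the quickest way to see how replication feeds into the numbers is to compare the pool's replica count with the per-pool MAX AVAIL the cluster reports. A minimal sketch, assuming the data pool is named cephfs_data (adjust to your pool names); df on a CephFS mount of this era reports raw cluster space, which is why it can look roughly double the usable capacity with two replicas:

  # replica count of the CephFS data pool (hypothetical pool name)
  ceph osd pool get cephfs_data size

  # raw cluster totals vs. per-pool MAX AVAIL, which already accounts
  # for the replica count
  ceph df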

[ceph-users] problems mounting from fstab on boot

2016-06-22 Thread Daniel Davidson
When I add my ceph system to fstab, I can mount it by referencing it, but when I restart the system, boot stops because the mount fails. I am guessing this is because fstab is processed before the network starts? Using CentOS 7. Thanks for the help, Dan
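
The usual fix on CentOS 7 is to mark the entry as a network filesystem so systemd orders the mount after the network is up. A minimal sketch of an fstab line for a kernel CephFS mount, with hypothetical monitor names and secret file path:

  # /etc/fstab -- _netdev makes systemd wait for the network before mounting;
  # x-systemd.automount would instead defer the mount until first access
  ceph-0,ceph-1,ceph-2:6789:/  /home  ceph  name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev  0 0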

Re: [ceph-users] cephfs reporting 2x data available

2016-06-14 Thread Daniel Davidson
Thanks John, I just wanted to make sure I wasn't doing anything wrong; that should work fine. Dan On 06/14/2016 03:24 PM, John Spray wrote: On Tue, Jun 14, 2016 at 7:45 PM, Daniel Davidson <dani...@igb.illinois.edu> wrote: I have just deployed a cluster and started messing with it, w

[ceph-users] Improving metadata throughput

2016-06-29 Thread Daniel Davidson
I am starting to work with and benchmark our ceph cluster. While throughput looks good so far, metadata performance appears to be suffering. Is there anything that can be done to speed up the response time of looking through a lot of small files and folders? Right now, I am
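
One knob commonly raised for small-file workloads on a Jewel-era MDS is the inode cache, on the assumption that the working set fits in RAM. A hedged sketch, assuming the MDS id is ceph-0:

  # current cache size (a count of inodes, default 100000)
  ceph daemon mds.ceph-0 config get mds_cache_size

  # raise it at runtime; persist the change under [mds] in ceph.conf
  ceph daemon mds.ceph-0 config set mds_cache_size 1000000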

[ceph-users] Speeding Up Balancing After Adding Nodes

2017-02-01 Thread Daniel Davidson
I just added two nodes to our cluster, the first time we have expanded it since it has held any data to speak of. Each node has two rather large RAID arrays. Rebalancing is taking a very long time, estimated at weeks, to complete. Are there any tunables/procedures to speed this up? Network
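
For reference, the throttles that normally govern backfill speed can be raised at runtime, trading client I/O for recovery speed. A sketch with illustrative values only; pick numbers your disks and network can absorb:

  # allow more concurrent backfills and recovery ops per OSD
  ceph tell osd.* injectargs '--osd-max-backfills 4 --osd-recovery-max-active 8'

  # watch recovery throughput per pool
  ceph -s
  ceph osd pool stats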

[ceph-users] purging strays faster

2017-03-03 Thread Daniel Davidson
ceph daemonperf mds.ceph-0

-mds-- --mds_server-- ---objecter--- -mds_cache- ---mds_log
rlat inos caps|hsr hcs hcr |writ read actv|recd recy stry purg|segs evts subm|
0 336k 97k| 0 0 0 | 00 20 | 0 0 246k 0 | 31 27k 0
0 336k 97k| 0 0 0
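
For anyone reading along: the stry column above is the stray count, and on a Jewel-era MDS the purge throttles can be inspected and raised over the admin socket. A hedged sketch, assuming mds.ceph-0 and that these option names exist in the running release:

  # current purge throttles
  ceph daemon mds.ceph-0 config get mds_max_purge_files
  ceph daemon mds.ceph-0 config get mds_max_purge_ops

  # raise them so the stray directories drain faster
  ceph daemon mds.ceph-0 config set mds_max_purge_files 1024
  ceph daemon mds.ceph-0 config set mds_max_purge_ops 32768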

Re: [ceph-users] purging strays faster

2017-03-14 Thread Daniel Davidson
Thanks John, I think that has resolved the problems. Dan On 03/04/2017 09:08 AM, John Spray wrote: On Fri, Mar 3, 2017 at 9:48 PM, Daniel Davidson <dani...@igb.illinois.edu> wrote: ceph daemonperf mds.ceph-0 -mds-- --mds_server-- ---objecter--- -mds_cache- ---mds_log---

Re: [ceph-users] purging strays faster

2017-03-06 Thread Daniel Davidson
Thanks for the suggestion, however I think my more immediate problem is the ms_handle_reset messages. I do not think the mds are getting the updates when I send them. Dan On 03/04/2017 09:08 AM, John Spray wrote: On Fri, Mar 3, 2017 at 9:48 PM, Daniel Davidson <dani...@igb.illinois.
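
If settings injected with `ceph tell mds.*` appear to be dropped (the ms_handle_reset symptom), applying and verifying them directly over the daemon's local admin socket sidesteps the messenger entirely. A sketch, assuming the daemon runs on the local host as mds.ceph-0:

  # set the value through the admin socket rather than over the network
  ceph daemon mds.ceph-0 config set mds_max_purge_files 1024

  # confirm the running daemon actually picked it up
  ceph daemon mds.ceph-0 config get mds_max_purge_files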

Re: [ceph-users] purging strays faster

2017-03-07 Thread Daniel Davidson
incorrectly somewhere, but I do not know where to look. Dan On 03/06/2017 09:05 AM, John Spray wrote: On Mon, Mar 6, 2017 at 3:03 PM, Daniel Davidson <dani...@igb.illinois.edu> wrote: Thanks for the suggestion, however I think my more immediate problem is the ms_handle_reset messages

[ceph-users] Crashes Compiling Ruby

2017-07-13 Thread Daniel Davidson
We have a weird issue. Whenever we compile Ruby, and only Ruby, in a location served by CephFS, the node in our cluster (not the ceph node) will crash. This always happens, even if we do not use a PXE-bootable node like the head/management node. If we compile to local disk, it will succeed.

Re: [ceph-users] Performance after adding a node

2017-05-09 Thread Daniel Davidson
the right number for your environment. Good Luck :) On Mon, May 8, 2017 at 5:43 PM Daniel Davidson <dani...@igb.illinois.edu <mailto:dani...@igb.illinois.edu>> wrote: Our ceph system performs very poorly or not even at all while the remapping procedure is underway. We are u

Re: [ceph-users] Input/output error mounting

2017-06-23 Thread Daniel Davidson
the daemon and stopping any operations it's working on. Also while it's down, the secondary OSDs for the PG should be able to handle the requests that are blocked. Check its log to see what it's doing. You didn't answer what your size and min_size are for your 2 pools. On Fri, J
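
For the record, the pool settings being asked about can be read straight from the cluster. A quick sketch, assuming the CephFS pools are named cephfs_data and cephfs_metadata:

  # replica count and minimum replicas required for writes
  ceph osd pool get cephfs_data size
  ceph osd pool get cephfs_data min_size
  ceph osd pool get cephfs_metadata size
  ceph osd pool get cephfs_metadata min_size

  # or everything in one listing
  ceph osd pool ls detail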

Re: [ceph-users] Input/output error mounting

2017-06-23 Thread Daniel Davidson
e 2 inactive PGs. Not sure yet if that is anything of concern, but didn't want to ignore it. On Fri, Jun 23, 2017 at 1:16 PM Daniel Davidson <dani...@igb.illinois.edu <mailto:dani...@igb.illinois.edu>> wrote: Two of our OSD systems hit 75% disk utilization, so I added another

[ceph-users] Input/output error mounting

2017-06-23 Thread Daniel Davidson
Two of our OSD systems hit 75% disk utilization, so I added another system to try and bring that back down. The system was usable for a day while the data was being migrated, but now the system is not responding when I try to mount it: mount -t ceph ceph-0,ceph-1,ceph-2,ceph-3:6789:/ /home
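
A mount that hangs mid-migration usually traces back to PGs that are not active or to OSDs approaching their full ratio. A hedged diagnostic sketch:

  # overall health, including nearfull/full warnings and blocked requests
  ceph health detail

  # PGs stuck inactive or unclean, and which OSDs they map to
  ceph pg dump_stuck inactive
  ceph pg dump_stuck unclean

  # per-OSD utilization, to see whether anything crossed the full ratio
  ceph osd df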

[ceph-users] Performance after adding a node

2017-05-08 Thread Daniel Davidson
Our ceph system performs very poorly, or not at all, while the remapping procedure is underway. We are using replica 2 and the following ceph tweaks while it is in process: 1013 ceph tell osd.* injectargs '--osd-recovery-max-active 20' 1014 ceph tell osd.* injectargs
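
When the complaint is the opposite of the earlier rebalancing thread (recovery starving clients rather than going too slowly), the same knobs are usually turned down, not up. A sketch of a more client-friendly setting, assuming default option names:

  # throttle backfill and recovery so client I/O keeps priority
  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
  ceph tell osd.* injectargs '--osd-recovery-op-priority 1 --osd-client-op-priority 63'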

Re: [ceph-users] MDS damaged

2017-10-25 Thread Daniel Davidson
may be lost (~mds0/stray7) Dan On 10/25/2017 08:54 AM, Daniel Davidson wrote: Thanks for the information. I did:
# ceph daemon mds.ceph-0 scrub_path / repair recursive
Saw in the logs it finished
# ceph daemon mds.ceph-0 flush journal
Saw in the logs it finished
# ceph mds fail 0
# ceph mds

Re: [ceph-users] MDS damaged

2017-10-25 Thread Daniel Davidson
ame is how you would refer to the daemon from systemd, it's often set to the hostname where the daemon is running by default. John On Wed, Oct 25, 2017 at 2:30 PM, Daniel Davidson <dani...@igb.illinois.edu> wrote: I do have a problem with running the commands you mentioned to repair the mds
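
To pin down what name the local daemon actually answers to, for both systemd and the admin socket, something like this helps; paths assume a stock CentOS 7 install:

  # systemd instance name, usually the short hostname
  systemctl status 'ceph-mds@*'

  # admin sockets on this host; the part after "ceph-mds." is the id
  ls /var/run/ceph/ceph-mds.*.asok

  # then address the daemon by that id
  ceph daemon mds.$(hostname -s) version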

Re: [ceph-users] MDS damaged

2017-10-25 Thread Daniel Davidson
. Dan On 10/25/2017 03:55 AM, John Spray wrote: On Tue, Oct 24, 2017 at 7:14 PM, Daniel Davidson <dani...@igb.illinois.edu> wrote: Our ceph system is having a problem. A few days ago we had a pg that was marked as inconsistent, and today I fixed it with a: # ceph pg repair 1.37c then
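
For anyone following along: a rank that the monitors have marked damaged records what it objected to in a damage table, which can be listed over the admin socket once an MDS is active for that rank. A hedged sketch, assuming mds.ceph-0:

  # list recorded metadata damage (dirfrags, dentries, backtraces)
  ceph daemon mds.ceph-0 damage ls

  # individual entries can be cleared by id after the underlying
  # objects have been repaired (the id is a placeholder here)
  ceph daemon mds.ceph-0 damage rm <id>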

Re: [ceph-users] MDS damaged

2017-10-25 Thread Daniel Davidson
Any idea why that is not working? Dan On 10/25/2017 06:45 AM, Daniel Davidson wrote: John, thank you so much. After doing the initial rados command you mentioned, it is back up and running. It did complain about a bunch of files (which frankly are not important) having duplicate inodes, but I

Re: [ceph-users] MDS damaged

2017-10-26 Thread Daniel Davidson
may be lost (~mds0/stray7)
2017-10-26 05:03:17.661711 7f1c598a6700 -1 mds.0.damage notify_dirfrag Damage to fragment * of ino 607 is fatal because it is a system directory for this rank
I would be grateful for any help in repair, Dan On 10/25/2017 04:17 PM, Daniel Davidson wrote: A bit more

Re: [ceph-users] MDS damaged

2017-10-26 Thread Daniel Davidson
, Daniel Davidson wrote: I increased the logging of the mds to try and get some more information. I think the relevant lines are:
2017-10-26 05:03:17.661683 7f1c598a6700  0 mds.0.cache.dir(607) _fetched missing object for [dir 607 ~mds0/stray7/ [2,head] auth v=108918871 cv=0/0 ap=1+0+0 state

Re: [ceph-users] MDS damaged

2017-10-24 Thread Daniel Davidson
5643 mon.0 [INF] fsmap e121619: 0/1/1 up, 1 damaged
2017-10-25 00:02:10.182101 mon.0 [INF] mds.? 172.16.31.1:6809/2991612296 up:boot
2017-10-25 00:02:10.182189 mon.0 [INF] fsmap e121620: 0/1/1 up, 1 up:standby, 1 damaged
What should I do next? ceph fs reset igbhome scares me. Dan On 10/24/
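
For reference, the far less drastic step than `ceph fs reset` is telling the monitors the rank has been repaired so a standby can attempt to take it again; this only clears the damaged flag and does not touch the metadata itself. Whether it is appropriate depends on what the damage actually was:

  # clear the damaged flag on rank 0 and let a standby replay it
  ceph mds repaired 0

  # watch whether the rank comes back up or is marked damaged again
  ceph -w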

Re: [ceph-users] MDS damaged

2017-10-24 Thread Daniel Davidson
a lot of messages like:
2017-10-24 21:24:10.910489 7f775e539bc0  1 scavenge_dentries: frag 607. is corrupt, overwriting
The frag number is the same for every line and there have been thousands. I really could use some assistance, Dan On 10/24/2017 12:14 PM, Daniel Davidson wrote
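
The scavenge_dentries lines come from the journal recovery pass. For context, the documented CephFS disaster-recovery sequence those tools belong to looks roughly like the sketch below; it is destructive, so it assumes a journal backup is taken first and the docs for the running release are followed:

  # back up the journal before touching anything
  cephfs-journal-tool journal export backup.bin

  # salvage what dentries can be recovered from the journal into the metadata pool
  cephfs-journal-tool event recover_dentries summary

  # only afterwards, and only if still required: reset the journal and session table
  cephfs-journal-tool journal reset
  cephfs-table-tool all reset session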

[ceph-users] Node crash, filesystem not usable

2018-05-11 Thread Daniel Davidson
Hello, Today we had a node crash, and looking at it, it seems there is a problem with the RAID controller, so it is not coming back up, maybe ever. It corrupted the local filesystem for the ceph storage there. The remainder of our storage (10.2.10) cluster is running, and it looks to be
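
When a node is gone for good, the usual sequence is to keep the cluster from reacting while you decide, then mark the dead node's OSDs out so their PGs re-replicate from the surviving copies. A hedged sketch with placeholder OSD ids:

  # prevent OSDs from being marked out automatically while you assess
  ceph osd set noout

  # identify the OSDs that lived on the dead node
  ceph osd tree

  # once it is clear the node will not return, push its data elsewhere
  ceph osd out <id> [<id> ...]
  ceph osd unset noout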

Re: [ceph-users] Node crash, filesystem not usable

2018-05-11 Thread Daniel Davidson
ate of your cluster.  Most notable is `ceph status` but `ceph osd tree` would be helpful. What are the size of the pools in your cluster?  Are they all size=3 min_size=2? On Fri, May 11, 2018 at 12:05 PM Daniel Davidson <dani...@igb.illinois.edu <mailto:dani...@igb.illinois.edu>> wro

Re: [ceph-users] MDS damaged

2017-10-26 Thread Daniel Davidson
and then make the system go down? thanks again for all of your help, Dan On 10/26/2017 09:23 AM, John Spray wrote: On Thu, Oct 26, 2017 at 12:40 PM, Daniel Davidson <dani...@igb.illinois.edu> wrote: And at the risk of bombing the mailing list, I can also see that the stray7_head o