Re: [ceph-users] CephFS: caps went stale, renewing

2016-09-02 Thread Yan, Zheng
On Sat, Sep 3, 2016 at 1:35 AM, Gregory Farnum wrote: > On Fri, Sep 2, 2016 at 2:58 AM, David wrote: >> Hi All >> >> Kernel client: 4.6.4-1.el7.elrepo.x86_64 >> MDS version: 10.2.2 >> OS: CentOS 7 >> >> I have Cephfs mounted on a few servers, I see the

Re: [ceph-users] CephFS: caps went stale, renewing

2016-09-02 Thread Yan, Zheng
On Fri, Sep 2, 2016 at 5:58 PM, David wrote: > Hi All > > Kernel client: 4.6.4-1.el7.elrepo.x86_64 > MDS version: 10.2.2 > OS: CentOS 7 > > I have Cephfs mounted on a few servers, I see the following in the log > approx every 20 secs on all of them: > > kernel: ceph: mds0

Re: [ceph-users] cephfs page cache

2016-09-02 Thread Yan, Zheng
On Fri, Sep 2, 2016 at 5:10 PM, Sean Redmond wrote: > I have checked all the servers in scope running 'dmesg | grep -i stale' and > it does not yield any results. > > As a test I have rebooted the servers in scope and I can still replicate the > behavior 100% of the time.

[ceph-users] Can someone explain the strange leftover OSD devices in CRUSH map -- renamed from osd.N to deviceN?

2016-09-02 Thread Dan Jakubiec
A while back we removed two damaged OSDs from our cluster, osd.0 and osd.8. They are now gone from most Ceph commands, but are still showing up in the CRUSH map with weird device names: ...
# devices
device 0 device0
device 1 osd.1
device 2 osd.2
device 3 osd.3
device 4 osd.4
device 5 osd.5
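Those deviceN names are the placeholder entries CRUSH typically keeps for removed OSD ids. A minimal sketch of cleaning them out by editing the map directly (the file names are placeholders, not from the thread):

  ceph osd getcrushmap -o crushmap.bin        # export the compiled CRUSH map
  crushtool -d crushmap.bin -o crushmap.txt   # decompile to editable text
  # delete the stale "device 0 device0" style lines, then recompile and upload:
  crushtool -c crushmap.txt -o crushmap.new
  ceph osd setcrushmap -i crushmap.new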

Re: [ceph-users] cephfs page cache

2016-09-02 Thread Sean Redmond
Hi, The workaround is fine, but I still think there could be a bug here. I will try and spend some time in the next few days to write something to test with regular buffered IO. But even using memory-mapped IO I would not expect the read request from server1 or server2 to get zeros in place of

[ceph-users] How to abandon PGs that are stuck in "incomplete"?

2016-09-02 Thread Dan Jakubiec
Re-packaging this question which was buried in a larger, less-specific thread from a couple of days ago. Hoping this will be more useful here. We have been working on restoring our Ceph cluster after losing a large number of OSDs. We have all PGs active now except for 80 PGs that are stuck in
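For context, a sketch of the usual first diagnostic commands for stuck PGs (the PG id 1.2f3 and OSD id 8 are placeholders, not values from the thread):

  ceph pg dump_stuck inactive             # list PGs that are not active
  ceph pg 1.2f3 query                     # inspect recovery_state / down_osds_we_would_probe
  ceph osd lost 8 --yes-i-really-mean-it  # declare a probed-for OSD permanently gone (data loss risk)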

Re: [ceph-users] cephfs page cache

2016-09-02 Thread Gregory Farnum
On Fri, Sep 2, 2016 at 11:35 AM, Sean Redmond wrote: > Hi, > > That makes sense, I have worked around this by forcing the sync within the > application running under apache and it is working very well now without the > need for the 'sync' mount option. > > What

Re: [ceph-users] cephfs page cache

2016-09-02 Thread Sean Redmond
Hi, That makes sense. I have worked around this by forcing the sync within the application running under apache and it is working very well now without the need for the 'sync' mount option. What's interesting is that the pastebin provided below shows a way to replicate this; I was just using

Re: [ceph-users] cephfs page cache

2016-09-02 Thread Gregory Farnum
On Thu, Sep 1, 2016 at 8:02 AM, Sean Redmond wrote: > Hi, > > It seems to be using the mmap() syscall; from what I read this indicates it is > using memory-mapped IO. > > Please see a strace here: http://pastebin.com/6wjhSNrP Zheng meant: is Apache itself using memory-mapped IO? From

Re: [ceph-users] OSD daemon randomly stops

2016-09-02 Thread Samuel Just
Probably an EIO. You can reproduce with debug filestore = 20 to confirm. -Sam On Fri, Sep 2, 2016 at 10:18 AM, Reed Dier wrote: > OSD has randomly stopped for some reason. Lots of recovery processes > currently running on the ceph cluster. OSD log with assert below: > >
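For reference, a sketch of turning that logging up on a live OSD and checking for the I/O error (osd.12 and the log path are placeholders):

  ceph tell osd.12 injectargs '--debug-filestore 20'
  # reproduce the crash, then look for the failing read in the OSD log:
  grep -iE 'eio|input/output error' /var/log/ceph/ceph-osd.12.log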

[ceph-users] OSD daemon randomly stops

2016-09-02 Thread Reed Dier
OSD has randomly stopped for some reason. Lots of recovery processes currently running on the ceph cluster. OSD log with assert below: > -14> 2016-09-02 11:32:38.672460 7fcf65514700 5 -- op tracker -- seq: 1147, > time: 2016-09-02 11:32:38.672460, event: queued_for_pg, op: >

Re: [ceph-users] Turn snapshot of a flattened snapshot into regular image

2016-09-02 Thread Steve Taylor
You can use 'rbd -p images --image 417ef4b6-b4b2-4e94-9ae6-ef7a4ee3e560 info' to see the parentage of your cloned RBD from Ceph's perspective. It seems like that could be useful at various times throughout this test to determine what glance is doing under the covers.
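For reference, a sketch of the related commands that expose the clone relationship from Ceph's side (the snapshot name 'snap' is a placeholder):

  rbd -p images info 417ef4b6-b4b2-4e94-9ae6-ef7a4ee3e560         # a 'parent:' line means the image is still a clone
  rbd -p images snap ls 417ef4b6-b4b2-4e94-9ae6-ef7a4ee3e560      # snapshots that would block deletion
  rbd children images/417ef4b6-b4b2-4e94-9ae6-ef7a4ee3e560@snap   # clones that depend on a given snapshot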

Re: [ceph-users] Slow Request on OSD

2016-09-02 Thread Reed Dier
Just to circle back to this:
Drives: Seagate ST8000NM0065
Controller: LSI 3108 RAID-on-Chip
At the time, no BBU on the RoC controller. Each OSD drive was configured as a single RAID0 VD. What I believe to be the snake that bit us was the Seagate drives’ on-board caching. Using storcli to manage
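A sketch of how the drive-level cache can be checked and switched off with storcli (the controller and VD numbers are placeholders):

  storcli /c0/vall show all | grep -i cache   # current cache policy, including the disk's own cache
  storcli /c0/v0 set pdcache=off              # disable the physical drive's on-board write cache for that VD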

Re: [ceph-users] RadosGW zonegroup id error

2016-09-02 Thread Yehuda Sadeh-Weinraub
On Fri, Sep 2, 2016 at 12:54 AM, Yoann Moulin wrote: > Hello, > >> I have an issue with the default zonegroup on my cluster (Jewel 10.2.2), I >> don't >> know when this occurred, but I think I ran a wrong command during the >> manipulation of zones and regions. Now the ID of

Re: [ceph-users] Turn snapshot of a flattened snapshot into regular image

2016-09-02 Thread Eugen Block
> Something isn't right. Ceph won't delete RBDs that have existing snapshots. That's what I thought, and I also noticed that in the first test, but not in the second. The clone becomes a cinder device that is then attached to the nova instance. This is one option, but I don't use it. nova

[ceph-users] CephFS: caps went stale, renewing

2016-09-02 Thread David
Hi All
Kernel client: 4.6.4-1.el7.elrepo.x86_64
MDS version: 10.2.2
OS: CentOS 7
I have Cephfs mounted on a few servers, I see the following in the log approx every 20 secs on all of them:
kernel: ceph: mds0 caps went stale, renewing
kernel: ceph: mds0 caps stale
kernel: ceph: mds0 caps renewed
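One way to cross-check this from the MDS side, assuming admin socket access on the MDS host (the daemon name 'a' is a placeholder):

  ceph daemon mds.a session ls                        # per-client session state and last cap renewal
  ceph daemon mds.a config show | grep mds_session    # session timeout / autoclose settings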

Re: [ceph-users] vmware + iscsi + tgt + reservations

2016-09-02 Thread Oliver Dzombic
Hi Nick, yes, VAAI is disabled successfully. Performance and everything is good. The problem is that ONE LUN can only be used on ONE node at the same time, so the reservation is not working. There is no ATS and there is no VAAI active. But still, of course, vmware will, just like any

Re: [ceph-users] vmware + iscsi + tgt + reservations

2016-09-02 Thread Nick Fisk
Have you disabled the vaai functions in ESXi? I can't remember off the top of my head, but one of them makes everything slow to a crawl. > -Original Message- > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of > Oliver Dzombic > Sent: 02 September 2016 09:50 > To:
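For reference, a sketch of the ESXi advanced settings usually meant here (run per host; 0 disables the primitive, 1 re-enables it):

  esxcli system settings advanced list -o /DataMover/HardwareAcceleratedMove
  esxcli system settings advanced set -i 0 -o /DataMover/HardwareAcceleratedMove
  esxcli system settings advanced set -i 0 -o /DataMover/HardwareAcceleratedInit
  esxcli system settings advanced set -i 0 -o /VMFS3/HardwareAcceleratedLocking   # ATS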

Re: [ceph-users] cephfs page cache

2016-09-02 Thread Sean Redmond
I have checked all the servers in scope running 'dmesg | grep -i stale' and it does not yield any results. As a test I have rebooted the servers in scope and I can still replicate the behavior 100% of the time. On Fri, Sep 2, 2016 at 4:37 AM, Yan, Zheng wrote: > I think

Re: [ceph-users] vmware + iscsi + tgt + reservations

2016-09-02 Thread Oliver Dzombic
Hi, VMFS-5.61 file system spanning 1 partitions. Mode: public. The filesystem is working fine (on the 1st node where multiple instances are started), and it continues to work fine after mounting the same LUN to the 2nd node and trying write operations there. So I have no reason to think

Re: [ceph-users] RadosGW zonegroup id error

2016-09-02 Thread Yoann Moulin
Hello, > I have an issue with the default zonegroup on my cluster (Jewel 10.2.2), I > don't > know when this occurred, but I think I ran a wrong command during the > manipulation of zones and regions. Now the ID of my zonegroup is "default" > instead of "4d982760-7853-4174-8c05-cec2ef148cf0", I
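A hedged sketch of how a zonegroup definition can be exported, corrected and committed (the exact JSON edit depends on the cluster and is only outlined here):

  radosgw-admin zonegroup get --rgw-zonegroup=default > zonegroup.json
  # edit zonegroup.json so "id" holds the original UUID instead of "default"
  radosgw-admin zonegroup set --rgw-zonegroup=default < zonegroup.json
  radosgw-admin period update --commit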