Re: [ceph-users] CephFS: clients hanging on write with ceph-fuse

2017-11-02 Thread Gregory Farnum
Either ought to work fine.

On Thu, Nov 2, 2017 at 4:58 PM Andras Pataki 
wrote:

> I'm planning to test the newer ceph-fuse tomorrow.  Would it be better to
> stay with the Jewel 10.2.10 client, or would the 12.2.1 Luminous client be
> better (even though the back-end is Jewel for now)?
>
>
> Andras
>
>
>
> On 11/02/2017 05:54 PM, Gregory Farnum wrote:
>
> Have you tested on the new ceph-fuse? This does sound vaguely familiar and
> is an issue I'd generally expect to have the fix backported for, once it
> was identified.
>
> On Thu, Nov 2, 2017 at 11:40 AM Andras Pataki <
> apat...@flatironinstitute.org> wrote:
>
>> We've been running into a strange problem with Ceph using ceph-fuse and
>> the filesystem. All the back end nodes are on 10.2.10, the fuse clients
>> are on 10.2.7.
>>
>> After some hours of runs, some processes get stuck waiting for fuse like:
>>
>> [root@worker1144 ~]# cat /proc/58193/stack
>> [] wait_answer_interruptible+0x91/0xe0 [fuse]
>> [] __fuse_request_send+0x253/0x2c0 [fuse]
>> [] fuse_request_send+0x12/0x20 [fuse]
>> [] fuse_send_write+0xd6/0x110 [fuse]
>> [] fuse_perform_write+0x2f5/0x5a0 [fuse]
>> [] fuse_file_aio_write+0x2a1/0x340 [fuse]
>> [] do_sync_write+0x8d/0xd0
>> [] vfs_write+0xbd/0x1e0
>> [] SyS_write+0x7f/0xe0
>> [] system_call_fastpath+0x16/0x1b
>> [] 0x
>>
>> The cluster is healthy (all OSDs up, no slow requests, etc.).  More
>> details of my investigation efforts are in the bug report I just
>> submitted:
>>  http://tracker.ceph.com/issues/22008
>>
>> It looks like the fuse client is asking for some caps that it never
>> thinks it receives from the MDS, so the thread waiting for those caps on
>> behalf of the writing client never wakes up.  The restart of the MDS
>> fixes the problem (since ceph-fuse re-negotiates caps).
>>
>> Any ideas/suggestions?
>>
>> Andras
>>
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Reply: Re: [luminous]OSD memory usage increase when writing a lot of data to cluster

2017-11-02 Thread Brad Hubbard
On Wed, Nov 1, 2017 at 11:54 PM, Mazzystr  wrote:
> I experienced this as well on a tiny Ceph cluster during testing...
>
> HW spec - 3x
> Intel i7-4770K quad core
> 32Gb m2/ssd
> 8Gb memory
> Dell PERC H200
> 6 x 3Tb Seagate
> Centos 7.x
> Ceph 12.x
>
> I also run 3 memory-hungry procs on the Ceph nodes.  Obviously there is a
> memory problem here.  Here are the steps I took to avoid oom-killer killing
> the node ...
>
> /etc/rc.local -
> for i in $(pgrep ceph-mon); do echo -17 > /proc/$i/oom_score_adj; done
> for i in $(pgrep ceph-osd); do echo -17 > /proc/$i/oom_score_adj; done
> for i in $(pgrep ceph-mgr); do echo 50 > /proc/$i/oom_score_adj; done
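If the daemons are managed by systemd, a drop-in does the same thing but survives
daemon restarts and avoids the pgrep-at-boot race. A rough sketch, assuming the
stock ceph-osd@.service / ceph-mon@.service / ceph-mgr@.service unit names:

  mkdir -p /etc/systemd/system/ceph-osd@.service.d
  printf '[Service]\nOOMScoreAdjust=-17\n' > /etc/systemd/system/ceph-osd@.service.d/oom.conf
  systemctl daemon-reload
  systemctl restart ceph-osd.target   # or restart one OSD at a time to pick up the new score

The same drop-in pattern applies to ceph-mon@ and ceph-mgr@ with whatever score
you prefer.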
>
> /etc/sysctl.conf -
> vm.swappiness = 100
> vm.vfs_cache_pressure = 1000

This is generally not a good idea. Just sayin'

$ grep -A17 ^vfs_cache_pressure sysctl/vm.txt
vfs_cache_pressure
------------------

This percentage value controls the tendency of the kernel to reclaim
the memory which is used for caching of directory and inode objects.

At the default value of vfs_cache_pressure=100 the kernel will attempt to
reclaim dentries and inodes at a "fair" rate with respect to pagecache and
swapcache reclaim.  Decreasing vfs_cache_pressure causes the kernel to prefer
to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will
never reclaim dentries and inodes due to memory pressure and this can easily
lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
causes the kernel to prefer to reclaim dentries and inodes.

Increasing vfs_cache_pressure significantly beyond 100 may have negative
performance impact. Reclaim code needs to take various locks to find freeable
directory and inode objects. With vfs_cache_pressure=1000, it will look for
ten times more freeable objects than there are.
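For reference, checking the running value and rolling it back is just this
(a sketch, assuming the usual distro default of 100):

  sysctl vm.vfs_cache_pressure          # show the value currently in effect
  sysctl -w vm.vfs_cache_pressure=100   # restore the default until next boot
  # then remove or adjust the vm.vfs_cache_pressure line in /etc/sysctl.conf to persist it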

> vm.min_free_kbytes = 512
>
> /etc/ceph/ceph.conf -
> [osd]
> bluestore_cache_size = 52428800
> bluestore_cache_size_hdd = 52428800
> bluestore_cache_size_ssd = 52428800
> bluestore_cache_kv_max = 52428800
>
> You're going to see memory page-{in,out} skyrocket with this setup but it
> should keep oom-killer at bay until a memory fix can be applied.  Client
> performance to the cluster wasn't spectacular but wasn't terrible.  I was
> seeing +/- 60Mb/sec of bandwidth.
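For anyone tuning this, the effective cache setting and what the OSD is actually
holding in memory can be read off the admin socket. A sketch, with osd.0 as a
stand-in for whatever IDs run on the node:

  ceph daemon osd.0 config get bluestore_cache_size   # the value actually in use
  ceph daemon osd.0 dump_mempools                     # per-pool memory accounting inside the OSD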
>
> Ultimately I upgraded the nodes to 16Gb
>
> /Chris C
>
> On Tue, Oct 31, 2017 at 10:30 PM, shadow_lin  wrote:
>>
>> Hi Sage,
>> We have tried compiled the latest ceph source code from github.
>> The build is ceph version 12.2.1-249-g42172a4
>> (42172a443183ffe6b36e85770e53fe678db293bf) luminous (stable).
>> The memory problem seems better, but the OSD memory usage still keeps
>> increasing as more data is written into the rbd image, and the memory usage
>> won't drop after the write stops.
>> Could you specify which commit fixes the memory bug?
>> Thanks
>> 2017-11-01
>> 
>> lin.yunfan
>> 
>>
>> From: Sage Weil 
>> Sent: 2017-10-24 20:03
>> Subject: Re: [ceph-users] [luminous]OSD memory usage increase when writing a lot
>> of data to cluster
>> To: "shadow_lin"
>> Cc: "ceph-users"
>>
>> On Tue, 24 Oct 2017, shadow_lin wrote:
>> > Hi All,
>> > The cluster has 24 OSDs with 24 8TB HDDs.
>> > Each OSD server has 2GB of RAM and runs 2 OSDs with 2 8TB HDDs. I know the
>> > memory is below the recommended value, but this OSD server is an ARM server
>> > so I can't do anything to add more RAM.
>> > I created a replicated (2 rep) pool and a 20TB image and mounted it on the
>> > test server with an xfs filesystem.
>> >
>> > I have set the ceph.conf to this (as suggested in other related posts):
>> > [osd]
>> > bluestore_cache_size = 104857600
>> > bluestore_cache_size_hdd = 104857600
>> > bluestore_cache_size_ssd = 104857600
>> > bluestore_cache_kv_max = 103809024
>> >
>> >  osd map cache size = 20
>> > osd map max advance = 10
>> > osd map share max epochs = 10
>> > osd pg epoch persisted max stale = 10
>> > The bluestore cache settings did improve the situation, but if I try to
>> > write 1TB of data with a dd command (dd if=/dev/zero of=test bs=1G
>> > count=1000) to rbd, the osd will eventually be killed by the oom killer.
>> > If I only write about 100G of data at once then everything is fine.
>> >
>> > Why does the osd memory usage keep increasing while writing?
>> > Is there anything I can do to reduce the memory usage?
>>
>> There is a bluestore memory bug that was fixed just after 12.2.1 was
>> released; it will be fixed in 12.2.2.  In the meantime, you can consider
>> running the latest luminous branch (not fully tested) from
>> https://shaman.ceph.com/builds/ceph/luminous.
>>
>> sage
>>
>>
>> 

Re: [ceph-users] Ceph RDB with iSCSI multipath

2017-11-02 Thread Jason Dillaman
There was a little delay getting things merged in the upstream kernel so we
are now hoping for v4.16. You should be able to take a 4.15 rc XYZ kernel
and apply the patches from this thread [1]. It's due to this upstream delay
that CentOS 7.4 doesn't have the patches backported, but
hopefully a forthcoming Z-stream of 7.4 will include the changes (fingers
crossed).

[1] http://www.spinics.net/lists/target-devel/msg16162.html
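If anyone wants to try it before v4.16, the usual sequence is roughly the
following (a sketch; the patch files have to be saved out of the thread above,
and the directory name here is made up):

  git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
  cd linux && git checkout v4.15-rc1            # or whichever -rc you are testing
  git am /path/to/saved-target-rbd-patches/*.patch
  # then build and install the kernel the way your distro normally does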

On Wed, Nov 1, 2017 at 9:37 PM, GiangCoi Mr  wrote:

> Hi
> I followed this guide, http://docs.ceph.com/docs/
> master/rbd/iscsi-target-cli/ to install the Ceph iSCSI gateway on CentOS 7.4
> with kernel 4.xx. But why doesn't Ceph support CentOS for this? In this document, they
> wrote:
>
> *Requirements:*
>
>-
>
>A running Ceph Luminous or later storage cluster
>-
>
>RHEL/CentOS 7.4; or Linux kernel v4.14 or newer
>-
>
>The following packages must be installed from your Linux
>distribution’s software repository:
>- targetcli-2.1.fb47 or newer package
>   - python-rtslib-2.1.fb64 or newer package
>   - tcmu-runner-1.3.0 or newer package
>   - ceph-iscsi-config-2.3 or newer package
>   - ceph-iscsi-cli-2.5 or newer package
>
> How can I configure iSCSI?
>
> Regards,
>
> GiangLT
>
> 2017-11-01 20:14 GMT+07:00 Jason Dillaman :
>
>> Did you encounter an issue with the steps documented here [1]?
>>
>> [1] http://docs.ceph.com/docs/master/rbd/iscsi-initiator-win/
>>
>> On Wed, Nov 1, 2017 at 5:59 AM, GiangCoi Mr  wrote:
>> > Hi all.
>> >
>> > I'm configuring Ceph RBD to expose an iSCSI gateway. I am using 3 Ceph nodes
>> > (CentOS 7.4 + Ceph Luminous). I want to configure the iSCSI gateway on the 3
>> > Ceph nodes so that Windows Server 2016 can connect via multipath iSCSI. How
>> > can I configure this? Please help me to configure it. Thanks
>> >
>> > Regards,
>> > Giang
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>>
>>
>> --
>> Jason
>>
>
>


-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] iSCSI: tcmu-runner can't open images?

2017-11-02 Thread Jason Dillaman
On Thu, Nov 2, 2017 at 11:34 AM, Matthias Leopold <
matthias.leop...@meduniwien.ac.at> wrote:

> Hi,
>
> i'm trying to set up iSCSI gateways for a Ceph luminous cluster using
> these instructions:
> http://docs.ceph.com/docs/master/rbd/iscsi-target-cli/
>
> When arriving at step "Configuring: Adding a RADOS Block Device (RBD)"
> things start to get messy: there is no "disks" entry in my target path, so
> i can't "cd 
> /iscsi-target/iqn.2003-01.com.redhat.iscsi-gw:/disks/".
> When i try to create a disk in the top level "/disks" path ("/disks> create
> pool=ovirt-default image=itest04 size=50g") gwcli crashes with "ValueError:
> No JSON object could be decoded" (there is more output when using debug but
> i don't think it matters). More interesting is /var/log/tcmu-runner.log, it
> says consistently
>
> [DEBUG] handle_netlink:207: cmd 1. Got header version 2. Supported 2.
> [DEBUG] dev_added:768 rbd/ovirt-default.itest04: Got block_size 512, size
> in bytes 53687091200
> [DEBUG] tcmu_rbd_open:581 rbd/ovirt-default.itest04: tcmu_rbd_open config
> rbd/ovirt-default/itest04/osd_op_timeout=30 block size 512 num lbas
> 104857600.
> [DEBUG] timer_check_and_set_def:234 rbd/ovirt-default.itest04: The
> cluster's default osd op timeout(30.00), osd heartbeat grace(20)
> interval(6)
> [DEBUG] timer_check_and_set_def:242 rbd/ovirt-default.itest04: The osd op
> timeout will remain the default value: 30.00
> [ERROR] tcmu_rbd_image_open:318 rbd/ovirt-default.itest04: Could not open
> image itest04/osd_op_timeout=30. (Err -2)
>

The error is that ceph-iscsi-config has instructed tcmu-runner that the
name of the image is "itest04/osd_op_timeout=30". We changed the delimiter
for separating optionals from "/" to ";" and that is what your version of
tcmu-runner is expecting. Upgrade to the latest available version of
ceph-iscsi-config from here [1].
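Concretely, the image config string handed to tcmu-runner changes like this
(illustrative, pieced together from the log line above):

  # older ceph-iscsi-config (what your log shows, misread by newer tcmu-runner):
  rbd/ovirt-default/itest04/osd_op_timeout=30
  # current ceph-iscsi-config, with ';' separating the optional settings:
  rbd/ovirt-default/itest04;osd_op_timeout=30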


> [ERROR] add_device:496: handler open failed for uio0
>
> in the moment of the crash. The funny thing is, the image is created in
> the ceph pool 'ovirt-default', only gwcli/tcmu-runner can't read it. The
> "/disks" path in gwcli and the "/backstores/user:rbd" path in targetcli are
> always empty.
>
> I haven't gotten past this, can anybody tell me what's wrong?
>
> I tried 2 different tcmu binaries, one self compiled from sources from
> https://github.com/open-iscsi/tcmu-runner/tree/v1.3.0-rc4, the other rpm
> binaries from https://shaman.ceph.com/repos/tcmu-runner/ (ID: 58311). The
> error is the same with both versions.
>
> My setup:
> - CentOS 7.4
> - kernel 3.10.0-693.2.2.el7.x86_64
> - iscsi gw co-located on a ceph OSD node
> - ceph programs from http://download.ceph.com/rpm-luminous
> - python-rtslib-2.1.fb64 installed with "pip install"
> - ceph-iscsi-config-2.3 installed as rpm compiled from
> https://github.com/ceph/ceph-iscsi-config/tree/2.3
> - ceph-iscsi-cli-2.5 installed as rpm from https://github.com/ceph/ceph-i
> scsi-cli/tree/2.5
>
> thx a lot for help
> matthias
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

[1] https://shaman.ceph.com/repos/ceph-iscsi-config/master/

-- 
Jason
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-02 Thread Christian Wuerdig
I'm not a big expert, but the OP said he suspects bitrot is at
least part of the issue, in which case you can have the situation where the
drive has ACK'ed the write but a later scrub discovers checksum
errors.
Plus, you don't need to actually lose a drive to get inconsistent pgs
with size=2 min_size=1: flapping OSDs (even just temporarily) while the
cluster is receiving writes can generate this.
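Where the hardware allows it, the standard advice is to get off size=2
min_size=1 altogether. A sketch of checking and raising it (the pool name is a
placeholder, and raising size triggers re-replication traffic):

  ceph osd pool get <pool> size
  ceph osd pool get <pool> min_size
  ceph osd pool set <pool> size 3
  ceph osd pool set <pool> min_size 2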

On Fri, Nov 3, 2017 at 12:05 PM, Denes Dolhay  wrote:
> Hi Greg,
>
> Accepting the fact that an osd with outdated data can never accept a write,
> or io of any kind, how is it possible that the system goes into this state?
>
> -All osds are Bluestore, checksum, mtime etc.
>
> -All osds are up and in
>
> -No hw failures, lost disks, damaged journals or databases etc.
>
> -The data became inconsistent
>
>
> Thanks,
>
> Denke.
>
>
> On 11/02/2017 11:51 PM, Gregory Farnum wrote:
>
>
> On Thu, Nov 2, 2017 at 1:21 AM koukou73gr  wrote:
>>
>> The scenario is actually a bit different, see:
>>
>> Let's assume size=2, min_size=1
>> -We are looking at pg "A" acting [1, 2]
>> -osd 1 goes down
>> -osd 2 accepts a write for pg "A"
>> -osd 2 goes down
>> -osd 1 comes back up, while osd 2 still down
>> -osd 1 has no way to know osd 2 accepted a write in pg "A"
>> -osd 1 accepts a new write to pg "A"
>> -osd 2 comes back up.
>>
>> bang! osd 1 and 2 now have different views of pg "A" but both claim to
>> have current data.
>
>
> In this case, OSD 1 will not accept IO precisely because it can not prove it
> has the current data. That is the basic purpose of OSD peering and holds in
> all cases.
> -Greg
>
>>
>>
>> -K.
>>
>> On 2017-11-01 20:27, Denes Dolhay wrote:
>> > Hello,
>> >
>> > I have a trick question for Mr. Turner's scenario:
>> > Let's assume size=2, min_size=1
>> > -We are looking at pg "A" acting [1, 2]
>> > -osd 1 goes down, OK
>> > -osd 1 comes back up, backfill of pg "A" commences from osd 2 to osd 1,
>> > OK
>> > -osd 2 goes down (and therefore pg "A" 's backfill to osd 1 is
>> > incomplete and stopped) not OK, but this is the case...
>> > --> In this event, why does osd 1 accept IO to pg "A" knowing full well,
>> > that its data is outdated and will cause an inconsistent state?
>> > Wouldn't it be prudent to deny io to pg "A" until either
>> > -osd 2 comes back (therefore we have a clean osd in the acting group)
>> > ... backfill would continue to osd 1 of course
>> > -or data in pg "A" is manually marked as lost, and then continues
>> > operation from osd 1 's (outdated) copy?
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS: clients hanging on write with ceph-fuse

2017-11-02 Thread Andras Pataki
I'm planning to test the newer ceph-fuse tomorrow.  Would it be better 
to stay with the Jewel 10.2.10 client, or would the 12.2.1 Luminous 
client be better (even though the back-end is Jewel for now)?


Andras


On 11/02/2017 05:54 PM, Gregory Farnum wrote:
Have you tested on the new ceph-fuse? This does sound vaguely familiar 
and is an issue I'd generally expect to have the fix backported for, 
once it was identified.


On Thu, Nov 2, 2017 at 11:40 AM Andras Pataki wrote:


We've been running into a strange problem with Ceph using
ceph-fuse and
the filesystem. All the back end nodes are on 10.2.10, the fuse
clients
are on 10.2.7.

After some hours of runs, some processes get stuck waiting for
fuse like:

[root@worker1144 ~]# cat /proc/58193/stack
[] wait_answer_interruptible+0x91/0xe0 [fuse]
[] __fuse_request_send+0x253/0x2c0 [fuse]
[] fuse_request_send+0x12/0x20 [fuse]
[] fuse_send_write+0xd6/0x110 [fuse]
[] fuse_perform_write+0x2f5/0x5a0 [fuse]
[] fuse_file_aio_write+0x2a1/0x340 [fuse]
[] do_sync_write+0x8d/0xd0
[] vfs_write+0xbd/0x1e0
[] SyS_write+0x7f/0xe0
[] system_call_fastpath+0x16/0x1b
[] 0x

The cluster is healthy (all OSDs up, no slow requests, etc.). More
details of my investigation efforts are in the bug report I just
submitted:
http://tracker.ceph.com/issues/22008

It looks like the fuse client is asking for some caps that it never
thinks it receives from the MDS, so the thread waiting for those
caps on
behalf of the writing client never wakes up.  The restart of the MDS
fixes the problem (since ceph-fuse re-negotiates caps).

Any ideas/suggestions?

Andras

___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore OSD_DATA, WAL & DB

2017-11-02 Thread Nigel Williams
On 3 November 2017 at 07:45, Martin Overgaard Hansen  wrote:
> I want to bring this subject back in the light and hope someone can provide
> insight regarding the issue, thanks.

Thanks Martin, I was going to do the same.

Is it possible to make the DB partition (on the fastest device) too
big? In other words, is there a point where, for a given set of OSDs
(number + size), the DB partition is sized too large and is wasting
resources? I recall a comment by someone proposing to split up a
single large (fast) SSD into 100GB partitions, one for each OSD.

The answer could be couched as a rule of thumb built from some intersection
of pool type (RBD / RADOS / CephFS), object change (update?) intensity,
size of OSD, and so on.

An idea occurred to me: by monitoring for the logged spill message
(the event when the DB partition spills/overflows to the OSD), OSDs
could be (lazily) destroyed and recreated with a new DB partition
increased in size, say by 10%, each time.
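Until such a warning exists, the spill can already be spotted from the bluefs
perf counters. A sketch (counter names are as of luminous and may change;
admin socket paths may differ on your install):

  # slow_used_bytes > 0 means RocksDB has overflowed onto the main (slow) device
  for sock in /var/run/ceph/ceph-osd.*.asok; do
    echo "== $sock"
    ceph daemon "$sock" perf dump | python -c 'import json,sys; b = json.load(sys.stdin)["bluefs"]; print("db_used=%d db_total=%d slow_used=%d" % (b["db_used_bytes"], b["db_total_bytes"], b["slow_used_bytes"]))'
  done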
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-02 Thread Denes Dolhay

Hi Greg,

Accepting the fact that an osd with outdated data can never accept a
write, or io of any kind, how is it possible that the system goes into
this state?


-All osds are Bluestore, checksum, mtime etc.

-All osds are up and in

-No hw failures, lost disks, damaged journals or databases etc.

-The data became inconsistent


Thanks,

Denke.


On 11/02/2017 11:51 PM, Gregory Farnum wrote:


On Thu, Nov 2, 2017 at 1:21 AM koukou73gr wrote:


The scenario is actually a bit different, see:

Let's assume size=2, min_size=1
-We are looking at pg "A" acting [1, 2]
-osd 1 goes down
-osd 2 accepts a write for pg "A"
-osd 2 goes down
-osd 1 comes back up, while osd 2 still down
-osd 1 has no way to know osd 2 accepted a write in pg "A"
-osd 1 accepts a new write to pg "A"
-osd 2 comes back up.

bang! osd 1 and 2 now have different views of pg "A" but both claim to
have current data.


In this case, OSD 1 will not accept IO precisely because it can not 
prove it has the current data. That is the basic purpose of OSD 
peering and holds in all cases.

-Greg



-K.

On 2017-11-01 20:27, Denes Dolhay wrote:
> Hello,
>
> I have a trick question for Mr. Turner's scenario:
> Let's assume size=2, min_size=1
> -We are looking at pg "A" acting [1, 2]
> -osd 1 goes down, OK
> -osd 1 comes back up, backfill of pg "A" commences from osd 2 to
osd 1, OK
> -osd 2 goes down (and therefore pg "A" 's backfill to osd 1 is
> incomplete and stopped) not OK, but this is the case...
> --> In this event, why does osd 1 accept IO to pg "A" knowing
full well,
> that its data is outdated and will cause an inconsistent state?
> Wouldn't it be prudent to deny io to pg "A" until either
> -osd 2 comes back (therefore we have a clean osd in the acting
group)
> ... backfill would continue to osd 1 of course
> -or data in pg "A" is manually marked as lost, and then continues
> operation from osd 1 's (outdated) copy?
___
ceph-users mailing list
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-02 Thread Gregory Farnum
On Thu, Nov 2, 2017 at 1:21 AM koukou73gr  wrote:

> The scenario is actually a bit different, see:
>
> Let's assume size=2, min_size=1
> -We are looking at pg "A" acting [1, 2]
> -osd 1 goes down
> -osd 2 accepts a write for pg "A"
> -osd 2 goes down
> -osd 1 comes back up, while osd 2 still down
> -osd 1 has no way to know osd 2 accepted a write in pg "A"
> -osd 1 accepts a new write to pg "A"
> -osd 2 comes back up.
>
> bang! osd 1 and 2 now have different views of pg "A" but both claim to
> have current data.


In this case, OSD 1 will not accept IO precisely because it can not prove
it has the current data. That is the basic purpose of OSD peering and holds
in all cases.
-Greg


>
> -K.
>
> On 2017-11-01 20:27, Denes Dolhay wrote:
> > Hello,
> >
> > I have a trick question for Mr. Turner's scenario:
> > Let's assume size=2, min_size=1
> > -We are looking at pg "A" acting [1, 2]
> > -osd 1 goes down, OK
> > -osd 1 comes back up, backfill of pg "A" commences from osd 2 to osd 1,
> OK
> > -osd 2 goes down (and therefore pg "A" 's backfill to osd 1 is
> > incomplete and stopped) not OK, but this is the case...
> > --> In this event, why does osd 1 accept IO to pg "A" knowing full well,
> > that its data is outdated and will cause an inconsistent state?
> > Wouldn't it be prudent to deny io to pg "A" until either
> > -osd 2 comes back (therefore we have a clean osd in the acting group)
> > ... backfill would continue to osd 1 of course
> > -or data in pg "A" is manually marked as lost, and then continues
> > operation from osd 1 's (outdated) copy?
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] UID Restrictions

2017-11-02 Thread Keane Wolter
Awesome!

Thanks much again.

Keane

On Thu, Nov 2, 2017 at 5:23 PM, Douglas Fuller  wrote:

> Hi Keane,
>
> No problem. A fix for the gids bug should go in shortly. See:
> https://github.com/ceph/ceph/pull/18689
>
> Cheers,
> --Doug
>
> On Thu, Nov 2, 2017 at 4:24 PM Keane Wolter  wrote:
>
>> Here we go. Removing the trailing slash and adding the gids parameter in
>> auth caps works.
>>
>> [kwolter@um-test03 ~]$ sudo ceph auth get-or-create-key
>> client.kwolter_test1 mon 'allow r' mds 'allow r, allow rw path=/user
>> uid=100026 gids=100026' osd 'allow rw pool=cephfs_osiris, allow rw
>> pool=cephfs_users'
>> 
>> [kwolter@um-test03 ~]$ sudo ceph auth export client.kwolter_test1 >
>> ceph.client.kwolter_test1
>> export auth( with 3 caps)
>> [kwolter@um-test03 ~]$ mv ceph.client.kwolter_test1
>> ceph.client.kwolter_test1.keyring
>> [kwolter@um-test03 ~]$ sudo ceph-fuse --id=kwolter_test1 -k
>> ./ceph.client.kwolter_test1.keyring -r /user/kwolter
>> --client-die-on-failed-remount=false ceph
>> ceph-fuse[3458051]: starting ceph client
>> ceph-fuse[3458051]: starting fuse
>> [kwolter@um-test03 ~]$
>>
>> [kwolter@um-test03 ~]$ touch ceph/test.txt
>> [kwolter@um-test03 ~]$ ls -lt test.txt
>> -rw-rw-r-- 1 kwolter kwolter 0 Nov  2 16:20 test.txt
>> [kwolter@um-test03 ~]$ sudo touch ceph/test2.txt
>> touch: cannot touch ‘ceph/test2.txt’: Permission denied
>> [kwolter@um-test03 ~]$
>>
>> [kwolter@um-test03 ~]$ sudo umount ceph
>> [kwolter@um-test03 ~]$
>>
>> Thank you very much!
>>
>> Keane
>>
>> On Thu, Nov 2, 2017 at 3:51 PM, Douglas Fuller 
>> wrote:
>>
>>> Looks like there may be a bug here.
>>>
>>> Please try:
>>>
>>> * Removing the trailing slash from path= (needs documentation or fixing)
>>> * Adding your gid to a “gids” parameter in the auth caps? (bug: we’re
>>> checking the gid when none is supplied)
>>>
>>> mds “allow r, allow rw path=/user uid=100026 gids=100026”
>>>
>>> Please let me know if that works and I’ll file a bug.
>>>
>>> Thanks,
>>> —Doug
>>>
>>> > On Nov 2, 2017, at 2:48 PM, Keane Wolter  wrote:
>>> >
>>> > Hi Doug,
>>> >
>>> > Here is the output:
>>> > [kwolter@um-test03 ~]$ sudo ceph auth get client.kwolter_test1
>>> > exported keyring for client.kwolter_test1
>>> > [client.kwolter_test1]
>>> > key = 
>>> > caps mds = "allow r, allow rw path=/user/ uid=100026"
>>> > caps mon = "allow r"
>>> > caps osd = "allow rw pool=cephfs_osiris, allow rw
>>> pool=cephfs_users"
>>> > [kwolter@um-test03 ~]$
>>> >
>>> > As for the logs, the only lines I get are about the ceph-fuse being
>>> mounted.
>>> > 2017-11-02 14:45:53.246388 7f72d7a9e040  0 ceph version 12.2.1
>>> () luminous (stable), process (unknown), pid 3454195
>>> > 2017-11-02 14:45:53.247947 7f72d7a9e040  0 pidfile_write: ignore empty
>>> --pid-file
>>> > 2017-11-02 14:45:53.251078 7f72d7a9e040 -1 init, newargv =
>>> 0x55e035f524c0 newargc=9
>>> >
>>> > Thanks,
>>> > Keane
>>> >
>>> >
>>> > On Thu, Nov 2, 2017 at 2:42 PM, Douglas Fuller 
>>> wrote:
>>> > Hi Keane,
>>> >
>>> > Could you include the output of
>>> >
>>> > ceph auth get client.kwolter_test1
>>> >
>>> > Also, please take a look at your MDS log and see if you see an error
>>> from the file access attempt there.
>>> >
>>> > Thanks,
>>> > —Doug
>>> >
>>> > > On Nov 2, 2017, at 2:24 PM, Keane Wolter  wrote:
>>> > >
>>> > > Hi Doug,
>>> > >
>>> > > Here is my current mds line I have for my user: caps: [mds] allow r,
>>> allow rw path=/user/ uid=100026. My results are as follows when I mount:
>>> > > sudo ceph-fuse --id=kwolter_test1 -k ./ceph.client.kwolter_test1.keyring
>>> -r /user/kwolter --client-die-on-failed-remount=false ceph
>>> > > ceph-fuse[3453714]: starting ceph client
>>> > > ceph-fuse[3453714]: starting fuse
>>> > > [kwolter@um-test03 ~]$
>>> > >
>>> > > I then get a permission denied when I try to add anything to the
>>> mount, even though I have matching UIDs:
>>> > > [kwolter@um-test03 ~]$ touch ceph/test.txt
>>> > > touch: cannot touch ‘ceph/test.txt’: Permission denied
>>> > > [kwolter@um-test03 ~]$ sudo touch ceph/test.txt
>>> > > touch: cannot touch ‘ceph/test.txt’: Permission denied
>>> > > [kwolter@um-test03 ~]$
>>> > >
>>> > > Thanks,
>>> > > Keane
>>> > >
>>> > > On Thu, Nov 2, 2017 at 1:15 PM, Douglas Fuller 
>>> wrote:
>>> > > Hi Keane,
>>> > >
>>> > > path= has to come before uid=
>>> > >
>>> > > mds “allow r, allow rw path=/user uid=100026, allow rw path=/project"
>>> > >
>>> > > If that doesn’t work, could you send along a transcript of your
>>> shell session in setting up the ceph user, mounting the file system, and
>>> attempting access?
>>> > >
>>> > > Thanks,
>>> > > —Doug
>>> > >
>>> > > > On Nov 1, 2017, at 2:06 PM, Keane Wolter 
>>> wrote:
>>> > > >
>>> > > > I have ownership of the directory /user/kwolter on the cephFS
>>> server and I am mounting to ~/ceph, which 

Re: [ceph-users] CephFS desync

2017-11-02 Thread Gregory Farnum
On Thu, Nov 2, 2017 at 9:05 AM Andrey Klimentyev <
andrey.kliment...@flant.com> wrote:

> Hi,
>
> we've recently hit a problem in a production cluster. The gist of it is
> that sometimes a file will be changed on one machine, but only the "change
> time" propagates to others. The checksum is different. Contents,
> obviously, differ as well. How can I debug this?
>
> In other words, how would I approach such a problem with "stuck files"?
> I haven't found anything on Google or in the troubleshooting docs.
>

What versions are you running?
The only way I can think of this happening is if one of the clients had
permission to access the CephFS namespace on the MDS, but not to write to
the OSDs which store the file data. Have you checked that the clients all
have the same caps? ("ceph auth list" or one of the related more-specific
commands will let you compare.)
-Greg
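For comparing them quickly, something like this is enough (the client names
here are placeholders):

  ceph auth list                    # every entity with its mon/mds/osd caps
  ceph auth get client.machine-a    # or pull specific clients and diff them
  ceph auth get client.machine-b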


>
> --
> Andrey Klimentyev,
> DevOps engineer @ JSC «Flant»
> http://flant.com/ 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CephFS: clients hanging on write with ceph-fuse

2017-11-02 Thread Gregory Farnum
Have you tested on the new ceph-fuse? This does sound vaguely familiar and
is an issue I'd generally expect to have the fix backported for, once it
was identified.

On Thu, Nov 2, 2017 at 11:40 AM Andras Pataki 
wrote:

> We've been running into a strange problem with Ceph using ceph-fuse and
> the filesystem. All the back end nodes are on 10.2.10, the fuse clients
> are on 10.2.7.
>
> After some hours of runs, some processes get stuck waiting for fuse like:
>
> [root@worker1144 ~]# cat /proc/58193/stack
> [] wait_answer_interruptible+0x91/0xe0 [fuse]
> [] __fuse_request_send+0x253/0x2c0 [fuse]
> [] fuse_request_send+0x12/0x20 [fuse]
> [] fuse_send_write+0xd6/0x110 [fuse]
> [] fuse_perform_write+0x2f5/0x5a0 [fuse]
> [] fuse_file_aio_write+0x2a1/0x340 [fuse]
> [] do_sync_write+0x8d/0xd0
> [] vfs_write+0xbd/0x1e0
> [] SyS_write+0x7f/0xe0
> [] system_call_fastpath+0x16/0x1b
> [] 0x
>
> The cluster is healthy (all OSDs up, no slow requests, etc.).  More
> details of my investigation efforts are in the bug report I just submitted:
>  http://tracker.ceph.com/issues/22008
>
> It looks like the fuse client is asking for some caps that it never
> thinks it receives from the MDS, so the thread waiting for those caps on
> behalf of the writing client never wakes up.  The restart of the MDS
> fixes the problem (since ceph-fuse re-negotiates caps).
>
> Any ideas/suggestions?
>
> Andras
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] UID Restrictions

2017-11-02 Thread Douglas Fuller
Hi Keane,

No problem. A fix for the gids bug should go in shortly. See:
https://github.com/ceph/ceph/pull/18689

Cheers,
--Doug

On Thu, Nov 2, 2017 at 4:24 PM Keane Wolter  wrote:

> Here we go. Removing the trailing slash and adding the gids parameter in
> auth caps works.
>
> [kwolter@um-test03 ~]$ sudo ceph auth get-or-create-key
> client.kwolter_test1 mon 'allow r' mds 'allow r, allow rw path=/user
> uid=100026 gids=100026' osd 'allow rw pool=cephfs_osiris, allow rw
> pool=cephfs_users'
> 
> [kwolter@um-test03 ~]$ sudo ceph auth export client.kwolter_test1 >
> ceph.client.kwolter_test1
> export auth( with 3 caps)
> [kwolter@um-test03 ~]$ mv ceph.client.kwolter_test1
> ceph.client.kwolter_test1.keyring
> [kwolter@um-test03 ~]$ sudo ceph-fuse --id=kwolter_test1 -k
> ./ceph.client.kwolter_test1.keyring -r /user/kwolter
> --client-die-on-failed-remount=false ceph
> ceph-fuse[3458051]: starting ceph client
> ceph-fuse[3458051]: starting fuse
> [kwolter@um-test03 ~]$
>
> [kwolter@um-test03 ~]$ touch ceph/test.txt
> [kwolter@um-test03 ~]$ ls -lt test.txt
> -rw-rw-r-- 1 kwolter kwolter 0 Nov  2 16:20 test.txt
> [kwolter@um-test03 ~]$ sudo touch ceph/test2.txt
> touch: cannot touch ‘ceph/test2.txt’: Permission denied
> [kwolter@um-test03 ~]$
>
> [kwolter@um-test03 ~]$ sudo umount ceph
> [kwolter@um-test03 ~]$
>
> Thank you very much!
>
> Keane
>
> On Thu, Nov 2, 2017 at 3:51 PM, Douglas Fuller  wrote:
>
>> Looks like there may be a bug here.
>>
>> Please try:
>>
>> * Removing the trailing slash from path= (needs documentation or fixing)
>> * Adding your gid to a “gids” parameter in the auth caps? (bug: we’re
>> checking the gid when none is supplied)
>>
>> mds “allow r, allow rw path=/user uid=100026 gids=100026”
>>
>> Please let me know if that works and I’ll file a bug.
>>
>> Thanks,
>> —Doug
>>
>> > On Nov 2, 2017, at 2:48 PM, Keane Wolter  wrote:
>> >
>> > Hi Doug,
>> >
>> > Here is the output:
>> > [kwolter@um-test03 ~]$ sudo ceph auth get client.kwolter_test1
>> > exported keyring for client.kwolter_test1
>> > [client.kwolter_test1]
>> > key = 
>> > caps mds = "allow r, allow rw path=/user/ uid=100026"
>> > caps mon = "allow r"
>> > caps osd = "allow rw pool=cephfs_osiris, allow rw
>> pool=cephfs_users"
>> > [kwolter@um-test03 ~]$
>> >
>> > As for the logs, the only lines I get are about the ceph-fuse being
>> mounted.
>> > 2017-11-02 14:45:53.246388 7f72d7a9e040  0 ceph version 12.2.1
>> () luminous (stable), process (unknown), pid 3454195
>> > 2017-11-02 14:45:53.247947 7f72d7a9e040  0 pidfile_write: ignore empty
>> --pid-file
>> > 2017-11-02 14:45:53.251078 7f72d7a9e040 -1 init, newargv =
>> 0x55e035f524c0 newargc=9
>> >
>> > Thanks,
>> > Keane
>> >
>> >
>> > On Thu, Nov 2, 2017 at 2:42 PM, Douglas Fuller 
>> wrote:
>> > Hi Keane,
>> >
>> > Could you include the output of
>> >
>> > ceph auth get client.kwolter_test1
>> >
>> > Also, please take a look at your MDS log and see if you see an error
>> from the file access attempt there.
>> >
>> > Thanks,
>> > —Doug
>> >
>> > > On Nov 2, 2017, at 2:24 PM, Keane Wolter  wrote:
>> > >
>> > > Hi Doug,
>> > >
>> > > Here is my current mds line I have for my user: caps: [mds] allow r,
>> allow rw path=/user/ uid=100026. My results are as follows when I mount:
>> > > sudo ceph-fuse --id=kwolter_test1 -k
>> ./ceph.client.kwolter_test1.keyring -r /user/kwolter
>> --client-die-on-failed-remount=false ceph
>> > > ceph-fuse[3453714]: starting ceph client
>> > > ceph-fuse[3453714]: starting fuse
>> > > [kwolter@um-test03 ~]$
>> > >
>> > > I then get a permission denied when I try to add anything to the
>> mount, even though I have matching UIDs:
>> > > [kwolter@um-test03 ~]$ touch ceph/test.txt
>> > > touch: cannot touch ‘ceph/test.txt’: Permission denied
>> > > [kwolter@um-test03 ~]$ sudo touch ceph/test.txt
>> > > touch: cannot touch ‘ceph/test.txt’: Permission denied
>> > > [kwolter@um-test03 ~]$
>> > >
>> > > Thanks,
>> > > Keane
>> > >
>> > > On Thu, Nov 2, 2017 at 1:15 PM, Douglas Fuller 
>> wrote:
>> > > Hi Keane,
>> > >
>> > > path= has to come before uid=
>> > >
>> > > mds “allow r, allow rw path=/user uid=100026, allow rw path=/project"
>> > >
>> > > If that doesn’t work, could you send along a transcript of your shell
>> session in setting up the ceph user, mounting the file system, and
>> attempting access?
>> > >
>> > > Thanks,
>> > > —Doug
>> > >
>> > > > On Nov 1, 2017, at 2:06 PM, Keane Wolter  wrote:
>> > > >
>> > > > I have ownership of the directory /user/kwolter on the cephFS
>> server and I am mounting to ~/ceph, which I also own.
>> > > >
>> > > > On Wed, Nov 1, 2017 at 2:04 PM, Gregory Farnum 
>> wrote:
>> > > > Which directory do you have ownership of? Keep in mind your local
>> filesystem permissions do not get applied to the remote CephFS mount...
>> > > >

Re: [ceph-users] FAILED assert(p.same_interval_since) and unusable cluster

2017-11-02 Thread Jon Light
I followed the instructions in the Github repo for cloning and setting up
the build environment, checked out the 12.2.0 tag, modified OSD.cc with the
fix, and then tried to build with dpkg-buildpackage. I got the following
error:
"ceph/src/kv/RocksDBStore.cc:593:22: error: ‘perf_context’ is not a member
of ‘rocksdb’"
I guess some changes have been made to RocksDB since 12.2.0?

Am I going about this the right way? Should I just simply recompile the OSD
binary with the fix and then copy it to the nodes in my cluster? What's the
best way to get this fix applied to my current installation?

Thanks
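One way that avoids rebuilding the whole release is to cherry-pick the fix onto
the tag you actually run and rebuild only the OSD binary. A rough, untested
sketch (the commit id is a placeholder for whatever lands from David's PR):

  git clone https://github.com/ceph/ceph.git && cd ceph
  git checkout v12.2.0                      # or v12.2.1 if you upgrade first
  git submodule update --init --recursive   # stale submodules are a common cause of rocksdb build errors
  git cherry-pick <fix-commit-from-PR-18673>
  ./do_cmake.sh && cd build && make -j$(nproc) ceph-osd
  # then stop one OSD, swap in build/bin/ceph-osd, start it, and verify before rolling out further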

On Wed, Nov 1, 2017 at 11:39 AM, Jon Light  wrote:

> I'm currently running 12.2.0. How should I go about applying the patch?
> Should I upgrade to 12.2.1, apply the changes, and then recompile?
>
> I really appreciate the patch.
> Thanks
>
> On Wed, Nov 1, 2017 at 11:10 AM, David Zafman  wrote:
>
>>
>> Jon,
>>
>> If you are able please test my tentative fix for this issue which is
>> in https://github.com/ceph/ceph/pull/18673
>>
>>
>> Thanks
>>
>> David
>>
>>
>>
>> On 10/30/17 1:13 AM, Jon Light wrote:
>>
>>> Hello,
>>>
>>> I have three OSDs that are crashing on start with a FAILED
>>> assert(p.same_interval_since) error. I ran across a thread from a few
>>> days
>>> ago about the same issue and a ticket was created here:
>>> http://tracker.ceph.com/issues/21833.
>>>
>>> A very overloaded node in my cluster OOM'd many times which eventually
>>> led
>>> to the problematic PGs and then the failed assert.
>>>
>>> I currently have 49 pgs inactive, 33 pgs down, 15 pgs incomplete as well
>>> as
>>> 0.028% of objects unfound. Presumably due to this, I can't add any data
>>> to
>>> the FS or read some data. Just about any IO ends up in a good bit of
>>> stuck
>>> requests.
>>>
>>> Hopefully a fix can come from the issue, but can anyone give me some
>>> suggestions or guidance to get the cluster in a working state in the
>>> meantime?
>>>
>>> Thanks
>>>
>>>
>>>
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Bluestore OSD_DATA, WAL & DB

2017-11-02 Thread Martin Overgaard Hansen
Hi, it seems like I’m in the same boat as everyone else in this particular 
thread.

I’m also unable to find any guidelines or recommendations regarding sizing of 
the wal and / or db.

I want to bring this subject back in the light and hope someone can provide 
insight regarding the issue, thanks.

Best Regards,
Martin Overgaard Hansen
MultiHouse IT Partner A/S
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] UID Restrictions

2017-11-02 Thread Keane Wolter
Here we go. Removing the trailing slash and adding the gids parameter in
auth caps works.

[kwolter@um-test03 ~]$ sudo ceph auth get-or-create-key
client.kwolter_test1 mon 'allow r' mds 'allow r, allow rw path=/user
uid=100026 gids=100026' osd 'allow rw pool=cephfs_osiris, allow rw
pool=cephfs_users'

[kwolter@um-test03 ~]$ sudo ceph auth export client.kwolter_test1 >
ceph.client.kwolter_test1
export auth( with 3 caps)
[kwolter@um-test03 ~]$ mv ceph.client.kwolter_test1
ceph.client.kwolter_test1.keyring
[kwolter@um-test03 ~]$ sudo ceph-fuse --id=kwolter_test1 -k
./ceph.client.kwolter_test1.keyring -r /user/kwolter
--client-die-on-failed-remount=false ceph
ceph-fuse[3458051]: starting ceph client
ceph-fuse[3458051]: starting fuse
[kwolter@um-test03 ~]$

[kwolter@um-test03 ~]$ touch ceph/test.txt
[kwolter@um-test03 ~]$ ls -lt test.txt
-rw-rw-r-- 1 kwolter kwolter 0 Nov  2 16:20 test.txt
[kwolter@um-test03 ~]$ sudo touch ceph/test2.txt
touch: cannot touch ‘ceph/test2.txt’: Permission denied
[kwolter@um-test03 ~]$

[kwolter@um-test03 ~]$ sudo umount ceph
[kwolter@um-test03 ~]$

Thank you very much!

Keane

On Thu, Nov 2, 2017 at 3:51 PM, Douglas Fuller  wrote:

> Looks like there may be a bug here.
>
> Please try:
>
> * Removing the trailing slash from path= (needs documentation or fixing)
> * Adding your gid to a “gids” parameter in the auth caps? (bug: we’re
> checking the gid when none is supplied)
>
> mds “allow r, allow rw path=/user uid=100026 gids=100026”
>
> Please let me know if that works and I’ll file a bug.
>
> Thanks,
> —Doug
>
> > On Nov 2, 2017, at 2:48 PM, Keane Wolter  wrote:
> >
> > Hi Doug,
> >
> > Here is the output:
> > [kwolter@um-test03 ~]$ sudo ceph auth get client.kwolter_test1
> > exported keyring for client.kwolter_test1
> > [client.kwolter_test1]
> > key = 
> > caps mds = "allow r, allow rw path=/user/ uid=100026"
> > caps mon = "allow r"
> > caps osd = "allow rw pool=cephfs_osiris, allow rw
> pool=cephfs_users"
> > [kwolter@um-test03 ~]$
> >
> > As for the logs, the only lines I get are about the ceph-fuse being
> mounted.
> > 2017-11-02 14:45:53.246388 7f72d7a9e040  0 ceph version 12.2.1
> () luminous (stable), process (unknown), pid 3454195
> > 2017-11-02 14:45:53.247947 7f72d7a9e040  0 pidfile_write: ignore empty
> --pid-file
> > 2017-11-02 14:45:53.251078 7f72d7a9e040 -1 init, newargv =
> 0x55e035f524c0 newargc=9
> >
> > Thanks,
> > Keane
> >
> >
> > On Thu, Nov 2, 2017 at 2:42 PM, Douglas Fuller 
> wrote:
> > Hi Keane,
> >
> > Could you include the output of
> >
> > ceph auth get client.kwolter_test1
> >
> > Also, please take a look at your MDS log and see if you see an error
> from the file access attempt there.
> >
> > Thanks,
> > —Doug
> >
> > > On Nov 2, 2017, at 2:24 PM, Keane Wolter  wrote:
> > >
> > > Hi Doug,
> > >
> > > Here is my current mds line I have for my user: caps: [mds] allow r,
> allow rw path=/user/ uid=100026. My results are as follows when I mount:
> > > sudo ceph-fuse --id=kwolter_test1 -k ./ceph.client.kwolter_test1.keyring
> -r /user/kwolter --client-die-on-failed-remount=false ceph
> > > ceph-fuse[3453714]: starting ceph client
> > > ceph-fuse[3453714]: starting fuse
> > > [kwolter@um-test03 ~]$
> > >
> > > I then get a permission denied when I try to add anything to the
> mount, even though I have matching UIDs:
> > > [kwolter@um-test03 ~]$ touch ceph/test.txt
> > > touch: cannot touch ‘ceph/test.txt’: Permission denied
> > > [kwolter@um-test03 ~]$ sudo touch ceph/test.txt
> > > touch: cannot touch ‘ceph/test.txt’: Permission denied
> > > [kwolter@um-test03 ~]$
> > >
> > > Thanks,
> > > Keane
> > >
> > > On Thu, Nov 2, 2017 at 1:15 PM, Douglas Fuller 
> wrote:
> > > Hi Keane,
> > >
> > > path= has to come before uid=
> > >
> > > mds “allow r, allow rw path=/user uid=100026, allow rw path=/project"
> > >
> > > If that doesn’t work, could you send along a transcript of your shell
> session in setting up the ceph user, mounting the file system, and
> attempting access?
> > >
> > > Thanks,
> > > —Doug
> > >
> > > > On Nov 1, 2017, at 2:06 PM, Keane Wolter  wrote:
> > > >
> > > > I have ownership of the directory /user/kwolter on the cephFS server
> and I am mounting to ~/ceph, which I also own.
> > > >
> > > > On Wed, Nov 1, 2017 at 2:04 PM, Gregory Farnum 
> wrote:
> > > > Which directory do you have ownership of? Keep in mind your local
> filesystem permissions do not get applied to the remote CephFS mount...
> > > >
> > > > On Wed, Nov 1, 2017 at 11:03 AM Keane Wolter 
> wrote:
> > > > I am mounting a directory under /user which I am the owner of with
> the permissions of 700. If I remove the uid=100026 option, I have no
> issues. I start having issues as soon as the uid restrictions are in place.
> > > >
> > > > On Wed, Nov 1, 2017 at 1:05 PM, Gregory Farnum 

Re: [ceph-users] UID Restrictions

2017-11-02 Thread Douglas Fuller
Looks like there may be a bug here.

Please try:

* Removing the trailing slash from path= (needs documentation or fixing)
* Adding your gid to a “gids” parameter in the auth caps? (bug: we’re checking 
the gid when none is supplied)

mds “allow r, allow rw path=/user uid=100026 gids=100026”

Please let me know if that works and I’ll file a bug.

Thanks,
—Doug

> On Nov 2, 2017, at 2:48 PM, Keane Wolter  wrote:
> 
> Hi Doug,
> 
> Here is the output:
> [kwolter@um-test03 ~]$ sudo ceph auth get client.kwolter_test1
> exported keyring for client.kwolter_test1
> [client.kwolter_test1]
> key = 
> caps mds = "allow r, allow rw path=/user/ uid=100026"
> caps mon = "allow r"
> caps osd = "allow rw pool=cephfs_osiris, allow rw pool=cephfs_users"
> [kwolter@um-test03 ~]$ 
> 
> As for the logs, the only lines I get are about the ceph-fuse being mounted.
> 2017-11-02 14:45:53.246388 7f72d7a9e040  0 ceph version 12.2.1 () 
> luminous (stable), process (unknown), pid 3454195
> 2017-11-02 14:45:53.247947 7f72d7a9e040  0 pidfile_write: ignore empty 
> --pid-file
> 2017-11-02 14:45:53.251078 7f72d7a9e040 -1 init, newargv = 0x55e035f524c0 
> newargc=9
> 
> Thanks,
> Keane
> 
> 
> On Thu, Nov 2, 2017 at 2:42 PM, Douglas Fuller  wrote:
> Hi Keane,
> 
> Could you include the output of
> 
> ceph auth get client.kwolter_test1
> 
> Also, please take a look at your MDS log and see if you see an error from the 
> file access attempt there.
> 
> Thanks,
> —Doug
> 
> > On Nov 2, 2017, at 2:24 PM, Keane Wolter  wrote:
> >
> > Hi Doug,
> >
> > Here is my current mds line I have for my user: caps: [mds] allow r, allow 
> > rw path=/user/ uid=100026. My results are as follows when I mount:
> > sudo ceph-fuse --id=kwolter_test1 -k ./ceph.client.kwolter_test1.keyring -r 
> > /user/kwolter --client-die-on-failed-remount=false ceph
> > ceph-fuse[3453714]: starting ceph client
> > ceph-fuse[3453714]: starting fuse
> > [kwolter@um-test03 ~]$
> >
> > I then get a permission denied when I try to add anything to the mount, 
> > even though I have matching UIDs:
> > [kwolter@um-test03 ~]$ touch ceph/test.txt
> > touch: cannot touch ‘ceph/test.txt’: Permission denied
> > [kwolter@um-test03 ~]$ sudo touch ceph/test.txt
> > touch: cannot touch ‘ceph/test.txt’: Permission denied
> > [kwolter@um-test03 ~]$
> >
> > Thanks,
> > Keane
> >
> > On Thu, Nov 2, 2017 at 1:15 PM, Douglas Fuller  wrote:
> > Hi Keane,
> >
> > path= has to come before uid=
> >
> > mds “allow r, allow rw path=/user uid=100026, allow rw path=/project"
> >
> > If that doesn’t work, could you send along a transcript of your shell 
> > session in setting up the ceph user, mounting the file system, and 
> > attempting access?
> >
> > Thanks,
> > —Doug
> >
> > > On Nov 1, 2017, at 2:06 PM, Keane Wolter  wrote:
> > >
> > > I have ownership of the directory /user/kwolter on the cephFS server and 
> > > I am mounting to ~/ceph, which I also own.
> > >
> > > On Wed, Nov 1, 2017 at 2:04 PM, Gregory Farnum  wrote:
> > > Which directory do you have ownership of? Keep in mind your local 
> > > filesystem permissions do not get applied to the remote CephFS mount...
> > >
> > > On Wed, Nov 1, 2017 at 11:03 AM Keane Wolter  wrote:
> > > I am mounting a directory under /user which I am the owner of with the 
> > > permissions of 700. If I remove the uid=100026 option, I have no issues. 
> > > I start having issues as soon as the uid restrictions are in place.
> > >
> > > On Wed, Nov 1, 2017 at 1:05 PM, Gregory Farnum  wrote:
> > > Well, obviously UID 100026 needs to have the normal POSIX permissions to 
> > > write to the /user path, which it probably won't until after you've done 
> > > something as root to make it so...
> > >
> > > On Wed, Nov 1, 2017 at 9:57 AM Keane Wolter  wrote:
> > > Acting as UID 100026, I am able to successfully run ceph-fuse and mount 
> > > the filesystem. However, as soon as I try to write a file as UID 100026, 
> > > I get permission denied, but I am able to write to disk as root without 
> > > issue. I am looking for the inverse of this. I want to write changes to 
> > > disk as UID 100026, but not as root. From what I understood in the email 
> > > at 
> > > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-February/016173.html,
> > >  I should be able to do so with the following cephx caps set to "caps: 
> > > [mds] allow r, allow rw path=/user uid=100026". Am I wrong with this 
> > > assumption or is there something else at play I am not aware of?
> > >
> > > Thanks,
> > > Keane
> > >
> > > On Wed, Oct 25, 2017 at 5:52 AM, Gregory Farnum  
> > > wrote:
> > >
> > > On Mon, Oct 23, 2017 at 5:03 PM Keane Wolter  wrote:
> > > Hi Gregory,
> > >
> > > I did set the cephx caps for the client to:
> > >
> > > caps: [mds] 

Re: [ceph-users] UID Restrictions

2017-11-02 Thread Keane Wolter
Hi Doug,

Here is the output:
[kwolter@um-test03 ~]$ sudo ceph auth get client.kwolter_test1
exported keyring for client.kwolter_test1
[client.kwolter_test1]
key = 
caps mds = "allow r, allow rw path=/user/ uid=100026"
caps mon = "allow r"
caps osd = "allow rw pool=cephfs_osiris, allow rw pool=cephfs_users"
[kwolter@um-test03 ~]$

As for the logs, the only lines I get are about the ceph-fuse being mounted.
2017-11-02 14:45:53.246388 7f72d7a9e040  0 ceph version 12.2.1 ()
luminous (stable), process (unknown), pid 3454195
2017-11-02 14:45:53.247947 7f72d7a9e040  0 pidfile_write: ignore empty
--pid-file
2017-11-02 14:45:53.251078 7f72d7a9e040 -1 init, newargv = 0x55e035f524c0
newargc=9

Thanks,
Keane


On Thu, Nov 2, 2017 at 2:42 PM, Douglas Fuller  wrote:

> Hi Keane,
>
> Could you include the output of
>
> ceph auth get client.kwolter_test1
>
> Also, please take a look at your MDS log and see if you see an error from
> the file access attempt there.
>
> Thanks,
> —Doug
>
> > On Nov 2, 2017, at 2:24 PM, Keane Wolter  wrote:
> >
> > Hi Doug,
> >
> > Here is my current mds line I have for my user: caps: [mds] allow r,
> allow rw path=/user/ uid=100026. My results are as follows when I mount:
> > sudo ceph-fuse --id=kwolter_test1 -k ./ceph.client.kwolter_test1.keyring
> -r /user/kwolter --client-die-on-failed-remount=false ceph
> > ceph-fuse[3453714]: starting ceph client
> > ceph-fuse[3453714]: starting fuse
> > [kwolter@um-test03 ~]$
> >
> > I then get a permission denied when I try to add anything to the mount,
> even though I have matching UIDs:
> > [kwolter@um-test03 ~]$ touch ceph/test.txt
> > touch: cannot touch ‘ceph/test.txt’: Permission denied
> > [kwolter@um-test03 ~]$ sudo touch ceph/test.txt
> > touch: cannot touch ‘ceph/test.txt’: Permission denied
> > [kwolter@um-test03 ~]$
> >
> > Thanks,
> > Keane
> >
> > On Thu, Nov 2, 2017 at 1:15 PM, Douglas Fuller 
> wrote:
> > Hi Keane,
> >
> > path= has to come before uid=
> >
> > mds “allow r, allow rw path=/user uid=100026, allow rw path=/project"
> >
> > If that doesn’t work, could you send along a transcript of your shell
> session in setting up the ceph user, mounting the file system, and
> attempting access?
> >
> > Thanks,
> > —Doug
> >
> > > On Nov 1, 2017, at 2:06 PM, Keane Wolter  wrote:
> > >
> > > I have ownership of the directory /user/kwolter on the cephFS server
> and I am mounting to ~/ceph, which I also own.
> > >
> > > On Wed, Nov 1, 2017 at 2:04 PM, Gregory Farnum 
> wrote:
> > > Which directory do you have ownership of? Keep in mind your local
> filesystem permissions do not get applied to the remote CephFS mount...
> > >
> > > On Wed, Nov 1, 2017 at 11:03 AM Keane Wolter 
> wrote:
> > > I am mounting a directory under /user which I am the owner of with the
> permissions of 700. If I remove the uid=100026 option, I have no issues. I
> start having issues as soon as the uid restrictions are in place.
> > >
> > > On Wed, Nov 1, 2017 at 1:05 PM, Gregory Farnum 
> wrote:
> > > Well, obviously UID 100026 needs to have the normal POSIX permissions
> to write to the /user path, which it probably won't until after you've done
> something as root to make it so...
> > >
> > > On Wed, Nov 1, 2017 at 9:57 AM Keane Wolter  wrote:
> > > Acting as UID 100026, I am able to successfully run ceph-fuse and
> mount the filesystem. However, as soon as I try to write a file as UID
> 100026, I get permission denied, but I am able to write to disk as root
> without issue. I am looking for the inverse of this. I want to write
> changes to disk as UID 100026, but not as root. From what I understood in
> the email at http://lists.ceph.com/pipermail/ceph-users-ceph.com/
> 2017-February/016173.html, I should be able to do so with the following
> cephx caps set to "caps: [mds] allow r, allow rw path=/user uid=100026". Am
> I wrong with this assumption or is there something else at play I am not
> aware of?
> > >
> > > Thanks,
> > > Keane
> > >
> > > On Wed, Oct 25, 2017 at 5:52 AM, Gregory Farnum 
> wrote:
> > >
> > > On Mon, Oct 23, 2017 at 5:03 PM Keane Wolter 
> wrote:
> > > Hi Gregory,
> > >
> > > I did set the cephx caps for the client to:
> > >
> > > caps: [mds] allow r, allow rw uid=100026 path=/user, allow rw
> path=/project
> > >
> > > So you’ve got three different permission granting clauses here:
> > > 1) allows the client to read anything
> > > 2) allows the client to act as uid 100026 in the path /user
> > > 3) allows the user to do any read or write (as any user) in path
> /project
> > >
> > >
> > > caps: [mon] allow r
> > > caps: [osd] allow rw pool=cephfs_osiris, allow rw pool=cephfs_users
> > >
> > > Keane
> > >
> > > On Fri, Oct 20, 2017 at 5:35 PM, Gregory Farnum 
> wrote:
> > > What did you actually 

Re: [ceph-users] UID Restrictions

2017-11-02 Thread Douglas Fuller
Hi Keane,

Could you include the output of

ceph auth get client.kwolter_test1

Also, please take a look at your MDS log and see if you see an error from the 
file access attempt there.

Thanks,
—Doug

> On Nov 2, 2017, at 2:24 PM, Keane Wolter  wrote:
> 
> Hi Doug,
> 
> Here is my current mds line I have for my user: caps: [mds] allow r, allow rw 
> path=/user/ uid=100026. My results are as follows when I mount:
> sudo ceph-fuse --id=kwolter_test1 -k ./ceph.client.kwolter_test1.keyring -r 
> /user/kwolter --client-die-on-failed-remount=false ceph
> ceph-fuse[3453714]: starting ceph client
> ceph-fuse[3453714]: starting fuse
> [kwolter@um-test03 ~]$ 
> 
> I then get a permission denied when I try to add anything to the mount, even 
> though I have matching UIDs:
> [kwolter@um-test03 ~]$ touch ceph/test.txt
> touch: cannot touch ‘ceph/test.txt’: Permission denied
> [kwolter@um-test03 ~]$ sudo touch ceph/test.txt
> touch: cannot touch ‘ceph/test.txt’: Permission denied
> [kwolter@um-test03 ~]$ 
> 
> Thanks,
> Keane
> 
> On Thu, Nov 2, 2017 at 1:15 PM, Douglas Fuller  wrote:
> Hi Keane,
> 
> path= has to come before uid=
> 
> mds “allow r, allow rw path=/user uid=100026, allow rw path=/project"
> 
> If that doesn’t work, could you send along a transcript of your shell session 
> in setting up the ceph user, mounting the file system, and attempting access?
> 
> Thanks,
> —Doug
> 
> > On Nov 1, 2017, at 2:06 PM, Keane Wolter  wrote:
> >
> > I have ownership of the directory /user/kwolter on the cephFS server and I 
> > am mounting to ~/ceph, which I also own.
> >
> > On Wed, Nov 1, 2017 at 2:04 PM, Gregory Farnum  wrote:
> > Which directory do you have ownership of? Keep in mind your local 
> > filesystem permissions do not get applied to the remote CephFS mount...
> >
> > On Wed, Nov 1, 2017 at 11:03 AM Keane Wolter  wrote:
> > I am mounting a directory under /user which I am the owner of with the 
> > permissions of 700. If I remove the uid=100026 option, I have no issues. I 
> > start having issues as soon as the uid restrictions are in place.
> >
> > On Wed, Nov 1, 2017 at 1:05 PM, Gregory Farnum  wrote:
> > Well, obviously UID 100026 needs to have the normal POSIX permissions to 
> > write to the /user path, which it probably won't until after you've done 
> > something as root to make it so...
> >
> > On Wed, Nov 1, 2017 at 9:57 AM Keane Wolter  wrote:
> > Acting as UID 100026, I am able to successfully run ceph-fuse and mount the 
> > filesystem. However, as soon as I try to write a file as UID 100026, I get 
> > permission denied, but I am able to write to disk as root without issue. I 
> > am looking for the inverse of this. I want to write changes to disk as UID 
> > 100026, but not as root. From what I understood in the email at 
> > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-February/016173.html,
> >  I should be able to do so with the following cephx caps set to "caps: 
> > [mds] allow r, allow rw path=/user uid=100026". Am I wrong with this 
> > assumption or is there something else at play I am not aware of?
> >
> > Thanks,
> > Keane
> >
> > On Wed, Oct 25, 2017 at 5:52 AM, Gregory Farnum  wrote:
> >
> > On Mon, Oct 23, 2017 at 5:03 PM Keane Wolter  wrote:
> > Hi Gregory,
> >
> > I did set the cephx caps for the client to:
> >
> > caps: [mds] allow r, allow rw uid=100026 path=/user, allow rw path=/project
> >
> > So you’ve got three different permission granting clauses here:
> > 1) allows the client to read anything
> > 2) allows the client to act as uid 100026 in the path /user
> > 3) allows the user to do any read or write (as any user) in path /project
> >
> >
> > caps: [mon] allow r
> > caps: [osd] allow rw pool=cephfs_osiris, allow rw pool=cephfs_users
> >
> > Keane
> >
> > On Fri, Oct 20, 2017 at 5:35 PM, Gregory Farnum  wrote:
> > What did you actually set the cephx caps to for that client?
> >
> > On Fri, Oct 20, 2017 at 8:01 AM Keane Wolter  wrote:
> > Hello all,
> >
> > I am trying to limit what uid/gid a client is allowed to run as (similar to 
> > NFS' root squashing). I have referenced this email,  
> > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-February/016173.html,
> >  with no success.  After generating the keyring, moving it to a client 
> > machine, and mounting the filesystem with ceph-fuse, I am still able to 
> > create files with the UID and GID of root.
> >
> > Is there something I am missing or can do to prevent root from working with 
> > a ceph-fuse mounted filesystem?
> >
> > Thanks,
> > Keane
> > wolt...@umich.edu
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> >
> >
> > 

[ceph-users] CephFS: clients hanging on write with ceph-fuse

2017-11-02 Thread Andras Pataki
We've been running into a strange problem with Ceph using ceph-fuse and 
the filesystem. All the back end nodes are on 10.2.10, the fuse clients 
are on 10.2.7.


After some hours of runs, some processes get stuck waiting for fuse like:

[root@worker1144 ~]# cat /proc/58193/stack
[] wait_answer_interruptible+0x91/0xe0 [fuse]
[] __fuse_request_send+0x253/0x2c0 [fuse]
[] fuse_request_send+0x12/0x20 [fuse]
[] fuse_send_write+0xd6/0x110 [fuse]
[] fuse_perform_write+0x2f5/0x5a0 [fuse]
[] fuse_file_aio_write+0x2a1/0x340 [fuse]
[] do_sync_write+0x8d/0xd0
[] vfs_write+0xbd/0x1e0
[] SyS_write+0x7f/0xe0
[] system_call_fastpath+0x16/0x1b
[] 0x

The cluster is healthy (all OSDs up, no slow requests, etc.).  More 
details of my investigation efforts are in the bug report I just submitted:

    http://tracker.ceph.com/issues/22008

It looks like the fuse client is asking for some caps that it never 
thinks it receives from the MDS, so the thread waiting for those caps on 
behalf of the writing client never wakes up.  The restart of the MDS 
fixes the problem (since ceph-fuse re-negotiates caps).


Any ideas/suggestions?

Andras

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph inconsistent pg missing ec object

2017-11-02 Thread Gregory Farnum
Okay, after consulting with a colleague this appears to be an instance of
http://tracker.ceph.com/issues/21382. Assuming the object is one that
doesn't have snapshots, your easiest resolution is to use "rados get" to
retrieve the object (which, unlike recovery, should work) and then "rados
put" it back into place.

This fix might be backported to Jewel for a later release, but it's tricky
so wasn't done proactively.
-Greg
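
For reference, a minimal sketch of that get/put workaround (pool and object
names are placeholders; take the object name from the scrub error in the
primary's log):

  rados -p <ec-pool> get <object-name> /tmp/object.bak
  rados -p <ec-pool> put <object-name> /tmp/object.bak
  ceph pg deep-scrub 5.5e3    # re-verify the PG afterwards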

On Fri, Oct 20, 2017 at 12:27 AM Stijn De Weirdt 
wrote:

> hi gregory,
>
> we more or less followed the instructions on the site (famous last
> words, i know ;)
>
> grepping for the error in the osd logs of the osds of the pg, the
> primary logs had "5.5e3s0 shard 59(5) missing
> 5:c7ae919b:::10014d3184b.:head"
>
> we looked for the object using the find command, we got
>
> > [root@osd003 ~]# find /var/lib/ceph/osd/ceph-35/current/5.5e3s0_head/
> -name "*10014d3184b.*"
> >
> >
> /var/lib/ceph/osd/ceph-35/current/5.5e3s0_head/DIR_3/DIR_E/DIR_5/DIR_7/DIR_9/10014d3184b.__head_D98975E3__5__0
>
> then we ran this find on all 11 osds from the pg, and 10 out of 11 osds
> gave similar path (the suffix _[0-9a] matched the index of the osd in
> the list of osds reported by the pg, so i assumed that was the ec
> splitting up the data in 11 pieces)
>
> on one osd in the list of osds, there was no such object (the 6th one,
> index 5, so more assuming form our side that this was the 5 in 5:...
> from the logfile). so we assumed this was the missing object that the
> error reported. we have absolutely no clue why it was missing or what
> happened, nothing in any logs.
>
> what we did then was stop the osd that had the missing object, flush the
> journal and start the osd and ran repair. (the guide mentioned to delete
> an object, we did not delete anything, because we assumed the issue was
> the already missing object from the 6th osd)
>
> flushing the journal segfaulted, but the osd started fine again.
>
> the scrub errors did not disappear, so we did the same again on the
> primary (no deleting of anything; and again, the flush segfaulted).
>
> wrt the segfault, i attached the output of a segfaulting flush with
> debug on another osd.
>
>
> stijn
>
>
> On 10/20/2017 02:56 AM, Gregory Farnum wrote:
> > Okay, you're going to need to explain in very clear terms exactly what
> > happened to your cluster, and *exactly* what operations you performed
> > manually.
> >
> > The PG shards seem to have different views of the PG in question. The
> > primary has a different log_tail, last_user_version, and last_epoch_clean
> > from the others. Plus different log sizes? It's not making a ton of sense
> > at first glance.
> > -Greg
> >
> > On Thu, Oct 19, 2017 at 1:08 AM Stijn De Weirdt  >
> > wrote:
> >
> >> hi greg,
> >>
> >> i attached the gzip output of the query and some more info below. if you
> >> need more, let me know.
> >>
> >> stijn
> >>
> >>> [root@mds01 ~]# ceph -s
> >>> cluster 92beef0a-1239-4000-bacf-4453ab630e47
> >>>  health HEALTH_ERR
> >>> 1 pgs inconsistent
> >>> 40 requests are blocked > 512 sec
> >>> 1 scrub errors
> >>> mds0: Behind on trimming (2793/30)
> >>>  monmap e1: 3 mons at {mds01=
> >> 1.2.3.4:6789/0,mds02=1.2.3.5:6789/0,mds03=1.2.3.6:6789/0}
> >>> election epoch 326, quorum 0,1,2 mds01,mds02,mds03
> >>>   fsmap e238677: 1/1/1 up {0=mds02=up:active}, 2 up:standby
> >>>  osdmap e79554: 156 osds: 156 up, 156 in
> >>> flags sortbitwise,require_jewel_osds
> >>>   pgmap v51003893: 4096 pgs, 3 pools, 387 TB data, 243 Mobjects
> >>> 545 TB used, 329 TB / 874 TB avail
> >>> 4091 active+clean
> >>>4 active+clean+scrubbing+deep
> >>>1 active+clean+inconsistent
> >>>   client io 284 kB/s rd, 146 MB/s wr, 145 op/s rd, 177 op/s wr
> >>>   cache io 115 MB/s flush, 153 MB/s evict, 14 op/s promote, 3 PG(s)
> >> flushing
> >>
> >>> [root@mds01 ~]# ceph health detail
> >>> HEALTH_ERR 1 pgs inconsistent; 52 requests are blocked > 512 sec; 5
> osds
> >> have slow requests; 1 scrub errors; mds0: Behind on trimming (2782/30)
> >>> pg 5.5e3 is active+clean+inconsistent, acting
> >> [35,50,91,18,139,59,124,40,104,12,71]
> >>> 34 ops are blocked > 524.288 sec on osd.8
> >>> 6 ops are blocked > 524.288 sec on osd.67
> >>> 6 ops are blocked > 524.288 sec on osd.27
> >>> 1 ops are blocked > 524.288 sec on osd.107
> >>> 5 ops are blocked > 524.288 sec on osd.116
> >>> 5 osds have slow requests
> >>> 1 scrub errors
> >>> mds0: Behind on trimming (2782/30)(max_segments: 30, num_segments:
> 2782)
> >>
> >>> # zgrep -C 1 ERR ceph-osd.35.log.*.gz
> >>> ceph-osd.35.log.5.gz:2017-10-14 11:25:52.260668 7f34d6748700  0 --
> >> 10.141.16.13:6801/1001792 >> 1.2.3.11:6803/1951 pipe(0x56412da80800
> >> sd=273 :6801 s=2 pgs=3176 cs=31 l=0 c=0x564156e83b00).fault with
> 

Re: [ceph-users] UID Restrictions

2017-11-02 Thread Keane Wolter
Hi Doug,

Here is my current mds line I have for my user: caps: [mds] allow r, allow
rw path=/user/ uid=100026. My results are as follows when I mount:
sudo ceph-fuse --id=kwolter_test1 -k ./ceph.client.kwolter_test1.keyring -r
/user/kwolter --client-die-on-failed-remount=false ceph
ceph-fuse[3453714]: starting ceph client
ceph-fuse[3453714]: starting fuse
[kwolter@um-test03 ~]$

I then get a permission denied when I try to add anything to the mount,
even though I have matching UIDs:
[kwolter@um-test03 ~]$ touch ceph/test.txt
touch: cannot touch ‘ceph/test.txt’: Permission denied
[kwolter@um-test03 ~]$ sudo touch ceph/test.txt
touch: cannot touch ‘ceph/test.txt’: Permission denied
[kwolter@um-test03 ~]$

Thanks,
Keane

On Thu, Nov 2, 2017 at 1:15 PM, Douglas Fuller  wrote:

> Hi Keane,
>
> path= has to come before uid=
>
> mds “allow r, allow rw path=/user uid=100026, allow rw path=/project"
>
> If that doesn’t work, could you send along a transcript of your shell
> session in setting up the ceph user, mounting the file system, and
> attempting access?
>
> Thanks,
> —Doug
>
> > On Nov 1, 2017, at 2:06 PM, Keane Wolter  wrote:
> >
> > I have ownership of the directory /user/kwolter on the cephFS server and
> I am mounting to ~/ceph, which I also own.
> >
> > On Wed, Nov 1, 2017 at 2:04 PM, Gregory Farnum 
> wrote:
> > Which directory do you have ownership of? Keep in mind your local
> filesystem permissions do not get applied to the remote CephFS mount...
> >
> > On Wed, Nov 1, 2017 at 11:03 AM Keane Wolter  wrote:
> > I am mounting a directory under /user which I am the owner of with the
> permissions of 700. If I remove the uid=100026 option, I have no issues. I
> start having issues as soon as the uid restrictions are in place.
> >
> > On Wed, Nov 1, 2017 at 1:05 PM, Gregory Farnum 
> wrote:
> > Well, obviously UID 100026 needs to have the normal POSIX permissions to
> write to the /user path, which it probably won't until after you've done
> something as root to make it so...
> >
> > On Wed, Nov 1, 2017 at 9:57 AM Keane Wolter  wrote:
> > Acting as UID 100026, I am able to successfully run ceph-fuse and mount
> the filesystem. However, as soon as I try to write a file as UID 100026, I
> get permission denied, but I am able to write to disk as root without
> issue. I am looking for the inverse of this. I want to write changes to
> disk as UID 100026, but not as root. From what I understood in the email at
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/
> 2017-February/016173.html, I should be able to do so with the following
> cephx caps set to "caps: [mds] allow r, allow rw path=/user uid=100026". Am
> I wrong with this assumption or is there something else at play I am not
> aware of?
> >
> > Thanks,
> > Keane
> >
> > On Wed, Oct 25, 2017 at 5:52 AM, Gregory Farnum 
> wrote:
> >
> > On Mon, Oct 23, 2017 at 5:03 PM Keane Wolter  wrote:
> > Hi Gregory,
> >
> > I did set the cephx caps for the client to:
> >
> > caps: [mds] allow r, allow rw uid=100026 path=/user, allow rw
> path=/project
> >
> > So you’ve got three different permission granting clauses here:
> > 1) allows the client to read anything
> > 2) allows the client to act as uid 100026 in the path /user
> > 3) allows the user to do any read or write (as any user) in path /project
> >
> >
> > caps: [mon] allow r
> > caps: [osd] allow rw pool=cephfs_osiris, allow rw pool=cephfs_users
> >
> > Keane
> >
> > On Fri, Oct 20, 2017 at 5:35 PM, Gregory Farnum 
> wrote:
> > What did you actually set the cephx caps to for that client?
> >
> > On Fri, Oct 20, 2017 at 8:01 AM Keane Wolter  wrote:
> > Hello all,
> >
> > I am trying to limit what uid/gid a client is allowed to run as (similar
> to NFS' root squashing). I have referenced this email,
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/
> 2017-February/016173.html, with no success.  After generating the
> keyring, moving it to a client machine, and mounting the filesystem with
> ceph-fuse, I am still able to create files with the UID and GID of root.
> >
> > Is there something I am missing or can do to prevent root from working
> with a ceph-fuse mounted filesystem?
> >
> > Thanks,
> > Keane
> > wolt...@umich.edu
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] UID Restrictions

2017-11-02 Thread Douglas Fuller
Hi Keane,

path= has to come before uid=

mds "allow r, allow rw path=/user uid=100026, allow rw path=/project"
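
If it helps, a rough sketch of re-applying the caps in that order for the
client used earlier in this thread (pool names copied from the thread; this is
untested here, so adjust to your setup and verify before relying on it):

  ceph auth caps client.kwolter_test1 \
    mds 'allow r, allow rw path=/user uid=100026, allow rw path=/project' \
    mon 'allow r' \
    osd 'allow rw pool=cephfs_osiris, allow rw pool=cephfs_users'
  ceph auth get client.kwolter_test1    # confirm the stored caps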

If that doesn’t work, could you send along a transcript of your shell session 
in setting up the ceph user, mounting the file system, and attempting access?

Thanks,
—Doug

> On Nov 1, 2017, at 2:06 PM, Keane Wolter  wrote:
> 
> I have ownership of the directory /user/kwolter on the cephFS server and I am 
> mounting to ~/ceph, which I also own.
> 
> On Wed, Nov 1, 2017 at 2:04 PM, Gregory Farnum  wrote:
> Which directory do you have ownership of? Keep in mind your local filesystem 
> permissions do not get applied to the remote CephFS mount...
> 
> On Wed, Nov 1, 2017 at 11:03 AM Keane Wolter  wrote:
> I am mounting a directory under /user which I am the owner of with the 
> permissions of 700. If I remove the uid=100026 option, I have no issues. I 
> start having issues as soon as the uid restrictions are in place.
> 
> On Wed, Nov 1, 2017 at 1:05 PM, Gregory Farnum  wrote:
> Well, obviously UID 100026 needs to have the normal POSIX permissions to 
> write to the /user path, which it probably won't until after you've done 
> something as root to make it so...
> 
> On Wed, Nov 1, 2017 at 9:57 AM Keane Wolter  wrote:
> Acting as UID 100026, I am able to successfully run ceph-fuse and mount the 
> filesystem. However, as soon as I try to write a file as UID 100026, I get 
> permission denied, but I am able to write to disk as root without issue. I am 
> looking for the inverse of this. I want to write changes to disk as UID 
> 100026, but not as root. From what I understood in the email at 
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-February/016173.html,
>  I should be able to do so with the following cephx caps set to "caps: [mds] 
> allow r, allow rw path=/user uid=100026". Am I wrong with this assumption or 
> is there something else at play I am not aware of?
> 
> Thanks,
> Keane
> 
> On Wed, Oct 25, 2017 at 5:52 AM, Gregory Farnum  wrote:
> 
> On Mon, Oct 23, 2017 at 5:03 PM Keane Wolter  wrote:
> Hi Gregory,
> 
> I did set the cephx caps for the client to:
> 
> caps: [mds] allow r, allow rw uid=100026 path=/user, allow rw path=/project
> 
> So you’ve got three different permission granting clauses here:
> 1) allows the client to read anything
> 2) allows the client to act as uid 100026 in the path /user
> 3) allows the user to do any read or write (as any user) in path /project
> 
> 
> caps: [mon] allow r
> caps: [osd] allow rw pool=cephfs_osiris, allow rw pool=cephfs_users
> 
> Keane
> 
> On Fri, Oct 20, 2017 at 5:35 PM, Gregory Farnum  wrote:
> What did you actually set the cephx caps to for that client?
> 
> On Fri, Oct 20, 2017 at 8:01 AM Keane Wolter  wrote:
> Hello all,
> 
> I am trying to limit what uid/gid a client is allowed to run as (similar to 
> NFS' root squashing). I have referenced this email,  
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-February/016173.html,
>  with no success.  After generating the keyring, moving it to a client 
> machine, and mounting the filesystem with ceph-fuse, I am still able to 
> create files with the UID and GID of root.
> 
> Is there something I am missing or can do to prevent root from working with a 
> ceph-fuse mounted filesystem?
> 
> Thanks,
> Keane
> wolt...@umich.edu
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Deleting large pools

2017-11-02 Thread Gregory Farnum
Deletion is throttled, though I don’t know the configs to change it; you
could poke around if you want stuff to go faster.

Don’t just remove the directory in the filesystem; you need to clean up the
leveldb metadata as well. ;)
Removing the pg via Ceph-objectstore-tool would work fine but I’ve seen too
many people kill the wrong thing to recommend it.
-Greg
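
For completeness, a rough sketch of the offline route mentioned above, with
the same caveat about being absolutely sure which PG you are removing (OSD id
and PG id are placeholders; exact flags vary a little between releases):

  systemctl stop ceph-osd@<id>
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
      --journal-path /var/lib/ceph/osd/ceph-<id>/journal \
      --op remove --pgid <pgid-of-deleted-pool>
  systemctl start ceph-osd@<id>
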
On Thu, Nov 2, 2017 at 9:40 AM David Turner  wrote:

> Jewel 10.2.7; XFS formatted OSDs; no dmcrypt or LVM.  I have a pool that I
> deleted 16 hours ago that accounted for about 70% of the available space on
> each OSD (averaging 84% full), 370M objects in 8k PGs, ec 4+2 profile.
> Based on the rate that the OSDs are freeing up space after deleting the
> pool, it will take about a week to finish deleting the PGs from the OSDs.
>
> Is there anything I can do to speed this process up?  I feel like there
> may be a way for me to go through the OSDs and delete the PG folders either
> with the objectstore tool or while the OSD is offline.  I'm not sure what
> Ceph is doing to delete the pool, but I don't think that an `rm -Rf` of the
> PG folder would take nearly this long.
>
> Thank you all for your help.
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Deleting large pools

2017-11-02 Thread David Turner
Jewel 10.2.7; XFS formatted OSDs; no dmcrypt or LVM.  I have a pool that I
deleted 16 hours ago that accounted for about 70% of the available space on
each OSD (averaging 84% full), 370M objects in 8k PGs, ec 4+2 profile.
Based on the rate that the OSDs are freeing up space after deleting the
pool, it will take about a week to finish deleting the PGs from the OSDs.

Is there anything I can do to speed this process up?  I feel like there may
be a way for me to go through the OSDs and delete the PG folders either
with the objectstore tool or while the OSD is offline.  I'm not sure what
Ceph is doing to delete the pool, but I don't think that an `rm -Rf` of the
PG folder would take nearly this long.

Thank you all for your help.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] CephFS desync

2017-11-02 Thread Andrey Klimentyev
Hi,

we've recently hit a problem in a production cluster. The gist of it is
that sometimes a file is changed on one machine, but only the "change
time" propagates to the others. The checksum is different. Contents,
obviously, differ as well. How can I debug this?

In other words, how would I approach such a problem with "stuck files"?
I haven't found anything on Google or in the troubleshooting docs.

-- 
Andrey Klimentyev,
DevOps engineer @ JSC «Flant»
http://flant.com/ 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] iSCSI: tcmu-runner can't open images?

2017-11-02 Thread Heðin Ejdesgaard Møller
Hello Matthias,

We encountered a similar issue; it turned out to be because we used a pool
other than rbd with gwcli.

We got it fixed and it should be in a pull request upstream.

/Heðin
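
A quick way to confirm whether the non-default pool is the trigger is to try
the same create against the default rbd pool (image name and size are just
examples):

  # inside gwcli
  /disks> create pool=rbd image=itest04 size=50g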

- Original Message -
From: "Matthias Leopold" 
To: ceph-users@lists.ceph.com
Sent: Thursday, 2 November, 2017 15:34:45
Subject: [ceph-users] iSCSI: tcmu-runner can't open images?

Hi,

i'm trying to set up iSCSI gateways for a Ceph luminous cluster using 
these instructions:
http://docs.ceph.com/docs/master/rbd/iscsi-target-cli/

When arriving at step "Configuring: Adding a RADOS Block Device (RBD)" 
things start to get messy: there is no "disks" entry in my target path, 
so i can't "cd 
/iscsi-target/iqn.2003-01.com.redhat.iscsi-gw:/disks/". 
When i try to create a disk in the top level "/disks" path ("/disks> 
create pool=ovirt-default image=itest04 size=50g") gwcli crashes with 
"ValueError: No JSON object could be decoded" (there is more output when 
using debug but i don't think it matters). More interesting is 
/var/log/tcmu-runner.log, it says consistently

[DEBUG] handle_netlink:207: cmd 1. Got header version 2. Supported 2.
[DEBUG] dev_added:768 rbd/ovirt-default.itest04: Got block_size 512, 
size in bytes 53687091200
[DEBUG] tcmu_rbd_open:581 rbd/ovirt-default.itest04: tcmu_rbd_open 
config rbd/ovirt-default/itest04/osd_op_timeout=30 block size 512 num 
lbas 104857600.
[DEBUG] timer_check_and_set_def:234 rbd/ovirt-default.itest04: The 
cluster's default osd op timeout(30.00), osd heartbeat grace(20) 
interval(6)
[DEBUG] timer_check_and_set_def:242 rbd/ovirt-default.itest04: The osd 
op timeout will remain the default value: 30.00
[ERROR] tcmu_rbd_image_open:318 rbd/ovirt-default.itest04: Could not 
open image itest04/osd_op_timeout=30. (Err -2)
[ERROR] add_device:496: handler open failed for uio0

in the moment of the crash. The funny thing is, the image is created in 
the ceph pool 'ovirt-default', only gwcli/tcmu-runner can't read it. The 
"/disks" path in gwcli and the "/backstores/user:rbd" path in targetcli 
are always empty.

I haven't gotten past this, can anybody tell me what's wrong?

I tried 2 different tcmu binaries, one self compiled from sources from 
https://github.com/open-iscsi/tcmu-runner/tree/v1.3.0-rc4, the other rpm 
binaries from https://shaman.ceph.com/repos/tcmu-runner/ (ID: 58311). 
The error is the same with both versions.

My setup:
- CentOS 7.4
- kernel 3.10.0-693.2.2.el7.x86_64
- iscsi gw co-located on a ceph OSD node
- ceph programs from http://download.ceph.com/rpm-luminous
- python-rtslib-2.1.fb64 installed with "pip install"
- ceph-iscsi-config-2.3 installed as rpm compiled from 
https://github.com/ceph/ceph-iscsi-config/tree/2.3
- ceph-iscsi-cli-2.5 installed as rpm from 
https://github.com/ceph/ceph-iscsi-cli/tree/2.5

thx a lot for help
matthias
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] iSCSI: tcmu-runner can't open images?

2017-11-02 Thread Matthias Leopold

Hi,

I'm trying to set up iSCSI gateways for a Ceph Luminous cluster using 
these instructions:

http://docs.ceph.com/docs/master/rbd/iscsi-target-cli/

When arriving at step "Configuring: Adding a RADOS Block Device (RBD)"
things start to get messy: there is no "disks" entry in my target path,
so I can't "cd
/iscsi-target/iqn.2003-01.com.redhat.iscsi-gw:/disks/".
When I try to create a disk in the top level "/disks" path ("/disks>
create pool=ovirt-default image=itest04 size=50g") gwcli crashes with
"ValueError: No JSON object could be decoded" (there is more output when
using debug but I don't think it matters). More interesting is
/var/log/tcmu-runner.log, it says consistently


[DEBUG] handle_netlink:207: cmd 1. Got header version 2. Supported 2.
[DEBUG] dev_added:768 rbd/ovirt-default.itest04: Got block_size 512, 
size in bytes 53687091200
[DEBUG] tcmu_rbd_open:581 rbd/ovirt-default.itest04: tcmu_rbd_open 
config rbd/ovirt-default/itest04/osd_op_timeout=30 block size 512 num 
lbas 104857600.
[DEBUG] timer_check_and_set_def:234 rbd/ovirt-default.itest04: The 
cluster's default osd op timeout(30.00), osd heartbeat grace(20) 
interval(6)
[DEBUG] timer_check_and_set_def:242 rbd/ovirt-default.itest04: The osd 
op timeout will remain the default value: 30.00
[ERROR] tcmu_rbd_image_open:318 rbd/ovirt-default.itest04: Could not 
open image itest04/osd_op_timeout=30. (Err -2)

[ERROR] add_device:496: handler open failed for uio0

at the moment of the crash. The funny thing is, the image is created in
the Ceph pool 'ovirt-default'; only gwcli/tcmu-runner can't read it. The
"/disks" path in gwcli and the "/backstores/user:rbd" path in targetcli
are always empty.


I haven't gotten past this, can anybody tell me what's wrong?

I tried 2 different tcmu binaries, one self compiled from sources from 
https://github.com/open-iscsi/tcmu-runner/tree/v1.3.0-rc4, the other rpm 
binaries from https://shaman.ceph.com/repos/tcmu-runner/ (ID: 58311). 
The error is the same with both versions.


My setup:
- CentOS 7.4
- kernel 3.10.0-693.2.2.el7.x86_64
- iscsi gw co-located on a ceph OSD node
- ceph programs from http://download.ceph.com/rpm-luminous
- python-rtslib-2.1.fb64 installed with "pip install"
- ceph-iscsi-config-2.3 installed as rpm compiled from 
https://github.com/ceph/ceph-iscsi-config/tree/2.3
- ceph-iscsi-cli-2.5 installed as rpm from 
https://github.com/ceph/ceph-iscsi-cli/tree/2.5


thx a lot for help
matthias
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph versions not showing RGW

2017-11-02 Thread John Spray
On Thu, Nov 2, 2017 at 1:54 PM, Hans van den Bogert
 wrote:
> Just to get this really straight, Jewel OSDs do send this metadata?
> Otherwise I'm probably mistaken that I ever saw 10.2.x versions in the
> output.

RGW daemons only started sending metadata in Luminous.  OSD/mon/MDS
daemons already sent metadata in Jewel.

John
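
Until then, a rough way to check a Jewel RGW's version is to ask it directly
on the gateway host (the daemon name and socket path are placeholders
following the usual naming pattern):

  radosgw --version
  ceph daemon client.rgw.<hostname> version
  # or point at the admin socket explicitly:
  ceph --admin-daemon /var/run/ceph/ceph-client.rgw.<hostname>.asok version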

>
> Thanks,
>
> Hans
>
> On 2 Nov 2017 12:31 PM, "John Spray"  wrote:
>>
>> On Thu, Nov 2, 2017 at 11:16 AM, Hans van den Bogert
>>  wrote:
>> > Hi all,
>> >
>> > During our upgrade from Jewel to Luminous I saw the following behaviour,
>> > if
>> > my memory serves me right:
>> >
>> > When upgrading for example monitors and OSDs, we saw that the `ceph
>> > versions` command correctly showed at one that some OSDs were still on
>> > Jewel
>> > (10.2.x) and some were already upgraded and thus showed a version of
>> > 12.2.0.
>> > All as expected -- however, for the RGWs, those only showed up until
>> > they
>> > were upgraded to luminous. So the command gives a false sense of a
>> > complete
>> > overview of the Ceph cluster, e.g., in my case this resulted that I
>> > forgot
>> > about 4 out of 6 RGW instances which were still on Jewel.
>> >
>> > What are the semantics of the `ceph versions` ? -- Was I wrong in
>> > expecting
>> > that Jewel RGWs should show up there?
>>
>> RGW daemons reporting metadata to the mon/mgr daemons is a new
>> features in luminous -- older RGW daemons are effectively invisible to
>> us, so Jewel daemons will not show up at all in the versions output.
>>
>> John
>>
>> >
>> > Thanks,
>> >
>> > Hans
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph versions not showing RGW

2017-11-02 Thread Hans van den Bogert
Just to get this really straight, Jewel OSDs do send this metadata?
Otherwise I'm probably mistaken that I ever saw 10.2.x versions in the
output.

Thanks,

Hans

On 2 Nov 2017 12:31 PM, "John Spray"  wrote:

> On Thu, Nov 2, 2017 at 11:16 AM, Hans van den Bogert
>  wrote:
> > Hi all,
> >
> > During our upgrade from Jewel to Luminous I saw the following behaviour,
> if
> > my memory serves me right:
> >
> > When upgrading for example monitors and OSDs, we saw that the `ceph
> > versions` command correctly showed at one that some OSDs were still on
> Jewel
> > (10.2.x) and some were already upgraded and thus showed a version of
> 12.2.0.
> > All as expected -- however, for the RGWs, those only showed up until they
> > were upgraded to luminous. So the command gives a false sense of a
> complete
> > overview of the Ceph cluster, e.g., in my case this resulted that I
> forgot
> > about 4 out of 6 RGW instances which were still on Jewel.
> >
> > What are the semantics of the `ceph versions` ? -- Was I wrong in
> expecting
> > that Jewel RGWs should show up there?
>
> RGW daemons reporting metadata to the mon/mgr daemons is a new
> features in luminous -- older RGW daemons are effectively invisible to
> us, so Jewel daemons will not show up at all in the versions output.
>
> John
>
> >
> > Thanks,
> >
> > Hans
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How would ec profile affect performance?

2017-11-02 Thread shadow_lin
Hi all,
I am wondering how the EC profile would affect Ceph performance.
Will EC profile k=10,m=2 perform better than k=8,m=2, since there would be more
chunks to write and read concurrently?
Will EC profile k=10,m=2 need more memory and CPU power than EC profile
k=8,m=2?
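
One way to answer this empirically is to benchmark the two profiles side by
side; a rough sketch, assuming Luminous-style profile options (pool names, PG
counts and the failure domain are placeholders for this cluster):

  ceph osd erasure-code-profile set ec-k8-m2 k=8 m=2 crush-failure-domain=host
  ceph osd erasure-code-profile set ec-k10-m2 k=10 m=2 crush-failure-domain=host
  ceph osd pool create ecpool-k8 1024 1024 erasure ec-k8-m2
  ceph osd pool create ecpool-k10 1024 1024 erasure ec-k10-m2
  rados bench -p ecpool-k8 60 write -t 16
  rados bench -p ecpool-k10 60 write -t 16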


2017-11-02



lin.yunfan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph versions not showing RGW

2017-11-02 Thread John Spray
On Thu, Nov 2, 2017 at 11:16 AM, Hans van den Bogert
 wrote:
> Hi all,
>
> During our upgrade from Jewel to Luminous I saw the following behaviour, if
> my memory serves me right:
>
> When upgrading for example monitors and OSDs, we saw that the `ceph
> versions` command correctly showed at one that some OSDs were still on Jewel
> (10.2.x) and some were already upgraded and thus showed a version of 12.2.0.
> All as expected -- however, for the RGWs, those only showed up until they
> were upgraded to luminous. So the command gives a false sense of a complete
> overview of the Ceph cluster, e.g., in my case this resulted that I forgot
> about 4 out of 6 RGW instances which were still on Jewel.
>
> What are the semantics of the `ceph versions` ? -- Was I wrong in expecting
> that Jewel RGWs should show up there?

RGW daemons reporting metadata to the mon/mgr daemons is a new
features in luminous -- older RGW daemons are effectively invisible to
us, so Jewel daemons will not show up at all in the versions output.

John

>
> Thanks,
>
> Hans
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph versions not showing RGW

2017-11-02 Thread Hans van den Bogert
Hi all,

During our upgrade from Jewel to Luminous I saw the following behaviour, if
my memory serves me right:

When upgrading, for example, monitors and OSDs, we saw that the `ceph
versions` command correctly showed at one point that some OSDs were still on
Jewel (10.2.x) and some were already upgraded and thus showed a version of
12.2.0.
All as expected -- however, the RGWs only showed up once they
were upgraded to Luminous. So the command gives a false sense of a complete
overview of the Ceph cluster; in my case this resulted in my forgetting
about 4 out of 6 RGW instances which were still on Jewel.

What are the semantics of the `ceph versions` ? -- Was I wrong in expecting
that Jewel RGWs should show up there?

Thanks,

Hans
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-02 Thread Hans van den Bogert
Never mind, I should’ve read the whole thread first.
> On Nov 2, 2017, at 10:50 AM, Hans van den Bogert  wrote:
> 
> 
>> On Nov 1, 2017, at 4:45 PM, David Turner > > wrote:
>> 
>> All it takes for data loss is that an osd on server 1 is marked down and a 
>> write happens to an osd on server 2.  Now the osd on server 2 goes down 
>> before the osd on server 1 has finished backfilling and the first osd 
>> receives a request to modify data in the object that it doesn't know the 
>> current state of.  Tada, you have data loss.
> 
> I’m probably misunderstanding, but if a osd on server 1 is backfilling, and 
> its only candidate to backfill from is an osd on server 2, and the latter 
> goes down; then wouldn’t the osd on server 1 block, i.e., not accept requests 
> to modify, until server 1 comes up again?
> Or is there a ‘hole' here somewhere where server 1 *thinks* it’s done 
> backfilling whereas the osdmap it used to backfill with was out of date?
> 
> Thanks, 
> 
> Hans

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-02 Thread Hans van den Bogert

> On Nov 1, 2017, at 4:45 PM, David Turner  wrote:
> 
> All it takes for data loss is that an osd on server 1 is marked down and a 
> write happens to an osd on server 2.  Now the osd on server 2 goes down 
> before the osd on server 1 has finished backfilling and the first osd 
> receives a request to modify data in the object that it doesn't know the 
> current state of.  Tada, you have data loss.

I’m probably misunderstanding, but if an OSD on server 1 is backfilling, and its
only candidate to backfill from is an OSD on server 2, and the latter goes
down, then wouldn’t the OSD on server 1 block, i.e., not accept requests to
modify, until server 2 comes up again?
Or is there a ‘hole’ here somewhere where server 1 *thinks* it’s done
backfilling whereas the osdmap it used to backfill with was out of date?

Thanks, 

Hans
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] PGs inconsistent, do I fear data loss?

2017-11-02 Thread koukou73gr
The scenario is actually a bit different, see:

Let's assume size=2, min_size=1
-We are looking at pg "A" acting [1, 2]
-osd 1 goes down
-osd 2 accepts a write for pg "A"
-osd 2 goes down
-osd 1 comes back up, while osd 2 still down
-osd 1 has no way to know osd 2 accepted a write in pg "A"
-osd 1 accepts a new write to pg "A"
-osd 2 comes back up.

bang! osd 1 and 2 now have different views of pg "A" but both claim to
have current data.

-K.
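
A minimal sketch of the usual mitigation for this failure mode, capacity
permitting (pool name is a placeholder):

  ceph osd pool set <pool> size 3
  ceph osd pool set <pool> min_size 2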

On 2017-11-01 20:27, Denes Dolhay wrote:
> Hello,
> 
> I have a trick question for Mr. Turner's scenario:
> Let's assume size=2, min_size=1
> -We are looking at pg "A" acting [1, 2]
> -osd 1 goes down, OK
> -osd 1 comes back up, backfill of pg "A" commences from osd 2 to osd 1, OK
> -osd 2 goes down (and therefore pg "A" 's backfill to osd 1 is
> incomplete and stopped) not OK, but this is the case...
> --> In this event, why does osd 1 accept IO to pg "A" knowing full well,
> that it's data is outdated and will cause an inconsistent state?
> Wouldn't it be prudent to deny io to pg "A" until either
> -osd 2 comes back (therefore we have a clean osd in the acting group)
> ... backfill would continue to osd 1 of course
> -or data in pg "A" is manually marked as lost, and then continues
> operation from osd 1 's (outdated) copy?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Re: Re: Re: [luminous] OSD memory usage increase when writing a lot of data to cluster

2017-11-02 Thread shadow_lin
Hi Sage,
I did some more tests and found this:
I used "ceph tell osd.6 heap stats" and found that
osd.6 tcmalloc heap stats:
MALLOC:  404608432 (  385.9 MiB) Bytes in use by application
MALLOC: + 26599424 (   25.4 MiB) Bytes in page heap freelist
MALLOC: + 13442496 (   12.8 MiB) Bytes in central cache freelist
MALLOC: + 21112288 (   20.1 MiB) Bytes in transfer cache freelist
MALLOC: + 21702320 (   20.7 MiB) Bytes in thread cache freelists
MALLOC: +  3021024 (2.9 MiB) Bytes in malloc metadata
MALLOC:   
MALLOC: =490485984 (  467.8 MiB) Actual memory used (physical + swap)
MALLOC: +162922496 (  155.4 MiB) Bytes released to OS (aka unmapped)
MALLOC:   
MALLOC: =653408480 (  623.1 MiB) Virtual address space used
MALLOC:
MALLOC:  12958  Spans in use
MALLOC: 32  Thread heaps in use
MALLOC:   8192  Tcmalloc page size


and the page heap is not released by the OSD itself and keeps increasing, but if I use
"ceph tell osd.6 heap release" to manually release it, then the page heap freelist
is released.

osd.6 tcmalloc heap stats:
MALLOC:  404608432 (  385.9 MiB) Bytes in use by application
MALLOC: + 26599424 (   25.4 MiB) Bytes in page heap freelist
MALLOC: + 13442496 (   12.8 MiB) Bytes in central cache freelist
MALLOC: + 21112288 (   20.1 MiB) Bytes in transfer cache freelist
MALLOC: + 21702320 (   20.7 MiB) Bytes in thread cache freelists
MALLOC: +  3021024 (2.9 MiB) Bytes in malloc metadata
MALLOC:   
MALLOC: =490485984 (  467.8 MiB) Actual memory used (physical + swap)
MALLOC: +162922496 (  155.4 MiB) Bytes released to OS (aka unmapped)
MALLOC:   
MALLOC: =653408480 (  623.1 MiB) Virtual address space used
MALLOC:
MALLOC:  12958  Spans in use
MALLOC: 32  Thread heaps in use
MALLOC:   8192  Tcmalloc page size


I found this problem was discussed before at
http://tracker.ceph.com/issues/12681; is it a tcmalloc problem?
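
As a workaround sketch while this is sorted out (the OSD id is just an example):

  # per-pool accounting that bluestore uses to decide when to trim its cache
  ceph daemon osd.6 dump_mempools
  # hand freed pages back to the OS manually, per OSD or for all of them
  ceph tell osd.6 heap release
  ceph tell osd.* heap release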


2017-11-02 


lin.yunfan



From: Sage Weil
Sent: 2017-11-01 20:11
Subject: Re: Re: Re: [ceph-users] [luminous] OSD memory usage increase when writing
a lot of data to cluster
To: "shadow_lin"
Cc: "ceph-users"

On Wed, 1 Nov 2017, shadow_lin wrote: 
> Hi Sage, 
> We have tried compiled the latest ceph source code from github. 
> The build is ceph version 12.2.1-249-g42172a4 
> (42172a443183ffe6b36e85770e53fe678db293bf) luminous (stable). 
> The memory problem seems better but the memory usage of osd is still keep 
> increasing as more data are wrote into the rbd image and the memory usage 
> won't drop after the write is stopped. 
>Could you specify from which commit the memeory bug is fixed? 

f60a942023088cbba53a816e6ef846994921cab3 and the prior 2 commits. 

If you look at 'ceph daemon osd.nnn dump_mempools' you can see three 
bluestore pools.  This is what bluestore is using to account for its usage 
so it can know when to trim its cache.  Do those add up to the 
bluestore_cache_size - 512m (for rocksdb) that you have configured? 

sage 


> Thanks 
> 2017-11-01 
>  
>  
> lin.yunfan 
>  
>  
>   From: Sage Weil 
> Sent: 2017-10-24 20:03 
> Subject: Re: [ceph-users] [luminous] OSD memory usage increase when 
> writing a lot of data to cluster 
> To: "shadow_lin" 
> Cc: "ceph-users" 
>   
> On Tue, 24 Oct 2017, shadow_lin wrote:  
> > Hi All,  
> > The cluster has 24 osd with 24 8TB hdd.  
> > Each osd server has 2GB ram and runs 2OSD with 2 8TBHDD. I know the memor 
> y  
> > is below the remmanded value, but this osd server is an ARM  server so I 
?? ?> can't do anything to add more ram.  
> > I created a replicated(2 rep) pool and an 20TB image and mounted to the t 
> est  
> > server with xfs fs.   
> >
> > I have set the ceph.conf to this(according to other related post suggeste 
> d):  
> > [osd]  
> > bluestore_cache_size = 104857600  
> > bluestore_cache_size_hdd = 104857600  
> > bluestore_cache_size_ssd = 104857600  
> > bluestore_cache_kv_max = 103809024  
> >
> >  osd map cache size = 20  
> > osd map max advance = 10  
> > osd map share max epochs = 10  
> > osd pg epoch persisted max stale = 10  
> > The bluestore cache setting did improve the situation, but if I try to write  
> > 1TB of data with a dd command (dd if=/dev/zero 

Re: [ceph-users] [SUSPECTED SPAM] Ceph RBD with iSCSI multipath

2017-11-02 Thread jorpilo
Hey
Right now active/active multipath is not supported. There is an issue when:
a client sends a write to A; A blocks the write; the client times out, so it
sends the same write to B, and B writes it; the client then sends another
write to B, and B writes it; A unlocks and overwrites the second B write with
old information.
It can end up corrupting data, so right now only Active/Passive is supported
until a solution is found.
Here you have some info in old emails:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021437.html
Original message From: GiangCoi Mr  Date: 1/11/17 10:59 a.m. (GMT+01:00)
To: ceph-users
Subject: [SUSPECTED SPAM][ceph-users] Ceph RBD with iSCSI multipath
Hi all.

I'm configuring Ceph RBD to expose an iSCSI gateway. I am using 3 Ceph nodes
(CentOS 7.4 + Ceph Luminous). I want to configure an iSCSI gateway on the 3 Ceph
nodes so that Windows Server 2016 can connect via multipath iSCSI. How can I
configure this? Please help me to configure it. Thanks
Regards,
Giang

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com