Re: [ceph-users] OSD log being spammed with BlueStore stupidallocator dump

2018-10-15 Thread Gregory Farnum
I don't know anything about the BlueStore code, but given the snippets
you've posted this appears to be a debug thing that doesn't expect to be
invoked (or perhaps only in an unexpected case that it's trying hard to
recover from). Have you checked where the dump() function is invoked from?
I'd imagine it's something about having to try extra-hard to allocate free
space or something.
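
If you want to check quickly, grepping the source tree shows the call sites (a
rough sketch, assuming a checkout of the Luminous branch; the exact call-site
spelling is a guess):

$ git clone -b luminous https://github.com/ceph/ceph.git && cd ceph
$ git grep -n "void StupidAllocator::dump" src/os/bluestore/   # the dump() we see in the log
$ git grep -n "dump()" src/os/bluestore/BlueStore.cc src/os/bluestore/BlueFS.cc   # who calls it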
-Greg

On Mon, Oct 15, 2018 at 10:02 AM Wido den Hollander  wrote:

>
>
> On 10/11/2018 12:08 AM, Wido den Hollander wrote:
> > Hi,
> >
> > On a Luminous cluster running a mix of 12.2.4, 12.2.5 and 12.2.8 I'm
> > seeing OSDs writing heavily to their logfiles spitting out these lines:
> >
> >
> > 2018-10-10 21:52:04.019037 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0
> > dump  0x15cd2078000~34000
> > 2018-10-10 21:52:04.019038 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0
> > dump  0x15cd22cc000~24000
> > 2018-10-10 21:52:04.019038 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0
> > dump  0x15cd230~2
> > 2018-10-10 21:52:04.019039 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0
> > dump  0x15cd2324000~24000
> > 2018-10-10 21:52:04.019040 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0
> > dump  0x15cd26c~24000
> > 2018-10-10 21:52:04.019041 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0
> > dump  0x15cd2704000~3
> >
> > It goes so fast that the OS disk in this case can't keep up and becomes
> > 100% utilized.
> >
> > This causes the OSD to slow down, causing slow requests, and it starts to
> flap.
> >
>
> I've set 'log_file' to /dev/null for now, but that doesn't solve it
> either. Randomly OSDs just start spitting out slow requests and have
> these issues.
>
> Any suggestions on how to fix this?
>
> Wido
>
> > It seems that this is *only* happening on OSDs which are the fullest
> > (~85%) on this cluster and they have about ~400 PGs each (Yes, I know,
> > that's high).
> >
> > Looking at StupidAllocator.cc I see this piece of code:
> >
> > void StupidAllocator::dump()
> > {
> >   std::lock_guard l(lock);
> >   for (unsigned bin = 0; bin < free.size(); ++bin) {
> > ldout(cct, 0) << __func__ << " free bin " << bin << ": "
> >   << free[bin].num_intervals() << " extents" << dendl;
> > for (auto p = free[bin].begin();
> >  p != free[bin].end();
> >  ++p) {
> >   ldout(cct, 0) << __func__ << "  0x" << std::hex << p.get_start()
> > << "~"
> > << p.get_len() << std::dec << dendl;
> > }
> >   }
> > }
> >
> > I'm just wondering why it would spit out these lines and what's causing
> it.
> >
> > Has anybody seen this before?
> >
> > Wido
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] fixing another remapped+incomplete EC 4+2 pg

2018-10-15 Thread Gregory Farnum
On Thu, Oct 11, 2018 at 3:22 PM Graham Allan  wrote:

> As the osd crash implies, setting "nobackfill" appears to let all the
> osds keep running and the pg stays active and can apparently serve data.
>
> If I track down the object referenced below in the object store, I can
> download it without error via s3... though as I can't generate a
> matching etag, it may well be corrupt.
>
> Still I do wonder if deleting this object - either via s3, or maybe more
> likely directly within filestore, might permit backfill to continue.
>

Yes, that is very likely! (...unless there are a bunch of other objects
with the same issue.)

I'm not immediately familiar with the crash asserts you're seeing, but it
certainly looks like the object data somehow didn't quite get stored the way
the metadata describes it. Perhaps a write got lost/missed on m+1 of the PG
shards, setting osd_find_best_info_ignore_history_les caused the PG to try to
recover from what it had rather than following the normal recovery procedure,
and now it's not working.
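
If you do try removing it underneath RGW, a minimal sketch of the careful way
to do it (the pool name is an assumption; take the raw object name from the
crash log, keep a copy first, and prefer deleting via S3 since that keeps the
bucket index consistent):

$ POOL=default.rgw.buckets.data            # assumption -- use your EC data pool's name
$ OBJ='<raw object name from the crash log>'
$ rados -p "$POOL" stat "$OBJ"             # confirm it exists and check its size
$ rados -p "$POOL" get "$OBJ" /tmp/backup.obj   # keep a copy before touching anything
$ rados -p "$POOL" rm "$OBJ"               # then retry backfill / unset nobackfill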
-Greg


>
> Of other objects referenced in "last_backfill" for each osd in the pg,
> both also download via s3, but one I have seen implicated in similar
> crash logs for other OSDs and the etag again does not match; the other I
> have not seen in crash logs and does generate a matching etag.
>
> Opened a tracker issue for this: http://tracker.ceph.com/issues/36411
>
> Graham
>
> On 10/09/2018 06:55 PM, Graham Allan wrote:
> >
> > On 10/09/2018 01:14 PM, Graham Allan wrote:
> >>
> >> On 10/9/2018 12:19 PM, Gregory Farnum wrote:
> >>>
> >>> I think unfortunately the easiest thing for you to fix this will be
> >>> to set the min_size back to 4 until the PG is recovered (or at least
> >>> has 5 shards done). This will be fixed in a later version of Ceph and
> >>> probably backported, but sadly it's not done yet.
> >>> -Greg
> >
> >> Thanks Greg, though sadly I've tried that; whatever I do, one of the 4
> >> osds involved will simply crash (not just the ones I previously tried
> >> to re-import via ceph-objectstore-tool). I just spend time chasing
> >> them around but never succeeding in having a complete set run long
> >> enough to make progress. They seem to crash when starting backfill on
> >> the next object. There has to be something in the current set of
> >> shards which it can't handle.
> >>
> >> Since then I've been focusing on trying to get the pg to revert to an
> >> earlier interval using osd_find_best_info_ignore_history_les, though
> >> the information I find around it is minimal.
> >
> > Since my experiments with osd_find_best_info_ignore_history_les have not
> > borne any results, I'm looking again at the osd crashes when I get
> > enough of them running for backfill to start.
> >
> > They all crash in the same way; with "debug osd=10", the very last bit
> is:
> >
> >> -2> 2018-10-09 16:27:25.425004 7faa866bd700 10 osd.190 pg_epoch:
> >> 710808 pg[70.82ds2( v 710799'704745 (586066'698574,710799'704745]
> >> local-lis/les=710807/710808 n=102929 ec=21494/21494 lis/c
> >> 710807/588565 les/c/f 710808/588566/0 710711/710807/710807)
> >> [820,761,105,789,562,485]/[2147483647,2147483647,190,448,61,315]p190(2) r=2
> >> lpr=710807 pi=[588565,710807)/39 rops=1
> >> bft=105(2),485(5),562(4),761(1),789(3),820(0) crt=710799'70
> >> 4745 lcod 0'0 mlcod 0'0
> >> active+undersized+degraded+remapped+backfilling] continue_recovery_op:
> >> continuing
> >>
> RecoveryOp(hoid=70:b415ca12:::default.10341087.1__shadow_Subjects%2fsub-NDARINV6BVVAY29%2fses-baselineYear1Arm1%2ffunc%2frun-04_AROMA%2fdenoised_func_data_nonaggr.nii.2~VfI6rAYnU4XtzUcFAoecJDU5TpVR8AP.16_3:head
>
> >> v=579167'695462 missing_on=105(2),485(5),562(4),761(1),789(3),820(0)
> >> missing_on_shards=0,1,2,3,4,5 recovery_inf
> >>
> o=ObjectRecoveryInfo(70:b415ca12:::default.10341087.1__shadow_Subjects%2fsub-NDARINV6BVVAY29%2fses-baselineYear1Arm1%2ffunc%2frun-04_AROMA%2fdenoised_func_data_nonaggr.nii.2~VfI6rAYnU4XtzUcFAoecJDU5TpVR8AP.16_3:head@579167'695462,
>
> >> size: 0, copy_subset: [], clone_subset: {}, snapset: 0=[]:[])
> >> recovery_progress=ObjectRecoveryProgress(first, data_recovered_to:0,
> >> data_complete:false, omap_recovered_to:, omap_complete:true, error
> >> :false) obc refcount=3 state=READING waiting_on_pushes=
> >> extent_requested=0,8388608)
> >> -1> 2018-10-09 16:27:25.425105 7faa866bd700 10 osd.190 pg_epoch:
> >> 710808 pg[70.82ds2( v 710799'704745 (586066'698574,710799'704745]
> >> local-lis/les=710807/710808 n=102929 ec=21494/21494 lis/c
> >> 710807/588565 les/c/f 710808/588566/0 710711/710807/710807)
> >> [820,761,105,789,562,485]/[2147483647,2147483647,190,448,61,315]p190(2) r=2
> >> lpr=710807 pi=[588565,710807)/39 rops=1
> >> bft=105(2),485(5),562(4),761(1),789(3),820(0) crt=710799'70
> >> 4745 lcod 0'0 mlcod 0'0
> >> active+undersized+degraded+remapped+backfilling] continue_recovery_op:
> >> 

[ceph-users] warning: fast-diff map is invalid operation may be slow; object map invalid

2018-10-15 Thread Anthony D'Atri

We turned on all the RBD v2 features while running Jewel; since then all 
clusters have been updated to Luminous 12.2.2 and additional clusters added 
that have never run Jewel.

Today I find that a few percent of volumes in each cluster have issues, 
examples below.

I'm concerned that these issues may present problems when using rbd-mirror to 
move volumes between clusters.  Many instances involve heads or nodes of 
snapshot trees; it's possible but unverified that those not currently 
snap-related may have been in the past.

In the Jewel days we retroactively applied fast-diff, object-map to existing 
volumes but did not bother with tombstones.

Any thoughts on

1) How this happens?
2) Is "rbd object-map rebuild" always safe, especially on volumes that are in
active use?
3) The disturbing messages spewed by `rbd ls` -- related or not?
4) Would this, as I fear, confound successful rbd-mirror migration?

I've found
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-August/012137.html
which *seems* to indicate that a live rebuild is safe, but I'm still uncertain
about the root cause, and whether it's still happening.  I've never ventured into
this dark corner before so I'm being careful.
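
For reference, the commands in question are roughly as follows (a sketch;
pool, image and snapshot names are placeholders):

$ rbd info rbd/myimage | grep -E 'features|flags'   # "object map invalid, fast diff invalid" shows under flags
$ rbd object-map rebuild rbd/myimage                # rebuilds the image's object map and clears the invalid flags
$ rbd object-map rebuild rbd/myimage@mysnap         # snapshots carry their own object maps
$ rbd info rbd/myimage | grep flags                 # verify the flags are gone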

All clients are QEMU/libvirt; most are 12.2.2 but there are some lingering 
Jewel, most likely 10.2.6 or perhaps 10.2.3.  Eg:


# ceph features
{
"mon": {
"group": {
"features": "0x1ffddff8eea4fffb",
"release": "luminous",
"num": 5
}
},
"osd": {
"group": {
"features": "0x1ffddff8eea4fffb",
"release": "luminous",
"num": 983
}
},
"client": {
"group": {
"features": "0x7fddff8ee84bffb",
"release": "jewel",
"num": 15
},
"group": {
"features": "0x1ffddff8eea4fffb",
"release": "luminous",
"num": 3352
}
}
}


# rbd ls  -l |wc
2018-10-05 20:55:17.397288 7f976cff9700 -1 librbd::image::RefreshParentRequest: 
failed to locate snapshot: Snapshot with this id not found
2018-10-05 20:55:17.397334 7f976cff9700 -1 librbd::image::RefreshRequest: 
failed to refresh parent image: (2) No such file or directory
2018-10-05 20:55:17.397397 7f976cff9700 -1 librbd::image::OpenRequest: failed 
to refresh image: (2) No such file or directory
2018-10-05 20:55:17.398025 7f976cff9700 -1 librbd::io::AioCompletion: 
0x7f978667b570 fail: (2) No such file or directory
2018-10-05 20:55:17.398075 7f976cff9700 -1 librbd::image::RefreshParentRequest: 
failed to locate snapshot: Snapshot with this id not found
2018-10-05 20:55:17.398079 7f976cff9700 -1 librbd::image::RefreshRequest: 
failed to refresh parent image: (2) No such file or directory
2018-10-05 20:55:17.398096 7f976cff9700 -1 librbd::image::OpenRequest: failed 
to refresh image: (2) No such file or directory
2018-10-05 20:55:17.398659 7f976cff9700 -1 librbd::io::AioCompletion: 
0x7f978660c240 fail: (2) No such file or directory
2018-10-05 20:55:30.416174 7f976cff9700 -1 librbd::io::AioCompletion: 
0x7f9786cd5ee0 fail: (2) No such file or directory
2018-10-05 20:55:34.083188 7f976d7fa700 -1 librbd::object_map::RefreshRequest: 
failed to load object map: rbd_object_map.b18d634146825.2d8f
2018-10-05 20:55:34.084101 7f976cff9700 -1 
librbd::object_map::InvalidateRequest: 0x7f97544d11e0 should_complete: r=0
2018-10-05 20:55:38.597014 7f976d7fa700 -1 librbd::image::OpenRequest: failed 
to retreive immutable metadata: (2) No such file or directory
2018-10-05 20:55:38.597109 7f976cff9700 -1 librbd::io::AioCompletion: 
0x7f9786d3a7c0 fail: (2) No such file or directory
2018-10-05 20:55:51.584101 7f976d7fa700 -1 librbd::object_map::RefreshRequest: 
failed to load object map: rbd_object_map.c447c403109b2.6a04
2018-10-05 20:55:51.592616 7f976cff9700 -1 
librbd::object_map::InvalidateRequest: 0x7f975409fee0 should_complete: r=0
2018-10-05 20:55:59.414229 7f976d7fa700 -1 librbd::image::OpenRequest: failed 
to retreive immutable metadata: (2) No such file or directory
2018-10-05 20:55:59.414321 7f976cff9700 -1 librbd::io::AioCompletion: 
0x7f9786df0760 fail: (2) No such file or directory
2018-10-05 20:56:09.029179 7f976d7fa700 -1 librbd::object_map::RefreshRequest: 
failed to load object map: rbd_object_map.9b28e148b97af.6a09
2018-10-05 20:56:09.035212 7f976cff9700 -1 
librbd::object_map::InvalidateRequest: 0x7f9754644030 should_complete: r=0
2018-10-05 20:56:09.036087 7f976d7fa700 -1 librbd::object_map::RefreshRequest: 
failed to load object map: rbd_object_map.9b28e148b97af.6a0a
2018-10-05 20:56:09.042200 7f976cff9700 -1 
librbd::object_map::InvalidateRequest: 0x7f97541d2c10 should_complete: r=0
   6544   22993 1380784

# rbd du
warning: fast-diff map is invalid for 

Re: [ceph-users] OSD log being spammed with BlueStore stupidallocator dump

2018-10-15 Thread Wido den Hollander
Hi,

On 10/15/2018 10:43 PM, Igor Fedotov wrote:
> Hi Wido,
> 
> once you apply the PR you'll probably see the initial error in the log
> that triggers the dump. Which is most probably the lack of space
> reported by _balance_bluefs_freespace() function. If so this means that
> BlueFS rebalance is unable to allocate contiguous 1M chunk at main
> device to gift to BlueFS. I.e. your main device space is very fragmented.
> 
> Unfortunately I don't know any ways to recover from this state but OSD
> redeployment or data removal.
> 

We are moving data away from these OSDs. Luckily we have HDD OSDs in this
cluster as well, so we are moving a lot of data there.

How would re-deployment work? Just wipe the OSDs and bring them back
into the cluster again? That would be a very painful task.. :-(

> Upcoming PR that brings an ability for offline BlueFS volume
> manipulation (https://github.com/ceph/ceph/pull/23103) will probably
> help to recover from this issue in future by migrating BlueFS data to a
> new larger DB volume. (targeted for Nautilus, not sure about backporting
> to Mimic or Luminous).
> 
> For now the only preventive measure I can suggest to avoid this case is to
> have enough space on your standalone DB volume, so that the main device
> isn't used for DB at all, or as little as possible. Hence no rebalance
> is needed and no fragmentation builds up.
> 

I see, but these are SSD-only OSDs.

> BTW wondering if you have one for your OSDs? How large if so?
> 

The cluster consists of 96 OSDs, all on Samsung PM863a 1.92TB SSDs.

The fullest OSD currently is 78% full, which is 348GiB free on the
1.75TiB device.

Does this information help?

Thanks!

Wido

> Everything above is "IMO", some chances that I missed something..
> 
> 
> Thanks,
> 
> Igor
> 
> 
> On 10/15/2018 10:12 PM, Wido den Hollander wrote:
>>
>> On 10/15/2018 08:23 PM, Gregory Farnum wrote:
>>> I don't know anything about the BlueStore code, but given the snippets
>>> you've posted this appears to be a debug thing that doesn't expect to be
>>> invoked (or perhaps only in an unexpected case that it's trying hard to
>>> recover from). Have you checked where the dump() function is invoked
>>> from? I'd imagine it's something about having to try extra-hard to
>>> allocate free space or something.
>> It seems BlueFS that is having a hard time finding free space.
>>
>> I'm trying this PR now: https://github.com/ceph/ceph/pull/24543
>>
>> It will stop the spamming, but that's not the root cause. The OSDs in
>> this case are at max 80% full and they do have a lot of OMAP (RGW
>> indexes) in them, but that's all.
>>
>> I'm however not sure why this is happening suddenly in this cluster.
>>
>> Wido
>>
>>> -Greg
>>>
>>> On Mon, Oct 15, 2018 at 10:02 AM Wido den Hollander wrote:
>>>
>>>
>>>
>>>  On 10/11/2018 12:08 AM, Wido den Hollander wrote:
>>>  > Hi,
>>>  >
>>>  > On a Luminous cluster running a mix of 12.2.4, 12.2.5 and
>>> 12.2.8 I'm
>>>  > seeing OSDs writing heavily to their logfiles spitting out these
>>>  lines:
>>>  >
>>>  >
>>>  > 2018-10-10 21:52:04.019037 7f90c2f0f700  0 stupidalloc
>>>  0x0x55828ae047d0
>>>  > dump  0x15cd2078000~34000
>>>  > 2018-10-10 21:52:04.019038 7f90c2f0f700  0 stupidalloc
>>>  0x0x55828ae047d0
>>>  > dump  0x15cd22cc000~24000
>>>  > 2018-10-10 21:52:04.019038 7f90c2f0f700  0 stupidalloc
>>>  0x0x55828ae047d0
>>>  > dump  0x15cd230~2
>>>  > 2018-10-10 21:52:04.019039 7f90c2f0f700  0 stupidalloc
>>>  0x0x55828ae047d0
>>>  > dump  0x15cd2324000~24000
>>>  > 2018-10-10 21:52:04.019040 7f90c2f0f700  0 stupidalloc
>>>  0x0x55828ae047d0
>>>  > dump  0x15cd26c~24000
>>>  > 2018-10-10 21:52:04.019041 7f90c2f0f700  0 stupidalloc
>>>  0x0x55828ae047d0
>>>  > dump  0x15cd2704000~3
>>>  >
>>>  > It goes so fast that the OS-disk in this case can't keep up
>>> and become
>>>  > 100% util.
>>>  >
>>>  > This causes the OSD to slow down and cause slow requests and
>>>  starts to flap.
>>>  >
>>>
>>>  I've set 'log_file' to /dev/null for now, but that doesn't solve it
>>>  either. Randomly OSDs just start spitting out slow requests and
>>> have
>>>  these issues.
>>>
>>>  Any suggestions on how to fix this?
>>>
>>>  Wido
>>>
>>>  > It seems that this is *only* happening on OSDs which are the
>>> fullest
>>>  > (~85%) on this cluster and they have about ~400 PGs each (Yes,
>>> I know,
>>>  > that's high).
>>>  >
>>>  > Looking at StupidAllocator.cc I see this piece of code:
>>>  >
>>>  > void StupidAllocator::dump()
>>>  > {
>>>  >   std::lock_guard l(lock);
>>>  >   for (unsigned bin = 0; bin < free.size(); ++bin) {
>>>  >     ldout(cct, 0) << __func__ << " free bin " << bin << ": "
>>>  >                   << free[bin].num_intervals() << " extents"
>>> << dendl;
>>>  >     for (auto p = 

Re: [ceph-users] Ceph mds is stuck in creating status

2018-10-15 Thread Kisik Jeong
I attached the osd & fs dumps. There are clearly two pools (cephfs_data,
cephfs_metadata) for CephFS. And this system's network is 40Gbps
ethernet for both public & cluster, so I don't think network speed would be
the problem. Thank you.
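
For cross-checking the pool IDs John mentions, something like this should be
enough (a sketch; field names as found in Luminous JSON output):

$ ceph osd lspools                                   # the pool IDs/names that actually exist
$ ceph osd dump --format=json-pretty | grep -E '"pool"|"pool_name"'
$ ceph fs dump --format=json-pretty | grep -E '"metadata_pool"|"data_pools"'
# the IDs listed under data_pools/metadata_pool must appear in the osd dump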

On Tue, Oct 16, 2018 at 1:18 AM, John Spray wrote:

> On Mon, Oct 15, 2018 at 4:24 PM Kisik Jeong 
> wrote:
> >
> > Thank you for your reply, John.
> >
> > I  restarted my Ceph cluster and captured the mds logs.
> >
> > I found that mds shows slow request because some OSDs are laggy.
> >
> > I followed the ceph mds troubleshooting with 'mds slow request', but
> there is no operation in flight:
> >
> > root@hpc1:~/iodc# ceph daemon mds.hpc1 dump_ops_in_flight
> > {
> > "ops": [],
> > "num_ops": 0
> > }
> >
> > Is there any other reason that mds shows slow request? Thank you.
>
> Those stuck requests seem to be stuck because they're targeting pools
> that don't exist.  Has something strange happened in the history of
> this cluster that might have left a filesystem referencing pools that
> no longer exist?  Ceph is not supposed to permit removal of pools in
> use by CephFS, but perhaps something went wrong.
>
> Check out the "ceph osd dump --format=json-pretty" and "ceph fs dump
> --format=json-pretty" outputs and how the pool ID's relate.  According
> to those logs, data pool with ID 1 and metadata pool with ID 2 do not
> exist.
>
> John
>
> > -Kisik
> >
> > On Mon, Oct 15, 2018 at 11:43 PM, John Spray wrote:
> >>
> >> On Mon, Oct 15, 2018 at 3:34 PM Kisik Jeong 
> wrote:
> >> >
> >> > Hello,
> >> >
> >> > I successfully deployed Ceph cluster with 16 OSDs and created CephFS
> before.
> >> > But after rebooting due to mds slow request problem, when creating
> CephFS, Ceph mds goes creating status and never changes.
> >> > Seeing Ceph status, there is no other problem I think. Here is 'ceph
> -s' result:
> >>
> >> That's pretty strange.  Usually if an MDS is stuck in "creating", it's
> >> because an OSD operation is stuck, but in your case all your PGs are
> >> healthy.
> >>
> >> I would suggest setting "debug mds=20" and "debug objecter=10" on your
> >> MDS, restarting it and capturing those logs so that we can see where
> >> it got stuck.
> >>
> >> John
> >>
> >> > csl@hpc1:~$ ceph -s
> >> >   cluster:
> >> > id: 1a32c483-cb2e-4ab3-ac60-02966a8fd327
> >> > health: HEALTH_OK
> >> >
> >> >   services:
> >> > mon: 1 daemons, quorum hpc1
> >> > mgr: hpc1(active)
> >> > mds: cephfs-1/1/1 up  {0=hpc1=up:creating}
> >> > osd: 16 osds: 16 up, 16 in
> >> >
> >> >   data:
> >> > pools:   2 pools, 640 pgs
> >> > objects: 7 objects, 124B
> >> > usage:   34.3GiB used, 116TiB / 116TiB avail
> >> > pgs: 640 active+clean
> >> >
> >> > However, CephFS still works in case of 8 OSDs.
> >> >
> >> > If there is any doubt of this phenomenon, please let me know. Thank
> you.
> >> >
> >> > PS. I attached my ceph.conf contents:
> >> >
> >> > [global]
> >> > fsid = 1a32c483-cb2e-4ab3-ac60-02966a8fd327
> >> > mon_initial_members = hpc1
> >> > mon_host = 192.168.40.10
> >> > auth_cluster_required = cephx
> >> > auth_service_required = cephx
> >> > auth_client_required = cephx
> >> >
> >> > public_network = 192.168.40.0/24
> >> > cluster_network = 192.168.40.0/24
> >> >
> >> > [osd]
> >> > osd journal size = 1024
> >> > osd max object name len = 256
> >> > osd max object namespace len = 64
> >> > osd mount options f2fs = active_logs=2
> >> >
> >> > [osd.0]
> >> > host = hpc9
> >> > public_addr = 192.168.40.18
> >> > cluster_addr = 192.168.40.18
> >> >
> >> > [osd.1]
> >> > host = hpc10
> >> > public_addr = 192.168.40.19
> >> > cluster_addr = 192.168.40.19
> >> >
> >> > [osd.2]
> >> > host = hpc9
> >> > public_addr = 192.168.40.18
> >> > cluster_addr = 192.168.40.18
> >> >
> >> > [osd.3]
> >> > host = hpc10
> >> > public_addr = 192.168.40.19
> >> > cluster_addr = 192.168.40.19
> >> >
> >> > [osd.4]
> >> > host = hpc9
> >> > public_addr = 192.168.40.18
> >> > cluster_addr = 192.168.40.18
> >> >
> >> > [osd.5]
> >> > host = hpc10
> >> > public_addr = 192.168.40.19
> >> > cluster_addr = 192.168.40.19
> >> >
> >> > [osd.6]
> >> > host = hpc9
> >> > public_addr = 192.168.40.18
> >> > cluster_addr = 192.168.40.18
> >> >
> >> > [osd.7]
> >> > host = hpc10
> >> > public_addr = 192.168.40.19
> >> > cluster_addr = 192.168.40.19
> >> >
> >> > [osd.8]
> >> > host = hpc9
> >> > public_addr = 192.168.40.18
> >> > cluster_addr = 192.168.40.18
> >> >
> >> > [osd.9]
> >> > host = hpc10
> >> > public_addr = 192.168.40.19
> >> > cluster_addr = 192.168.40.19
> >> >
> >> > [osd.10]
> >> > host = hpc9
> >> > public_addr = 192.168.10.18
> >> > cluster_addr = 192.168.40.18
> >> >
> >> > [osd.11]
> >> > host = hpc10
> >> > public_addr = 192.168.10.19
> >> > cluster_addr = 192.168.40.19
> >> >
> >> > [osd.12]
> >> > host = hpc9
> >> > public_addr = 192.168.10.18
> >> > cluster_addr = 192.168.40.18
> >> >
> >> > [osd.13]
> >> > host = hpc10
> >> > public_addr = 192.168.10.19
> >> > 

Re: [ceph-users] cephfs kernel client blocks when removing large files

2018-10-15 Thread Gregory Farnum
On Tue, Oct 9, 2018 at 10:57 PM Dylan McCulloch  wrote:

> Hi Greg,
>
> Nowhere in your test procedure do you mention syncing or flushing the
> files to disk. That is almost certainly the cause of the slowness
>
> We have tested performing sync after file creation and the delay still
> occurs. (See Test3 results below)
>
> To clarify, it appears the delay is observed only when ls is performed on
> the same directory in which the files were removed, provided the files have
> been recently cached.
>
> e.g. rm -f /mnt/cephfs_mountpoint/file*; ls /mnt/cephfs_mountpoint
>
> the client which wrote the data is required to flush it out before
> dropping enough file "capabilities" for the other client to do the rm.
>
> Our tests are performed on the same host.
>
> In Test1 the rm and ls are performed by the same client id. And for other
> tests in which an unmount & remount were performed, I would assume the
> unmount would cause that particular client id to terminate and drop any
> caps.
>
> Do you still believe held caps are contributing to slowness in these test
> scenarios?
>

Hmm, perhaps not. Or at least not in that way.

These tests are interesting; I'm not quite sure what might be going on
here, but I think I'll have to let one of our more dedicated kernel CephFS
people look at it, sorry.
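
In the meantime, one thing that might narrow it down is watching the kernel
client's in-flight MDS requests while the ls is blocked (a sketch; assumes
debugfs is mounted and root access, and the exact layout can vary by kernel
version):

$ sudo ls /sys/kernel/debug/ceph/             # one directory per mounted client instance
$ sudo cat /sys/kernel/debug/ceph/*/mdsc      # pending MDS requests -- look for unlink entries
$ sudo cat /sys/kernel/debug/ceph/*/osdc      # pending OSD requests, e.g. the flush of the deleted data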
-Greg


>
> We’ve added 3 additional test cases below.
>
> Test 3) Sync write (delay observed when writing files and syncing)
>
> Test 4) Bypass cache (no delay observed when files are not written to
> cache)
>
> Test 5) Read test (delay observed when removing files that have been read
> recently in to cache)
>
> Test3: Sync Write - File creation, with sync after write.
>
> 1) unmount & remount:
>
> 2) Add 5 x 100GB files to a directory:
>
> for i in {1..5}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt
> count=102400 bs=1048576;done
>
> 3) sync
>
> 4) Delete all files in directory:
>
> for i in {1..5};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done
>
> 5) Immediately perform ls on directory:
>
> time ls /mnt/cephfs_mountpoint
>
> real0m8.765s
>
> user0m0.001s
>
> sys 0m0.000s
>
>
> Test4: Bypass cache - File creation, with nocache options for dd.
>
> 1) unmount & remount:
>
> 2) Add 5 x 100GB files to a directory:
>
> for i in {1..5}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt
> count=102400 bs=1048576 oflag=nocache,sync iflag=nocache;done
>
> 3) sync
>
> 4) Delete all files in directory:
>
> for i in {1..5};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done
>
> 5) Immediately perform ls on directory:
>
> time ls /mnt/cephfs_mountpoint
>
> real0m0.003s
>
> user0m0.000s
>
> sys 0m0.001s
>
>
> Test5: Read test - Read files into empty page cache, before deletion.
>
> 1) unmount & remount
>
> 2) Add 5 x 100GB files to a directory:
>
> for i in {1..5}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt
> count=102400 bs=1048576;done
>
> 3) sync
>
> 4) unmount & remount #empty cache
>
> 5) read files (to add back to cache)
>
> for i in {1..5};do cat /mnt/cephfs_mountpoint/file$i.txt > /dev/null; done
>
> 6) Delete all files in directory:
>
> for i in {1..5};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done
>
> 5) Immediately perform ls on directory:
>
> time ls /mnt/cephfs_mountpoint
>
> real0m8.723s
>
> user0m0.000s
>
> sys 0m0.001s
>
> Regards,
>
> Dylan
> --
> *From:* Gregory Farnum 
> *Sent:* Wednesday, October 10, 2018 4:37:49 AM
> *To:* Dylan McCulloch
> *Cc:* ceph-users@lists.ceph.com
> *Subject:* Re: [ceph-users] cephfs kernel client blocks when removing
> large files
>
> Nowhere in your test procedure do you mention syncing or flushing the
> files to disk. That is almost certainly the cause of the slowness — the
> client which wrote the data is required to flush it out before dropping
> enough file "capabilities" for the other client to do the rm.
> -Greg
>
> On Sun, Oct 7, 2018 at 11:57 PM Dylan McCulloch 
> wrote:
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> Hi all,
>
> We have identified some unexpected blocking behaviour by the ceph-fs
> kernel client. When performing 'rm' on large files (100+GB), there
> appears to be a significant delay of 10 seconds or more, before a 'stat'
> operation can be performed on the same directory on the filesystem.
> Looking at the kernel client's mds inflight-ops, we observe that there
> are pending UNLINK operations corresponding to the deleted files.
>
> We have noted some correlation between files being in the client page
> cache and the blocking behaviour. For example, if the cache is dropped
> or the filesystem remounted the blocking will not occur.
>
> Test scenario below:
>
> /mnt/cephfs_mountpoint type ceph
> (rw,relatime,name=ceph_filesystem,secret=,noshare,acl,wsize=16777216,rasize=268439552,caps_wanted_delay_min=1,caps_wanted_delay_max=1)
>
> Test1:
> 1) unmount & remount:
> 2) Add 10 x 100GB files to a directory: for i in {1..10}; do dd if=/dev/zero 

Re: [ceph-users] OSD log being spammed with BlueStore stupidallocator dump

2018-10-15 Thread Wido den Hollander


On 10/15/2018 08:23 PM, Gregory Farnum wrote:
> I don't know anything about the BlueStore code, but given the snippets
> you've posted this appears to be a debug thing that doesn't expect to be
> invoked (or perhaps only in an unexpected case that it's trying hard to
> recover from). Have you checked where the dump() function is invoked
> from? I'd imagine it's something about having to try extra-hard to
> allocate free space or something.

It seems to be BlueFS that is having a hard time finding free space.

I'm trying this PR now: https://github.com/ceph/ceph/pull/24543

It will stop the spamming, but that's not the root cause. The OSDs in
this case are at max 80% full and they do have a lot of OMAP (RGW
indexes) in them, but that's all.

I'm however not sure why this is happening suddenly in this cluster.

Wido

> -Greg
> 
> On Mon, Oct 15, 2018 at 10:02 AM Wido den Hollander wrote:
> 
> 
> 
> On 10/11/2018 12:08 AM, Wido den Hollander wrote:
> > Hi,
> >
> > On a Luminous cluster running a mix of 12.2.4, 12.2.5 and 12.2.8 I'm
> > seeing OSDs writing heavily to their logfiles spitting out these
> lines:
> >
> >
> > 2018-10-10 21:52:04.019037 7f90c2f0f700  0 stupidalloc
> 0x0x55828ae047d0
> > dump  0x15cd2078000~34000
> > 2018-10-10 21:52:04.019038 7f90c2f0f700  0 stupidalloc
> 0x0x55828ae047d0
> > dump  0x15cd22cc000~24000
> > 2018-10-10 21:52:04.019038 7f90c2f0f700  0 stupidalloc
> 0x0x55828ae047d0
> > dump  0x15cd230~2
> > 2018-10-10 21:52:04.019039 7f90c2f0f700  0 stupidalloc
> 0x0x55828ae047d0
> > dump  0x15cd2324000~24000
> > 2018-10-10 21:52:04.019040 7f90c2f0f700  0 stupidalloc
> 0x0x55828ae047d0
> > dump  0x15cd26c~24000
> > 2018-10-10 21:52:04.019041 7f90c2f0f700  0 stupidalloc
> 0x0x55828ae047d0
> > dump  0x15cd2704000~3
> >
> > It goes so fast that the OS-disk in this case can't keep up and become
> > 100% util.
> >
> > This causes the OSD to slow down and cause slow requests and
> starts to flap.
> >
> 
> I've set 'log_file' to /dev/null for now, but that doesn't solve it
> either. Randomly OSDs just start spitting out slow requests and have
> these issues.
> 
> Any suggestions on how to fix this?
> 
> Wido
> 
> > It seems that this is *only* happening on OSDs which are the fullest
> > (~85%) on this cluster and they have about ~400 PGs each (Yes, I know,
> > that's high).
> >
> > Looking at StupidAllocator.cc I see this piece of code:
> >
> > void StupidAllocator::dump()
> > {
> >   std::lock_guard l(lock);
> >   for (unsigned bin = 0; bin < free.size(); ++bin) {
> >     ldout(cct, 0) << __func__ << " free bin " << bin << ": "
> >                   << free[bin].num_intervals() << " extents" << dendl;
> >     for (auto p = free[bin].begin();
> >          p != free[bin].end();
> >          ++p) {
> >       ldout(cct, 0) << __func__ << "  0x" << std::hex << p.get_start()
> > << "~"
> >                     << p.get_len() << std::dec << dendl;
> >     }
> >   }
> > }
> >
> > I'm just wondering why it would spit out these lines and what's
> causing it.
> >
> > Has anybody seen this before?
> >
> > Wido
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com 
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph client libraries for OSX

2018-10-15 Thread Gregory Farnum
This is really cool! I can mostly parse what the tap is doing, and it's
good to see somebody managed to programmatically define the build
dependencies since that's always been an issue for people on OS X.
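
For anyone who wants to try it, usage should look roughly like this (a sketch
based on the repository name; the formula name is an assumption, so check the
tap's README first):

$ brew tap zeichenanonym/ceph-client     # resolves to github.com/zeichenanonym/homebrew-ceph-client
$ brew install ceph-client               # formula name is an assumption -- `brew search ceph` will show it
$ ceph --version                         # the tap provides ceph, ceph-conf, ceph-fuse, rados and rbd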

On Mon, Oct 15, 2018 at 7:43 AM Christopher Blum 
wrote:

> Hi folks,
>
> Just wanted to announce that with the help of Kefu, I was able to create a
> working tap for ceph client libraries and binaries for the OSX platform.
> Currently, we only test the tap on High-Sierra and Mojave.
>
> This was mostly built so that people can use go-ceph on their Macs without
> VM, but I'm happy if it helps you in other ways as well!
>
> Everything you need to try it out is available here:
> https://github.com/zeichenanonym/homebrew-ceph-client
>
> After installing, you will have access to the following executables:
> ceph
> ceph-conf
> ceph-fuse
> rados
> rbd
>
> Disclaimer:
> 1) This is not official - do not expect frequent updates or support from
> Ceph.
> 2) CephFS fuse mounts work on Mojave. And it’s known that we will have I/O
> error on High-Sierra.
>
> Cheers,
> Chris
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD log being spammed with BlueStore stupidallocator dump

2018-10-15 Thread Igor Fedotov

Hi Wido,

once you apply the PR you'll probably see the initial error in the log
that triggers the dump, which is most probably the lack of space
reported by the _balance_bluefs_freespace() function. If so, this means that
BlueFS rebalance is unable to allocate a contiguous 1M chunk at the main
device to gift to BlueFS, i.e. your main device space is very fragmented.


Unfortunately I don't know any ways to recover from this state but OSD 
redeployment or data removal.


Upcoming PR that brings an ability for offline BlueFS volume 
manipulation (https://github.com/ceph/ceph/pull/23103) will probably 
help to recover from this issue in future by migrating BlueFS data to a 
new larger DB volume. (targeted for Nautilus, not sure about backporting 
to Mimic or Luminous).


For now the only preventive measure I can suggest to avoid this case is to
have enough space on your standalone DB volume, so that the main device
isn't used for DB at all, or as little as possible. Hence no rebalance
is needed and no fragmentation builds up.
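
A quick way to see how much space BlueFS currently holds on each device is the
OSD's perf counters (a sketch; run on the OSD host via the admin socket, and
the exact counter names may differ between releases):

$ ceph daemon osd.0 perf dump | grep -A40 '"bluefs"' | grep -E '(db|slow|wal)_(total|used)_bytes'

On an OSD with a separate DB device, non-zero slow_used_bytes means BlueFS has
already spilled onto the main device; on single-device OSDs the split may show
up differently.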


BTW wondering if you have one for your OSDs? How large if so?

Everything above is "IMO"; there is some chance that I have missed something.


Thanks,

Igor


On 10/15/2018 10:12 PM, Wido den Hollander wrote:


On 10/15/2018 08:23 PM, Gregory Farnum wrote:

I don't know anything about the BlueStore code, but given the snippets
you've posted this appears to be a debug thing that doesn't expect to be
invoked (or perhaps only in an unexpected case that it's trying hard to
recover from). Have you checked where the dump() function is invoked
from? I'd imagine it's something about having to try extra-hard to
allocate free space or something.

It seems BlueFS that is having a hard time finding free space.

I'm trying this PR now: https://github.com/ceph/ceph/pull/24543

It will stop the spamming, but that's not the root cause. The OSDs in
this case are at max 80% full and they do have a lot of OMAP (RGW
indexes) in them, but that's all.

I'm however not sure why this is happening suddenly in this cluster.

Wido


-Greg

On Mon, Oct 15, 2018 at 10:02 AM Wido den Hollander <w...@42on.com> wrote:



 On 10/11/2018 12:08 AM, Wido den Hollander wrote:
 > Hi,
 >
 > On a Luminous cluster running a mix of 12.2.4, 12.2.5 and 12.2.8 I'm
 > seeing OSDs writing heavily to their logfiles spitting out these
 lines:
 >
 >
 > 2018-10-10 21:52:04.019037 7f90c2f0f700  0 stupidalloc
 0x0x55828ae047d0
 > dump  0x15cd2078000~34000
 > 2018-10-10 21:52:04.019038 7f90c2f0f700  0 stupidalloc
 0x0x55828ae047d0
 > dump  0x15cd22cc000~24000
 > 2018-10-10 21:52:04.019038 7f90c2f0f700  0 stupidalloc
 0x0x55828ae047d0
 > dump  0x15cd230~2
 > 2018-10-10 21:52:04.019039 7f90c2f0f700  0 stupidalloc
 0x0x55828ae047d0
 > dump  0x15cd2324000~24000
 > 2018-10-10 21:52:04.019040 7f90c2f0f700  0 stupidalloc
 0x0x55828ae047d0
 > dump  0x15cd26c~24000
 > 2018-10-10 21:52:04.019041 7f90c2f0f700  0 stupidalloc
 0x0x55828ae047d0
 > dump  0x15cd2704000~3
 >
 > It goes so fast that the OS-disk in this case can't keep up and become
 > 100% util.
 >
 > This causes the OSD to slow down and cause slow requests and
 starts to flap.
 >

 I've set 'log_file' to /dev/null for now, but that doesn't solve it
 either. Randomly OSDs just start spitting out slow requests and have
 these issues.

 Any suggestions on how to fix this?

 Wido

 > It seems that this is *only* happening on OSDs which are the fullest
 > (~85%) on this cluster and they have about ~400 PGs each (Yes, I know,
 > that's high).
 >
 > Looking at StupidAllocator.cc I see this piece of code:
 >
 > void StupidAllocator::dump()
 > {
 >   std::lock_guard l(lock);
 >   for (unsigned bin = 0; bin < free.size(); ++bin) {
 >     ldout(cct, 0) << __func__ << " free bin " << bin << ": "
 >                   << free[bin].num_intervals() << " extents" << dendl;
 >     for (auto p = free[bin].begin();
 >          p != free[bin].end();
 >          ++p) {
 >       ldout(cct, 0) << __func__ << "  0x" << std::hex << p.get_start()
 > << "~"
 >                     << p.get_len() << std::dec << dendl;
 >     }
 >   }
 > }
 >
 > I'm just wondering why it would spit out these lines and what's
 causing it.
 >
 > Has anybody seen this before?
 >
 > Wido
 > ___
 > ceph-users mailing list
 > ceph-users@lists.ceph.com 
 > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 >
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com 
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users 

Re: [ceph-users] OSD log being spammed with BlueStore stupidallocator dump

2018-10-15 Thread Igor Fedotov


On 10/15/2018 11:47 PM, Wido den Hollander wrote:

Hi,

On 10/15/2018 10:43 PM, Igor Fedotov wrote:

Hi Wido,

once you apply the PR you'll probably see the initial error in the log
that triggers the dump. Which is most probably the lack of space
reported by _balance_bluefs_freespace() function. If so this means that
BlueFS rebalance is unable to allocate contiguous 1M chunk at main
device to gift to BlueFS. I.e. your main device space is very fragmented.

Unfortunately I don't know any ways to recover from this state but OSD
redeployment or data removal.


We are moving data away from these OSDs. Lucky us that we have HDD OSDs
in there as well, moving a lot of data there.

How would re-deployment work? Just wipe the OSDs and bring them back
into the cluster again? That would be a very painful task.. :-(


Good chances that you'll face the same issue again one day.
May be allocate some SSDs to serve as DB devices?



Upcoming PR that brings an ability for offline BlueFS volume
manipulation (https://github.com/ceph/ceph/pull/23103) will probably
help to recover from this issue in future by migrating BlueFS data to a
new larger DB volume. (targeted for Nautilus, not sure about backporting
to Mimic or Luminous).

For now I can suggest the only preventive mean to avoid the case - have
large enough space at your standalone DB volume. So that master device
isn't used for DB at all or as minimum as possible. Hence no rebalance
is needed and no fragmentation is present.


I see, but these are SSD-only OSDs.


BTW wondering if you have one for your OSDs? How large if so?


The cluster consists out of 96 OSDs with Samsung PM863a 1.92TB OSDs.

The fullest OSD currently is 78% full, which is 348GiB free on the
1.75TiB device.

Does this information help?

Yeah, thanks for sharing.


Thanks!

Wido


Everything above is "IMO", some chances that I missed something..


Thanks,

Igor


On 10/15/2018 10:12 PM, Wido den Hollander wrote:

On 10/15/2018 08:23 PM, Gregory Farnum wrote:

I don't know anything about the BlueStore code, but given the snippets
you've posted this appears to be a debug thing that doesn't expect to be
invoked (or perhaps only in an unexpected case that it's trying hard to
recover from). Have you checked where the dump() function is invoked
from? I'd imagine it's something about having to try extra-hard to
allocate free space or something.

It seems BlueFS that is having a hard time finding free space.

I'm trying this PR now: https://github.com/ceph/ceph/pull/24543

It will stop the spamming, but that's not the root cause. The OSDs in
this case are at max 80% full and they do have a lot of OMAP (RGW
indexes) in them, but that's all.

I'm however not sure why this is happening suddenly in this cluster.

Wido


-Greg

On Mon, Oct 15, 2018 at 10:02 AM Wido den Hollander <w...@42on.com> wrote:



  On 10/11/2018 12:08 AM, Wido den Hollander wrote:
  > Hi,
  >
  > On a Luminous cluster running a mix of 12.2.4, 12.2.5 and
12.2.8 I'm
  > seeing OSDs writing heavily to their logfiles spitting out these
  lines:
  >
  >
  > 2018-10-10 21:52:04.019037 7f90c2f0f700  0 stupidalloc
  0x0x55828ae047d0
  > dump  0x15cd2078000~34000
  > 2018-10-10 21:52:04.019038 7f90c2f0f700  0 stupidalloc
  0x0x55828ae047d0
  > dump  0x15cd22cc000~24000
  > 2018-10-10 21:52:04.019038 7f90c2f0f700  0 stupidalloc
  0x0x55828ae047d0
  > dump  0x15cd230~2
  > 2018-10-10 21:52:04.019039 7f90c2f0f700  0 stupidalloc
  0x0x55828ae047d0
  > dump  0x15cd2324000~24000
  > 2018-10-10 21:52:04.019040 7f90c2f0f700  0 stupidalloc
  0x0x55828ae047d0
  > dump  0x15cd26c~24000
  > 2018-10-10 21:52:04.019041 7f90c2f0f700  0 stupidalloc
  0x0x55828ae047d0
  > dump  0x15cd2704000~3
  >
  > It goes so fast that the OS-disk in this case can't keep up
and become
  > 100% util.
  >
  > This causes the OSD to slow down and cause slow requests and
  starts to flap.
  >

  I've set 'log_file' to /dev/null for now, but that doesn't solve it
  either. Randomly OSDs just start spitting out slow requests and
have
  these issues.

  Any suggestions on how to fix this?

  Wido

  > It seems that this is *only* happening on OSDs which are the
fullest
  > (~85%) on this cluster and they have about ~400 PGs each (Yes,
I know,
  > that's high).
  >
  > Looking at StupidAllocator.cc I see this piece of code:
  >
  > void StupidAllocator::dump()
  > {
  >   std::lock_guard l(lock);
  >   for (unsigned bin = 0; bin < free.size(); ++bin) {
  >     ldout(cct, 0) << __func__ << " free bin " << bin << ": "
  >                   << free[bin].num_intervals() << " extents"
<< dendl;
  >     for (auto p = free[bin].begin();
  >          p != free[bin].end();
  >          ++p) {
  >       ldout(cct, 0) << __func__ << "  0x" << std::hex <<

Re: [ceph-users] OSD log being spammed with BlueStore stupidallocator dump

2018-10-15 Thread Wido den Hollander


On 10/16/2018 12:04 AM, Igor Fedotov wrote:
> 
> On 10/15/2018 11:47 PM, Wido den Hollander wrote:
>> Hi,
>>
>> On 10/15/2018 10:43 PM, Igor Fedotov wrote:
>>> Hi Wido,
>>>
>>> once you apply the PR you'll probably see the initial error in the log
>>> that triggers the dump. Which is most probably the lack of space
>>> reported by _balance_bluefs_freespace() function. If so this means that
>>> BlueFS rebalance is unable to allocate contiguous 1M chunk at main
>>> device to gift to BlueFS. I.e. your main device space is very
>>> fragmented.
>>>
>>> Unfortunately I don't know any ways to recover from this state but OSD
>>> redeployment or data removal.
>>>
>> We are moving data away from these OSDs. Lucky us that we have HDD OSDs
>> in there as well, moving a lot of data there.
>>
>> How would re-deployment work? Just wipe the OSDs and bring them back
>> into the cluster again? That would be a very painful task.. :-(
> 
> Good chances that you'll face the same issue again one day.
> May be allocate some SSDs to serve as DB devices?

Maybe, but this is a very common use-case where people run WAL+DB+DATA
on a single SSD.

Now we are running into it, but aren't the chances big other people will
run into it as well?

>>
>>> Upcoming PR that brings an ability for offline BlueFS volume
>>> manipulation (https://github.com/ceph/ceph/pull/23103) will probably
>>> help to recover from this issue in future by migrating BlueFS data to a
>>> new larger DB volume. (targeted for Nautilus, not sure about backporting
>>> to Mimic or Luminous).
>>>
>>> For now I can suggest the only preventive mean to avoid the case - have
>>> large enough space at your standalone DB volume. So that master device
>>> isn't used for DB at all or as minimum as possible. Hence no rebalance
>>> is needed and no fragmentation is present.
>>>
>> I see, but these are SSD-only OSDs.
>>
>>> BTW wondering if you have one for your OSDs? How large if so?
>>>
>> The cluster consists out of 96 OSDs with Samsung PM863a 1.92TB OSDs.
>>
>> The fullest OSD currently is 78% full, which is 348GiB free on the
>> 1.75TiB device.
>>
>> Does this information help?
> Yeah, thanks for sharing.

Let me know if you need more!

Wido

>>
>> Thanks!
>>
>> Wido
>>
>>> Everything above is "IMO", some chances that I missed something..
>>>
>>>
>>> Thanks,
>>>
>>> Igor
>>>
>>>
>>> On 10/15/2018 10:12 PM, Wido den Hollander wrote:
 On 10/15/2018 08:23 PM, Gregory Farnum wrote:
> I don't know anything about the BlueStore code, but given the snippets
> you've posted this appears to be a debug thing that doesn't expect
> to be
> invoked (or perhaps only in an unexpected case that it's trying
> hard to
> recover from). Have you checked where the dump() function is invoked
> from? I'd imagine it's something about having to try extra-hard to
> allocate free space or something.
 It seems BlueFS that is having a hard time finding free space.

 I'm trying this PR now: https://github.com/ceph/ceph/pull/24543

 It will stop the spamming, but that's not the root cause. The OSDs in
 this case are at max 80% full and they do have a lot of OMAP (RGW
 indexes) in them, but that's all.

 I'm however not sure why this is happening suddenly in this cluster.

 Wido

> -Greg
>
> On Mon, Oct 15, 2018 at 10:02 AM Wido den Hollander wrote:
>
>
>
>   On 10/11/2018 12:08 AM, Wido den Hollander wrote:
>   > Hi,
>   >
>   > On a Luminous cluster running a mix of 12.2.4, 12.2.5 and
> 12.2.8 I'm
>   > seeing OSDs writing heavily to their logfiles spitting out
> these
>   lines:
>   >
>   >
>   > 2018-10-10 21:52:04.019037 7f90c2f0f700  0 stupidalloc
>   0x0x55828ae047d0
>   > dump  0x15cd2078000~34000
>   > 2018-10-10 21:52:04.019038 7f90c2f0f700  0 stupidalloc
>   0x0x55828ae047d0
>   > dump  0x15cd22cc000~24000
>   > 2018-10-10 21:52:04.019038 7f90c2f0f700  0 stupidalloc
>   0x0x55828ae047d0
>   > dump  0x15cd230~2
>   > 2018-10-10 21:52:04.019039 7f90c2f0f700  0 stupidalloc
>   0x0x55828ae047d0
>   > dump  0x15cd2324000~24000
>   > 2018-10-10 21:52:04.019040 7f90c2f0f700  0 stupidalloc
>   0x0x55828ae047d0
>   > dump  0x15cd26c~24000
>   > 2018-10-10 21:52:04.019041 7f90c2f0f700  0 stupidalloc
>   0x0x55828ae047d0
>   > dump  0x15cd2704000~3
>   >
>   > It goes so fast that the OS-disk in this case can't keep up
> and become
>   > 100% util.
>   >
>   > This causes the OSD to slow down and cause slow requests and
>   starts to flap.
>   >
>
>   I've set 'log_file' to /dev/null for now, but that doesn't
> solve it
>   either. Randomly OSDs just start 

Re: [ceph-users] SSD for MON/MGR/MDS

2018-10-15 Thread ST Wong (ITSC)
Thanks.

Shall I mount /var/lib/ceph/mon on the SSD device, or point the "mon
data" setting at it?  It seems changing the default data location is not recommended.
We're going to try ceph-ansible.  Shall we first set up the cluster, then move
/var/lib/ceph/mon to SSD devices on all MONs?   Thanks.
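
The mount-first approach we have in mind would roughly be (a sketch; device
name and filesystem are assumptions, done on each MON host before deployment):

mkfs.xfs /dev/sdb                                        # the small SSD; device name is an assumption
mkdir -p /var/lib/ceph/mon
mount /dev/sdb /var/lib/ceph/mon
echo '/dev/sdb  /var/lib/ceph/mon  xfs  defaults,noatime  0 0' >> /etc/fstab
# then deploy the MON with ceph-ansible as usual, leaving "mon data" at its default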

/st wong

-Original Message-
From: Wido den Hollander  
Sent: Tuesday, October 16, 2018 1:59 AM
To: solarflow99 ; ST Wong (ITSC) 
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] SSD for MON/MGR/MDS



On 10/15/2018 07:50 PM, solarflow99 wrote:
> I think the answer is, yes.  I'm pretty sure only the OSDs require 
> very long life enterprise grade SSDs
> 

Yes and No. Please use reliable Datacenter Grade SSDs for your MON databases.

Something like 200GB is more than enough in your MON servers.

Wido

> On Mon, Oct 15, 2018 at 4:16 AM ST Wong (ITSC) wrote:
> 
> Hi all,
> 
> 
> We’ve got some servers with some small size SSD but no hard disks
> other than system disks.  While they’re not suitable for OSD, will
> the SSD be useful for running MON/MGR/MDS?
> 
> 
> Thanks a lot.
> 
> Regards,
> 
> /st wong
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs kernel client - page cache being invaildated.

2018-10-15 Thread Dietmar Rieder
On 10/15/18 12:41 PM, Dietmar Rieder wrote:
> On 10/15/18 12:02 PM, jes...@krogh.cc wrote:
 On Sun, Oct 14, 2018 at 8:21 PM  wrote:
 how many cephfs mounts that access the file? Is is possible that some
 program opens that file in RW mode (even they just read the file)?
>>>
>>>
>>> The nature of the program is that it is "prepped" by one-set of commands
>>> and queried by another, thus the RW case is extremely unlikely.
>>> I can change permission bits to rewoke the w-bit for the user, they
>>> dont need it anyway... it is just the same service-users that generates
>>> the data and queries it today.
>>
>> Just to remove the suspicion of other clients fiddling with the files I did a
>> more structured test. I have 4 x 10GB files from fio-benchmarking, total
>> 40GB . Hosted on
>>
>> 1) CephFS /ceph/cluster/home/jk
>> 2) NFS /z/home/jk
>>
>> First I read them .. then sleep 900 seconds .. then read again (just with dd)
>>
>> jk@sild12:/ceph/cluster/home/jk$ time  for i in $(seq 0 3); do echo "dd
>> if=test.$i.0 of=/dev/null bs=1M"; done  | parallel -j 4 ; sleep 900; time 
>> for i in $(seq 0 3); do echo "dd if=test.$i.0 of=/dev/null bs=1M"; done  |
>> parallel -j 4
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 2.56413 s, 4.2 GB/s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 2.82234 s, 3.8 GB/s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 2.9361 s, 3.7 GB/s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 3.10397 s, 3.5 GB/s
>>
>> real0m3.449s
>> user0m0.217s
>> sys 0m11.497s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 315.439 s, 34.0 MB/s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 338.661 s, 31.7 MB/s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 354.725 s, 30.3 MB/s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 356.126 s, 30.2 MB/s
>>
>> real5m56.634s
>> user0m0.260s
>> sys 0m16.515s
>> jk@sild12:/ceph/cluster/home/jk$
>>
>>
>> Then NFS:
>>
>> jk@sild12:~$ time  for i in $(seq 0 3); do echo "dd if=test.$i.0
>> of=/dev/null bs=1M"; done  | parallel -j 4 ; sleep 900; time  for i in
>> $(seq 0 3); do echo "dd if=test.$i.0 of=/dev/null bs=1M"; done  | parallel
>> -j 4
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 1.60267 s, 6.7 GB/s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 2.18602 s, 4.9 GB/s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 2.47564 s, 4.3 GB/s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 2.54674 s, 4.2 GB/s
>>
>> real0m2.855s
>> user0m0.185s
>> sys 0m8.888s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 1.68613 s, 6.4 GB/s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 1.6983 s, 6.3 GB/s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 2.20059 s, 4.9 GB/s
>> 10240+0 records in
>> 10240+0 records out
>> 10737418240 bytes (11 GB, 10 GiB) copied, 2.58077 s, 4.2 GB/s
>>
>> real0m2.980s
>> user0m0.173s
>> sys 0m8.239s
>> jk@sild12:~$
>>
>>
>> Can I ask one of you to run the same "test" (or similar) .. and report back
>> i you can reproduce it?
> 
> here my test on e EC (6+3) pool using cephfs kernel client:
> 
> 7061+1 records in
> 7061+1 records out
> 7404496985 bytes (7.4 GB) copied, 3.62754 s, 2.0 GB/s
> 7450+1 records in
> 7450+1 records out
> 7812246720 bytes (7.8 GB) copied, 4.11908 s, 1.9 GB/s
> 7761+1 records in
> 7761+1 records out
> 8138636188 bytes (8.1 GB) copied, 4.34788 s, 1.9 GB/s
> 8212+1 records in
> 8212+1 records out
> 8611295220 bytes (8.6 GB) copied, 4.53371 s, 1.9 GB/s
> 
> real0m4.936s
> user0m0.275s
> sys 0m16.828s
> 
> 7061+1 records in
> 7061+1 records out
> 7404496985 bytes (7.4 GB) copied, 3.19726 s, 2.3 GB/s
> 7761+1 records in
> 7761+1 records out
> 8138636188 bytes (8.1 GB) copied, 3.31881 s, 2.5 GB/s
> 7450+1 records in
> 7450+1 records out
> 7812246720 bytes (7.8 GB) copied, 3.36354 s, 2.3 GB/s
> 8212+1 records in
> 8212+1 records out
> 8611295220 bytes (8.6 GB) copied, 3.74418 s, 2.3 GB/s
> 
> 
> No big difference here.
> all CentOS 7.5 official kernel 3.10.0-862.11.6.el7.x86_64

...forgot to mention: all is luminous ceph-12.2.7

~Dietmar



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] ceph-objectstore-tool manual

2018-10-15 Thread Vincent Godin
Does a man page exist for ceph-objectstore-tool? If yes, where can I find it?
Thx
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs kernel client - page cache being invaildated.

2018-10-15 Thread jesper
>> On Sun, Oct 14, 2018 at 8:21 PM  wrote:
>> how many cephfs mounts that access the file? Is is possible that some
>> program opens that file in RW mode (even they just read the file)?
>
>
> The nature of the program is that it is "prepped" by one-set of commands
> and queried by another, thus the RW case is extremely unlikely.
> I can change permission bits to rewoke the w-bit for the user, they
> dont need it anyway... it is just the same service-users that generates
> the data and queries it today.

Just to remove the suspicion of other clients fiddling with the files, I did a
more structured test. I have 4 x 10GB files from fio benchmarking, 40GB in
total, hosted on

1) CephFS /ceph/cluster/home/jk
2) NFS /z/home/jk

First I read them .. then sleep 900 seconds .. then read again (just with dd)

jk@sild12:/ceph/cluster/home/jk$ time  for i in $(seq 0 3); do echo "dd
if=test.$i.0 of=/dev/null bs=1M"; done  | parallel -j 4 ; sleep 900; time 
for i in $(seq 0 3); do echo "dd if=test.$i.0 of=/dev/null bs=1M"; done  |
parallel -j 4
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 2.56413 s, 4.2 GB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 2.82234 s, 3.8 GB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 2.9361 s, 3.7 GB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 3.10397 s, 3.5 GB/s

real0m3.449s
user0m0.217s
sys 0m11.497s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 315.439 s, 34.0 MB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 338.661 s, 31.7 MB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 354.725 s, 30.3 MB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 356.126 s, 30.2 MB/s

real5m56.634s
user0m0.260s
sys 0m16.515s
jk@sild12:/ceph/cluster/home/jk$


Then NFS:

jk@sild12:~$ time  for i in $(seq 0 3); do echo "dd if=test.$i.0
of=/dev/null bs=1M"; done  | parallel -j 4 ; sleep 900; time  for i in
$(seq 0 3); do echo "dd if=test.$i.0 of=/dev/null bs=1M"; done  | parallel
-j 4
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 1.60267 s, 6.7 GB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 2.18602 s, 4.9 GB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 2.47564 s, 4.3 GB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 2.54674 s, 4.2 GB/s

real0m2.855s
user0m0.185s
sys 0m8.888s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 1.68613 s, 6.4 GB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 1.6983 s, 6.3 GB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 2.20059 s, 4.9 GB/s
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 2.58077 s, 4.2 GB/s

real0m2.980s
user0m0.173s
sys 0m8.239s
jk@sild12:~$


Can I ask one of you to run the same "test" (or similar) and report back
if you can reproduce it?

Thoughts/comments/suggestions are highly appreciated.  Should I try with
the fuse client?
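
To check whether it really is the page-cache copy of the files that makes the
difference, something like this might help (a sketch; vmtouch is a third-party
tool that may need installing):

$ vmtouch -v /ceph/cluster/home/jk/test.0.0      # how much of the file is resident in page cache
$ grep -E '^(Cached|Dirty):' /proc/meminfo       # overall cached/dirty data on the host
$ echo 3 | sudo tee /proc/sys/vm/drop_caches     # drop caches between runs instead of remounting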

-- 
Jesper

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Apply bucket policy to bucket for LDAP user: what is the correct identifier for principal

2018-10-15 Thread Ha Son Hai
Hi Matt and Adam,
Thanks a lot for your reply.

Attached are logs that are generated when I shared the bucket from an
RGW user (ceph-dashboard) to an LDAP user (sonhaiha) and vice versa.

[sonhaiha@DEFR500 ~]$ s3cmd -c .s3cfg-cephdb info s3://shared-bucket
s3://shared-bucket/ (bucket):
   Location:  us-east-1
   Payer: BucketOwner
   Expiration Rule: none
   Policy:{
  "Version": "2012-10-17",
  "Statement": [{
"Effect": "Allow",
"Principal": {"AWS": ["arn:aws:iam:::user/sonhaiha"]},
"Action": "s3:*",
"Resource": [
  "arn:aws:s3:::shared-bucket",
  "arn:aws:s3:::shared-bucket/*"
]
  }]
}

   CORS:  none
   ACL:   Ceph Dashboard: FULL_CONTROL
# I also tried "arn:aws:iam:::user/sonhaiha$sonhaiha" but it was not
successful

I noticed that when the LDAP user accesses the shared bucket, the RGW
server cannot find any permissions for that user.

2018-10-15 10:43:36.521 7f3c65146700 15 decode_policy Read
AccessControlPolicy<AccessControlPolicy
xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>ceph-dashboard</ID><DisplayName>Ceph
Dashboard</DisplayName></Owner><AccessControlList><Grant><Grantee
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:type="CanonicalUser"><ID>ceph-dashboard</ID><DisplayName>Ceph
Dashboard</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy>
2018-10-15 10:43:36.522 7f3c65146700  2 req 4:0.026275:s3:GET
/shared-bucket/:list_bucket:recalculating target
2018-10-15 10:43:36.522 7f3c65146700  2 req 4:0.026288:s3:GET
/shared-bucket/:list_bucket:reading permissions
2018-10-15 10:43:36.522 7f3c65146700  2 req 4:0.026291:s3:GET
/shared-bucket/:list_bucket:init op
2018-10-15 10:43:36.522 7f3c65146700  2 req 4:0.026292:s3:GET
/shared-bucket/:list_bucket:verifying op mask
2018-10-15 10:43:36.522 7f3c65146700 20 required_mask= 1 user.op_mask=7
2018-10-15 10:43:36.522 7f3c65146700  2 req 4:0.026295:s3:GET
/shared-bucket/:list_bucket:verifying op permissions
2018-10-15 10:43:36.522 7f3c65146700 20 -- Getting permissions begin with
perm_mask=49
2018-10-15 10:43:36.522 7f3c65146700  5 Searching permissions for
identity=rgw::auth::SysReqApplier ->
rgw::auth::RemoteApplier(acct_user=sonhaiha, acct_name=sonhaiha,
perm_mask=15, is_admin=0) mask=49
2018-10-15 10:43:36.522 7f3c65146700  5 Searching permissions for
uid=sonhaiha
2018-10-15 10:43:36.522 7f3c65146700  5 Permissions for user not found
2018-10-15 10:43:36.522 7f3c65146700  5 Searching permissions for
uid=sonhaiha$sonhaiha
2018-10-15 10:43:36.522 7f3c65146700  5 Permissions for user not found
2018-10-15 10:43:36.522 7f3c65146700 20 from ACL got perm=0
2018-10-15 10:43:36.522 7f3c65146700  5 Searching permissions for group=1
mask=49
2018-10-15 10:43:36.522 7f3c65146700  5 Permissions for group not found
2018-10-15 10:43:36.522 7f3c65146700  5 Searching permissions for group=2
mask=49
2018-10-15 10:43:36.522 7f3c65146700  5 Permissions for group not found
2018-10-15 10:43:36.522 7f3c65146700  5 -- Getting permissions done for
identity=rgw::auth::SysReqApplier ->
rgw::auth::RemoteApplier(acct_user=sonhaiha, acct_name=sonhaiha,
perm_mask=15, is_admin=0), owner=ceph-dashboard, perm=0
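
For reference, this is roughly how a candidate policy can be re-applied for
testing (a sketch using s3cmd; the tenant-qualified ARN below is just one
more variant to try, based on the "sonhaiha$sonhaiha" lookup in the log, not
a confirmed fix):

cat > policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": ["arn:aws:iam::sonhaiha:user/sonhaiha"]},
    "Action": "s3:*",
    "Resource": ["arn:aws:s3:::shared-bucket", "arn:aws:s3:::shared-bucket/*"]
  }]
}
EOF
s3cmd -c .s3cfg-cephdb setpolicy policy.json s3://shared-bucket
s3cmd -c .s3cfg-cephdb info s3://shared-bucket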

Thank you
Ha

On Thu, Oct 11, 2018 at 8:16 PM Matt Benjamin  wrote:

> right, the user can be the dn component or something else projected
> from the entry, details in the docs
>
> Matt
>
> On Thu, Oct 11, 2018 at 1:26 PM, Adam C. Emerson 
> wrote:
> > Ha Son Hai  wrote:
> >> Hello everyone,
> >> I try to apply the bucket policy to my bucket for LDAP user but it
> doesn't work.
> >> For user created by radosgw-admin, the policy works fine.
> >>
> >> {
> >>
> >>   "Version": "2012-10-17",
> >>
> >>   "Statement": [{
> >>
> >> "Effect": "Allow",
> >>
> >> "Principal": {"AWS": ["arn:aws:iam:::user/radosgw-user"]},
> >>
> >> "Action": "s3:*",
> >>
> >> "Resource": [
> >>
> >>   "arn:aws:s3:::shared-tenant-test",
> >>
> >>   "arn:aws:s3:::shared-tenant-test/*"
> >>
> >> ]
> >>
> >>   }]
> >>
> >> }
> >
> > LDAP users essentially are RGW users, so it should be this same
> > format. As I understand RGW's LDAP interface (I have not worked with
> > LDAP personally), every LDAP users get a corresponding RGW user whose
> > name is derived from rgw_ldap_dnattr, often 'uid' or 'cn', but this is
> > dependent on site.
> >
> > If you, can check that part of configuration, and if that doesn't work
> > if you'll send some logs I'll take a look. If something fishy is going
> > on we can try opening a bug.
> >
> > Thank you.
> >
> > --
> > Senior Software Engineer   Red Hat Storage, Ann Arbor, MI, US
> > IRC: Aemerson@OFTC, Actinic@Freenode
> > 0x80F7544B90EDBFB9 E707 86BA 0C1B 62CC 152C  7C12 80F7 544B 90ED BFB9
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
>
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
>
> http://www.redhat.com/en/technologies/storage
>
> tel.  734-821-5101
> fax.  734-769-8938
> cel.  734-216-5309
>


-- 
Best regards,
Son-Hai HA



Re: [ceph-users] cephfs kernel client - page cache being invaildated.

2018-10-15 Thread Dietmar Rieder
On 10/15/18 12:02 PM, jes...@krogh.cc wrote:
>>> On Sun, Oct 14, 2018 at 8:21 PM  wrote:
>>> how many cephfs mounts that access the file? Is is possible that some
>>> program opens that file in RW mode (even they just read the file)?
>>
>>
>> The nature of the program is that it is "prepped" by one-set of commands
>> and queried by another, thus the RW case is extremely unlikely.
>> I can change permission bits to rewoke the w-bit for the user, they
>> dont need it anyway... it is just the same service-users that generates
>> the data and queries it today.
> 
> Just to remove the suspicion of other clients fiddling with the files I did a
> more structured test. I have 4 x 10GB files from fio-benchmarking, total
> 40GB . Hosted on
> 
> 1) CephFS /ceph/cluster/home/jk
> 2) NFS /z/home/jk
> 
> First I read them .. then sleep 900 seconds .. then read again (just with dd)
> 
> jk@sild12:/ceph/cluster/home/jk$ time  for i in $(seq 0 3); do echo "dd
> if=test.$i.0 of=/dev/null bs=1M"; done  | parallel -j 4 ; sleep 900; time 
> for i in $(seq 0 3); do echo "dd if=test.$i.0 of=/dev/null bs=1M"; done  |
> parallel -j 4
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 2.56413 s, 4.2 GB/s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 2.82234 s, 3.8 GB/s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 2.9361 s, 3.7 GB/s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 3.10397 s, 3.5 GB/s
> 
> real0m3.449s
> user0m0.217s
> sys 0m11.497s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 315.439 s, 34.0 MB/s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 338.661 s, 31.7 MB/s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 354.725 s, 30.3 MB/s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 356.126 s, 30.2 MB/s
> 
> real5m56.634s
> user0m0.260s
> sys 0m16.515s
> jk@sild12:/ceph/cluster/home/jk$
> 
> 
> Then NFS:
> 
> jk@sild12:~$ time  for i in $(seq 0 3); do echo "dd if=test.$i.0
> of=/dev/null bs=1M"; done  | parallel -j 4 ; sleep 900; time  for i in
> $(seq 0 3); do echo "dd if=test.$i.0 of=/dev/null bs=1M"; done  | parallel
> -j 4
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 1.60267 s, 6.7 GB/s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 2.18602 s, 4.9 GB/s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 2.47564 s, 4.3 GB/s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 2.54674 s, 4.2 GB/s
> 
> real0m2.855s
> user0m0.185s
> sys 0m8.888s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 1.68613 s, 6.4 GB/s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 1.6983 s, 6.3 GB/s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 2.20059 s, 4.9 GB/s
> 10240+0 records in
> 10240+0 records out
> 10737418240 bytes (11 GB, 10 GiB) copied, 2.58077 s, 4.2 GB/s
> 
> real0m2.980s
> user0m0.173s
> sys 0m8.239s
> jk@sild12:~$
> 
> 
> Can I ask one of you to run the same "test" (or similar) .. and report back
> i you can reproduce it?

here is my test on an EC (6+3) pool using the cephfs kernel client:

7061+1 records in
7061+1 records out
7404496985 bytes (7.4 GB) copied, 3.62754 s, 2.0 GB/s
7450+1 records in
7450+1 records out
7812246720 bytes (7.8 GB) copied, 4.11908 s, 1.9 GB/s
7761+1 records in
7761+1 records out
8138636188 bytes (8.1 GB) copied, 4.34788 s, 1.9 GB/s
8212+1 records in
8212+1 records out
8611295220 bytes (8.6 GB) copied, 4.53371 s, 1.9 GB/s

real0m4.936s
user0m0.275s
sys 0m16.828s

7061+1 records in
7061+1 records out
7404496985 bytes (7.4 GB) copied, 3.19726 s, 2.3 GB/s
7761+1 records in
7761+1 records out
8138636188 bytes (8.1 GB) copied, 3.31881 s, 2.5 GB/s
7450+1 records in
7450+1 records out
7812246720 bytes (7.8 GB) copied, 3.36354 s, 2.3 GB/s
8212+1 records in
8212+1 records out
8611295220 bytes (8.6 GB) copied, 3.74418 s, 2.3 GB/s


No big difference here.
all CentOS 7.5 official kernel 3.10.0-862.11.6.el7.x86_64

HTH
  Dietmar



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs kernel client - page cache being invaildated.

2018-10-15 Thread jesper
> On 10/15/18 12:41 PM, Dietmar Rieder wrote:
>> No big difference here.
>> all CentOS 7.5 official kernel 3.10.0-862.11.6.el7.x86_64
>
> ...forgot to mention: all is luminous ceph-12.2.7

Thanks for your time in testing, this is very valuable to me for the
debugging. Two questions:

Did you "sleep 900" in-between the execution?
Are you using the kernel client or the fuse client?

If I run them "right after each other" .. then I get the same behaviour.

-- 
Jesper


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph dashboard ac-* commands not working (Mimic)

2018-10-15 Thread Hayashida, Mami
John,

Thanks for your reply.  I am glad you clarified the docs URL mystery for me
as that has confused me many times.

About the Dashboard: Does that mean that, with Mimic 13.2.2, the only
dashboard user management command that works is to create a user?  In other
words, no way to check the user list, delete users, or control their access
levels at this point?

Mami

On Sun, Oct 14, 2018 at 10:38 AM, John Spray  wrote:

> The docs you're looking at are from the master (development) version of
> ceph, so you're seeing commands that don't exist in mimic.  You can swap
> master for mimic in that URL.
>
> Hopefully we'll soon have some changes to make this more apparent when
> looking at the docs.
>
> John
>
> On Fri, 12 Oct 2018, 17:43 Hayashida, Mami, 
> wrote:
>
>> I set up a new Mimic cluster recently and have just enabled the
>> Dashboard.  I first tried to add a (Dashboard) user with the
>> "ac-user-create" command following this version of documentation (
>> http://docs.ceph.com/docs/master/mgr/dashboard/), but the command did
>> not work.  Following the   /mimic/mgr/dashboard/ version, I used the
>> "set-login-credentials" command, I was able to create a user with a
>> password, which was successful.  But with none of the ac-* command working,
>> how can we manage the dashboard user accounts?  At this point, I cannot
>> figure out what level of permissions have been given to the (test)
>> dashboard user I have just created.  Neither have I figured out how to
>> delete a user or obtain a list of dashboard users created so far.
>>
>> I am using Ceph version 13.2.2 and  all the ac-* commands I have tried
>> returns exactly the same message.
>>
>> mon0:~$ ceph dashboard ac-user-show  test-user
>> no valid command found; 10 closest matches:
>> dashboard get-rgw-api-user-id
>> dashboard get-rest-requests-timeout
>> dashboard set-rgw-api-host 
>> dashboard set-rgw-api-secret-key 
>> dashboard get-rgw-api-access-key
>> dashboard set-rest-requests-timeout 
>> dashboard get-rgw-api-scheme
>> dashboard get-rgw-api-host
>> dashboard set-login-credentials  
>> dashboard set-session-expire 
>> Error EINVAL: invalid command
>>
>>
>> --
>> -
>>
>> *Mami Hayashida*
>>
>> *Research Computing Associate*
>> Research Computing Infrastructure
>> University of Kentucky Information Technology Services
>> 301 Rose Street | 102 James F. Hardymon Building
>> Lexington, KY 40506-0495
>> mami.hayash...@uky.edu
>> (859)323-7521
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>


-- 
*Mami Hayashida*

*Research Computing Associate*
Research Computing Infrastructure
University of Kentucky Information Technology Services
301 Rose Street | 102 James F. Hardymon Building
Lexington, KY 40506-0495
mami.hayash...@uky.edu
(859)323-7521
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous with osd flapping, slow requests when deep scrubbing

2018-10-15 Thread Eugen Block

Hi Andrei,

we have been using the script from [1] to define the number of PGs to  
deep-scrub in parallel. We currently use MAXSCRUBS=4; you could start  
with 1 to minimize the performance impact.


And these are the scrub settings from our ceph.conf:

ceph:~ # grep scrub /etc/ceph/ceph.conf
osd_scrub_begin_hour = 0
osd_scrub_end_hour = 7
osd_scrub_sleep = 0.1
osd_deep_scrub_interval = 2419200

The osd_deep_scrub_interval is set to 4 weeks so that it doesn't  
interfere with our own schedule defined by the cronjob, which deep-scrubs  
a quarter of the PGs four times a week, so that every PG gets  
deep-scrubbed within one week.
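
For illustration, a stripped-down sketch of what such a cronjob boils down
to (not the exact script from [1]; it assumes the ceph CLI with admin
rights, jq installed, and the JSON field names as found in Luminous):

#!/bin/bash
# deep-scrub the PGs that have gone longest without a deep scrub
MAXSCRUBS=1
ceph pg dump -f json 2>/dev/null \
  | jq -r '.pg_stats[] | [.last_deep_scrub_stamp, .pgid] | @tsv' \
  | sort \
  | head -n "$MAXSCRUBS" \
  | awk '{print $NF}' \
  | while read -r pg; do
      ceph pg deep-scrub "$pg"
    done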


Regards,
Eugen

[1]  
https://www.formann.de/2015/05/cronjob-to-enable-timed-deep-scrubbing-in-a-ceph-cluster/



Zitat von Andrei Mikhailovsky :


Hello,

I am currently running Luminous 12.2.8 on Ubuntu with  
4.15.0-36-generic kernel from the official ubuntu repo. The cluster  
has 4 mon + osd servers. Each osd server has the total of 9 spinning  
osds and 1 ssd for the hdd and ssd pools. The hdds are backed by the  
S3710 ssds for journaling with a ration of 1:5. The ssd pool osds  
are not using external journals. Ceph is used as a Primary storage  
for Cloudstack - all vm disk images are stored on the cluster.


I have recently migrated all osds to the bluestore, which was a long  
process with ups and downs, but I am happy to say that the migration  
is done. During the migration I've disabled the scrubbing (both deep  
and standard). After reenabling the scrubbing I have noticed the  
cluster started having a large number of slow requests and poor  
client IO (to the point of vms stall for minutes). Further  
investigation showed that the slow requests happen because of the  
osds flapping. In a single day my logs have over 1000 entries which  
report osd going down. This effects random osds. Disabling  
deep-scrubbing stabilises the cluster and the osds are no longer  
flap and the slow requests disappear. As a short term solution I've  
disabled the deepscurbbing, but was hoping to fix the issues with  
your help.


At the moment, I am running the cluster with default settings apart  
from the following settings:


[global]
osd_disk_thread_ioprio_priority = 7
osd_disk_thread_ioprio_class = idle
osd_recovery_op_priority = 1

[osd]
debug_ms = 0
debug_auth = 0
debug_osd = 0
debug_bluestore = 0
debug_bluefs = 0
debug_bdev = 0
debug_rocksdb = 0


Could you share experiences with deep scrubbing of bluestore osds?  
Are there any options that I should set to make sure the osds are  
not flapping and the client IO is still available?


Thanks

Andrei




___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Luminous with osd flapping, slow requests when deep scrubbing

2018-10-15 Thread Andrei Mikhailovsky
Hello, 

I am currently running Luminous 12.2.8 on Ubuntu with 4.15.0-36-generic kernel 
from the official ubuntu repo. The cluster has 4 mon + osd servers. Each osd 
server has the total of 9 spinning osds and 1 ssd for the hdd and ssd pools. 
The hdds are backed by the S3710 ssds for journaling with a ratio of 1:5. The 
ssd pool osds are not using external journals. Ceph is used as a Primary 
storage for Cloudstack - all vm disk images are stored on the cluster. 

I have recently migrated all osds to the bluestore, which was a long process 
with ups and downs, but I am happy to say that the migration is done. During 
the migration I've disabled the scrubbing (both deep and standard). After 
reenabling the scrubbing I have noticed the cluster started having a large 
number of slow requests and poor client IO (to the point of vms stall for 
minutes). Further investigation showed that the slow requests happen because of 
the osds flapping. In a single day my logs have over 1000 entries which report 
osd going down. This affects random osds. Disabling deep-scrubbing stabilises 
the cluster: the osds no longer flap and the slow requests disappear. As 
a short-term solution I've disabled deep scrubbing, but was hoping to fix 
the issues with your help. 

At the moment, I am running the cluster with default settings apart from the 
following settings: 

[global] 
osd_disk_thread_ioprio_priority = 7 
osd_disk_thread_ioprio_class = idle 
osd_recovery_op_priority = 1 

[osd] 
debug_ms = 0 
debug_auth = 0 
debug_osd = 0 
debug_bluestore = 0 
debug_bluefs = 0 
debug_bdev = 0 
debug_rocksdb = 0 


Could you share experiences with deep scrubbing of bluestore osds? Are there 
any options that I should set to make sure the osds are not flapping and the 
client IO is still available? 

Thanks 

Andrei 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-objectstore-tool manual

2018-10-15 Thread Matthew Vernon
Hi,

On 15/10/18 11:44, Vincent Godin wrote:
> Does a man page exist for ceph-objectstore-tool? If yes, where can I find it?

No, but there is some --help output:

root@sto-1-1:~# ceph-objectstore-tool --help

Allowed options:
  --help  produce help message
  --type arg  Arg is one of [filestore (default), memstore]
  --data-path arg path to object store, mandatory
  --journal-path arg  path to journal, mandatory for filestore type
  --pgid arg  PG id, mandatory for info, log, remove,
export,
  rm-past-intervals, mark-complete, and
mandatory
  for apply-layout-settings if --pool is not
  specified
  --pool arg  Pool name, mandatory for
apply-layout-settings if
  --pgid is not specified
  --op arg                    Arg is one of [info, log, remove, mkfs, fsck,
  fuse, export, import, list, fix-lost,
list-pgs,
  rm-past-intervals, dump-journal, dump-super,
  meta-list, get-osdmap, set-osdmap,
  get-inc-osdmap, set-inc-osdmap, mark-complete,
  apply-layout-settings, update-mon-db]
  --epoch arg epoch# for get-osdmap and get-inc-osdmap, the
  current epoch in use if not specified
  --file arg  path of file to export, import, get-osdmap,
  set-osdmap, get-inc-osdmap or set-inc-osdmap
  --mon-store-path arg        path of monstore to update-mon-db
  --mountpoint arg            fuse mountpoint
  --format arg (=json-pretty) Output format which may be json, json-pretty,
  xml, xml-pretty
  --debug Enable diagnostic output to stderr
  --force Ignore some types of errors and proceed with
  operation - USE WITH CAUTION: CORRUPTION
POSSIBLE
  NOW OR IN THE FUTURE
  --skip-journal-replay   Disable journal replay
  --skip-mount-omap   Disable mounting of omap
  --head  Find head/snapdir when searching for
objects by
  name
  --dry-run   Don't modify the objectstore


Positional syntax:

ceph-objectstore-tool ... <object> (get|set)-bytes [file]
ceph-objectstore-tool ... <object> set-(attr|omap) <key> [file]
ceph-objectstore-tool ... <object> (get|rm)-(attr|omap) <key>
ceph-objectstore-tool ... <object> get-omaphdr
ceph-objectstore-tool ... <object> set-omaphdr [file]
ceph-objectstore-tool ... <object> list-attrs
ceph-objectstore-tool ... <object> list-omap
ceph-objectstore-tool ... <object> remove
ceph-objectstore-tool ... <object> dump
ceph-objectstore-tool ... <object> set-size
ceph-objectstore-tool ... <object> remove-clone-metadata <cloneid>

<object> can be a JSON object description as displayed
by --op list.
<object> can be an object name which will be looked up in all
the OSD's PGs.
<object> can be the empty string ('') which with a provided pgid
specifies the pgmeta object

The optional [file] argument will read stdin or write stdout
if not specified or if '-' specified.

[that's for the Jewel version]
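
To give a flavour of the syntax above, a typical invocation looks roughly
like this (osd id, paths and pgid are made up, and the OSD has to be
stopped first):

systemctl stop ceph-osd@12
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
    --journal-path /var/lib/ceph/osd/ceph-12/journal --op list-pgs
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
    --journal-path /var/lib/ceph/osd/ceph-12/journal \
    --pgid 1.2f --op export --file /tmp/1.2f.export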

HTH,

Matthew


-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph dashboard ac-* commands not working (Mimic)

2018-10-15 Thread Hayashida, Mami
Ah, ok.  Thanks!

On Mon, Oct 15, 2018 at 8:52 AM, John Spray  wrote:

> On Mon, Oct 15, 2018 at 1:47 PM Hayashida, Mami 
> wrote:
> >
> > John,
> >
> > Thanks for your reply.  I am glad you clarified the docs URL mystery for
> me as that has confused me many times.
> >
> > About the Dashboard: Does that mean that, with Mimic 13.2.2, the only
> dashboard user management command that works is to create a user?  In other
> words, no way to check the user list, delete users, or control their access
> levels at this point?
>
> Correct: the code in Mimic just has a single user account.
>
> John
>
> > Mami
> >
> > On Sun, Oct 14, 2018 at 10:38 AM, John Spray  wrote:
> >>
> >> The docs you're looking at are from the master (development) version of
> ceph, so you're seeing commands that don't exist in mimic.  You can swap
> master for mimic in that URL.
> >>
> >> Hopefully we'll soon have some changes to make this more apparent when
> looking at the docs.
> >>
> >> John
> >>
> >> On Fri, 12 Oct 2018, 17:43 Hayashida, Mami, 
> wrote:
> >>>
> >>> I set up a new Mimic cluster recently and have just enabled the
> Dashboard.  I first tried to add a (Dashboard) user with the
> "ac-user-create" command following this version of documentation (
> http://docs.ceph.com/docs/master/mgr/dashboard/), but the command did not
> work.  Following the   /mimic/mgr/dashboard/ version, I used the
> "set-login-credentials" command, I was able to create a user with a
> password, which was successful.  But with none of the ac-* command working,
> how can we manage the dashboard user accounts?  At this point, I cannot
> figure out what level of permissions have been given to the (test)
> dashboard user I have just created.  Neither have I figured out how to
> delete a user or obtain a list of dashboard users created so far.
> >>>
> >>> I am using Ceph version 13.2.2 and  all the ac-* commands I have tried
> returns exactly the same message.
> >>>
> >>> mon0:~$ ceph dashboard ac-user-show  test-user
> >>> no valid command found; 10 closest matches:
> >>> dashboard get-rgw-api-user-id
> >>> dashboard get-rest-requests-timeout
> >>> dashboard set-rgw-api-host 
> >>> dashboard set-rgw-api-secret-key 
> >>> dashboard get-rgw-api-access-key
> >>> dashboard set-rest-requests-timeout 
> >>> dashboard get-rgw-api-scheme
> >>> dashboard get-rgw-api-host
> >>> dashboard set-login-credentials  
> >>> dashboard set-session-expire 
> >>> Error EINVAL: invalid command
> >>>
> >>>
> >>> --
> >>> 
> -
> >>> Mami Hayashida
> >>> Research Computing Associate
> >>>
> >>> Research Computing Infrastructure
> >>> University of Kentucky Information Technology Services
> >>> 301 Rose Street | 102 James F. Hardymon Building
> >>> Lexington, KY 40506-0495
> >>> mami.hayash...@uky.edu
> >>> (859)323-7521
> >>> ___
> >>> ceph-users mailing list
> >>> ceph-users@lists.ceph.com
> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> >
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Luminous with osd flapping, slow requests when deep scrubbing

2018-10-15 Thread Igor Fedotov

Perhaps this is the same issue as indicated here:

https://tracker.ceph.com/issues/36364


Can you check OSD iostat reports for similarities to this ticket, please?
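
Something along these lines on one of the affected OSD hosts should be
enough to compare against the ticket (assuming sysstat is installed;
replace the device names with the ones backing your OSDs):

iostat -dxm 5 /dev/sdb /dev/sdc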

Thanks,
Igor

On 10/15/2018 2:26 PM, Andrei Mikhailovsky wrote:

Hello,

I am currently running Luminous 12.2.8 on Ubuntu with 
4.15.0-36-generic kernel from the official ubuntu repo. The cluster 
has 4 mon + osd servers. Each osd server has the total of 9 spinning 
osds and 1 ssd for the hdd and ssd pools. The hdds are backed by the 
S3710 ssds for journaling with a ration of 1:5. The ssd pool osds are 
not using external journals. Ceph is used as a Primary storage for 
Cloudstack - all vm disk images are stored on the cluster.


I have recently migrated all osds to the bluestore, which was a long 
process with ups and downs, but I am happy to say that the migration 
is done. During the migration I've disabled the scrubbing (both deep 
and standard). After reenabling the scrubbing I have noticed the 
cluster started having a large number of slow requests and poor client 
IO (to the point of vms stall for minutes). Further investigation 
showed that the slow requests happen because of the osds flapping. In 
a single day my logs have over 1000 entries which report osd going 
down. This effects random osds. Disabling deep-scrubbing stabilises 
the cluster and the osds are no longer flap and the slow requests 
disappear. As a short term solution I've disabled the deepscurbbing, 
but was hoping to fix the issues with your help.


At the moment, I am running the cluster with default settings apart 
from the following settings:


[global]
osd_disk_thread_ioprio_priority = 7
osd_disk_thread_ioprio_class = idle
osd_recovery_op_priority = 1

[osd]
debug_ms = 0
debug_auth = 0
debug_osd = 0
debug_bluestore = 0
debug_bluefs = 0
debug_bdev = 0
debug_rocksdb = 0


Could you share experiences with deep scrubbing of bluestore osds? Are 
there any options that I should set to make sure the osds are not 
flapping and the client IO is still available?


Thanks

Andrei


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw lifecycle not removing delete markers

2018-10-15 Thread Sean Purdy
Hi,


Versions 12.2.7 and 12.2.8.  I've set up a bucket with versioning enabled and 
uploaded a lifecycle configuration.  I upload some files and delete them, 
inserting delete markers.  The configured lifecycle DOES remove the deleted 
binaries (non-current versions).  The lifecycle DOES NOT remove the delete 
markers, even with ExpiredObjectDeleteMarker set.

Is this a known issue?  I have an empty bucket full of delete markers.

Does this lifecycle do what I expect?  Remove the non-current version after a 
day, and remove orphaned delete markers:

{
"Rules": [
{
"Status": "Enabled", 
"Prefix": "", 
"NoncurrentVersionExpiration": {
"NoncurrentDays": 1
}, 
"Expiration": {
"ExpiredObjectDeleteMarker": true
}, 
"ID": "Test expiry"
}
]
}
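
For completeness, this is roughly how the configuration is applied and how
the leftover markers can be listed (a sketch using the AWS CLI; endpoint
and bucket name are placeholders):

aws --endpoint-url http://rgw.example.com \
    s3api put-bucket-lifecycle-configuration \
    --bucket mybucket --lifecycle-configuration file://lifecycle.json

# delete markers that should eventually be expired:
aws --endpoint-url http://rgw.example.com \
    s3api list-object-versions --bucket mybucket \
    --query 'DeleteMarkers[].{Key:Key,IsLatest:IsLatest}'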


I can't be the only one who wants to use this feature.

Thanks,

Sean
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD for MON/MGR/MDS

2018-10-15 Thread David Turner
Mgr and MDS do not use physical space on a disk. Mons do use the disk and
benefit from SSDs, but they write a lot of stuff all the time. Depending
why the SSDs aren't suitable for OSDs, they might not be suitable for mons
either.

On Mon, Oct 15, 2018, 7:16 AM ST Wong (ITSC)  wrote:

> Hi all,
>
>
>
> We’ve got some servers with some small size SSD but no hard disks other
> than system disks.  While they’re not suitable for OSD, will the SSD be
> useful for running MON/MGR/MDS?
>
>
>
> Thanks a lot.
>
> Regards,
>
> /st wong
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] SSD for MON/MGR/MDS

2018-10-15 Thread ST Wong (ITSC)
Hi all,

We've got some servers with some small SSDs but no hard disks other than 
system disks.  While they're not suitable for OSDs, will the SSDs be useful for 
running MON/MGR/MDS?

Thanks a lot.
Regards,
/st wong
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs kernel client - page cache being invaildated.

2018-10-15 Thread Dietmar Rieder
On 10/15/18 1:17 PM, jes...@krogh.cc wrote:
>> On 10/15/18 12:41 PM, Dietmar Rieder wrote:
>>> No big difference here.
>>> all CentOS 7.5 official kernel 3.10.0-862.11.6.el7.x86_64
>>
>> ...forgot to mention: all is luminous ceph-12.2.7
> 
> Thanks for your time in testing, this is very valueable to me in the
> debugging. 2 questions:
> 
> Did you "sleep 900" in-between the execution?
> Are you using the kernel client or the fuse client?
> 
> If I run them "right after each other" .. then I get the same behaviour.
> 

Hi, as I stated I'm using the kernel client, and yes I did the sleep 900
between the two runs.

~Dietmar



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph dashboard ac-* commands not working (Mimic)

2018-10-15 Thread John Spray
On Mon, Oct 15, 2018 at 1:47 PM Hayashida, Mami  wrote:
>
> John,
>
> Thanks for your reply.  I am glad you clarified the docs URL mystery for me 
> as that has confused me many times.
>
> About the Dashboard: Does that mean that, with Mimic 13.2.2, the only 
> dashboard user management command that works is to create a user?  In other 
> words, no way to check the user list, delete users, or control their access 
> levels at this point?

Correct: the code in Mimic just has a single user account.
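
A minimal sketch of what is available on 13.2.2, taken from the help
output quoted above (username and password are just examples):

ceph dashboard set-login-credentials admin mysecretpassword
ceph dashboard set-session-expire 3600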

John

> Mami
>
> On Sun, Oct 14, 2018 at 10:38 AM, John Spray  wrote:
>>
>> The docs you're looking at are from the master (development) version of 
>> ceph, so you're seeing commands that don't exist in mimic.  You can swap 
>> master for mimic in that URL.
>>
>> Hopefully we'll soon have some changes to make this more apparent when 
>> looking at the docs.
>>
>> John
>>
>> On Fri, 12 Oct 2018, 17:43 Hayashida, Mami,  wrote:
>>>
>>> I set up a new Mimic cluster recently and have just enabled the Dashboard.  
>>> I first tried to add a (Dashboard) user with the "ac-user-create" command 
>>> following this version of documentation 
>>> (http://docs.ceph.com/docs/master/mgr/dashboard/), but the command did not 
>>> work.  Following the   /mimic/mgr/dashboard/ version, I used the 
>>> "set-login-credentials" command, I was able to create a user with a 
>>> password, which was successful.  But with none of the ac-* command working, 
>>> how can we manage the dashboard user accounts?  At this point, I cannot 
>>> figure out what level of permissions have been given to the (test) 
>>> dashboard user I have just created.  Neither have I figured out how to 
>>> delete a user or obtain a list of dashboard users created so far.
>>>
>>> I am using Ceph version 13.2.2 and  all the ac-* commands I have tried 
>>> returns exactly the same message.
>>>
>>> mon0:~$ ceph dashboard ac-user-show  test-user
>>> no valid command found; 10 closest matches:
>>> dashboard get-rgw-api-user-id
>>> dashboard get-rest-requests-timeout
>>> dashboard set-rgw-api-host 
>>> dashboard set-rgw-api-secret-key 
>>> dashboard get-rgw-api-access-key
>>> dashboard set-rest-requests-timeout 
>>> dashboard get-rgw-api-scheme
>>> dashboard get-rgw-api-host
>>> dashboard set-login-credentials  
>>> dashboard set-session-expire 
>>> Error EINVAL: invalid command
>>>
>>>
>>> --
>>> -
>>> Mami Hayashida
>>> Research Computing Associate
>>>
>>> Research Computing Infrastructure
>>> University of Kentucky Information Technology Services
>>> 301 Rose Street | 102 James F. Hardymon Building
>>> Lexington, KY 40506-0495
>>> mami.hayash...@uky.edu
>>> (859)323-7521
>>> ___
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
> --
> Mami Hayashida
> Research Computing Associate
>
> Research Computing Infrastructure
> University of Kentucky Information Technology Services
> 301 Rose Street | 102 James F. Hardymon Building
> Lexington, KY 40506-0495
> mami.hayash...@uky.edu
> (859)323-7521
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph mds is stuck in creating status

2018-10-15 Thread John Spray
On Mon, Oct 15, 2018 at 3:34 PM Kisik Jeong  wrote:
>
> Hello,
>
> I successfully deployed Ceph cluster with 16 OSDs and created CephFS before.
> But after rebooting due to mds slow request problem, when creating CephFS, 
> Ceph mds goes creating status and never changes.
> Seeing Ceph status, there is no other problem I think. Here is 'ceph -s' 
> result:

That's pretty strange.  Usually if an MDS is stuck in "creating", it's
because an OSD operation is stuck, but in your case all your PGs are
healthy.

I would suggest setting "debug mds=20" and "debug objecter=10" on your
MDS, restarting it and capturing those logs so that we can see where
it got stuck.
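
A minimal sketch of one way to do that (this assumes a systemd deployment,
that the MDS id is hpc1 as in your status output, and the default log
location):

cat >> /etc/ceph/ceph.conf <<'EOF'
[mds]
        debug mds = 20
        debug objecter = 10
EOF
systemctl restart ceph-mds@hpc1
# the log will end up in /var/log/ceph/ceph-mds.hpc1.log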

John

> csl@hpc1:~$ ceph -s
>   cluster:
> id: 1a32c483-cb2e-4ab3-ac60-02966a8fd327
> health: HEALTH_OK
>
>   services:
> mon: 1 daemons, quorum hpc1
> mgr: hpc1(active)
> mds: cephfs-1/1/1 up  {0=hpc1=up:creating}
> osd: 16 osds: 16 up, 16 in
>
>   data:
> pools:   2 pools, 640 pgs
> objects: 7 objects, 124B
> usage:   34.3GiB used, 116TiB / 116TiB avail
> pgs: 640 active+clean
>
> However, CephFS still works in case of 8 OSDs.
>
> If there is any doubt of this phenomenon, please let me know. Thank you.
>
> PS. I attached my ceph.conf contents:
>
> [global]
> fsid = 1a32c483-cb2e-4ab3-ac60-02966a8fd327
> mon_initial_members = hpc1
> mon_host = 192.168.40.10
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
>
> public_network = 192.168.40.0/24
> cluster_network = 192.168.40.0/24
>
> [osd]
> osd journal size = 1024
> osd max object name len = 256
> osd max object namespace len = 64
> osd mount options f2fs = active_logs=2
>
> [osd.0]
> host = hpc9
> public_addr = 192.168.40.18
> cluster_addr = 192.168.40.18
>
> [osd.1]
> host = hpc10
> public_addr = 192.168.40.19
> cluster_addr = 192.168.40.19
>
> [osd.2]
> host = hpc9
> public_addr = 192.168.40.18
> cluster_addr = 192.168.40.18
>
> [osd.3]
> host = hpc10
> public_addr = 192.168.40.19
> cluster_addr = 192.168.40.19
>
> [osd.4]
> host = hpc9
> public_addr = 192.168.40.18
> cluster_addr = 192.168.40.18
>
> [osd.5]
> host = hpc10
> public_addr = 192.168.40.19
> cluster_addr = 192.168.40.19
>
> [osd.6]
> host = hpc9
> public_addr = 192.168.40.18
> cluster_addr = 192.168.40.18
>
> [osd.7]
> host = hpc10
> public_addr = 192.168.40.19
> cluster_addr = 192.168.40.19
>
> [osd.8]
> host = hpc9
> public_addr = 192.168.40.18
> cluster_addr = 192.168.40.18
>
> [osd.9]
> host = hpc10
> public_addr = 192.168.40.19
> cluster_addr = 192.168.40.19
>
> [osd.10]
> host = hpc9
> public_addr = 192.168.10.18
> cluster_addr = 192.168.40.18
>
> [osd.11]
> host = hpc10
> public_addr = 192.168.10.19
> cluster_addr = 192.168.40.19
>
> [osd.12]
> host = hpc9
> public_addr = 192.168.10.18
> cluster_addr = 192.168.40.18
>
> [osd.13]
> host = hpc10
> public_addr = 192.168.10.19
> cluster_addr = 192.168.40.19
>
> [osd.14]
> host = hpc9
> public_addr = 192.168.10.18
> cluster_addr = 192.168.40.18
>
> [osd.15]
> host = hpc10
> public_addr = 192.168.10.19
> cluster_addr = 192.168.40.19
>
> --
> Kisik Jeong
> Ph.D. Student
> Computer Systems Laboratory
> Sungkyunkwan University
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph client libraries for OSX

2018-10-15 Thread Christopher Blum
Hi folks,

Just wanted to announce that with the help of Kefu, I was able to create a
working tap for ceph client libraries and binaries for the OSX platform.
Currently, we only test the tap on High-Sierra and Mojave.

This was mostly built so that people can use go-ceph on their Macs without
a VM, but I'm happy if it helps you in other ways as well!

Everything you need to try it out is available here:
https://github.com/zeichenanonym/homebrew-ceph-client

After installing, you will have access to the following executables:
ceph
ceph-conf
ceph-fuse
rados
rbd

Disclaimer:
1) This is not official - do not expect frequent updates or support from
Ceph.
2) CephFS fuse mounts work on Mojave, but it's known that there are I/O
errors on High Sierra.

Cheers,
Chris
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Ceph mds is stuck in creating status

2018-10-15 Thread Kisik Jeong
Hello,

I successfully deployed a Ceph cluster with 16 OSDs and created CephFS before.
But after rebooting due to an mds slow request problem, when creating CephFS
the Ceph mds goes into 'creating' status and never changes.
Looking at the Ceph status, I don't see any other problem. Here is the 'ceph -s'
result:

csl@hpc1:~$ ceph -s
  cluster:
id: 1a32c483-cb2e-4ab3-ac60-02966a8fd327
health: HEALTH_OK

  services:
mon: 1 daemons, quorum hpc1
mgr: hpc1(active)
mds: cephfs-1/1/1 up  {0=hpc1=up:creating}
osd: 16 osds: 16 up, 16 in

  data:
pools:   2 pools, 640 pgs
objects: 7 objects, 124B
usage:   34.3GiB used, 116TiB / 116TiB avail
pgs: 640 active+clean

However, CephFS still works in the case of 8 OSDs.

If you have any questions about this phenomenon, please let me know. Thank you.

PS. I attached my ceph.conf contents:

[global]
fsid = 1a32c483-cb2e-4ab3-ac60-02966a8fd327
mon_initial_members = hpc1
mon_host = 192.168.40.10
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx

public_network = 192.168.40.0/24
cluster_network = 192.168.40.0/24

[osd]
osd journal size = 1024
osd max object name len = 256
osd max object namespace len = 64
osd mount options f2fs = active_logs=2

[osd.0]
host = hpc9
public_addr = 192.168.40.18
cluster_addr = 192.168.40.18

[osd.1]
host = hpc10
public_addr = 192.168.40.19
cluster_addr = 192.168.40.19

[osd.2]
host = hpc9
public_addr = 192.168.40.18
cluster_addr = 192.168.40.18

[osd.3]
host = hpc10
public_addr = 192.168.40.19
cluster_addr = 192.168.40.19

[osd.4]
host = hpc9
public_addr = 192.168.40.18
cluster_addr = 192.168.40.18

[osd.5]
host = hpc10
public_addr = 192.168.40.19
cluster_addr = 192.168.40.19

[osd.6]
host = hpc9
public_addr = 192.168.40.18
cluster_addr = 192.168.40.18

[osd.7]
host = hpc10
public_addr = 192.168.40.19
cluster_addr = 192.168.40.19

[osd.8]
host = hpc9
public_addr = 192.168.40.18
cluster_addr = 192.168.40.18

[osd.9]
host = hpc10
public_addr = 192.168.40.19
cluster_addr = 192.168.40.19

[osd.10]
host = hpc9
public_addr = 192.168.10.18
cluster_addr = 192.168.40.18

[osd.11]
host = hpc10
public_addr = 192.168.10.19
cluster_addr = 192.168.40.19

[osd.12]
host = hpc9
public_addr = 192.168.10.18
cluster_addr = 192.168.40.18

[osd.13]
host = hpc10
public_addr = 192.168.10.19
cluster_addr = 192.168.40.19

[osd.14]
host = hpc9
public_addr = 192.168.10.18
cluster_addr = 192.168.40.18

[osd.15]
host = hpc10
public_addr = 192.168.10.19
cluster_addr = 192.168.40.19

-- 
Kisik Jeong
Ph.D. Student
Computer Systems Laboratory
Sungkyunkwan University
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] OSD log being spammed with BlueStore stupidallocator dump

2018-10-15 Thread Wido den Hollander



On 10/11/2018 12:08 AM, Wido den Hollander wrote:
> Hi,
> 
> On a Luminous cluster running a mix of 12.2.4, 12.2.5 and 12.2.8 I'm
> seeing OSDs writing heavily to their logfiles spitting out these lines:
> 
> 
> 2018-10-10 21:52:04.019037 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0
> dump  0x15cd2078000~34000
> 2018-10-10 21:52:04.019038 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0
> dump  0x15cd22cc000~24000
> 2018-10-10 21:52:04.019038 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0
> dump  0x15cd230~2
> 2018-10-10 21:52:04.019039 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0
> dump  0x15cd2324000~24000
> 2018-10-10 21:52:04.019040 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0
> dump  0x15cd26c~24000
> 2018-10-10 21:52:04.019041 7f90c2f0f700  0 stupidalloc 0x0x55828ae047d0
> dump  0x15cd2704000~3
> 
> It goes so fast that the OS-disk in this case can't keep up and become
> 100% util.
> 
> This causes the OSD to slow down and cause slow requests and starts to flap.
> 

I've set 'log_file' to /dev/null for now, but that doesn't solve it
either. Randomly OSDs just start spitting out slow requests and have
these issues.

Any suggestions on how to fix this?

Wido

> It seems that this is *only* happening on OSDs which are the fullest
> (~85%) on this cluster and they have about ~400 PGs each (Yes, I know,
> that's high).
> 
> Looking at StupidAllocator.cc I see this piece of code:
> 
> void StupidAllocator::dump()
> {
>   std::lock_guard l(lock);
>   for (unsigned bin = 0; bin < free.size(); ++bin) {
> ldout(cct, 0) << __func__ << " free bin " << bin << ": "
>   << free[bin].num_intervals() << " extents" << dendl;
> for (auto p = free[bin].begin();
>  p != free[bin].end();
>  ++p) {
>   ldout(cct, 0) << __func__ << "  0x" << std::hex << p.get_start()
> << "~"
> << p.get_len() << std::dec << dendl;
> }
>   }
> }
> 
> I'm just wondering why it would spit out these lines and what's causing it.
> 
> Has anybody seen this before?
> 
> Wido
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] ceph-objectstore-tool manual

2018-10-15 Thread Vincent Godin
Yes, but there are a lot of undocumented options!
For example, when we tried to rebuild a mon store, we had to add the
option --no-mon-config (which is not in the help) because
ceph-objectstore-tool tried to join the monitors and never responded.
It would be nice if someone could produce a more complete manual.
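
For reference, the kind of invocation involved looked roughly like this
(osd id and paths are examples; the OSD must be stopped while the tool
runs):

systemctl stop ceph-osd@0
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
    --op update-mon-db --mon-store-path /tmp/mon-store \
    --no-mon-config
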
Le lun. 15 oct. 2018 à 14:26, Matthew Vernon  a écrit :
>
> Hi,
>
> On 15/10/18 11:44, Vincent Godin wrote:
> > Does a man exist on ceph-objectstore-tool ? if yes, where can i find it ?
>
> No, but there is some --help output:
>
> root@sto-1-1:~# ceph-objectstore-tool --help
>
> Allowed options:
>   --help  produce help message
>   --type arg  Arg is one of [filestore (default), memstore]
>   --data-path arg path to object store, mandatory
>   --journal-path arg  path to journal, mandatory for filestore type
>   --pgid arg  PG id, mandatory for info, log, remove,
> export,
>   rm-past-intervals, mark-complete, and
> mandatory
>   for apply-layout-settings if --pool is not
>   specified
>   --pool arg  Pool name, mandatory for
> apply-layout-settings if
>   --pgid is not specified
>   --op argArg is one of [info, log, remove, mkfs, fsck,
>   fuse, export, import, list, fix-lost,
> list-pgs,
>   rm-past-intervals, dump-journal, dump-super,
>   meta-list, get-osdmap, set-osdmap,
>   get-inc-osdmap, set-inc-osdmap, mark-complete,
>   apply-layout-settings, update-mon-db]
>   --epoch arg epoch# for get-osdmap and get-inc-osdmap, the
>   current epoch in use if not specified
>   --file arg  path of file to export, import, get-osdmap,
>   set-osdmap, get-inc-osdmap or set-inc-osdmap
>   --mon-store-path argpath of monstore to update-mon-db
>   --mountpoint argfuse mountpoint
>   --format arg (=json-pretty) Output format which may be json, json-pretty,
>   xml, xml-pretty
>   --debug Enable diagnostic output to stderr
>   --force Ignore some types of errors and proceed with
>   operation - USE WITH CAUTION: CORRUPTION
> POSSIBLE
>   NOW OR IN THE FUTURE
>   --skip-journal-replay   Disable journal replay
>   --skip-mount-omap   Disable mounting of omap
>   --head  Find head/snapdir when searching for
> objects by
>   name
>   --dry-run   Don't modify the objectstore
>
>
> Positional syntax:
>
> ceph-objectstore-tool ... <object> (get|set)-bytes [file]
> ceph-objectstore-tool ... <object> set-(attr|omap) <key> [file]
> ceph-objectstore-tool ... <object> (get|rm)-(attr|omap) <key>
> ceph-objectstore-tool ... <object> get-omaphdr
> ceph-objectstore-tool ... <object> set-omaphdr [file]
> ceph-objectstore-tool ... <object> list-attrs
> ceph-objectstore-tool ... <object> list-omap
> ceph-objectstore-tool ... <object> remove
> ceph-objectstore-tool ... <object> dump
> ceph-objectstore-tool ... <object> set-size
> ceph-objectstore-tool ... <object> remove-clone-metadata <cloneid>
>
> <object> can be a JSON object description as displayed
> by --op list.
> <object> can be an object name which will be looked up in all
> the OSD's PGs.
> <object> can be the empty string ('') which with a provided pgid
> specifies the pgmeta object
>
> The optional [file] argument will read stdin or write stdout
> if not specified or if '-' specified.
>
> [that's for the Jewel version]
>
> HTH,
>
> Matthew
>
>
> --
>  The Wellcome Sanger Institute is operated by Genome Research
>  Limited, a charity registered in England with number 1021457 and a
>  company registered in England with number 2742969, whose registered
>  office is 215 Euston Road, London, NW1 2BE.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph mds is stuck in creating status

2018-10-15 Thread John Spray
On Mon, Oct 15, 2018 at 4:24 PM Kisik Jeong  wrote:
>
> Thank you for your reply, John.
>
> I  restarted my Ceph cluster and captured the mds logs.
>
> I found that mds shows slow request because some OSDs are laggy.
>
> I followed the ceph mds troubleshooting with 'mds slow request', but there is 
> no operation in flight:
>
> root@hpc1:~/iodc# ceph daemon mds.hpc1 dump_ops_in_flight
> {
> "ops": [],
> "num_ops": 0
> }
>
> Is there any other reason that mds shows slow request? Thank you.

Those stuck requests seem to be stuck because they're targeting pools
that don't exist.  Has something strange happened in the history of
this cluster that might have left a filesystem referencing pools that
no longer exist?  Ceph is not supposed to permit removal of pools in
use by CephFS, but perhaps something went wrong.

Check out the "ceph osd dump --format=json-pretty" and "ceph fs dump
--format=json-pretty" outputs and how the pool ID's relate.  According
to those logs, data pool with ID 1 and metadata pool with ID 2 do not
exist.
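
A quick way to compare the two (the grep patterns are just an example of
what to look for; the pool IDs referenced by the filesystem must all appear
in the osd dump):

ceph fs dump --format=json-pretty | grep -E '"(metadata_pool|data_pools)"'
ceph osd dump --format=json-pretty | grep -E '"(pool|pool_name)"'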

John

> -Kisik
>
> 2018년 10월 15일 (월) 오후 11:43, John Spray 님이 작성:
>>
>> On Mon, Oct 15, 2018 at 3:34 PM Kisik Jeong  wrote:
>> >
>> > Hello,
>> >
>> > I successfully deployed Ceph cluster with 16 OSDs and created CephFS 
>> > before.
>> > But after rebooting due to mds slow request problem, when creating CephFS, 
>> > Ceph mds goes creating status and never changes.
>> > Seeing Ceph status, there is no other problem I think. Here is 'ceph -s' 
>> > result:
>>
>> That's pretty strange.  Usually if an MDS is stuck in "creating", it's
>> because an OSD operation is stuck, but in your case all your PGs are
>> healthy.
>>
>> I would suggest setting "debug mds=20" and "debug objecter=10" on your
>> MDS, restarting it and capturing those logs so that we can see where
>> it got stuck.
>>
>> John
>>
>> > csl@hpc1:~$ ceph -s
>> >   cluster:
>> > id: 1a32c483-cb2e-4ab3-ac60-02966a8fd327
>> > health: HEALTH_OK
>> >
>> >   services:
>> > mon: 1 daemons, quorum hpc1
>> > mgr: hpc1(active)
>> > mds: cephfs-1/1/1 up  {0=hpc1=up:creating}
>> > osd: 16 osds: 16 up, 16 in
>> >
>> >   data:
>> > pools:   2 pools, 640 pgs
>> > objects: 7 objects, 124B
>> > usage:   34.3GiB used, 116TiB / 116TiB avail
>> > pgs: 640 active+clean
>> >
>> > However, CephFS still works in case of 8 OSDs.
>> >
>> > If there is any doubt of this phenomenon, please let me know. Thank you.
>> >
>> > PS. I attached my ceph.conf contents:
>> >
>> > [global]
>> > fsid = 1a32c483-cb2e-4ab3-ac60-02966a8fd327
>> > mon_initial_members = hpc1
>> > mon_host = 192.168.40.10
>> > auth_cluster_required = cephx
>> > auth_service_required = cephx
>> > auth_client_required = cephx
>> >
>> > public_network = 192.168.40.0/24
>> > cluster_network = 192.168.40.0/24
>> >
>> > [osd]
>> > osd journal size = 1024
>> > osd max object name len = 256
>> > osd max object namespace len = 64
>> > osd mount options f2fs = active_logs=2
>> >
>> > [osd.0]
>> > host = hpc9
>> > public_addr = 192.168.40.18
>> > cluster_addr = 192.168.40.18
>> >
>> > [osd.1]
>> > host = hpc10
>> > public_addr = 192.168.40.19
>> > cluster_addr = 192.168.40.19
>> >
>> > [osd.2]
>> > host = hpc9
>> > public_addr = 192.168.40.18
>> > cluster_addr = 192.168.40.18
>> >
>> > [osd.3]
>> > host = hpc10
>> > public_addr = 192.168.40.19
>> > cluster_addr = 192.168.40.19
>> >
>> > [osd.4]
>> > host = hpc9
>> > public_addr = 192.168.40.18
>> > cluster_addr = 192.168.40.18
>> >
>> > [osd.5]
>> > host = hpc10
>> > public_addr = 192.168.40.19
>> > cluster_addr = 192.168.40.19
>> >
>> > [osd.6]
>> > host = hpc9
>> > public_addr = 192.168.40.18
>> > cluster_addr = 192.168.40.18
>> >
>> > [osd.7]
>> > host = hpc10
>> > public_addr = 192.168.40.19
>> > cluster_addr = 192.168.40.19
>> >
>> > [osd.8]
>> > host = hpc9
>> > public_addr = 192.168.40.18
>> > cluster_addr = 192.168.40.18
>> >
>> > [osd.9]
>> > host = hpc10
>> > public_addr = 192.168.40.19
>> > cluster_addr = 192.168.40.19
>> >
>> > [osd.10]
>> > host = hpc9
>> > public_addr = 192.168.10.18
>> > cluster_addr = 192.168.40.18
>> >
>> > [osd.11]
>> > host = hpc10
>> > public_addr = 192.168.10.19
>> > cluster_addr = 192.168.40.19
>> >
>> > [osd.12]
>> > host = hpc9
>> > public_addr = 192.168.10.18
>> > cluster_addr = 192.168.40.18
>> >
>> > [osd.13]
>> > host = hpc10
>> > public_addr = 192.168.10.19
>> > cluster_addr = 192.168.40.19
>> >
>> > [osd.14]
>> > host = hpc9
>> > public_addr = 192.168.10.18
>> > cluster_addr = 192.168.40.18
>> >
>> > [osd.15]
>> > host = hpc10
>> > public_addr = 192.168.10.19
>> > cluster_addr = 192.168.40.19
>> >
>> > --
>> > Kisik Jeong
>> > Ph.D. Student
>> > Computer Systems Laboratory
>> > Sungkyunkwan University
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Kisik Jeong
> Ph.D. Student
> Computer Systems Laboratory
> 

Re: [ceph-users] Ceph mds is stuck in creating status

2018-10-15 Thread solarflow99
I had the same thing happen too when I built a ceph cluster on a single VM
for testing. I wasn't concerned, though, because I knew the slow speed was
likely the cause.


On Mon, Oct 15, 2018 at 7:34 AM Kisik Jeong 
wrote:

> Hello,
>
> I successfully deployed Ceph cluster with 16 OSDs and created CephFS
> before.
> But after rebooting due to mds slow request problem, when creating CephFS,
> Ceph mds goes creating status and never changes.
> Seeing Ceph status, there is no other problem I think. Here is 'ceph -s'
> result:
>
> csl@hpc1:~$ ceph -s
>   cluster:
> id: 1a32c483-cb2e-4ab3-ac60-02966a8fd327
> health: HEALTH_OK
>
>   services:
> mon: 1 daemons, quorum hpc1
> mgr: hpc1(active)
> mds: cephfs-1/1/1 up  {0=hpc1=up:creating}
> osd: 16 osds: 16 up, 16 in
>
>   data:
> pools:   2 pools, 640 pgs
> objects: 7 objects, 124B
> usage:   34.3GiB used, 116TiB / 116TiB avail
> pgs: 640 active+clean
>
> However, CephFS still works in case of 8 OSDs.
>
> If there is any doubt of this phenomenon, please let me know. Thank you.
>
> PS. I attached my ceph.conf contents:
>
> [global]
> fsid = 1a32c483-cb2e-4ab3-ac60-02966a8fd327
> mon_initial_members = hpc1
> mon_host = 192.168.40.10
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
>
> public_network = 192.168.40.0/24
> cluster_network = 192.168.40.0/24
>
> [osd]
> osd journal size = 1024
> osd max object name len = 256
> osd max object namespace len = 64
> osd mount options f2fs = active_logs=2
>
> [osd.0]
> host = hpc9
> public_addr = 192.168.40.18
> cluster_addr = 192.168.40.18
>
> [osd.1]
> host = hpc10
> public_addr = 192.168.40.19
> cluster_addr = 192.168.40.19
>
> [osd.2]
> host = hpc9
> public_addr = 192.168.40.18
> cluster_addr = 192.168.40.18
>
> [osd.3]
> host = hpc10
> public_addr = 192.168.40.19
> cluster_addr = 192.168.40.19
>
> [osd.4]
> host = hpc9
> public_addr = 192.168.40.18
> cluster_addr = 192.168.40.18
>
> [osd.5]
> host = hpc10
> public_addr = 192.168.40.19
> cluster_addr = 192.168.40.19
>
> [osd.6]
> host = hpc9
> public_addr = 192.168.40.18
> cluster_addr = 192.168.40.18
>
> [osd.7]
> host = hpc10
> public_addr = 192.168.40.19
> cluster_addr = 192.168.40.19
>
> [osd.8]
> host = hpc9
> public_addr = 192.168.40.18
> cluster_addr = 192.168.40.18
>
> [osd.9]
> host = hpc10
> public_addr = 192.168.40.19
> cluster_addr = 192.168.40.19
>
> [osd.10]
> host = hpc9
> public_addr = 192.168.10.18
> cluster_addr = 192.168.40.18
>
> [osd.11]
> host = hpc10
> public_addr = 192.168.10.19
> cluster_addr = 192.168.40.19
>
> [osd.12]
> host = hpc9
> public_addr = 192.168.10.18
> cluster_addr = 192.168.40.18
>
> [osd.13]
> host = hpc10
> public_addr = 192.168.10.19
> cluster_addr = 192.168.40.19
>
> [osd.14]
> host = hpc9
> public_addr = 192.168.10.18
> cluster_addr = 192.168.40.18
>
> [osd.15]
> host = hpc10
> public_addr = 192.168.10.19
> cluster_addr = 192.168.40.19
>
> --
> Kisik Jeong
> Ph.D. Student
> Computer Systems Laboratory
> Sungkyunkwan University
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD for MON/MGR/MDS

2018-10-15 Thread solarflow99
I think the answer is yes.  I'm pretty sure only the OSDs require very
long-life, enterprise-grade SSDs.

On Mon, Oct 15, 2018 at 4:16 AM ST Wong (ITSC)  wrote:

> Hi all,
>
>
>
> We’ve got some servers with some small size SSD but no hard disks other
> than system disks.  While they’re not suitable for OSD, will the SSD be
> useful for running MON/MGR/MDS?
>
>
>
> Thanks a lot.
>
> Regards,
>
> /st wong
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] SSD for MON/MGR/MDS

2018-10-15 Thread Wido den Hollander


On 10/15/2018 07:50 PM, solarflow99 wrote:
> I think the answer is, yes.  I'm pretty sure only the OSDs require very
> long life enterprise grade SSDs
> 

Yes and No. Please use reliable Datacenter Grade SSDs for your MON
databases.

Something like 200GB is more than enough in your MON servers.
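
If you want to see what you are actually dealing with, the size of the mon
store on an existing cluster can be checked like this (the path assumes a
default deployment where the mon id is the short hostname):

du -sh /var/lib/ceph/mon/ceph-$(hostname -s)/store.db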

Wido

> On Mon, Oct 15, 2018 at 4:16 AM ST Wong (ITSC)  > wrote:
> 
> Hi all,
> 
> 
> We’ve got some servers with some small size SSD but no hard disks
> other than system disks.  While they’re not suitable for OSD, will
> the SSD be useful for running MON/MGR/MDS?
> 
> 
> Thanks a lot.
> 
> Regards,
> 
> /st wong
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com 
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
> 
> 
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com