Re: [ceph-users] Add one more public networks for ceph

2019-10-28 Thread luckydog xf
OK, thanks.


On Fri, Oct 25, 2019 at 6:07 PM Wido den Hollander  wrote:

>
>
> On 10/25/19 5:27 AM, luckydog xf wrote:
> > Hi, list,
> >
> > Currently my cluster has 3 MON nodes and 9 OSDs, and everything is fine.
> > Now I plan to add one more public network: the initial public network
> > is 103.x/24, and the target network is 109.x/24. 103 cannot reach
> > 109, as I have not configured any routes between them.
> >
> > I added 109.x addresses to the 3 MON nodes so that they can reach one
> > another, and I added
> > 
>
> That will not work. Make sure there is routing between the IP-space.
> That's the easiest and probably only way to make this work.
>
> Wido
>
> > public_network = 172.16.103.0/24, 172.16.109.0/24
> > 
> > After I manually changed the monmap file and ran `ceph -s`:
> > -
> >   services:
> > mon: 6 daemons, quorum
> > cephnode001,cephnode002,cephnode003,cephnode001-109,cephnode002-109,cephnode003-109
> > 
> > I checked that port 6789 is listening properly on both the 103.x and
> > 109.x addresses, but when I mount cephfs through a 109.x IP, it fails.
> >
> > Using tcpdump I can see that traffic does reach the 109.x
> > interfaces of the 3 MON hosts.
> >
> > Is anything wrong? Is it doable to have 2 public networks (subnets),
> > each serving connections sourced from its own subnet, i.e. the 109
> > network for connections from 109.x and the 103 network for 103.x?
> >
> > Thanks.
> >
> >
> >
> >
> >
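As a concrete sketch of Wido's advice: "routing between the IP-space" means
making 172.16.103.0/24 and 172.16.109.0/24 mutually reachable, for example with
static routes. The gateway and monitor addresses below are placeholders, not
taken from this thread:

  # on hosts that sit only on 172.16.103.0/24:
  ip route add 172.16.109.0/24 via 172.16.103.254
  # on hosts that sit only on 172.16.109.0/24:
  ip route add 172.16.103.0/24 via 172.16.109.254
  # then verify a 109.x client can reach a monitor on its 103.x address:
  ping -c1 172.16.103.11 && nc -zv 172.16.103.11 6789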


Re: [ceph-users] Problematic inode preventing ceph-mds from starting

2019-10-28 Thread Pickett, Neale T
Hi!


Yes, resetting journals is exactly what we did, quite a while ago, when the mds 
ran out of memory because a journal entry had an absurdly large number in it (I 
think it may have been an inode number). We probably also reset the inode table 
later, which I recently learned resets a data structure on disk, and probably 
started us overwriting inodes or dentries or both.


So I take it (we are learning about filesystems very quickly over here) that 
ceph is reusing inode numbers. Re-scanning dentries will somehow figure out 
which dentry is most recent, and remove the older (now wrong) one. And somehow 
it can handle hard links, possibly (we don't have many, or any, of these).


Thanks very much for your help. This has been fascinating.


Neale





From: Patrick Donnelly 
Sent: Monday, October 28, 2019 12:52
To: Pickett, Neale T
Cc: ceph-users
Subject: Re: [ceph-users] Problematic inode preventing ceph-mds from starting

On Fri, Oct 25, 2019 at 12:11 PM Pickett, Neale T  wrote:
> In the last week we have made a few changes to the down filesystem in an 
> attempt to fix what we thought was an inode problem:
>
>
> cephfs-data-scan scan_extents   # about 1 day with 64 processes
>
> cephfs-data-scan scan_inodes   # about 1 day with 64 processes
>
> cephfs-data-scan scan_links   # about 1 day

Did you reset the journals or perform any other disaster recovery
commands? This process likely introduced the duplicate inodes.

> After these three, we tried to start an MDS and it stayed up. We then ran:
>
> ceph tell mds.a scrub start / recursive repair
>
>
> The repair ran about 3 days, spewing logs to `ceph -w` about duplicated 
> inodes, until it stopped. All looked well until we began bringing production 
> services back online, at which point many error messages appeared, the mds 
> went back into damaged, and the fs back to degraded. At this point I removed 
> the objects you suggested, which brought everything back briefly.
>
> The latest crash is:
>
> -1> 2019-10-25 18:47:50.731 7fc1f3b56700 -1 
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/mds/MDCache.cc:
>  In function 'void MDCache::add_inode(CInode*)' thread 7fc1f3b56700 time 
> 2019-1...
>
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/mds/MDCache.cc:
>  258: FAILED ceph_assert(!p)

This error indicates a duplicate inode loaded into cache. Fixing this
probably requires significant intervention and (meta)data loss for
recent changes:

- Stop/unmount all clients. (Probably already the case if the rank is damaged!)

- Reset the MDS journal [1] and optionally recover any dentries first.
(This will hopefully resolve the ESubtreeMap errors you pasted.) Note
that some metadata may be lost through this command.

- `cephfs-data-scan scan_links` again. This should repair any
duplicate inodes (by dropping the older dentries).

- Then you can try marking the rank as repaired.

Good luck!

[1] 
https://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/#journal-truncation
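
Taken together, those steps look roughly like the following. This is a sketch
that assumes a single filesystem named "cephfs" with rank 0; the backup path is
a placeholder, and the exact tool syntax should be checked against the
disaster-recovery docs [1] for your release:

  # 1. back up the journal before resetting it
  cephfs-journal-tool --rank=cephfs:0 journal export /root/mds0-journal.bin
  # 2. optionally salvage dentries from the journal, then reset it
  #    (recent metadata changes may be lost)
  cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
  cephfs-journal-tool --rank=cephfs:0 journal reset
  # 3. re-run the link scan to drop the older duplicate dentries
  cephfs-data-scan scan_links
  # 4. mark the rank repaired and start an MDS again
  ceph mds repaired 0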


--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D



Re: [ceph-users] Inconsistents + FAILED assert(recovery_info.oi.legacy_snaps.size())

2019-10-28 Thread Brad Hubbard
Yes, try and get the pgs healthy, then you can just re-provision the down OSDs.

Run a scrub on each of these pgs and then use the commands on the
following page to find out more information for each case.

https://docs.ceph.com/docs/luminous/rados/troubleshooting/troubleshooting-pg/

Focus on the commands 'list-missing', 'list-inconsistent-obj', and
'list-inconsistent-snapset'.
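
Concretely, for each pg that `ceph health detail` reports as inconsistent
(pg 2.1d9 below is only an example, borrowed from the log further down):

  ceph health detail                                      # list the inconsistent pgs
  ceph pg deep-scrub 2.1d9                                # refresh the inconsistency info
  rados list-inconsistent-obj 2.1d9 --format=json-pretty
  rados list-inconsistent-snapset 2.1d9 --format=json-pretty
  ceph pg 2.1d9 list_missing                              # any unfound objects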

Let us know if you get stuck.

P.S. There are several threads about these sorts of issues in this
mailing list that should turn up when doing a web search.

On Tue, Oct 29, 2019 at 5:06 AM Jérémy Gardais
 wrote:
>
> Hello,
>
> For several weeks, I have had some OSDs flapping before ending up out of
> the cluster (marked out by Ceph)…
> I was hoping for some Ceph magic and just gave it some time to auto-heal
> (and to be able to do all the side work…), but it was a bad idea (what a
> surprise :D). I also got some inconsistent PGs, but I was waiting for a
> quiet, healthy cluster before trying to fix them.
>
> Now that I have more time, I also have 6 OSDs down+out on my 5-node
> cluster and 1~2 OSDs still flapping from time to time, and I am asking
> myself whether these PGs might be the (one?) source of my problem.
>
> The last OSD error on osd.28 gave these logs:
> -2> 2019-10-28 12:57:47.346460 7fefbdc4d700  5 -- 129.20.177.2:6811/47803 
> >> 129.20.177.3:6808/4141402 conn(0x55de8211a000 :-1 
> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=2058 cs=1 l=0). rx osd.25 
> seq 169 0x55dea57b3600 MOSDPGPush(2.1d9 191810/191810 
> [PushOp(2:9b97b818:::rbd_data.0c16b76b8b4567.0001426e:5926, version: 
> 127481'7241006, data_included: [], data_size: 0, omap_header_size: 0, 
> omap_entries_size: 0, attrset_size: 1, recovery_info: 
> ObjectRecoveryInfo(2:9b97b818:::rbd_data.0c16b76b8b4567.0001426e:5926@127481'7241006,
>  size: 4194304, copy_subset: [], clone_subset: {}, snapset: 0=[]:[]), 
> after_progress: ObjectRecoveryProgress(!first, data_recovered_to:0, 
> data_complete:true, omap_recovered_to:, omap_complete:true, error:false), 
> before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, 
> data_complete:false, omap_recovered_to:, omap_complete:false, error:false))]) 
> v3
> -1> 2019-10-28 12:57:47.346517 7fefbdc4d700  1 -- 129.20.177.2:6811/47803 
> <== osd.25 129.20.177.3:6808/4141402 169  MOSDPGPush(2.1d9 191810/191810 
> [PushOp(2:9b97b818:::rbd_data.0c16b76b8b4567.0001426e:5926, version: 
> 127481'7241006, data_included: [], data_size: 0, omap_header_size: 0, 
> omap_entries_size: 0, attrset_size: 1, recovery_info: 
> ObjectRecoveryInfo(2:9b97b818:::rbd_data.c16b76b8b4567.0001426e:5926@127481'7241006,
>  size: 4194304, copy_subset: [], clone_subset: {}, snapset: 0=[]:[]), 
> after_progress: ObjectRecoveryProgress(!first, data_recovered_to:0, 
> data_complete:true, omap_recovered_to:, omap_complete:true, error:false), 
> before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, 
> data_complete:false, omap_recovered_to:, omap_complete:false, error:false))]) 
> v3  909+0+0 (1239474936 0 0) 0x55dea57b3600 con 0x55de8211a000
>  0> 2019-10-28 12:57:47.353680 7fef99441700 -1 
> /build/ceph-12.2.12/src/osd/PrimaryLogPG.cc: In function 'virtual void 
> PrimaryLogPG::on_local_recover(const hobject_t&, const ObjectRecoveryInfo&, 
> ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7fef99441700 time 
> 2019-10-28 12:57:47.347132
> /build/ceph-12.2.12/src/osd/PrimaryLogPG.cc: 354: FAILED 
> assert(recovery_info.oi.legacy_snaps.size())
>
>  ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous 
> (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0x102) [0x55de72039f32]
>  2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo 
> const&, std::shared_ptr, bool, 
> ObjectStore::Transaction*)+0x135b) [0x55de71be330b]
>  3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, PushReplyOp*, 
> ObjectStore::Transaction*)+0x31d) [0x55de71d4fadd]
>  4: (ReplicatedBackend::_do_push(boost::intrusive_ptr)+0x18f) 
> [0x55de71d4fd7f]
>  5: 
> (ReplicatedBackend::_handle_message(boost::intrusive_ptr)+0x2d1) 
> [0x55de71d5ff11]
>  6: (PGBackend::handle_message(boost::intrusive_ptr)+0x50) 
> [0x55de71c7d030]
>  7: (PrimaryLogPG::do_request(boost::intrusive_ptr&, 
> ThreadPool::TPHandle&)+0x5f1) [0x55de71be87b1]
>  8: (OSD::dequeue_op(boost::intrusive_ptr, 
> boost::intrusive_ptr, ThreadPool::TPHandle&)+0x3f7) 
> [0x55de71a63e97]
>  9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr 
> const&)+0x57) [0x55de71cf5077]
>  10: (OSD::ShardedOpWQ::_process(unsigned int, 
> ceph::heartbeat_handle_d*)+0x108c) [0x55de71a94e1c]
>  11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) 
> [0x55de7203fbbd]
>  12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55de72041b80]
>  13: (()+0x8064) [0x7fefc12b5064]
>  14: (clone()+0x6d) [0x7fefc03a962d]
>  NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
> 

Re: [ceph-users] RGW/swift segments

2019-10-28 Thread Peter Eisch
I should have noted that this is with Luminous 12.2.12, and the behaviour is
consistent across swiftclient versions from 3.0.0 to 3.8.1, which may not be
relevant.  With a proper nod I can open a ticket for this – I just want to make
sure it’s not a config issue.

[client.rgw.cephrgw-s01]
  host = cephrgw-s01
  keyring = /etc/ceph/ceph.client.rgw.cephrgw-s01
  rgw_zone = 
  rgw zonegroup = us
  rgw realm = 
  rgw dns name = rgw-s00.
  rgw dynamic resharding = false
  rgw swift account in url = true
  rgw swift url = https://rgw-s00./swift/v1
  rgw keystone make new tenants = true
  rgw keystone implicit tenants = true
  rgw enable usage log = true
  rgw keystone accepted roles = _member_,admin
  rgw keystone admin domain = Default
  rgw keystone admin password = 
  rgw keystone admin project = admin
  rgw keystone admin user = admin
  rgw keystone api version = 3
  rgw keystone url = https://keystone-s00.
  rgw relaxed s3 bucket names = true
  rgw s3 auth use keystone = true
  rgw thread pool size = 4096
  rgw keystone revocation interval = 300
  rgw keystone token cache size = 1
  rgw swift versioning enabled = true
  rgw log nonexistent bucket = true

All tips accepted…

peter



Peter Eisch
Senior Site Reliability Engineer
From: ceph-users  on behalf of Peter Eisch 

Date: Monday, October 28, 2019 at 9:28 AM
To: "ceph-users@lists.ceph.com" 
Subject: [ceph-users] RGW/swift segments


Hi,

When uploading to RGW via swift I can set an expiration time.  The files being
uploaded are large.  We segment them using the swift upload ‘-S’ arg.  This
results in a 0-byte manifest file in the bucket and all the data segments
landing in a *_segments bucket.

When the expiration passes, the 0-byte file is deleted but all the segments
remain.  Am I misconfigured, or is this a bug where it won’t expire the actual
data?  Shouldn’t RGW set the expiration on the uploaded segments too if they’re
managed separately?

Thanks,

peter


[ceph-users] Inconsistents + FAILED assert(recovery_info.oi.legacy_snaps.size())

2019-10-28 Thread Jérémy Gardais
Hello,

For several weeks, I have had some OSDs flapping before ending up out of the
cluster (marked out by Ceph)…
I was hoping for some Ceph magic and just gave it some time to auto-heal
(and to be able to do all the side work…), but it was a bad idea (what a
surprise :D). I also got some inconsistent PGs, but I was waiting for a
quiet, healthy cluster before trying to fix them.

Now that I have more time, I also have 6 OSDs down+out on my 5-node
cluster and 1~2 OSDs still flapping from time to time, and I am asking
myself whether these PGs might be the (one?) source of my problem.

The last OSD error on osd.28 gave these logs:
-2> 2019-10-28 12:57:47.346460 7fefbdc4d700  5 -- 129.20.177.2:6811/47803 
>> 129.20.177.3:6808/4141402 conn(0x55de8211a000 :-1 
s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=2058 cs=1 l=0). rx osd.25 seq 
169 0x55dea57b3600 MOSDPGPush(2.1d9 191810/191810 
[PushOp(2:9b97b818:::rbd_data.0c16b76b8b4567.0001426e:5926, version: 
127481'7241006, data_included: [], data_size: 0, omap_header_size: 0, 
omap_entries_size: 0, attrset_size: 1, recovery_info: 
ObjectRecoveryInfo(2:9b97b818:::rbd_data.0c16b76b8b4567.0001426e:5926@127481'7241006,
 size: 4194304, copy_subset: [], clone_subset: {}, snapset: 0=[]:[]), 
after_progress: ObjectRecoveryProgress(!first, data_recovered_to:0, 
data_complete:true, omap_recovered_to:, omap_complete:true, error:false), 
before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, 
data_complete:false, omap_recovered_to:, omap_complete:false, error:false))]) v3
-1> 2019-10-28 12:57:47.346517 7fefbdc4d700  1 -- 129.20.177.2:6811/47803 
<== osd.25 129.20.177.3:6808/4141402 169  MOSDPGPush(2.1d9 191810/191810 
[PushOp(2:9b97b818:::rbd_data.0c16b76b8b4567.0001426e:5926, version: 
127481'7241006, data_included: [], data_size: 0, omap_header_size: 0, 
omap_entries_size: 0, attrset_size: 1, recovery_info: 
ObjectRecoveryInfo(2:9b97b818:::rbd_data.c16b76b8b4567.0001426e:5926@127481'7241006,
 size: 4194304, copy_subset: [], clone_subset: {}, snapset: 0=[]:[]), 
after_progress: ObjectRecoveryProgress(!first, data_recovered_to:0, 
data_complete:true, omap_recovered_to:, omap_complete:true, error:false), 
before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, 
data_complete:false, omap_recovered_to:, omap_complete:false, error:false))]) 
v3  909+0+0 (1239474936 0 0) 0x55dea57b3600 con 0x55de8211a000
 0> 2019-10-28 12:57:47.353680 7fef99441700 -1 
/build/ceph-12.2.12/src/osd/PrimaryLogPG.cc: In function 'virtual void 
PrimaryLogPG::on_local_recover(const hobject_t&, const ObjectRecoveryInfo&, 
ObjectContextRef, bool, ObjectStore::Transaction*)' thread 7fef99441700 time 
2019-10-28 12:57:47.347132
/build/ceph-12.2.12/src/osd/PrimaryLogPG.cc: 354: FAILED 
assert(recovery_info.oi.legacy_snaps.size())

 ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous 
(stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x102) [0x55de72039f32]
 2: (PrimaryLogPG::on_local_recover(hobject_t const&, ObjectRecoveryInfo 
const&, std::shared_ptr, bool, 
ObjectStore::Transaction*)+0x135b) [0x55de71be330b]
 3: (ReplicatedBackend::handle_push(pg_shard_t, PushOp const&, PushReplyOp*, 
ObjectStore::Transaction*)+0x31d) [0x55de71d4fadd]
 4: (ReplicatedBackend::_do_push(boost::intrusive_ptr)+0x18f) 
[0x55de71d4fd7f]
 5: (ReplicatedBackend::_handle_message(boost::intrusive_ptr)+0x2d1) 
[0x55de71d5ff11]
 6: (PGBackend::handle_message(boost::intrusive_ptr)+0x50) 
[0x55de71c7d030]
 7: (PrimaryLogPG::do_request(boost::intrusive_ptr&, 
ThreadPool::TPHandle&)+0x5f1) [0x55de71be87b1]
 8: (OSD::dequeue_op(boost::intrusive_ptr, boost::intrusive_ptr, 
ThreadPool::TPHandle&)+0x3f7) [0x55de71a63e97]
 9: (PGQueueable::RunVis::operator()(boost::intrusive_ptr 
const&)+0x57) [0x55de71cf5077]
 10: (OSD::ShardedOpWQ::_process(unsigned int, 
ceph::heartbeat_handle_d*)+0x108c) [0x55de71a94e1c]
 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x88d) 
[0x55de7203fbbd]
 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55de72041b80]
 13: (()+0x8064) [0x7fefc12b5064]
 14: (clone()+0x6d) [0x7fefc03a962d]
 NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_mirror
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   1/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 1 reserver
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
   1/ 5 

Re: [ceph-users] Problematic inode preventing ceph-mds from starting

2019-10-28 Thread Patrick Donnelly
On Fri, Oct 25, 2019 at 12:11 PM Pickett, Neale T  wrote:
> In the last week we have made a few changes to the down filesystem in an 
> attempt to fix what we thought was an inode problem:
>
>
> cephfs-data-scan scan_extents   # about 1 day with 64 processes
>
> cephfs-data-scan scan_inodes   # about 1 day with 64 processes
>
> cephfs-data-scan scan_links   # about 1 day

Did you reset the journals or perform any other disaster recovery
commands? This process likely introduced the duplicate inodes.

> After these three, we tried to start an MDS and it stayed up. We then ran:
>
> ceph tell mds.a scrub start / recursive repair
>
>
> The repair ran about 3 days, spewing logs to `ceph -w` about duplicated 
> inodes, until it stopped. All looked well until we began bringing production 
> services back online, at which point many error messages appeared, the mds 
> went back into damaged, and the fs back to degraded. At this point I removed 
> the objects you suggested, which brought everything back briefly.
>
> The latest crash is:
>
> -1> 2019-10-25 18:47:50.731 7fc1f3b56700 -1 
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/mds/MDCache.cc:
>  In function 'void MDCache::add_inode(CInode*)' thread 7fc1f3b56700 time 
> 2019-1...
>
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/mds/MDCache.cc:
>  258: FAILED ceph_assert(!p)

This error indicates a duplicate inode loaded into cache. Fixing this
probably requires significant intervention and (meta)data loss for
recent changes:

- Stop/unmount all clients. (Probably already the case if the rank is damaged!)

- Reset the MDS journal [1] and optionally recover any dentries first.
(This will hopefully resolve the ESubtreeMap errors you pasted.) Note
that some metadata may be lost through this command.

- `cephfs-data-scan scan_links` again. This should repair any
duplicate inodes (by dropping the older dentries).

- Then you can try marking the rank as repaired.

Good luck!

[1] 
https://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/#journal-truncation


--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D



Re: [ceph-users] very high ram usage by OSDs on Nautilus

2019-10-28 Thread Mark Nelson

Hi Philippe,


Have you looked at the mempool stats yet?


ceph daemon osd.NNN dump_mempools


You may also want to look at the heap stats, and potentially enable 
debug 5 for bluestore to see what the priority cache manager is doing.  
Typically in these cases we end up seeing a ton of memory used by 
something and the priority cache manager is trying to compensate by 
shrinking the caches, but you won't really know until you start looking 
at the various statistics and logging.
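
For example (osd.0 below is just a placeholder id):

  ceph daemon osd.0 dump_mempools                    # per-mempool memory accounting
  ceph tell osd.0 heap stats                         # tcmalloc heap statistics
  ceph daemon osd.0 config get osd_memory_target     # what the cache autotuner aims for
  ceph tell osd.0 config set debug_bluestore 5/5     # log priority cache manager activity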



Mark


On 10/28/19 2:54 AM, Philippe D'Anjou wrote:

Hi,

We are seeing quite high memory usage by OSDs since Nautilus,
averaging 10 GB/OSD for 10 TB HDDs. But I had OOM issues on 128 GB
systems because some single OSD processes used up to 32% of RAM.


Here is an example of how they look on average: https://i.imgur.com/kXCtxMe.png

Is that normal? I have never seen this on Luminous. Memory leaks?
Using all default values, the OSDs have no special configuration. The use
case is cephfs.


v14.2.4 on Ubuntu 18.04 LTS

Seems a bit high?

Thanks for help



[ceph-users] RGW/swift segments

2019-10-28 Thread Peter Eisch
Hi,

When uploading to RGW via swift I can set an expiration time.  The files being
uploaded are large.  We segment them using the swift upload ‘-S’ arg.  This
results in a 0-byte manifest file in the bucket and all the data segments
landing in a *_segments bucket.

When the expiration passes, the 0-byte file is deleted but all the segments
remain.  Am I misconfigured, or is this a bug where it won’t expire the actual
data?  Shouldn’t RGW set the expiration on the uploaded segments too if they’re
managed separately?
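
Until that is confirmed as a bug, one possible work-around is to stamp the
segment objects with the same expiry as the manifest. This is only a sketch and
not verified against RGW; the container name, object name and segment prefix
are assumptions:

  # upload in 1 GiB segments with a 24-hour expiry on the manifest
  swift upload -S 1073741824 -H "X-Delete-After: 86400" backups big-dump.tar
  # also set the expiry on each segment object explicitly
  swift list backups_segments --prefix big-dump.tar/ | while read -r seg; do
      swift post backups_segments "$seg" -H "X-Delete-After: 86400"
  done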

Thanks,

peter

Peter Eisch
Senior Site Reliability Engineer


Re: [ceph-users] cephfs 1 large omap objects

2019-10-28 Thread Jake Grimmett
Hi Paul, Nigel,

I'm also seeing "HEALTH_WARN 6 large omap objects" warnings with cephfs
after upgrading to 14.2.4:

The affected OSDs are used (only) by the metadata pool:

POOL     ID  STORED  OBJECTS  USED    %USED  MAX AVAIL
mds_ssd  1   64 GiB  1.74M    65 GiB  4.47   466 GiB

See below for more log details.

While I'm glad we can silence the warning, should I be worried about the
values reported in the log causing real problems?

many thanks

Jake

[root@ceph1 ~]# zgrep "Large omap object found" /var/log/ceph/ceph.log*

/log/ceph/ceph.log-20191022.gz:2019-10-21 15:43:45.800608 osd.2 (osd.2)
262 : cluster [WRN] Large omap object found. Object:
1:e5134dd5:::10007b4b304.0240:head Key count: 524005 Size (bytes):
242090310
/var/log/ceph/ceph.log-20191022.gz:2019-10-21 15:43:48.440425 osd.2
(osd.2) 263 : cluster [WRN] Large omap object found. Object:
1:e5347802:::1000861ecf6.:head Key count: 395404 Size (bytes):
182676204
/var/log/ceph/ceph.log-20191025.gz:2019-10-24 23:53:25.348227 osd.2
(osd.2) 58 : cluster [WRN] Large omap object found. Object:
1:2f12e2d8:::10007b4b304.0180:head Key count: 1041988 Size (bytes):
481398012
/var/log/ceph/ceph.log-20191026.gz:2019-10-25 10:54:57.478636 osd.2
(osd.2) 69 : cluster [WRN] Large omap object found. Object:
1:effe741b:::1000763dfe6.:head Key count: 640788 Size (bytes):
296043612
/var/log/ceph/ceph.log-20191026.gz:2019-10-25 19:57:11.894099 osd.3
(osd.3) 326 : cluster [WRN] Large omap object found. Object:
1:4b4f7436:::10007b4b304.0200:head Key count: 522689 Size (bytes):
241482318
/var/log/ceph/ceph.log-20191027.gz:2019-10-27 02:30:10.648346 osd.3
(osd.3) 351 : cluster [WRN] Large omap object found. Object:
1:a47c6896:::1000894a736.:head Key count: 768126 Size (bytes):
354873768
On 10/8/19 10:27 AM, Paul Emmerich wrote:
> Hi,
> 
> the default for this warning changed recently (see other similar
> threads on the mailing list), it was 2 million before 14.2.3.
> 
> I don't think the new default of 200k is a good choice, so increasing
> it is a reasonable work-around.
> 
> Paul
> 
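
For reference, the work-around Paul describes is raising the threshold back
towards its pre-14.2.3 value and then re-scrubbing so the warning is
re-evaluated. A sketch (the value and <pgid> are placeholders):

  # the default dropped from 2000000 to 200000 in 14.2.3
  ceph config set osd osd_deep_scrub_large_omap_object_key_threshold 2000000
  # deep-scrub the pgs that logged the warning so health re-evaluates
  ceph pg deep-scrub <pgid>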


-- 
Jake Grimmett
MRC Laboratory of Molecular Biology
Francis Crick Avenue,
Cambridge CB2 0QH, UK.



Re: [ceph-users] Ceph is moving data ONLY to near-full OSDs [BUG]

2019-10-28 Thread Philippe D'Anjou
I was following the pg autoscaler recommendations and did not get a
recommendation to raise the PG count there.
I'll try that; I am raising it already. But it still seems weird that it would
move data onto almost-full OSDs. Look at the data distribution, it's horrible,
ranging from 60% to almost 90% full. The PGs are not equally distributed,
otherwise it'd be a PG-size issue.

Thanks
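
Not from this thread, but the usual knobs for the kind of uneven utilisation
shown below are the upmap balancer and, as a stop-gap, reweighting the fullest
OSDs. A sketch only (osd.71 is taken from the listing below; the weight and
mode are illustrative):

  ceph balancer mode upmap
  ceph balancer on
  # stop-gap while the balancer / backfill catches up:
  ceph osd test-reweight-by-utilization      # dry run first
  ceph osd reweight 71 0.90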

On Sunday, 27 October 2019 at 20:33:11 OEZ, Wido den Hollander
 wrote:
 
 

On 10/26/19 8:01 AM, Philippe D'Anjou wrote:
> V14.2.4
> So, this is not new; this happens every time there is a rebalance, now
> because of raising the PG count. The PG balancer is disabled because I
> thought it was the reason, but apparently it's not, and it isn't helping either.
> 
> Ceph is totally borked; it's only moving data onto nearfull OSDs, causing
> issues. See this after the PG raise.
> 
>     health: HEALTH_WARN
>     3 nearfull osd(s)
>     2 pool(s) nearfull
> 
> 
> 08:44 am
> 
> 92   hdd  9.09470  1.0 9.1 TiB  7.5 TiB  7.5 TiB  48 KiB  19 GiB 1.6
> TiB 82.79 1.25  39 up
> 71   hdd  9.09470  1.0 9.1 TiB  7.6 TiB  7.6 TiB  88 KiB  20 GiB 1.4
> TiB 84.09 1.27  38 up
> 21   hdd  9.09470  1.0 9.1 TiB  7.6 TiB  7.6 TiB  60 KiB  20 GiB 1.5
> TiB 84.05 1.27  36 up
>  
>  
> 08:54 am
> 
> 92   hdd  9.09470  1.0 9.1 TiB  7.5 TiB  7.5 TiB  48 KiB  19 GiB 1.6
> TiB 82.81 1.25  39 up
> 71   hdd  9.09470  1.0 9.1 TiB  7.7 TiB  7.6 TiB  88 KiB  20 GiB 1.4
> TiB 84.14 1.27  38 up
> 21   hdd  9.09470  1.0 9.1 TiB  7.6 TiB  7.6 TiB  60 KiB  20 GiB 1.4
> TiB 84.10 1.27  36 up
>    
>    
>    
>  14   hdd  9.09470  1.0 9.1 TiB  6.5 TiB  6.5 TiB  76 KiB  17 GiB
> 2.6 TiB 71.33 1.07  32 up
>  19   hdd  9.09470  1.0 9.1 TiB  6.3 TiB  6.2 TiB  52 KiB  17 GiB
> 2.8 TiB 68.81 1.04  30 up
>  22   hdd  9.09470  1.0 9.1 TiB  6.3 TiB  6.2 TiB  92 KiB  17 GiB
> 2.8 TiB 68.90 1.04  32 up
>  25   hdd  9.09470  1.0 9.1 TiB  6.2 TiB  6.2 TiB 219 KiB  17 GiB
> 2.9 TiB 68.11 1.03  31 up
>  30   hdd  9.09470  1.0 9.1 TiB  6.5 TiB  6.5 TiB  20 KiB  17 GiB
> 2.6 TiB 71.41 1.08  33 up
>  33   hdd  9.09470  1.0 9.1 TiB  6.5 TiB  6.5 TiB  40 KiB  17 GiB
> 2.6 TiB 71.30 1.07  32 up
>  34   hdd  9.09470  1.0 9.1 TiB  6.5 TiB  6.5 TiB  36 KiB  17 GiB
> 2.6 TiB 71.33 1.07  30 up
>  35   hdd  9.09470  1.0 9.1 TiB  6.6 TiB  6.6 TiB 124 KiB  17 GiB
> 2.5 TiB 72.61 1.09  32 up
>  12   hdd  9.09470  1.0 9.1 TiB  6.7 TiB  6.7 TiB  24 KiB  18 GiB
> 2.4 TiB 73.84 1.11  32 up
>  16   hdd  9.09470  1.0 9.1 TiB  6.5 TiB  6.4 TiB  96 KiB  17 GiB
> 2.6 TiB 71.08 1.07  29 up
>  17   hdd  9.09470  1.0 9.1 TiB  6.5 TiB  6.5 TiB  60 KiB  17 GiB
> 2.6 TiB 71.41 1.08  31 up
>  20   hdd  9.09470  1.0 9.1 TiB  6.2 TiB  6.2 TiB  92 KiB  17 GiB
> 2.9 TiB 68.57 1.03  28 up
>  23   hdd  9.09470  1.0 9.1 TiB  6.5 TiB  6.5 TiB  36 KiB  17 GiB
> 2.6 TiB 71.37 1.08  29 up
>  26   hdd  9.09470  1.0 9.1 TiB  6.4 TiB  6.4 TiB  84 KiB  17 GiB
> 2.7 TiB 70.02 1.06  30 up
>  28   hdd  9.09470  1.0 9.1 TiB  6.4 TiB  6.4 TiB  28 KiB  17 GiB
> 2.7 TiB 70.11 1.06  30 up
>  31   hdd  9.09470  1.0 9.1 TiB  6.5 TiB  6.5 TiB  56 KiB  17 GiB
> 2.6 TiB 71.26 1.07  32 up
>  13   hdd  9.09470  1.0 9.1 TiB  6.7 TiB  6.7 TiB  24 KiB  18 GiB
> 2.4 TiB 73.84 1.11  31 up
>  15   hdd  9.09470  1.0 9.1 TiB  6.5 TiB  6.5 TiB  44 KiB  17 GiB
> 2.6 TiB 71.35 1.08  29 up
>  18   hdd  9.09470  1.0 9.1 TiB  5.8 TiB  5.8 TiB  76 KiB  16 GiB
> 3.3 TiB 63.70 0.96  26 up
>  21   hdd  9.09470  1.0 9.1 TiB  7.6 TiB  7.6 TiB  60 KiB  20 GiB
> 1.4 TiB 84.10 1.27  36 up
>  24   hdd  9.09470  1.0 9.1 TiB  5.8 TiB  5.8 TiB  64 KiB  15 GiB
> 3.3 TiB 63.67 0.96  28 up
>  27   hdd  9.09470  1.0 9.1 TiB  6.0 TiB  6.0 TiB  48 KiB  17 GiB
> 3.1 TiB 66.03 1.00  28 up
>  29   hdd  9.09470  1.0 9.1 TiB  6.4 TiB  6.3 TiB  28 KiB  18 GiB
> 2.7 TiB 69.93 1.05  34 up
>  32   hdd  9.09470  1.0 9.1 TiB  6.0 TiB  6.0 TiB  20 KiB  17 GiB
> 3.1 TiB 66.20 1.00  28 up
>  37   hdd  9.09470  1.0 9.1 TiB  6.4 TiB  6.4 TiB  32 KiB  18 GiB
> 2.7 TiB 70.59 1.06  31 up
>  39   hdd  9.09470  1.0 9.1 TiB  6.4 TiB  6.4 TiB  32 KiB  19 GiB
> 2.7 TiB 70.50 1.06  29 up
>  41   hdd  9.09470  1.0 9.1 TiB  6.3 TiB  6.2 TiB  52 KiB  17 GiB
> 2.8 TiB 68.79 1.04  30 up
>  43   hdd  9.09470  1.0 9.1 TiB  6.3 TiB  6.2 TiB  48 KiB  17 GiB
> 2.8 TiB 68.84 1.04  28 up
>  45   hdd  9.09470  1.0 9.1 TiB  6.7 TiB  6.7 TiB  80 KiB  18 GiB
> 2.4 TiB 74.02 1.12  33 up
>  46   hdd  9.09470  1.0 9.1 TiB  6.7 TiB  6.7 TiB  36 KiB  18 GiB
> 2.4 TiB 73.88 1.11  30 up
>  48   hdd  9.09470  1.0 9.1 TiB  6.6 TiB  6.6 TiB 101 KiB  17 GiB
> 2.5 TiB 72.57 1.09  31 up
>  50   hdd  9.09470  1.0 9.1 TiB  6.3 TiB  6.2 TiB  96 KiB  17 GiB
> 2.8 TiB 68.86 1.04  31 up
>  36   hdd  9.09470  1.0 9.1 TiB 

[ceph-users] very high ram usage by OSDs on Nautilus

2019-10-28 Thread Philippe D'Anjou
Hi,
We are seeing quite high memory usage by OSDs since Nautilus, averaging
10 GB/OSD for 10 TB HDDs. But I had OOM issues on 128 GB systems because some
single OSD processes used up to 32% of RAM.
Here is an example of how they look on average: https://i.imgur.com/kXCtxMe.png
Is that normal? I have never seen this on Luminous. Memory leaks? Using all
default values, the OSDs have no special configuration. The use case is cephfs.

v14.2.4 on Ubuntu 18.04 LTS
Seems a bit high?
Thanks for help