Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-03 Thread Sergey Malinin
Finally, good news!
I applied the PR and ran repair on the OSD that had been left unmodified after the 
initial failure. It went through without any errors, and now I'm able to FUSE-mount 
the OSD and export PGs off it using ceph-objectstore-tool. To avoid messing it up, 
I won't start ceph-osd until I have the PGs backed up.
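For anyone hitting the same thing, the export step is along these lines (the OSD path 
and PG ID below are placeholders, not my actual ones):

# FUSE-mount the offline OSD to browse its contents
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 --op fuse --mountpoint /mnt/osd-1
# export a single PG to a file for backup
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1 --op export --pgid 2.7f --file /backup/pg-2.7f.export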
Cheers Igor, you're the best!


> On 3.10.2018, at 14:39, Igor Fedotov  wrote:
> 
> To fix this specific issue please apply the following PR: 
> https://github.com/ceph/ceph/pull/24339
> 
> This wouldn't fix the original issue, but just in case please try to run repair 
> again. I will need the log if the error is different from the ENOSPC one in your 
> latest email.
> 
> 
> Thanks,
> 
> Igor
> 
> 
> On 10/3/2018 1:58 PM, Sergey Malinin wrote:
>> Repair has gone farther but failed on something different - this time it 
>> appears to be related to store inconsistency rather than a lack of free space. 
>> I emailed the log to you; beware, it is over 2 GB uncompressed.
>> 
>> 
>>> On 3.10.2018, at 13:15, Igor Fedotov  wrote:
>>> 
>>> You may want to try new updates from the PR along with disabling flush on 
>>> recovery for rocksdb (avoid_flush_during_recovery parameter).
>>> 
>>> The full cmd line might look like:
>>> 
>>> CEPH_ARGS="--bluestore_rocksdb_options avoid_flush_during_recovery=1" 
>>> bin/ceph-bluestore-tool --path  repair
>>> 
>>> 
>>> To be applied for "non-expanded" OSDs where repair didn't pass.
>>> 
>>> Please collect a log during repair...
>>> 
>>> 
>>> Thanks,
>>> 
>>> Igor
>>> 
>>> On 10/2/2018 4:32 PM, Sergey Malinin wrote:
 Repair goes through only when the LVM volume has been expanded; otherwise it 
 fails with enospc, as does any other operation. However, expanding the 
 volume immediately renders bluefs unmountable with an IO error.
 2 of 3 OSDs got their bluefs log corrupted (the bluestore tool segfaults at the very 
 end of bluefs-log-dump); I'm not sure whether the corruption occurred before 
 or after volume expansion.
 
 
> On 2.10.2018, at 16:07, Igor Fedotov  wrote:
> 
> You mentioned repair had worked before, is that correct? What's the 
> difference now except the applied patch? Different OSD? Anything else?
> 
> 
> On 10/2/2018 3:52 PM, Sergey Malinin wrote:
> 
>> It didn't work, emailed logs to you.
>> 
>> 
>>> On 2.10.2018, at 14:43, Igor Fedotov  wrote:
>>> 
>>> The major change is in the get_bluefs_rebalance_txn function; it lacked the 
>>> bluefs_rebalance_txn assignment.
>>> 
>>> 
>>> 
>>> On 10/2/2018 2:40 PM, Sergey Malinin wrote:
 PR doesn't seem to have changed since yesterday. Am I missing 
 something?
 
 
> On 2.10.2018, at 14:15, Igor Fedotov  wrote:
> 
> Please update the patch from the PR - it didn't update bluefs extents 
> list before.
> 
> Also please set debug bluestore 20 when re-running repair and collect 
> the log.
> 
> If repair doesn't help - would you send repair and startup logs 
> directly to me as I have some issues accessing ceph-post-file uploads.
> 
> 
> Thanks,
> 
> Igor
> 
> 
> On 10/2/2018 11:39 AM, Sergey Malinin wrote:
>> Yes, I did repair all OSDs and it finished with 'repair success'. I 
>> backed up OSDs so now I have more room to play.
>> I posted log files using ceph-post-file with the following IDs:
>> 4af9cc4d-9c73-41c9-9c38-eb6c551047a0
>> 20df7df5-f0c9-4186-aa21-4e5c0172cd93
>> 
>> 
>>> On 2.10.2018, at 11:26, Igor Fedotov  wrote:
>>> 
>>> You did run repair on some of these OSDs, didn't you? On all of them?
>>> 
>>> 
>>> Would you please provide logs for both types of failing OSDs (failed on mount 
>>> and failed with enospc)? Before collecting, please remove the existing logs 
>>> and set debug bluestore to 20.
>>> 
>>> 
>>> 
>>> On 10/2/2018 2:16 AM, Sergey Malinin wrote:
 I was able to apply the patches to mimic, but nothing changed. The one OSD 
 whose space I expanded fails with a bluefs mount IO error; the others keep 
 failing with enospc.
 
 
> On 1.10.2018, at 19:26, Igor Fedotov  wrote:
> 
> So you should call repair, which rebalances BlueFS space (i.e. allocates 
> additional space to it), hence allowing the OSD to start.
> 
> Thanks,
> 
> Igor
> 
> 
> On 10/1/2018 7:22 PM, Igor Fedotov wrote:
>> Not exactly. The rebalancing from this kv_sync_thread still 
>> might be deferred due to the nature of this thread (not 100% 
>> sure though).
>> 
>> Here is my PR showing the idea (still untested and perhaps 

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-03 Thread Igor Fedotov
To fix this specific issue please apply the following PR: 
https://github.com/ceph/ceph/pull/24339


This wouldn't fix the original issue, but just in case please try to run 
repair again. I will need the log if the error is different from the ENOSPC 
one in your latest email.
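
In case it saves time, pulling the PR onto a local checkout is roughly like this 
(branch name and build targets are just an example, adjust to your tree):

git fetch https://github.com/ceph/ceph.git pull/24339/head:pr-24339
git checkout <your mimic branch>
git merge pr-24339          # or cherry-pick the PR commits if a merge is too noisy
./do_cmake.sh && cd build && make ceph-bluestore-tool ceph-osd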



Thanks,

Igor


On 10/3/2018 1:58 PM, Sergey Malinin wrote:

Repair has gone farther but failed on something different - this time it 
appears to be related to store inconsistency rather than lack of free space. 
Emailed log to you, beware: over 2GB uncompressed.



On 3.10.2018, at 13:15, Igor Fedotov  wrote:

You may want to try new updates from the PR along with disabling flush on 
recovery for rocksdb (avoid_flush_during_recovery parameter).

Full cmd line might looks like:

CEPH_ARGS="--bluestore_rocksdb_options avoid_flush_during_recovery=1" 
bin/ceph-bluestore-tool --path  repair


To be applied for "non-expanded" OSDs where repair didn't pass.

Please collect a log during repair...


Thanks,

Igor

On 10/2/2018 4:32 PM, Sergey Malinin wrote:

Repair goes through only when LVM volume has been expanded, otherwise it fails 
with enospc as well as any other operation. However, expanding the volume 
immediately renders bluefs unmountable with IO error.
2 of 3 OSDs got bluefs log currupted (bluestore tool segfaults at the very end 
of bluefs-log-dump), I'm not sure whether corruption occurred before or after 
volume expansion.



On 2.10.2018, at 16:07, Igor Fedotov  wrote:

You mentioned repair had worked before, is that correct? What's the difference 
now except the applied patch? Different OSD? Anything else?


On 10/2/2018 3:52 PM, Sergey Malinin wrote:


It didn't work, emailed logs to you.



On 2.10.2018, at 14:43, Igor Fedotov  wrote:

The major change is in get_bluefs_rebalance_txn function, it lacked 
bluefs_rebalance_txn assignment..



On 10/2/2018 2:40 PM, Sergey Malinin wrote:

PR doesn't seem to have changed since yesterday. Am I missing something?



On 2.10.2018, at 14:15, Igor Fedotov  wrote:

Please update the patch from the PR - it didn't update bluefs extents list 
before.

Also please set debug bluestore 20 when re-running repair and collect the log.

If repair doesn't help - would you send repair and startup logs directly to me 
as I have some issues accessing ceph-post-file uploads.


Thanks,

Igor


On 10/2/2018 11:39 AM, Sergey Malinin wrote:

Yes, I did repair all OSDs and it finished with 'repair success'. I backed up 
OSDs so now I have more room to play.
I posted log files using ceph-post-file with the following IDs:
4af9cc4d-9c73-41c9-9c38-eb6c551047a0
20df7df5-f0c9-4186-aa21-4e5c0172cd93



On 2.10.2018, at 11:26, Igor Fedotov  wrote:

You did repair for any of this OSDs, didn't you? For all of them?


Would you please provide the log for both types (failed on mount and failed 
with enospc) of failing OSDs. Prior to collecting please remove existing ones 
prior and set debug bluestore to 20.



On 10/2/2018 2:16 AM, Sergey Malinin wrote:

I was able to apply patches to mimic, but nothing changed. One osd that I had 
space expanded on fails with bluefs mount IO error, others keep failing with 
enospc.



On 1.10.2018, at 19:26, Igor Fedotov  wrote:

So you should call repair which rebalances (i.e. allocates additional space) 
BlueFS space. Hence allowing OSD to start.

Thanks,

Igor


On 10/1/2018 7:22 PM, Igor Fedotov wrote:

Not exactly. The rebalancing from this kv_sync_thread still might be deferred 
due to the nature of this thread (haven't 100% sure though).

Here is my PR showing the idea (still untested and perhaps unfinished!!!)

https://github.com/ceph/ceph/pull/24353


Igor


On 10/1/2018 7:07 PM, Sergey Malinin wrote:

Can you please confirm whether I got this right:

--- BlueStore.cc.bak	2018-10-01 18:54:45.096836419 +0300
+++ BlueStore.cc	2018-10-01 19:01:35.937623861 +0300
@@ -9049,22 +9049,17 @@
     throttle_bytes.put(costs);
 
       PExtentVector bluefs_gift_extents;
-      if (bluefs &&
-          after_flush - bluefs_last_balance >
-          cct->_conf->bluestore_bluefs_balance_interval) {
-        bluefs_last_balance = after_flush;
-        int r = _balance_bluefs_freespace(&bluefs_gift_extents);
-        assert(r >= 0);
-        if (r > 0) {
-          for (auto& p : bluefs_gift_extents) {
-            bluefs_extents.insert(p.offset, p.length);
-          }
-          bufferlist bl;
-          encode(bluefs_extents, bl);
-          dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
-                   << bluefs_extents << std::dec << dendl;
-          synct->set(PREFIX_SUPER, "bluefs_extents", bl);
+      int r = _balance_bluefs_freespace(&bluefs_gift_extents);
+      ceph_assert(r >= 0);
+      if (r > 0) {
+        for (auto& p : bluefs_gift_extents) {
+          bluefs_extents.insert(p.offset, p.length);
         }
+        bufferlist bl;
+        encode(bluefs_extents, bl);
+        dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
+                 << bluefs_extents << std::dec << dendl;
+        synct->set(PREFIX_SUPER, "bluefs_extents", bl);
       }
  

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-03 Thread Sergey Malinin
Update:
I rebuilt ceph-osd with the latest PR and it started, worked for a few minutes, and 
eventually failed on enospc.
After that, ceph-bluestore-tool repair started failing on enospc again. I was 
unable to collect the ceph-osd log, so I emailed you the most recent repair log.
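
If the ceph-osd log is still needed, I can try running the daemon in the foreground 
with verbose bluestore logging, roughly like this (OSD id and log path are placeholders):

ceph-osd -f --id 1 --setuser ceph --setgroup ceph \
    --debug-bluestore 20/20 --debug-bluefs 20/20 \
    --log-file /var/log/ceph/osd.1-enospc.log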



> On 3.10.2018, at 13:58, Sergey Malinin  wrote:
> 
> Repair has gone farther but failed on something different - this time it 
> appears to be related to store inconsistency rather than lack of free space. 
> Emailed log to you, beware: over 2GB uncompressed.
> 
> 
>> On 3.10.2018, at 13:15, Igor Fedotov  wrote:
>> 
>> You may want to try new updates from the PR along with disabling flush on 
>> recovery for rocksdb (avoid_flush_during_recovery parameter).
>> 
>> Full cmd line might looks like:
>> 
>> CEPH_ARGS="--bluestore_rocksdb_options avoid_flush_during_recovery=1" 
>> bin/ceph-bluestore-tool --path  repair
>> 
>> 
>> To be applied for "non-expanded" OSDs where repair didn't pass.
>> 
>> Please collect a log during repair...
>> 
>> 
>> Thanks,
>> 
>> Igor
>> 
>> On 10/2/2018 4:32 PM, Sergey Malinin wrote:
>>> Repair goes through only when LVM volume has been expanded, otherwise it 
>>> fails with enospc as well as any other operation. However, expanding the 
>>> volume immediately renders bluefs unmountable with IO error.
>>> 2 of 3 OSDs got bluefs log currupted (bluestore tool segfaults at the very 
>>> end of bluefs-log-dump), I'm not sure whether corruption occurred before or 
>>> after volume expansion.
>>> 
>>> 
 On 2.10.2018, at 16:07, Igor Fedotov  wrote:
 
 You mentioned repair had worked before, is that correct? What's the 
 difference now except the applied patch? Different OSD? Anything else?
 
 
 On 10/2/2018 3:52 PM, Sergey Malinin wrote:
 
> It didn't work, emailed logs to you.
> 
> 
>> On 2.10.2018, at 14:43, Igor Fedotov  wrote:
>> 
>> The major change is in get_bluefs_rebalance_txn function, it lacked 
>> bluefs_rebalance_txn assignment..
>> 
>> 
>> 
>> On 10/2/2018 2:40 PM, Sergey Malinin wrote:
>>> PR doesn't seem to have changed since yesterday. Am I missing something?
>>> 
>>> 
 On 2.10.2018, at 14:15, Igor Fedotov  wrote:
 
 Please update the patch from the PR - it didn't update bluefs extents 
 list before.
 
 Also please set debug bluestore 20 when re-running repair and collect 
 the log.
 
 If repair doesn't help - would you send repair and startup logs 
 directly to me as I have some issues accessing ceph-post-file uploads.
 
 
 Thanks,
 
 Igor
 
 
 On 10/2/2018 11:39 AM, Sergey Malinin wrote:
> Yes, I did repair all OSDs and it finished with 'repair success'. I 
> backed up OSDs so now I have more room to play.
> I posted log files using ceph-post-file with the following IDs:
> 4af9cc4d-9c73-41c9-9c38-eb6c551047a0
> 20df7df5-f0c9-4186-aa21-4e5c0172cd93
> 
> 
>> On 2.10.2018, at 11:26, Igor Fedotov  wrote:
>> 
>> You did repair for any of this OSDs, didn't you? For all of them?
>> 
>> 
>> Would you please provide the log for both types (failed on mount and 
>> failed with enospc) of failing OSDs. Prior to collecting please 
>> remove existing ones prior and set debug bluestore to 20.
>> 
>> 
>> 
>> On 10/2/2018 2:16 AM, Sergey Malinin wrote:
>>> I was able to apply patches to mimic, but nothing changed. One osd 
>>> that I had space expanded on fails with bluefs mount IO error, 
>>> others keep failing with enospc.
>>> 
>>> 
 On 1.10.2018, at 19:26, Igor Fedotov  wrote:
 
 So you should call repair which rebalances (i.e. allocates 
 additional space) BlueFS space. Hence allowing OSD to start.
 
 Thanks,
 
 Igor
 
 
 On 10/1/2018 7:22 PM, Igor Fedotov wrote:
> Not exactly. The rebalancing from this kv_sync_thread still might 
> be deferred due to the nature of this thread (haven't 100% sure 
> though).
> 
> Here is my PR showing the idea (still untested and perhaps 
> unfinished!!!)
> 
> https://github.com/ceph/ceph/pull/24353
> 
> 
> Igor
> 
> 
> On 10/1/2018 7:07 PM, Sergey Malinin wrote:
>> Can you please confirm whether I got this right:
>> 
>> --- BlueStore.cc.bak2018-10-01 18:54:45.096836419 +0300
>> +++ BlueStore.cc2018-10-01 19:01:35.937623861 +0300
>> @@ -9049,22 +9049,17 @@
>>

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-03 Thread Sergey Malinin
Repair has gone farther but failed on something different - this time it 
appears to be related to store inconsistency rather than a lack of free space. 
I emailed the log to you; beware, it is over 2 GB uncompressed.


> On 3.10.2018, at 13:15, Igor Fedotov  wrote:
> 
> You may want to try new updates from the PR along with disabling flush on 
> recovery for rocksdb (avoid_flush_during_recovery parameter).
> 
> Full cmd line might looks like:
> 
> CEPH_ARGS="--bluestore_rocksdb_options avoid_flush_during_recovery=1" 
> bin/ceph-bluestore-tool --path  repair
> 
> 
> To be applied for "non-expanded" OSDs where repair didn't pass.
> 
> Please collect a log during repair...
> 
> 
> Thanks,
> 
> Igor
> 
> On 10/2/2018 4:32 PM, Sergey Malinin wrote:
>> Repair goes through only when LVM volume has been expanded, otherwise it 
>> fails with enospc as well as any other operation. However, expanding the 
>> volume immediately renders bluefs unmountable with IO error.
>> 2 of 3 OSDs got bluefs log currupted (bluestore tool segfaults at the very 
>> end of bluefs-log-dump), I'm not sure whether corruption occurred before or 
>> after volume expansion.
>> 
>> 
>>> On 2.10.2018, at 16:07, Igor Fedotov  wrote:
>>> 
>>> You mentioned repair had worked before, is that correct? What's the 
>>> difference now except the applied patch? Different OSD? Anything else?
>>> 
>>> 
>>> On 10/2/2018 3:52 PM, Sergey Malinin wrote:
>>> 
 It didn't work, emailed logs to you.
 
 
> On 2.10.2018, at 14:43, Igor Fedotov  wrote:
> 
> The major change is in get_bluefs_rebalance_txn function, it lacked 
> bluefs_rebalance_txn assignment..
> 
> 
> 
> On 10/2/2018 2:40 PM, Sergey Malinin wrote:
>> PR doesn't seem to have changed since yesterday. Am I missing something?
>> 
>> 
>>> On 2.10.2018, at 14:15, Igor Fedotov  wrote:
>>> 
>>> Please update the patch from the PR - it didn't update bluefs extents 
>>> list before.
>>> 
>>> Also please set debug bluestore 20 when re-running repair and collect 
>>> the log.
>>> 
>>> If repair doesn't help - would you send repair and startup logs 
>>> directly to me as I have some issues accessing ceph-post-file uploads.
>>> 
>>> 
>>> Thanks,
>>> 
>>> Igor
>>> 
>>> 
>>> On 10/2/2018 11:39 AM, Sergey Malinin wrote:
 Yes, I did repair all OSDs and it finished with 'repair success'. I 
 backed up OSDs so now I have more room to play.
 I posted log files using ceph-post-file with the following IDs:
 4af9cc4d-9c73-41c9-9c38-eb6c551047a0
 20df7df5-f0c9-4186-aa21-4e5c0172cd93
 
 
> On 2.10.2018, at 11:26, Igor Fedotov  wrote:
> 
> You did repair for any of this OSDs, didn't you? For all of them?
> 
> 
> Would you please provide the log for both types (failed on mount and 
> failed with enospc) of failing OSDs. Prior to collecting please 
> remove existing ones prior and set debug bluestore to 20.
> 
> 
> 
> On 10/2/2018 2:16 AM, Sergey Malinin wrote:
>> I was able to apply patches to mimic, but nothing changed. One osd 
>> that I had space expanded on fails with bluefs mount IO error, 
>> others keep failing with enospc.
>> 
>> 
>>> On 1.10.2018, at 19:26, Igor Fedotov  wrote:
>>> 
>>> So you should call repair which rebalances (i.e. allocates 
>>> additional space) BlueFS space. Hence allowing OSD to start.
>>> 
>>> Thanks,
>>> 
>>> Igor
>>> 
>>> 
>>> On 10/1/2018 7:22 PM, Igor Fedotov wrote:
 Not exactly. The rebalancing from this kv_sync_thread still might 
 be deferred due to the nature of this thread (haven't 100% sure 
 though).
 
 Here is my PR showing the idea (still untested and perhaps 
 unfinished!!!)
 
 https://github.com/ceph/ceph/pull/24353
 
 
 Igor
 
 
 On 10/1/2018 7:07 PM, Sergey Malinin wrote:
> Can you please confirm whether I got this right:
> 
> --- BlueStore.cc.bak2018-10-01 18:54:45.096836419 +0300
> +++ BlueStore.cc2018-10-01 19:01:35.937623861 +0300
> @@ -9049,22 +9049,17 @@
> throttle_bytes.put(costs);
>   PExtentVector bluefs_gift_extents;
> -  if (bluefs &&
> -  after_flush - bluefs_last_balance >
> -  cct->_conf->bluestore_bluefs_balance_interval) {
> -bluefs_last_balance = after_flush;
> -int r = _balance_bluefs_freespace(_gift_extents);
> -assert(r >= 0);
> -if (r > 0) {
> -  for (auto& p : 

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-03 Thread Igor Fedotov

Alex,

upstream recommendations for DB sizing are probably good enough, but like 
most fixed allocations they aren't optimal for every use case. Usually one 
either wastes space or runs out of it one day in such configs.


So I think we should have the means for more freedom in volume 
management (changing sizes, migrating, coalescing and splitting).


LVM usage is a big step toward that but it is still insufficient and 
lacks additional helpers sometimes.



To avoid the issue Sergey is experiencing, IMO it's better to have a 
standalone DB volume with some extra spare space. Even if the physical 
media is the same, it helps to avoid the lazy rebalancing procedure 
which is the issue's root cause. But this wouldn't eliminate it totally - 
if spillover to the main device takes place, one might face it again.
The same improvement can probably be achieved in a single-device 
configuration by proper rebalance tuning (bluestore_bluefs_min 
and other params), but that's more complicated to debug and set up 
properly IMO.
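
For illustration only - the names and numbers below are just an example, so 
please check the options and defaults for your release before relying on them. 
A standalone DB volume can be requested at OSD creation time, e.g.:

ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1

and the single-device rebalancing behaviour is driven by options like these in 
ceph.conf:

[osd]
bluestore_bluefs_min = 2147483648      # lower bound for the BlueFS share (2 GB here)
bluestore_bluefs_gift_ratio = 0.02     # fraction of free space gifted per rebalance
bluestore_bluefs_balance_interval = 1  # how often (in seconds) rebalancing runs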

Anyway, I think the issue is hit very rarely.

Sorry, given all that I can't say whether 30 GB fits your scenario or 
not. I don't know :)


Thanks,
Igor

On 10/2/2018 5:23 PM, Alex Litvak wrote:

Igor,

Thank you for your reply.  So what you are saying there are really no 
sensible space requirements for a collocated device? Even if I setup 
30 GB for DB (which I really wouldn't like to do due to a space waste 
considerations ) there is a chance that if this space feels up I will 
be in the same trouble under some heavy load scenario?


On 10/2/2018 9:15 AM, Igor Fedotov wrote:
Even with a single device bluestore has a sort of implicit "BlueFS 
partition" where DB is stored.  And it dynamically adjusts 
(rebalances) the space for that partition in background. 
Unfortunately it might perform that "too lazy" and hence under some 
heavy load it might end-up with the lack of space for that partition. 
While main device still has plenty of free space.


I'm planning to refactor this re-balancing procedure in the future to 
eliminate the root cause.



Thanks,

Igor


On 10/2/2018 5:04 PM, Alex Litvak wrote:
I am sorry for interrupting the thread, but my understanding always 
was that blue store on the single device should not care of the DB 
size, i.e. it would use the data part for all operations if DB is 
full.  And if it is not true, what would be sensible defaults on 800 
GB SSD?  I used ceph-ansible to build my cluster with system 
defaults and from I reading in this thread doesn't give me a good 
feeling at all. Document ion on the topic is very sketchy and online 
posts contradict each other some times.


Thank you in advance,

On 10/2/2018 8:52 AM, Igor Fedotov wrote:

May I have a repair log for that "already expanded" OSD?


On 10/2/2018 4:32 PM, Sergey Malinin wrote:
Repair goes through only when LVM volume has been expanded, 
otherwise it fails with enospc as well as any other operation. 
However, expanding the volume immediately renders bluefs 
unmountable with IO error.
2 of 3 OSDs got bluefs log currupted (bluestore tool segfaults at 
the very end of bluefs-log-dump), I'm not sure whether corruption 
occurred before or after volume expansion.




On 2.10.2018, at 16:07, Igor Fedotov  wrote:

You mentioned repair had worked before, is that correct? What's 
the difference now except the applied patch? Different OSD? 
Anything else?



On 10/2/2018 3:52 PM, Sergey Malinin wrote:


It didn't work, emailed logs to you.



On 2.10.2018, at 14:43, Igor Fedotov  wrote:

The major change is in get_bluefs_rebalance_txn function, it 
lacked bluefs_rebalance_txn assignment..




On 10/2/2018 2:40 PM, Sergey Malinin wrote:
PR doesn't seem to have changed since yesterday. Am I missing 
something?




On 2.10.2018, at 14:15, Igor Fedotov  wrote:

Please update the patch from the PR - it didn't update bluefs 
extents list before.


Also please set debug bluestore 20 when re-running repair and 
collect the log.


If repair doesn't help - would you send repair and startup 
logs directly to me as I have some issues accessing 
ceph-post-file uploads.



Thanks,

Igor


On 10/2/2018 11:39 AM, Sergey Malinin wrote:
Yes, I did repair all OSDs and it finished with 'repair 
success'. I backed up OSDs so now I have more room to play.

I posted log files using ceph-post-file with the following IDs:
4af9cc4d-9c73-41c9-9c38-eb6c551047a0
20df7df5-f0c9-4186-aa21-4e5c0172cd93



On 2.10.2018, at 11:26, Igor Fedotov  wrote:

You did repair for any of this OSDs, didn't you? For all of 
them?



Would you please provide the log for both types (failed on 
mount and failed with enospc) of failing OSDs. Prior to 
collecting please remove existing ones prior and set debug 
bluestore to 20.




On 10/2/2018 2:16 AM, Sergey Malinin wrote:
I was able to apply patches to mimic, but nothing changed. 
One osd that I had space expanded on fails with bluefs 
mount IO error, others keep failing with enospc.



On 1.10.2018, at 19:26, Igor 

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-03 Thread Igor Fedotov
You may want to try new updates from the PR along with disabling flush 
on recovery for rocksdb (avoid_flush_during_recovery parameter).


The full cmd line might look like:

CEPH_ARGS="--bluestore_rocksdb_options avoid_flush_during_recovery=1" 
bin/ceph-bluestore-tool --path  repair



To be applied for "non-expanded" OSDs where repair didn't pass.

Please collect a log during repair...
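
One rough way to capture it (the OSD path and log file below are placeholders; 
any config option can be injected through CEPH_ARGS the same way):

CEPH_ARGS="--bluestore_rocksdb_options avoid_flush_during_recovery=1 --debug_bluestore 20 --debug_bluefs 20 --log_file /tmp/osd-repair.log" \
  bin/ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-1 repair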


Thanks,

Igor

On 10/2/2018 4:32 PM, Sergey Malinin wrote:

Repair goes through only when LVM volume has been expanded, otherwise it fails 
with enospc as well as any other operation. However, expanding the volume 
immediately renders bluefs unmountable with IO error.
2 of 3 OSDs got bluefs log currupted (bluestore tool segfaults at the very end 
of bluefs-log-dump), I'm not sure whether corruption occurred before or after 
volume expansion.



On 2.10.2018, at 16:07, Igor Fedotov  wrote:

You mentioned repair had worked before, is that correct? What's the difference 
now except the applied patch? Different OSD? Anything else?


On 10/2/2018 3:52 PM, Sergey Malinin wrote:


It didn't work, emailed logs to you.



On 2.10.2018, at 14:43, Igor Fedotov  wrote:

The major change is in get_bluefs_rebalance_txn function, it lacked 
bluefs_rebalance_txn assignment..



On 10/2/2018 2:40 PM, Sergey Malinin wrote:

PR doesn't seem to have changed since yesterday. Am I missing something?



On 2.10.2018, at 14:15, Igor Fedotov  wrote:

Please update the patch from the PR - it didn't update bluefs extents list 
before.

Also please set debug bluestore 20 when re-running repair and collect the log.

If repair doesn't help - would you send repair and startup logs directly to me 
as I have some issues accessing ceph-post-file uploads.


Thanks,

Igor


On 10/2/2018 11:39 AM, Sergey Malinin wrote:

Yes, I did repair all OSDs and it finished with 'repair success'. I backed up 
OSDs so now I have more room to play.
I posted log files using ceph-post-file with the following IDs:
4af9cc4d-9c73-41c9-9c38-eb6c551047a0
20df7df5-f0c9-4186-aa21-4e5c0172cd93



On 2.10.2018, at 11:26, Igor Fedotov  wrote:

You did repair for any of this OSDs, didn't you? For all of them?


Would you please provide the log for both types (failed on mount and failed 
with enospc) of failing OSDs. Prior to collecting please remove existing ones 
prior and set debug bluestore to 20.



On 10/2/2018 2:16 AM, Sergey Malinin wrote:

I was able to apply patches to mimic, but nothing changed. One osd that I had 
space expanded on fails with bluefs mount IO error, others keep failing with 
enospc.



On 1.10.2018, at 19:26, Igor Fedotov  wrote:

So you should call repair which rebalances (i.e. allocates additional space) 
BlueFS space. Hence allowing OSD to start.

Thanks,

Igor


On 10/1/2018 7:22 PM, Igor Fedotov wrote:

Not exactly. The rebalancing from this kv_sync_thread still might be deferred 
due to the nature of this thread (haven't 100% sure though).

Here is my PR showing the idea (still untested and perhaps unfinished!!!)

https://github.com/ceph/ceph/pull/24353


Igor


On 10/1/2018 7:07 PM, Sergey Malinin wrote:

Can you please confirm whether I got this right:

--- BlueStore.cc.bak2018-10-01 18:54:45.096836419 +0300
+++ BlueStore.cc2018-10-01 19:01:35.937623861 +0300
@@ -9049,22 +9049,17 @@
 throttle_bytes.put(costs);
   PExtentVector bluefs_gift_extents;
-  if (bluefs &&
-  after_flush - bluefs_last_balance >
-  cct->_conf->bluestore_bluefs_balance_interval) {
-bluefs_last_balance = after_flush;
-int r = _balance_bluefs_freespace(_gift_extents);
-assert(r >= 0);
-if (r > 0) {
-  for (auto& p : bluefs_gift_extents) {
-bluefs_extents.insert(p.offset, p.length);
-  }
-  bufferlist bl;
-  encode(bluefs_extents, bl);
-  dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
-   << bluefs_extents << std::dec << dendl;
-  synct->set(PREFIX_SUPER, "bluefs_extents", bl);
+  int r = _balance_bluefs_freespace(_gift_extents);
+  ceph_assert(r >= 0);
+  if (r > 0) {
+for (auto& p : bluefs_gift_extents) {
+  bluefs_extents.insert(p.offset, p.length);
   }
+bufferlist bl;
+encode(bluefs_extents, bl);
+dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
+ << bluefs_extents << std::dec << dendl;
+synct->set(PREFIX_SUPER, "bluefs_extents", bl);
 }
   // cleanup sync deferred keys


On 1.10.2018, at 18:39, Igor Fedotov  wrote:

So you have just a single main device per OSD

Then bluestore-tool wouldn't help, it's unable to expand BlueFS partition at 
main device, standalone devices are supported only.

Given that you're able to rebuild the code I can suggest to make a patch that 
triggers BlueFS rebalance (see code snippet below) on repairing.
  PExtentVector bluefs_gift_extents;
  int r = _balance_bluefs_freespace(_gift_extents);
  ceph_assert(r >= 0);
  if (r > 0) {
for (auto& p 

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Sergey Malinin
Sent a download link by email. Verbosity=10, over 900 MB uncompressed.


> On 2.10.2018, at 16:52, Igor Fedotov  wrote:
> 
> May I have a repair log for that "already expanded" OSD?
> 
> 
> On 10/2/2018 4:32 PM, Sergey Malinin wrote:
>> Repair goes through only when LVM volume has been expanded, otherwise it 
>> fails with enospc as well as any other operation. However, expanding the 
>> volume immediately renders bluefs unmountable with IO error.
>> 2 of 3 OSDs got bluefs log currupted (bluestore tool segfaults at the very 
>> end of bluefs-log-dump), I'm not sure whether corruption occurred before or 
>> after volume expansion.
>> 
>> 
>>> On 2.10.2018, at 16:07, Igor Fedotov  wrote:
>>> 
>>> You mentioned repair had worked before, is that correct? What's the 
>>> difference now except the applied patch? Different OSD? Anything else?
>>> 
>>> 
>>> On 10/2/2018 3:52 PM, Sergey Malinin wrote:
>>> 
 It didn't work, emailed logs to you.
 
 
> On 2.10.2018, at 14:43, Igor Fedotov  wrote:
> 
> The major change is in get_bluefs_rebalance_txn function, it lacked 
> bluefs_rebalance_txn assignment..
> 
> 
> 
> On 10/2/2018 2:40 PM, Sergey Malinin wrote:
>> PR doesn't seem to have changed since yesterday. Am I missing something?
>> 
>> 
>>> On 2.10.2018, at 14:15, Igor Fedotov  wrote:
>>> 
>>> Please update the patch from the PR - it didn't update bluefs extents 
>>> list before.
>>> 
>>> Also please set debug bluestore 20 when re-running repair and collect 
>>> the log.
>>> 
>>> If repair doesn't help - would you send repair and startup logs 
>>> directly to me as I have some issues accessing ceph-post-file uploads.
>>> 
>>> 
>>> Thanks,
>>> 
>>> Igor
>>> 
>>> 
>>> On 10/2/2018 11:39 AM, Sergey Malinin wrote:
 Yes, I did repair all OSDs and it finished with 'repair success'. I 
 backed up OSDs so now I have more room to play.
 I posted log files using ceph-post-file with the following IDs:
 4af9cc4d-9c73-41c9-9c38-eb6c551047a0
 20df7df5-f0c9-4186-aa21-4e5c0172cd93
 
 
> On 2.10.2018, at 11:26, Igor Fedotov  wrote:
> 
> You did repair for any of this OSDs, didn't you? For all of them?
> 
> 
> Would you please provide the log for both types (failed on mount and 
> failed with enospc) of failing OSDs. Prior to collecting please 
> remove existing ones prior and set debug bluestore to 20.
> 
> 
> 
> On 10/2/2018 2:16 AM, Sergey Malinin wrote:
>> I was able to apply patches to mimic, but nothing changed. One osd 
>> that I had space expanded on fails with bluefs mount IO error, 
>> others keep failing with enospc.
>> 
>> 
>>> On 1.10.2018, at 19:26, Igor Fedotov  wrote:
>>> 
>>> So you should call repair which rebalances (i.e. allocates 
>>> additional space) BlueFS space. Hence allowing OSD to start.
>>> 
>>> Thanks,
>>> 
>>> Igor
>>> 
>>> 
>>> On 10/1/2018 7:22 PM, Igor Fedotov wrote:
 Not exactly. The rebalancing from this kv_sync_thread still might 
 be deferred due to the nature of this thread (haven't 100% sure 
 though).
 
 Here is my PR showing the idea (still untested and perhaps 
 unfinished!!!)
 
 https://github.com/ceph/ceph/pull/24353
 
 
 Igor
 
 
 On 10/1/2018 7:07 PM, Sergey Malinin wrote:
> Can you please confirm whether I got this right:
> 
> --- BlueStore.cc.bak2018-10-01 18:54:45.096836419 +0300
> +++ BlueStore.cc2018-10-01 19:01:35.937623861 +0300
> @@ -9049,22 +9049,17 @@
> throttle_bytes.put(costs);
>   PExtentVector bluefs_gift_extents;
> -  if (bluefs &&
> -  after_flush - bluefs_last_balance >
> -  cct->_conf->bluestore_bluefs_balance_interval) {
> -bluefs_last_balance = after_flush;
> -int r = _balance_bluefs_freespace(_gift_extents);
> -assert(r >= 0);
> -if (r > 0) {
> -  for (auto& p : bluefs_gift_extents) {
> -bluefs_extents.insert(p.offset, p.length);
> -  }
> -  bufferlist bl;
> -  encode(bluefs_extents, bl);
> -  dout(10) << __func__ << " bluefs_extents now 0x" << 
> std::hex
> -   << bluefs_extents << std::dec << dendl;
> -  synct->set(PREFIX_SUPER, "bluefs_extents", bl);
> +  int r = _balance_bluefs_freespace(_gift_extents);
> + 

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Alfredo Deza
On Tue, Oct 2, 2018 at 10:23 AM Alex Litvak
 wrote:
>
> Igor,
>
> Thank you for your reply.  So what you are saying there are really no
> sensible space requirements for a collocated device? Even if I setup 30
> GB for DB (which I really wouldn't like to do due to a space waste
> considerations ) there is a chance that if this space feels up I will be
> in the same trouble under some heavy load scenario?

We do have good sizing recommendations for a separate block.db
partition. Roughly, it shouldn't be less than 4% of the size of the data
device.

http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#sizing
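
As a quick worked example against the 800 GB SSD mentioned earlier in this
thread: 800 GB * 0.04 = 32 GB for block.db, so the 30 GB figure being
discussed sits right around that guideline.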

>
> On 10/2/2018 9:15 AM, Igor Fedotov wrote:
> > Even with a single device bluestore has a sort of implicit "BlueFS
> > partition" where DB is stored.  And it dynamically adjusts (rebalances)
> > the space for that partition in background. Unfortunately it might
> > perform that "too lazy" and hence under some heavy load it might end-up
> > with the lack of space for that partition. While main device still has
> > plenty of free space.
> >
> > I'm planning to refactor this re-balancing procedure in the future to
> > eliminate the root cause.
> >
> >
> > Thanks,
> >
> > Igor
> >
> >
> > On 10/2/2018 5:04 PM, Alex Litvak wrote:
> >> I am sorry for interrupting the thread, but my understanding always
> >> was that blue store on the single device should not care of the DB
> >> size, i.e. it would use the data part for all operations if DB is
> >> full.  And if it is not true, what would be sensible defaults on 800
> >> GB SSD?  I used ceph-ansible to build my cluster with system defaults
> >> and from I reading in this thread doesn't give me a good feeling at
> >> all. Document ion on the topic is very sketchy and online posts
> >> contradict each other some times.
> >>
> >> Thank you in advance,
> >>
> >> On 10/2/2018 8:52 AM, Igor Fedotov wrote:
> >>> May I have a repair log for that "already expanded" OSD?
> >>>
> >>>
> >>> On 10/2/2018 4:32 PM, Sergey Malinin wrote:
>  Repair goes through only when LVM volume has been expanded,
>  otherwise it fails with enospc as well as any other operation.
>  However, expanding the volume immediately renders bluefs unmountable
>  with IO error.
>  2 of 3 OSDs got bluefs log currupted (bluestore tool segfaults at
>  the very end of bluefs-log-dump), I'm not sure whether corruption
>  occurred before or after volume expansion.
> 
> 
> > On 2.10.2018, at 16:07, Igor Fedotov  wrote:
> >
> > You mentioned repair had worked before, is that correct? What's the
> > difference now except the applied patch? Different OSD? Anything else?
> >
> >
> > On 10/2/2018 3:52 PM, Sergey Malinin wrote:
> >
> >> It didn't work, emailed logs to you.
> >>
> >>
> >>> On 2.10.2018, at 14:43, Igor Fedotov  wrote:
> >>>
> >>> The major change is in get_bluefs_rebalance_txn function, it
> >>> lacked bluefs_rebalance_txn assignment..
> >>>
> >>>
> >>>
> >>> On 10/2/2018 2:40 PM, Sergey Malinin wrote:
>  PR doesn't seem to have changed since yesterday. Am I missing
>  something?
> 
> 
> > On 2.10.2018, at 14:15, Igor Fedotov  wrote:
> >
> > Please update the patch from the PR - it didn't update bluefs
> > extents list before.
> >
> > Also please set debug bluestore 20 when re-running repair and
> > collect the log.
> >
> > If repair doesn't help - would you send repair and startup logs
> > directly to me as I have some issues accessing ceph-post-file
> > uploads.
> >
> >
> > Thanks,
> >
> > Igor
> >
> >
> > On 10/2/2018 11:39 AM, Sergey Malinin wrote:
> >> Yes, I did repair all OSDs and it finished with 'repair
> >> success'. I backed up OSDs so now I have more room to play.
> >> I posted log files using ceph-post-file with the following IDs:
> >> 4af9cc4d-9c73-41c9-9c38-eb6c551047a0
> >> 20df7df5-f0c9-4186-aa21-4e5c0172cd93
> >>
> >>
> >>> On 2.10.2018, at 11:26, Igor Fedotov  wrote:
> >>>
> >>> You did repair for any of this OSDs, didn't you? For all of
> >>> them?
> >>>
> >>>
> >>> Would you please provide the log for both types (failed on
> >>> mount and failed with enospc) of failing OSDs. Prior to
> >>> collecting please remove existing ones prior and set debug
> >>> bluestore to 20.
> >>>
> >>>
> >>>
> >>> On 10/2/2018 2:16 AM, Sergey Malinin wrote:
>  I was able to apply patches to mimic, but nothing changed.
>  One osd that I had space expanded on fails with bluefs mount
>  IO error, others keep failing with enospc.
> 
> 
> > On 1.10.2018, at 

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Alex Litvak

Igor,

Thank you for your reply. So what you are saying is that there are really no 
sensible space requirements for a collocated device? Even if I set up 30 
GB for the DB (which I really wouldn't like to do due to space waste 
considerations), there is a chance that if this space fills up I will be 
in the same trouble under some heavy load scenario?


On 10/2/2018 9:15 AM, Igor Fedotov wrote:
Even with a single device bluestore has a sort of implicit "BlueFS 
partition" where DB is stored.  And it dynamically adjusts (rebalances) 
the space for that partition in background. Unfortunately it might 
perform that "too lazy" and hence under some heavy load it might end-up 
with the lack of space for that partition. While main device still has 
plenty of free space.


I'm planning to refactor this re-balancing procedure in the future to 
eliminate the root cause.



Thanks,

Igor


On 10/2/2018 5:04 PM, Alex Litvak wrote:
I am sorry for interrupting the thread, but my understanding always 
was that blue store on the single device should not care of the DB 
size, i.e. it would use the data part for all operations if DB is 
full.  And if it is not true, what would be sensible defaults on 800 
GB SSD?  I used ceph-ansible to build my cluster with system defaults 
and from I reading in this thread doesn't give me a good feeling at 
all. Document ion on the topic is very sketchy and online posts 
contradict each other some times.


Thank you in advance,

On 10/2/2018 8:52 AM, Igor Fedotov wrote:

May I have a repair log for that "already expanded" OSD?


On 10/2/2018 4:32 PM, Sergey Malinin wrote:
Repair goes through only when LVM volume has been expanded, 
otherwise it fails with enospc as well as any other operation. 
However, expanding the volume immediately renders bluefs unmountable 
with IO error.
2 of 3 OSDs got bluefs log currupted (bluestore tool segfaults at 
the very end of bluefs-log-dump), I'm not sure whether corruption 
occurred before or after volume expansion.




On 2.10.2018, at 16:07, Igor Fedotov  wrote:

You mentioned repair had worked before, is that correct? What's the 
difference now except the applied patch? Different OSD? Anything else?



On 10/2/2018 3:52 PM, Sergey Malinin wrote:


It didn't work, emailed logs to you.



On 2.10.2018, at 14:43, Igor Fedotov  wrote:

The major change is in get_bluefs_rebalance_txn function, it 
lacked bluefs_rebalance_txn assignment..




On 10/2/2018 2:40 PM, Sergey Malinin wrote:
PR doesn't seem to have changed since yesterday. Am I missing 
something?




On 2.10.2018, at 14:15, Igor Fedotov  wrote:

Please update the patch from the PR - it didn't update bluefs 
extents list before.


Also please set debug bluestore 20 when re-running repair and 
collect the log.


If repair doesn't help - would you send repair and startup logs 
directly to me as I have some issues accessing ceph-post-file 
uploads.



Thanks,

Igor


On 10/2/2018 11:39 AM, Sergey Malinin wrote:
Yes, I did repair all OSDs and it finished with 'repair 
success'. I backed up OSDs so now I have more room to play.

I posted log files using ceph-post-file with the following IDs:
4af9cc4d-9c73-41c9-9c38-eb6c551047a0
20df7df5-f0c9-4186-aa21-4e5c0172cd93



On 2.10.2018, at 11:26, Igor Fedotov  wrote:

You did repair for any of this OSDs, didn't you? For all of 
them?



Would you please provide the log for both types (failed on 
mount and failed with enospc) of failing OSDs. Prior to 
collecting please remove existing ones prior and set debug 
bluestore to 20.




On 10/2/2018 2:16 AM, Sergey Malinin wrote:
I was able to apply patches to mimic, but nothing changed. 
One osd that I had space expanded on fails with bluefs mount 
IO error, others keep failing with enospc.




On 1.10.2018, at 19:26, Igor Fedotov  wrote:

So you should call repair which rebalances (i.e. allocates 
additional space) BlueFS space. Hence allowing OSD to start.


Thanks,

Igor


On 10/1/2018 7:22 PM, Igor Fedotov wrote:
Not exactly. The rebalancing from this kv_sync_thread 
still might be deferred due to the nature of this thread 
(haven't 100% sure though).


Here is my PR showing the idea (still untested and perhaps 
unfinished!!!)


https://github.com/ceph/ceph/pull/24353


Igor


On 10/1/2018 7:07 PM, Sergey Malinin wrote:

Can you please confirm whether I got this right:

--- BlueStore.cc.bak    2018-10-01 18:54:45.096836419 +0300
+++ BlueStore.cc    2018-10-01 19:01:35.937623861 +0300
@@ -9049,22 +9049,17 @@
 throttle_bytes.put(costs);
   PExtentVector bluefs_gift_extents;
-  if (bluefs &&
-  after_flush - bluefs_last_balance >
- cct->_conf->bluestore_bluefs_balance_interval) {
-    bluefs_last_balance = after_flush;
-    int r = 
_balance_bluefs_freespace(_gift_extents);

-    assert(r >= 0);
-    if (r > 0) {
-  for (auto& p : bluefs_gift_extents) {
-    bluefs_extents.insert(p.offset, p.length);
-  }
-  bufferlist bl;
-  encode(bluefs_extents, bl);
-  dout(10) << 

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Igor Fedotov
Even with a single device, bluestore has a sort of implicit "BlueFS 
partition" where the DB is stored, and it dynamically adjusts (rebalances) 
the space for that partition in the background. Unfortunately it might 
perform that rebalancing too lazily, and hence under some heavy load it 
might end up lacking space for that partition while the main device still 
has plenty of free space.
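
On a running OSD you can get a rough idea of how big that implicit partition 
currently is from the bluefs perf counters, something like this (counter names 
can differ slightly between releases):

ceph daemon osd.1 perf dump bluefs | grep -E 'total_bytes|used_bytes'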


I'm planning to refactor this re-balancing procedure in the future to 
eliminate the root cause.



Thanks,

Igor


On 10/2/2018 5:04 PM, Alex Litvak wrote:
I am sorry for interrupting the thread, but my understanding always 
was that blue store on the single device should not care of the DB 
size, i.e. it would use the data part for all operations if DB is 
full.  And if it is not true, what would be sensible defaults on 800 
GB SSD?  I used ceph-ansible to build my cluster with system defaults 
and from I reading in this thread doesn't give me a good feeling at 
all. Document ion on the topic is very sketchy and online posts 
contradict each other some times.


Thank you in advance,

On 10/2/2018 8:52 AM, Igor Fedotov wrote:

May I have a repair log for that "already expanded" OSD?


On 10/2/2018 4:32 PM, Sergey Malinin wrote:
Repair goes through only when LVM volume has been expanded, 
otherwise it fails with enospc as well as any other operation. 
However, expanding the volume immediately renders bluefs unmountable 
with IO error.
2 of 3 OSDs got bluefs log currupted (bluestore tool segfaults at 
the very end of bluefs-log-dump), I'm not sure whether corruption 
occurred before or after volume expansion.




On 2.10.2018, at 16:07, Igor Fedotov  wrote:

You mentioned repair had worked before, is that correct? What's the 
difference now except the applied patch? Different OSD? Anything else?



On 10/2/2018 3:52 PM, Sergey Malinin wrote:


It didn't work, emailed logs to you.



On 2.10.2018, at 14:43, Igor Fedotov  wrote:

The major change is in get_bluefs_rebalance_txn function, it 
lacked bluefs_rebalance_txn assignment..




On 10/2/2018 2:40 PM, Sergey Malinin wrote:
PR doesn't seem to have changed since yesterday. Am I missing 
something?




On 2.10.2018, at 14:15, Igor Fedotov  wrote:

Please update the patch from the PR - it didn't update bluefs 
extents list before.


Also please set debug bluestore 20 when re-running repair and 
collect the log.


If repair doesn't help - would you send repair and startup logs 
directly to me as I have some issues accessing ceph-post-file 
uploads.



Thanks,

Igor


On 10/2/2018 11:39 AM, Sergey Malinin wrote:
Yes, I did repair all OSDs and it finished with 'repair 
success'. I backed up OSDs so now I have more room to play.

I posted log files using ceph-post-file with the following IDs:
4af9cc4d-9c73-41c9-9c38-eb6c551047a0
20df7df5-f0c9-4186-aa21-4e5c0172cd93



On 2.10.2018, at 11:26, Igor Fedotov  wrote:

You did repair for any of this OSDs, didn't you? For all of 
them?



Would you please provide the log for both types (failed on 
mount and failed with enospc) of failing OSDs. Prior to 
collecting please remove existing ones prior and set debug 
bluestore to 20.




On 10/2/2018 2:16 AM, Sergey Malinin wrote:
I was able to apply patches to mimic, but nothing changed. 
One osd that I had space expanded on fails with bluefs mount 
IO error, others keep failing with enospc.




On 1.10.2018, at 19:26, Igor Fedotov  wrote:

So you should call repair which rebalances (i.e. allocates 
additional space) BlueFS space. Hence allowing OSD to start.


Thanks,

Igor


On 10/1/2018 7:22 PM, Igor Fedotov wrote:
Not exactly. The rebalancing from this kv_sync_thread 
still might be deferred due to the nature of this thread 
(haven't 100% sure though).


Here is my PR showing the idea (still untested and perhaps 
unfinished!!!)


https://github.com/ceph/ceph/pull/24353


Igor


On 10/1/2018 7:07 PM, Sergey Malinin wrote:

Can you please confirm whether I got this right:

--- BlueStore.cc.bak    2018-10-01 18:54:45.096836419 +0300
+++ BlueStore.cc    2018-10-01 19:01:35.937623861 +0300
@@ -9049,22 +9049,17 @@
 throttle_bytes.put(costs);
   PExtentVector bluefs_gift_extents;
-  if (bluefs &&
-  after_flush - bluefs_last_balance >
- cct->_conf->bluestore_bluefs_balance_interval) {
-    bluefs_last_balance = after_flush;
-    int r = 
_balance_bluefs_freespace(_gift_extents);

-    assert(r >= 0);
-    if (r > 0) {
-  for (auto& p : bluefs_gift_extents) {
-    bluefs_extents.insert(p.offset, p.length);
-  }
-  bufferlist bl;
-  encode(bluefs_extents, bl);
-  dout(10) << __func__ << " bluefs_extents now 0x" 
<< std::hex

-   << bluefs_extents << std::dec << dendl;
-  synct->set(PREFIX_SUPER, "bluefs_extents", bl);
+  int r = 
_balance_bluefs_freespace(_gift_extents);

+  ceph_assert(r >= 0);
+  if (r > 0) {
+    for (auto& p : bluefs_gift_extents) {
+  bluefs_extents.insert(p.offset, p.length);
   }
+    bufferlist bl;
+    

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Alex Litvak
I am sorry for interrupting the thread, but my understanding always was 
that bluestore on a single device should not care about the DB size, 
i.e. it would use the data part for all operations if the DB is full. And 
if that is not true, what would be sensible defaults on an 800 GB SSD? I 
used ceph-ansible to build my cluster with system defaults, and what I am 
reading in this thread doesn't give me a good feeling at all. Documentation 
on the topic is very sketchy and online posts sometimes contradict each 
other.


Thank you in advance,

On 10/2/2018 8:52 AM, Igor Fedotov wrote:

May I have a repair log for that "already expanded" OSD?


On 10/2/2018 4:32 PM, Sergey Malinin wrote:
Repair goes through only when LVM volume has been expanded, otherwise 
it fails with enospc as well as any other operation. However, 
expanding the volume immediately renders bluefs unmountable with IO 
error.
2 of 3 OSDs got bluefs log currupted (bluestore tool segfaults at the 
very end of bluefs-log-dump), I'm not sure whether corruption occurred 
before or after volume expansion.



On 2.10.2018, at 16:07, Igor Fedotov 
 wrote:


You mentioned repair had worked before, is that correct? What's the 
difference now except the applied patch? Different OSD? Anything else?



On 10/2/2018 3:52 PM, Sergey Malinin wrote:


It didn't work, emailed logs to you.


On 2.10.2018, at 14:43, Igor Fedotov 
 wrote:


The major change is in get_bluefs_rebalance_txn function, it lacked 
bluefs_rebalance_txn assignment..




On 10/2/2018 2:40 PM, Sergey Malinin wrote:
PR doesn't seem to have changed since yesterday. Am I missing 
something?



On 2.10.2018, at 14:15, Igor Fedotov 
 wrote:


Please update the patch from the PR - it didn't update bluefs 
extents list before.


Also please set debug bluestore 20 when re-running repair and 
collect the log.


If repair doesn't help - would you send repair and startup logs 
directly to me as I have some issues accessing ceph-post-file 
uploads.



Thanks,

Igor


On 10/2/2018 11:39 AM, Sergey Malinin wrote:
Yes, I did repair all OSDs and it finished with 'repair 
success'. I backed up OSDs so now I have more room to play.

I posted log files using ceph-post-file with the following IDs:
4af9cc4d-9c73-41c9-9c38-eb6c551047a0
20df7df5-f0c9-4186-aa21-4e5c0172cd93


On 2.10.2018, at 11:26, Igor Fedotov 
 wrote:


You did repair for any of this OSDs, didn't you? For all of them?


Would you please provide the log for both types (failed on 
mount and failed with enospc) of failing OSDs. Prior to 
collecting please remove existing ones prior and set debug 
bluestore to 20.




On 10/2/2018 2:16 AM, Sergey Malinin wrote:
I was able to apply patches to mimic, but nothing changed. One 
osd that I had space expanded on fails with bluefs mount IO 
error, others keep failing with enospc.



On 1.10.2018, at 19:26, Igor Fedotov 
 wrote:


So you should call repair which rebalances (i.e. allocates 
additional space) BlueFS space. Hence allowing OSD to start.


Thanks,

Igor


On 10/1/2018 7:22 PM, Igor Fedotov wrote:
Not exactly. The rebalancing from this kv_sync_thread still 
might be deferred due to the nature of this thread (haven't 
100% sure though).


Here is my PR showing the idea (still untested and perhaps 
unfinished!!!)


https://github.com/ceph/ceph/pull/24353


Igor


On 10/1/2018 7:07 PM, Sergey Malinin wrote:

Can you please confirm whether I got this right:

--- BlueStore.cc.bak    2018-10-01 18:54:45.096836419 +0300
+++ BlueStore.cc    2018-10-01 19:01:35.937623861 +0300
@@ -9049,22 +9049,17 @@
 throttle_bytes.put(costs);
   PExtentVector bluefs_gift_extents;
-  if (bluefs &&
-  after_flush - bluefs_last_balance >
-  cct->_conf->bluestore_bluefs_balance_interval) {
-    bluefs_last_balance = after_flush;
-    int r = _balance_bluefs_freespace(_gift_extents);
-    assert(r >= 0);
-    if (r > 0) {
-  for (auto& p : bluefs_gift_extents) {
-    bluefs_extents.insert(p.offset, p.length);
-  }
-  bufferlist bl;
-  encode(bluefs_extents, bl);
-  dout(10) << __func__ << " bluefs_extents now 0x" << 
std::hex

-   << bluefs_extents << std::dec << dendl;
-  synct->set(PREFIX_SUPER, "bluefs_extents", bl);
+  int r = 
_balance_bluefs_freespace(_gift_extents);

+  ceph_assert(r >= 0);
+  if (r > 0) {
+    for (auto& p : bluefs_gift_extents) {
+  bluefs_extents.insert(p.offset, p.length);
   }
+    bufferlist bl;
+    encode(bluefs_extents, bl);
+    dout(10) << __func__ << " bluefs_extents now 0x" << 
std::hex

+ << bluefs_extents << std::dec << dendl;
+    synct->set(PREFIX_SUPER, "bluefs_extents", bl);
 }
   // cleanup sync deferred keys

On 1.10.2018, at 18:39, Igor Fedotov 
 wrote:


So you have just a single main device per OSD

Then bluestore-tool wouldn't help, it's unable to expand 
BlueFS partition at main device, standalone devices are 
supported only.


Given that you're able to rebuild the code I 

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Igor Fedotov

May I have a repair log for that "already expanded" OSD?


On 10/2/2018 4:32 PM, Sergey Malinin wrote:

Repair goes through only when LVM volume has been expanded, otherwise it fails 
with enospc as well as any other operation. However, expanding the volume 
immediately renders bluefs unmountable with IO error.
2 of 3 OSDs got bluefs log currupted (bluestore tool segfaults at the very end 
of bluefs-log-dump), I'm not sure whether corruption occurred before or after 
volume expansion.



On 2.10.2018, at 16:07, Igor Fedotov  wrote:

You mentioned repair had worked before, is that correct? What's the difference 
now except the applied patch? Different OSD? Anything else?


On 10/2/2018 3:52 PM, Sergey Malinin wrote:


It didn't work, emailed logs to you.



On 2.10.2018, at 14:43, Igor Fedotov  wrote:

The major change is in get_bluefs_rebalance_txn function, it lacked 
bluefs_rebalance_txn assignment..



On 10/2/2018 2:40 PM, Sergey Malinin wrote:

PR doesn't seem to have changed since yesterday. Am I missing something?



On 2.10.2018, at 14:15, Igor Fedotov  wrote:

Please update the patch from the PR - it didn't update bluefs extents list 
before.

Also please set debug bluestore 20 when re-running repair and collect the log.

If repair doesn't help - would you send repair and startup logs directly to me 
as I have some issues accessing ceph-post-file uploads.


Thanks,

Igor


On 10/2/2018 11:39 AM, Sergey Malinin wrote:

Yes, I did repair all OSDs and it finished with 'repair success'. I backed up 
OSDs so now I have more room to play.
I posted log files using ceph-post-file with the following IDs:
4af9cc4d-9c73-41c9-9c38-eb6c551047a0
20df7df5-f0c9-4186-aa21-4e5c0172cd93



On 2.10.2018, at 11:26, Igor Fedotov  wrote:

You did repair for any of this OSDs, didn't you? For all of them?


Would you please provide the log for both types (failed on mount and failed 
with enospc) of failing OSDs. Prior to collecting please remove existing ones 
prior and set debug bluestore to 20.



On 10/2/2018 2:16 AM, Sergey Malinin wrote:

I was able to apply patches to mimic, but nothing changed. One osd that I had 
space expanded on fails with bluefs mount IO error, others keep failing with 
enospc.



On 1.10.2018, at 19:26, Igor Fedotov  wrote:

So you should call repair which rebalances (i.e. allocates additional space) 
BlueFS space. Hence allowing OSD to start.

Thanks,

Igor


On 10/1/2018 7:22 PM, Igor Fedotov wrote:

Not exactly. The rebalancing from this kv_sync_thread still might be deferred 
due to the nature of this thread (haven't 100% sure though).

Here is my PR showing the idea (still untested and perhaps unfinished!!!)

https://github.com/ceph/ceph/pull/24353


Igor


On 10/1/2018 7:07 PM, Sergey Malinin wrote:

Can you please confirm whether I got this right:

--- BlueStore.cc.bak2018-10-01 18:54:45.096836419 +0300
+++ BlueStore.cc2018-10-01 19:01:35.937623861 +0300
@@ -9049,22 +9049,17 @@
 throttle_bytes.put(costs);
   PExtentVector bluefs_gift_extents;
-  if (bluefs &&
-  after_flush - bluefs_last_balance >
-  cct->_conf->bluestore_bluefs_balance_interval) {
-bluefs_last_balance = after_flush;
-int r = _balance_bluefs_freespace(_gift_extents);
-assert(r >= 0);
-if (r > 0) {
-  for (auto& p : bluefs_gift_extents) {
-bluefs_extents.insert(p.offset, p.length);
-  }
-  bufferlist bl;
-  encode(bluefs_extents, bl);
-  dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
-   << bluefs_extents << std::dec << dendl;
-  synct->set(PREFIX_SUPER, "bluefs_extents", bl);
+  int r = _balance_bluefs_freespace(_gift_extents);
+  ceph_assert(r >= 0);
+  if (r > 0) {
+for (auto& p : bluefs_gift_extents) {
+  bluefs_extents.insert(p.offset, p.length);
   }
+bufferlist bl;
+encode(bluefs_extents, bl);
+dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
+ << bluefs_extents << std::dec << dendl;
+synct->set(PREFIX_SUPER, "bluefs_extents", bl);
 }
   // cleanup sync deferred keys


On 1.10.2018, at 18:39, Igor Fedotov  wrote:

So you have just a single main device per OSD

Then bluestore-tool wouldn't help, it's unable to expand BlueFS partition at 
main device, standalone devices are supported only.

Given that you're able to rebuild the code I can suggest to make a patch that 
triggers BlueFS rebalance (see code snippet below) on repairing.
  PExtentVector bluefs_gift_extents;
  int r = _balance_bluefs_freespace(&bluefs_gift_extents);
  ceph_assert(r >= 0);
  if (r > 0) {
    for (auto& p : bluefs_gift_extents) {
      bluefs_extents.insert(p.offset, p.length);
    }
    bufferlist bl;
    encode(bluefs_extents, bl);
    dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
             << bluefs_extents << std::dec << dendl;
    synct->set(PREFIX_SUPER, "bluefs_extents", bl);
  }

If it waits 

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Sergey Malinin
Repair goes through only when the LVM volume has been expanded; otherwise it 
fails with enospc, as does every other operation. However, expanding the volume 
immediately renders bluefs unmountable with an I/O error.
2 of 3 OSDs got their bluefs log corrupted (the bluestore tool segfaults at the 
very end of bluefs-log-dump); I'm not sure whether the corruption occurred 
before or after volume expansion.
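
For reference, the dump was run roughly like this on the stopped OSD (the OSD 
path is illustrative), capturing stdout/stderr for inspection:

# dump the BlueFS journal of a stopped OSD and keep the output
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-2 bluefs-log-dump > /tmp/osd2-bluefs-log.txt 2>&1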


> On 2.10.2018, at 16:07, Igor Fedotov  wrote:
> 
> You mentioned repair had worked before, is that correct? What's the 
> difference now except the applied patch? Different OSD? Anything else?
> 
> 
> On 10/2/2018 3:52 PM, Sergey Malinin wrote:
> 
>> It didn't work, emailed logs to you.
>> 
>> 
>>> On 2.10.2018, at 14:43, Igor Fedotov  wrote:
>>> 
>>> The major change is in get_bluefs_rebalance_txn function, it lacked 
>>> bluefs_rebalance_txn assignment..
>>> 
>>> 
>>> 
>>> On 10/2/2018 2:40 PM, Sergey Malinin wrote:
 PR doesn't seem to have changed since yesterday. Am I missing something?
 
 
> On 2.10.2018, at 14:15, Igor Fedotov  wrote:
> 
> Please update the patch from the PR - it didn't update bluefs extents 
> list before.
> 
> Also please set debug bluestore 20 when re-running repair and collect the 
> log.
> 
> If repair doesn't help - would you send repair and startup logs directly 
> to me as I have some issues accessing ceph-post-file uploads.
> 
> 
> Thanks,
> 
> Igor
> 
> 
> On 10/2/2018 11:39 AM, Sergey Malinin wrote:
>> Yes, I did repair all OSDs and it finished with 'repair success'. I 
>> backed up OSDs so now I have more room to play.
>> I posted log files using ceph-post-file with the following IDs:
>> 4af9cc4d-9c73-41c9-9c38-eb6c551047a0
>> 20df7df5-f0c9-4186-aa21-4e5c0172cd93
>> 
>> 
>>> On 2.10.2018, at 11:26, Igor Fedotov  wrote:
>>> 
>>> You did repair for any of this OSDs, didn't you? For all of them?
>>> 
>>> 
>>> Would you please provide the log for both types (failed on mount and 
>>> failed with enospc) of failing OSDs. Prior to collecting please remove 
>>> existing ones prior and set debug bluestore to 20.
>>> 
>>> 
>>> 
>>> On 10/2/2018 2:16 AM, Sergey Malinin wrote:
 I was able to apply patches to mimic, but nothing changed. One osd 
 that I had space expanded on fails with bluefs mount IO error, others 
 keep failing with enospc.
 
 
> On 1.10.2018, at 19:26, Igor Fedotov  wrote:
> 
> So you should call repair which rebalances (i.e. allocates additional 
> space) BlueFS space. Hence allowing OSD to start.
> 
> Thanks,
> 
> Igor
> 
> 
> On 10/1/2018 7:22 PM, Igor Fedotov wrote:
>> Not exactly. The rebalancing from this kv_sync_thread still might be 
>> deferred due to the nature of this thread (haven't 100% sure though).
>> 
>> Here is my PR showing the idea (still untested and perhaps 
>> unfinished!!!)
>> 
>> https://github.com/ceph/ceph/pull/24353
>> 
>> 
>> Igor
>> 
>> 
>> On 10/1/2018 7:07 PM, Sergey Malinin wrote:
>>> Can you please confirm whether I got this right:
>>> 
>>> --- BlueStore.cc.bak2018-10-01 18:54:45.096836419 +0300
>>> +++ BlueStore.cc2018-10-01 19:01:35.937623861 +0300
>>> @@ -9049,22 +9049,17 @@
>>> throttle_bytes.put(costs);
>>>   PExtentVector bluefs_gift_extents;
>>> -  if (bluefs &&
>>> -  after_flush - bluefs_last_balance >
>>> -  cct->_conf->bluestore_bluefs_balance_interval) {
>>> -bluefs_last_balance = after_flush;
>>> -int r = _balance_bluefs_freespace(_gift_extents);
>>> -assert(r >= 0);
>>> -if (r > 0) {
>>> -  for (auto& p : bluefs_gift_extents) {
>>> -bluefs_extents.insert(p.offset, p.length);
>>> -  }
>>> -  bufferlist bl;
>>> -  encode(bluefs_extents, bl);
>>> -  dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
>>> -   << bluefs_extents << std::dec << dendl;
>>> -  synct->set(PREFIX_SUPER, "bluefs_extents", bl);
>>> +  int r = _balance_bluefs_freespace(_gift_extents);
>>> +  ceph_assert(r >= 0);
>>> +  if (r > 0) {
>>> +for (auto& p : bluefs_gift_extents) {
>>> +  bluefs_extents.insert(p.offset, p.length);
>>>   }
>>> +bufferlist bl;
>>> +encode(bluefs_extents, bl);
>>> +dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
>>> + << bluefs_extents << std::dec << dendl;
>>> +synct->set(PREFIX_SUPER, "bluefs_extents", bl);
>>> }

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Igor Fedotov
You mentioned repair had worked before, is that correct? What's the 
difference now except the applied patch? Different OSD? Anything else?



On 10/2/2018 3:52 PM, Sergey Malinin wrote:


It didn't work, emailed logs to you.



On 2.10.2018, at 14:43, Igor Fedotov  wrote:

The major change is in get_bluefs_rebalance_txn function, it lacked 
bluefs_rebalance_txn assignment..



On 10/2/2018 2:40 PM, Sergey Malinin wrote:

PR doesn't seem to have changed since yesterday. Am I missing something?



On 2.10.2018, at 14:15, Igor Fedotov  wrote:

Please update the patch from the PR - it didn't update bluefs extents list 
before.

Also please set debug bluestore 20 when re-running repair and collect the log.

If repair doesn't help - would you send repair and startup logs directly to me 
as I have some issues accessing ceph-post-file uploads.


Thanks,

Igor


On 10/2/2018 11:39 AM, Sergey Malinin wrote:

Yes, I did repair all OSDs and it finished with 'repair success'. I backed up 
OSDs so now I have more room to play.
I posted log files using ceph-post-file with the following IDs:
4af9cc4d-9c73-41c9-9c38-eb6c551047a0
20df7df5-f0c9-4186-aa21-4e5c0172cd93



On 2.10.2018, at 11:26, Igor Fedotov  wrote:

You did repair for any of this OSDs, didn't you? For all of them?


Would you please provide the log for both types (failed on mount and failed 
with enospc) of failing OSDs. Prior to collecting please remove existing ones 
prior and set debug bluestore to 20.



On 10/2/2018 2:16 AM, Sergey Malinin wrote:

I was able to apply patches to mimic, but nothing changed. One osd that I had 
space expanded on fails with bluefs mount IO error, others keep failing with 
enospc.



On 1.10.2018, at 19:26, Igor Fedotov  wrote:

So you should call repair which rebalances (i.e. allocates additional space) 
BlueFS space. Hence allowing OSD to start.

Thanks,

Igor


On 10/1/2018 7:22 PM, Igor Fedotov wrote:

Not exactly. The rebalancing from this kv_sync_thread still might be deferred 
due to the nature of this thread (haven't 100% sure though).

Here is my PR showing the idea (still untested and perhaps unfinished!!!)

https://github.com/ceph/ceph/pull/24353


Igor


On 10/1/2018 7:07 PM, Sergey Malinin wrote:

Can you please confirm whether I got this right:

--- BlueStore.cc.bak2018-10-01 18:54:45.096836419 +0300
+++ BlueStore.cc2018-10-01 19:01:35.937623861 +0300
@@ -9049,22 +9049,17 @@
 throttle_bytes.put(costs);
   PExtentVector bluefs_gift_extents;
-  if (bluefs &&
-  after_flush - bluefs_last_balance >
-  cct->_conf->bluestore_bluefs_balance_interval) {
-bluefs_last_balance = after_flush;
-int r = _balance_bluefs_freespace(_gift_extents);
-assert(r >= 0);
-if (r > 0) {
-  for (auto& p : bluefs_gift_extents) {
-bluefs_extents.insert(p.offset, p.length);
-  }
-  bufferlist bl;
-  encode(bluefs_extents, bl);
-  dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
-   << bluefs_extents << std::dec << dendl;
-  synct->set(PREFIX_SUPER, "bluefs_extents", bl);
+  int r = _balance_bluefs_freespace(_gift_extents);
+  ceph_assert(r >= 0);
+  if (r > 0) {
+for (auto& p : bluefs_gift_extents) {
+  bluefs_extents.insert(p.offset, p.length);
   }
+bufferlist bl;
+encode(bluefs_extents, bl);
+dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
+ << bluefs_extents << std::dec << dendl;
+synct->set(PREFIX_SUPER, "bluefs_extents", bl);
 }
   // cleanup sync deferred keys


On 1.10.2018, at 18:39, Igor Fedotov  wrote:

So you have just a single main device per OSD

Then bluestore-tool wouldn't help, it's unable to expand BlueFS partition at 
main device, standalone devices are supported only.

Given that you're able to rebuild the code I can suggest to make a patch that 
triggers BlueFS rebalance (see code snippet below) on repairing.
  PExtentVector bluefs_gift_extents;
  int r = _balance_bluefs_freespace(_gift_extents);
  ceph_assert(r >= 0);
  if (r > 0) {
for (auto& p : bluefs_gift_extents) {
  bluefs_extents.insert(p.offset, p.length);
}
bufferlist bl;
encode(bluefs_extents, bl);
dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
 << bluefs_extents << std::dec << dendl;
synct->set(PREFIX_SUPER, "bluefs_extents", bl);
  }

If it waits I can probably make a corresponding PR tomorrow.

Thanks,
Igor
On 10/1/2018 6:16 PM, Sergey Malinin wrote:

I have rebuilt the tool, but none of my OSDs no matter dead or alive have any 
symlinks other than 'block' pointing to LVM.
I adjusted main device size but it looks like it needs even more space for db 
compaction. After executing bluefs-bdev-expand OSD fails to start, however 
'fsck' and 'repair' commands finished successfully.

2018-10-01 18:02:39.755 7fc9226c6240  1 freelist init
2018-10-01 18:02:39.763 7fc9226c6240  1 

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Sergey Malinin
It didn't work, emailed logs to you.
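
One way to get logs that size through mail is to compress and split them, e.g. 
(filenames are illustrative):

# compress the log and cut it into mail-sized chunks
xz -9 -T0 osd-repair.log
split -b 20M osd-repair.log.xz osd-repair.log.xz.part-
# reassemble on the receiving side with: cat osd-repair.log.xz.part-* | xz -d > osd-repair.log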


> On 2.10.2018, at 14:43, Igor Fedotov  wrote:
> 
> The major change is in get_bluefs_rebalance_txn function, it lacked 
> bluefs_rebalance_txn assignment..
> 
> 
> 
> On 10/2/2018 2:40 PM, Sergey Malinin wrote:
>> PR doesn't seem to have changed since yesterday. Am I missing something?
>> 
>> 
>>> On 2.10.2018, at 14:15, Igor Fedotov  wrote:
>>> 
>>> Please update the patch from the PR - it didn't update bluefs extents list 
>>> before.
>>> 
>>> Also please set debug bluestore 20 when re-running repair and collect the 
>>> log.
>>> 
>>> If repair doesn't help - would you send repair and startup logs directly to 
>>> me as I have some issues accessing ceph-post-file uploads.
>>> 
>>> 
>>> Thanks,
>>> 
>>> Igor
>>> 
>>> 
>>> On 10/2/2018 11:39 AM, Sergey Malinin wrote:
 Yes, I did repair all OSDs and it finished with 'repair success'. I backed 
 up OSDs so now I have more room to play.
 I posted log files using ceph-post-file with the following IDs:
 4af9cc4d-9c73-41c9-9c38-eb6c551047a0
 20df7df5-f0c9-4186-aa21-4e5c0172cd93
 
 
> On 2.10.2018, at 11:26, Igor Fedotov  wrote:
> 
> You did repair for any of this OSDs, didn't you? For all of them?
> 
> 
> Would you please provide the log for both types (failed on mount and 
> failed with enospc) of failing OSDs. Prior to collecting please remove 
> existing ones prior and set debug bluestore to 20.
> 
> 
> 
> On 10/2/2018 2:16 AM, Sergey Malinin wrote:
>> I was able to apply patches to mimic, but nothing changed. One osd that 
>> I had space expanded on fails with bluefs mount IO error, others keep 
>> failing with enospc.
>> 
>> 
>>> On 1.10.2018, at 19:26, Igor Fedotov  wrote:
>>> 
>>> So you should call repair which rebalances (i.e. allocates additional 
>>> space) BlueFS space. Hence allowing OSD to start.
>>> 
>>> Thanks,
>>> 
>>> Igor
>>> 
>>> 
>>> On 10/1/2018 7:22 PM, Igor Fedotov wrote:
 Not exactly. The rebalancing from this kv_sync_thread still might be 
 deferred due to the nature of this thread (haven't 100% sure though).
 
 Here is my PR showing the idea (still untested and perhaps 
 unfinished!!!)
 
 https://github.com/ceph/ceph/pull/24353
 
 
 Igor
 
 
 On 10/1/2018 7:07 PM, Sergey Malinin wrote:
> Can you please confirm whether I got this right:
> 
> --- BlueStore.cc.bak2018-10-01 18:54:45.096836419 +0300
> +++ BlueStore.cc2018-10-01 19:01:35.937623861 +0300
> @@ -9049,22 +9049,17 @@
> throttle_bytes.put(costs);
>   PExtentVector bluefs_gift_extents;
> -  if (bluefs &&
> -  after_flush - bluefs_last_balance >
> -  cct->_conf->bluestore_bluefs_balance_interval) {
> -bluefs_last_balance = after_flush;
> -int r = _balance_bluefs_freespace(_gift_extents);
> -assert(r >= 0);
> -if (r > 0) {
> -  for (auto& p : bluefs_gift_extents) {
> -bluefs_extents.insert(p.offset, p.length);
> -  }
> -  bufferlist bl;
> -  encode(bluefs_extents, bl);
> -  dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
> -   << bluefs_extents << std::dec << dendl;
> -  synct->set(PREFIX_SUPER, "bluefs_extents", bl);
> +  int r = _balance_bluefs_freespace(_gift_extents);
> +  ceph_assert(r >= 0);
> +  if (r > 0) {
> +for (auto& p : bluefs_gift_extents) {
> +  bluefs_extents.insert(p.offset, p.length);
>   }
> +bufferlist bl;
> +encode(bluefs_extents, bl);
> +dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
> + << bluefs_extents << std::dec << dendl;
> +synct->set(PREFIX_SUPER, "bluefs_extents", bl);
> }
>   // cleanup sync deferred keys
> 
>> On 1.10.2018, at 18:39, Igor Fedotov  wrote:
>> 
>> So you have just a single main device per OSD
>> 
>> Then bluestore-tool wouldn't help, it's unable to expand BlueFS 
>> partition at main device, standalone devices are supported only.
>> 
>> Given that you're able to rebuild the code I can suggest to make a 
>> patch that triggers BlueFS rebalance (see code snippet below) on 
>> repairing.
>>  PExtentVector bluefs_gift_extents;
>>  int r = _balance_bluefs_freespace(_gift_extents);
>>  ceph_assert(r >= 0);
>>  if (r > 0) {
>>for (auto& p : bluefs_gift_extents) {
>>  bluefs_extents.insert(p.offset, p.length);
>>}
>>

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Igor Fedotov
The major change is in the get_bluefs_rebalance_txn function; it lacked the 
bluefs_rebalance_txn assignment.




On 10/2/2018 2:40 PM, Sergey Malinin wrote:

PR doesn't seem to have changed since yesterday. Am I missing something?



On 2.10.2018, at 14:15, Igor Fedotov  wrote:

Please update the patch from the PR - it didn't update bluefs extents list 
before.

Also please set debug bluestore 20 when re-running repair and collect the log.

If repair doesn't help - would you send repair and startup logs directly to me 
as I have some issues accessing ceph-post-file uploads.


Thanks,

Igor


On 10/2/2018 11:39 AM, Sergey Malinin wrote:

Yes, I did repair all OSDs and it finished with 'repair success'. I backed up 
OSDs so now I have more room to play.
I posted log files using ceph-post-file with the following IDs:
4af9cc4d-9c73-41c9-9c38-eb6c551047a0
20df7df5-f0c9-4186-aa21-4e5c0172cd93



On 2.10.2018, at 11:26, Igor Fedotov  wrote:

You did repair for any of this OSDs, didn't you? For all of them?


Would you please provide the log for both types (failed on mount and failed 
with enospc) of failing OSDs. Prior to collecting please remove existing ones 
prior and set debug bluestore to 20.



On 10/2/2018 2:16 AM, Sergey Malinin wrote:

I was able to apply patches to mimic, but nothing changed. One osd that I had 
space expanded on fails with bluefs mount IO error, others keep failing with 
enospc.



On 1.10.2018, at 19:26, Igor Fedotov  wrote:

So you should call repair which rebalances (i.e. allocates additional space) 
BlueFS space. Hence allowing OSD to start.

Thanks,

Igor


On 10/1/2018 7:22 PM, Igor Fedotov wrote:

Not exactly. The rebalancing from this kv_sync_thread still might be deferred 
due to the nature of this thread (haven't 100% sure though).

Here is my PR showing the idea (still untested and perhaps unfinished!!!)

https://github.com/ceph/ceph/pull/24353


Igor


On 10/1/2018 7:07 PM, Sergey Malinin wrote:

Can you please confirm whether I got this right:

--- BlueStore.cc.bak2018-10-01 18:54:45.096836419 +0300
+++ BlueStore.cc2018-10-01 19:01:35.937623861 +0300
@@ -9049,22 +9049,17 @@
 throttle_bytes.put(costs);
   PExtentVector bluefs_gift_extents;
-  if (bluefs &&
-  after_flush - bluefs_last_balance >
-  cct->_conf->bluestore_bluefs_balance_interval) {
-bluefs_last_balance = after_flush;
-int r = _balance_bluefs_freespace(_gift_extents);
-assert(r >= 0);
-if (r > 0) {
-  for (auto& p : bluefs_gift_extents) {
-bluefs_extents.insert(p.offset, p.length);
-  }
-  bufferlist bl;
-  encode(bluefs_extents, bl);
-  dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
-   << bluefs_extents << std::dec << dendl;
-  synct->set(PREFIX_SUPER, "bluefs_extents", bl);
+  int r = _balance_bluefs_freespace(_gift_extents);
+  ceph_assert(r >= 0);
+  if (r > 0) {
+for (auto& p : bluefs_gift_extents) {
+  bluefs_extents.insert(p.offset, p.length);
   }
+bufferlist bl;
+encode(bluefs_extents, bl);
+dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
+ << bluefs_extents << std::dec << dendl;
+synct->set(PREFIX_SUPER, "bluefs_extents", bl);
 }
   // cleanup sync deferred keys


On 1.10.2018, at 18:39, Igor Fedotov  wrote:

So you have just a single main device per OSD

Then bluestore-tool wouldn't help, it's unable to expand BlueFS partition at 
main device, standalone devices are supported only.

Given that you're able to rebuild the code I can suggest to make a patch that 
triggers BlueFS rebalance (see code snippet below) on repairing.
  PExtentVector bluefs_gift_extents;
  int r = _balance_bluefs_freespace(_gift_extents);
  ceph_assert(r >= 0);
  if (r > 0) {
for (auto& p : bluefs_gift_extents) {
  bluefs_extents.insert(p.offset, p.length);
}
bufferlist bl;
encode(bluefs_extents, bl);
dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
 << bluefs_extents << std::dec << dendl;
synct->set(PREFIX_SUPER, "bluefs_extents", bl);
  }

If it waits I can probably make a corresponding PR tomorrow.

Thanks,
Igor
On 10/1/2018 6:16 PM, Sergey Malinin wrote:

I have rebuilt the tool, but none of my OSDs no matter dead or alive have any 
symlinks other than 'block' pointing to LVM.
I adjusted main device size but it looks like it needs even more space for db 
compaction. After executing bluefs-bdev-expand OSD fails to start, however 
'fsck' and 'repair' commands finished successfully.

2018-10-01 18:02:39.755 7fc9226c6240  1 freelist init
2018-10-01 18:02:39.763 7fc9226c6240  1 bluestore(/var/lib/ceph/osd/ceph-1) 
_open_alloc opening allocation metadata
2018-10-01 18:02:40.907 7fc9226c6240  1 bluestore(/var/lib/ceph/osd/ceph-1) 
_open_alloc loaded 285 GiB in 2249899 extents
2018-10-01 18:02:40.951 7fc9226c6240 -1 bluestore(/var/lib/ceph/osd/ceph-1) 

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Sergey Malinin
The PR doesn't seem to have changed since yesterday. Am I missing something?
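
For reference, I compared the commit at the PR head against my local copy with 
something like this (GitHub exposes every pull request as a ref):

# show the commit currently at the head of the PR
git ls-remote https://github.com/ceph/ceph.git refs/pull/24353/head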


> On 2.10.2018, at 14:15, Igor Fedotov  wrote:
> 
> Please update the patch from the PR - it didn't update bluefs extents list 
> before.
> 
> Also please set debug bluestore 20 when re-running repair and collect the log.
> 
> If repair doesn't help - would you send repair and startup logs directly to 
> me as I have some issues accessing ceph-post-file uploads.
> 
> 
> Thanks,
> 
> Igor
> 
> 
> On 10/2/2018 11:39 AM, Sergey Malinin wrote:
>> Yes, I did repair all OSDs and it finished with 'repair success'. I backed 
>> up OSDs so now I have more room to play.
>> I posted log files using ceph-post-file with the following IDs:
>> 4af9cc4d-9c73-41c9-9c38-eb6c551047a0
>> 20df7df5-f0c9-4186-aa21-4e5c0172cd93
>> 
>> 
>>> On 2.10.2018, at 11:26, Igor Fedotov  wrote:
>>> 
>>> You did repair for any of this OSDs, didn't you? For all of them?
>>> 
>>> 
>>> Would you please provide the log for both types (failed on mount and failed 
>>> with enospc) of failing OSDs. Prior to collecting please remove existing 
>>> ones prior and set debug bluestore to 20.
>>> 
>>> 
>>> 
>>> On 10/2/2018 2:16 AM, Sergey Malinin wrote:
 I was able to apply patches to mimic, but nothing changed. One osd that I 
 had space expanded on fails with bluefs mount IO error, others keep 
 failing with enospc.
 
 
> On 1.10.2018, at 19:26, Igor Fedotov  wrote:
> 
> So you should call repair which rebalances (i.e. allocates additional 
> space) BlueFS space. Hence allowing OSD to start.
> 
> Thanks,
> 
> Igor
> 
> 
> On 10/1/2018 7:22 PM, Igor Fedotov wrote:
>> Not exactly. The rebalancing from this kv_sync_thread still might be 
>> deferred due to the nature of this thread (haven't 100% sure though).
>> 
>> Here is my PR showing the idea (still untested and perhaps unfinished!!!)
>> 
>> https://github.com/ceph/ceph/pull/24353
>> 
>> 
>> Igor
>> 
>> 
>> On 10/1/2018 7:07 PM, Sergey Malinin wrote:
>>> Can you please confirm whether I got this right:
>>> 
>>> --- BlueStore.cc.bak2018-10-01 18:54:45.096836419 +0300
>>> +++ BlueStore.cc2018-10-01 19:01:35.937623861 +0300
>>> @@ -9049,22 +9049,17 @@
>>> throttle_bytes.put(costs);
>>>   PExtentVector bluefs_gift_extents;
>>> -  if (bluefs &&
>>> -  after_flush - bluefs_last_balance >
>>> -  cct->_conf->bluestore_bluefs_balance_interval) {
>>> -bluefs_last_balance = after_flush;
>>> -int r = _balance_bluefs_freespace(_gift_extents);
>>> -assert(r >= 0);
>>> -if (r > 0) {
>>> -  for (auto& p : bluefs_gift_extents) {
>>> -bluefs_extents.insert(p.offset, p.length);
>>> -  }
>>> -  bufferlist bl;
>>> -  encode(bluefs_extents, bl);
>>> -  dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
>>> -   << bluefs_extents << std::dec << dendl;
>>> -  synct->set(PREFIX_SUPER, "bluefs_extents", bl);
>>> +  int r = _balance_bluefs_freespace(_gift_extents);
>>> +  ceph_assert(r >= 0);
>>> +  if (r > 0) {
>>> +for (auto& p : bluefs_gift_extents) {
>>> +  bluefs_extents.insert(p.offset, p.length);
>>>   }
>>> +bufferlist bl;
>>> +encode(bluefs_extents, bl);
>>> +dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
>>> + << bluefs_extents << std::dec << dendl;
>>> +synct->set(PREFIX_SUPER, "bluefs_extents", bl);
>>> }
>>>   // cleanup sync deferred keys
>>> 
 On 1.10.2018, at 18:39, Igor Fedotov  wrote:
 
 So you have just a single main device per OSD
 
 Then bluestore-tool wouldn't help, it's unable to expand BlueFS 
 partition at main device, standalone devices are supported only.
 
 Given that you're able to rebuild the code I can suggest to make a 
 patch that triggers BlueFS rebalance (see code snippet below) on 
 repairing.
  PExtentVector bluefs_gift_extents;
  int r = _balance_bluefs_freespace(_gift_extents);
  ceph_assert(r >= 0);
  if (r > 0) {
for (auto& p : bluefs_gift_extents) {
  bluefs_extents.insert(p.offset, p.length);
}
bufferlist bl;
encode(bluefs_extents, bl);
dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
 << bluefs_extents << std::dec << dendl;
synct->set(PREFIX_SUPER, "bluefs_extents", bl);
  }
 
 If it waits I can probably make a corresponding PR tomorrow.
 
 Thanks,
 Igor
 On 10/1/2018 6:16 PM, Sergey Malinin wrote:
> I have rebuilt the tool, but none of my OSDs 

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Igor Fedotov
Please update the patch from the PR - it didn't update the bluefs extents 
list before.

Also, please set debug bluestore to 20 when re-running repair and collect 
the log.

If repair doesn't help, would you please send the repair and startup logs 
directly to me, as I have some issues accessing ceph-post-file uploads.



Thanks,

Igor


On 10/2/2018 11:39 AM, Sergey Malinin wrote:

Yes, I did repair all OSDs and it finished with 'repair success'. I backed up 
OSDs so now I have more room to play.
I posted log files using ceph-post-file with the following IDs:
4af9cc4d-9c73-41c9-9c38-eb6c551047a0
20df7df5-f0c9-4186-aa21-4e5c0172cd93



On 2.10.2018, at 11:26, Igor Fedotov  wrote:

You did repair for any of this OSDs, didn't you? For all of them?


Would you please provide the log for both types (failed on mount and failed 
with enospc) of failing OSDs. Prior to collecting please remove existing ones 
prior and set debug bluestore to 20.



On 10/2/2018 2:16 AM, Sergey Malinin wrote:

I was able to apply patches to mimic, but nothing changed. One osd that I had 
space expanded on fails with bluefs mount IO error, others keep failing with 
enospc.



On 1.10.2018, at 19:26, Igor Fedotov  wrote:

So you should call repair which rebalances (i.e. allocates additional space) 
BlueFS space. Hence allowing OSD to start.

Thanks,

Igor


On 10/1/2018 7:22 PM, Igor Fedotov wrote:

Not exactly. The rebalancing from this kv_sync_thread still might be deferred 
due to the nature of this thread (haven't 100% sure though).

Here is my PR showing the idea (still untested and perhaps unfinished!!!)

https://github.com/ceph/ceph/pull/24353


Igor


On 10/1/2018 7:07 PM, Sergey Malinin wrote:

Can you please confirm whether I got this right:

--- BlueStore.cc.bak2018-10-01 18:54:45.096836419 +0300
+++ BlueStore.cc2018-10-01 19:01:35.937623861 +0300
@@ -9049,22 +9049,17 @@
 throttle_bytes.put(costs);
   PExtentVector bluefs_gift_extents;
-  if (bluefs &&
-  after_flush - bluefs_last_balance >
-  cct->_conf->bluestore_bluefs_balance_interval) {
-bluefs_last_balance = after_flush;
-int r = _balance_bluefs_freespace(_gift_extents);
-assert(r >= 0);
-if (r > 0) {
-  for (auto& p : bluefs_gift_extents) {
-bluefs_extents.insert(p.offset, p.length);
-  }
-  bufferlist bl;
-  encode(bluefs_extents, bl);
-  dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
-   << bluefs_extents << std::dec << dendl;
-  synct->set(PREFIX_SUPER, "bluefs_extents", bl);
+  int r = _balance_bluefs_freespace(_gift_extents);
+  ceph_assert(r >= 0);
+  if (r > 0) {
+for (auto& p : bluefs_gift_extents) {
+  bluefs_extents.insert(p.offset, p.length);
   }
+bufferlist bl;
+encode(bluefs_extents, bl);
+dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
+ << bluefs_extents << std::dec << dendl;
+synct->set(PREFIX_SUPER, "bluefs_extents", bl);
 }
   // cleanup sync deferred keys


On 1.10.2018, at 18:39, Igor Fedotov  wrote:

So you have just a single main device per OSD

Then bluestore-tool wouldn't help, it's unable to expand BlueFS partition at 
main device, standalone devices are supported only.

Given that you're able to rebuild the code I can suggest to make a patch that 
triggers BlueFS rebalance (see code snippet below) on repairing.
  PExtentVector bluefs_gift_extents;
  int r = _balance_bluefs_freespace(_gift_extents);
  ceph_assert(r >= 0);
  if (r > 0) {
for (auto& p : bluefs_gift_extents) {
  bluefs_extents.insert(p.offset, p.length);
}
bufferlist bl;
encode(bluefs_extents, bl);
dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
 << bluefs_extents << std::dec << dendl;
synct->set(PREFIX_SUPER, "bluefs_extents", bl);
  }

If it waits I can probably make a corresponding PR tomorrow.

Thanks,
Igor
On 10/1/2018 6:16 PM, Sergey Malinin wrote:

I have rebuilt the tool, but none of my OSDs no matter dead or alive have any 
symlinks other than 'block' pointing to LVM.
I adjusted main device size but it looks like it needs even more space for db 
compaction. After executing bluefs-bdev-expand OSD fails to start, however 
'fsck' and 'repair' commands finished successfully.

2018-10-01 18:02:39.755 7fc9226c6240  1 freelist init
2018-10-01 18:02:39.763 7fc9226c6240  1 bluestore(/var/lib/ceph/osd/ceph-1) 
_open_alloc opening allocation metadata
2018-10-01 18:02:40.907 7fc9226c6240  1 bluestore(/var/lib/ceph/osd/ceph-1) 
_open_alloc loaded 285 GiB in 2249899 extents
2018-10-01 18:02:40.951 7fc9226c6240 -1 bluestore(/var/lib/ceph/osd/ceph-1) 
_reconcile_bluefs_freespace bluefs extra 0x[6d6f00~50c80]
2018-10-01 18:02:40.951 7fc9226c6240  1 stupidalloc 0x0x55d053fb9180 shutdown
2018-10-01 18:02:40.963 7fc9226c6240  1 freelist shutdown
2018-10-01 18:02:40.963 7fc9226c6240  4 rocksdb: 

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Sergey Malinin
Yes, I did repair all OSDs and it finished with 'repair success'. I backed up 
OSDs so now I have more room to play.
I posted log files using ceph-post-file with the following IDs:
4af9cc4d-9c73-41c9-9c38-eb6c551047a0
20df7df5-f0c9-4186-aa21-4e5c0172cd93
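
(Each upload was done roughly like this; the log path and description are 
illustrative, and the command prints the ID shown above.)

# upload a debug log to the Ceph developers' drop point
ceph-post-file -d "mimic bluefs enospc - osd repair log" /var/log/ceph/ceph-osd.1.log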


> On 2.10.2018, at 11:26, Igor Fedotov  wrote:
> 
> You did repair for any of this OSDs, didn't you? For all of them?
> 
> 
> Would you please provide the log for both types (failed on mount and failed 
> with enospc) of failing OSDs. Prior to collecting please remove existing ones 
> prior and set debug bluestore to 20.
> 
> 
> 
> On 10/2/2018 2:16 AM, Sergey Malinin wrote:
>> I was able to apply patches to mimic, but nothing changed. One osd that I 
>> had space expanded on fails with bluefs mount IO error, others keep failing 
>> with enospc.
>> 
>> 
>>> On 1.10.2018, at 19:26, Igor Fedotov  wrote:
>>> 
>>> So you should call repair which rebalances (i.e. allocates additional 
>>> space) BlueFS space. Hence allowing OSD to start.
>>> 
>>> Thanks,
>>> 
>>> Igor
>>> 
>>> 
>>> On 10/1/2018 7:22 PM, Igor Fedotov wrote:
 Not exactly. The rebalancing from this kv_sync_thread still might be 
 deferred due to the nature of this thread (haven't 100% sure though).
 
 Here is my PR showing the idea (still untested and perhaps unfinished!!!)
 
 https://github.com/ceph/ceph/pull/24353
 
 
 Igor
 
 
 On 10/1/2018 7:07 PM, Sergey Malinin wrote:
> Can you please confirm whether I got this right:
> 
> --- BlueStore.cc.bak2018-10-01 18:54:45.096836419 +0300
> +++ BlueStore.cc2018-10-01 19:01:35.937623861 +0300
> @@ -9049,22 +9049,17 @@
> throttle_bytes.put(costs);
>   PExtentVector bluefs_gift_extents;
> -  if (bluefs &&
> -  after_flush - bluefs_last_balance >
> -  cct->_conf->bluestore_bluefs_balance_interval) {
> -bluefs_last_balance = after_flush;
> -int r = _balance_bluefs_freespace(_gift_extents);
> -assert(r >= 0);
> -if (r > 0) {
> -  for (auto& p : bluefs_gift_extents) {
> -bluefs_extents.insert(p.offset, p.length);
> -  }
> -  bufferlist bl;
> -  encode(bluefs_extents, bl);
> -  dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
> -   << bluefs_extents << std::dec << dendl;
> -  synct->set(PREFIX_SUPER, "bluefs_extents", bl);
> +  int r = _balance_bluefs_freespace(_gift_extents);
> +  ceph_assert(r >= 0);
> +  if (r > 0) {
> +for (auto& p : bluefs_gift_extents) {
> +  bluefs_extents.insert(p.offset, p.length);
>   }
> +bufferlist bl;
> +encode(bluefs_extents, bl);
> +dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
> + << bluefs_extents << std::dec << dendl;
> +synct->set(PREFIX_SUPER, "bluefs_extents", bl);
> }
>   // cleanup sync deferred keys
> 
>> On 1.10.2018, at 18:39, Igor Fedotov  wrote:
>> 
>> So you have just a single main device per OSD
>> 
>> Then bluestore-tool wouldn't help, it's unable to expand BlueFS 
>> partition at main device, standalone devices are supported only.
>> 
>> Given that you're able to rebuild the code I can suggest to make a patch 
>> that triggers BlueFS rebalance (see code snippet below) on repairing.
>>  PExtentVector bluefs_gift_extents;
>>  int r = _balance_bluefs_freespace(_gift_extents);
>>  ceph_assert(r >= 0);
>>  if (r > 0) {
>>for (auto& p : bluefs_gift_extents) {
>>  bluefs_extents.insert(p.offset, p.length);
>>}
>>bufferlist bl;
>>encode(bluefs_extents, bl);
>>dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
>> << bluefs_extents << std::dec << dendl;
>>synct->set(PREFIX_SUPER, "bluefs_extents", bl);
>>  }
>> 
>> If it waits I can probably make a corresponding PR tomorrow.
>> 
>> Thanks,
>> Igor
>> On 10/1/2018 6:16 PM, Sergey Malinin wrote:
>>> I have rebuilt the tool, but none of my OSDs no matter dead or alive 
>>> have any symlinks other than 'block' pointing to LVM.
>>> I adjusted main device size but it looks like it needs even more space 
>>> for db compaction. After executing bluefs-bdev-expand OSD fails to 
>>> start, however 'fsck' and 'repair' commands finished successfully.
>>> 
>>> 2018-10-01 18:02:39.755 7fc9226c6240  1 freelist init
>>> 2018-10-01 18:02:39.763 7fc9226c6240  1 
>>> bluestore(/var/lib/ceph/osd/ceph-1) _open_alloc opening allocation 
>>> metadata
>>> 2018-10-01 18:02:40.907 7fc9226c6240  1 
>>> bluestore(/var/lib/ceph/osd/ceph-1) _open_alloc loaded 285 GiB in 
>>> 2249899 extents
>>> 2018-10-01 18:02:40.951 7fc9226c6240 -1 
>>> 

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-02 Thread Igor Fedotov

You did run repair on some of these OSDs, didn't you? On all of them?

Would you please provide the logs for both types of failing OSDs (failed on 
mount and failed with enospc)? Prior to collecting, please remove the existing 
logs and set debug bluestore to 20.
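
Something along these lines should produce the logs I'm after (the OSD path 
and log file are illustrative; run it with the OSD stopped):

# re-run repair with verbose bluestore logging written to a dedicated file
CEPH_ARGS="--debug_bluestore=20 --log_file=/var/log/ceph/osd.1-repair.log" \
  ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-1 repair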




On 10/2/2018 2:16 AM, Sergey Malinin wrote:

I was able to apply patches to mimic, but nothing changed. One osd that I had 
space expanded on fails with bluefs mount IO error, others keep failing with 
enospc.



On 1.10.2018, at 19:26, Igor Fedotov  wrote:

So you should call repair which rebalances (i.e. allocates additional space) 
BlueFS space. Hence allowing OSD to start.

Thanks,

Igor


On 10/1/2018 7:22 PM, Igor Fedotov wrote:

Not exactly. The rebalancing from this kv_sync_thread still might be deferred 
due to the nature of this thread (haven't 100% sure though).

Here is my PR showing the idea (still untested and perhaps unfinished!!!)

https://github.com/ceph/ceph/pull/24353


Igor


On 10/1/2018 7:07 PM, Sergey Malinin wrote:

Can you please confirm whether I got this right:

--- BlueStore.cc.bak2018-10-01 18:54:45.096836419 +0300
+++ BlueStore.cc2018-10-01 19:01:35.937623861 +0300
@@ -9049,22 +9049,17 @@
 throttle_bytes.put(costs);
   PExtentVector bluefs_gift_extents;
-  if (bluefs &&
-  after_flush - bluefs_last_balance >
-  cct->_conf->bluestore_bluefs_balance_interval) {
-bluefs_last_balance = after_flush;
-int r = _balance_bluefs_freespace(_gift_extents);
-assert(r >= 0);
-if (r > 0) {
-  for (auto& p : bluefs_gift_extents) {
-bluefs_extents.insert(p.offset, p.length);
-  }
-  bufferlist bl;
-  encode(bluefs_extents, bl);
-  dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
-   << bluefs_extents << std::dec << dendl;
-  synct->set(PREFIX_SUPER, "bluefs_extents", bl);
+  int r = _balance_bluefs_freespace(_gift_extents);
+  ceph_assert(r >= 0);
+  if (r > 0) {
+for (auto& p : bluefs_gift_extents) {
+  bluefs_extents.insert(p.offset, p.length);
   }
+bufferlist bl;
+encode(bluefs_extents, bl);
+dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
+ << bluefs_extents << std::dec << dendl;
+synct->set(PREFIX_SUPER, "bluefs_extents", bl);
 }
   // cleanup sync deferred keys


On 1.10.2018, at 18:39, Igor Fedotov  wrote:

So you have just a single main device per OSD

Then bluestore-tool wouldn't help, it's unable to expand BlueFS partition at 
main device, standalone devices are supported only.

Given that you're able to rebuild the code I can suggest to make a patch that 
triggers BlueFS rebalance (see code snippet below) on repairing.
  PExtentVector bluefs_gift_extents;
  int r = _balance_bluefs_freespace(_gift_extents);
  ceph_assert(r >= 0);
  if (r > 0) {
for (auto& p : bluefs_gift_extents) {
  bluefs_extents.insert(p.offset, p.length);
}
bufferlist bl;
encode(bluefs_extents, bl);
dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
 << bluefs_extents << std::dec << dendl;
synct->set(PREFIX_SUPER, "bluefs_extents", bl);
  }

If it waits I can probably make a corresponding PR tomorrow.

Thanks,
Igor
On 10/1/2018 6:16 PM, Sergey Malinin wrote:

I have rebuilt the tool, but none of my OSDs no matter dead or alive have any 
symlinks other than 'block' pointing to LVM.
I adjusted main device size but it looks like it needs even more space for db 
compaction. After executing bluefs-bdev-expand OSD fails to start, however 
'fsck' and 'repair' commands finished successfully.

2018-10-01 18:02:39.755 7fc9226c6240  1 freelist init
2018-10-01 18:02:39.763 7fc9226c6240  1 bluestore(/var/lib/ceph/osd/ceph-1) 
_open_alloc opening allocation metadata
2018-10-01 18:02:40.907 7fc9226c6240  1 bluestore(/var/lib/ceph/osd/ceph-1) 
_open_alloc loaded 285 GiB in 2249899 extents
2018-10-01 18:02:40.951 7fc9226c6240 -1 bluestore(/var/lib/ceph/osd/ceph-1) 
_reconcile_bluefs_freespace bluefs extra 0x[6d6f00~50c80]
2018-10-01 18:02:40.951 7fc9226c6240  1 stupidalloc 0x0x55d053fb9180 shutdown
2018-10-01 18:02:40.963 7fc9226c6240  1 freelist shutdown
2018-10-01 18:02:40.963 7fc9226c6240  4 rocksdb: 
[/build/ceph-13.2.2/src/rocksdb/db/db_impl.cc:252] Shutdown: canceling all 
background work
2018-10-01 18:02:40.967 7fc9226c6240  4 rocksdb: 
[/build/ceph-13.2.2/src/rocksdb/db/db_impl.cc:397] Shutdown complete
2018-10-01 18:02:40.971 7fc9226c6240  1 bluefs umount
2018-10-01 18:02:40.975 7fc9226c6240  1 stupidalloc 0x0x55d053883800 shutdown
2018-10-01 18:02:40.975 7fc9226c6240  1 bdev(0x55d053c32e00 
/var/lib/ceph/osd/ceph-1/block) close
2018-10-01 18:02:41.267 7fc9226c6240  1 bdev(0x55d053c32a80 
/var/lib/ceph/osd/ceph-1/block) close
2018-10-01 18:02:41.443 7fc9226c6240 -1 osd.1 0 OSD:init: unable to mount 
object store
2018-10-01 18:02:41.443 7fc9226c6240 -1  ** ERROR: osd init 

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-01 Thread Sergey Malinin
I was able to apply the patches to mimic, but nothing changed. The one OSD 
that I had expanded space on fails with a bluefs mount I/O error; the others 
keep failing with enospc.
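
(For the record, the patches were applied on top of the mimic tree roughly 
like this, using GitHub's .patch export for the PR; conflicts were resolved by 
hand.)

# starting from a source tree checked out at the release tag (illustrative)
git checkout v13.2.2
curl -L https://github.com/ceph/ceph/pull/24353.patch | git am -3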


> On 1.10.2018, at 19:26, Igor Fedotov  wrote:
> 
> So you should call repair which rebalances (i.e. allocates additional space) 
> BlueFS space. Hence allowing OSD to start.
> 
> Thanks,
> 
> Igor
> 
> 
> On 10/1/2018 7:22 PM, Igor Fedotov wrote:
>> Not exactly. The rebalancing from this kv_sync_thread still might be 
>> deferred due to the nature of this thread (haven't 100% sure though).
>> 
>> Here is my PR showing the idea (still untested and perhaps unfinished!!!)
>> 
>> https://github.com/ceph/ceph/pull/24353
>> 
>> 
>> Igor
>> 
>> 
>> On 10/1/2018 7:07 PM, Sergey Malinin wrote:
>>> Can you please confirm whether I got this right:
>>> 
>>> --- BlueStore.cc.bak2018-10-01 18:54:45.096836419 +0300
>>> +++ BlueStore.cc2018-10-01 19:01:35.937623861 +0300
>>> @@ -9049,22 +9049,17 @@
>>> throttle_bytes.put(costs);
>>>   PExtentVector bluefs_gift_extents;
>>> -  if (bluefs &&
>>> -  after_flush - bluefs_last_balance >
>>> -  cct->_conf->bluestore_bluefs_balance_interval) {
>>> -bluefs_last_balance = after_flush;
>>> -int r = _balance_bluefs_freespace(_gift_extents);
>>> -assert(r >= 0);
>>> -if (r > 0) {
>>> -  for (auto& p : bluefs_gift_extents) {
>>> -bluefs_extents.insert(p.offset, p.length);
>>> -  }
>>> -  bufferlist bl;
>>> -  encode(bluefs_extents, bl);
>>> -  dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
>>> -   << bluefs_extents << std::dec << dendl;
>>> -  synct->set(PREFIX_SUPER, "bluefs_extents", bl);
>>> +  int r = _balance_bluefs_freespace(_gift_extents);
>>> +  ceph_assert(r >= 0);
>>> +  if (r > 0) {
>>> +for (auto& p : bluefs_gift_extents) {
>>> +  bluefs_extents.insert(p.offset, p.length);
>>>   }
>>> +bufferlist bl;
>>> +encode(bluefs_extents, bl);
>>> +dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
>>> + << bluefs_extents << std::dec << dendl;
>>> +synct->set(PREFIX_SUPER, "bluefs_extents", bl);
>>> }
>>>   // cleanup sync deferred keys
>>> 
 On 1.10.2018, at 18:39, Igor Fedotov  wrote:
 
 So you have just a single main device per OSD
 
 Then bluestore-tool wouldn't help, it's unable to expand BlueFS partition 
 at main device, standalone devices are supported only.
 
 Given that you're able to rebuild the code I can suggest to make a patch 
 that triggers BlueFS rebalance (see code snippet below) on repairing.
  PExtentVector bluefs_gift_extents;
  int r = _balance_bluefs_freespace(_gift_extents);
  ceph_assert(r >= 0);
  if (r > 0) {
for (auto& p : bluefs_gift_extents) {
  bluefs_extents.insert(p.offset, p.length);
}
bufferlist bl;
encode(bluefs_extents, bl);
dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
 << bluefs_extents << std::dec << dendl;
synct->set(PREFIX_SUPER, "bluefs_extents", bl);
  }
 
 If it waits I can probably make a corresponding PR tomorrow.
 
 Thanks,
 Igor
 On 10/1/2018 6:16 PM, Sergey Malinin wrote:
> I have rebuilt the tool, but none of my OSDs no matter dead or alive have 
> any symlinks other than 'block' pointing to LVM.
> I adjusted main device size but it looks like it needs even more space 
> for db compaction. After executing bluefs-bdev-expand OSD fails to start, 
> however 'fsck' and 'repair' commands finished successfully.
> 
> 2018-10-01 18:02:39.755 7fc9226c6240  1 freelist init
> 2018-10-01 18:02:39.763 7fc9226c6240  1 
> bluestore(/var/lib/ceph/osd/ceph-1) _open_alloc opening allocation 
> metadata
> 2018-10-01 18:02:40.907 7fc9226c6240  1 
> bluestore(/var/lib/ceph/osd/ceph-1) _open_alloc loaded 285 GiB in 2249899 
> extents
> 2018-10-01 18:02:40.951 7fc9226c6240 -1 
> bluestore(/var/lib/ceph/osd/ceph-1) _reconcile_bluefs_freespace bluefs 
> extra 0x[6d6f00~50c80]
> 2018-10-01 18:02:40.951 7fc9226c6240  1 stupidalloc 0x0x55d053fb9180 
> shutdown
> 2018-10-01 18:02:40.963 7fc9226c6240  1 freelist shutdown
> 2018-10-01 18:02:40.963 7fc9226c6240  4 rocksdb: 
> [/build/ceph-13.2.2/src/rocksdb/db/db_impl.cc:252] Shutdown: canceling 
> all background work
> 2018-10-01 18:02:40.967 7fc9226c6240  4 rocksdb: 
> [/build/ceph-13.2.2/src/rocksdb/db/db_impl.cc:397] Shutdown complete
> 2018-10-01 18:02:40.971 7fc9226c6240  1 bluefs umount
> 2018-10-01 18:02:40.975 7fc9226c6240  1 stupidalloc 0x0x55d053883800 
> shutdown
> 2018-10-01 18:02:40.975 7fc9226c6240  1 bdev(0x55d053c32e00 
> /var/lib/ceph/osd/ceph-1/block) close
> 2018-10-01 18:02:41.267 7fc9226c6240  1 

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-01 Thread Igor Fedotov
So you should call repair, which rebalances (i.e. allocates additional space 
for) BlueFS, hence allowing the OSD to start.


Thanks,

Igor


On 10/1/2018 7:22 PM, Igor Fedotov wrote:
Not exactly. The rebalancing from this kv_sync_thread still might be 
deferred due to the nature of this thread (haven't 100% sure though).


Here is my PR showing the idea (still untested and perhaps unfinished!!!)

https://github.com/ceph/ceph/pull/24353


Igor


On 10/1/2018 7:07 PM, Sergey Malinin wrote:

Can you please confirm whether I got this right:

--- BlueStore.cc.bak    2018-10-01 18:54:45.096836419 +0300
+++ BlueStore.cc    2018-10-01 19:01:35.937623861 +0300
@@ -9049,22 +9049,17 @@
    throttle_bytes.put(costs);
      PExtentVector bluefs_gift_extents;
-  if (bluefs &&
-  after_flush - bluefs_last_balance >
-  cct->_conf->bluestore_bluefs_balance_interval) {
-    bluefs_last_balance = after_flush;
-    int r = _balance_bluefs_freespace(_gift_extents);
-    assert(r >= 0);
-    if (r > 0) {
-  for (auto& p : bluefs_gift_extents) {
-    bluefs_extents.insert(p.offset, p.length);
-  }
-  bufferlist bl;
-  encode(bluefs_extents, bl);
-  dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
-   << bluefs_extents << std::dec << dendl;
-  synct->set(PREFIX_SUPER, "bluefs_extents", bl);
+  int r = _balance_bluefs_freespace(_gift_extents);
+  ceph_assert(r >= 0);
+  if (r > 0) {
+    for (auto& p : bluefs_gift_extents) {
+  bluefs_extents.insert(p.offset, p.length);
  }
+    bufferlist bl;
+    encode(bluefs_extents, bl);
+    dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
+ << bluefs_extents << std::dec << dendl;
+    synct->set(PREFIX_SUPER, "bluefs_extents", bl);
    }
      // cleanup sync deferred keys


On 1.10.2018, at 18:39, Igor Fedotov  wrote:

So you have just a single main device per OSD

Then bluestore-tool wouldn't help, it's unable to expand BlueFS 
partition at main device, standalone devices are supported only.


Given that you're able to rebuild the code I can suggest to make a 
patch that triggers BlueFS rebalance (see code snippet below) on 
repairing.

 PExtentVector bluefs_gift_extents;
 int r = _balance_bluefs_freespace(_gift_extents);
 ceph_assert(r >= 0);
 if (r > 0) {
   for (auto& p : bluefs_gift_extents) {
 bluefs_extents.insert(p.offset, p.length);
   }
   bufferlist bl;
   encode(bluefs_extents, bl);
   dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
    << bluefs_extents << std::dec << dendl;
   synct->set(PREFIX_SUPER, "bluefs_extents", bl);
 }

If it waits I can probably make a corresponding PR tomorrow.

Thanks,
Igor
On 10/1/2018 6:16 PM, Sergey Malinin wrote:
I have rebuilt the tool, but none of my OSDs no matter dead or 
alive have any symlinks other than 'block' pointing to LVM.
I adjusted main device size but it looks like it needs even more 
space for db compaction. After executing bluefs-bdev-expand OSD 
fails to start, however 'fsck' and 'repair' commands finished 
successfully.


2018-10-01 18:02:39.755 7fc9226c6240  1 freelist init
2018-10-01 18:02:39.763 7fc9226c6240  1 
bluestore(/var/lib/ceph/osd/ceph-1) _open_alloc opening allocation 
metadata
2018-10-01 18:02:40.907 7fc9226c6240  1 
bluestore(/var/lib/ceph/osd/ceph-1) _open_alloc loaded 285 GiB in 
2249899 extents
2018-10-01 18:02:40.951 7fc9226c6240 -1 
bluestore(/var/lib/ceph/osd/ceph-1) _reconcile_bluefs_freespace 
bluefs extra 0x[6d6f00~50c80]
2018-10-01 18:02:40.951 7fc9226c6240  1 stupidalloc 
0x0x55d053fb9180 shutdown

2018-10-01 18:02:40.963 7fc9226c6240  1 freelist shutdown
2018-10-01 18:02:40.963 7fc9226c6240  4 rocksdb: 
[/build/ceph-13.2.2/src/rocksdb/db/db_impl.cc:252] Shutdown: 
canceling all background work
2018-10-01 18:02:40.967 7fc9226c6240  4 rocksdb: 
[/build/ceph-13.2.2/src/rocksdb/db/db_impl.cc:397] Shutdown complete

2018-10-01 18:02:40.971 7fc9226c6240  1 bluefs umount
2018-10-01 18:02:40.975 7fc9226c6240  1 stupidalloc 
0x0x55d053883800 shutdown
2018-10-01 18:02:40.975 7fc9226c6240  1 bdev(0x55d053c32e00 
/var/lib/ceph/osd/ceph-1/block) close
2018-10-01 18:02:41.267 7fc9226c6240  1 bdev(0x55d053c32a80 
/var/lib/ceph/osd/ceph-1/block) close
2018-10-01 18:02:41.443 7fc9226c6240 -1 osd.1 0 OSD:init: unable to 
mount object store
2018-10-01 18:02:41.443 7fc9226c6240 -1  ** ERROR: osd init failed: 
(5) Input/output error




On 1.10.2018, at 18:09, Igor Fedotov  wrote:

Well, actually you can avoid bluestore-tool rebuild.

You'll need to edit the first chunk of blocks.db where labels are 
stored. (Please make a backup first!!!)


Size label is stored at offset 0x52 and is 8 bytes long - 
little-endian 64bit integer encoding. (Please verify that old 
value at this offset exactly corresponds to you original volume 
size and/or 'size' label reported by ceph-bluestore-tool).


So you have to put new DB volume size 

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-01 Thread Igor Fedotov
Not exactly. The rebalancing from this kv_sync_thread still might be 
deferred due to the nature of this thread (not 100% sure though).


Here is my PR showing the idea (still untested and perhaps unfinished!!!)

https://github.com/ceph/ceph/pull/24353


Igor


On 10/1/2018 7:07 PM, Sergey Malinin wrote:

Can you please confirm whether I got this right:

--- BlueStore.cc.bak2018-10-01 18:54:45.096836419 +0300
+++ BlueStore.cc2018-10-01 19:01:35.937623861 +0300
@@ -9049,22 +9049,17 @@
throttle_bytes.put(costs);
  
PExtentVector bluefs_gift_extents;

-  if (bluefs &&
- after_flush - bluefs_last_balance >
- cct->_conf->bluestore_bluefs_balance_interval) {
-   bluefs_last_balance = after_flush;
-   int r = _balance_bluefs_freespace(_gift_extents);
-   assert(r >= 0);
-   if (r > 0) {
- for (auto& p : bluefs_gift_extents) {
-   bluefs_extents.insert(p.offset, p.length);
- }
- bufferlist bl;
- encode(bluefs_extents, bl);
- dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
-  << bluefs_extents << std::dec << dendl;
- synct->set(PREFIX_SUPER, "bluefs_extents", bl);
+  int r = _balance_bluefs_freespace(_gift_extents);
+  ceph_assert(r >= 0);
+  if (r > 0) {
+   for (auto& p : bluefs_gift_extents) {
+ bluefs_extents.insert(p.offset, p.length);
}
+   bufferlist bl;
+   encode(bluefs_extents, bl);
+   dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
+<< bluefs_extents << std::dec << dendl;
+   synct->set(PREFIX_SUPER, "bluefs_extents", bl);
}
  
// cleanup sync deferred keys



On 1.10.2018, at 18:39, Igor Fedotov  wrote:

So you have just a single main device per OSD

Then bluestore-tool wouldn't help, it's unable to expand BlueFS partition at 
main device, standalone devices are supported only.

Given that you're able to rebuild the code I can suggest to make a patch that 
triggers BlueFS rebalance (see code snippet below) on repairing.
 PExtentVector bluefs_gift_extents;
 int r = _balance_bluefs_freespace(_gift_extents);
 ceph_assert(r >= 0);
 if (r > 0) {
   for (auto& p : bluefs_gift_extents) {
 bluefs_extents.insert(p.offset, p.length);
   }
   bufferlist bl;
   encode(bluefs_extents, bl);
   dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
<< bluefs_extents << std::dec << dendl;
   synct->set(PREFIX_SUPER, "bluefs_extents", bl);
 }

If it waits I can probably make a corresponding PR tomorrow.

Thanks,
Igor
On 10/1/2018 6:16 PM, Sergey Malinin wrote:

I have rebuilt the tool, but none of my OSDs no matter dead or alive have any 
symlinks other than 'block' pointing to LVM.
I adjusted main device size but it looks like it needs even more space for db 
compaction. After executing bluefs-bdev-expand OSD fails to start, however 
'fsck' and 'repair' commands finished successfully.

2018-10-01 18:02:39.755 7fc9226c6240  1 freelist init
2018-10-01 18:02:39.763 7fc9226c6240  1 bluestore(/var/lib/ceph/osd/ceph-1) 
_open_alloc opening allocation metadata
2018-10-01 18:02:40.907 7fc9226c6240  1 bluestore(/var/lib/ceph/osd/ceph-1) 
_open_alloc loaded 285 GiB in 2249899 extents
2018-10-01 18:02:40.951 7fc9226c6240 -1 bluestore(/var/lib/ceph/osd/ceph-1) 
_reconcile_bluefs_freespace bluefs extra 0x[6d6f00~50c80]
2018-10-01 18:02:40.951 7fc9226c6240  1 stupidalloc 0x0x55d053fb9180 shutdown
2018-10-01 18:02:40.963 7fc9226c6240  1 freelist shutdown
2018-10-01 18:02:40.963 7fc9226c6240  4 rocksdb: 
[/build/ceph-13.2.2/src/rocksdb/db/db_impl.cc:252] Shutdown: canceling all 
background work
2018-10-01 18:02:40.967 7fc9226c6240  4 rocksdb: 
[/build/ceph-13.2.2/src/rocksdb/db/db_impl.cc:397] Shutdown complete
2018-10-01 18:02:40.971 7fc9226c6240  1 bluefs umount
2018-10-01 18:02:40.975 7fc9226c6240  1 stupidalloc 0x0x55d053883800 shutdown
2018-10-01 18:02:40.975 7fc9226c6240  1 bdev(0x55d053c32e00 
/var/lib/ceph/osd/ceph-1/block) close
2018-10-01 18:02:41.267 7fc9226c6240  1 bdev(0x55d053c32a80 
/var/lib/ceph/osd/ceph-1/block) close
2018-10-01 18:02:41.443 7fc9226c6240 -1 osd.1 0 OSD:init: unable to mount 
object store
2018-10-01 18:02:41.443 7fc9226c6240 -1  ** ERROR: osd init failed: (5) 
Input/output error



On 1.10.2018, at 18:09, Igor Fedotov  wrote:

Well, actually you can avoid bluestore-tool rebuild.

You'll need to edit the first chunk of blocks.db where labels are stored. 
(Please make a backup first!!!)

Size label is stored at offset 0x52 and is 8 bytes long - little-endian 64bit 
integer encoding. (Please verify that old value at this offset exactly 
corresponds to you original volume size and/or 'size' label reported by 
ceph-bluestore-tool).

So you have to put new DB volume size there. Or you can send the first 4K chunk 
(e.g. extracted with dd) along with new DB volume size (in bytes) to me and 

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-01 Thread Sergey Malinin
Can you please confirm whether I got this right:

--- BlueStore.cc.bak2018-10-01 18:54:45.096836419 +0300
+++ BlueStore.cc2018-10-01 19:01:35.937623861 +0300
@@ -9049,22 +9049,17 @@
   throttle_bytes.put(costs);
 
   PExtentVector bluefs_gift_extents;
-  if (bluefs &&
- after_flush - bluefs_last_balance >
- cct->_conf->bluestore_bluefs_balance_interval) {
-   bluefs_last_balance = after_flush;
-   int r = _balance_bluefs_freespace(&bluefs_gift_extents);
-   assert(r >= 0);
-   if (r > 0) {
- for (auto& p : bluefs_gift_extents) {
-   bluefs_extents.insert(p.offset, p.length);
- }
- bufferlist bl;
- encode(bluefs_extents, bl);
- dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
-  << bluefs_extents << std::dec << dendl;
- synct->set(PREFIX_SUPER, "bluefs_extents", bl);
+  int r = _balance_bluefs_freespace(&bluefs_gift_extents);
+  ceph_assert(r >= 0);
+  if (r > 0) {
+   for (auto& p : bluefs_gift_extents) {
+ bluefs_extents.insert(p.offset, p.length);
}
+   bufferlist bl;
+   encode(bluefs_extents, bl);
+   dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
+<< bluefs_extents << std::dec << dendl;
+   synct->set(PREFIX_SUPER, "bluefs_extents", bl);
   }
 
   // cleanup sync deferred keys

> On 1.10.2018, at 18:39, Igor Fedotov  wrote:
> 
> So you have just a single main device per OSD
> 
> Then bluestore-tool wouldn't help, it's unable to expand BlueFS partition at 
> main device, standalone devices are supported only.
> 
> Given that you're able to rebuild the code I can suggest to make a patch that 
> triggers BlueFS rebalance (see code snippet below) on repairing.
> PExtentVector bluefs_gift_extents;
> int r = _balance_bluefs_freespace(_gift_extents);
> ceph_assert(r >= 0);
> if (r > 0) {
>   for (auto& p : bluefs_gift_extents) {
> bluefs_extents.insert(p.offset, p.length);
>   }
>   bufferlist bl;
>   encode(bluefs_extents, bl);
>   dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
><< bluefs_extents << std::dec << dendl;
>   synct->set(PREFIX_SUPER, "bluefs_extents", bl);
> }
> 
> If it waits I can probably make a corresponding PR tomorrow.
> 
> Thanks,
> Igor
> On 10/1/2018 6:16 PM, Sergey Malinin wrote:
>> I have rebuilt the tool, but none of my OSDs no matter dead or alive have 
>> any symlinks other than 'block' pointing to LVM.
>> I adjusted main device size but it looks like it needs even more space for 
>> db compaction. After executing bluefs-bdev-expand OSD fails to start, 
>> however 'fsck' and 'repair' commands finished successfully.
>> 
>> 2018-10-01 18:02:39.755 7fc9226c6240  1 freelist init
>> 2018-10-01 18:02:39.763 7fc9226c6240  1 bluestore(/var/lib/ceph/osd/ceph-1) 
>> _open_alloc opening allocation metadata
>> 2018-10-01 18:02:40.907 7fc9226c6240  1 bluestore(/var/lib/ceph/osd/ceph-1) 
>> _open_alloc loaded 285 GiB in 2249899 extents
>> 2018-10-01 18:02:40.951 7fc9226c6240 -1 bluestore(/var/lib/ceph/osd/ceph-1) 
>> _reconcile_bluefs_freespace bluefs extra 0x[6d6f00~50c80]
>> 2018-10-01 18:02:40.951 7fc9226c6240  1 stupidalloc 0x0x55d053fb9180 shutdown
>> 2018-10-01 18:02:40.963 7fc9226c6240  1 freelist shutdown
>> 2018-10-01 18:02:40.963 7fc9226c6240  4 rocksdb: 
>> [/build/ceph-13.2.2/src/rocksdb/db/db_impl.cc:252] Shutdown: canceling all 
>> background work
>> 2018-10-01 18:02:40.967 7fc9226c6240  4 rocksdb: 
>> [/build/ceph-13.2.2/src/rocksdb/db/db_impl.cc:397] Shutdown complete
>> 2018-10-01 18:02:40.971 7fc9226c6240  1 bluefs umount
>> 2018-10-01 18:02:40.975 7fc9226c6240  1 stupidalloc 0x0x55d053883800 shutdown
>> 2018-10-01 18:02:40.975 7fc9226c6240  1 bdev(0x55d053c32e00 
>> /var/lib/ceph/osd/ceph-1/block) close
>> 2018-10-01 18:02:41.267 7fc9226c6240  1 bdev(0x55d053c32a80 
>> /var/lib/ceph/osd/ceph-1/block) close
>> 2018-10-01 18:02:41.443 7fc9226c6240 -1 osd.1 0 OSD:init: unable to mount 
>> object store
>> 2018-10-01 18:02:41.443 7fc9226c6240 -1  ** ERROR: osd init failed: (5) 
>> Input/output error
>> 
>> 
>>> On 1.10.2018, at 18:09, Igor Fedotov  wrote:
>>> 
>>> Well, actually you can avoid bluestore-tool rebuild.
>>> 
>>> You'll need to edit the first chunk of blocks.db where labels are stored. 
>>> (Please make a backup first!!!)
>>> 
>>> Size label is stored at offset 0x52 and is 8 bytes long - little-endian 
>>> 64bit integer encoding. (Please verify that old value at this offset 
>>> exactly corresponds to you original volume size and/or 'size' label 
>>> reported by ceph-bluestore-tool).
>>> 
>>> So you have to put new DB volume size there. Or you can send the first 4K 
>>> chunk (e.g. extracted with dd) along with new DB volume size (in bytes) to 
>>> me and I'll do that for you.
>>> 
>>> 
>>> Thanks,
>>> 
>>> Igor
>>> 
>>> 
>>> On 10/1/2018 5:32 PM, Igor Fedotov wrote:
 

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-01 Thread Igor Fedotov

So you have just a single main device per OSD.

Then bluestore-tool wouldn't help: it's unable to expand the BlueFS partition 
on the main device; only standalone devices are supported.

Given that you're able to rebuild the code, I can suggest making a patch 
that triggers a BlueFS rebalance (see the code snippet below) during repair.

    PExtentVector bluefs_gift_extents;
    int r = _balance_bluefs_freespace(&bluefs_gift_extents);
    ceph_assert(r >= 0);
    if (r > 0) {
      for (auto& p : bluefs_gift_extents) {
        bluefs_extents.insert(p.offset, p.length);
      }
      bufferlist bl;
      encode(bluefs_extents, bl);
      dout(10) << __func__ << " bluefs_extents now 0x" << std::hex
           << bluefs_extents << std::dec << dendl;
      synct->set(PREFIX_SUPER, "bluefs_extents", bl);
    }

If it can wait, I can probably make a corresponding PR tomorrow.

Thanks,
Igor
On 10/1/2018 6:16 PM, Sergey Malinin wrote:

I have rebuilt the tool, but none of my OSDs no matter dead or alive have any 
symlinks other than 'block' pointing to LVM.
I adjusted main device size but it looks like it needs even more space for db 
compaction. After executing bluefs-bdev-expand OSD fails to start, however 
'fsck' and 'repair' commands finished successfully.

2018-10-01 18:02:39.755 7fc9226c6240  1 freelist init
2018-10-01 18:02:39.763 7fc9226c6240  1 bluestore(/var/lib/ceph/osd/ceph-1) 
_open_alloc opening allocation metadata
2018-10-01 18:02:40.907 7fc9226c6240  1 bluestore(/var/lib/ceph/osd/ceph-1) 
_open_alloc loaded 285 GiB in 2249899 extents
2018-10-01 18:02:40.951 7fc9226c6240 -1 bluestore(/var/lib/ceph/osd/ceph-1) 
_reconcile_bluefs_freespace bluefs extra 0x[6d6f00~50c80]
2018-10-01 18:02:40.951 7fc9226c6240  1 stupidalloc 0x0x55d053fb9180 shutdown
2018-10-01 18:02:40.963 7fc9226c6240  1 freelist shutdown
2018-10-01 18:02:40.963 7fc9226c6240  4 rocksdb: 
[/build/ceph-13.2.2/src/rocksdb/db/db_impl.cc:252] Shutdown: canceling all 
background work
2018-10-01 18:02:40.967 7fc9226c6240  4 rocksdb: 
[/build/ceph-13.2.2/src/rocksdb/db/db_impl.cc:397] Shutdown complete
2018-10-01 18:02:40.971 7fc9226c6240  1 bluefs umount
2018-10-01 18:02:40.975 7fc9226c6240  1 stupidalloc 0x0x55d053883800 shutdown
2018-10-01 18:02:40.975 7fc9226c6240  1 bdev(0x55d053c32e00 
/var/lib/ceph/osd/ceph-1/block) close
2018-10-01 18:02:41.267 7fc9226c6240  1 bdev(0x55d053c32a80 
/var/lib/ceph/osd/ceph-1/block) close
2018-10-01 18:02:41.443 7fc9226c6240 -1 osd.1 0 OSD:init: unable to mount 
object store
2018-10-01 18:02:41.443 7fc9226c6240 -1  ** ERROR: osd init failed: (5) 
Input/output error



On 1.10.2018, at 18:09, Igor Fedotov  wrote:

Well, actually you can avoid bluestore-tool rebuild.

You'll need to edit the first chunk of block.db where the labels are stored. 
(Please make a backup first!!!)

The size label is stored at offset 0x52 and is 8 bytes long - little-endian 
64-bit integer encoding. (Please verify that the old value at this offset 
exactly corresponds to your original volume size and/or the 'size' label 
reported by ceph-bluestore-tool).

So you have to put the new DB volume size there. Or you can send the first 4K 
chunk (e.g. extracted with dd) along with the new DB volume size (in bytes) to 
me and I'll do that for you.


Thanks,

Igor


On 10/1/2018 5:32 PM, Igor Fedotov wrote:


On 10/1/2018 5:03 PM, Sergey Malinin wrote:

Before I received your response, I had already added 20GB to the OSD (by expanding 
the LV followed by bluefs-bdev-expand) and ran "ceph-kvstore-tool bluestore-kv  compact"; 
however, it still needs more space.
Is that because I didn't update the DB size with set-label-key?

In mimic you need to run both the "bluefs-bdev-expand" and "set-label-key" commands 
to commit the bluefs volume expansion.
Unfortunately the latter command doesn't handle the "size" label properly. That's why 
you might need to backport and rebuild with the mentioned commits.


What exactly is the label key that needs to be updated? I couldn't find 
which one is related to the DB:

# ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-1
inferring bluefs devices from bluestore path
{
  "/var/lib/ceph/osd/ceph-1/block": {
  "osd_uuid": "f8f122ee-70a6-4c54-8eb0-9b42205b1ecc",
  "size": 471305551872,
  "btime": "2018-07-31 03:06:43.751243",
  "description": "main",
  "bluefs": "1",
  "ceph_fsid": "7d320499-5b3f-453e-831f-60d4db9a4533",
  "kv_backend": "rocksdb",
  "magic": "ceph osd volume v026",
  "mkfs_done": "yes",
  "osd_key": "XXX",
  "ready": "ready",
  "whoami": "1"
  }
}

The 'size' label, but your output is for the block (aka slow) device.

It should return labels for the db/wal devices as well (the block.db and block.wal 
symlinks, respectively). It works for me in master; I can't verify with mimic at 
the moment, though.
Here is output for master:

# bin/ceph-bluestore-tool show-label --path dev/osd0
inferring bluefs devices from bluestore path
{
 "dev/osd0/block": {
 

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-01 Thread Sergey Malinin
I have rebuilt the tool, but none of my OSDs, whether dead or alive, has any 
symlink other than 'block' pointing to LVM.
I adjusted the main device size, but it looks like it needs even more space for DB 
compaction. After executing bluefs-bdev-expand the OSD fails to start; however, 
the 'fsck' and 'repair' commands finished successfully.

2018-10-01 18:02:39.755 7fc9226c6240  1 freelist init
2018-10-01 18:02:39.763 7fc9226c6240  1 bluestore(/var/lib/ceph/osd/ceph-1) 
_open_alloc opening allocation metadata
2018-10-01 18:02:40.907 7fc9226c6240  1 bluestore(/var/lib/ceph/osd/ceph-1) 
_open_alloc loaded 285 GiB in 2249899 extents
2018-10-01 18:02:40.951 7fc9226c6240 -1 bluestore(/var/lib/ceph/osd/ceph-1) 
_reconcile_bluefs_freespace bluefs extra 0x[6d6f00~50c80]
2018-10-01 18:02:40.951 7fc9226c6240  1 stupidalloc 0x0x55d053fb9180 shutdown
2018-10-01 18:02:40.963 7fc9226c6240  1 freelist shutdown
2018-10-01 18:02:40.963 7fc9226c6240  4 rocksdb: 
[/build/ceph-13.2.2/src/rocksdb/db/db_impl.cc:252] Shutdown: canceling all 
background work
2018-10-01 18:02:40.967 7fc9226c6240  4 rocksdb: 
[/build/ceph-13.2.2/src/rocksdb/db/db_impl.cc:397] Shutdown complete
2018-10-01 18:02:40.971 7fc9226c6240  1 bluefs umount
2018-10-01 18:02:40.975 7fc9226c6240  1 stupidalloc 0x0x55d053883800 shutdown
2018-10-01 18:02:40.975 7fc9226c6240  1 bdev(0x55d053c32e00 
/var/lib/ceph/osd/ceph-1/block) close
2018-10-01 18:02:41.267 7fc9226c6240  1 bdev(0x55d053c32a80 
/var/lib/ceph/osd/ceph-1/block) close
2018-10-01 18:02:41.443 7fc9226c6240 -1 osd.1 0 OSD:init: unable to mount 
object store
2018-10-01 18:02:41.443 7fc9226c6240 -1  ** ERROR: osd init failed: (5) 
Input/output error


> On 1.10.2018, at 18:09, Igor Fedotov  wrote:
> 
> Well, actually you can avoid bluestore-tool rebuild.
> 
> You'll need to edit the first chunk of block.db where the labels are stored. 
> (Please make a backup first!!!)
> 
> The size label is stored at offset 0x52 and is 8 bytes long - little-endian 
> 64-bit integer encoding. (Please verify that the old value at this offset 
> exactly corresponds to your original volume size and/or the 'size' label 
> reported by ceph-bluestore-tool).
> 
> So you have to put the new DB volume size there. Or you can send the first 4K 
> chunk (e.g. extracted with dd) along with the new DB volume size (in bytes) to 
> me and I'll do that for you.
> 
> 
> Thanks,
> 
> Igor
> 
> 
> On 10/1/2018 5:32 PM, Igor Fedotov wrote:
>> 
>> 
>> On 10/1/2018 5:03 PM, Sergey Malinin wrote:
>>> Before I received your response, I had already added 20GB to the OSD (by 
>>> expanding the LV followed by bluefs-bdev-expand) and ran "ceph-kvstore-tool 
>>> bluestore-kv  compact"; however, it still needs more space.
>>> Is that because I didn't update the DB size with set-label-key?
>> In mimic you need to run both the "bluefs-bdev-expand" and "set-label-key" 
>> commands to commit the bluefs volume expansion.
>> Unfortunately the latter command doesn't handle the "size" label properly. That's 
>> why you might need to backport and rebuild with the mentioned commits.
>> 
>>> What exactly is the label key that needs to be updated? I couldn't find 
>>> which one is related to the DB:
>>> 
>>> # ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-1
>>> inferring bluefs devices from bluestore path
>>> {
>>>  "/var/lib/ceph/osd/ceph-1/block": {
>>>  "osd_uuid": "f8f122ee-70a6-4c54-8eb0-9b42205b1ecc",
>>>  "size": 471305551872,
>>>  "btime": "2018-07-31 03:06:43.751243",
>>>  "description": "main",
>>>  "bluefs": "1",
>>>  "ceph_fsid": "7d320499-5b3f-453e-831f-60d4db9a4533",
>>>  "kv_backend": "rocksdb",
>>>  "magic": "ceph osd volume v026",
>>>  "mkfs_done": "yes",
>>>  "osd_key": "XXX",
>>>  "ready": "ready",
>>>  "whoami": "1"
>>>  }
>>> }
>> The 'size' label, but your output is for the block (aka slow) device.
>> 
>> It should return labels for the db/wal devices as well (the block.db and block.wal 
>> symlinks, respectively). It works for me in master; I can't verify with mimic 
>> at the moment, though.
>> Here is output for master:
>> 
>> # bin/ceph-bluestore-tool show-label --path dev/osd0
>> inferring bluefs devices from bluestore path
>> {
>> "dev/osd0/block": {
>> "osd_uuid": "404dcbe9-3f8d-4ef5-ac59-2582454a9a75",
>> "size": 21474836480,
>> "btime": "2018-09-10 15:55:09.044039",
>> "description": "main",
>> "bluefs": "1",
>> "ceph_fsid": "56eddc15-11b9-4e0b-9192-e391fbae551c",
>> "kv_backend": "rocksdb",
>> "magic": "ceph osd volume v026",
>> "mkfs_done": "yes",
>> "osd_key": "AQCsaZZbYTxXJBAAe3jJI4p6WbMjvA8CBBUJbA==",
>> "ready": "ready",
>> "whoami": "0"
>> },
>> "dev/osd0/block.wal": {
>> "osd_uuid": "404dcbe9-3f8d-4ef5-ac59-2582454a9a75",
>> "size": 1048576000,
>> "btime": "2018-09-10 15:55:09.044985",
>> "description": "bluefs wal"
>> },
>> 

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-01 Thread Igor Fedotov

Well, actually you can avoid bluestore-tool rebuild.

You'll need to edit the first chunk of block.db where the labels are 
stored. (Please make a backup first!!!)


The size label is stored at offset 0x52 and is 8 bytes long - little-endian 
64-bit integer encoding. (Please verify that the old value at this offset 
exactly corresponds to your original volume size and/or the 'size' label 
reported by ceph-bluestore-tool).


So you have to put the new DB volume size there. Or you can send the first 
4K chunk (e.g. extracted with dd) along with the new DB volume size (in 
bytes) to me and I'll do that for you.
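
For illustration, something along these lines should do that patch (an untested 
sketch; the 0x52 offset and little-endian 64-bit encoding are as described above, 
while the device path and sizes are placeholders to substitute with your own values, 
and the first 4K of the device should be backed up with dd beforehand):

#!/usr/bin/env python3
# Untested sketch: rewrite the 8-byte little-endian "size" field at offset
# 0x52 of the BlueStore label. DEV, EXPECTED_OLD and NEW_SIZE are placeholders.
import struct

DEV = '/var/lib/ceph/osd/ceph-1/block'      # or your block.db device, if present
EXPECTED_OLD = 471305551872                 # 'size' as reported by show-label
NEW_SIZE = EXPECTED_OLD + 20 * 1024**3      # hypothetical: old size plus 20 GiB
OFFSET = 0x52

with open(DEV, 'r+b') as f:
    f.seek(OFFSET)
    old = struct.unpack('<Q', f.read(8))[0]
    print('current size label: %d' % old)
    if old != EXPECTED_OLD:
        raise SystemExit('label does not match the expected old size, aborting')
    f.seek(OFFSET)
    f.write(struct.pack('<Q', NEW_SIZE))
    print('size label updated to %d' % NEW_SIZE)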



Thanks,

Igor


On 10/1/2018 5:32 PM, Igor Fedotov wrote:



On 10/1/2018 5:03 PM, Sergey Malinin wrote:
Before I received your response, I had already added 20GB to the OSD 
(by expanding the LV followed by bluefs-bdev-expand) and ran 
"ceph-kvstore-tool bluestore-kv  compact"; however, it still 
needs more space.

Is that because I didn't update the DB size with set-label-key?
In mimic you need to run both the "bluefs-bdev-expand" and "set-label-key" 
commands to commit the bluefs volume expansion.
Unfortunately the latter command doesn't handle the "size" label properly. 
That's why you might need to backport and rebuild with the mentioned 
commits.


What exactly is the label key that needs to be updated? I couldn't 
find which one is related to the DB:


# ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-1
inferring bluefs devices from bluestore path
{
 "/var/lib/ceph/osd/ceph-1/block": {
 "osd_uuid": "f8f122ee-70a6-4c54-8eb0-9b42205b1ecc",
 "size": 471305551872,
 "btime": "2018-07-31 03:06:43.751243",
 "description": "main",
 "bluefs": "1",
 "ceph_fsid": "7d320499-5b3f-453e-831f-60d4db9a4533",
 "kv_backend": "rocksdb",
 "magic": "ceph osd volume v026",
 "mkfs_done": "yes",
 "osd_key": "XXX",
 "ready": "ready",
 "whoami": "1"
 }
}

The 'size' label, but your output is for the block (aka slow) device.

It should return labels for the db/wal devices as well (the block.db and 
block.wal symlinks, respectively). It works for me in master; I can't 
verify with mimic at the moment, though.

Here is output for master:

# bin/ceph-bluestore-tool show-label --path dev/osd0
inferring bluefs devices from bluestore path
{
    "dev/osd0/block": {
    "osd_uuid": "404dcbe9-3f8d-4ef5-ac59-2582454a9a75",
    "size": 21474836480,
    "btime": "2018-09-10 15:55:09.044039",
    "description": "main",
    "bluefs": "1",
    "ceph_fsid": "56eddc15-11b9-4e0b-9192-e391fbae551c",
    "kv_backend": "rocksdb",
    "magic": "ceph osd volume v026",
    "mkfs_done": "yes",
    "osd_key": "AQCsaZZbYTxXJBAAe3jJI4p6WbMjvA8CBBUJbA==",
    "ready": "ready",
    "whoami": "0"
    },
    "dev/osd0/block.wal": {
    "osd_uuid": "404dcbe9-3f8d-4ef5-ac59-2582454a9a75",
    "size": 1048576000,
    "btime": "2018-09-10 15:55:09.044985",
    "description": "bluefs wal"
    },
    "dev/osd0/block.db": {
    "osd_uuid": "404dcbe9-3f8d-4ef5-ac59-2582454a9a75",
    "size": 1048576000,
    "btime": "2018-09-10 15:55:09.044469",
    "description": "bluefs db"
    }
}


You can try the --dev option instead of --path, e.g.
ceph-bluestore-tool show-label --dev 





On 1.10.2018, at 16:48, Igor Fedotov  wrote:

This looks like a sort of deadlock: BlueFS needs some additional 
space to replay the log left after the crash, which happens during 
BlueFS open.


But such space (on the slow device, since the DB is full) is gifted in 
the background during the bluefs rebalance procedure, which only occurs 
after the open.


Hence the OSDs are stuck crashing permanently.

The only way to recover I can suggest for now is to expand the DB 
volumes. You can do that with lvm tools if you have any spare space 
for that.


Once resized, you'll need ceph-bluestore-tool to indicate the volume 
expansion to BlueFS (the bluefs-bdev-expand command) and finally update 
the DB volume size label with the set-label-key command.


The latter is a bit tricky for mimic - you might need to backport 
https://github.com/ceph/ceph/pull/22085/commits/ffac450da5d6e09cf14b8363b35f21819b48f38b


and rebuild ceph-bluestore-tool. Alternatively you can backport 
https://github.com/ceph/ceph/pull/22085/commits/71c3b58da4e7ced3422bce2b1da0e3fa9331530b


then bluefs expansion and label updates will occur in a single step.

I'll do these backports upstream, but it will take some time for them to 
pass all the procedures and get into an official mimic release.


I'll file a ticket to fix the original issue as well.


Thanks,

Igor


On 10/1/2018 3:28 PM, Sergey Malinin wrote:
These are LVM bluestore NVMe SSDs created with "ceph-volume --lvm 
prepare --bluestore /dev/nvme0n1p3" i.e. without specifying wal/db 
devices.
OSDs were created with bluestore_min_alloc_size_ssd=4096, another 
modified setting is bluestore_cache_kv_max=1073741824


DB/block usage collected by prometheus module for 3 failed and 1 
survived 

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-01 Thread Igor Fedotov



On 10/1/2018 5:03 PM, Sergey Malinin wrote:

Before I received your response, I had already added 20GB to the OSD (by expanding the LV followed 
by bluefs-bdev-expand) and ran "ceph-kvstore-tool bluestore-kv  compact"; 
however, it still needs more space.
Is that because I didn't update the DB size with set-label-key?
In mimic you need to run both the "bluefs-bdev-expand" and "set-label-key" 
commands to commit the bluefs volume expansion.
Unfortunately the latter command doesn't handle the "size" label properly. 
That's why you might need to backport and rebuild with the mentioned 
commits.



What exactly is the label key that needs to be updated? I couldn't find 
which one is related to the DB:

# ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-1
inferring bluefs devices from bluestore path
{
 "/var/lib/ceph/osd/ceph-1/block": {
 "osd_uuid": "f8f122ee-70a6-4c54-8eb0-9b42205b1ecc",
 "size": 471305551872,
 "btime": "2018-07-31 03:06:43.751243",
 "description": "main",
 "bluefs": "1",
 "ceph_fsid": "7d320499-5b3f-453e-831f-60d4db9a4533",
 "kv_backend": "rocksdb",
 "magic": "ceph osd volume v026",
 "mkfs_done": "yes",
 "osd_key": "XXX",
 "ready": "ready",
 "whoami": "1"
 }
}

The 'size' label, but your output is for the block (aka slow) device.

It should return labels for the db/wal devices as well (the block.db and 
block.wal symlinks, respectively). It works for me in master; I can't 
verify with mimic at the moment, though.

Here is output for master:

# bin/ceph-bluestore-tool show-label --path dev/osd0
inferring bluefs devices from bluestore path
{
    "dev/osd0/block": {
    "osd_uuid": "404dcbe9-3f8d-4ef5-ac59-2582454a9a75",
    "size": 21474836480,
    "btime": "2018-09-10 15:55:09.044039",
    "description": "main",
    "bluefs": "1",
    "ceph_fsid": "56eddc15-11b9-4e0b-9192-e391fbae551c",
    "kv_backend": "rocksdb",
    "magic": "ceph osd volume v026",
    "mkfs_done": "yes",
    "osd_key": "AQCsaZZbYTxXJBAAe3jJI4p6WbMjvA8CBBUJbA==",
    "ready": "ready",
    "whoami": "0"
    },
    "dev/osd0/block.wal": {
    "osd_uuid": "404dcbe9-3f8d-4ef5-ac59-2582454a9a75",
    "size": 1048576000,
    "btime": "2018-09-10 15:55:09.044985",
    "description": "bluefs wal"
    },
    "dev/osd0/block.db": {
    "osd_uuid": "404dcbe9-3f8d-4ef5-ac59-2582454a9a75",
    "size": 1048576000,
    "btime": "2018-09-10 15:55:09.044469",
    "description": "bluefs db"
    }
}


You can try the --dev option instead of --path, e.g.
ceph-bluestore-tool show-label --dev 





On 1.10.2018, at 16:48, Igor Fedotov  wrote:

This looks like a sort of deadlock: BlueFS needs some additional space to 
replay the log left after the crash, which happens during BlueFS open.

But such space (on the slow device, since the DB is full) is gifted in the background 
during the bluefs rebalance procedure, which only occurs after the open.

Hence the OSDs are stuck crashing permanently.

The only way to recover I can suggest for now is to expand the DB volumes. You can 
do that with lvm tools if you have any spare space for that.

Once resized, you'll need ceph-bluestore-tool to indicate the volume expansion to 
BlueFS (the bluefs-bdev-expand command) and finally update the DB volume size label 
with the set-label-key command.

The latter is a bit tricky for mimic - you might need to backport 
https://github.com/ceph/ceph/pull/22085/commits/ffac450da5d6e09cf14b8363b35f21819b48f38b

and rebuild ceph-bluestore-tool. Alternatively you can backport 
https://github.com/ceph/ceph/pull/22085/commits/71c3b58da4e7ced3422bce2b1da0e3fa9331530b

then bluefs expansion and label updates will occur in a single step.

I'll do these backports upstream, but it will take some time for them to pass all 
the procedures and get into an official mimic release.

I'll file a ticket to fix the original issue as well.


Thanks,

Igor


On 10/1/2018 3:28 PM, Sergey Malinin wrote:

These are LVM bluestore NVMe SSDs created with "ceph-volume --lvm prepare 
--bluestore /dev/nvme0n1p3" i.e. without specifying wal/db devices.
OSDs were created with bluestore_min_alloc_size_ssd=4096, another modified 
setting is bluestore_cache_kv_max=1073741824

DB/block usage collected by prometheus module for 3 failed and 1 survived OSDs:

ceph_bluefs_db_total_bytes{ceph_daemon="osd.0"} 65493008384.0
ceph_bluefs_db_total_bytes{ceph_daemon="osd.1"} 49013587968.0
ceph_bluefs_db_total_bytes{ceph_daemon="osd.2"} 76834406400.0 --> this one has 
survived
ceph_bluefs_db_total_bytes{ceph_daemon="osd.3"} 63726157824.0

ceph_bluefs_db_used_bytes{ceph_daemon="osd.0"} 65217232896.0
ceph_bluefs_db_used_bytes{ceph_daemon="osd.1"} 48944381952.0
ceph_bluefs_db_used_bytes{ceph_daemon="osd.2"} 68093476864.0
ceph_bluefs_db_used_bytes{ceph_daemon="osd.3"} 63632834560.0

ceph_osd_stat_bytes{ceph_daemon="osd.0"} 471305551872.0
ceph_osd_stat_bytes{ceph_daemon="osd.1"} 471305551872.0

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-01 Thread Sergey Malinin
Before I received your response, I had already added 20GB to the OSD (by 
expanding the LV followed by bluefs-bdev-expand) and ran "ceph-kvstore-tool 
bluestore-kv  compact"; however, it still needs more space.
Is that because I didn't update the DB size with set-label-key?

What exactly is the label key that needs to be updated? I couldn't find 
which one is related to the DB:

# ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-1
inferring bluefs devices from bluestore path
{
"/var/lib/ceph/osd/ceph-1/block": {
"osd_uuid": "f8f122ee-70a6-4c54-8eb0-9b42205b1ecc",
"size": 471305551872,
"btime": "2018-07-31 03:06:43.751243",
"description": "main",
"bluefs": "1",
"ceph_fsid": "7d320499-5b3f-453e-831f-60d4db9a4533",
"kv_backend": "rocksdb",
"magic": "ceph osd volume v026",
"mkfs_done": "yes",
"osd_key": "XXX",
"ready": "ready",
"whoami": "1"
}
}


> On 1.10.2018, at 16:48, Igor Fedotov  wrote:
> 
> This looks like a sort of deadlock: BlueFS needs some additional space to 
> replay the log left after the crash, which happens during BlueFS open.
> 
> But such space (on the slow device, since the DB is full) is gifted in the 
> background during the bluefs rebalance procedure, which only occurs after the open.
> 
> Hence the OSDs are stuck crashing permanently.
> 
> The only way to recover I can suggest for now is to expand the DB volumes. You 
> can do that with lvm tools if you have any spare space for that.
> 
> Once resized, you'll need ceph-bluestore-tool to indicate the volume expansion to 
> BlueFS (the bluefs-bdev-expand command) and finally update the DB volume size label 
> with the set-label-key command.
> 
> The latter is a bit tricky for mimic - you might need to backport 
> https://github.com/ceph/ceph/pull/22085/commits/ffac450da5d6e09cf14b8363b35f21819b48f38b
> 
> and rebuild ceph-bluestore-tool. Alternatively you can backport 
> https://github.com/ceph/ceph/pull/22085/commits/71c3b58da4e7ced3422bce2b1da0e3fa9331530b
> 
> then bluefs expansion and label updates will occur in a single step.
> 
> I'll do these backports upstream, but it will take some time for them to pass all 
> the procedures and get into an official mimic release.
> 
> I'll file a ticket to fix the original issue as well.
> 
> 
> Thanks,
> 
> Igor
> 
> 
> On 10/1/2018 3:28 PM, Sergey Malinin wrote:
>> These are LVM bluestore NVMe SSDs created with "ceph-volume --lvm prepare 
>> --bluestore /dev/nvme0n1p3" i.e. without specifying wal/db devices.
>> OSDs were created with bluestore_min_alloc_size_ssd=4096, another modified 
>> setting is bluestore_cache_kv_max=1073741824
>> 
>> DB/block usage collected by prometheus module for 3 failed and 1 survived 
>> OSDs:
>> 
>> ceph_bluefs_db_total_bytes{ceph_daemon="osd.0"} 65493008384.0
>> ceph_bluefs_db_total_bytes{ceph_daemon="osd.1"} 49013587968.0
>> ceph_bluefs_db_total_bytes{ceph_daemon="osd.2"} 76834406400.0 --> this one 
>> has survived
>> ceph_bluefs_db_total_bytes{ceph_daemon="osd.3"} 63726157824.0
>> 
>> ceph_bluefs_db_used_bytes{ceph_daemon="osd.0"} 65217232896.0
>> ceph_bluefs_db_used_bytes{ceph_daemon="osd.1"} 48944381952.0
>> ceph_bluefs_db_used_bytes{ceph_daemon="osd.2"} 68093476864.0
>> ceph_bluefs_db_used_bytes{ceph_daemon="osd.3"} 63632834560.0
>> 
>> ceph_osd_stat_bytes{ceph_daemon="osd.0"} 471305551872.0
>> ceph_osd_stat_bytes{ceph_daemon="osd.1"} 471305551872.0
>> ceph_osd_stat_bytes{ceph_daemon="osd.2"} 471305551872.0
>> ceph_osd_stat_bytes{ceph_daemon="osd.3"} 471305551872.0
>> 
>> ceph_osd_stat_bytes_used{ceph_daemon="osd.0"} 222328213504.0
>> ceph_osd_stat_bytes_used{ceph_daemon="osd.1"} 214472544256.0
>> ceph_osd_stat_bytes_used{ceph_daemon="osd.2"} 163603996672.0
>> ceph_osd_stat_bytes_used{ceph_daemon="osd.3"} 212806815744.0
>> 
>> 
>> The first crashed OSD was doing DB compaction; the others crashed shortly after 
>> during backfilling. The workload was "ceph-data-scan scan_inodes" filling the 
>> metadata pool located on these OSDs at a rate close to 10k objects/second.
>> Here is the log excerpt of the first crash occurrence:
>> 
>> 2018-10-01 03:27:12.762 7fbf16dd6700  0 bluestore(/var/lib/ceph/osd/ceph-1) 
>> _balance_bluefs_freespace no allocate on 0x8000 min_alloc_size 0x1000
>> 2018-10-01 03:27:12.886 7fbf1e5e5700  4 rocksdb: 
>> [/build/ceph-13.2.2/src/rocksdb/db/compaction_job.cc:1166] [default] [JOB 
>> 24] Generated table #89741: 106356 keys, 68110589 bytes
>> 2018-10-01 03:27:12.886 7fbf1e5e5700  4 rocksdb: EVENT_LOG_v1 
>> {"time_micros": 1538353632892744, "cf_name": "default", "job": 24, "event": 
>> "table_file_creation", "file_number": 89741, "file_size": 68110589, 
>> "table_properties": {"data_size": 67112903, "index_size": 579319, 
>> "filter_size": 417316, "raw_key_size": 6733561, "raw_average_key_size": 63, 
>> "raw_value_size": 60994583, "raw_average_value_size": 573, 
>> "num_data_blocks": 16336, "num_entries": 106356, "filter_policy_name": 
>> "rocksdb.BuiltinBloomFilter", "kDeletedKeys": 

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-01 Thread Igor Fedotov
This looks like a sort of deadlock: BlueFS needs some additional 
space to replay the log left after the crash, which happens during 
BlueFS open.


But such space (on the slow device, since the DB is full) is gifted in the 
background during the bluefs rebalance procedure, which only occurs after the open.


Hence the OSDs are stuck crashing permanently.

The only way to recover I can suggest for now is to expand the DB volumes. 
You can do that with lvm tools if you have any spare space for that.


Once resized, you'll need ceph-bluestore-tool to indicate the volume 
expansion to BlueFS (the bluefs-bdev-expand command) and finally update the 
DB volume size label with the set-label-key command.
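
As a rough sketch of that sequence (the LV path, OSD path and sizes are placeholders 
for your own setup, and note the mimic caveat about set-label-key just below):

#!/usr/bin/env python3
# Rough sketch of the recovery sequence above: grow the LV, tell BlueFS about
# the larger device, then fix the 'size' label. All paths/sizes are placeholders.
import subprocess

OSD_PATH = '/var/lib/ceph/osd/ceph-1'
DB_DEV = OSD_PATH + '/block.db'                 # standalone DB device (placeholder)
NEW_SIZE = 471305551872 + 20 * 1024**3          # hypothetical size after the resize

def run(*cmd):
    print('+', ' '.join(cmd))
    subprocess.run(cmd, check=True)

run('lvextend', '-L', '+20G', '/dev/vg0/osd1-db')                    # placeholder LV
run('ceph-bluestore-tool', 'bluefs-bdev-expand', '--path', OSD_PATH)
run('ceph-bluestore-tool', 'set-label-key', '--dev', DB_DEV,
    '-k', 'size', '-v', str(NEW_SIZE))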


The latter is a bit tricky for mimic - you might need to backport 
https://github.com/ceph/ceph/pull/22085/commits/ffac450da5d6e09cf14b8363b35f21819b48f38b


and rebuild ceph-bluestore-tool. Alternatively you can backport 
https://github.com/ceph/ceph/pull/22085/commits/71c3b58da4e7ced3422bce2b1da0e3fa9331530b


then bluefs expansion and label updates will occur in a single step.

I'll do these backports upstream, but it will take some time for them to pass 
all the procedures and get into an official mimic release.


I'll file a ticket to fix the original issue as well.


Thanks,

Igor


On 10/1/2018 3:28 PM, Sergey Malinin wrote:

These are LVM bluestore NVMe SSDs created with "ceph-volume --lvm prepare 
--bluestore /dev/nvme0n1p3" i.e. without specifying wal/db devices.
OSDs were created with bluestore_min_alloc_size_ssd=4096, another modified 
setting is bluestore_cache_kv_max=1073741824

DB/block usage collected by prometheus module for 3 failed and 1 survived OSDs:

ceph_bluefs_db_total_bytes{ceph_daemon="osd.0"} 65493008384.0
ceph_bluefs_db_total_bytes{ceph_daemon="osd.1"} 49013587968.0
ceph_bluefs_db_total_bytes{ceph_daemon="osd.2"} 76834406400.0 --> this one has 
survived
ceph_bluefs_db_total_bytes{ceph_daemon="osd.3"} 63726157824.0

ceph_bluefs_db_used_bytes{ceph_daemon="osd.0"} 65217232896.0
ceph_bluefs_db_used_bytes{ceph_daemon="osd.1"} 48944381952.0
ceph_bluefs_db_used_bytes{ceph_daemon="osd.2"} 68093476864.0
ceph_bluefs_db_used_bytes{ceph_daemon="osd.3"} 63632834560.0

ceph_osd_stat_bytes{ceph_daemon="osd.0"} 471305551872.0
ceph_osd_stat_bytes{ceph_daemon="osd.1"} 471305551872.0
ceph_osd_stat_bytes{ceph_daemon="osd.2"} 471305551872.0
ceph_osd_stat_bytes{ceph_daemon="osd.3"} 471305551872.0

ceph_osd_stat_bytes_used{ceph_daemon="osd.0"} 222328213504.0
ceph_osd_stat_bytes_used{ceph_daemon="osd.1"} 214472544256.0
ceph_osd_stat_bytes_used{ceph_daemon="osd.2"} 163603996672.0
ceph_osd_stat_bytes_used{ceph_daemon="osd.3"} 212806815744.0


The first crashed OSD was doing DB compaction; the others crashed shortly after during 
backfilling. The workload was "ceph-data-scan scan_inodes" filling the metadata pool 
located on these OSDs at a rate close to 10k objects/second.
Here is the log excerpt of the first crash occurrence:

2018-10-01 03:27:12.762 7fbf16dd6700  0 bluestore(/var/lib/ceph/osd/ceph-1) 
_balance_bluefs_freespace no allocate on 0x8000 min_alloc_size 0x1000
2018-10-01 03:27:12.886 7fbf1e5e5700  4 rocksdb: 
[/build/ceph-13.2.2/src/rocksdb/db/compaction_job.cc:1166] [default] [JOB 24] 
Generated table #89741: 106356 keys, 68110589 bytes
2018-10-01 03:27:12.886 7fbf1e5e5700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1538353632892744, "cf_name": "default", "job": 24, "event": "table_file_creation", "file_number": 89741, "file_size": 68110589, "table_properties": 
{"data_size": 67112903, "index_size": 579319, "filter_size": 417316, "raw_key_size": 6733561, "raw_average_key_size": 63, "raw_value_size": 60994583, "raw_average_value_size": 573, "num_data_blocks": 16336, "num_entries": 106356, 
"filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "1", "kMergeOperands": "0"}}
2018-10-01 03:27:12.934 7fbf1e5e5700  4 rocksdb: 
[/build/ceph-13.2.2/src/rocksdb/db/compaction_job.cc:1166] [default] [JOB 24] 
Generated table #89742: 23214 keys, 16352315 bytes
2018-10-01 03:27:12.934 7fbf1e5e5700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1538353632938670, "cf_name": "default", "job": 24, "event": "table_file_creation", "file_number": 89742, "file_size": 16352315, "table_properties": 
{"data_size": 16116986, "index_size": 139894, "filter_size": 94386, "raw_key_size": 1470883, "raw_average_key_size": 63, "raw_value_size": 14775006, "raw_average_value_size": 636, "num_data_blocks": 3928, "num_entries": 23214, 
"filter_policy_name": "rocksdb.BuiltinBloomFilter", "kDeletedKeys": "90", "kMergeOperands": "0"}}
2018-10-01 03:27:13.042 7fbf1e5e5700  1 bluefs _allocate failed to allocate 
0x410 on bdev 1, free 0x1a0; fallback to bdev 2
2018-10-01 03:27:13.042 7fbf1e5e5700 -1 bluefs _allocate failed to allocate 
0x410 on bdev 2, dne
2018-10-01 03:27:13.042 7fbf1e5e5700 -1 bluefs _flush_range allocated: 0x0 
offset: 0x0 length: 0x40ea9f1
2018-10-01 03:27:13.046 7fbf1e5e5700 -1 
/build/ceph-13.2.2/src/os/bluestore/BlueFS.cc: In function 'int 

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-01 Thread Sergey Malinin
These are LVM bluestore NVMe SSDs created with "ceph-volume --lvm prepare 
--bluestore /dev/nvme0n1p3" i.e. without specifying wal/db devices.
OSDs were created with bluestore_min_alloc_size_ssd=4096, another modified 
setting is bluestore_cache_kv_max=1073741824

DB/block usage collected by prometheus module for 3 failed and 1 survived OSDs:

ceph_bluefs_db_total_bytes{ceph_daemon="osd.0"} 65493008384.0
ceph_bluefs_db_total_bytes{ceph_daemon="osd.1"} 49013587968.0
ceph_bluefs_db_total_bytes{ceph_daemon="osd.2"} 76834406400.0 --> this one has 
survived
ceph_bluefs_db_total_bytes{ceph_daemon="osd.3"} 63726157824.0

ceph_bluefs_db_used_bytes{ceph_daemon="osd.0"} 65217232896.0
ceph_bluefs_db_used_bytes{ceph_daemon="osd.1"} 48944381952.0
ceph_bluefs_db_used_bytes{ceph_daemon="osd.2"} 68093476864.0
ceph_bluefs_db_used_bytes{ceph_daemon="osd.3"} 63632834560.0

ceph_osd_stat_bytes{ceph_daemon="osd.0"} 471305551872.0
ceph_osd_stat_bytes{ceph_daemon="osd.1"} 471305551872.0
ceph_osd_stat_bytes{ceph_daemon="osd.2"} 471305551872.0
ceph_osd_stat_bytes{ceph_daemon="osd.3"} 471305551872.0

ceph_osd_stat_bytes_used{ceph_daemon="osd.0"} 222328213504.0
ceph_osd_stat_bytes_used{ceph_daemon="osd.1"} 214472544256.0
ceph_osd_stat_bytes_used{ceph_daemon="osd.2"} 163603996672.0
ceph_osd_stat_bytes_used{ceph_daemon="osd.3"} 212806815744.0
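
For what it's worth, a quick ratio check of these figures (throwaway snippet, values 
copied from the metrics above) shows the three failed OSDs at 99.6-99.9% bluefs DB 
utilization, while the surviving osd.2 was at roughly 88.6%:

# Quick check of bluefs DB utilization from the figures above (bytes used, total).
db = {
    'osd.0': (65217232896, 65493008384),
    'osd.1': (48944381952, 49013587968),
    'osd.2': (68093476864, 76834406400),   # the one that survived
    'osd.3': (63632834560, 63726157824),
}
for osd, (used, total) in sorted(db.items()):
    print('%s: %.1f%% of bluefs DB used' % (osd, 100.0 * used / total))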


The first crashed OSD was doing DB compaction; the others crashed shortly after during 
backfilling. The workload was "ceph-data-scan scan_inodes" filling the metadata pool 
located on these OSDs at a rate close to 10k objects/second.
Here is the log excerpt of the first crash occurrence:

2018-10-01 03:27:12.762 7fbf16dd6700  0 bluestore(/var/lib/ceph/osd/ceph-1) 
_balance_bluefs_freespace no allocate on 0x8000 min_alloc_size 0x1000
2018-10-01 03:27:12.886 7fbf1e5e5700  4 rocksdb: 
[/build/ceph-13.2.2/src/rocksdb/db/compaction_job.cc:1166] [default] [JOB 24] 
Generated table #89741: 106356 keys, 68110589 bytes
2018-10-01 03:27:12.886 7fbf1e5e5700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 
1538353632892744, "cf_name": "default", "job": 24, "event": 
"table_file_creation", "file_number": 89741, "file_size": 68110589, 
"table_properties": {"data_size": 67112903, "index_size": 579319, 
"filter_size": 417316, "raw_key_size": 6733561, "raw_average_key_size": 63, 
"raw_value_size": 60994583, "raw_average_value_size": 573, "num_data_blocks": 
16336, "num_entries": 106356, "filter_policy_name": 
"rocksdb.BuiltinBloomFilter", "kDeletedKeys": "1", "kMergeOperands": "0"}}
2018-10-01 03:27:12.934 7fbf1e5e5700  4 rocksdb: 
[/build/ceph-13.2.2/src/rocksdb/db/compaction_job.cc:1166] [default] [JOB 24] 
Generated table #89742: 23214 keys, 16352315 bytes
2018-10-01 03:27:12.934 7fbf1e5e5700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 
1538353632938670, "cf_name": "default", "job": 24, "event": 
"table_file_creation", "file_number": 89742, "file_size": 16352315, 
"table_properties": {"data_size": 16116986, "index_size": 139894, 
"filter_size": 94386, "raw_key_size": 1470883, "raw_average_key_size": 63, 
"raw_value_size": 14775006, "raw_average_value_size": 636, "num_data_blocks": 
3928, "num_entries": 23214, "filter_policy_name": "rocksdb.BuiltinBloomFilter", 
"kDeletedKeys": "90", "kMergeOperands": "0"}}
2018-10-01 03:27:13.042 7fbf1e5e5700  1 bluefs _allocate failed to allocate 
0x410 on bdev 1, free 0x1a0; fallback to bdev 2
2018-10-01 03:27:13.042 7fbf1e5e5700 -1 bluefs _allocate failed to allocate 
0x410 on bdev 2, dne
2018-10-01 03:27:13.042 7fbf1e5e5700 -1 bluefs _flush_range allocated: 0x0 
offset: 0x0 length: 0x40ea9f1
2018-10-01 03:27:13.046 7fbf1e5e5700 -1 
/build/ceph-13.2.2/src/os/bluestore/BlueFS.cc: In function 'int 
BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' thread 
7fbf1e5e5700 time 2018-10-01 03:27:13.048298
/build/ceph-13.2.2/src/os/bluestore/BlueFS.cc: 1663: FAILED assert(0 == "bluefs 
enospc")

 ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x102) [0x7fbf2d4fe5c2]
 2: (()+0x26c787) [0x7fbf2d4fe787]
 3: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned 
long)+0x1ab4) [0x5619325114b4]
 4: (BlueRocksWritableFile::Flush()+0x3d) [0x561932527c1d]
 5: (rocksdb::WritableFileWriter::Flush()+0x1b9) [0x56193271c399]
 6: (rocksdb::WritableFileWriter::Sync(bool)+0x3b) [0x56193271d42b]
 7: (rocksdb::CompactionJob::FinishCompactionOutputFile(rocksdb::Status const&, 
rocksdb::CompactionJob::SubcompactionState*, rocksdb::RangeDelAggregator*, 
CompactionIterationStats*, rocksdb::Slice const*)+0x3db) [0x56193276098b]
 8: 
(rocksdb::CompactionJob::ProcessKeyValueCompaction(rocksdb::CompactionJob::SubcompactionState*)+0x7d9)
 [0x561932763da9]
 9: (rocksdb::CompactionJob::Run()+0x314) [0x561932765504]
 10: (rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*, 
rocksdb::LogBuffer*, rocksdb::DBImpl::PrepickedCompaction*)+0xc54) 
[0x5619325b5c44]
 11: 

Re: [ceph-users] mimic: 3/4 OSDs crashed on "bluefs enospc"

2018-10-01 Thread Igor Fedotov

Hi Sergey,

could you please provide more details on your OSDs?

What are sizes for DB/block devices?

Do you have any modifications in BlueStore config settings?

Can you share stats you're referring to?


Thanks,

Igor


On 10/1/2018 12:29 PM, Sergey Malinin wrote:

Hello,
3 of 4 NVMe OSDs crashed at the same time on assert(0 == "bluefs enospc") and 
no longer start.
Stats collected just before the crash show that ceph_bluefs_db_used_bytes is at 
100%. Although the OSDs have over 50% free space, it is not being reallocated 
for DB usage.

2018-10-01 12:18:06.744 7f1d6a04d240  1 bluefs _allocate failed to allocate 
0x10 on bdev 1, free 0x0; fallback to bdev 2
2018-10-01 12:18:06.744 7f1d6a04d240 -1 bluefs _allocate failed to allocate 
0x10 on bdev 2, dne
2018-10-01 12:18:06.744 7f1d6a04d240 -1 bluefs _flush_range allocated: 0x0 
offset: 0x0 length: 0xa8700
2018-10-01 12:18:06.748 7f1d6a04d240 -1 
/build/ceph-13.2.2/src/os/bluestore/BlueFS.cc: In function 'int 
BlueFS::_flush_range(BlueFS::FileWriter*, uint64_t, uint64_t)' thread 
7f1d6a04d240 time 2018-10-01 12:18:06.746800
/build/ceph-13.2.2/src/os/bluestore/BlueFS.cc: 1663: FAILED assert(0 == "bluefs 
enospc")

  ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)
  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x102) [0x7f1d6146f5c2]
  2: (()+0x26c787) [0x7f1d6146f787]
  3: (BlueFS::_flush_range(BlueFS::FileWriter*, unsigned long, unsigned 
long)+0x1ab4) [0x5586b22684b4]
  4: (BlueRocksWritableFile::Flush()+0x3d) [0x5586b227ec1d]
  5: (rocksdb::WritableFileWriter::Flush()+0x1b9) [0x5586b2473399]
  6: (rocksdb::WritableFileWriter::Sync(bool)+0x3b) [0x5586b247442b]
  7: (rocksdb::BuildTable(std::__cxx11::basic_string, 
std::allocator > const&, rocksdb::Env*, rocksdb::ImmutableCFOptions const&, 
rocksdb::MutableCFOptions const&, rocksdb::EnvOptions const&, rock
sdb::TableCache*, rocksdb::InternalIterator*, std::unique_ptr >, rocksdb::FileMetaData*, 
rocksdb::InternalKeyComparator const&, std::vector >, 
std::allocator > > > co
nst*, unsigned int, std::__cxx11::basic_string, std::allocator 
> const&, std::vector >, unsigned long, 
rocksdb::SnapshotChecker*, rocksdb::Compression
Type, rocksdb::CompressionOptions const&, bool, rocksdb::InternalStats*, 
rocksdb::TableFileCreationReason, rocksdb::EventLogger*, int, 
rocksdb::Env::IOPriority, rocksdb::TableProperties*, int, unsigned long, unsigned 
long, rocksdb
::Env::WriteLifeTimeHint)+0x1e24) [0x5586b249ef94]
  8: (rocksdb::DBImpl::WriteLevel0TableForRecovery(int, 
rocksdb::ColumnFamilyData*, rocksdb::MemTable*, rocksdb::VersionEdit*)+0xcb7) 
[0x5586b2321457]
  9: (rocksdb::DBImpl::RecoverLogFiles(std::vector > const&, unsigned long*, bool)+0x19de) [0x5586b232373e]
  10: (rocksdb::DBImpl::Recover(std::vector > const&, bool, bool, bool)+0x5d4) 
[0x5586b23242f4]
  11: (rocksdb::DBImpl::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string, std::allocator > const&, 
std::vector > const&, std::vector >*, rocksdb::DB**, bool)+0x68b) [0x5586b232559b]
  12: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::__cxx11::basic_string, std::allocator > const&, 
std::vector
const&, std::vector >*, rocksdb::DB**)+0x22) [0x5586b2326e72]

  13: (RocksDBStore::do_open(std::ostream&, bool, std::vector > const*)+0x170c) [0x5586b220219c]
  14: (BlueStore::_open_db(bool, bool)+0xd8e) [0x5586b218ee1e]
  15: (BlueStore::_mount(bool, bool)+0x4b7) [0x5586b21bf807]
  16: (OSD::init()+0x295) [0x5586b1d673c5]
  17: (main()+0x268d) [0x5586b1c554ed]
  18: (__libc_start_main()+0xe7) [0x7f1d5ea2db97]
  19: (_start()+0x2a) [0x5586b1d1d7fa]
  NOTE: a copy of the executable, or `objdump -rdS ` is needed to 
interpret this.

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

