On 28. sep. 2017 18:53, hjcho616 wrote:
Yay! Finally, after almost exactly one month, I am able to mount the drive! Now it is time to see how my data is doing. =P Doesn't look too bad though.
Got to love open source. =) I downloaded the ceph source code, built it, and then tried to run ceph-objectstore-export on that osd.4.
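For reference, the export presumably happens with ceph-objectstore-tool; a rough sketch of what that step looks like on Jewel (the pgid 1.28 and the default osd.4 paths here are assumptions, and the OSD has to be stopped while the tool runs):
# ceph-objectstore-tool --op export --pgid 1.28 --data-path /var/lib/ceph/osd/ceph-4 --journal-path /var/lib/ceph/osd/ceph-4/journal --file 1.28.export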
Ronny,
Could you help me with this log? I got this with debug osd=20, filestore=20,
ms=20. This one is from running "ceph pg repair 2.7". This is one of the smaller
pgs, so the log was smaller. Others have similar errors. I can see the lines
with ERR, but other than that is there something I should
# rados list-inconsistent-pg data
["0.0","0.5","0.a","0.e","0.1c","0.29","0.2c"]
# rados list-inconsistent-pg metadata
["1.d","1.3d"]
# rados list-inconsistent-pg rbd
["2.7"]
# rados list-inconsistent-obj 0.0 --format=json-pretty
{
    "epoch": 23112,
    "inconsistents": []
}
# rados
Thanks Ronny. I'll try that inconsistent issue soon.
I think the OSD drive that PG 1.28 is sitting on is still ok... just some file
corruption from when the power outage happened.. =P As you suggested:
cd /var/lib/ceph/osd/ceph-4/current/
tar --xattrs --preserve-permissions -zcvf osd.4.tar.gz
i would only tar the pg you have missing objects from; trying to inject
older objects when the pg is correct can not be good.
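Concretely, scoping the archive to just that pg's head directory would look roughly like this (1.28_head is an assumption based on the usual FileStore layout under current/):
cd /var/lib/ceph/osd/ceph-4/current/
tar --xattrs --preserve-permissions -zcvf pg-1.28.tar.gz 1.28_head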
scrub errors are kind of the issue with only 2 replicas: when you have 2
different objects, how do you know which one is correct and which one is bad..
and as you have read
Thanks Ronny.
I decided to try to tar everything under the current directory. Is this the correct
command for it? Is there any directory we do not want on the new drive?
commit_op_seq, meta, nosnap, omap?
tar --xattrs --preserve-permissions -zcvf osd.4.tar.gz .
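If it turns out commit_op_seq, meta, nosnap and omap should be left out (I am not certain they should), I guess tar could skip them explicitly, something like:
tar --xattrs --preserve-permissions -zcvf osd.4.tar.gz --exclude='./commit_op_seq' --exclude='./meta' --exclude='./nosnap' --exclude='./omap' .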
As far as inconsistent PGs... I am
On 20.09.2017 16:49, hjcho616 wrote:
Anyone? Can this pg be saved? If not, what are my options?
Regards,
Hong
On Saturday, September 16, 2017 1:55 AM, hjcho616 wrote:
Looking better... working on scrubbing..
HEALTH_ERR 1 pgs are stuck inactive for more than 300 seconds; 1 pgs incomplete; 12 pgs inconsistent; 2 pgs repair; 1 pgs stuck inactive; 1 pgs stuck unclean; 109 scrub errors; too few PGs per OSD (29 < min 30); mds rank 0 has failed; mds cluster is
After running "ceph osd lost osd.0", it started backfilling... I figured that was
supposed to happen earlier when I added those missing PGs. As for running into "too
few PGs per OSD": I had removed osds after the cluster stopped working when I added
osds, but I guess I still needed them. Currently I see
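If the "too few PGs per OSD" warning needs to go away without re-adding disks, the usual knob is the pool's pg_num/pgp_num; a hedged sketch (pool name and target count are assumptions, and splitting PGs on a degraded cluster should probably wait until it is healthy again):
# ceph osd pool set rbd pg_num 128
# ceph osd pool set rbd pgp_num 128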
you write you had all pg's exported except one. so i assume you have
injected those pg's into the cluster again using the method linked a few
times in this thread. How did that go, were you successful in
recovering those pg's ?
kind regards.
Ronny Aasen
On 15. sep. 2017 07:52, hjcho616 wrote:
I just did this and backfilling started. Let's see where this takes me.
ceph osd lost 0 --yes-i-really-mean-it
Regards,
Hong
On Friday, September 15, 2017 12:44 AM, hjcho616 wrote:
Ronny,
Working with all of the pgs shown in "ceph health detail", I ran the below for each PG to export:
ceph-objectstore-tool --op export --pgid 0.1c --data-path /var/lib/ceph/osd/ceph-0 --journal-path /var/lib/ceph/osd/ceph-0/journal --skip-journal-replay --file 0.1c.export
I have all PGs
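Presumably the matching import step, once a healthy or temporary OSD is available, looks something like this (the target osd id is an assumption, and the destination OSD has to be stopped while the tool runs):
ceph-objectstore-tool --op import --data-path /var/lib/ceph/osd/ceph-6 --journal-path /var/lib/ceph/osd/ceph-6/journal --file 0.1c.export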
Ronny,
Just tried hooking osd.0 back up. osd.0 seems to be in better shape, as I was able to
run ceph-objectstore-tool export, so I decided to try hooking it up. Looks like the
journal is not happy. Is there any way to get this running? Or do I need to
start getting data using ceph-objectstore-tool?
On 13. sep. 2017 07:04, hjcho616 wrote:
Ronny,
Did a bunch of "ceph pg repair <pg#>" and got the scrub errors down to 10... well, it was
9, then trying to fix one made it 10.. waiting for it to fix (I did that noout trick,
as I only have two copies). 8 of those scrub errors look like they would need
data from osd.0.
HEALTH_ERR 22 pgs are stuck
Thank you for those references! I'll have to go study some more. A good portion
of the inconsistencies seems to be from missing data from osd.0. =P There
appear to be some from okay drives too. =P Kicked off "ceph pg repair <pg#>" a few
times, but it doesn't seem to change much yet. =P As far as smart
you can start by posting more details, at least
"ceph osd tree", "cat ceph.conf" and "ceph osd df", so we can see what
settings you are running, and how your cluster is balanced at the moment.
generally:
inconsistent pg's are pg's that have scrub errors. use rados
list-inconsistent-pg [pool]
It took a while. It appears to have cleaned up quite a bit... but still has
issues. I've been seeing the below message for more than a day, and cpu and io
utilization are low... looks like something is stuck... I rebooted OSDs
several times when it looked like it was stuck earlier and
Hmm.. I hope I don't really need anything from osd.0. =P
# ceph-objectstore-tool --op export --pgid 2.35 --data-path /var/lib/ceph/osd/ceph-0 --journal-path /var/lib/ceph/osd/ceph-0/journal --file 2.35.export
Failure to read OSD superblock: (2) No such file or directory
# ceph-objectstore-tool
Ronny,
While letting the cluster replicate (looks like this might take a while), I decided to
look into where those pgs are missing.. From "ceph health detail" I found the
pgs that are unfound, then found the directories that had those pgs, pasted to
the right of that detail message below..
pg 2.35 is
Thank you Ronny. I've added two OSDs to OSD2, 2TB each. I hope that will be
enough. =) I've changed min_size and size to 2. OSDs are busy balancing
again. I'll try what you recommended and will get back to you with more
questions! =)
# ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN
I would not even attempt to connect a recovered drive to ceph,
especially not one that has had xfs errors and corruption.
your pg's that are undersized lead me to believe you still need to either
expand, with more disks or nodes, or that you need to set
osd crush chooseleaf type = 0
to
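For what it's worth, that setting goes in ceph.conf; a minimal sketch (section placement is an assumption, and note it relaxes replica placement from per-host to per-OSD, so both copies of a pg may land on one machine):
[global]
osd crush chooseleaf type = 0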
I checked with ceph-2, 3, 4, and 5, so I figured it was safe to assume the
superblock file is the same. I copied it over and started the OSD. It still fails
with the same error message. Looks like when I updated to 10.2.9, some osds
need to be updated and that process is not finding the data it
Just realized there is a file called superblock in the ceph directory. ceph-1
and ceph-2's superblock file is identical, ceph-6 and ceph-7 are identical, but
not between the two groups. When I originally created the OSDs, I created
ceph-0 through 5. Can the superblock file be copied over from
Tried connecting the recovered osd. Looks like some of the files in lost+found
are superblocks. Below is the log. What can I do about this?
2017-09-01 22:27:27.634228 7f68837e5800  0 set uid:gid to 1001:1001 (ceph:ceph)
2017-09-01 22:27:27.634245 7f68837e5800  0 ceph version 10.2.9
Found the partition, but wasn't able to mount it right away... Did an
xfs_repair on that drive.
Got a bunch of messages like this.. =(
entry "10a89fd.__head_AE319A25__0" in shortform directory 845908970 references non-existent inode 605294241 junking entry
Looks like it has been rescued... Only 1 error, as we saw before in the smart log!
# ddrescue -f /dev/sda /dev/sdc ./rescue.log
GNU ddrescue 1.21
Press Ctrl-C to interrupt
     ipos: 1508 GB, non-trimmed: 0 B, current rate: 0 B/s
     opos: 1508 GB, non-scraped: 0 B,
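If that one bad area matters, ddrescue can be re-run against the same log file to retry only the failed sectors; a sketch (the -r3 retry count is an arbitrary choice):
# ddrescue -f -r3 /dev/sda /dev/sdc ./rescue.log
Because the log records what was already copied, only the bad blocks get attempted again.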
On 30.08.2017 15:32, Steve Taylor wrote:
I'm not familiar with dd_rescue, but I've just been reading about it. I'm not
seeing any features that would be beneficial in this scenario that aren't also
available in dd. What specific features give it "really a far better chance of
restoring a copy of your disk" than dd? I'm always
Yes, if I had created the RBD in the same cluster I was trying to repair then I
would have used rbd-fuse to "map" the RBD in order to avoid potential deadlock
issues with the kernel client. I had another cluster available, so I copied its
config file to the osd node, created the RBD in the
[snip]
I'm not sure if I am liking what I see in fdisk... it doesn't show sdb1.
I hope it shows up when I run dd_rescue to the other drive... =P
# fdisk /dev/sdb
Welcome to fdisk (util-linux 2.25.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using
This is what it looks like today. Seems like ceph-osds are sitting at 0% cpu
so... all the migrations appear to be done. Does this look ok to shut down and
continue when I get the HDD on Thursday?
# ceph health
HEALTH_ERR 22 pgs are stuck inactive for more than 300 seconds; 20 pgs
Maged, on the second host he has 4 out of 5 OSDs failed on him … I think he's past
the point of trying to increase the backfill threshold :) of course he could try to
degrade the cluster by letting it mirror within the same host :)
> On 29 Aug 2017, at 21:26, Maged Mokhtar wrote:
>
> One of the
One of the things to watch out for in small clusters is that OSDs can get full
rather unexpectedly in recovery/backfill cases:
In your case you have 2 OSD nodes with 5 disks each. Since you have a
replica of 2, each PG will have 1 copy on each host, so if an OSD fails,
all its PGs will have to be
Just FYI, setting size and min_size to 1 is a last resort in my mind - to get
you out of dodge !!
Before setting that you should have made yourself 105% certain that all OSDs
you leave ON have NO bad sectors, no sectors pending, and no errors of any
kind.
once you can mount the
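A quick way to sanity-check that, for whatever it's worth (the device name is just an example):
# smartctl -a /dev/sdb | grep -E 'Reallocated_Sector|Current_Pending_Sector|Offline_Uncorrectable'
Non-zero raw values on any of those attributes would be a reason not to trust the drive.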
On 29-8-2017 19:12, Steve Taylor wrote:
> Hong,
>
> Probably your best chance at recovering any data without special,
> expensive, forensic procedures is to perform a dd from /dev/sdb to
> somewhere else large enough to hold a full disk image and attempt to
> repair that. You'll want to use
Nice! Thank you for the explanation! I feel like I can revive that OSD. =)
That does sound great. I don't quite have another cluster, so I'm waiting for a
drive to arrive! =)
After setting size and min_size to 1, it looks like the toofull flag is gone... Maybe
when I was making that video copy the OSDs
But it was absolutely awesome to run an osd off of an rbd after the disk
failed.
On Tue, Aug 29, 2017, 1:42 PM David Turner wrote:
To add to Steve's success, the rbd was created in a second cluster in the
same datacenter, so it didn't run the risk of deadlocking that mapping rbds
on machines running osds has. It could theoretically still work on the same
cluster, but it is more inherently dangerous for a few reasons.
On Tue, Aug 29,
Hong,
Probably your best chance at recovering any data without special,
expensive, forensic procedures is to perform a dd from /dev/sdb to
somewhere else large enough to hold a full disk image and attempt to
repair that. You'll want to use 'conv=noerror' with your dd command
since your disk is
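Something along these lines, as a sketch (the output path is an assumption; 'sync' is often paired with 'noerror' so unreadable blocks are padded and the offsets stay aligned):
# dd if=/dev/sdb of=/mnt/backup/sdb.img bs=4M conv=noerror,sync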
Rule of thumb with batteries is:
- the closer to their “proper temperature” you run them, the more life you get out of them
- the more the battery is overpowered for your application, the longer it will survive.
Get yourself an LSI 94** controller and use it as an HBA and you will be fine. but
get MORE DRIVES ! …
Thank you Tomasz and Ronny. I'll have to order some hdds soon and try these
out. The car battery idea is nice! I may try that.. =) Do they last longer?
The ones that fit the UPS's original battery spec didn't last very long... part of
the reason why I gave up on them.. =P My wife probably won't like
Sorry for being brutal … anyway
1. get the battery for the UPS (a car battery will do as well, I’ve modded a UPS
in the past with a truck battery and it was working like a charm :D )
2. get spare drives and put those in because your cluster CAN NOT get out of
error due to lack of space
3. Follow
Tomasz,
Those machines are behind a surge protector. Doesn't appear to be a good one!
I do have a UPS... but it is my fault... no battery. Power was pretty reliable
for a while... and the UPS was just beeping every chance it had, disrupting some
sleep.. =P So running on surge protector only. I
> [SNIP - bad drives]
Generally when a disk is displaying bad blocks to the OS, the drive has
been remapping blocks for ages in the background, and the disk is really
on its last legs. a bit unlikely that you get so many disks dying at
the same time though. but the problem can have been
So to decode a few things about your disk:
  1 Raw_Read_Error_Rate     0x002f   100   100   051   Pre-fail   Always   -   37
37 read errors and only one sector marked as pending - fun disk :/
181 Program_Fail_Cnt_Total  0x0022   099   099   000   Old_age    Always   -   35325174
I think you are looking at something more like this :
So.. could doing something like this potentially bring it back to life? =)
Analyzing a Faulty Hard Disk using Smartctl - Thomas-Krenn-Wiki
On
I think you’ve got your answer:
197 Current_Pending_Sector  0x0032   100   100   000   Old_age   Always   -   1
> On 28 Aug 2017, at 21:22, hjcho616 wrote:
Steve,
I thought that was odd too..
Below is from the log. This captures the transition from good to bad. Looks like
there is "Device: /dev/sdb [SAT], 1 Currently unreadable (pending) sectors".
And looks like I did a repair with /dev/sdb1... =P
# grep sdb syslog.1
Aug 27 06:27:22 OSD1 smartd[1031]:
I'm jumping in a little late here, but running xfs_repair on your partition
can't frag your partition table. The partition table lives outside the
partition block device and xfs_repair doesn't have access to it when run
against /dev/sdb1. I haven't actually tested it, but it seems unlikely that
Tomasz,
Looks like when I did xfs_repair -L /dev/sdb1 it did something to the partition
table and I don't see /dev/sdb1 anymore... or maybe I missed the 1 in
/dev/sdb1? =(. Yes.. that extra power outage did some pretty good damage... =P I
am hoping 0.007% is very small... =P Any recommendations on
comments inline
On 28.08.2017 18:31, hjcho616 wrote:
Sorry mate I’ve just noticed the
"unfound (0.007%)”
I think that your main culprit here is osd.0. You need to have all osd’s on one
host to get all the data back.
Also, for the time being, I would just change size and min_size down to 1 and try to
figure out which osd you actually need to get all
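For reference, that change is made per pool, something like this (the pool names are assumptions, and as noted elsewhere in the thread this is very much a last resort):
# ceph osd pool set rbd size 1
# ceph osd pool set rbd min_size 1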
Thank you all for the suggestions!
Maged,
I'll see what I can do on that... Looks like I may have to add another OSD host
as I utilized all of the SATA ports on those boards. =P
Ronny,
I am running with size=2 min_size=1. I created everything with ceph-deploy and
didn't touch much of that pool
On 28. aug. 2017 08:01, hjcho616 wrote:
Hello!
I've been using ceph for a long time, mostly for network CephFS storage,
even before the Argonaut release! It's been working very well for me. Yes,
I had some power outages before and asked a few questions on this list
before, and they got resolved happily!
I would suggest either adding 1 new disk on each of the 2 machines or
increasing the osd_backfill_full_ratio to something like 90 or 92 from
the default of 85.
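Something like this at runtime, as a sketch (0.92 matches the 92 suggested above; it can also go in the [osd] section of ceph.conf to survive restarts):
# ceph tell osd.* injectargs '--osd-backfill-full-ratio 0.92'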
/Maged
On 2017-08-28 08:01, hjcho616 wrote:
> [snip]