[ceph-users] Multiple corrupt bluestore osds, Host Machine attacks VM OSDs

2019-07-25 Thread Daniel Williams
Hey,

I have a machine with 5 drives passed through to a VM and 5 drives on the
same host machine itself. I've made this mistake once before: running
ceph-volume lvm activate --all for the host machine's drives also takes
over the 5 drives in the VM and corrupts them.
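
For reference, a safer sequence than a blanket --all (a sketch; the
<osd-id> and <osd-fsid> placeholders come out of the listing) would have
been to check what ceph-volume sees first and activate only this host's
OSDs:

# show the OSDs ceph-volume has discovered on this host
$ ceph-volume lvm list
# activate them one at a time instead of --all
$ ceph-volume lvm activate <osd-id> <osd-fsid>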

I've actually lost data this time. The pool is erasure coded 6+3, but
having lost 5 drives I lost a small number of PGs (6). Repair gives this
message:

$ ceph-bluestore-tool repair --deep true --path /var/lib/ceph/osd/ceph-0

/build/ceph-13.2.6/src/os/bluestore/BlueFS.cc: In function 'int
BlueFS::_replay(bool, bool)' thread 7f21c3c6d980 time 2019-07-25
23:19:44.820537
/build/ceph-13.2.6/src/os/bluestore/BlueFS.cc: 848: FAILED assert(r !=
q->second->file_map.end())
 ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic
(stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x14e) [0x7f21c56c4b5e]
 2: (()+0x2c4cb7) [0x7f21c56c4cb7]
 3: (BlueFS::_replay(bool, bool)+0x4082) [0x56432ef954a2]
 4: (BlueFS::mount()+0xff) [0x56432ef958ef]
 5: (BlueStore::_open_db(bool, bool)+0x81c) [0x56432eff1a1c]
 6: (BlueStore::_fsck(bool, bool)+0x337) [0x56432f00e0a7]
 7: (main()+0xf0a) [0x56432eea7dca]
 8: (__libc_start_main()+0xeb) [0x7f21c4b7109b]
 9: (_start()+0x2a) [0x56432ef700fa]
*** Caught signal (Aborted) **
 in thread 7f21c3c6d980 thread_name:ceph-bluestore-
2019-07-25 23:19:44.817 7f21c3c6d980 -1
/build/ceph-13.2.6/src/os/bluestore/BlueFS.cc: In function 'int
BlueFS::_replay(bool, bool)' thread 7f21c3c6d980 time 2019-07-25
23:19:44.820537
/build/ceph-13.2.6/src/os/bluestore/BlueFS.cc: 848: FAILED assert(r !=
q->second->file_map.end())

I have two OSDs that don't start but at least make it further into the
repair:

$ ceph-bluestore-tool repair --deep true --path /var/lib/ceph/osd/ceph-8
2019-07-25 21:59:17.314 7f1a03dfb980 -1 bluestore(/var/lib/ceph/osd/ceph-8)
_verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xf139a661,
expected 0x9344f85e, device location [0x1~1000], logical extent
0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2019-07-25 21:59:17.314 7f1a03dfb980 -1 bluestore(/var/lib/ceph/osd/ceph-8)
_verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xf139a661,
expected 0x9344f85e, device location [0x1~1000], logical extent
0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2019-07-25 21:59:17.314 7f1a03dfb980 -1 bluestore(/var/lib/ceph/osd/ceph-8)
_verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xf139a661,
expected 0x9344f85e, device location [0x1~1000], logical extent
0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2019-07-25 21:59:17.314 7f1a03dfb980 -1 bluestore(/var/lib/ceph/osd/ceph-8)
_verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0xf139a661,
expected 0x9344f85e, device location [0x1~1000], logical extent
0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2019-07-25 21:59:17.314 7f1a03dfb980 -1 bluestore(/var/lib/ceph/osd/ceph-8)
fsck error: #-1:7b3f43c4:::osd_superblock:0# error during read:  0~21a (5)
Input/output error
... still running 
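
If those two OSDs do come up enough for ceph-objectstore-tool to open the
store, one possible salvage path (a sketch only; the <pgid> and the target
OSD ceph-20 are placeholders, and the OSD daemons must be stopped) would
be to export the surviving copies of the missing PGs and import them into
a healthy OSD:

$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-8 --op list-pgs
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-8 \
      --op export --pgid <pgid> --file /root/<pgid>.export
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-20 \
      --op import --file /root/<pgid>.export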

I've read through the archives, and unlike others who have come across
this, I'm not able to recover the content without the lost OSDs.

These PGs are backing a cephfs instance, so ideally:
1. I'd recover the 6 missing PGs by getting 3 of the 5 broken OSDs back
into a working state...
or, less desirable:
2. Figure out how to map PGs to the cephfs files I lost, so that I can
work out what's lost and what remains (a sketch of this follows below).
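
For option 2, the closest I have to a plan is a sketch like the
following: walk the mounted cephfs and ask ceph which PG each file's
first object maps to. The pool name and mountpoint are assumptions, and
cephfs data objects are named <inode-hex>.<block-hex>, so a file larger
than one object would strictly need every block checked, not just
block 0:

#!/bin/sh
# print each file followed by the PG mapping of its first data object;
# grep the output for the 6 missing PG ids afterwards
POOL=cephfs_data
find /mnt/cephfs -type f | while IFS= read -r f; do
  obj=$(printf '%x.00000000' "$(stat -c %i "$f")")
  echo "$f -> $(ceph osd map "$POOL" "$obj")"
done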
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cephfs on an EC Pool - What determines object size

2019-05-02 Thread Daniel Williams
Thanks so much for your help!

On Mon, Apr 29, 2019 at 6:49 PM Gregory Farnum  wrote:

> Yes, check out the file layout options:
> http://docs.ceph.com/docs/master/cephfs/file-layouts/
>
> On Mon, Apr 29, 2019 at 3:32 PM Daniel Williams 
> wrote:
> >
> > Is the 4MB configurable?
> >
> > On Mon, Apr 29, 2019 at 4:36 PM Gregory Farnum 
> wrote:
> >>
> >> CephFS automatically chunks objects into 4MB objects by default. For
> >> an EC pool, RADOS internally will further subdivide them based on the
> >> erasure code and striping strategy, with a layout that can vary. But
> >> by default if you have eg an 8+3 EC code, you'll end up with a bunch
> >> of (4MB/8=)512KB objects within the OSD.
> >> -Greg
> >>
> >> On Sun, Apr 28, 2019 at 12:42 PM Daniel Williams 
> wrote:
> >> >
> >> > Hey,
> >> >
> >> > What controls / determines the object size of a purely cephfs EC
> >> > (6+3) pool? I have large files but seemingly small objects.
> >> >
> >> > Daniel
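
For anyone finding this in the archives: per the file-layouts doc above,
the 4MB default is the object_size field of the file layout, and it can
be set on a directory (new files inherit it) via virtual xattrs. A
minimal sketch, assuming a kernel or FUSE mount at /mnt/cephfs:

# make new files under this directory use 16 MiB objects
# (object_size must remain a multiple of stripe_unit)
$ setfattr -n ceph.dir.layout.object_size -v 16777216 /mnt/cephfs/somedir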
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Cephfs on an EC Pool - What determines object size

2019-04-29 Thread Daniel Williams
Is the 4MB configurable?

On Mon, Apr 29, 2019 at 4:36 PM Gregory Farnum  wrote:

> CephFS automatically chunks objects into 4MB objects by default. For
> an EC pool, RADOS internally will further subdivide them based on the
> erasure code and striping strategy, with a layout that can vary. But
> by default if you have eg an 8+3 EC code, you'll end up with a bunch
> of (4MB/8=)512KB objects within the OSD.
> -Greg
>
> On Sun, Apr 28, 2019 at 12:42 PM Daniel Williams 
> wrote:
> >
> > Hey,
> >
> > What controls / determines the object size of a purely cephfs EC (6+3)
> > pool? I have large files but seemingly small objects.
> >
> > Daniel
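
To check what layout a given file actually got, the virtual xattr can be
read back (the mountpoint is an assumption). Note that for the 6+3 pool
asked about here, Greg's arithmetic gives (4MB/6=) roughly 683KB chunks
per OSD:

# shows stripe_unit, stripe_count, object_size and pool for the file
$ getfattr -n ceph.file.layout /mnt/cephfs/somefile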
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Cephfs on an EC Pool - What determines object size

2019-04-28 Thread Daniel Williams
Hey,

What controls / determines the object size of a purely cephfs EC (6+3)
pool? I have large files but seemingly small objects.

Daniel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Optimizing for cephfs throughput on a hdd pool

2019-04-18 Thread Daniel Williams
Hey,

I'm running a new Ceph 13 cluster with just one cephfs on a 6+3 erasure
coded pool; each OSD is a 10T HDD, 20 in total, each on its own host. I'm
storing mostly large files, ~20G each. I'm running mostly stock settings,
except that I've tuned for the low-memory (2G) hosts based on an old
thread's recommendations.

I'm trying to fill it and test various failure scenarios, and by far my
biggest bottleneck is IOPS, for both writing and recovery. I'm guessing
that's because of the journal write + block write (I'm seeing roughly
30MiB/s at 100 IOPS). An SSD for the journal is not possible.

Am I correct in saying that I can really only reduce/influence the IOPS
per MiB of the block write? Is the correct way to do that to increase the
stripe_unit, by say 3x, to achieve 100MiB/s per OSD? (A sketch of the
layout change is below.)
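
For reference, the kind of layout change meant above, as a sketch only
(directory xattrs on an assumed /mnt/cephfs mount; only files created
after the change pick up the new layout, and object_size must remain a
multiple of stripe_unit). With 12 MiB objects on a 6+3 pool, each OSD
shard becomes (12MiB/6=) 2MiB instead of the default (4MiB/6=) ~683KiB:

$ setfattr -n ceph.dir.layout.object_size -v $((12*1024*1024)) /mnt/cephfs/data
$ setfattr -n ceph.dir.layout.stripe_unit -v $((12*1024*1024)) /mnt/cephfs/data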

Daniel
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com