[ceph-users] Small modifications to Bluestore migration documentation

2018-02-27 Thread Alexander Kushnirenko
Hello,

Luminous 12.2.2

There were several discussions on this list concerning Bluestore migration,
as the official documentation does not work quite well yet. In particular this
one:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-January/024190.html

Is it possible to modify the official documentation at
http://docs.ceph.com/docs/master/rados/operations/bluestore-migration/ ?

Item 8
ceph osd destroy $ID --yes-i-really-mean-it
ADD COMMAND
ceph osd purge $ID --yes-i-really-mean-it

Item 9
REPLACE (there is actually a typo: "lvm" is missing)
ceph-volume create --bluestore --data $DEVICE --osd-id $ID
WITH
ceph-volume lvm create --bluestore --data $DEVICE

ceph-volume will automatically pick up the previous osd-id.

PLUS a note to ignore the error "_read_bdev_label unable to decode label at
offset 102" (https://tracker.ceph.com/issues/22285).

Alexander.


Re: [ceph-users] Bareos and libradosstriper works only for 4M stripe_unit size

2017-10-16 Thread Alexander Kushnirenko
Hi, Gregory, Ian!

There is very little information on striper mode in the Ceph documentation.
Could this explanation help?

The logic of striper mode is very much the same as in RAID-0.  There are 3
parameters that drive it:

stripe_unit  - the stripe size (default = 4M)
stripe_count - how many objects to write to in parallel (default = 1)
object_size  - when to stop growing an object and create new objects
               (default = 4M)

For example, if you write 128M of data (128 consecutive pieces of data, 1M
each) in striped mode with the following parameters:
stripe_unit = 8M
stripe_count = 4
object_size = 24M
then 8 objects will be created: 4 objects of 24M and 4 objects of 8M
(4 x 24M + 4 x 8M = 128M).

Obj1=24M   Obj2=24M   Obj3=24M   Obj4=24M
00 .. 07   08 .. 0f   10 .. 17   18 .. 1f   <-- consecutive 1M pieces of data (hex)
20 .. 27   28 .. 2f   30 .. 37   38 .. 3f
40 .. 47   48 .. 4f   50 .. 57   58 .. 5f

Obj5=8M    Obj6=8M    Obj7=8M    Obj8=8M
60 .. 67   68 .. 6f   70 .. 77   78 .. 7f
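
In code, setting these three parameters on an existing striper looks roughly
like the sketch below.  It uses the example values above and the
rados_striper_set_object_layout_* calls discussed elsewhere in this thread;
the include path and the minimal error handling are my assumptions, not taken
from any particular application:

#include <radosstriper/libradosstriper.h>

/* Sketch: apply the example layout (8M / 4 / 24M) to an existing striper. */
int set_example_layout(rados_striper_t striper)
{
    int ret;

    /* stripe_unit: size of one stripe written to an object (8M) */
    ret = rados_striper_set_object_layout_stripe_unit(striper, 8 * 1024 * 1024);
    if (ret < 0)
        return ret;

    /* stripe_count: how many objects are written to in parallel (4) */
    ret = rados_striper_set_object_layout_stripe_count(striper, 4);
    if (ret < 0)
        return ret;

    /* object_size: an object grows to this size before new objects are used (24M) */
    return rados_striper_set_object_layout_object_size(striper, 24 * 1024 * 1024);
}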

Alexander.




On Wed, Oct 11, 2017 at 3:19 PM, Alexander Kushnirenko <
kushnire...@gmail.com> wrote:

> Oh!  I put a wrong link, sorry  The picture which explains stripe_unit and
> stripe count is here:
>
> https://indico.cern.ch/event/330212/contributions/1718786/at
> tachments/642384/883834/CephPluginForXroot.pdf
>
> I tried to attach it in the mail, but it was blocked.
>
>
> On Wed, Oct 11, 2017 at 3:16 PM, Alexander Kushnirenko <
> kushnire...@gmail.com> wrote:
>
>> Hi, Ian!
>>
>> Thank you for your reference!
>>
>> Could you comment on the following rule:
>> object_size = stripe_unit * stripe_count
>> Or it is not necessarily so?
>>
>> I refer to page 8 in this report:
>>
>> https://indico.cern.ch/event/531810/contributions/2298934/at
>> tachments/1358128/2053937/Ceph-Experience-at-RAL-final.pdf
>>
>>
>> Alexander.
>>
>> On Wed, Oct 11, 2017 at 1:11 PM, <ian.john...@stfc.ac.uk> wrote:
>>
>>> Hi Gregory
>>>
>>> You’re right, when setting the object layout in libradosstriper, one
>>> should set all three parameters (the number of stripes, the size of the
>>> stripe unit, and the size of the striped object). The Ceph plugin for
>>> GridFTP has an example of this at https://github.com/stfc/gridFT
>>> PCephPlugin/blob/master/ceph_posix.cpp#L371
>>>
>>>
>>>
>>> At RAL, we use the following values:
>>>
>>>
>>>
>>> $STRIPER_NUM_STRIPES 1
>>>
>>> $STRIPER_STRIPE_UNIT 8388608
>>>
>>> $STRIPER_OBJECT_SIZE 67108864
>>>
>>>
>>>
>>> Regards,
>>>
>>>
>>>
>>> Ian Johnson MBCS
>>>
>>> Data Services Group
>>>
>>> Scientific Computing Department
>>>
>>> Rutherford Appleton Laboratory
>>>
>>>
>>>


Re: [ceph-users] Bareos and libradosstriper works only for 4M stripe_unit size

2017-10-11 Thread Alexander Kushnirenko
Oh!  I put in a wrong link, sorry.  The picture which explains stripe_unit and
stripe_count is here:

https://indico.cern.ch/event/330212/contributions/1718786/attachments/642384/883834/CephPluginForXroot.pdf

I tried to attach it in the mail, but it was blocked.


On Wed, Oct 11, 2017 at 3:16 PM, Alexander Kushnirenko <
kushnire...@gmail.com> wrote:

> Hi, Ian!
>
> Thank you for your reference!
>
> Could you comment on the following rule:
> object_size = stripe_unit * stripe_count
> Or it is not necessarily so?
>
> I refer to page 8 in this report:
>
> https://indico.cern.ch/event/531810/contributions/2298934/at
> tachments/1358128/2053937/Ceph-Experience-at-RAL-final.pdf
>
>
> Alexander.
>
> On Wed, Oct 11, 2017 at 1:11 PM, <ian.john...@stfc.ac.uk> wrote:
>
>> Hi Gregory
>>
>> You’re right, when setting the object layout in libradosstriper, one
>> should set all three parameters (the number of stripes, the size of the
>> stripe unit, and the size of the striped object). The Ceph plugin for
>> GridFTP has an example of this at https://github.com/stfc/gridFT
>> PCephPlugin/blob/master/ceph_posix.cpp#L371
>>
>>
>>
>> At RAL, we use the following values:
>>
>>
>>
>> $STRIPER_NUM_STRIPES 1
>>
>> $STRIPER_STRIPE_UNIT 8388608
>>
>> $STRIPER_OBJECT_SIZE 67108864
>>
>>
>>
>> Regards,
>>
>>
>>
>> Ian Johnson MBCS
>>
>> Data Services Group
>>
>> Scientific Computing Department
>>
>> Rutherford Appleton Laboratory
>>
>>
>>


Re: [ceph-users] Bareos and libradosstriper works only for 4M stripe_unit size

2017-10-11 Thread Alexander Kushnirenko
Hi, Ian!

Thank you for your reference!

Could you comment on the following rule:
object_size = stripe_unit * stripe_count
Or is it not necessarily so?

I refer to page 8 of this report:

https://indico.cern.ch/event/531810/contributions/2298934/attachments/1358128/2053937/Ceph-Experience-at-RAL-final.pdf
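
For comparison, with the RAL values quoted below, the rule clearly does not
hold:

stripe_unit * stripe_count = 8388608 * 1 = 8388608  (8M)
object_size = 67108864  (64M)

so there object_size is 8 times stripe_unit * stripe_count.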


Alexander.

On Wed, Oct 11, 2017 at 1:11 PM,  wrote:

> Hi Gregory
>
> You’re right, when setting the object layout in libradosstriper, one
> should set all three parameters (the number of stripes, the size of the
> stripe unit, and the size of the striped object). The Ceph plugin for
> GridFTP has an example of this at https://github.com/stfc/gridFT
> PCephPlugin/blob/master/ceph_posix.cpp#L371
>
>
>
> At RAL, we use the following values:
>
>
>
> $STRIPER_NUM_STRIPES 1
>
> $STRIPER_STRIPE_UNIT 8388608
>
> $STRIPER_OBJECT_SIZE 67108864
>
>
>
> Regards,
>
>
>
> Ian Johnson MBCS
>
> Data Services Group
>
> Scientific Computing Department
>
> Rutherford Appleton Laboratory
>
>
>


Re: [ceph-users] Bareos and libradosstriper works only for 4M stripe_unit size

2017-10-11 Thread Alexander Kushnirenko
Hi, Gregory!

You are absolutely right! Thanks!

The following sequence solves the problem:
rados_striper_set_object_layout_stripe_unit(m_striper, stripe_unit);
rados_striper_set_object_layout_stripe_count(m_striper, stripe_count);
int stripe_size = stripe_unit * stripe_count;
rados_striper_set_object_layout_object_size(m_striper, stripe_size);

Now, there is very little in the documentation about the meaning of the above
parameters.  The only document I found is a CERN IT group presentation (page
8).  Perhaps it is obvious.  Also, it seems that optimizing these parameters
is meaningful only in large-scale Ceph installations.

https://indico.cern.ch/event/531810/contributions/2298934/attachments/1358128/2053937/Ceph-Experience-at-RAL-final.pdf

Now, if I have 6TB disks, then in a default installation there would be
6TB / 4MB = 1.5M objects per OSD.  Does that create any performance hit?

Thank you,
Alexander

On Tue, Oct 10, 2017 at 12:38 AM, Gregory Farnum <gfar...@redhat.com> wrote:

> Well, just from a quick skim, libradosstriper.h has a function
> rados_striper_set_object_layout_object_size(rados_striper_t striper,
> unsigned int object_size)
> and libradosstriper.hpp has one in RadosStriper
> set_object_layout_object_size(unsigned int object_size);
>
> So I imagine you specify it with those the same way you've set the stripe
> unit and counts.
>
> On Sat, Oct 7, 2017 at 12:38 PM Alexander Kushnirenko <
> kushnire...@gmail.com> wrote:
>
>> Hi, Gregory!
>>
>> It turns out that this error is internal CEPH feature. I wrote standalone
>> program to create 132M object in striper mode. It works only for 4M
>> stripe.  If you set stripe_unit = 2M it still creates 4M stripe_unit.
>> Anything bigger than 4M causes crash here
>> <https://github.com/ceph/ceph/blob/master/src/osdc/Striper.cc#L64%5C>:
>>
>>
>> __u32 object_size = layout->object_size;
>>   __u32 su = layout->stripe_unit;
>>   __u32 stripe_count = layout->stripe_count;
>>   assert(object_size >= su);   <
>>
>> I'm curious where it gets layout->object_size for object that is just
>> been created.
>>
>> As I understod striper mode was created by CERN guys.  In there document
>> <https://indico.cern.ch/event/542464/contributions/2202295/attachments/1289543/1919853/HEPUsageOfCeph.pdf>
>> they recommend 8M stripe_unit.  But it does not work in luminous.
>>
>> Created I/O context.
>> Connected to pool backup with rados_striper_create
>> Stripe unit OK 8388608
>> Stripe count OK 1
>> /build/ceph-12.2.0/src/osdc/Striper.cc: In function 'static void
>> Striper::file_to_extents(CephContext*, const char*, const
>> file_layout_t*, uint64_t, uint64_t, uint64_t, std::map<object_t,
>> std::vector >&, uint64_t)' thread 7f13bd5c1e00 time
>> 2017-10-07 21:44:58.654778
>> /build/ceph-12.2.0/src/osdc/Striper.cc: 64: FAILED assert(object_size >=
>> su)
>>  ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous
>> (rc)
>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x102) [0x7f13b3f3b332]
>>  2: (Striper::file_to_extents(CephContext*, char const*, file_layout_t
>> const*, unsigned long, unsigned long, unsigned long, std::map<object_t,
>> std::vector<ObjectExtent, std::allocator >,
>> std::less, std::allocator<std::pair> std::vector<ObjectExtent, std::allocator > > > >&, unsigned
>> long)+0x1e1e) [0x7f13bce235ee]
>>  3: (Striper::file_to_extents(CephContext*, char const*, file_layout_t
>> const*, unsigned long, unsigned long, unsigned long,
>> std::vector<ObjectExtent, std::allocator >&, unsigned
>> long)+0x51) [0x7f13bce23691]
>>  4: (libradosstriper::RadosStriperImpl::internal_
>> aio_write(std::__cxx11::basic_string<char, std::char_traits,
>> std::allocator > const&, 
>> boost::intrusive_ptr,
>> ceph::buffer::list const&, unsigned long, unsigned long, ceph_file_layout
>> const&)+0x224) [0x7f13bcda4184]
>>  5: (libradosstriper::RadosStriperImpl::write_in_
>> open_object(std::__cxx11::basic_string<char, std::char_traits,
>> std::allocator > const&, ceph_file_layout const&,
>> std::__cxx11::basic_string<char, std::char_traits,
>> std::allocator > const&, ceph::buffer::list const&, unsigned long,
>> unsigned long)+0x13c) [0x7f13bcda476c]
>>  6: 
>> (libradosstriper::RadosStriperImpl::write(std::__cxx11::basic_string<char,
>> std::char_traits, std::allocator > const&, ceph::buffer::list
>> const&, unsigned long, unsigned long)+0xd5) [0x7f13bcda4bd5]
>>  7: (rados_

[ceph-users] advice on number of objects per OSD

2017-10-10 Thread Alexander Kushnirenko
Hi,

Are there any recommendations on the limit at which OSD performance
starts to decline because of a large number of objects? Or perhaps a procedure
for how to find this number (luminous)?  My understanding is that the
recommended object size is 10-100 MB, but is there any performance hit due
to a large number of objects?  I ran across a figure of about 1M objects; is
that so?  We do not have a separate SSD for the journal and we use librados for I/O.

Alexander.


Re: [ceph-users] Bareos and libradosstriper works only for 4M stripe_unit size

2017-10-07 Thread Alexander Kushnirenko
Hi, Gregory!

It turns out that this error is an internal CEPH feature. I wrote a standalone
program to create a 132M object in striper mode. It works only for a 4M
stripe.  If you set stripe_unit = 2M it still creates a 4M stripe_unit.
Anything bigger than 4M causes a crash here
<https://github.com/ceph/ceph/blob/master/src/osdc/Striper.cc#L64>:


  __u32 object_size = layout->object_size;
  __u32 su = layout->stripe_unit;
  __u32 stripe_count = layout->stripe_count;
  assert(object_size >= su);   <-- this is the assert that fails

I'm curious where it gets layout->object_size for an object that has just been
created.

As I understood, striper mode was created by the CERN guys.  In their document
<https://indico.cern.ch/event/542464/contributions/2202295/attachments/1289543/1919853/HEPUsageOfCeph.pdf>
they recommend an 8M stripe_unit.  But it does not work in luminous.

Created I/O context.
Connected to pool backup with rados_striper_create
Stripe unit OK 8388608
Stripe count OK 1
/build/ceph-12.2.0/src/osdc/Striper.cc: In function 'static void
Striper::file_to_extents(CephContext*, const char*, const file_layout_t*,
uint64_t, uint64_t, uint64_t, std::map<object_t, std::vector
>&, uint64_t)' thread 7f13bd5c1e00 time 2017-10-07 21:44:58.654778
/build/ceph-12.2.0/src/osdc/Striper.cc: 64: FAILED assert(object_size >= su)
 ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous
(rc)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x7f13b3f3b332]
 2: (Striper::file_to_extents(CephContext*, char const*, file_layout_t
const*, unsigned long, unsigned long, unsigned long, std::map<object_t,
std::vector<ObjectExtent, std::allocator >,
std::less, std::allocator<std::pair > > > >&, unsigned
long)+0x1e1e) [0x7f13bce235ee]
 3: (Striper::file_to_extents(CephContext*, char const*, file_layout_t
const*, unsigned long, unsigned long, unsigned long,
std::vector<ObjectExtent, std::allocator >&, unsigned
long)+0x51) [0x7f13bce23691]
 4:
(libradosstriper::RadosStriperImpl::internal_aio_write(std::__cxx11::basic_string<char,
std::char_traits, std::allocator > const&,
boost::intrusive_ptr,
ceph::buffer::list const&, unsigned long, unsigned long, ceph_file_layout
const&)+0x224) [0x7f13bcda4184]
 5:
(libradosstriper::RadosStriperImpl::write_in_open_object(std::__cxx11::basic_string<char,
std::char_traits, std::allocator > const&, ceph_file_layout
const&, std::__cxx11::basic_string<char, std::char_traits,
std::allocator > const&, ceph::buffer::list const&, unsigned long,
unsigned long)+0x13c) [0x7f13bcda476c]
 6:
(libradosstriper::RadosStriperImpl::write(std::__cxx11::basic_string<char,
std::char_traits, std::allocator > const&, ceph::buffer::list
const&, unsigned long, unsigned long)+0xd5) [0x7f13bcda4bd5]
 7: (rados_striper_write()+0xdb) [0x7f13bcd9ba0b]
 8: (()+0x10fb) [0x55dd87b410fb]
 9: (__libc_start_main()+0xf1) [0x7f13bc9d72b1]
 10: (()+0xbca) [0x55dd87b40bca]
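
For reference, the skeleton of such a standalone test looks roughly like this
(a minimal sketch, not the exact program used above: the pool name "backup"
comes from the output, the object name is illustrative, and it also sets
object_size, which, as found elsewhere in this thread, is what avoids the
assert):

#include <rados/librados.h>
#include <radosstriper/libradosstriper.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static char buf[1024 * 1024];   /* one 1M piece of data */

int main(void)
{
    rados_t cluster;
    rados_ioctx_t io;
    rados_striper_t striper;
    uint64_t off;
    int i, ret;

    memset(buf, 'x', sizeof(buf));

    if (rados_create(&cluster, NULL) < 0 ||
        rados_conf_read_file(cluster, NULL) < 0 ||
        rados_connect(cluster) < 0) {
        fprintf(stderr, "cannot connect to cluster\n");
        return 1;
    }

    rados_ioctx_create(cluster, "backup", &io);   /* pool from the output above */
    rados_striper_create(io, &striper);

    /* Set all three layout parameters; object_size >= stripe_unit avoids
       the assert in Striper.cc shown above. */
    rados_striper_set_object_layout_stripe_unit(striper, 8 * 1024 * 1024);
    rados_striper_set_object_layout_stripe_count(striper, 1);
    rados_striper_set_object_layout_object_size(striper, 8 * 1024 * 1024);

    /* Write 128 consecutive 1M pieces into one striped object. */
    for (i = 0; i < 128; i++) {
        off = (uint64_t)i * sizeof(buf);
        ret = rados_striper_write(striper, "test-volume", buf, sizeof(buf), off);
        if (ret < 0) {
            fprintf(stderr, "rados_striper_write failed at offset %d MB: %d\n",
                    i, ret);
            break;
        }
    }

    rados_striper_destroy(striper);
    rados_ioctx_destroy(io);
    rados_shutdown(cluster);
    return 0;
}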


On Fri, Sep 29, 2017 at 11:46 PM, Gregory Farnum <gfar...@redhat.com> wrote:

> I haven't used the striper, but it appears to make you specify sizes,
> stripe units, and stripe counts. I would expect you need to make sure that
> the size is an integer multiple of the stripe unit. And it probably
> defaults to a 4MB object if you don't specify one?
>
> On Fri, Sep 29, 2017 at 2:09 AM Alexander Kushnirenko <
> kushnire...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm trying to use CEPH-12.2.0 as storage for with Bareos-16.2.4 backup
>> with libradosstriper1 support.
>>
>> Libradosstriber was suggested on this list to solve the problem, that
>> current CEPH-12 discourages users from using object with very big size
>> (>128MB).  Bareos treat Rados Object as Volume and in CEPH-10 it created
>> objects with very big size (10G and more).  CEPH-10 allowed such behaviour,
>> put recovery indeed take very long time. So stripping objects seems to be
>> the right thing to do.
>>
>> Bareos supports libradosstriper and the code seems to work. But for some
>> reason it run only with stripe_unit=4194304, which seems to be typical
>> value for RadosGW for example.  I tried several other values for
>> stripe_unit, but the code exit with error.
>>
>> Is there a particular reason why only 4M size works?  Can one use some
>> CLI to test different stripe sizes?
>>
>> Basic flow of creating object in Bareos is the following:
>> rados_ioctx_create(m_cluster, m_rados_poolname, _ctx);
>> rados_striper_create(m_ctx, _striper);
>> rados_striper_set_object_layout_stripe_unit(m_striper, m_stripe_unit);
>> rados_striper_set_object_layout_stripe_count(m_striper, m_stripe_count);
>> .
>> status = rados_striper_write(m_striper, m_virtual_filename, buffer,
>>

[ceph-users] How to use rados_aio_write correctly?

2017-10-03 Thread Alexander Kushnirenko
Hello,

I'm working on third-party code (the Bareos storage daemon) which gives very
low write speeds for CEPH.  The code was written to demonstrate that it is
possible, but the speed is about 3-9 MB/s, which is too slow.  I modified
the routine to use rados_aio_write instead of rados_write, and was able to
backup/restore data successfully at a speed of about 30MB/s, which is what I
would expect on a 1GB/s network and from rados bench results.  I studied examples
in the documentation and on GitHub, but I'm still afraid that my code is working
merely by accident.  Could someone comment on the following questions:

Q1. The storage daemon sends write requests of 64K size, so the current code
works like this (pseudocode):

rados_write(..., buffer, len=64K, offset=0)
rados_write(..., buffer, len=64K, offset=64K)
rados_write(..., buffer, len=64K, offset=128K)
... and so on ...

What is the correct way to use AIO (to use one completion or several)?

Version 1:

rados_aio_create_completion(NULL, NULL, NULL, &comp);
rados_aio_write(..., comp, buffer, len=64K, offset=0)
rados_aio_write(..., comp, buffer, len=64K, offset=64K)
rados_aio_write(..., comp, buffer, len=64K, offset=128K)
rados_aio_wait_for_complete(comp);  // wait for async IO in memory
rados_aio_wait_for_safe(comp);      // and on disk
rados_aio_release(comp);

Version 2:

rados_aio_create_completion(NULL, NULL, NULL, &comp1);
rados_aio_create_completion(NULL, NULL, NULL, &comp2);
rados_aio_create_completion(NULL, NULL, NULL, &comp3);
rados_aio_write(..., comp1, buffer, len=64K, offset=0)
rados_aio_write(..., comp2, buffer, len=64K, offset=64K)
rados_aio_write(..., comp3, buffer, len=64K, offset=128K)
rados_aio_wait_for_complete(comp1);
rados_aio_wait_for_complete(comp2);
rados_aio_wait_for_complete(comp3);
rados_aio_write(..., comp1, buffer, len=64K, offset=192K)
rados_aio_write(..., comp2, buffer, len=64K, offset=256K)
rados_aio_write(..., comp3, buffer, len=64K, offset=320K)
...

Q2.  The problem of maximum object size.  When I use rados_write I get an error
when I exceed the maximum object size (132MB in luminous).  But when I use
rados_aio_write it happily goes beyond the object limit: it actually
writes nothing, yet does not report any error.  Is there a way to catch such
a situation?
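
For what it is worth, my current sketch of the Version 2 pattern looks like
this (assuming an already-open rados_ioctx_t and an illustrative object name;
the rados_aio_get_return_value() check at the end is my attempt to catch the
situation from Q2 once each write completes):

#include <rados/librados.h>
#include <stdio.h>

#define NWRITES 3
#define CHUNK   (64 * 1024)

/* Sketch: write NWRITES consecutive 64K chunks starting at off0, one
   completion per write, then wait for each and check its result. */
int write_chunks(rados_ioctx_t io, const char *oid,
                 const char *buffer, uint64_t off0)
{
    rados_completion_t comp[NWRITES];
    int i, ret = 0;

    for (i = 0; i < NWRITES; i++) {
        rados_aio_create_completion(NULL, NULL, NULL, &comp[i]);
        rados_aio_write(io, oid, comp[i], buffer, CHUNK,
                        off0 + (uint64_t)i * CHUNK);
    }

    for (i = 0; i < NWRITES; i++) {
        rados_aio_wait_for_safe(comp[i]);             /* wait until on disk */
        int rc = rados_aio_get_return_value(comp[i]); /* per-write result */
        if (rc < 0) {
            fprintf(stderr, "aio write %d failed: %d\n", i, rc);
            ret = rc;
        }
        rados_aio_release(comp[i]);
    }
    return ret;
}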

Alexander


Re: [ceph-users] rados_read versus rados_aio_read performance

2017-10-01 Thread Alexander Kushnirenko
Hi, Gregory!

Thanks for the comment.  I compiled a simple program to play with write speed
measurements (from the librados examples). The underlying "write" functions are:
rados_write(io, "hw", read_res, 1048576, i*1048576);
rados_aio_write(io, "foo", comp, read_res, 1048576, i*1048576);

So I consecutively put 1MB blocks on CEPH.  What I measured is that
rados_aio_write gives me about 5 times the speed of rados_write.  I make
128 consecutive writes in a for loop to create an object of the maximum
allowed size of 132MB.

Now, if I do consecutive writes from some client into CEPH storage, what
is the recommended buffer size? (I'm trying to debug a very poor Bareos write
speed of just 3MB/s to CEPH.)

Thank you,
Alexander

On Fri, Sep 29, 2017 at 5:18 PM, Gregory Farnum <gfar...@redhat.com> wrote:

> It sounds like you are doing synchronous reads of small objects here. In
> that case you are dominated by the per-op already rather than the
> throughout of your cluster. Using aio or multiple threads will let you
> parallelism requests.
> -Greg
> On Fri, Sep 29, 2017 at 3:33 AM Alexander Kushnirenko <
> kushnire...@gmail.com> wrote:
>
>> Hello,
>>
>> We see very poor performance when reading/writing rados objects.  The
>> speed is only 3-4MB/sec, compared to 95MB rados benchmarking.
>>
>> When you look on underline code it uses librados and linradosstripper
>> libraries (both have poor performance) and the code uses rados_read and
>> rados_write functions.  If you look on examples they recommend
>> rados_aio_read/write.
>>
>> Could this be the reason for poor performance?
>>
>> Thank you,
>> Alexander.


[ceph-users] rados_read versus rados_aio_read performance

2017-09-29 Thread Alexander Kushnirenko
Hello,

We see very poor performance when reading/writing rados objects.  The speed
is only 3-4MB/sec, compared to 95MB/sec in rados benchmarking.

When you look at the underlying code, it uses the librados and libradosstriper
libraries (both have poor performance) and the code uses the rados_read and
rados_write functions.  If you look at the examples, they recommend
rados_aio_read/write.

Could this be the reason for the poor performance?

Thank you,
Alexander.


[ceph-users] Bareos and libradosstriper works only for 4M stripe_unit size

2017-09-29 Thread Alexander Kushnirenko
Hi,

I'm trying to use CEPH-12.2.0 as storage for Bareos-16.2.4 backup with
libradosstriper1 support.

Libradosstriper was suggested on this list to solve the problem that
current CEPH-12 discourages users from using objects with a very big size
(>128MB).  Bareos treats a Rados object as a volume, and in CEPH-10 it created
objects with a very big size (10G and more).  CEPH-10 allowed such behaviour,
but recovery indeed takes a very long time.  So striping objects seems to be
the right thing to do.

Bareos supports libradosstriper and the code seems to work.  But for some
reason it runs only with stripe_unit=4194304, which seems to be a typical
value for RadosGW, for example.  I tried several other values for
stripe_unit, but the code exits with an error.

Is there a particular reason why only the 4M size works?  Can one use some CLI
to test different stripe sizes?

The basic flow of creating an object in Bareos is the following:
rados_ioctx_create(m_cluster, m_rados_poolname, &m_ctx);
rados_striper_create(m_ctx, &m_striper);
rados_striper_set_object_layout_stripe_unit(m_striper, m_stripe_unit);
rados_striper_set_object_layout_stripe_count(m_striper, m_stripe_count);
...
status = rados_striper_write(m_striper, m_virtual_filename, buffer, count,
offset);

Alexander


Re: [ceph-users] osd crashes with large object size (>10GB) in luminous Rados

2017-09-26 Thread Alexander Kushnirenko
Nick,

Thanks, I will look into the latest Bareos version.  They did mention
libradosstriper on GitHub.

There is another question.  On jewel I have objects of 25GB size.  Once I
upgrade to luminous those objects will be "out of bounds".
1. Will the OSD start, and will I be able to read them?
2. Will they chop themselves into little pieces automatically, or do I need
to get them and put them back?

Thank you,
Alexander



On Tue, Sep 26, 2017 at 4:29 PM, Nick Fisk <n...@fisk.me.uk> wrote:

> Bareos needs to be re-written to use libradosstriper or it should
> internally shard the data across multiple objects. Objects shouldn’t be
> stored as large as that and performance will also suffer.
>
>
>
> *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf
> Of *Alexander Kushnirenko
> *Sent:* 26 September 2017 13:50
> *To:* ceph-users@lists.ceph.com
> *Subject:* [ceph-users] osd crashes with large object size (>10GB) in
> luminos Rados
>
>
>
> Hello,
>
>
>
> We successfully use rados to store backup volumes in jewel version of
> CEPH. Typical volume size is 25-50GB.  Backup software (bareos) use Rados
> objects as backup volumes and it works fine.  Recently we tried luminous
> for the same purpose.
>
>
>
> In luminous developers reduced osd_max_object_size from 100G to 128M.  As
> I understood for the performance reasons.  But it broke down interaction
> with bareos backup software.  You can reverse osd_max_object_size to 100G,
> but then the OSD start to crash once you start to put objects of about 4GB
> in size (4,294,951,051).
>
>
>
> Any suggestion how to approach this problem?
>
>
>
> Alexander.
>
>
>
> Sep 26 15:12:58 ceph02 ceph-osd[1417]: 
> /build/ceph-12.2.0/src/os/bluestore/BlueStore.cc:
> In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*,
> ObjectStore::Transaction*)' thread 7f04ac2f9700 time 2017-09-26
> 15:12:58.230268
>
> Sep 26 15:12:58 ceph02 ceph-osd[1417]: 
> /build/ceph-12.2.0/src/os/bluestore/BlueStore.cc:
> 9282: FAILED assert(0 == "unexpected error")
>
> Sep 26 15:12:58 ceph02 ceph-osd[1417]: 2017-09-26 15:12:58.229837
> 7f04ac2f9700 -1 bluestore(/var/lib/ceph/osd/ceph-0) _txc_add_transaction
> error (7) Argument list too long not handled on operation 10 (op 1,
> counting from 0)
>
> Sep 26 15:12:58 ceph02 ceph-osd[1417]: 2017-09-26 15:12:58.229869
> 7f04ac2f9700 -1 bluestore(/var/lib/ceph/osd/ceph-0) unexpected error code
>
> Sep 26 15:12:58 ceph02 ceph-osd[1417]:  ceph version 12.2.0
> (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
>
> Sep 26 15:12:58 ceph02 ceph-osd[1417]:  1: (ceph::__ceph_assert_fail(char
> const*, char const*, int, char const*)+0x102) [0x563c7b5f83a2]
>
> Sep 26 15:12:58 ceph02 ceph-osd[1417]:  2: 
> (BlueStore::_txc_add_transaction(BlueStore::TransContext*,
> ObjectStore::Transaction*)+0x15fa) [0x563c7b4ac2ba]
>
> Sep 26 15:12:58 ceph02 ceph-osd[1417]:  3: 
> (BlueStore::queue_transactions(ObjectStore::Sequencer*,
> std::vector<ObjectStore::Transaction, std::allocator
> >&, boost::intrusive_ptr, ThreadPool::TPHandle*)+0x536)
> [0x563c7b4ad916]
>
> Sep 26 15:12:58 ceph02 ceph-osd[1417]:  4: (PrimaryLogPG::queue_transacti
> ons(std::vector<ObjectStore::Transaction, 
> std::allocator
> >&, boost::intrusive_ptr)+0x66) [0x563c7b1d17f6]
>
> Sep 26 15:12:58 ceph02 ceph-osd[1417]:  5: 
> (ReplicatedBackend::submit_transaction(hobject_t
> const&, object_stat_sum_t const&, eversion_t const&,
> std::unique_ptr<PGTransaction, std::default_delete >&&,
> eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t,
> std::allocator > const&, 
> boost::optional&,
> Context*, Context*, Context*, unsigned long, osd_reqid_t,
> boost::intrusive_ptr)+0xcbf) [0x563c7b30436f]
>
> Sep 26 15:12:58 ceph02 ceph-osd[1417]:  6: 
> (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*,
> PrimaryLogPG::OpContext*)+0x9fa) [0x563c7b16d68a]
>
> Sep 26 15:12:58 ceph02 ceph-osd[1417]:  7: 
> (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0x131d)
> [0x563c7b1b7a5d]
>
> Sep 26 15:12:58 ceph02 ceph-osd[1417]:  8: (PrimaryLogPG::do_op(boost::in
> trusive_ptr&)+0x2ece) [0x563c7b1bb26e]
>
> Sep 26 15:12:58 ceph02 ceph-osd[1417]:  9: 
> (PrimaryLogPG::do_request(boost::intrusive_ptr&,
> ThreadPool::TPHandle&)+0xea6) [0x563c7b175446]
>
> Sep 26 15:12:58 ceph02 ceph-osd[1417]:  10: 
> (OSD::dequeue_op(boost::intrusive_ptr,
> boost::intrusive_ptr, ThreadPool::TPHandle&)+0x3ab)
> [0x563c7aff919b]
>
> Sep 26 15:12:58 ceph02 ceph-osd[1417]:  11: (PGQueueable::RunVis::operator
> ()(boost::intrusive_ptr const&)+0x5a) [0

[ceph-users] osd crashes with large object size (>10GB) in luminous Rados

2017-09-26 Thread Alexander Kushnirenko
Hello,

We successfully use rados to store backup volumes in the jewel version of CEPH.
Typical volume size is 25-50GB.  The backup software (Bareos) uses Rados objects
as backup volumes and it works fine.  Recently we tried luminous for the
same purpose.

In luminous the developers reduced osd_max_object_size from 100G to 128M, as I
understood for performance reasons.  But it broke the interaction with
the Bareos backup software.  You can revert osd_max_object_size to 100G, but
then the OSDs start to crash once you start to put objects of about 4GB in
size (4,294,951,051 bytes).

Any suggestion on how to approach this problem?

Alexander.

Sep 26 15:12:58 ceph02 ceph-osd[1417]:
/build/ceph-12.2.0/src/os/bluestore/BlueStore.cc: In function 'void
BlueStore::_txc_add_transaction(BlueStore::TransContext*,
ObjectStore::Transaction*)' thread 7f04ac2f9700 time 2017-09-26
15:12:58.230268
Sep 26 15:12:58 ceph02 ceph-osd[1417]:
/build/ceph-12.2.0/src/os/bluestore/BlueStore.cc: 9282: FAILED assert(0 ==
"unexpected error")
Sep 26 15:12:58 ceph02 ceph-osd[1417]: 2017-09-26 15:12:58.229837
7f04ac2f9700 -1 bluestore(/var/lib/ceph/osd/ceph-0) _txc_add_transaction
error (7) Argument list too long not handled on operation 10 (op 1,
counting from 0)
Sep 26 15:12:58 ceph02 ceph-osd[1417]: 2017-09-26 15:12:58.229869
7f04ac2f9700 -1 bluestore(/var/lib/ceph/osd/ceph-0) unexpected error code
Sep 26 15:12:58 ceph02 ceph-osd[1417]:  ceph version 12.2.0
(32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
Sep 26 15:12:58 ceph02 ceph-osd[1417]:  1: (ceph::__ceph_assert_fail(char
const*, char const*, int, char const*)+0x102) [0x563c7b5f83a2]
Sep 26 15:12:58 ceph02 ceph-osd[1417]:  2:
(BlueStore::_txc_add_transaction(BlueStore::TransContext*,
ObjectStore::Transaction*)+0x15fa) [0x563c7b4ac2ba]
Sep 26 15:12:58 ceph02 ceph-osd[1417]:  3:
(BlueStore::queue_transactions(ObjectStore::Sequencer*,
std::vector&,
boost::intrusive_ptr, ThreadPool::TPHandle*)+0x536)
[0x563c7b4ad916]
Sep 26 15:12:58 ceph02 ceph-osd[1417]:  4:
(PrimaryLogPG::queue_transactions(std::vector&,
boost::intrusive_ptr)+0x66) [0x563c7b1d17f6]
Sep 26 15:12:58 ceph02 ceph-osd[1417]:  5:
(ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t
const&, eversion_t const&, std::unique_ptr&&, eversion_t const&, eversion_t
const&, std::vector
const&, boost::optional&, Context*, Context*,
Context*, unsigned long, osd_reqid_t,
boost::intrusive_ptr)+0xcbf) [0x563c7b30436f]
Sep 26 15:12:58 ceph02 ceph-osd[1417]:  6:
(PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*,
PrimaryLogPG::OpContext*)+0x9fa) [0x563c7b16d68a]
Sep 26 15:12:58 ceph02 ceph-osd[1417]:  7:
(PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0x131d)
[0x563c7b1b7a5d]
Sep 26 15:12:58 ceph02 ceph-osd[1417]:  8:
(PrimaryLogPG::do_op(boost::intrusive_ptr&)+0x2ece)
[0x563c7b1bb26e]
Sep 26 15:12:58 ceph02 ceph-osd[1417]:  9:
(PrimaryLogPG::do_request(boost::intrusive_ptr&,
ThreadPool::TPHandle&)+0xea6) [0x563c7b175446]
Sep 26 15:12:58 ceph02 ceph-osd[1417]:  10:
(OSD::dequeue_op(boost::intrusive_ptr, boost::intrusive_ptr,
ThreadPool::TPHandle&)+0x3ab) [0x563c7aff919b]
Sep 26 15:12:58 ceph02 ceph-osd[1417]:  11:
(PGQueueable::RunVis::operator()(boost::intrusive_ptr
const&)+0x5a) [0x563c7b29154a]
Sep 26 15:12:58 ceph02 ceph-osd[1417]:  12:
(OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x103d) [0x563c7b01fd9d]
Sep 26 15:12:58 ceph02 ceph-osd[1417]:  13:
(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x8ef)
[0x563c7b5fd20f]
Sep 26 15:12:58 ceph02 ceph-osd[1417]:  14:
(ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x563c7b600510]
Sep 26 15:12:58 ceph02 ceph-osd[1417]:  15: (()+0x7494) [0x7f04c56e2494]
Sep 26 15:12:58 ceph02 ceph-osd[1417]:  16: (clone()+0x3f) [0x7f04c4769aff]