[ceph-users] Ceph mon quorum problems under load

2018-07-06 Thread Marcus Haarmann
621 mon.ariel1 mon.0 192.168.16.31:6789/0 33890 : 
cluster [INF] mon.ariel1 is new leader, mons ariel1,ariel2,ariel4 in quorum 
(ranks 0,1,2) 
2018-07-06 02:50:03.642986 mon.ariel1 mon.0 192.168.16.31:6789/0 33895 : 
cluster [INF] overall HEALTH_OK 
2018-07-06 02:50:46.757619 mon.ariel1 mon.0 192.168.16.31:6789/0 33899 : 
cluster [INF] mon.ariel1 calling monitor election 
2018-07-06 02:50:46.920468 mon.ariel1 mon.0 192.168.16.31:6789/0 33900 : 
cluster [INF] mon.ariel1 is new leader, mons ariel1,ariel2,ariel4 in quorum 
(ranks 0,1,2) 
2018-07-06 02:50:47.104222 mon.ariel1 mon.0 192.168.16.31:6789/0 33905 : 
cluster [INF] Health check cleared: MON_DOWN (was: 1/3 mons down, quorum 
ariel2,ariel4) 
2018-07-06 02:50:47.104240 mon.ariel1 mon.0 192.168.16.31:6789/0 33906 : 
cluster [INF] Cluster is now healthy 
2018-07-06 02:50:47.256301 mon.ariel1 mon.0 192.168.16.31:6789/0 33907 : 
cluster [INF] overall HEALTH_OK 


There seems to be some disturbance of the mon traffic. 
Since the mons communicate via a 10GBit interface, I would not assume a 
problem there. 
There are no errors logged either on the network interfaces or on the switches. 

Maybe the disks are too slow (the osds are on SATA), so we are thinking about 
putting the bluestore journal on an SSD. 
But would that help to stabilize the mons? 
Or would a setup with 5 machines (5 mons running) be the better choice? 

We are a little stuck on where to search for a solution. 
What debug output would help to see whether we have a disk or a network problem 
here?
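
We could of course raise the mon debug levels ourselves, roughly along these 
lines (only a sketch, the levels are a guess and will produce a lot of log 
output), but we are not sure what to look for: 

ceph tell mon.* injectargs '--debug_mon 10 --debug_paxos 10 --debug_ms 1'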

Thanks for your input! 

Marcus Haarmann 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Help change civetweb front port error: Permission denied

2017-09-18 Thread Marcus Haarmann
Ceph runs as a non-root user, and non-root users are normally not permitted 
to listen on ports below 1024. 
This is not specific to ceph. 

You could redirect port 80 to a higher port via iptables, or you could proxy 
the connection through an apache/nginx instance. 
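
An iptables redirect could look roughly like this (an untested sketch; it 
assumes civetweb is configured to listen on e.g. port 8080 instead): 

iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-ports 8080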

Marcus Haarmann 


Von: "谭林江" <tanlinji...@icloud.com> 
An: "ceph-users" <ceph-users@lists.ceph.com> 
Gesendet: Montag, 18. September 2017 10:32:53 
Betreff: [ceph-users] Help change civetweb front port error: Permission denied 

Hi 


I created a gateway node and changed its frontend to rgw_frontends = "civetweb 
port=80". When I run it, it reports this error: 

2017-09-18 04:25:16.967378 7f2dd72e08c0 0 ceph version 9.2.1 
(752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd), process radosgw, pid 3151 
2017-09-18 04:25:17.025703 7f2dd72e08c0 0 framework: civetweb 
2017-09-18 04:25:17.025712 7f2dd72e08c0 0 framework conf key: port, val: 80 
2017-09-18 04:25:17.025716 7f2dd72e08c0 0 starting handler: civetweb 
2017-09-18 04:25:17.025943 7f2dd72e08c0 0 civetweb: 0x55ac3b9bab20: 
set_ports_option: cannot bind to 80: 13 (Permission denied) 
2017-09-18 04:25:17.032177 7f2db4ff9700 -1 failed to list objects pool_iterate 
returned r=-2 
2017-09-18 04:25:17.032183 7f2db4ff9700 0 ERROR: lists_keys_next(): ret=-2 
2017-09-18 04:25:17.032186 7f2db4ff9700 0 ERROR: sync_all_users() returned 
ret=-2 



___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] cephfs(Kraken 11.2.1), Unable to write more file when one dir more than 100000 files, mds_bal_fragment_size_max = 5000000

2017-09-07 Thread Marcus Haarmann
It's a feature ... 

http://docs.ceph.com/docs/master/cephfs/dirfrags/ 
https://www.spinics.net/lists/ceph-users/msg31473.html 
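
If I remember correctly, the way around this is to enable directory 
fragmentation so the MDS can split large directories, roughly like this 
(from memory; please check the exact syntax against the dirfrags doc linked 
above for your release): 

ceph fs set <fs_name> allow_dirfrags true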

Marcus Haarmann 


From: donglifec...@gmail.com 
To: "zyan" <z...@redhat.com> 
CC: "ceph-users" <ceph-users@lists.ceph.com> 
Sent: Friday, 8 September 2017 07:30:53 
Subject: [ceph-users] cephfs(Kraken 11.2.1), Unable to write more file when one 
dir more than 100000 files, mds_bal_fragment_size_max = 5000000 

ZhengYan, 

I am testing cephfs (Kraken 11.2.1). I can't write more files when one dir holds 
more than 100000 files, even though I have already set "mds_bal_fragment_size_max = 5000000". 

Why is this the case? Is it a bug? 

Thanks a lot. 


donglifec...@gmail.com 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] luminous/bluestore osd memory requirements

2017-08-10 Thread Marcus Haarmann
Hi, 

we have done some testing with bluestore and found that the memory consumption 
of the osd processes depends not on the amount of data actually stored but on 
the number of stored objects. 
This means that e.g. a block device of 100 GB which spreads over 100 objects 
has a different memory usage than storing 1000 smaller objects (the bluestore 
block size should be tuned for that kind of setup). (100 objects of size 4k to 
100k had a memory consumption of ~4GB on the osd at the standard block size, 
while the amount of data was only ~15GB.) 
So it depends on the usage: cephfs stores each file as a single object, while 
rbd is configured to allocate larger objects. 
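
If that per-object memory mainly comes from bluestore's caches, capping them in 
ceph.conf might help; a rough sketch (option names as of Luminous, values only 
illustrative, not a recommendation): 

[osd] 
# ~1 GiB bluestore cache per HDD-backed OSD, ~3 GiB per SSD-backed OSD 
bluestore_cache_size_hdd = 1073741824 
bluestore_cache_size_ssd = 3221225472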

Marcus Haarmann 


Von: "Stijn De Weirdt" <stijn.dewei...@ugent.be> 
An: "ceph-users" <ceph-users@lists.ceph.com> 
Gesendet: Donnerstag, 10. August 2017 10:34:48 
Betreff: [ceph-users] luminous/bluetsore osd memory requirements 

hi all, 

we are planning to purchase new OSD hardware, and we are wondering if, for the 
upcoming luminous with bluestore OSDs, anything wrt the hardware 
recommendations from 
http://docs.ceph.com/docs/master/start/hardware-recommendations/ 
will be different, esp. the memory/cpu part. I understand from colleagues 
that the async messenger makes a big difference in memory usage (maybe 
also cpu load?); but we are also interested in the "1GB of RAM per TB" 
recommendation/requirement. 

many thanks, 

stijn 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] CEPH bluestore space consumption with small objects

2017-08-08 Thread Marcus Haarmann
Hi, 

I can check if this would change anything, but we are currently trying to find 
a different solution. 
The issue we ran into when using rados as a backend with a bluestore osd was 
that every object seems to be cached in the osd, and the memory consumption of 
the osd was increasing very much. 
This is not very useful for us, since the objects are accessed rarely and exist 
for a very long period of time. 
So we are now checking out rbd with a database or a filesystem on top, which 
will handle the huge number of small objects. This has the drawback that a 
filesystem or a database can become inconsistent more easily than with a 
rados-only approach. 
Even cephfs was not the right approach, since the space consumption would be 
the same as with rados directly. 

Thanks to everybody, 

Marcus Haarmann 


Von: "Pavel Shub" <pa...@citymaps.com> 
An: "Gregory Farnum" <gfar...@redhat.com> 
CC: "Wido den Hollander" <w...@42on.com>, "ceph-users" 
<ceph-users@lists.ceph.com>, "Marcus Haarmann" <marcus.haarm...@midoco.de> 
Gesendet: Dienstag, 8. August 2017 17:50:44 
Betreff: Re: [ceph-users] CEPH bluestore space consumption with small objects 

Marcus, 

You may want to look at the bluestore_min_alloc_size setting as well 
as the respective bluestore_min_alloc_size_ssd and 
bluestore_min_alloc_size_hdd. By default bluestore sets a 64k block 
size for ssds. I'm also using ceph for small objects and I've seen my 
OSD usage go down from 80% to 20% after setting the min alloc size to 
4k. 
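
For reference, in ceph.conf that looks roughly like the following (as far as I 
know the value is fixed when an OSD is created, so existing OSDs would have to 
be redeployed for it to take effect): 

[osd] 
# illustrative values, not a recommendation 
bluestore_min_alloc_size_hdd = 4096 
bluestore_min_alloc_size_ssd = 4096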

Thanks, 
Pavel 

On Thu, Aug 3, 2017 at 3:59 PM, Gregory Farnum <gfar...@redhat.com> wrote: 
> Don't forget that at those sizes the internal journals and rocksdb size 
> tunings are likely to be a significant fixed cost. 
> 
> On Thu, Aug 3, 2017 at 3:13 AM Wido den Hollander <w...@42on.com> wrote: 
>> 
>> 
>> > Op 2 augustus 2017 om 17:55 schreef Marcus Haarmann 
>> > <marcus.haarm...@midoco.de>: 
>> > 
>> > 
>> > Hi, 
>> > we are doing some tests here with a Kraken setup using bluestore backend 
>> > (on Ubuntu 64 bit). 
>> > We are trying to store > 10 million very small objects using RADOS. 
>> > (no fs, no rbd, only osd and monitors) 
>> > 
>> > The setup was done with ceph-deploy, using the standard bluestore 
>> > option, no separate devices 
>> > for wal. The test cluster spreads over 3 virtual machines, each with 
>> > 100GB storage for the osd. 
>> > 
>> > We are now in the following situation (used pool is "test"): 
>> > rados df 
>> > POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED 
>> > RD_OPS RD WR_OPS WR 
>> > rbd 0 2 0 6 0 0 0 49452 39618k 855 12358k 
>> > test 17983M 595427 0 1786281 0 0 0 29 77824 596426 17985M 
>> > 
>> > total_objects 595429 
>> > total_used 141G 
>> > total_avail 158G 
>> > total_space 299G 
>> > 
>> > ceph osd df 
>> > ID WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS 
>> > 0 0.09760 1.0 102298M 50763M 51535M 49.62 1.00 72 
>> > 1 0.09760 1.0 102298M 50799M 51499M 49.66 1.00 72 
>> > 2 0.09760 1.0 102298M 50814M 51484M 49.67 1.00 72 
>> > TOTAL 299G 148G 150G 49.65 
>> > MIN/MAX VAR: 1.00/1.00 STDDEV: 0.02 
>> > 
>> > As you can see, there are about 18GB data stored in ~595000 objects now. 
>> > The actual space consumption is about 150GB, which fills about half of 
>> > the storage. 
>> > 
>> 
>> Not really. Each OSD uses 50GB, but since you replicate 3 times (default) 
>> it's storing 150GB spread out over 3 OSDs. 
>> 
>> So your data is 18GB, but consumes 50GB. That's still ~2.5x which is a 
>> lot, but a lot less than 150GB. 
>> 
>> > Objects have been added with a test script using the rados command line 
>> > (put). 
>> > 
>> > Obviously, the stored objects are counted byte by byte in the rados df 
>> > command, 
>> > but the real space allocation is about factor 8. 
>> > 
>> 
>> As written above, it's ~2.5x, not 8x. 
>> 
>> > The stored objects are a mixture of 2kb, 10kb, 50kb, 100kb objects. 
>> > 
>> > Is there any recommended way to configure bluestore with a more 
>> > suitable 
>> > block size for those small objects? I cannot find any configuration 
>> > option 
>> > which would allow modification of the internal block handling of 
>> > bluestore. 
>> > Is luminous an option which allows more specific configuration? 
>> > 
>> 
>> Could

[ceph-users] CEPH bluestore space consumption with small objects

2017-08-02 Thread Marcus Haarmann
Hi, 
we are doing some tests here with a Kraken setup using bluestore backend (on 
Ubuntu 64 bit). 
We are trying to store > 10 million very small objects using RADOS. 
(no fs, no rbd, only osd and monitors) 

The setup was done with ceph-deploy, using the standard bluestore option, no 
separate devices 
for wal. The test cluster spreads over 3 virtual machines, each with 100GB 
storage for the osd. 

We are now in the following situation (used pool is "test"): 
rados df 
POOL_NAME   USED OBJECTS CLONES  COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS     RD WR_OPS     WR 
rbd            0       2      0       6                  0       0        0  49452 39618k    855 12358k 
test      17983M  595427      0 1786281                  0       0        0     29  77824 596426 17985M 

total_objects 595429 
total_used 141G 
total_avail 158G 
total_space 299G 

ceph osd df 
ID  WEIGHT REWEIGHT    SIZE    USE  AVAIL  %USE  VAR PGS 
 0 0.09760      1.0 102298M 50763M 51535M 49.62 1.00  72 
 1 0.09760      1.0 102298M 50799M 51499M 49.66 1.00  72 
 2 0.09760      1.0 102298M 50814M 51484M 49.67 1.00  72 
              TOTAL    299G   148G   150G 49.65 
MIN/MAX VAR: 1.00/1.00 STDDEV: 0.02 

As you can see, there are about 18GB data stored in ~595000 objects now. 
The actual space consumption is about 150GB, which fills about half of the 
storage. 

Objects have been added with a test script using the rados command line (put). 
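
A minimal sketch of such a loop, not the actual script (object sizes only 
illustrative, pool "test" as above): 

for i in $(seq 1 1000); do 
    dd if=/dev/urandom of=/tmp/obj_$i bs=1k count=$(( (RANDOM % 99) + 2 )) 2>/dev/null 
    rados -p test put obj_$i /tmp/obj_$i 
done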

Obviously, the stored objects are counted byte by byte in the rados df command, 
but the real space allocation is higher by about a factor of 8. 

The stored objects are a mixture of 2kb, 10kb, 50kb, 100kb objects. 

Is there any recommended way to configure bluestore with a more suitable 
block size for those small objects? I cannot find any configuration option 
which would allow modification of the internal block handling of bluestore. 
Is luminous an option which allows more specific configuration? 

Thank you all in advance for your support. 

Marcus Haarmann 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com