Re: [ceph-users] radosgw (beast): how to enable verbose log? request, user-agent, etc.

2019-08-07 Thread Félix Barbeira
Hi Manuel,

Yes, I already tried that option, but the result is extremely noisy and
still lacks some fields, so it's not really usable; forget about parsing
those logs to print some stats. Also, I'm not sure raising the debug level
is good for rgw performance.
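
(For reference, this is roughly what I tried; a sketch, where
client.rgw.gateway01 is just an example instance name:)

# in ceph.conf on the rgw node:
[client.rgw.gateway01]
    debug rgw = 10

# or at runtime via the admin socket:
ceph daemon client.rgw.gateway01 config set debug_rgw 10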

I think I'm going to stick with nginx in front and run some tests.
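
(A minimal sketch of the proxy config I have in mind, assuming radosgw
listens on 127.0.0.1:7480; names and paths are only examples:)

log_format rgw '$remote_addr [$time_local] "$request" $status '
               '$body_bytes_sent "$http_user_agent"';

server {
    listen 80;
    access_log /var/log/nginx/rgw_access.log rgw;

    location / {
        proxy_pass http://127.0.0.1:7480;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $remote_addr;
    }
}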

Thanks anyway! :)

On Tue, Aug 6, 2019 at 18:06, EDH - Manuel Rios Fernandez (<
mrios...@easydatahost.com>) wrote:

> Hi Felix,
>
>
>
> You can increase the debug level with the debug rgw option on your rgw nodes.
>
>
>
> We got it to 10.
>
>
>
> But at least in our case we switched back to civetweb because beast doesn't
> provide a clear log without a lot of verbosity.
>
>
>
> Regards
>
>
>
> Manuel
>
>
>
>
>
> *From:* ceph-users  *On behalf of *Félix
> Barbeira
> *Sent:* Tuesday, August 6, 2019 17:43
> *To:* Ceph Users 
> *Subject:* [ceph-users] radosgw (beast): how to enable verbose log?
> request, user-agent, etc.
>
>
>
> Hi,
>
>
>
> I'm testing radosgw with the beast backend and I have not found a way to get
> more information in the logfile. This is an example:
>
>
>
> 2019-08-06 16:59:14.488 7fc808234700  1 == starting new request
> req=0x5608245646f0 =
> 2019-08-06 16:59:14.496 7fc808234700  1 == req done req=0x5608245646f0
> op status=0 http_status=204 latency=0.00800043s ==
>
>
>
> I would be interested in the typical fields that a regular webserver logs:
> origin, request, user-agent, etc. I checked the official docs but I can't
> find anything related:
>
>
>
> https://docs.ceph.com/docs/nautilus/radosgw/frontends/
> <https://docs.ceph.com/docs/nautilus/radosgw/frontends/#id3>
>
>
>
> The only way I found is to put an nginx server or an haproxy in front as a
> proxy, but I really don't like that solution because it would be an extra
> component used only to log requests. Anyone in the same situation?
>
>
>
> Thanks in advance.
>
> --
>
> Félix Barbeira.
>


-- 
Félix Barbeira.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw (beast): how to enable verbose log? request, user-agent, etc.

2019-08-06 Thread Félix Barbeira
Hi,

I'm testing radosgw with the beast backend and I have not found a way to get
more information in the logfile. This is an example:

2019-08-06 16:59:14.488 7fc808234700  1 == starting new request
req=0x5608245646f0 =
2019-08-06 16:59:14.496 7fc808234700  1 == req done req=0x5608245646f0
op status=0 http_status=204 latency=0.00800043s ==

I would be interested in the typical fields that a regular webserver logs:
origin, request, user-agent, etc. I checked the official docs but I can't
find anything related:

https://docs.ceph.com/docs/nautilus/radosgw/frontends/
<https://docs.ceph.com/docs/nautilus/radosgw/frontends/#id3>

The only way I found is to put an nginx server or an haproxy in front as a
proxy, but I really don't like that solution because it would be an extra
component used only to log requests. Anyone in the same situation?

Thanks in advance.
-- 
Félix Barbeira.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] bluestore block.db on SSD, where block.wal?

2019-06-05 Thread Félix Barbeira
http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#devices

"The BlueStore journal will always be placed on the fastest device
available, so using a DB device will provide the same benefit that the
WAL device
would while *also* allowing additional metadata to be stored there (if it
will fit)."

So I guess if you only specify block.db (on the faster device), block.wal
will go into that same lvm/partition.
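
A minimal sketch of what that looks like with ceph-volume, assuming /dev/sdb
is the HDD and /dev/nvme0n1p1 is the partition reserved for the DB (the
device names are only examples):

# data on the HDD, DB (and therefore also the WAL) on the fast partition:
ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1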

On Sun, Jun 2, 2019 at 18:43, M Ranga Swami Reddy ()
wrote:

> Hello - I plan to use bluestore's block.db on SSD (with data on HDD),
> sized at 4% of the HDD size. I have not specified block.wal... in this
> case, where is block.wal placed?
> Is it on the HDD (i.e. with the data) or in the block.db on the SSD?
>
> Thanks
> Swami
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


-- 
Félix Barbeira.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Fwd: Planning all flash cluster

2019-01-30 Thread Félix Barbeira
> Is there anything that obviously stands out as severely unbalanced? The
R720XD comes with a H710 - instead of putting them in RAID0, I'm thinking a
different HBA might be a better idea, any recommendations please?
> Don't know that HBA. Does it support pass through mode or HBA mode?

The H710 card does not support pass-through. With an R720 I would recommend a
JBOD card, for example the LSI 9207-8i.
Dell's next generation servers (R730XD) carry the H730, which already
supports pass-through.

On Wed, Jun 20, 2018 at 15:00, Luis Periquito ()
wrote:

> adding back in the list :)
>
> -- Forwarded message -
> From: Luis Periquito 
> Date: Wed, Jun 20, 2018 at 1:54 PM
> Subject: Re: [ceph-users] Planning all flash cluster
> To: 
>
>
> On Wed, Jun 20, 2018 at 1:35 PM Nick A  wrote:
> >
> > Thank you, I was under the impression that 4GB RAM per 1TB was quite
> generous, or is that not the case with all flash clusters? What's the
> recommended RAM per OSD currently? Happy to throw more at it for a
> performance boost. The important thing is that I'd like all nodes to be
> absolutely identical.
> I'm doing 8G per OSD, though I use 1.9T SSDs.
>
> >
> > Based on replies so far, it looks like 5 nodes might be a better idea,
> maybe each with 14 OSD's (960GB SSD's)? Plenty of 16 slot 2U chassis around
> to make it a no brainer if that's what you'd recommend!
> I tend to add more nodes: 1U with 4-8 SSDs per chassis to start with,
> and using a single CPU with high frequency. For IOPS/latency cpu
> frequency is really important.
> I have started a cluster that only has 2 SSDs (which I share with the
> OS) for data, but has 8 nodes. Those servers can take up to 10 drives.
>
> I'm using the Fujitsu RX1330, believe Dell would be the R330, with a
> Intel E3-1230v6 cpu and 64G of ram, dual 10G and PSAS (passthrough
> controller).
>
> >
> > The H710 doesn't do JBOD or passthrough, hence looking for an
> alternative HBA. It would be nice to do the boot drives as hardware RAID 1
> though, so a card that can do both at the same time (like the H730 found
> R630's etc) would be ideal.
> >
> > Regards,
> > Nick
> >
> > On 20 June 2018 at 13:18, Luis Periquito  wrote:
> >>
> >> Adding more nodes from the beginning would probably be a good idea.
> >>
> >> On Wed, Jun 20, 2018 at 12:58 PM Nick A  wrote:
> >> >
> >> > Hello Everyone,
> >> >
> >> > We're planning a small cluster on a budget, and I'd like to request
> any feedback or tips.
> >> >
> >> > 3x Dell R720XD with:
> >> > 2x Xeon E5-2680v2 or very similar
> >> The CPUs look good and sufficiently fast for IOPS.
> >>
> >> > 96GB RAM
> >> 4GB per OSD looks a bit on the short side. Probably 192G would help.
> >>
> >> > 2x Samsung SM863 240GB boot/OS drives
> >> > 4x Samsung SM863 960GB OSD drives
> >> > Dual 40/56Gbit Infiniband using IPoIB.
> >> >
> >> > 3 replica, MON on OSD nodes, RBD only (no object or CephFS).
> >> >
> >> > We'll probably add another 2 OSD drives per month per node until full
> (24 SSD's per node), at which point, more nodes. We've got a few SM863's in
> production on other system and are seriously impressed with them, so would
> like to use them for Ceph too.
> >> >
> >> > We're hoping this is going to provide a decent amount of IOPS, 20k
> would be ideal. I'd like to avoid NVMe Journals unless it's going to make a
> truly massive difference. Same with carving up the SSD's, would rather not,
> and just keep it as simple as possible.
> >> I agree: those SSDs shouldn't really require a journal device. Not
> >> sure about the 20k IOPS specially without any further information.
> >> Doing 20k IOPS at 1kB block is totally different at 1MB block...
> >> >
> >> > Is there anything that obviously stands out as severely unbalanced?
> The R720XD comes with a H710 - instead of putting them in RAID0, I'm
> thinking a different HBA might be a better idea, any recommendations please?
> >> Don't know that HBA. Does it support pass through mode or HBA mode?
> >> >
> >> > Regards,
> >> > Nick
> >> > ___
> >> > ceph-users mailing list
> >> > ceph-users@lists.ceph.com
> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
> >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


-- 
Félix Barbeira.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] Mix hardware on object storage cluster

2019-01-27 Thread Félix Barbeira
Hi Cephers,

We are managing a cluster where all machines have the same hardware. The
cluster is used only for object storage. We are planning to increase the
number of nodes. The new nodes have better hardware than the old ones. If we
simply add them to the cluster as regular nodes we are not using their full
power, right? What would be the best way to take advantage of this new and
better hardware?

After reading the docs these are the possible options (example commands
sketched below):

- Change primary affinity.
- Cache tiering: I don't really like this comment in the docs: "Cache tiering
will degrade performance for most workloads".
- Change osd weight: I think this is more oriented to the disk space on every
node.
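
A rough sketch of the first and last options, assuming osd.0 sits on one of
the new nodes and osd.5 on an old one (the ids are only examples):

# prefer the new node's OSDs as primaries (value between 0.0 and 1.0):
ceph osd primary-affinity osd.0 1.0
ceph osd primary-affinity osd.5 0.5

# the crush weight normally reflects capacity in TB, e.g. for an 8TB disk:
ceph osd crush reweight osd.0 7.27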

Do I have some other options?

-- 
Félix Barbeira.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to reduce min_size of an EC pool?

2019-01-17 Thread Félix Barbeira
Ok, lesson learned the hard way. Thank goodness it was a test cluster.
Thanks a lot Bryan!
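
For the archives, a minimal sketch of how the numbers relate, assuming a pool
created from a 3+2 profile (the names and pg counts are only examples):

# k=3 data chunks + m=2 coding chunks, one chunk per host:
ceph osd erasure-code-profile set ec32 k=3 m=2 crush-failure-domain=host
ceph osd pool create ecpool 128 128 erasure ec32
# min_size cannot go below k=3, so with only 2 surviving hosts
# the PGs stay incomplete:
ceph osd pool get ecpool min_size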

On Thu, Jan 17, 2019 at 21:46, Bryan Stillwell ()
wrote:

> When you use 3+2 EC that means you have 3 data chunks and 2 erasure chunks
> for your data.  So you can handle two failures, but not three.  The
> min_size setting is preventing you from going below 3 because that's the
> number of data chunks you specified for the pool.  I'm sorry to say this,
> but since the data was wiped off the other 3 nodes there isn't anything
> that can be done to recover it.
>
>
>
> Bryan
>
>
>
>
>
> *From: *ceph-users  on behalf of Félix
> Barbeira 
> *Date: *Thursday, January 17, 2019 at 1:27 PM
> *To: *Ceph Users 
> *Subject: *[ceph-users] How to reduce min_size of an EC pool?
>
>
>
> I want to bring back my cluster to HEALTHY state because right now I have
> not access to the data.
>
>
>
> I have a 3+2 EC pool on a 5-node cluster. 3 nodes were lost, all data
> wiped. They were reinstalled and added to the cluster again.
>
>
>
> The "ceph health detail" command says to reduce min_size number to a value
> lower than 3, but:
>
>
>
> root@ceph-monitor02:~# ceph osd pool set default.rgw.buckets.data
> min_size 2
>
> Error EINVAL: pool min_size must be between 3 and 5
>
> root@ceph-monitor02:~#
>
>
>
> This is the situation:
>
>
>
> root@ceph-monitor01:~# ceph -s
>
>   cluster:
>
> id: ce78b02d-03df-4f9e-a35a-31b5f05c4c63
>
> health: HEALTH_WARN
>
> Reduced data availability: 515 pgs inactive, 512 pgs incomplete
>
>
>
>   services:
>
> mon: 3 daemons, quorum ceph-monitor01,ceph-monitor03,ceph-monitor02
>
> mgr: ceph-monitor02(active), standbys: ceph-monitor01, ceph-monitor03
>
> osd: 57 osds: 57 up, 57 in
>
>
>
>   data:
>
> pools:   8 pools, 568 pgs
>
> objects: 4.48 M objects, 10 TiB
>
> usage:   24 TiB used, 395 TiB / 419 TiB avail
>
> pgs: 0.528% pgs unknown
>
>  90.141% pgs not active
>
>  512 incomplete
>
>  53  active+clean
>
>  3   unknown
>
>
>
> root@ceph-monitor01:~#
>
>
>
> And this is the output of health detail:
>
>
>
> root@ceph-monitor01:~# ceph health detail
>
> HEALTH_WARN Reduced data availability: 515 pgs inactive, 512 pgs incomplete
>
> PG_AVAILABILITY Reduced data availability: 515 pgs inactive, 512 pgs
> incomplete
>
> pg 10.1cd is stuck inactive since forever, current state incomplete,
> last acting [9,48,41,58,17] (reducing pool default.rgw.buckets.data
> min_size from 3 may help; search ceph.com/docs for 'incomplete')
>
> pg 10.1ce is incomplete, acting [3,13,14,42,21] (reducing pool
> default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs
> for 'incomplete')
>
> pg 10.1cf is incomplete, acting [36,27,3,39,51] (reducing pool
> default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs
> for 'incomplete')
>
> pg 10.1d0 is incomplete, acting [29,9,38,4,56] (reducing pool
> default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs
> for 'incomplete')
>
> pg 10.1d1 is incomplete, acting [2,34,17,7,30] (reducing pool
> default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs
> for 'incomplete')
>
> pg 10.1d2 is incomplete, acting [41,45,53,13,32] (reducing pool
> default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs
> for 'incomplete')
>
> pg 10.1d3 is incomplete, acting [7,28,15,20,3] (reducing pool
> default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs
> for 'incomplete')
>
> pg 10.1d4 is incomplete, acting [11,40,25,23,0] (reducing pool
> default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs
> for 'incomplete')
>
> pg 10.1d5 is incomplete, acting [32,51,20,57,28] (reducing pool
> default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs
> for 'incomplete')
>
> pg 10.1d6 is incomplete, acting [2,53,8,16,15] (reducing pool
> default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs
> for 'incomplete')
>
> pg 10.1d7 is incomplete, acting [1,2,33,43,42] (reducing pool
> default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs
> for 'incomplete')
>
> pg 10.1d8 is incomplete, acting [27,49,9,48,20] (reducing pool
> default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs
> for 'incomplete')
>
> pg 10.1d9 is incomplete, acting [37,8,7,11,20] (reducing pool
> default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs
> for 'incomplete')

[ceph-users] How to reduce min_size of an EC pool?

2019-01-17 Thread Félix Barbeira
ucing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1e4 is incomplete, acting [16,23,37,18,20] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1e5 is incomplete, acting [21,38,6,23,57] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1e6 is incomplete, acting [44,32,11,15,41] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1e7 is incomplete, acting [35,20,42,48,26] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1e8 is incomplete, acting [49,41,16,19,5] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1e9 is incomplete, acting [26,17,58,20,24] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1ea is incomplete, acting [57,23,25,26,12] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1eb is incomplete, acting [39,30,61,18,10] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1ec is incomplete, acting [21,20,11,38,4] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1ed is incomplete, acting [56,34,45,42,33] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1ee is incomplete, acting [40,53,2,27,33] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1ef is incomplete, acting [21,56,3,39,42] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1f0 is incomplete, acting [32,49,45,19,2] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1f1 is incomplete, acting [46,34,45,8,47] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1f2 is incomplete, acting [43,39,20,30,16] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1f3 is incomplete, acting [30,43,23,25,32] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1f4 is incomplete, acting [30,16,29,2,8] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1f5 is incomplete, acting [15,28,6,11,7] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1f6 is incomplete, acting [61,25,45,34,33] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1f7 is incomplete, acting [33,27,6,11,15] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1f8 is incomplete, acting [47,8,30,19,7] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1f9 is incomplete, acting [11,44,58,26,20] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1fa is incomplete, acting [32,51,19,39,2] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1fb is incomplete, acting [14,19,61,35,30] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1fc is incomplete, acting [37,0,47,17,18] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1fd is incomplete, acting [49,20,34,62,15] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1fe is incomplete, acting [46,52,33,34,9] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
pg 10.1ff is incomplete, acting [33,21,7,19,52] (reducing pool
default.rgw.buckets.data min_size from 3 may help; search ceph.com/docs for
'incomplete')
root@ceph-monitor02:~#


Does anybody have an idea of how to fix this?

Maybe by copying the data to a replicated pool with min_size=1?

Is all the data hopelessly lost?

Thanks in advance.
-- 
Félix Barbeira.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Boot volume on OSD device

2019-01-12 Thread Félix Barbeira
If you have the chance, maybe the best choice is to boot the OS from the
network, so you don't need an extra disk for the OS at all. I'm currently
trying to make a squashfs image which is booted over the LAN via iPXE. This
is a very good example: https://croit.io/features/efficiency-diskless
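
A rough sketch of the kind of boot script I'm experimenting with (the
kernel/squashfs URLs are placeholders, not a tested setup):

#!ipxe
dhcp
kernel http://boot.example.com/ceph/vmlinuz boot=live fetch=http://boot.example.com/ceph/filesystem.squashfs
initrd http://boot.example.com/ceph/initrd.img
boot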

On Sat, Jan 12, 2019 at 7:15, Brian Topping ()
wrote:

> Question about OSD sizes: I have two cluster nodes, each with 4x 800GiB
> SLC SSD using BlueStore. They boot from SATADOM so the OSDs are data-only,
> but the MLC SATADOM have terrible reliability and the SLC are way
> overpriced for this application.
>
> Can I carve off 64GiB of from one of the four drives on a node without
> causing problems? If I understand the strategy properly, this will cause
> mild extra load on the other three drives as the weight goes down on the
> partitioned drive, but it probably won’t be a big deal.
>
> Assuming the correct procedure is documented at
> http://docs.ceph.com/docs/mimic/rados/operations/add-or-rm-osds/, first
> removing the OSD as documented, zap it, carve off the partition of the
> freed drive, then adding the remaining space back in.
>
> I’m a little nervous that BlueStore assumes it owns the partition table
> and will not be happy that a couple of primary partitions have been used.
> Will this be a problem?
>
> Thanks, Brian
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


-- 
Félix Barbeira.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] How to enable jumbo frames on IPv6 only cluster?

2017-10-30 Thread Félix Barbeira
Oh, BTW, I had to change the MTU back to 1500 on the ceph monitors because
they didn't work with 9000. This is the output of the ansible playbook:

TASK [ceph-mon : put initial mon keyring in mon kv store]
**
fatal: [ceph-monitor01]: FAILED! => {"changed": false, "cmd": ["ceph",
"--cluster", "ceph", "config-key", "put", "initial_mon_keyring",
"xx=="], "delta": "0:05:00.159094", "end":
"2017-10-30 09:48:10.425012", "failed": true, "msg": "non-zero return
code", "rc": 1, "start": "2017-10-30 09:43:10.265918", "stderr":
"2017-10-30 09:48:10.395156 7fd314408700  0 monclient(hunting):
authenticate timed out after 300\n2017-10-30 09:48:10.395197 7fd314408700
0 librados: client.admin authentication error (110) Connection timed
out\n[errno 110] error connecting to the cluster", "stderr_lines":
["2017-10-30 09:48:10.395156 7fd314408700  0 monclient(hunting):
authenticate timed out after 300", "2017-10-30 09:48:10.395197
7fd314408700  0 librados: client.admin authentication error (110)
Connection timed out", "[errno 110] error connecting to the cluster"],
"stdout": "", "stdout_lines": []}

In summary: gateways and osds run with jumbo frames, monitors do not. Maybe
this isn't a problem because the servers that handle most of the traffic are
the osds and gateways.



2017-10-30 10:50 GMT+01:00 Félix Barbeira <fbarbe...@gmail.com>:

> Thanks Wido, it's fixed. I'm going to post the explanation in case somebody
> runs into the same error.
>
> The MTU was set to 9000 on the client side. 'ifconfig' shows the configured
> value, but if I ask the /proc filesystem directly it shows the following:
>
> root@ceph-node03:~# cat /proc/sys/net/ipv6/conf/eno1/mtu
> 1500
> root@ceph-node03:~#
>
> If I restart the interface it shows 9000 for a while and then it changes
> back to 1500. After some research it turns out that the router advertises an
> MTU of 1500 in the SLAAC parameters, so when the configuration is
> 'refreshed' the client applies the wrong value (1500).
>
> The network guys changed the MTU parameter offered via SLAAC and now it's
> working:
>
> root@ceph-node03:~# cat /proc/sys/net/ipv6/conf/eno1/mtu
> 9000
> root@ceph-node03:~# ping6 -c 3 -M do -s 8952 ceph-node01
> PING ceph-node01(2a02:x:x:x:x:x:x:x) 8952 data bytes
> 8960 bytes from 2a02:x:x:x:x:x:x:x: icmp_seq=1 ttl=64 time=0.271 ms
> 8960 bytes from 2a02:x:x:x:x:x:x:x: icmp_seq=2 ttl=64 time=0.216 ms
> 8960 bytes from 2a02:x:x:x:x:x:x:x: icmp_seq=3 ttl=64 time=0.280 ms
>
> --- ceph-node01 ping statistics ---
> 3 packets transmitted, 3 received, 0% packet loss, time 2002ms
> rtt min/avg/max/mdev = 0.216/0.255/0.280/0.033 ms
> root@ceph-node03:~#
>
>
> 2017-10-27 16:02 GMT+02:00 Wido den Hollander <w...@42on.com>:
>
>>
>> > On 27 October 2017 at 14:22, Félix Barbeira <fbarbe...@gmail.com>
>> > wrote:
>> >
>> >
>> > Hi,
>> >
>> > I'm trying to configure a ceph cluster using IPv6 only but I can't
>> enable
>> > jumbo frames. I made the definition on the
>> > 'interfaces' file and it seems like the value is applied but when I
>> test it
>> > looks like only works on IPv4, not IPv6.
>> >
>> > It works on IPv4:
>> >
>> > root@ceph-node01:~# ping -c 3 -M do -s 8972 ceph-node02
>> >
>> > PING ceph-node02 (x.x.x.x) 8972(9000) bytes of data.
>> > 8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=1 ttl=64 time=0.474 ms
>> > 8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=2 ttl=64 time=0.254 ms
>> > 8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=3 ttl=64 time=0.288 ms
>> >
>>
>> Verify with Wireshark/tcpdump if it really sends 9k packets. I doubt it.
>>
>> > --- ceph-node02 ping statistics ---
>> > 3 packets transmitted, 3 received, 0% packet loss, time 2000ms
>> > rtt min/avg/max/mdev = 0.254/0.338/0.474/0.099 ms
>> >
>> > root@ceph-node01:~#
>> >
>> > But *not* in IPv6:
>> >
>> > root@ceph-node01:~# ping6 -c 3 -M do -s 8972 ceph-node02
>> > PING ceph-node02(x:x:x:x:x:x:x:x) 8972 data bytes
>> > ping: local error: Message too long, mtu=1500
>> > ping: local error: Message too long, mtu=1500
>> > ping: local error: Message too long, mtu=1500
>> >
>>
>> Li

Re: [ceph-users] How to enable jumbo frames on IPv6 only cluster?

2017-10-30 Thread Félix Barbeira
Thanks Wido, it's fixed. I'm going to post the explanation in case somebody
runs into the same error.

The MTU was set to 9000 on the client side. 'ifconfig' shows the configured
value, but if I ask the /proc filesystem directly it shows the following:

root@ceph-node03:~# cat /proc/sys/net/ipv6/conf/eno1/mtu
1500
root@ceph-node03:~#

If I restart the interface it shows 9000 for a while and then it changes
back to 1500. After some research it turns out that the router advertises an
MTU of 1500 in the SLAAC parameters, so when the configuration is 'refreshed'
the client applies the wrong value (1500).
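
An alternative, if the network side cannot be changed, seems to be telling
the kernel to ignore the MTU advertised in the router advertisements; a
sketch I have not tested myself:

# ignore the MTU option in router advertisements on eno1:
sysctl -w net.ipv6.conf.eno1.accept_ra_mtu=0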

The network guys changed the MTU parameter offered via SLAAC and now it's
working:

root@ceph-node03:~# cat /proc/sys/net/ipv6/conf/eno1/mtu
9000
root@ceph-node03:~# ping6 -c 3 -M do -s 8952 ceph-node01
PING ceph-node01(2a02:x:x:x:x:x:x:x) 8952 data bytes
8960 bytes from 2a02:x:x:x:x:x:x:x: icmp_seq=1 ttl=64 time=0.271 ms
8960 bytes from 2a02:x:x:x:x:x:x:x: icmp_seq=2 ttl=64 time=0.216 ms
8960 bytes from 2a02:x:x:x:x:x:x:x: icmp_seq=3 ttl=64 time=0.280 ms

--- ceph-node01 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 0.216/0.255/0.280/0.033 ms
root@ceph-node03:~#


2017-10-27 16:02 GMT+02:00 Wido den Hollander <w...@42on.com>:

>
> > Op 27 oktober 2017 om 14:22 schreef Félix Barbeira <fbarbe...@gmail.com
> >:
> >
> >
> > Hi,
> >
> > I'm trying to configure a ceph cluster using IPv6 only but I can't enable
> > jumbo frames. I made the definition on the
> > 'interfaces' file and it seems like the value is applied but when I test
> it
> > looks like only works on IPv4, not IPv6.
> >
> > It works on IPv4:
> >
> > root@ceph-node01:~# ping -c 3 -M do -s 8972 ceph-node02
> >
> > PING ceph-node02 (x.x.x.x) 8972(9000) bytes of data.
> > 8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=1 ttl=64 time=0.474 ms
> > 8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=2 ttl=64 time=0.254 ms
> > 8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=3 ttl=64 time=0.288 ms
> >
>
> Verify with Wireshark/tcpdump if it really sends 9k packets. I doubt it.
>
> > --- ceph-node02 ping statistics ---
> > 3 packets transmitted, 3 received, 0% packet loss, time 2000ms
> > rtt min/avg/max/mdev = 0.254/0.338/0.474/0.099 ms
> >
> > root@ceph-node01:~#
> >
> > But *not* in IPv6:
> >
> > root@ceph-node01:~# ping6 -c 3 -M do -s 8972 ceph-node02
> > PING ceph-node02(x:x:x:x:x:x:x:x) 8972 data bytes
> > ping: local error: Message too long, mtu=1500
> > ping: local error: Message too long, mtu=1500
> > ping: local error: Message too long, mtu=1500
> >
>
> Like Ronny already mentioned, check the switches and the receiver. There
> is a 1500 MTU somewhere configured.
>
> Wido
>
> > --- ceph-node02 ping statistics ---
> > 4 packets transmitted, 0 received, +4 errors, 100% packet loss, time
> 3024ms
> >
> > root@ceph-node01:~#
> >
> >
> >
> > root@ceph-node01:~# ifconfig
> > eno1  Link encap:Ethernet  HWaddr 24:6e:96:05:55:f8
> >   inet6 addr: 2a02:x:x:x:x:x:x:x/64 Scope:Global
> >   inet6 addr: fe80::266e:96ff:fe05:55f8/64 Scope:Link
> >   UP BROADCAST RUNNING MULTICAST  *MTU:9000*  Metric:1
> >   RX packets:633318 errors:0 dropped:0 overruns:0 frame:0
> >   TX packets:649607 errors:0 dropped:0 overruns:0 carrier:0
> >   collisions:0 txqueuelen:1000
> >   RX bytes:463355602 (463.3 MB)  TX bytes:498891771 (498.8 MB)
> >
> > loLink encap:Local Loopback
> >   inet addr:127.0.0.1  Mask:255.0.0.0
> >   inet6 addr: ::1/128 Scope:Host
> >   UP LOOPBACK RUNNING  MTU:65536  Metric:1
> >   RX packets:127420 errors:0 dropped:0 overruns:0 frame:0
> >   TX packets:127420 errors:0 dropped:0 overruns:0 carrier:0
> >   collisions:0 txqueuelen:1
> >   RX bytes:179470326 (179.4 MB)  TX bytes:179470326 (179.4 MB)
> >
> > root@ceph-node01:~#
> >
> > root@ceph-node01:~# cat /etc/network/interfaces
> > # This file describes network interfaces avaiulable on your system
> > # and how to activate them. For more information, see interfaces(5).
> >
> > source /etc/network/interfaces.d/*
> >
> > # The loopback network interface
> > auto lo
> > iface lo inet loopback
> >
> > # The primary network interface
> > auto eno1
> > iface eno1 inet6 auto
> >post-up ifconfig eno1 mtu 9000
> > root@ceph-node01:#
> >
> >
> > Please help!
> >
> > --
> > Félix Barbeira.
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Félix Barbeira.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] How to enable jumbo frames on IPv6 only cluster?

2017-10-27 Thread Félix Barbeira
Hi,

I'm trying to configure a ceph cluster using IPv6 only but I can't enable
jumbo frames. I set the MTU in the 'interfaces' file and it seems like the
value is applied, but when I test it, it looks like it only works on IPv4,
not IPv6.

It works on IPv4:

root@ceph-node01:~# ping -c 3 -M do -s 8972 ceph-node02

PING ceph-node02 (x.x.x.x) 8972(9000) bytes of data.
8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=1 ttl=64 time=0.474 ms
8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=2 ttl=64 time=0.254 ms
8980 bytes from ceph-node02 (x.x.x.x): icmp_seq=3 ttl=64 time=0.288 ms

--- ceph-node02 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.254/0.338/0.474/0.099 ms

root@ceph-node01:~#

But *not* in IPv6:

root@ceph-node01:~# ping6 -c 3 -M do -s 8972 ceph-node02
PING ceph-node02(x:x:x:x:x:x:x:x) 8972 data bytes
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500
ping: local error: Message too long, mtu=1500

--- ceph-node02 ping statistics ---
4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3024ms

root@ceph-node01:~#



root@ceph-node01:~# ifconfig
eno1  Link encap:Ethernet  HWaddr 24:6e:96:05:55:f8
  inet6 addr: 2a02:x:x:x:x:x:x:x/64 Scope:Global
  inet6 addr: fe80::266e:96ff:fe05:55f8/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  *MTU:9000*  Metric:1
  RX packets:633318 errors:0 dropped:0 overruns:0 frame:0
  TX packets:649607 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:463355602 (463.3 MB)  TX bytes:498891771 (498.8 MB)

loLink encap:Local Loopback
  inet addr:127.0.0.1  Mask:255.0.0.0
  inet6 addr: ::1/128 Scope:Host
  UP LOOPBACK RUNNING  MTU:65536  Metric:1
  RX packets:127420 errors:0 dropped:0 overruns:0 frame:0
  TX packets:127420 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1
  RX bytes:179470326 (179.4 MB)  TX bytes:179470326 (179.4 MB)

root@ceph-node01:~#

root@ceph-node01:~# cat /etc/network/interfaces
# This file describes network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

source /etc/network/interfaces.d/*

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eno1
iface eno1 inet6 auto
   post-up ifconfig eno1 mtu 9000
root@ceph-node01:#


Please help!

-- 
Félix Barbeira.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Grafana Dasboard

2017-08-29 Thread Félix Barbeira
Hi,

You can check the official site: https://grafana.com/dashboards?search=ceph

2017-08-29 3:08 GMT+02:00 Shravana Kumar.S <shravanakum...@gmail.com>:

> All,
> I am looking for a Grafana dashboard to monitor CEPH. I am using telegraf to
> collect the metrics and influxDB to store the values.
>
> Does anyone have the dashboard json file?
>
> Thanks,
> Saravans
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>


-- 
Félix Barbeira.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] RGW lifecycle not expiring objects

2017-06-30 Thread Félix Barbeira
I recently checked the repo and a new version of s3cmd was released 3 days
ago, including lifecycle commands:

https://github.com/s3tools/s3cmd/releases

These are the lifecycle options:

https://github.com/s3tools/s3cmd/blob/master/s3cmd#L2444-L2448
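
A minimal sketch of the kind of policy file this takes, mirroring the
expiration rule discussed below (the bucket name is only an example):

# lifecycle.xml
<LifecycleConfiguration>
  <Rule>
    <ID>expire-after-1-day</ID>
    <Prefix></Prefix>
    <Status>Enabled</Status>
    <Expiration>
      <Days>1</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>

# apply it with v2 signatures, as Graham notes below:
s3cmd --signature-v2 setlifecycle lifecycle.xml s3://testgta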

2017-06-29 17:51 GMT+02:00 Daniel Gryniewicz <d...@redhat.com>:

> On 06/28/2017 02:30 PM, Graham Allan wrote:
>
>> That seems to be it! I couldn't see a way to specify the auth version
>> with aws cli (is there a way?). However it did work with s3cmd and v2
>> auth:
>>
>> % s3cmd --signature-v2 setlifecycle lifecycle.xml s3://testgta
>> s3://testgta/: Lifecycle Policy updated
>>
>
> Good stuff.
>
>
>> (I believe that with Kraken, this threw an error and failed to set the
>> policy, but I'm not certain at this point... besides which radosgw
>> didn't then have access to the default.rgw.lc pool, which may have
>> caused further issues)
>>
>> No way to read the lifecycle policy back with s3cmd, so:
>>
>
> I submitted a patch a while ago to add getlifecycle to s3cmd, and it was
> accepted, but I don't know about releases or distro packaging.  It will be
> there eventually.
>
>
>> % aws --endpoint-url https://xxx.xxx.xxx.xxx s3api \
>> get-bucket-lifecycle-configuration --bucket=testgta
>> {
>> "Rules": [
>> {
>> "Status": "Enabled",
>> "Prefix": "",
>> "Expiration": {
>> "Days": 1
>> },
>> "ID": "test"
>> }
>> ]
>> }
>>
>> and looks encouraging at the server side:
>>
>> #  radosgw-admin lc list
>> [
>> {
>> "bucket": ":gta:default.6985397.1",
>> "status": "UNINITIAL"
>> },
>> {
>> "bucket": ":testgta:default.6790451.1",
>> "status": "UNINITIAL"
>> }
>> ]
>>
>> then:
>> #  radosgw-admin lc process
>>
>> and all the (very old) objects disappeared from the test bucket.
>>
>
> Good to know.
>
> Daniel
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Félix Barbeira.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] handling different disk sizes

2017-06-06 Thread Félix Barbeira
Hi,

Thanks to your answers I now understand this part of ceph better. I made the
change to the crushmap that Maxime suggested; after that the results are
what I expected from the beginning:

# ceph osd df
ID WEIGHT  REWEIGHT SIZE   USE   AVAIL  %USE  VAR  PGS
 0 7.27100  1.0  7445G 1830G  5614G 24.59 0.98 238
 3 7.27100  1.0  7445G 1700G  5744G 22.84 0.91 229
 4 7.27100  1.0  7445G 1731G  5713G 23.26 0.93 233
 1 1.81299  1.0  1856G  661G  1195G 35.63 1.43  87
 5 1.81299  1.0  1856G  544G  1311G 29.34 1.17  73
 6 1.81299  1.0  1856G  519G  1337G 27.98 1.12  71
 2 2.72198  1.0  2787G  766G  2021G 27.50 1.10 116
 7 2.72198  1.0  2787G  651G  2136G 23.36 0.93 103
 8 2.72198  1.0  2787G  661G  2126G 23.72 0.95  98
  TOTAL 36267G 9067G 27200G 25.00
MIN/MAX VAR: 0.91/1.43  STDDEV: 4.20
#

I understand that the ceph default "type host" is safer than "type osd",
but like I said before, this cluster is for testing purposes only.

Thanks for all your answers :)

2017-06-06 9:20 GMT+02:00 Maxime Guyot <max...@root314.com>:

> Hi Félix,
>
> Changing the failure domain to OSD is probably the easiest option if this
> is a test cluster. I think the commands would go like:
> - ceph osd getcrushmap -o map.bin
> - crushtool -d map.bin -o map.txt
> - sed -i 's/step chooseleaf firstn 0 type host/step chooseleaf firstn 0
> type osd/' map.txt
> - crushtool -c map.txt -o map.bin
> - ceph osd setcrushmap -i map.bin
>
> Moving HDDs into ~8TB/server would be a good option if this is a capacity
> focused use case. It will allow you to reboot 1 server at a time without
> radosgw down time. You would target for 26/3 = 8.66TB/ node so:
> - node1: 1x8TB
> - node2: 1x8TB +1x2TB
> - node3: 2x6 TB + 1x2TB
>
> If you are more concerned about performance then set the weights to 1 on
> all HDDs and forget about the wasted capacity.
>
> Cheers,
> Maxime
>
>
> On Tue, 6 Jun 2017 at 00:44 Christian Wuerdig <christian.wuer...@gmail.com>
> wrote:
>
>> Yet another option is to change the failure domain to OSD instead host
>> (this avoids having to move disks around and will probably meet you initial
>> expectations).
>> Means your cluster will become unavailable when you loose a host until
>> you fix it though. OTOH you probably don't have too much leeway anyway with
>> just 3 hosts so it might be an acceptable trade-off. It also means you can
>> just add new OSDs to the servers wherever they fit.
>>
>> On Tue, Jun 6, 2017 at 1:51 AM, David Turner <drakonst...@gmail.com>
>> wrote:
>>
>>> If you want to resolve your issue without purchasing another node, you
>>> should move one disk of each size into each server.  This process will be
>>> quite painful as you'll need to actually move the disks in the crush map to
>>> be under a different host and then all of your data will move around, but
>>> then your weights will be able to utilize the weights and distribute the
>>> data between the 2TB, 3TB, and 8TB drives much more evenly.
>>>
>>> On Mon, Jun 5, 2017 at 9:21 AM Loic Dachary <l...@dachary.org> wrote:
>>>
>>>>
>>>>
>>>> On 06/05/2017 02:48 PM, Christian Balzer wrote:
>>>> >
>>>> > Hello,
>>>> >
>>>> > On Mon, 5 Jun 2017 13:54:02 +0200 Félix Barbeira wrote:
>>>> >
>>>> >> Hi,
>>>> >>
>>>> >> We have a small cluster for radosgw use only. It has three nodes,
>>>> witch 3
>>>> > ^  ^
>>>> >> osds each. Each node has different disk sizes:
>>>> >>
>>>> >
>>>> > There's your answer, staring you right in the face.
>>>> >
>>>> > Your default replication size is 3, your default failure domain is
>>>> host.
>>>> >
>>>> > Ceph can not distribute data according to the weight, since it needs
>>>> to be
>>>> > on a different node (one replica per node) to comply with the replica
>>>> size.
>>>>
>>>> Another way to look at it is to imagine a situation where 10TB worth of
>>>> data
>>>> is stored on node01 which has 8x3 24TB. Since you asked for 3 replicas,
>>>> this
>>>> data must be replicated to node02 but ... there only is 2x3 6TB
>>>> available.
>>>> So the maximum you can store is 6TB and remaining disk space on node01
>>>> and node03
>>>> will never be used.
>>>>

[ceph-users] handling different disk sizes

2017-06-05 Thread Félix Barbeira
Hi,

We have a small cluster for radosgw use only. It has three nodes, with 3
osds each. Each node has different disk sizes:

node01 : 3x8TB
node02 : 3x2TB
node03 : 3x3TB

I thought that the weight controls the amount of data that every osd
receives. In that case, for example, the node with the 8TB disks should
receive more than the rest, right? Instead, all of them receive the same
amount of data and the smaller disks (2TB) reach 100% before the bigger
ones. Am I doing something wrong?

The cluster is jewel LTS 10.2.7.

# ceph osd df
ID WEIGHT  REWEIGHT SIZE   USE   AVAIL  %USE  VAR  PGS
 0 7.27060  1.0  7445G 1012G  6432G 13.60 0.57 133
 3 7.27060  1.0  7445G 1081G  6363G 14.52 0.61 163
 4 7.27060  1.0  7445G  787G  6657G 10.58 0.44 120
 1 1.81310  1.0  1856G 1047G   809G 56.41 2.37 143
 5 1.81310  1.0  1856G  956G   899G 51.53 2.16 143
 6 1.81310  1.0  1856G  877G   979G 47.24 1.98 130
 2 2.72229  1.0  2787G 1010G  1776G 36.25 1.52 140
 7 2.72229  1.0  2787G  831G  1955G 29.83 1.25 130
 8 2.72229  1.0  2787G 1038G  1748G 37.27 1.56 146
  TOTAL 36267G 8643G 27624G 23.83
MIN/MAX VAR: 0.44/2.37  STDDEV: 18.60
#

# ceph osd tree
ID WEIGHT   TYPE NAME  UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 35.41795 root default
-2 21.81180 host node01
 0  7.27060 osd.0   up  1.0  1.0
 3  7.27060 osd.3   up  1.0  1.0
 4  7.27060 osd.4   up  1.0  1.0
-3  5.43929 host node02
 1  1.81310 osd.1   up  1.0  1.0
 5  1.81310 osd.5   up  1.0  1.0
 6  1.81310 osd.6   up  1.0  1.0
-4  8.16687 host node03
 2  2.72229 osd.2   up  1.0  1.0
 7  2.72229 osd.7   up  1.0  1.0
 8  2.72229 osd.8   up  1.0  1.0
#

# ceph -s
cluster 49ba9695-7199-4c21-9199-ac321e60065e
 health HEALTH_OK
 monmap e1: 3 mons at
{ceph-mon01=[x:x:x:x:x:x:x:x]:6789/0,ceph-mon02=[x:x:x:x:x:x:x:x]:6789/0,ceph-mon03=[x:x:x:x:x:x:x:x]:6789/0}
election epoch 48, quorum 0,1,2 ceph-mon01,ceph-mon03,ceph-mon02
 osdmap e265: 9 osds: 9 up, 9 in
flags sortbitwise,require_jewel_osds
  pgmap v95701: 416 pgs, 11 pools, 2879 GB data, 729 kobjects
8643 GB used, 27624 GB / 36267 GB avail
 416 active+clean
#

# ceph osd pool ls
.rgw.root
default.rgw.control
default.rgw.data.root
default.rgw.gc
default.rgw.log
default.rgw.users.uid
default.rgw.users.keys
default.rgw.buckets.index
default.rgw.buckets.non-ec
default.rgw.buckets.data
default.rgw.users.email
#

# ceph df
GLOBAL:
SIZE   AVAIL  RAW USED %RAW USED
36267G 27624G8643G 23.83
POOLS:
NAME   ID USED  %USED MAX AVAIL
OBJECTS
.rgw.root  1   1588 0 5269G
  4
default.rgw.control2  0 0 5269G
  8
default.rgw.data.root  3   8761 0 5269G
 28
default.rgw.gc 4  0 0 5269G
 32
default.rgw.log5  0 0 5269G
127
default.rgw.users.uid  6   4887 0 5269G
 28
default.rgw.users.keys 7144 0 5269G
 16
default.rgw.buckets.index  9  0 0 5269G
 14
default.rgw.buckets.non-ec 10 0 0 5269G
  3
default.rgw.buckets.data   11 2879G 35.34 5269G
 746848
default.rgw.users.email1213 0 5269G
  1
#

-- 
Félix Barbeira.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] Ceph OSD network with IPv6 SLAAC networks?

2017-04-17 Thread Félix Barbeira
We are implementing an IPv6-native ceph cluster using SLAAC. We have some
legacy machines that are not capable of using IPv6, only IPv4, for various
reasons (yeah, I know). I'm wondering what could happen if I add an IPv4
address on the radosgw alongside the IPv6 address that is already in use.
The rest of the ceph cluster components only have IPv6; the radosgw would be
the only one with IPv4. Do you think this would be good practice or should I
stick to IPv6 only?

2017-03-31 17:36 GMT+02:00 Wido den Hollander <w...@42on.com>:

>
> > On 30 March 2017 at 20:13, Richard Hesse <
> richard.he...@weebly.com> wrote:
> >
> >
> > Thanks for the reply Wido! How do you handle IPv6 routes and routing with
> > IPv6 on public and cluster networks? You mentioned that your cluster
> > network is routed, so they will need routes to reach the other racks. But
> > you can't have more than 1 default gateway. Are you running a routing
> > protocol to handle that?
> >
>
> I don't. These clusters run without a public nor cluster network. Each
> host has 1 IP-Address.
>
> I rarely use public/cluster networks as they don't add anything for most
> systems. 20Gbit of bandwidth per node is more then enough in most cases and
> my opinion is that multiple IPs per machine only add complexity.
>
> Wido
>
> > We're using classless static routes via DHCP on v4 to solve this problem,
> > and I'm curious what the v6 SLAAC equivalent was.
> >
> > Thanks,
> > -richard
> >
> > On Tue, Mar 28, 2017 at 8:30 AM, Wido den Hollander <w...@42on.com>
> wrote:
> >
> > >
> > > > Op 27 maart 2017 om 21:49 schreef Richard Hesse <
> > > richard.he...@weebly.com>:
> > > >
> > > >
> > > > Has anyone run their Ceph OSD cluster network on IPv6 using SLAAC? I
> know
> > > > that ceph supports IPv6, but I'm not sure how it would deal with the
> > > > address rotation in SLAAC, permanent vs outgoing address, etc. It
> would
> > > be
> > > > very nice for me, as I wouldn't have to run any kind of DHCP server
> or
> > > use
> > > > static addressing -- just configure RA's and go.
> > > >
> > >
> > > Yes, I do in many clusters. Works fine! SLAAC doesn't generate random
> > > addresses which change over time. That's a feature called 'Privacy
> > > Extensions' and is controlled on Linux by:
> > >
> > > - net.ipv6.conf.all.use_tempaddr
> > > - net.ipv6.conf.default.use_tempaddr
> > > - net.ipv6.conf.X.use_tempaddr
> > >
> > > Set this to 0 and the kernel will generate one address based on the
> > > MAC-Address (EUI64) of the interface. This address is stable and will
> not
> > > change.
> > >
> > > I like this very much as I don't have any static or complex network
> > > configurations on the hosts. It moves the whole responsibility of
> > > networking and addresses to the network. A host just boots and obtains
> a IP.
> > >
> > > The OSDs contact the MONs on boot and they will tell them their
> address.
> > > OSDs do not need a fixed address for Ceph.
> > >
> > > However, using SLAAC without Privacy Extensions means that in practice
> the
> > > address will not change of a machine, so you don't need to worry about
> it
> > > that much.
> > >
> > > The biggest system I have running this way is 400 nodes running
> IPv6-only.
> > > 10 racks, 40 nodes per rack. Each rack has a Top-of-Rack switch
> running in
> > > Layer 3 and a /64 is assigned per rack.
> > >
> > > Layer 3 routing is used between the racks that based on the IPv6
> address
> > > we can even determine in which rack the host/OSD is.
> > >
> > > Layer 2 domains don't expand over racks which makes a rack a true
> failure
> > > domain in our case.
> > >
> > > Wido
> > >
> > > > On that note, does anyone have any experience with running ceph in a
> > > mixed
> > > > v4 and v6 environment?
> > > >
> > > > Thanks,
> > > > -richard
> > > > ___
> > > > ceph-users mailing list
> > > > ceph-users@lists.ceph.com
> > > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Félix Barbeira.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] radosgw bucket name performance

2016-09-21 Thread Félix Barbeira
Hi,

According to the Amazon S3 documentation, it is advised to insert a few
random chars in the bucket name in order to gain performance. This is
related to how Amazon stores key names. It looks like they store an index of
object key names in each region.

http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html#workloads-with-mix-request-types

My question is: is this also a good practice in a ceph cluster where all
the nodes are in the same datacenter? Is the bucket name relevant in ceph
for getting more performance? I think it is not, because all the data is
spread across the placement groups on all the osd nodes, no matter what
name the bucket has. Can anyone confirm this?
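
For what it's worth, the placement can be checked directly; a sketch,
assuming the usual default.rgw.buckets.data pool and a made-up object name:

# shows which PG and OSDs an object maps to, regardless of the bucket name:
ceph osd map default.rgw.buckets.data mybucket_myobject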

Thanks in advance.

-- 
Félix Barbeira.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] what happen to the OSDs if the OS disk dies?

2016-08-16 Thread Félix Barbeira
>>> Should I use 2 disks for the OS making a RAID1? in this case I'm
> > >>>> "wasting" 8TB only for ~10GB that the OS needs.
> > >>>>
> > >>>> In all the docs that i've been reading says ceph has no unique
> > >>>> single point of failure, so I think that this scenario must have a
> > >>>> optimal solution, maybe somebody could help me.
> > >>>>
> > >>>> Thanks in advance.
> > >>>>
> > >>>> --
> > >>>>
> > >>>> Félix Barbeira.
> > >>> if you do not have dedicated slots on the back for OS disks, then i
> > >>> would recomend using SATADOM flash modules directly into a SATA port
> > >>> internal in the machine. Saves you 2 slots for osd's and they are
> > >>> quite reliable. you could even use 2 sd cards if your machine have
> > >>> the internal SD slot
> > >>>
> > >>>
> > >> http://www.dell.com/downloads/global/products/pedge/en/
> poweredge-idsdm-whitepaper-en.pdf
> > >>> [1]
> > >>>
> > >>> kind regards
> > >>> Ronny Aasen
> > >>>
> > >>> ___
> > >>> ceph-users mailing list
> > >>> ceph-users@lists.ceph.com [2]
> > >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com [3]
> > >>>
> > >>> ___
> > >>> ceph-users mailing list
> > >>> ceph-u
> > >> ph.com
> > >> http://li
> > >>
> > >>> i/ceph-users-ceph.com
> > >>
> > >>
> > >> Links:
> > >> --
> > >> [1]
> > >> http://www.dell.com/downloads/global/products/pedge/en/
> poweredge-idsdm-whitepaper-en.pdf
> > >> [2] mailto:ceph-users@lists.ceph.com
> > >> [3] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >> [4] mailto:bsha...@sharerland.com
> > >
> > > ___
> > > ceph-users mailing list
> > > ceph-users@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> --
> Christian BalzerNetwork/Systems Engineer
> ch...@gol.com   Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Félix Barbeira.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] what happen to the OSDs if the OS disk dies?

2016-08-12 Thread Félix Barbeira
Hi,

I'm planning to build a ceph cluster but I have a serious question. At this
moment we have ~10 DELL R730xd servers with 12x4TB SATA disks. The official
ceph docs say:

"We recommend using a dedicated drive for the operating system and
software, and one drive for each Ceph OSD Daemon you run on the host."

I could use, for example, 1 disk for the OS and 11 for OSD data. On the
operating system I would run 11 daemons to control the OSDs. But... what
happens to the cluster if the disk with the OS fails? Maybe the cluster
thinks that 11 OSDs failed and tries to replicate all that data across the
cluster... that doesn't sound good.
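
A sketch of what I guess I would have to do before rebuilding a failed OS
disk, so the cluster does not start rebalancing those 11 OSDs in the
meantime (assuming the standard flags):

# before taking the node down for the OS reinstall:
ceph osd set noout
# ...reinstall the OS, bring the OSD daemons back up, then:
ceph osd unset noout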

Should I use 2 disks for the OS and make a RAID1? In that case I'm "wasting"
8TB for the ~10GB that the OS needs.

All the docs that I've been reading say ceph has no single point of
failure, so I think that this scenario must have an optimal solution;
maybe somebody could help me.

Thanks in advance.

-- 
Félix Barbeira.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


[ceph-users] rgw (infernalis docker) with hammer cluster

2016-03-09 Thread Félix Barbeira
I want to use the ceph object gateway. The docker container has version
9.2.1 (infernalis) and my cluster is a hammer LTS version (0.94.6).

Is it possible to use the rgw docker container (ceph/daemon rgw) with this
hammer cluster, or might something break because of the newer version?

-- 
Félix Barbeira.
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com